Low-Level Details
=================

.. warning:: This section of the documentation covers low-level implementation
            details of h2. This is most likely to be of use to h2
            developers and to other HTTP/2 implementers, though it could well
            be of general interest. Feel free to peruse it, but if you're
            looking for information about how to *use* h2 you should
            consider looking elsewhere.

State Machines
--------------

h2 is fundamentally built on top of a pair of interacting Finite State
Machines. One of these FSMs manages per-connection state, and another manages
per-stream state. Almost without exception (see :ref:`priority` for more
details) every single frame is unconditionally translated into events for
both state machines and those state machines are turned.

The advantage of a system such as this is that the finite state machines can
very densely encode the kinds of things that are allowed at any particular
moment in an HTTP/2 connection. More importantly, almost all protocols
are defined *in terms* of finite state machines: that is, protocol descriptions
can be reduced to a number of states and inputs. That makes FSMs a very natural
tool for implementing protocol stacks.

Indeed, protocol implementations that do not explicitly encode a finite
state machine almost always *implicitly* encode one, by using classes with a
bunch of variables that amount to state-tracking variables, or by using the
call stack as an implicit state-tracking mechanism. While these methods are
not immediately problematic, they tend to lack *explicitness*, and can lead to
subtle bugs of the form "protocol action X is incorrectly allowed in state Y".

For these reasons, we have implemented two *explicit* finite state machines.
These machines aim to encode most of the protocol-specific state, in particular
regarding what frame is allowed at what time. This goal is not always
achieved: in particular, as of this writing the *stream* FSM contains a number
of other state variables that really ought to be rolled into the state machine
itself in the form of new states, or in the form of a transformation of the
FSM to use state *vectors* instead of state *scalars*.

The following sections contain some implementers' notes on these FSMs.

Connection State Machine
~~~~~~~~~~~~~~~~~~~~~~~~

The "outer" state machine, the first one that is encountered when sending or
receiving data, is the connection state machine. This state machine tracks
whole-connection state.

This state machine is primarily intended to forbid certain actions on the basis
of whether the implementation is acting as a client or a server. For example,
clients are not permitted to send ``PUSH_PROMISE`` frames: this state machine
forbids that by refusing to define a valid transition from the ``CLIENT_OPEN``
state for the ``SEND_PUSH_PROMISE`` event.
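
As a rough illustration of forbidding an action by omission, the sketch below
models a table-driven connection FSM in plain Python. It is not h2's actual
code: the ``ToyConnectionStateMachine`` class, the ``State`` and ``Input``
enums, the ``IDLE`` and ``SERVER_OPEN`` states, the ``*_HEADERS`` inputs and
the locally defined ``ProtocolError`` are all assumptions made for the example.
Only ``CLIENT_OPEN`` and ``SEND_PUSH_PROMISE`` are names taken from the text
above.

.. code-block:: python

    from enum import Enum, auto


    class State(Enum):
        IDLE = auto()
        CLIENT_OPEN = auto()
        SERVER_OPEN = auto()


    class Input(Enum):
        SEND_HEADERS = auto()
        RECV_HEADERS = auto()
        SEND_PUSH_PROMISE = auto()
        RECV_PUSH_PROMISE = auto()


    class ProtocolError(Exception):
        """Stand-in for the error raised on a forbidden input."""


    # (current state, input) -> next state. A missing key means "forbidden".
    TRANSITIONS = {
        (State.IDLE, Input.SEND_HEADERS): State.CLIENT_OPEN,
        (State.IDLE, Input.RECV_HEADERS): State.SERVER_OPEN,
        (State.CLIENT_OPEN, Input.SEND_HEADERS): State.CLIENT_OPEN,
        (State.CLIENT_OPEN, Input.RECV_HEADERS): State.CLIENT_OPEN,
        (State.CLIENT_OPEN, Input.RECV_PUSH_PROMISE): State.CLIENT_OPEN,
        (State.SERVER_OPEN, Input.SEND_HEADERS): State.SERVER_OPEN,
        (State.SERVER_OPEN, Input.RECV_HEADERS): State.SERVER_OPEN,
        (State.SERVER_OPEN, Input.SEND_PUSH_PROMISE): State.SERVER_OPEN,
        # There is no (CLIENT_OPEN, SEND_PUSH_PROMISE) entry: clients
        # cannot push, so the table simply does not describe that move.
    }


    class ToyConnectionStateMachine:
        def __init__(self):
            self.state = State.IDLE

        def process_input(self, input_):
            """Advance the machine, or raise if no transition is defined."""
            try:
                self.state = TRANSITIONS[(self.state, input_)]
            except KeyError:
                raise ProtocolError(
                    f"{input_.name} is not allowed in state {self.state.name}"
                ) from None
            return self.state

In this sketch the first ``SEND_HEADERS`` input moves the machine from ``IDLE``
to ``CLIENT_OPEN``, and a later ``SEND_PUSH_PROMISE`` raises ``ProtocolError``
because the table has no entry for that pair. The rule is expressed by the
shape of the table rather than by a conditional scattered through the code.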

Otherwise, this particular state machine triggers no side-effects. Its
functionality is very coarse and high-level.

A visual representation of this FSM is shown below:

.. image:: _static/h2.connection.H2ConnectionStateMachine.dot.png
  :alt: A visual representation of the connection FSM.
  :target: _static/h2.connection.H2ConnectionStateMachine.dot.png


.. _stream-state-machine:

Stream State Machine
~~~~~~~~~~~~~~~~~~~~

Once the connection state machine has been spun, any frame that belongs to a
stream is passed to the stream state machine for its given stream. Each stream
has its own instance of the state machine, but all of them share the transition
table: this is because the table itself is sufficiently large that having it be
per-instance would be a ridiculous memory overhead.
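
One common way to get a table that is shared between instances in Python is to
store it as a class attribute. The sketch below assumes that approach; the
``ToyStreamStateMachine`` class, its string-valued states and its input names
are hypothetical and much smaller than the real table.

.. code-block:: python

    class ToyStreamStateMachine:
        # Class attribute: a single dict, however many streams exist.
        _transitions = {
            ("IDLE", "SEND_HEADERS"): "OPEN",
            ("IDLE", "RECV_HEADERS"): "OPEN",
            ("OPEN", "SEND_END_STREAM"): "HALF_CLOSED_LOCAL",
            ("OPEN", "RECV_END_STREAM"): "HALF_CLOSED_REMOTE",
            ("HALF_CLOSED_LOCAL", "RECV_END_STREAM"): "CLOSED",
            ("HALF_CLOSED_REMOTE", "SEND_END_STREAM"): "CLOSED",
        }

        # Per-instance storage is limited to what differs per stream.
        __slots__ = ("stream_id", "state")

        def __init__(self, stream_id):
            self.stream_id = stream_id
            self.state = "IDLE"

        def process_input(self, input_):
            # The attribute lookup falls through to the class-level table.
            self.state = self._transitions[(self.state, input_)]
            return self.state


    first = ToyStreamStateMachine(1)
    second = ToyStreamStateMachine(3)
    first.process_input("SEND_HEADERS")

    # Both instances see the very same table object, but track their own state.
    assert first._transitions is second._transitions
    assert (first.state, second.state) == ("OPEN", "IDLE")

Each new stream then costs only a stream ID and a current state, regardless of
how many transitions the table defines.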

Unlike the connection state machine, the stream state machine is quite complex.
This is because it frequently needs to encode some side-effects. The most
common side-effect is emitting a ``RST_STREAM`` frame when an error is
encountered: the need to do this means that far more transitions need to be
encoded than for the connection state machine.

Many of the side-effect functions in this state machine also raise
:class:`ProtocolError <h2.exceptions.ProtocolError>` exceptions. This is almost
always done on the basis of an extra state variable, which is an annoying code
smell: it should always be possible for the state machine itself to police
these using explicit state management. A future refactor will hopefully address
this problem by making these additional state variables part of the state
definitions in the FSM, which will lead to an expansion of the number of states
but a greater degree of simplicity in understanding and tracking what is going
on in the state machine.

The other action taken by the side-effect functions defined here is returning
:ref:`events <h2-events-basic>`. Most of these events are returned directly to
the user, and reflect the specific state transition that has taken place, but
some of the events are purely *internal*: they are used to signal to other
parts of the h2 codebase what action has been taken.

The major use of the internal events functionality at this time is for
validating header blocks: there are different rules for request headers than
there are for response headers, and different rules again for trailers. The
internal events are used to determine *exactly what* kind of data the user is
attempting to send, and that information is then used to do the correct kind
of validation. This approach ensures that the final source of truth about
what's happening at the protocol level lives inside the FSM, which is an
extremely important design principle we want to continue to enshrine in h2.
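
The idea of letting internal events select the validation rules can be sketched
as follows. This is a simplified model rather than h2's implementation: the
``ToyStream`` class, the ``_RequestSent`` and ``_TrailersSent`` event classes
and the ``validate_outbound_headers`` function are hypothetical names, the
stream is assumed to be a client stream, and only one rule per header block
type is checked.

.. code-block:: python

    class ProtocolError(Exception):
        """Local stand-in for h2.exceptions.ProtocolError."""


    class _RequestSent:
        """Hypothetical internal event: the header block was a request."""


    class _TrailersSent:
        """Hypothetical internal event: the header block was trailers."""


    class ToyStream:
        def __init__(self):
            self.state = "IDLE"

        # Side-effect functions: each returns the events for its transition.
        def _request_sent(self):
            return [_RequestSent()]

        def _trailers_sent(self):
            return [_TrailersSent()]

        # (state, input) -> (side-effect function, next state)
        _transitions = {
            ("IDLE", "SEND_HEADERS"): (_request_sent, "OPEN"),
            ("OPEN", "SEND_HEADERS"): (_trailers_sent, "HALF_CLOSED_LOCAL"),
        }

        def process_input(self, input_):
            try:
                func, next_state = self._transitions[(self.state, input_)]
            except KeyError:
                raise ProtocolError(
                    f"{input_} is not allowed in state {self.state}"
                ) from None
            self.state = next_state
            return func(self)


    def validate_outbound_headers(headers, events):
        """Choose the rules from what the FSM reports was sent."""
        names = [name for name, _ in headers]
        if any(isinstance(event, _RequestSent) for event in events):
            if ":method" not in names:
                raise ProtocolError("request headers need a :method field")
        elif any(isinstance(event, _TrailersSent) for event in events):
            if any(name.startswith(":") for name in names):
                raise ProtocolError("trailers must not carry pseudo-headers")

Here the validation code never inspects the stream state directly. It only
reacts to the event that the transition produced, so the FSM remains the single
place that decides whether a header block is a request or a set of trailers.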

A visual representation of this FSM is shown below:

.. image:: _static/h2.stream.H2StreamStateMachine.dot.png
  :alt: A visual representation of the stream FSM.
  :target: _static/h2.stream.H2StreamStateMachine.dot.png


.. _priority:

Priority
~~~~~~~~

In the :ref:`stream-state-machine` section we said that any frame that belongs
to a stream is passed to the stream state machine. This turns out to be not
quite true.

Specifically, while ``PRIORITY`` frames are technically sent on a given stream
(that is, `RFC 7540 Section 6.3`_ defines them as "always identifying a stream"
and forbids the use of stream ID ``0`` for them), in practice they are almost
completely exempt from the usual stream FSM behaviour. The RFC has this to say:

   The ``PRIORITY`` frame can be sent on a stream in any state, though it
   cannot be sent between consecutive frames that comprise a single
   header block (Section 4.3).

Given that the consecutive header block requirement is handled outside of the
FSMs, this section of the RFC essentially means that there is *never* a
situation where it is invalid to receive a ``PRIORITY`` frame. This means that
including it in the stream FSM would require that we allow ``SEND_PRIORITY``
and ``RECV_PRIORITY`` in all states.

On its own this would not be a particularly onerous task. However, h2 also
uses the *absence* of a stream state machine to flag a closed stream. This is
primarily for memory conservation reasons: if we needed to keep around an FSM
for every stream we've ever seen, that would cause long-lived HTTP/2
connections to consume increasingly large amounts of memory. On top of this,
routing ``PRIORITY`` frames through the stream FSM would require us to create a
stream FSM each time we received a ``PRIORITY`` frame for a stream we are not
currently tracking, giving a malicious peer an easy route to force an h2 user
to allocate nearly unbounded amounts of memory.

For this reason, h2 circumvents the stream FSM entirely for ``PRIORITY``
frames. Instead, these frames are treated as being connection-level frames that
*just happen* to identify a specific stream. They do not bring streams into
being, or in any sense interact with h2's view of streams. Their stream
details are treated as strictly metadata that h2 is not interested in
beyond being able to parse it out.
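
The memory argument can be made concrete with a small model. The sketch below
is an assumption-laden illustration, not h2's code: ``ToyConnection``, its
method names and the tuple-shaped events are hypothetical, and a plain string
stands in for each per-stream state machine.

.. code-block:: python

    class ProtocolError(Exception):
        """Local stand-in for h2.exceptions.ProtocolError."""


    class ToyConnection:
        def __init__(self):
            # stream_id -> per-stream state. A missing key means "closed".
            self.streams = {}

        def receive_headers(self, stream_id):
            # Frames that really belong to a stream create or reuse its entry.
            self.streams.setdefault(stream_id, "OPEN")
            return ("HeadersReceived", stream_id)

        def end_stream(self, stream_id):
            # Closing a stream discards its state entirely.
            self.streams.pop(stream_id, None)

        def receive_priority(self, stream_id, depends_on, weight):
            # Handled at connection level: self.streams is never touched,
            # so nothing is allocated and closed streams are acceptable.
            if stream_id == 0:
                raise ProtocolError("PRIORITY must identify a stream")
            return ("PriorityUpdated", stream_id, depends_on, weight)


    conn = ToyConnection()
    # A peer sending PRIORITY for a thousand unknown streams costs nothing.
    for stream_id in range(1, 2001, 2):
        conn.receive_priority(stream_id, depends_on=0, weight=16)
    assert conn.streams == {}

Had ``receive_priority`` created an entry in ``streams`` for each frame, the
loop above would have left a thousand stream objects behind, which is the
allocation behaviour the design is meant to avoid.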


.. _RFC 7540 Section 6.3: https://tools.ietf.org/html/rfc7540#section-6.3