Low-Level Details
=================

.. warning:: This section of the documentation covers low-level implementation
             details of h2. It is most likely to be of use to h2 developers
             and to other HTTP/2 implementers, though it could well be of
             general interest. Feel free to peruse it, but if you're looking
             for information about how to *use* h2 you should look elsewhere.

State Machines
--------------

h2 is fundamentally built on top of a pair of interacting Finite State
Machines. One of these FSMs manages per-connection state, and the other manages
per-stream state. Almost without exception (see :ref:`priority` for more
details), every single frame is unconditionally translated into events for
both state machines, and those state machines are advanced accordingly.

The advantage of a system such as this is that finite state machines can very
densely encode the kinds of things that are allowed at any particular moment in
an HTTP/2 connection. More importantly, almost all protocols are defined *in
terms* of finite state machines: that is, protocol descriptions can be reduced
to a number of states and inputs. That makes FSMs a very natural tool for
implementing protocol stacks.

Indeed, protocol implementations that do not explicitly encode a finite state
machine almost always encode one *implicitly*, either by using classes with a
collection of variables that amount to state-tracking variables, or by using
the call stack as an implicit state-tracking mechanism. While these methods are
not immediately problematic, they tend to lack *explicitness*, and can lead to
subtle bugs of the form "protocol action X is incorrectly allowed in state Y".

For these reasons, we have implemented two *explicit* finite state machines.
These machines aim to encode most of the protocol-specific state, in particular
which frames are allowed at which times.
This goal is not always achieved: in particular, as of this writing the
*stream* FSM contains a number of other state variables that really ought to be
rolled into the state machine itself, either as new states or by transforming
the FSM to use state *vectors* instead of state *scalars*.

The following sections contain some implementers' notes on these FSMs.

Connection State Machine
~~~~~~~~~~~~~~~~~~~~~~~~

The "outer" state machine, the first one that is encountered when sending or
receiving data, is the connection state machine. This state machine tracks
whole-connection state.

This state machine is primarily intended to forbid certain actions on the basis
of whether the implementation is acting as a client or a server. For example,
clients are not permitted to send ``PUSH_PROMISE`` frames: this state machine
forbids that by refusing to define a valid transition from the ``CLIENT_OPEN``
state for the ``SEND_PUSH_PROMISE`` event.

Otherwise, this particular state machine triggers no side-effects. Its
functionality is very coarse and high-level.

A visual representation of this FSM is shown below:

.. image:: _static/h2.connection.H2ConnectionStateMachine.dot.png
   :alt: A visual representation of the connection FSM.
   :target: _static/h2.connection.H2ConnectionStateMachine.dot.png


.. _stream-state-machine:

Stream State Machine
~~~~~~~~~~~~~~~~~~~~

After a frame has been processed by the connection state machine, if it belongs
to a stream it is passed on to that stream's state machine. Each stream has its
own instance of the state machine, but all instances share a single transition
table: the table is large enough that giving each instance its own copy would
impose an unreasonable memory overhead.

Unlike the connection state machine, the stream state machine is quite complex.
This is because it frequently needs to encode side-effects. The most common
side-effect is emitting a ``RST_STREAM`` frame when an error is encountered:
the need to do this means that far more transitions must be encoded than for
the connection state machine.

Many of the side-effect functions in this state machine also raise
:class:`ProtocolError <h2.exceptions.ProtocolError>` exceptions. This is almost
always done on the basis of an extra state variable, which is an annoying code
smell: it should always be possible for the state machine itself to police
these conditions using explicit state management. A future refactor will
hopefully address this problem by making these additional state variables part
of the state definitions in the FSM. That will increase the number of states,
but make it simpler to understand and track what is going on in the state
machine.

The other action taken by the side-effect functions defined here is returning
:ref:`events <h2-events-basic>`. Most of these events are returned directly to
the user and reflect the specific state transition that has taken place, but
some of the events are purely *internal*: they are used to signal to other
parts of the h2 codebase what action has been taken.

The major use of the internal events functionality at this time is for
validating header blocks: there are different rules for request headers than
for response headers, and different rules again for trailers. The internal
events are used to determine *exactly what* kind of data the user is attempting
to send, and that information is then used to apply the correct kind of
validation. This approach ensures that the final source of truth about what's
happening at the protocol level lives inside the FSM, which is an extremely
important design principle we want to continue to enshrine in h2.
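
The shape described above can be illustrated with a minimal sketch. It is not
h2's actual code. It assumes a transition table stored once on the class, so
every stream instance shares it, mapping ``(state, input)`` pairs to a
side-effect function and a target state, with the side-effects returning
internal events that tell the caller which header validation rules to apply.
Every name in it (``StreamFSM``, ``StreamState``, ``StreamInput``,
``SendRequestHeaders``, ``SendTrailers``, ``validation_rules_for``, and the
local ``ProtocolError``) is hypothetical, the states are a heavily reduced
subset, and any second header block sent on an open stream is assumed to be
trailers, which is a client-side simplification.

.. code-block:: python

    from enum import Enum, auto


    class ProtocolError(Exception):
        """Hypothetical stand-in for h2.exceptions.ProtocolError."""


    class StreamState(Enum):
        IDLE = auto()
        OPEN = auto()
        HALF_CLOSED_LOCAL = auto()


    class StreamInput(Enum):
        SEND_HEADERS = auto()
        SEND_END_STREAM = auto()


    class SendRequestHeaders:
        """Hypothetical internal event: the header block is a request."""


    class SendTrailers:
        """Hypothetical internal event: the header block is a trailer block."""


    class StreamFSM:
        def __init__(self):
            # Only the current state is stored per instance.
            self.state = StreamState.IDLE

        # Side-effect functions: each returns a list of events.
        def _send_request_headers(self):
            return [SendRequestHeaders()]

        def _send_trailers(self):
            return [SendTrailers()]

        def _no_events(self):
            return []

        # Class attribute: one table shared by every StreamFSM instance.
        # Maps (state, input) -> (side-effect function, next state).
        _transitions = {
            (StreamState.IDLE, StreamInput.SEND_HEADERS):
                (_send_request_headers, StreamState.OPEN),
            (StreamState.OPEN, StreamInput.SEND_HEADERS):
                (_send_trailers, StreamState.OPEN),
            (StreamState.OPEN, StreamInput.SEND_END_STREAM):
                (_no_events, StreamState.HALF_CLOSED_LOCAL),
        }

        def process_input(self, stream_input):
            try:
                side_effect, next_state = self._transitions[
                    (self.state, stream_input)
                ]
            except KeyError:
                # No transition defined: this input is forbidden in this state.
                raise ProtocolError(
                    f"{stream_input.name} not allowed in {self.state.name}"
                ) from None
            self.state = next_state
            return side_effect(self)


    def validation_rules_for(events):
        """Hypothetical helper: pick header validation rules from events."""
        for event in events:
            if isinstance(event, SendRequestHeaders):
                return "request"
            if isinstance(event, SendTrailers):
                return "trailers"
        return None

Because ``_transitions`` lives on the class, each additional stream costs only
its ``state`` attribute. An input with no table entry, such as
``SEND_HEADERS`` after ``SEND_END_STREAM``, raises instead of being silently
accepted, and the caller never has to guess what kind of header block it is
validating: the FSM's returned events tell it.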

A visual representation of this FSM is shown below:

.. image:: _static/h2.stream.H2StreamStateMachine.dot.png
   :alt: A visual representation of the stream FSM.
   :target: _static/h2.stream.H2StreamStateMachine.dot.png


.. _priority:

Priority
~~~~~~~~

In the :ref:`stream-state-machine` section we said that any frame that belongs
to a stream is passed to the stream state machine. This is not quite true.

Specifically, while ``PRIORITY`` frames are technically sent on a given stream
(that is, `RFC 7540 Section 6.3`_ defines them as "always identifying a stream"
and forbids the use of stream ID ``0`` for them), in practice they are almost
completely exempt from the usual stream FSM behaviour. The RFC has this to say:

    The ``PRIORITY`` frame can be sent on a stream in any state, though it
    cannot be sent between consecutive frames that comprise a single
    header block (Section 4.3).

Given that the consecutive header block requirement is handled outside of the
FSMs, this section of the RFC essentially means that there is *never* a
situation where it is invalid to receive a ``PRIORITY`` frame. Including it in
the stream FSM would therefore require that we allow ``SEND_PRIORITY`` and
``RECV_PRIORITY`` in all states.

This would not be a particularly onerous task. However, there is a more
important consideration: h2 uses the *absence* of a stream state machine to
flag a closed stream. This is primarily for memory conservation reasons: if we
needed to keep around an FSM for every stream we've ever seen, long-lived
HTTP/2 connections would consume ever-increasing amounts of memory.
On top of this, handling ``PRIORITY`` frames in the stream FSM would require us
to create a stream FSM each time we received a ``PRIORITY`` frame for a stream
that had no live state machine, giving a malicious peer an easy route to force
an h2 user to allocate nearly unbounded amounts of memory.

For this reason, h2 circumvents the stream FSM entirely for ``PRIORITY``
frames. Instead, these frames are treated as connection-level frames that
*just happen* to identify a specific stream. They do not bring streams into
being, or in any sense interact with h2's view of streams. The stream details
they carry are treated strictly as metadata, which h2 parses out but is
otherwise not interested in.


.. _RFC 7540 Section 6.3: https://tools.ietf.org/html/rfc7540#section-6.3
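
The memory argument in the Priority section can be made concrete with a
minimal sketch, assuming a plain dictionary of per-stream state machines keyed
by stream ID. It is not h2's ``H2Connection`` API: ``Connection``,
``StreamFSM``, ``PriorityInfo``, their methods, and the local
``ProtocolError`` are all hypothetical. A missing dictionary entry means the
stream is closed (or was never opened), so closing a stream frees its FSM,
while ``PRIORITY`` frames are parsed into a returned value without creating or
looking up any stream state. For brevity the sketch raises one generic error
for any stream frame without a live FSM.

.. code-block:: python

    from dataclasses import dataclass


    class ProtocolError(Exception):
        """Hypothetical stand-in for h2.exceptions.ProtocolError."""


    class StreamFSM:
        """Placeholder for a per-stream state machine."""

        def __init__(self, stream_id):
            self.stream_id = stream_id


    @dataclass(frozen=True)
    class PriorityInfo:
        """Parsed PRIORITY metadata; nothing here is stored on the connection."""
        stream_id: int
        depends_on: int
        weight: int


    class Connection:
        def __init__(self):
            # stream ID -> StreamFSM. A missing key means "closed or never opened".
            self.streams = {}

        def open_stream(self, stream_id):
            self.streams[stream_id] = StreamFSM(stream_id)

        def close_stream(self, stream_id):
            # Dropping the FSM is what marks the stream closed and frees it.
            del self.streams[stream_id]

        def receive_stream_frame(self, stream_id):
            # Ordinary stream frames must find a live FSM.
            fsm = self.streams.get(stream_id)
            if fsm is None:
                raise ProtocolError(f"frame on closed or unknown stream {stream_id}")
            return fsm

        def receive_priority(self, stream_id, depends_on, weight):
            # RFC 7540 Section 6.3: PRIORITY must not use stream ID 0.
            if stream_id == 0:
                raise ProtocolError("PRIORITY frame on stream 0")
            # Connection-level handling: the stream ID is metadata only.
            # self.streams is never touched, so a PRIORITY frame allocates
            # no per-stream state.
            return PriorityInfo(stream_id, depends_on, weight)

Receiving thousands of ``PRIORITY`` frames for distinct stream IDs leaves
``streams`` empty, whereas a design that created an FSM for each such frame
would grow with every one a peer chose to send.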