Snappy framing format description
Last revised: 2013-10-25

This document describes a framing format for Snappy, allowing compression to
files or streams that can then more easily be decompressed without having
to hold the entire stream in memory. It also provides data checksums to
help verify integrity. It does not provide metadata checksums, so it does
not protect against e.g. all forms of truncation.

Implementation of the framing format is optional for Snappy compressors and
decompressors; it is not part of the Snappy core specification.


1. General structure

The file consists solely of chunks, lying back-to-back with no padding
in between. Each chunk consists first of a single-byte chunk identifier,
then a three-byte little-endian length of the chunk data in bytes (from 0 to
16777215, inclusive), and then the data, if any. The four bytes of chunk
header are not counted in the data length.
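
As an illustration, the header layout described above could be parsed as in
the following C sketch. The names chunk_header and parse_chunk_header are
hypothetical and are not defined by this specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not part of the specification. */
struct chunk_header {
  uint8_t type;     /* chunk identifier byte */
  uint32_t length;  /* data length, 0..16777215, excluding the 4 header bytes */
};

/* Parses the four header bytes at buf. Returns 0 on success, or -1 if
   fewer than four bytes are available. */
int parse_chunk_header(const uint8_t *buf, size_t avail,
                       struct chunk_header *out) {
  if (avail < 4) return -1;
  out->type = buf[0];
  /* Three-byte little-endian length: least significant byte first. */
  out->length = (uint32_t)buf[1] |
                ((uint32_t)buf[2] << 8) |
                ((uint32_t)buf[3] << 16);
  return 0;
}
```

A reader would typically call this once per chunk and then consume exactly
`length` further bytes before parsing the next header.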

The different chunk types are listed below. The first chunk must always
be the stream identifier chunk (see section 4.1, below). The stream
ends when the file ends -- there is no explicit end-of-file marker.


2. File type identification

The following identifiers for this format are recommended where appropriate.
However, note that none have been registered officially, so this is only to
be taken as a guideline. We use "Snappy framed" to distinguish between this
format and raw Snappy data.

 File extension:         .sz
 MIME type:              application/x-snappy-framed
 HTTP Content-Encoding:  x-snappy-framed


3. Checksum format

Some chunks have data protected by a checksum (the ones that do will say so
explicitly). The checksums are always masked CRC-32Cs.

A description of CRC-32C can be found in RFC 3720, section 12.1, with
examples in section B.4.

Checksums are not stored directly, but masked, as checksumming data and
then its own checksum can be problematic. The masking is the same as used
in Apache Hadoop: rotate the checksum right by 15 bits, then add the constant
0xa282ead8 (using wraparound as normal for unsigned integers). This is
equivalent to the following C code:

 uint32_t mask_checksum(uint32_t x) {
   return ((x >> 15) | (x << 17)) + 0xa282ead8;
 }

Note that the masking is reversible.
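
To illustrate the reversibility, the inverse operation subtracts the
constant and then rotates left by 15 bits. The following is a minimal
sketch; unmask_checksum is a hypothetical name that this specification does
not define, and mask_checksum is repeated only so the example stands alone.

```c
#include <stdint.h>

/* Masking as given in this section: rotate right by 15, add the constant. */
uint32_t mask_checksum(uint32_t x) {
  return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/* Hypothetical inverse: undo the addition, then rotate left by 15. */
uint32_t unmask_checksum(uint32_t masked) {
  uint32_t rot = masked - 0xa282ead8u;  /* wraps around like the addition */
  return (rot << 15) | (rot >> 17);
}
```

For any 32-bit value x, unmask_checksum(mask_checksum(x)) yields x again.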

The checksum is always stored as a four-byte integer, in little-endian
byte order.
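
For readers without a CRC-32C routine at hand, the sketch below shows a
simple bitwise implementation (reflected Castagnoli polynomial 0x82f63b78)
and the little-endian store. The function names crc32c and store_le32 are
assumptions made for this example; production code would normally use a
table-driven or hardware-assisted CRC instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: initial value and final XOR are both 0xffffffff.
   Slow but short; intended only as an illustration. */
uint32_t crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xffffffffu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++) {
      /* Shift right; fold in the polynomial if the dropped bit was set. */
      uint32_t poly = (crc & 1u) ? 0x82f63b78u : 0u;
      crc = (crc >> 1) ^ poly;
    }
  }
  return crc ^ 0xffffffffu;
}

/* Stores a 32-bit value least significant byte first. */
void store_le32(uint8_t out[4], uint32_t value) {
  out[0] = (uint8_t)(value & 0xff);
  out[1] = (uint8_t)((value >> 8) & 0xff);
  out[2] = (uint8_t)((value >> 16) & 0xff);
  out[3] = (uint8_t)((value >> 24) & 0xff);
}
```

A common sanity check is that the CRC-32C of the nine ASCII bytes
"123456789" is 0xe3069283 before masking.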


4. Chunk types

The currently supported chunk types are described below. The list may
be extended in the future.


4.1. Stream identifier (chunk type 0xff)

The stream identifier is always the first element in the stream.
Its data is exactly six bytes long and contains "sNaPpY" in ASCII. This
means that a valid Snappy framed stream always starts with the bytes

 0xff 0x06 0x00 0x00 0x73 0x4e 0x61 0x50 0x70 0x59

The stream identifier chunk may also appear again later in the stream;
any such chunk should simply be ignored, provided it has the right length
and contents. This allows for easy concatenation of compressed files
without the need for re-framing.
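
A reader could check for the ten leading bytes listed above as in this
short sketch. The identifiers kStreamIdentifier and
starts_with_stream_identifier are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Chunk type 0xff, length 6 (little-endian), then "sNaPpY" in ASCII. */
static const uint8_t kStreamIdentifier[10] = {
  0xff, 0x06, 0x00, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59
};

/* Returns 1 if buf begins with a valid stream identifier chunk, else 0. */
int starts_with_stream_identifier(const uint8_t *buf, size_t len) {
  return len >= sizeof kStreamIdentifier &&
         memcmp(buf, kStreamIdentifier, sizeof kStreamIdentifier) == 0;
}
```

The same comparison can be reused for identifier chunks that appear later
in a concatenated stream.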


4.2. Compressed data (chunk type 0x00)

Compressed data chunks contain a normal Snappy compressed bitstream;
see the compressed format specification. The compressed data is preceded by
the masked CRC-32C (see section 3) of the _uncompressed_ data.

Note that the data portion of the chunk, i.e., the compressed contents,
can be at most 16777211 bytes (2^24 - 1, minus the four checksum bytes).
However, we place an additional restriction that the uncompressed data
in a chunk must be no longer than 65536 bytes. This allows consumers to
easily use small fixed-size buffers.
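
The checks in this section might be sketched as follows, assuming the
Snappy decompression itself has already been done elsewhere and its output
is passed in. verify_compressed_chunk is a hypothetical name, and the
bitwise crc32c helper is a slow illustrative stand-in included only so the
example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bitwise CRC-32C (reflected polynomial 0x82f63b78). */
static uint32_t crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xffffffffu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ ((crc & 1u) ? 0x82f63b78u : 0u);
  }
  return crc ^ 0xffffffffu;
}

/* Masking from section 3. */
static uint32_t mask_checksum(uint32_t x) {
  return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/* chunk_data/chunk_len: chunk contents after the 4-byte header
   (checksum followed by compressed bytes).
   plain/plain_len: the already decompressed output (assumed input).
   Returns 1 if the size limits hold and the checksum matches, else 0. */
int verify_compressed_chunk(const uint8_t *chunk_data, size_t chunk_len,
                            const uint8_t *plain, size_t plain_len) {
  if (chunk_len < 4 || chunk_len > 16777215u) return 0;
  if (plain_len > 65536u) return 0;
  /* The stored checksum is a little-endian 32-bit integer. */
  uint32_t stored = (uint32_t)chunk_data[0] |
                    ((uint32_t)chunk_data[1] << 8) |
                    ((uint32_t)chunk_data[2] << 16) |
                    ((uint32_t)chunk_data[3] << 24);
  return stored == mask_checksum(crc32c(plain, plain_len));
}
```

Note that the checksum is compared against the uncompressed bytes, so it
can only be verified after decompression.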


4.3. Uncompressed data (chunk type 0x01)

Uncompressed data chunks allow a compressor to send uncompressed,
raw data; this is useful if, for instance, incompressible or
near-incompressible data is detected, and faster decompression is desired.

As in the compressed chunks, the data is preceded by its own masked
CRC-32C (see section 3).

An uncompressed data chunk, like compressed data chunks, should contain
no more than 65536 data bytes, so the maximum legal chunk length with the
checksum is 65540.
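
Putting the pieces together, an uncompressed chunk could be emitted roughly
as shown below. write_uncompressed_chunk is a hypothetical helper, and the
bitwise crc32c is a slow illustrative stand-in included so the example is
self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bitwise CRC-32C (reflected polynomial 0x82f63b78). */
static uint32_t crc32c(const uint8_t *data, size_t len) {
  uint32_t crc = 0xffffffffu;
  for (size_t i = 0; i < len; i++) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ ((crc & 1u) ? 0x82f63b78u : 0u);
  }
  return crc ^ 0xffffffffu;
}

/* Masking from section 3. */
static uint32_t mask_checksum(uint32_t x) {
  return ((x >> 15) | (x << 17)) + 0xa282ead8u;
}

/* Writes header, masked checksum and raw data to out, which must have
   room for len + 8 bytes. Returns the number of bytes written, or 0 if
   len exceeds the 65536-byte limit. */
size_t write_uncompressed_chunk(uint8_t *out, const uint8_t *data,
                                size_t len) {
  if (len > 65536u) return 0;
  uint32_t chunk_len = (uint32_t)len + 4;  /* checksum counts as data */
  uint32_t sum = mask_checksum(crc32c(data, len));
  out[0] = 0x01;                           /* chunk type */
  out[1] = (uint8_t)(chunk_len & 0xff);    /* 3-byte little-endian length */
  out[2] = (uint8_t)((chunk_len >> 8) & 0xff);
  out[3] = (uint8_t)((chunk_len >> 16) & 0xff);
  out[4] = (uint8_t)(sum & 0xff);          /* 4-byte little-endian checksum */
  out[5] = (uint8_t)((sum >> 8) & 0xff);
  out[6] = (uint8_t)((sum >> 16) & 0xff);
  out[7] = (uint8_t)((sum >> 24) & 0xff);
  memcpy(out + 8, data, len);
  return len + 8;
}
```

With the maximum of 65536 data bytes, the length field written here is
65540, matching the limit stated above.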


4.4. Padding (chunk type 0xfe)

Padding chunks allow a compressor to increase the size of the data stream
so that it complies with external demands, e.g. that the total number of
bytes is a multiple of some value.

All bytes of the padding chunk, except the chunk type byte and the length,
should be zero, but decompressors must not try to interpret or verify the
padding data in any way.
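
As one possible approach, a writer that needs the stream size to be a
multiple of some alignment could size a single padding chunk as sketched
here. Both function names are hypothetical, and the sketch assumes an
alignment of at most 65536 bytes so that one chunk always suffices.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the total size (4-byte header included) of one padding chunk
   that brings stream_size up to a multiple of align, or 0 if no padding
   is needed. Assumes align <= 65536; returns 0 for align == 0. */
uint32_t padding_chunk_total(uint64_t stream_size, uint32_t align) {
  if (align == 0) return 0;
  uint32_t gap = (uint32_t)((align - stream_size % align) % align);
  if (gap == 0) return 0;
  /* A chunk cannot be smaller than its header, so grow by whole
     alignment units until the header fits. */
  while (gap < 4) gap += align;
  return gap;
}

/* Writes a padding chunk of `total` bytes (total >= 4) filled with zeros.
   Returns the number of bytes written. */
size_t write_padding_chunk(uint8_t *out, uint32_t total) {
  uint32_t data_len = total - 4;
  out[0] = 0xfe;                          /* chunk type */
  out[1] = (uint8_t)(data_len & 0xff);    /* 3-byte little-endian length */
  out[2] = (uint8_t)((data_len >> 8) & 0xff);
  out[3] = (uint8_t)((data_len >> 16) & 0xff);
  memset(out + 4, 0, data_len);
  return total;
}
```

For example, a 10-byte stream aligned to 16 needs a 6-byte padding chunk
(header plus two zero bytes), while a 14-byte stream needs 18 bytes because
a 2-byte gap cannot hold the header.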


4.5. Reserved unskippable chunks (chunk types 0x02-0x7f)

These are reserved for future expansion. A decoder that sees such a chunk
should immediately return an error, as it must assume it cannot decode the
stream correctly.

Future versions of this specification may define meanings for these chunks.


4.6. Reserved skippable chunks (chunk types 0x80-0xfd)

These are also reserved for future expansion, but unlike the chunks
described in 4.5, a decoder seeing these must skip them and continue
decoding.

Future versions of this specification may define meanings for these chunks.
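
The chunk type ranges from sections 4.1 through 4.6 can be summarized as a
small dispatch function. This is only a sketch; the enum and function names
are hypothetical and not part of this specification.

```c
#include <stdint.h>

/* Hypothetical decoder actions, one per chunk category. */
enum chunk_action {
  CHUNK_STREAM_IDENTIFIER,  /* 0xff: verify contents, then ignore */
  CHUNK_COMPRESSED,         /* 0x00: checksum + Snappy data */
  CHUNK_UNCOMPRESSED,       /* 0x01: checksum + raw data */
  CHUNK_PADDING,            /* 0xfe: skip without inspecting */
  CHUNK_ERROR,              /* 0x02-0x7f: reserved, unskippable */
  CHUNK_SKIP                /* 0x80-0xfd: reserved, skippable */
};

enum chunk_action classify_chunk(uint8_t type) {
  switch (type) {
    case 0xff: return CHUNK_STREAM_IDENTIFIER;
    case 0x00: return CHUNK_COMPRESSED;
    case 0x01: return CHUNK_UNCOMPRESSED;
    case 0xfe: return CHUNK_PADDING;
    default:
      /* Remaining values are 0x02-0xfd; the high bit decides. */
      return (type <= 0x7f) ? CHUNK_ERROR : CHUNK_SKIP;
  }
}
```

A decoder built this way keeps working on streams that contain skippable
chunk types defined after it was written, and fails cleanly on unskippable
ones.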