tor-browser

The Tor Browser
git clone https://git.dasho.dev/tor-browser.git

commit 84935df73c98e1e2fb1e1ee10a297562cc2d4bfb
parent a5e07594e9b0c108e48cfe11fc664761df7958cd
Author: Randell Jesup <rjesup@mozilla.com>
Date:   Tue,  7 Oct 2025 17:56:00 +0000

Bug 1917965: Add design doc r=necko-reviewers,kershaw

Differential Revision: https://phabricator.services.mozilla.com/D258947

Diffstat:
M netwerk/docs/cache2/doc.rst | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/netwerk/docs/cache2/doc.rst b/netwerk/docs/cache2/doc.rst
@@ -16,7 +16,7 @@ be clear directly from the `IDL files <https://searchfox.org/mozilla-central/sea
 - The cache API is **completely thread-safe** and **non-blocking**.
 - There is **no IPC support**. It's only accessible on the default chrome process.
-- When there is no profile the new HTTP cache works, but everything is
+- When there is no profile the HTTP cache works, but everything is
   stored only in memory not obeying any particular limits.
 
 .. _nsICacheStorageService:
@@ -567,3 +567,68 @@ checking - the memory cache pool is controlled by
 ``browser.cache.memory.capacity``, the disk entries pool is already
 described above. The pool can be accessed and modified only on the cache
 background thread.
+
+Compression Dictionaries
+------------------------
+
+Compression Dictionaries are specified by the IETF:
+https://datatracker.ietf.org/doc/draft-ietf-httpbis-compression-dictionary/
+
+See also: https://developer.chrome.com/blog/shared-dictionary-compression
+and https://github.com/WICG/compression-dictionary-transport
+
+Gecko's design for compression dictionary support:
+
+We keep special ``dict:<origin>`` entries whose metadata lists all
+dictionaries stored for that origin.
+
+When a fetch is made, we check whether a ``dict:<origin>`` cache entry
+exists. If not, we know there are no dictionaries for the origin. If
+there is an entry and we have not previously loaded it into memory, we
+read and parse the metadata and create in-memory structures for all of
+``<origin>``'s dictionaries. These include the data needed to match the
+request and decide whether to send an ``Available-Dictionary:`` header
+with it.
+
+If a response to any request carries a ``Use-As-Dictionary`` header, we
+create a new dictionary entry in memory and flag it for saving to the
+``dict:<origin>`` metadata. We set the stream up to decompress before
+storing into the cache (see the future options below for alternatives),
+so that we can be sure we will be able to decompress it later. We start
+accumulating a hash value for the metadata entry; once the resource is
+fully received, we finalize the hash and the metadata can be written.
+
+When a response arrives with ``dcb`` or ``dcz`` content coding
+(dictionary-compressed Brotli or Zstandard), we use the cache entry for
+the dictionary we sent in ``Available-Dictionary`` to decompress the
+resource. This means reading the dictionary into memory and then
+letting the decompression proceed.
+
+Several of these actions require some asynchrony (waiting for a cache
+entry to be loaded for use as a dictionary, or waiting for a
+``dict:<origin>`` entry to be loaded). This is generally handled via
+lambdas.
+
+The metadata and in-memory entries are kept in sync with the cache by
+clearing entries out when cache entries are doomed. This also interacts
+with Clear-Site-Data and cookie-clearing headers (see the IETF spec).
+
+Dictionary loading can also be triggered via ``<link
+rel="compression-dictionary" ...>`` elements and ``Link`` headers.
+These cause prefetches of the dictionaries.
+
+Things to watch on landing:
+
+- Cache hit rate
+- Dictionary utilization
+
+  - Add probes
+
+- Pageload metrics
+
+  - Would require OHTTP-based collection
+
+Future optimizations:
+
+- Compressing dictionaries with zstd in the cache
+
+  - Trades CPU use and some latency decoding dictionary-encoded files
+    for hit rate
+  - Perhaps only above some size
+
+- Compressing dictionary-encoded files with zstd in the cache
+
+  - Trades CPU use for hit rate
+  - Perhaps only above some size
+
+- Preemptively reading ``dict:<origin>`` entries into memory in the
+  background at startup
+
+  - Up to some limit
+
+- LRU-ing ``dict:<origin>`` entries and dropping old ones from memory
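The request-time matching step in the design above can be sketched roughly as follows. This is a minimal Python sketch, not Gecko code: `DictionaryRecord` and `available_dictionary_header` are hypothetical names, and the IETF draft's URLPattern matching is simplified here to an fnmatch-style glob. The header value shape (the dictionary's SHA-256 hash as a colon-delimited base64 byte sequence) follows the draft.

```python
import base64
import hashlib
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass
class DictionaryRecord:
    # Hypothetical in-memory structure parsed from dict:<origin> metadata.
    match: str      # simplified glob; the spec uses a URLPattern
    sha256: bytes   # hash of the (decompressed) dictionary bytes

def available_dictionary_header(records, request_path):
    """Pick a matching dictionary and build the Available-Dictionary
    header value: the SHA-256 hash as a structured-field byte sequence."""
    for rec in records:
        if fnmatch(request_path, rec.match):
            b64 = base64.b64encode(rec.sha256).decode("ascii")
            return f":{b64}:"
    return None  # no dictionary matches; send no header

# Example: one stored dictionary covering /app/*.js
dict_bytes = b"common JS boilerplate"
records = [DictionaryRecord("/app/*.js", hashlib.sha256(dict_bytes).digest())]
print(available_dictionary_header(records, "/app/main.js"))
print(available_dictionary_header(records, "/styles/site.css"))  # None
```

The second lookup returns `None`, modeling the "no dictionaries for this request" fast path described above.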
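The hash-accumulation flow for a ``Use-As-Dictionary`` response (decompress before storing, hash incrementally, finalize only when the resource is complete) can be modeled like this. The class and method names are illustrative only, loosely echoing Gecko's stream-listener callbacks; this is not the actual implementation.

```python
import hashlib

class DictionaryWriter:
    """Toy model of storing a Use-As-Dictionary response: the stream is
    decompressed before caching, and a SHA-256 hash is accumulated over
    the stored bytes, finalized only when the resource is fully received."""

    def __init__(self):
        self._hash = hashlib.sha256()
        self._stored = bytearray()   # stands in for the cache entry body
        self.finalized = None

    def on_data_available(self, decompressed_chunk: bytes):
        # Each decompressed chunk is written to the cache entry and
        # folded into the running hash at the same time.
        self._stored += decompressed_chunk
        self._hash.update(decompressed_chunk)

    def on_stop_request(self):
        # Resource fully received: finalize the hash so the
        # dict:<origin> metadata entry can be written.
        self.finalized = self._hash.digest()
        return self.finalized

w = DictionaryWriter()
for chunk in (b"chunk-one ", b"chunk-two"):
    w.on_data_available(chunk)
digest = w.on_stop_request()
assert digest == hashlib.sha256(b"chunk-one chunk-two").digest()
```

The assertion illustrates the invariant the design relies on: the incrementally accumulated hash equals the hash of the complete decompressed resource, so it can safely serve as the dictionary's identity in the metadata.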
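The last future optimization (LRU-ing ``dict:<origin>`` entries in memory) could look something like the following sketch, assuming a simple per-origin map with a fixed entry limit. `OriginDictionaryCache` is a hypothetical name; Gecko's actual data structure may differ.

```python
from collections import OrderedDict

class OriginDictionaryCache:
    """Hypothetical LRU for in-memory dict:<origin> entries: touching an
    origin marks it most recently used, and inserting past the limit
    drops the least recently used origin from memory."""

    def __init__(self, limit: int):
        self.limit = limit
        self._entries = OrderedDict()   # origin -> parsed metadata

    def get(self, origin):
        if origin in self._entries:
            self._entries.move_to_end(origin)  # mark as recently used
            return self._entries[origin]
        return None

    def put(self, origin, metadata):
        self._entries[origin] = metadata
        self._entries.move_to_end(origin)
        while len(self._entries) > self.limit:
            self._entries.popitem(last=False)  # evict least recently used

cache = OriginDictionaryCache(limit=2)
cache.put("https://a.example", {"dicts": []})
cache.put("https://b.example", {"dicts": []})
cache.get("https://a.example")                 # a is now most recently used
cache.put("https://c.example", {"dicts": []})  # evicts b
print(sorted(cache._entries))  # ['https://a.example', 'https://c.example']
```

Dropping an origin here only discards the parsed in-memory structures; the backing ``dict:<origin>`` cache entry remains on disk and can be reloaded on the next fetch for that origin.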