      1 ============
      2 SpiderMonkey
      3 ============
      4 
      5 *SpiderMonkey* is the *JavaScript* and *WebAssembly* implementation library of
      6 the *Mozilla Firefox* web browser. The implementation behaviour is defined by
      7 the `ECMAScript <https://tc39.es/ecma262/>`_ and `WebAssembly
      8 <https://webassembly.org/>`_ specifications.
      9 
     10 Much of the internal technical documentation of the engine can be found
     11 throughout the source files themselves by looking for comments labelled with
     12 `[SMDOC]`_. Information about the team, our processes, and about embedding
     13 *SpiderMonkey* in your own projects can be found at https://spidermonkey.dev.
     14 
     15 Specific documentation on a few topics is available at:
     16 
     17 .. toctree::
     18   :maxdepth: 1
     19 
     20   build
     21   test
     22   hacking_tips
     23   Debugger/index
     24   SavedFrame/index
     25   feature_checklist
     26   bytecode_checklist
     27   use_counter
     28 
     29 
     30 Components of SpiderMonkey
     31 ##########################
     32 
     33 🧹 Garbage Collector
     34 *********************
     35 
     36 .. toctree::
     37   :maxdepth: 2
     38   :hidden:
     39 
     40   Overview <gc>
     41   Rooting Hazard Analysis <HazardAnalysis/index>
     42   Running the Analysis <HazardAnalysis/running>
     43 
     44 *JavaScript* is a garbage collected language and at the core of *SpiderMonkey*
     45 we manage a garbage-collected memory heap. Elements of this heap have a base
     46 C++ type of `gc::Cell`_. Each round of garbage collection frees any *Cell*
     47 that is not reachable from a *root*, directly or through other live *Cells*.
     48 
     49 See :doc:`GC overview<gc>` for more details.
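
The reachability rule above can be illustrated with a toy mark-and-sweep pass. This is only a sketch: the ``Cell`` class and ``collect`` function are hypothetical Python stand-ins, not SpiderMonkey's ``gc::Cell`` or its real collector, and they model nothing beyond basic reachability from roots.

```python
class Cell:
    """Hypothetical heap element; `edges` lists the cells it references."""

    def __init__(self, name):
        self.name = name
        self.edges = []


def collect(heap, roots):
    """Run one toy mark-and-sweep round; return (live, freed) cell names."""
    marked = set()
    stack = list(roots)
    while stack:  # mark phase: walk everything reachable from the roots
        cell = stack.pop()
        if id(cell) in marked:
            continue
        marked.add(id(cell))
        stack.extend(cell.edges)
    # Sweep phase: anything left unmarked would be freed.
    live = [c.name for c in heap if id(c) in marked]
    freed = [c.name for c in heap if id(c) not in marked]
    return live, freed


a, b, c, d = (Cell(n) for n in "abcd")
a.edges.append(b)  # a -> b, and a is a root
c.edges.append(d)  # c -> d, but nothing live points at c
d.edges.append(c)  # a cycle alone does not keep cells alive
live, freed = collect([a, b, c, d], roots=[a])
```

Here ``c`` and ``d`` reference each other but are unreachable from the root, so the toy collector reports both as freed.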
     50 
     51 
     52 📦 JS::Value and JSObject
     53 **************************
     54 
     55 *JavaScript* values are divided into either objects or primitives
     56 (*Undefined*, *Null*, *Boolean*, *Number*, *BigInt*, *String*, or *Symbol*).
     57 Values are represented with the `JS::Value`_ type which may in turn point to
     58 an object that extends from the `JSObject`_ type. Objects include both plain
     59 *JavaScript* objects and exotic objects representing various things from
     60 functions to *ArrayBuffers* to *HTML Elements* and more.
     61 
     62 Most objects extend ``NativeObject`` (a subtype of ``JSObject``), which
     63 provides a way to store properties as key-value pairs, similar to a hash
     64 table. These objects hold their *values* and point to a *Shape* that
     65 represents the set of *keys*. Similar objects point to the same *Shape*, which
     66 saves memory and allows the JITs to quickly work with objects similar to ones
     67 they have seen before. See the `[SMDOC] Shapes`_ comment for more details.
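
As a rough illustration of the split between per-object *values* and shared *keys*, the sketch below models objects that store values in a flat slot list and share an interned ``Shape``. The ``Shape`` and ``ToyObject`` classes are hypothetical; in particular, keying shapes by insertion order is an assumption of this toy model, and the real data structures are described in the ``[SMDOC] Shapes`` comment.

```python
class Shape:
    """Hypothetical shape: an ordered tuple of keys plus key -> slot index."""

    _table = {}  # interned shapes, keyed by their tuple of property keys

    def __init__(self, keys):
        self.keys = keys
        self.slot_of = {k: i for i, k in enumerate(keys)}

    @classmethod
    def get(cls, keys):
        keys = tuple(keys)
        if keys not in cls._table:
            cls._table[keys] = cls(keys)
        return cls._table[keys]


class ToyObject:
    """Holds only values; the keys live in the shared Shape."""

    def __init__(self):
        self.shape = Shape.get(())
        self.slots = []

    def set(self, key, value):
        slot = self.shape.slot_of.get(key)
        if slot is None:
            # Adding a property moves the object to a different shape.
            self.shape = Shape.get(self.shape.keys + (key,))
            self.slots.append(value)
        else:
            self.slots[slot] = value

    def get(self, key):
        return self.slots[self.shape.slot_of[key]]


p = ToyObject(); p.set("x", 1); p.set("y", 2)
q = ToyObject(); q.set("x", 10); q.set("y", 20)
r = ToyObject(); r.set("y", 0); r.set("x", 0)
same_shape = p.shape is q.shape       # built the same way: shared shape
different_shape = p.shape is r.shape  # different key order in this toy model
```

Because ``p`` and ``q`` share one ``Shape`` object, the key table is stored once, and code that has learned the slot of ``y`` for ``p`` can reuse that knowledge for ``q``.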
     68 
     69 C++ (and Rust) code may create and manipulate these objects using the
     70 collection of interfaces we traditionally call the **JSAPI**.
     71 
     72 
     73 🗃️ JavaScript Parser
     74 *********************
     75 
     76 In order to evaluate script text, we parse it using the *Parser* into an
     77 `Abstract Syntax Tree`_ (AST) temporarily and then run the *BytecodeEmitter*
     78 (BCE) to generate `Bytecode`_ and associated metadata. We refer to this
     79 resulting format as `Stencil`_ and it has the helpful characteristic that it
     80 does not utilize the Garbage Collector. The *Stencil* can then be
     81 instantiated into a series of GC *Cells* that can be mutated and understood
     82 by the execution engines described below.
     83 
     84 Each function as well as the top-level itself generates a distinct script.
     85 This is the unit of execution granularity since functions may be set as
     86 callbacks that the host runs at a later time. There are both
     87 ``ScriptStencil`` and ``js::BaseScript`` forms of scripts.
     88 
     89 By default, the parser runs in a mode called *syntax* or *lazy* parsing where
     90 we avoid generating full bytecode for functions within the source that we are
     91 parsing. This lazy parsing is still required to check for all *early errors*
     92 that the specification describes. When such a lazily compiled inner function
     93 is first executed, we recompile just that function in a process called
     94 *delazification*. Lazy parsing avoids allocating the AST and bytecode which
     95 saves both CPU time and memory. In practice, many functions are never
     96 executed during a given load of a webpage so this delayed parsing can be
     97 quite beneficial.
     98 
     99 
    100 ⚙️ JavaScript Interpreter
    101 **************************
    102 
    103 The *bytecode* generated by the parser may be executed by an interpreter
    104 written in C++ that manipulates objects in the GC heap and invokes native
    105 code of the host (e.g. a web browser). See `[SMDOC] Bytecode Definitions`_ for
    106 descriptions of each bytecode opcode and ``js/src/vm/Interpreter.cpp`` for
    107 their implementation.
    108 
    109 
    110 ⚡ JavaScript JITs
    111 *******************
    112 
    113 .. toctree::
    114   :maxdepth: 1
    115   :hidden:
    116 
    117   MIR-optimizations/index
    118 
    119 In order to speed up execution of *bytecode*, we use a series of Just-In-Time
    120 (JIT) compilers to generate specialized machine code (e.g. x86 or ARM)
    121 tailored to the *JavaScript* that is run and the data that is processed.
    122 
    123 As an individual script runs more times (or has a loop that runs many times)
    124 we describe it as getting *hotter* and at certain thresholds we *tier-up* by
    125 JIT-compiling it. Each subsequent JIT tier spends more time compiling but
    126 aims for better execution performance.
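
The tier-up idea can be sketched as a counter compared against per-tier thresholds. Everything in this example is an assumption made for illustration: the ``TIERS`` values, the ``ScriptCounter`` class, and the tier names are hypothetical and do not reflect the engine's actual thresholds or heuristics.

```python
# Hypothetical warm-up thresholds, lowest tier first. The real engine's
# values and tuning differ.
TIERS = [
    (0, "interpreter"),
    (10, "baseline-interpreter"),
    (100, "baseline-compiler"),
    (1000, "warp"),
]


class ScriptCounter:
    """Tracks how hot one toy script is and which tier it would run in."""

    def __init__(self):
        self.warm_up_count = 0

    def tier(self):
        current = TIERS[0][1]
        for threshold, name in TIERS:
            if self.warm_up_count >= threshold:
                current = name
        return current

    def run(self, times=1):
        # Each call or loop iteration makes the script "hotter".
        self.warm_up_count += times
        return self.tier()


script = ScriptCounter()
history = [script.tier(), script.run(10), script.run(90), script.run(900)]
```

Each entry in ``history`` records the tier chosen after the script has become hotter, moving one step up each time a threshold is crossed.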
    127 
    128 Baseline Interpreter
    129 --------------------
    130 
    131 The *Baseline Interpreter* is a hybrid interpreter/JIT that interprets the
    132 *bytecode* one opcode at a time, but attaches small fragments of code called
    133 *Inline Caches* (ICs) that speed up executing the same opcode the next
    134 time (if the data is similar enough). See the `[SMDOC] JIT Inline Caches`_
    135 comment for more details.
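
A minimal sketch of the caching idea, assuming a property read whose cost is dominated by looking up where a key is stored. The ``GetPropSite`` class is hypothetical, and a real IC attaches generated code stubs rather than dictionary entries; see the ``[SMDOC] JIT Inline Caches`` comment for the actual design.

```python
class GetPropSite:
    """Hypothetical cache attached to a single property-read opcode."""

    def __init__(self, key):
        self.key = key
        self.stubs = {}  # object layout (tuple of keys) -> slot index
        self.hits = 0
        self.misses = 0

    def get(self, layout, slots):
        slot = self.stubs.get(layout)
        if slot is None:
            # Slow path: generic lookup, then attach a stub for this layout.
            self.misses += 1
            slot = layout.index(self.key)
            self.stubs[layout] = slot
        else:
            # Fast path: the data looks like something seen before.
            self.hits += 1
        return slots[slot]


site = GetPropSite("y")
points = [(("x", "y"), [i, i * 2]) for i in range(5)]
results = [site.get(layout, slots) for layout, slots in points]
other = site.get(("y", "x"), [7, 8])  # unseen layout: one more miss
```

Only the first read of each layout takes the slow path; the remaining reads at this site reuse the cached slot index.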
    136 
    137 Baseline Compiler
    138 -----------------
    139 
    140 The *Baseline Compiler* uses the same *Inline Caches* mechanism from the
    141 *Baseline Interpreter* but additionally translates the entire bytecode to
    142 native machine code. This removes dispatch overhead and does minor local
    143 optimizations. This machine code still calls back into C++ for complex
    144 operations. The translation is very fast but the ``BaselineScript`` uses
    145 memory and requires ``mprotect`` and flushing CPU caches.
    146 
    147 WarpMonkey
    148 ----------
    149 
    150 The *WarpMonkey* JIT replaces the former *IonMonkey* engine and is the
    151 highest level of optimization for the most frequently run scripts. It is able
    152 to inline other scripts and specialize code based on the data and arguments
    153 being processed.
    154 
    155 We translate the *bytecode* and *Inline Cache* data into a Mid-level
    156 `Intermediate Representation`_ (Ion MIR). This graph is
    157 transformed and optimized before being *lowered* to a Low-level Intermediate
    158 Representation (Ion LIR). Register allocation is performed on this *LIR*, and
    159 native machine code is then generated in a process called *Code Generation*.
    160 
    161 See `MIR Optimizations`_ for an overview of MIR optimizations.
    162 
    163 The optimizations here assume that a script continues to see data similar to
    164 what has been seen before. The *Baseline* JITs are essential to success here
    165 because they generate *ICs* that match observed data. If a script compiled
    166 with *Warp* encounters data that it is not prepared to handle, it
    167 performs a *bailout*. The *bailout* mechanism reconstructs the native machine
    168 stack frame to match the layout used by the *Baseline Interpreter* and then
    169 branches to that interpreter as though we were running it all along. Building
    170 this stack frame may use a special side-table saved by *Warp* to reconstruct
    171 values that are not otherwise available.
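
The guard-then-fall-back control flow of a bailout can be sketched as follows. This toy omits the hard part described above (rebuilding a stack frame so the slower tier can resume mid-script) and simply reruns the operation on a generic path. All names here are hypothetical.

```python
class Bailout(Exception):
    """Raised when specialized code sees data it was not compiled for."""


def generic_add(a, b):
    # Stand-in for the slower tier that handles every case.
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


def specialized_add(a, b):
    # Stand-in for optimized code that assumed "both operands are ints".
    if type(a) is not int or type(b) is not int:
        raise Bailout()  # guard failed: leave the fast path
    return a + b


def run(a, b, log):
    try:
        result = specialized_add(a, b)
        log.append("fast")
    except Bailout:
        result = generic_add(a, b)
        log.append("bailout")
    return result


log = []
first = run(1, 2, log)     # matches the assumption, stays on the fast path
second = run("a", 1, log)  # violates the assumption, falls back
```

The specialized path stays cheap because it only checks its assumptions; correctness for unexpected data is delegated to the generic path.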
    172 
    173 
    174 🟪 WebAssembly
    175 ***************
    176 
    177 In addition to *JavaScript*, the engine is also able to execute *WebAssembly*
    178 (WASM) sources.
    179 
    180 WASM-Baseline (RabaldrMonkey)
    181 -----------------------------
    182 
    183 This engine performs fast translation to machine code in order to minimize
    184 latency to first execution.
    185 
    186 WASM-Ion (BaldrMonkey)
    187 ----------------------
    188 
    189 This engine translates the WASM input into the same *MIR* form that
    190 *WarpMonkey* uses and optimizes it with the *IonBackend*. These optimizations (and in
    191 particular, the register allocation) generate very fast native machine code.
    192 
    193 
    194 .. _gc::Cell: https://searchfox.org/mozilla-central/search?q=[SMDOC]+GC+Cell
    195 .. _JSObject: https://searchfox.org/mozilla-central/search?q=[SMDOC]+JSObject+layout
    196 .. _JS::Value: https://searchfox.org/mozilla-central/search?q=[SMDOC]+JS%3A%3AValue+type&path=js%2F
    197 .. _[SMDOC]: https://searchfox.org/mozilla-central/search?q=[SMDOC]&path=js%2F
    198 .. _[SMDOC] Shapes: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Shapes
    199 .. _[SMDOC] Bytecode Definitions: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Bytecode+Definitions&path=js%2F
    200 .. _[SMDOC] JIT Inline Caches: https://searchfox.org/mozilla-central/search?q=[SMDOC]+JIT+Inline+Caches
    201 .. _Stencil: https://searchfox.org/mozilla-central/search?q=[SMDOC]+Script+Stencil
    202 .. _Bytecode: https://en.wikipedia.org/wiki/Bytecode
    203 .. _Abstract Syntax Tree: https://en.wikipedia.org/wiki/Abstract_syntax_tree
    204 .. _Intermediate Representation: https://en.wikipedia.org/wiki/Intermediate_representation
    205 .. _MIR Optimizations: ./MIR-optimizations/index.html