.. _build_sparse:

================
Sparse Checkouts
================

The Firefox repository is large: over 230,000 files. That many files
can put a lot of strain on machines, tools, and processes.

Some version control tools can populate a working directory (checkout)
with only a subset of the files in the repository. This is called
*sparse checkout*.

Various tools in the Firefox repository are configured to work
when a sparse checkout is being used.

Sparse Checkouts in Mercurial
=============================

Mercurial 4.3 introduced **experimental** support for sparse checkouts
in the official distribution (before that, a Facebook-authored
third-party extension had provided the feature for years).

To enable sparse checkout support in Mercurial, enable the ``sparse``
extension::

  [extensions]
  sparse =
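A tool that wants to confirm the extension is enabled could inspect an hgrc snippet like the one above. The following is a minimal sketch, and ``sparse_extension_enabled`` is a hypothetical helper that is not part of Mercurial or mach. It uses Python's ``configparser`` only as an approximation of hgrc syntax, so Mercurial-specific directives such as ``%include`` and ``%unset`` are not understood. It treats a value beginning with ``!`` as disabled, which follows Mercurial's convention for turning an extension off.

.. code-block:: python

    import configparser


    def sparse_extension_enabled(hgrc_text):
        """Return True if an hgrc snippet enables the ``sparse`` extension.

        Hypothetical helper. configparser only approximates hgrc syntax:
        Mercurial directives such as %include and %unset are not handled.
        """
        # interpolation=None so that any "%" in values is read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(hgrc_text)
        if not parser.has_option("extensions", "sparse"):
            return False
        # In Mercurial, "sparse = !" explicitly disables the extension.
        return not parser.get("extensions", "sparse").strip().startswith("!")


    print(sparse_extension_enabled("[extensions]\nsparse =\n"))    # True
    print(sparse_extension_enabled("[extensions]\nsparse = !\n"))  # False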

The *sparseness* of the working directory is managed using
``hg debugsparse``. Run ``hg help debugsparse`` and ``hg help -e sparse``
for more info on the feature.

When a *sparse config* is enabled, the working directory only contains
files matching that config. You cannot ``hg add`` or ``hg remove`` files
outside the *sparse config*.

.. warning::

  Sparse support in Mercurial 4.3 does not have any backwards
  compatibility guarantees. Expect things to change. Scripting against
  its commands or relying on their current behavior is strongly
  discouraged.

In-Tree Sparse Profiles
=======================

Mercurial supports defining the sparse config using files under version
control. These are called *sparse profiles*.

Sparse profiles are managed just like any other file in the repository.
When you ``hg update``, the sparse configuration is evaluated against
the sparse profile at the revision being updated to. From an end-user
perspective, you only need to *activate* a profile once. After that,
files are added or removed as appropriate whenever the versioned
profile file changes.
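The following sketch illustrates why a profile only needs to be activated once. It simplifies heavily by modelling a profile as a set of directory prefixes, whereas real profiles use Mercurial patterns. It computes which files an update would add to or remove from the working directory when the same profile file differs between two revisions. All function names, paths and profile contents are hypothetical.

.. code-block:: python

    def materialize(tracked, include_dirs):
        """Files a sparse working directory would contain.

        Simplified model: a file is checked out if it lies under one of the
        profile's directories.
        """
        return {
            f for f in tracked
            if any(f.startswith(d.rstrip("/") + "/") for d in include_dirs)
        }


    def update_changes(tracked, old_profile, new_profile):
        """Return (added, removed) files for an update between two revisions
        whose versions of the same profile file differ.

        For simplicity, the set of tracked files is the same at both revisions.
        """
        before = materialize(tracked, old_profile)
        after = materialize(tracked, new_profile)
        return sorted(after - before), sorted(before - after)


    # Hypothetical repository contents and two revisions of one profile file.
    tracked = ["taskcluster/kinds/build.yml", "python/mach/main.py", "dom/base/Document.cpp"]
    old_profile = ["taskcluster"]
    new_profile = ["taskcluster", "python/mach"]

    added, removed = update_changes(tracked, old_profile, new_profile)
    print(added)    # ['python/mach/main.py']
    print(removed)  # []

Nothing has to be re-activated in this model. The update alone picks up the new version of the profile and adjusts the working directory.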

In the Firefox repository, the ``build/sparse-profiles`` directory
contains Mercurial *sparse profile* files.

Each *sparse profile* defines a list of file patterns (see
``hg help patterns``) to include or exclude. See ``hg help -e sparse``
for more details.
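The following sketch roughly illustrates the include/exclude semantics. It parses a profile written in the ``[include]``/``[exclude]`` section syntax with ``%include`` lines, then checks whether a path would end up in the working directory. This is not Mercurial's matcher. It only understands ``path:`` patterns, and it records ``%include`` lines without resolving them. It also assumes that exclude rules take precedence over include rules, and that a profile with no include rules includes everything that is not excluded. The profile contents are hypothetical.

.. code-block:: python

    def parse_profile(text):
        """Split a sparse profile into (includes, excludes, nested_profiles).

        Simplified sketch: only ``path:`` patterns are supported and
        ``%include`` lines are collected but not resolved.
        """
        includes, excludes, nested = [], [], []
        section = None
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            if line.startswith("%include "):
                nested.append(line[len("%include "):].strip())
            elif line in ("[include]", "[exclude]"):
                section = line[1:-1]
            elif section is None:
                raise ValueError(f"pattern outside a section: {line!r}")
            elif not line.startswith("path:"):
                raise ValueError(f"pattern kind not handled by this sketch: {line!r}")
            else:
                directory = line[len("path:"):].rstrip("/")
                (includes if section == "include" else excludes).append(directory)
        return includes, excludes, nested


    def _under(path, directory):
        # "path:" matches the named file or directory and everything below it.
        return path == directory or path.startswith(directory + "/")


    def in_sparse_checkout(path, includes, excludes):
        """Assumed semantics: excludes win; no includes means include all."""
        included = not includes or any(_under(path, d) for d in includes)
        return included and not any(_under(path, d) for d in excludes)


    PROFILE = """\
    %include build/sparse-profiles/mach

    [include]
    # Hypothetical example profile.
    path:taskcluster/
    path:build/

    [exclude]
    path:build/unused/
    """

    includes, excludes, nested = parse_profile(PROFILE)
    print(nested)                                                            # ['build/sparse-profiles/mach']
    print(in_sparse_checkout("taskcluster/config.yml", includes, excludes))  # True
    print(in_sparse_checkout("build/unused/x.py", includes, excludes))       # False
    print(in_sparse_checkout("dom/base/Document.cpp", includes, excludes))   # False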

Mach Support for Sparse Checkouts
=================================

``mach`` detects when a sparse checkout is being used and its
behavior may vary to accommodate this.

By default it is a fatal error if ``mach`` can't load one of the
``mach_commands.py`` files it was told to. But if a sparse checkout
is being used, ``mach`` assumes that the missing file isn't part of
the sparse checkout and ignores the error. This means that running
``mach`` inside a sparse checkout only gives access to the commands
defined in files that are in the sparse checkout.
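The following sketch mirrors that loading rule. It is not mach's actual code, and both function names are hypothetical. It detects a sparse checkout by checking for a non-empty ``.hg/sparse`` file. That check is an assumption based on where Mercurial's ``sparse`` extension keeps the active configuration.

.. code-block:: python

    from pathlib import Path


    def is_sparse_checkout(repo_root):
        """Assumed heuristic: the checkout is sparse if .hg/sparse is non-empty."""
        try:
            return (Path(repo_root) / ".hg" / "sparse").stat().st_size > 0
        except FileNotFoundError:
            return False


    def loadable_command_files(repo_root, relpaths):
        """Return the mach_commands.py files that exist and can be loaded.

        A missing file is fatal in a full checkout. In a sparse checkout it
        is assumed to be outside the sparse config and is skipped, so the
        commands it defines are simply unavailable.
        """
        sparse = is_sparse_checkout(repo_root)
        loadable = []
        for rel in relpaths:
            if (Path(repo_root) / rel).is_file():
                loadable.append(rel)
            elif not sparse:
                raise FileNotFoundError(f"cannot load {rel}")
        return loadable

In a full checkout, a missing ``mach_commands.py`` raises an error. After a non-empty ``.hg/sparse`` file is present, the same call returns only the files that exist.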

Sparse Checkouts in Automation
==============================

``hg robustcheckout`` (the extension/command used to perform clones
and working directory operations in automation) supports sparse checkout.
However, it has a number of limitations compared to Mercurial's default
sparse checkout implementation:

* Only supports one profile at a time
* Does not support non-profile sparse configs
* Does not allow transitioning from a non-sparse to a sparse checkout or
  vice-versa
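The three restrictions listed above can be expressed as a pre-flight check. The following is a hypothetical model for illustration only. It is not how ``hg robustcheckout`` is implemented, and ``SparseRequest`` and ``check_request`` are invented names.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List, Optional


    @dataclass
    class SparseRequest:
        """Hypothetical description of a requested checkout."""
        profiles: List[str] = field(default_factory=list)     # sparse profile paths
        extra_rules: List[str] = field(default_factory=list)  # ad-hoc include/exclude rules


    def check_request(req: SparseRequest, existing_is_sparse: Optional[bool]) -> List[str]:
        """Return the restrictions a request violates (empty if none).

        existing_is_sparse is None when no working directory exists yet.
        """
        problems = []
        if len(req.profiles) > 1:
            problems.append("only one sparse profile is supported")
        if req.extra_rules:
            problems.append("non-profile sparse rules are not supported")
        wants_sparse = bool(req.profiles)
        if existing_is_sparse is not None and existing_is_sparse != wants_sparse:
            problems.append("cannot convert between sparse and non-sparse checkouts")
        return problems


    print(check_request(SparseRequest(["build/sparse-profiles/taskgraph"]), None))  # []
    print(check_request(SparseRequest(["build/sparse-profiles/taskgraph"]), False))
    # ['cannot convert between sparse and non-sparse checkouts']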

These restrictions ensure that any sparse working directory populated by
``hg robustcheckout`` is as consistent and robust as possible.

``run-task`` (the low-level script for *bootstrapping* tasks in
automation) has support for sparse checkouts.

TaskGraph tasks using ``run-task`` can specify a ``sparse-profile``
attribute in YAML (or in code) to denote the sparse profile file to
use. For example::

  run:
      using: run-command
      command: <command>
      sparse-profile: taskgraph

This automatically causes ``run-task`` and ``hg robustcheckout``
to use the sparse profile defined in ``build/sparse-profiles/<value>``.
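The mapping from the ``sparse-profile`` value to a profile file can be sketched as follows. The dictionary stands in for the ``run`` section of the YAML example above after parsing. The function name is hypothetical, and this is not the actual ``run-task`` or TaskGraph code.

.. code-block:: python

    from pathlib import PurePosixPath
    from typing import Optional

    # Directory holding the in-tree sparse profiles.
    SPARSE_PROFILE_DIR = PurePosixPath("build/sparse-profiles")


    def sparse_profile_path(run: dict) -> Optional[str]:
        """Return the profile file a task's ``run`` section selects, or None.

        None means the task gets a full (non-sparse) checkout.
        """
        name = run.get("sparse-profile")
        if name is None:
            return None
        return str(SPARSE_PROFILE_DIR / name)


    # The YAML example above, as a parsed dictionary.
    run = {"using": "run-command", "command": "<command>", "sparse-profile": "taskgraph"}
    print(sparse_profile_path(run))  # build/sparse-profiles/taskgraph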

Pros and Cons of Sparse Checkouts
=================================

The main benefit of a sparse checkout is that it makes the repository
appear smaller. This means:

* Less time performing working directory operations -> faster version
  control operations
* Fewer files to consult -> faster operations
* Working directories only contain what is needed -> easier to understand
  what everything does

Having fewer files in the working directory also has disadvantages:

* Searching may not yield hits because a file isn't in the sparse
  checkout. For example, a *global* search and replace may not actually
  be *global* after all.
* Tools performing filesystem walking or path globbing (e.g.
  ``**/*.js``) may fail to find files because those files are not in
  the working directory.
* Various tools and processes assume that all files in the repository
  are always available.
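The globbing pitfall is easy to reproduce. The following self-contained sketch builds a throwaway directory containing only the files a hypothetical sparse profile would check out. It then compares the results of a ``**/*.js`` walk with a hypothetical list of every tracked file.

.. code-block:: python

    import tempfile
    from pathlib import Path

    # Hypothetical list of every file tracked by the repository.
    TRACKED = ["browser/app.js", "toolkit/util.js", "toolkit/README.txt"]
    # Hypothetical subset that a sparse profile checks out.
    CHECKED_OUT = ["browser/app.js"]


    def glob_js(root):
        """Find .js files the way a naive tool would: by walking the disk."""
        return sorted(p.relative_to(root).as_posix() for p in Path(root).glob("**/*.js"))


    with tempfile.TemporaryDirectory() as root:
        # Populate the fake working directory with only the sparse subset.
        for rel in CHECKED_OUT:
            path = Path(root, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
        found = glob_js(root)

    expected = sorted(f for f in TRACKED if f.endswith(".js"))
    missed = sorted(set(expected) - set(found))
    print(found)   # ['browser/app.js']
    print(missed)  # ['toolkit/util.js']

The walk reports no error. It silently returns fewer files, which is what makes this class of failure hard to notice.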

There can also be problems caused by mixing sparse and non-sparse
checkouts. For example, if a process in automation uses a sparse
checkout and a local developer does not, things may work for the
local developer but fail in automation (because a file isn't included
in the sparse configuration and is therefore not available to
automation). Furthermore, if environments aren't using exactly the
same sparse configuration, the differences can cause varying behavior.

When Should Sparse Checkouts Be Used?
=====================================

Developers are discouraged from using sparse checkouts for local work
until tools for handling sparse checkouts have improved. In particular,
Mercurial's support for sparse is still experimental and various Firefox
tools assume that all files are available. Developers should use sparse
checkouts at their own risk.

The use of sparse checkouts in automation is a performance versus
robustness trade-off. Sparse checkouts make automation faster because
machines only have to manage a few thousand files in a checkout instead
of a few hundred thousand. This can potentially translate to minutes
saved per machine per day. At the scale of thousands of machines, the
savings can be significant. But adopting sparse checkouts opens up new
avenues for failure (see the previous section). If a process is isolated
(in terms of file access) and well-understood, sparse checkout can likely
be leveraged with little risk. But if a process does things like walking
the filesystem and performing lots of wildcard matching, the dangers are
higher.
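The scale argument is simple multiplication: per-machine savings grow linearly with the size of the machine pool. The inputs below are purely illustrative and are not measurements from Firefox automation.

.. code-block:: python

    def machine_hours_saved_per_day(minutes_per_machine_day: float, machines: int) -> float:
        """Total machine-hours saved per day across a pool of machines."""
        return minutes_per_machine_day * machines / 60


    # Illustrative inputs only: 2 minutes saved per machine per day, 3,000 machines.
    print(machine_hours_saved_per_day(2, 3000))  # 100.0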