.. _build_sparse:

================
Sparse Checkouts
================

The Firefox repository is large: over 230,000 files. That many files
can put a lot of strain on machines, tools, and processes.

Some version control tools have the ability to only populate a
working directory / checkout with a subset of files in the repository.
This is called *sparse checkout*.

Various tools in the Firefox repository are configured to work
when a sparse checkout is being used.

Sparse Checkouts in Mercurial
=============================

Mercurial 4.3 introduced **experimental** support for sparse checkouts
in the official distribution (a Facebook-authored extension has
implemented the feature as a 3rd party extension for years).

To enable sparse checkout support in Mercurial, enable the ``sparse``
extension::

   [extensions]
   sparse =

The *sparseness* of the working directory is managed using
``hg debugsparse``. Run ``hg help debugsparse`` and ``hg help -e sparse``
for more info on the feature.

When a *sparse config* is enabled, the working directory only contains
files matching that config. You cannot ``hg add`` or ``hg remove`` files
outside the *sparse config*.

.. warning::

   Sparse support in Mercurial 4.3 does not have any backwards
   compatibility guarantees. Expect things to change. Scripting against
   commands or relying on behavior is strongly discouraged.

In-Tree Sparse Profiles
=======================

Mercurial supports defining the sparse config using files under version
control. These are called *sparse profiles*.

Essentially, the sparse profiles are managed just like any other file in
the repository. When you ``hg update``, the sparse configuration is
evaluated against the sparse profile at the revision being updated to.
From an end-user perspective, you just need to *activate* a profile once
and files will be added or removed as appropriate whenever the versioned
profile file updates.

In the Firefox repository, the ``build/sparse-profiles`` directory
contains Mercurial *sparse profile* files.

Each *sparse profile* essentially defines a list of file patterns
(see ``hg help patterns``) to include or exclude. See
``hg help -e sparse`` for more.

Mach Support for Sparse Checkouts
=================================

``mach`` detects when a sparse checkout is being used and its
behavior may vary to accommodate this.

By default it is a fatal error if ``mach`` can't load one of the
``mach_commands.py`` files it was told to. But if a sparse checkout
is being used, ``mach`` assumes that the file isn't part of the sparse
checkout and ignores the missing file error. This means that
running ``mach`` inside a sparse checkout will only have access
to the commands defined in files in the sparse checkout.

Sparse Checkouts in Automation
==============================

``hg robustcheckout`` (the extension/command used to perform clones
and working directory operations in automation) supports sparse checkout.
However, it has a number of limitations compared to Mercurial's default
sparse checkout implementation:

* Only supports 1 profile at a time
* Does not support non-profile sparse configs
* Does not allow transitioning from a non-sparse to sparse checkout or
  vice-versa

These restrictions ensure that any sparse working directory populated by
``hg robustcheckout`` is as consistent and robust as possible.

``run-task`` (the low-level script for *bootstrapping* tasks in
automation) has support for sparse checkouts.

TaskGraph tasks using ``run-task`` can specify a ``sparse-profile``
attribute in YAML (or in code) to denote the sparse profile file to
use.
e.g.::

   run:
       using: run-command
       command: <command>
       sparse-profile: taskgraph

This automagically results in ``run-task`` and ``hg robustcheckout``
using the sparse profile defined in ``build/sparse-profiles/<value>``.

Pros and Cons of Sparse Checkouts
=================================

The benefit of sparse checkout is that it makes the repository appear
to be smaller. This means:

* Less time performing working directory operations -> faster version
  control operations
* Fewer files to consult -> faster operations
* Working directories only contain what is needed -> easier to understand
  what everything does

Fewer files in the working directory also contribute to disadvantages:

* Searching may not yield hits because a file isn't in the sparse
  checkout. e.g. a *global* search and replace may not actually be
  *global* after all.
* Tools performing filesystem walking or path globbing (e.g.
  ``**/*.js``) may fail to find files because they don't exist.
* Various tools and processes make assumptions that all files in the
  repository are always available.

There can also be problems caused by mixing sparse and non-sparse
checkouts. For example, if a process in automation is using sparse
and a local developer is not using sparse, things may work for the
local developer but fail in automation (because a file isn't included
in the sparse configuration and is not available to automation).
Furthermore, if environments aren't using exactly the same sparse
configuration, differences can contribute to varying behavior.

When Should Sparse Checkouts Be Used?
=====================================

Developers are discouraged from using sparse checkouts for local work
until tools for handling sparse checkouts have improved.
In particular,
Mercurial's support for sparse is still experimental and various Firefox
tools make assumptions that all files are available. Developers should
use sparse checkout at their own risk.

The use of sparse checkouts in automation is a performance versus
robustness trade-off. Use of sparse checkouts will make automation
faster because machines will only have to manage a few thousand files
in a checkout instead of a few hundred thousand. This can potentially
translate to minutes saved per machine day. At the scale of thousands
of machines, the savings can be significant. But adopting sparse
checkouts will open up new avenues for failures. (See section above.)
If a process is isolated (in terms of file access) and well-understood,
sparse checkout can likely be leveraged with little risk. But if a
process is doing things like walking the filesystem and performing
lots of wildcard matching, the dangers are higher.
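
A process that walks the filesystem can at least fail loudly rather than
silently skip files it expected to find. The following is a minimal
sketch of such a guard, not the method used by ``mach`` or ``run-task``.
It assumes that Mercurial records the active sparse config in
``.hg/sparse`` and that a non-empty file there means the checkout is
sparse. The function names ``sparse_checkout_present`` and
``missing_paths`` are hypothetical and exist only for illustration.

```python
import os
import tempfile


def sparse_checkout_present(repo_root):
    """Heuristic (an assumption): sparse if .hg/sparse exists and is non-empty."""
    sparse_path = os.path.join(repo_root, ".hg", "sparse")
    try:
        return os.stat(sparse_path).st_size > 0
    except OSError:
        # No such file (or unreadable): treat the checkout as non-sparse.
        return False


def missing_paths(repo_root, required):
    """Return the repo-relative paths in `required` that are absent on disk."""
    return [
        path
        for path in required
        if not os.path.exists(os.path.join(repo_root, path))
    ]


if __name__ == "__main__":
    # Build a throwaway directory that looks like a sparse checkout.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, ".hg"))
        with open(os.path.join(root, ".hg", "sparse"), "w") as fh:
            # Placeholder content; only the non-empty size matters here.
            fh.write("[include]\n")
        if sparse_checkout_present(root):
            absent = missing_paths(root, ["browser/app"])
            if absent:
                print("sparse checkout is missing:", ", ".join(absent))
```

A tool could run a check like this before globbing and abort with a clear
message when a required path is missing, instead of producing a partial
result.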