
:orphan:

Turning on Firefox tests for a new configuration (manual)
=========================================================

You are ready to turn on Firefox tests for a new configuration.  Once you get
to this stage, you will have seen a try push with all the tests running (many
of them not green) to verify that some tests pass and that there are enough
machines available to run tests.

For the purpose of this document, assume you are tasked with upgrading the
Windows 10 OS from 1803 -> 1903. To simplify this we can call the new
configuration `windows_1903`, and we need to:

* push to try
* analyze test failures
* disable tests in manifests
* repeat try pushes until there are no failures
* file bugs for test failures
* land changes and turn on tests
* turn on the run-only-failures job

There are many edge cases, and I will outline them inside each step.


Push to Try Server
------------------

As you have new machines (or cloud instances) available with the updated
OS/config, it is time to push to try.

In order to run all tests, we would need to execute::

    ./mach try fuzzy --no-artifact -q 'test-windows !-raptor- !-talos-' --rebuild 10

There are a few exceptions here:

* Perf tests do not need to be run (hence the ``!-raptor- !-talos-``).
* We need to make sure we are not building with artifact builds (hence the
  ``--no-artifact``).
* There are jobs hidden behind tier-3, some for a good reason (code coverage
  is a good example, but fission tests might not be green).

The last piece to sort out is running on the new config; here are some
considerations for new configs:

* duplicated jobs (i.e. fission, a11y-checks): you can just run those specific
  tasks with ``./mach try fuzzy --no-artifact -q 'test-windows fission'
  --rebuild 5``
* new OS/hardware (i.e. aarch64, an OS upgrade): you need to reference the new
  hardware, typically with ``--worker-override``: ``./mach try fuzzy
  --no-artifact -q 'test-windows' --rebuild 10 --worker-override
  t-win10-64=gecko-t/t-win10-64-1903``

  * the risk here is a scenario where hardware is limited; ``--rebuild 10``
    will then create too many tasks and some will expire.
  * in low-hardware situations, either run a subset of tests (i.e.
    web-platform-tests, mochitest), or use ``--rebuild 3`` and repeat.


Analyze Test Failures
---------------------

A try push will take many hours, so it is best to push when you start work
(results will be ready later in your day) or at the end of your day (results
will be ready when you come back to work the next day).  Please make sure
some tasks start before walking away; otherwise a small typo can delay this
process by hours or a full day.

The best way to look at test failures is to use Push Health, which helps
avoid misleading data.  Push Health will bucket failures into possible
regressions, known regressions, etc. When looking at 5 data points (from
``--rebuild 10``), this will filter out intermittent failures.

There are many reasons you might have invalid or misleading data:

#. Tests fail intermittently; we need a pattern to know whether a failure is
   consistent or intermittent.
#. We still want to disable high-frequency intermittent tests; those are just
   annoying.
#. You could be pushing off a bad base revision (a regression or intermittent
   failure that comes from the base revision).
#. The machines you run on could be bad, skewing the data.
#. Infrastructure problems could cause jobs to fail at random places;
   repeated jobs filter that out.
#. Some failures could affect future tests in the same browser session or
   task.
#. If a crash or timeout occurs, it is possible that we will not run all of
   the tests in the task, therefore believing a test was run 5 times when
   maybe it was only run once (and failed), or never run at all.
#. Some task failures do not have a test name (leak on shutdown, crash on
   shutdown, timeout on shutdown, etc.).

That is a long list of reasons not to trust the data; luckily, most of the
time ``--rebuild 10`` will give us enough data to be confident that we have
found all failures and can ignore random/intermittent failures.
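
To see why repeated runs separate consistent failures from intermittent ones,
a quick back-of-the-envelope calculation helps (this is an illustrative
sketch, not part of any Firefox tooling): an intermittent test almost never
fails on every one of 10 reruns, while a consistently broken test fails every
time.

```python
# Probability that an intermittent test (failure rate p) fails in
# all n repeated runs -- i.e. that it masquerades as a consistent
# failure.  Illustrative only; the real data comes from Push Health.

def p_all_failures(p: float, n: int) -> float:
    """Chance of n failures in n independent runs."""
    return p ** n

# Even a nasty 30% intermittent essentially never fails all 10 reruns:
print(f"{p_all_failures(0.30, 10):.6f}")
# A consistent failure (p = 1.0) fails every rerun:
print(p_all_failures(1.0, 10))
```

This is the intuition behind ``--rebuild 10``: the repeated data points make
a consistent failure on the new config stand out from background noise.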

Knowing the reasons for misleading data, here is a way to use `Push Health
<https://treeherder.mozilla.org/push-health/push?revision=abaff26f8e084ac719bea0438dba741ace3cf5d8&repo=try&testGroup=pr>`__.

* Alternatively, you could use the `API
  <https://treeherder.mozilla.org/api/project/try/push/health/?revision=abaff26f8e084ac719bea0438dba741ace3cf5d8>`__
  to get the raw data and work towards building a tool.
* If you write a tool, you need to parse the resulting JSON file, keeping in
  mind to build a list of failures and match it with a list of job names to
  find how many times each job ran and failed/passed.
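
The exact JSON layout returned by the API is not documented here, so the
sketch below assumes you have already reduced it to a mapping from test path
to per-run outcomes; the part that matters is the bucketing logic.

```python
# Sketch of the bucketing a push-health tool needs to do.  The input
# shape ({test_path: [outcome, ...]}) is an assumption -- reduce the
# real Treeherder JSON to this form first.
from collections import Counter

def classify(runs: list[str]) -> str:
    """Label a test from its per-run outcomes."""
    counts = Counter(runs)
    fails = counts["FAIL"] + counts["TIMEOUT"] + counts["CRASH"]
    if fails == 0:
        return "pass"
    if fails == len(runs):
        return "consistent-failure"   # disable in the manifest
    return "intermittent"             # disable only if high frequency

# Example data modeled on the failure list later in this document:
results = {
    "dom/html/test/test_fullscreen-api.html": ["FAIL"] * 5,
    "image/test/mochitest/test_animSVGImage.html":
        ["PASS", "TIMEOUT", "PASS", "PASS", "PASS"],
}
for test, runs in results.items():
    print(test, "->", classify(runs))
```
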

The main goal here is to know which <path>/<filenames> are failing and to
have a list of them.  Ideally you would record some additional information
such as timeout, crash, failure, etc.  In the end you might end up with::

    dom/html/test/test_fullscreen-api.html, scrollbar
    gfx/layers/apz/test/mochitest/test_group_hittest.html, scrollbar
    image/test/mochitest/test_animSVGImage.html, timeout
    browser/base/content/test/general/browser_restore_isAppTab.js, crashed


Disable Tests in the Manifest Files
-----------------------------------

The code sheriffs have been using `this documentation
<https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/disable-recommended>`__
for training and reference when they disable intermittents.

First you need to add a keyword so that it is available in the manifests
(e.g. ``skip-if = windows_1903``).

There are many exceptions, but the bulk of the work will fall into one of 4
categories:

#. `manifestparser <mochitest_xpcshell_manifest_keywords>`_: \*.toml
   (mochitest*, firefox-ui, marionette, xpcshell) easy to edit by adding a
   ``skip-if = windows_1903 # <comment>``, with a few exceptions
#. `reftest <reftest_manifest_keywords>`_: \*.list (reftest, crashtest) need
   to add a ``fuzzy-if(windows_1903, A, B)``; this is more specific
#. web-platform-tests: testing/web-platform/meta/\*\*.ini (wpt, wpt-reftest,
   etc.) need to edit/add testing/web-platform/meta/<path>/<testname>.ini and
   add expected results
#. Other (compiled tests, jsreftest, etc.): edit the source code; ask for
   help.

Basically we want to take every non-intermittent failure found in Push Health
and edit the manifest; this typically means:

* Finding the proper manifest.
* Adding the right text to the manifest.

The proper manifest is typically <path>/<harness>.[toml|list].  There are
exceptions, and if in doubt, use https://searchfox.org/ to find the manifest
which contains the test name.

Once you have the manifest, open it in an editor and search for the exact
test name (there could be similarly named tests).

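As an illustration of the manifestparser case, an edited manifest entry might
look like the fragment below.  The test name is taken from the failure list
earlier; the bug number and comment are placeholders you would replace with
your own details, and the exact ``skip-if`` syntax should be matched to the
other entries in the manifest you are editing.

```ini
[test_fullscreen-api.html]
skip-if = windows_1903  # Bug NNNNNNN - scrollbar failure on the new config
```
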
Rerun Try Push, Repeat as Necessary
-----------------------------------

It is important to test your changes and, for a new platform that will be
sheriffed, to rerun all the tests at scale.

With your change in a commit, push to try again with ``--rebuild 10`` and
come back the next day.

As there are so many edge cases, it is quite likely that you will have more
failures; mentally plan on 3 iterations of this, where each iteration has
fewer failures.

Once you get a full push that shows no persistent failures, it is time to
land those changes and turn on the new tests. There is a large risk here: the
longer you take to find all failures, the greater the chance of:

* Bitrot of your patch
* New tests being added which could fail on your config
* Other edits to tests/tools which could affect your new config

Since the new-config process is designed to find failures fast and get the
changes landed fast, we do not need to ask developers for review; that comes
after the new config is running successfully, when we notify the teams of
what tests are failing.

File Bugs for Test Failures
---------------------------

Once the new jobs are running on mozilla-central, we have full coverage and
the ability to run tests on the try server.  There could be >100 tests marked
as ``skip-if``, and filing a bug for each would take a lot of time.  Instead
we will file a bug for each manifest that is edited; typically this reduces
the number of bugs to about 40% of the total tests (averaging out to 2.5 test
failures per manifest).

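The per-manifest bookkeeping is simple enough to script.  This sketch groups
failing tests by their directory (a stand-in for "same manifest") to estimate
how many bugs to file; the paths come from the failure list earlier, and
everything else is illustrative.

```python
# Group failing tests by directory to estimate how many bugs to
# file.  Using the directory as a proxy for the manifest is an
# approximation; some directories contain multiple manifests.
from collections import defaultdict
from os.path import dirname

failures = [
    "dom/html/test/test_fullscreen-api.html",
    "gfx/layers/apz/test/mochitest/test_group_hittest.html",
    "image/test/mochitest/test_animSVGImage.html",
    "browser/base/content/test/general/browser_restore_isAppTab.js",
]

by_manifest = defaultdict(list)
for path in failures:
    by_manifest[dirname(path)].append(path)

print(f"{len(failures)} failing tests -> {len(by_manifest)} bugs to file")
```
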
When filing the bug, indicate the timeline, how to reproduce the failure,
link to the bug where we created the config, briefly describe the config
change (i.e. upgrading Windows 10 from version 1803 to 1903), and finally
needinfo the triage owner, indicating that this is a heads-up and that these
tests will be running regularly on mozilla-central for the next 6-7 weeks.

Land Changes and Turn on Tests
------------------------------

After you have a green test run, it is time to land the patches.  There could
be changes needed to the taskgraph in order to add the new hardware type and
duplicate tests to run on both the old and the new, or to create a new
variant and denote which tests to run on that variant.

Using our example of ``windows_1903``, this would be a new worker type that
would require these edits:

* `transforms/tests.py <https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#97>`__ (duplicate the Windows 10 entries)
* `test-platforms.yml <https://searchfox.org/mozilla-central/source/taskcluster/kinds/test/test-platforms.yml#229>`__ (copy the windows10 debug/opt/shippable/asan entries and make win10_1903 versions)
* `test-sets.yml <https://searchfox.org/mozilla-central/source/taskcluster/kinds/test/test-sets.yml#293>`__ (ideally you need nothing; otherwise copy ``windows-tests`` and edit the test list)

In general this should allow you to have tests scheduled with no custom flags
on the try server, and all of these will be scheduled by default on
``mozilla-central``, ``autoland``, and ``release-branches``.

Turn on Run Only Failures
-------------------------

Now that we have tests running regularly, the next step is to take all the
disabled tests and run them in the special failures job.

We have a basic framework created, but for every test harness (i.e. xpcshell,
mochitest-gpu, browser-chrome, devtools, web-platform-tests, crashtest, etc.)
a corresponding tier-3 job will need to be created.

TODO: point to examples of how to add this after we get our first jobs running.