Test Metadata
=============

Directory Layout
----------------

Metadata files must be stored under the ``metadata`` directory passed
to the test runner. The directory layout follows that of
web-platform-tests, with each test source path having a corresponding
metadata file. Because the metadata path is based on the source file
path, files that generate multiple URLs (e.g. tests with multiple
variants, or multi-global tests generated from an ``any.js`` input
file) share the same metadata file for all their corresponding
tests. The metadata path under the ``metadata`` directory is the same
as the source path under the ``tests`` directory, with an additional
``.ini`` suffix.

For example a test with URL::

 /spec/section/file.html?query=param

generated from a source file with path::

 <tests root>/spec/section/file.html

would have a metadata file::

 <metadata root>/spec/section/file.html.ini

As an optimisation, files which produce only default results
(i.e. ``PASS`` or ``OK``), and which don't have any other associated
metadata, don't require a corresponding metadata file.
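
The mapping above can be sketched in a few lines (illustrative only:
it assumes the simple case where the URL path matches the source path,
so multi-global tests generated from an ``any.js`` input file would
need extra handling):

```python
# Illustrative sketch of the URL -> metadata path mapping described
# above; not part of wptrunner itself.
from urllib.parse import urlsplit

def metadata_path(test_url):
    # Drop the query string: all variants of one source file share a
    # single metadata file, keyed by the source path plus ".ini".
    path = urlsplit(test_url).path.lstrip("/")
    return path + ".ini"

# Both variants map to spec/section/file.html.ini under <metadata root>:
metadata_path("/spec/section/file.html?query=param")
```
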

Directory Metadata
~~~~~~~~~~~~~~~~~~

In addition to per-test metadata, default metadata can be applied to
all the tests in a given source location, using a ``__dir__.ini``
metadata file. For example, to apply metadata to all tests under
``<tests root>/spec/``, add the metadata in
``<tests root>/spec/__dir__.ini``.
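
For instance, a hypothetical ``__dir__.ini`` (the condition and bug
URL here are invented for illustration) that disables every test in
the directory on a single platform might contain::

 disabled:
   if os == "android": https://bugs.example.org/12345
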

Metadata Format
---------------

The format of the metadata files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square brackets. Within each section are key-value
pairs. There are several notable differences from standard .ini files,
however:

* Sections may be hierarchically nested, with significant whitespace
  indicating nesting depth.

* Only ``:`` is valid as a key/value separator.

A simple example of a metadata file is::

 root_key: root_value

 [section]
   section_key: section_value

   [subsection]
      subsection_key: subsection_value

 [another_section]
   another_key: [list, value]

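The nesting rules can be sketched with a minimal reader (illustrative
only: wptrunner's real parser also handles conditional values, typed
values such as lists, and comments, all of which this sketch ignores
or keeps as plain strings):

```python
# Minimal sketch of reading the nested ini-like format into dicts.
# Not wptrunner's actual parser.

def parse_metadata(text):
    root = {}
    # Stack of (indent, section-dict); the root section has indent -1.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        stripped = line.strip()
        # Close any sections at the same or deeper indentation.
        while stack[-1][0] >= indent:
            stack.pop()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = {}
            stack[-1][1][stripped[1:-1]] = section
            stack.append((indent, section))
        else:
            key, _, value = stripped.partition(":")
            # Values are kept as plain strings in this sketch.
            stack[-1][1][key.strip()] = value.strip()
    return root
```

Applied to the example above, this yields ``section`` and
``subsection`` as nested dicts under the root, with each key attached
to the innermost enclosing section.
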
Conditional Values
~~~~~~~~~~~~~~~~~~

In order to support values that depend on some external data, the
right-hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on a new line
following the key, with significant indentation. Conditional values
are prefixed with ``if`` and terminated with a colon, for example::

 key:
   if cond1: value1
   if cond2: value2
   value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``; otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.

Conditions themselves use a Python-like expression syntax. Operands
can be variables (corresponding to data passed in), numbers
(integer or floating point; exponential notation is not supported), or
quote-delimited strings. Equality is tested using ``==`` and
inequality with ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

 key:
   if (a == 2 or a == 3) and b == "abc": value1
   if a == 1 or b != "abc": value2
   value3

Here ``a`` and ``b`` are variables whose values will be supplied when
the metadata is used.
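
The first-match semantics can be mimicked with plain Python callables
standing in for the condition syntax (illustrative only; wptrunner has
its own expression parser):

```python
# Sketch of first-match conditional evaluation; not wptrunner code.

def evaluate(conditions, default, run_info):
    """Return the value paired with the first condition that holds.

    conditions is a list of (predicate, value) pairs tried in order;
    default plays the role of the trailing unconditional value.
    """
    for predicate, value in conditions:
        if predicate(run_info):
            return value
    return default

# Mirrors the example above: run_info supplies the variables a and b.
conditions = [
    (lambda ri: (ri["a"] == 2 or ri["a"] == 3) and ri["b"] == "abc", "value1"),
    (lambda ri: ri["a"] == 1 or ri["b"] != "abc", "value2"),
]
evaluate(conditions, "value3", {"a": 2, "b": "abc"})
```

With ``a == 2`` and ``b == "abc"`` the first condition already
matches, so ``value1`` is selected and the remaining conditions are
never consulted.
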

Web-Platform-Tests Metadata
---------------------------

When used for expectation data, metadata files have the following format:

* A section per test URL provided by the corresponding source file,
  with the section heading being the part of the test URL following
  the last ``/`` in the path (this allows multiple tests in a single
  metadata file with the same path part of the URL, but different
  query parts). This may be omitted if there's no non-default
  metadata for the test.

* A subsection per subtest, with the heading being the title of the
  subtest. This may be omitted if there's no non-default metadata for
  the subtest.

* The following known keys:

  :expected:
     The expectation value or values of each (sub)test. If
     this value is a list, the first value represents the
     typical expected test outcome, and subsequent values indicate
     known intermittent outcomes; e.g. ``expected: [PASS, ERROR]``
     would indicate a test that usually passes but has a known-flaky
     ``ERROR`` outcome.

  :disabled:
     Any value apart from the special value ``@False``
     indicates that the (sub)test is disabled and should either not be
     run (for tests) or have its results ignored (for subtests).

  :restart-after:
     Any value apart from the special value ``@False``
     indicates that the runner should restart the browser after running
     this test (e.g. to clear out unwanted state).

  :fuzzy:
     Used for reftests. This is interpreted as a list whose entries
     have the same form as a ``<meta name=fuzzy>`` content value:
     an optional reference identifier followed by a colon, then a range
     indicating the maximum permitted pixel difference per channel, then
     a semicolon, then a range indicating the maximum permitted total
     number of differing pixels. The reference identifier is either a
     single relative URL, resolved against the base test URL, in which
     case the fuzziness applies to any comparison with that URL, or
     takes the form of a lhs URL, a comparison operator, and a rhs URL
     (e.g. ``subtest1.html==ref2.html``), in which case the fuzziness
     only applies to comparisons involving that specific pair of
     URLs. Some illustrative examples are given below.

  :implementation-status:
     One of the values ``implementing``,
     ``not-implementing`` or ``backlog``. This is used in conjunction
     with the ``--skip-implementation-status`` command line argument to
     ``wptrunner`` to ignore certain features where running the test is
     low value.

  :tags:
     A list of labels associated with a given test that can be
     used in conjunction with the ``--tag`` command line argument to
     ``wptrunner`` for test selection.

  In addition there are extra properties which are currently tied to
  specific implementations. For example, Gecko-based browsers support
  the ``min-asserts``, ``max-asserts``, ``prefs``, ``lsan-disabled``,
  ``lsan-allowed``, ``lsan-max-stack-depth``, ``leak-allowed``, and
  ``leak-threshold`` properties.

* Variables taken from the ``RunInfo`` data which describe the
  configuration of the test run. Common properties include:

  :product: A string giving the name of the browser under test
  :browser_channel: A string giving the release channel of the browser under test
  :debug: A Boolean indicating whether the build is a debug build
  :os: A string indicating the operating system
  :version: A string indicating the particular version of that operating system
  :processor: A string indicating the processor architecture

  This information is typically provided by :py:mod:`mozinfo`, but
  different environments may add additional information, and not all
  the properties above are guaranteed to be present in all
  environments. The definitive list of available properties for a
  specific run may be determined by looking at the ``run_info`` key
  in the ``wptreport.json`` output for the run.

* Top-level keys are taken as defaults for the whole file. So, for
  example, a top-level key with ``expected: FAIL`` would indicate
  that all tests and subtests in the file are expected to fail,
  unless they have an ``expected`` key of their own.

A simple example metadata file might look like::

 [test.html?variant=basic]
   type: testharness

   [Test something unsupported]
      expected: FAIL

   [Test with intermittent statuses]
      expected: [PASS, TIMEOUT]

 [test.html?variant=broken]
   expected: ERROR

 [test.html?variant=unstable]
   disabled: http://test.bugs.example.org/bugs/12345

A more complex metadata file with conditional properties might be::

 [canvas_test.html]
   expected:
     if os == "mac": FAIL
     if os == "windows" and version == "XP": FAIL
     PASS

Note that ``PASS`` in the above works, but is unnecessary since it's
the default expected result.

A metadata file with fuzzy reftest values might be::

 [reftest.html]
   fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison would be a
maximum difference per channel of at most 10 and a total of at most
200 differing pixels. For any comparison involving ``ref1.html`` on
the right-hand side, the limits would instead be a per-channel
difference of at most 20 and a count of differing pixels of at least
200 and at most 300. For the specific comparison
``subtest1.html == ref2.html`` (both resolved against the test URL)
these limits would instead be 10 to 15 and 0 to 20, respectively.
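
The decomposition of a single fuzzy entry can be sketched as follows
(a rough sketch, not wptrunner's actual parser; it assumes well-formed
entries):

```python
# Sketch: split one fuzzy entry into (reference, per-channel range,
# pixel-count range). Hypothetical helpers, not wptrunner API.

def parse_range(token):
    # "200-300" -> (200, 300); a single "10" means at most 10, i.e. (0, 10).
    lo, sep, hi = token.partition("-")
    return (int(lo), int(hi)) if sep else (0, int(lo))

def parse_fuzzy(entry):
    # Split off the optional reference identifier before the ranges;
    # the ranges themselves never contain a colon, so the last colon
    # (if any) separates identifier from ranges.
    ref = None
    head, sep, tail = entry.rpartition(":")
    if sep:
        ref, entry = head, tail
    max_difference, total_pixels = (parse_range(t) for t in entry.split(";"))
    return ref, max_difference, total_pixels

parse_fuzzy("ref1.html:20;200-300")
# -> ("ref1.html", (0, 20), (200, 300))
```
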

Generating Expectation Files
----------------------------

wpt provides the ``wpt update-expectations`` command to generate
expectation files from the results of a set of test runs. The basic
syntax for this is::

 ./wpt update-expectations [options] [logfile]...

Each ``logfile`` is a wptreport log file from a previous run. These
can be generated from wptrunner using the ``--log-wptreport`` option,
e.g. ``--log-wptreport=wptreport.json``.

``update-expectations`` takes several options:

--full  Overwrite all the expectation data for any tests that have a
        result in the passed log files, not just data for the same run
        configuration.

--disable-intermittent  When updating test results, disable tests that
                        have inconsistent results across many runs.
                        This can precede a message providing a reason
                        why that test is disabled. If no message is
                        provided, ``unstable`` is the default text.

--update-intermittent  When this option is used, the ``expected`` key
                       stores expected intermittent statuses in
                       addition to the primary expected status. If
                       there is more than one status, it appears as a
                       list. The default behaviour of this option is
                       to retain any existing intermittent statuses in
                       the list unless ``--remove-intermittent`` is
                       specified.

--remove-intermittent  This option is used in conjunction with
                       ``--update-intermittent``. When the
                       ``expected`` statuses are updated, any obsolete
                       intermittent statuses that did not occur in the
                       specified log files are removed from the list.

Property Configuration
~~~~~~~~~~~~~~~~~~~~~~

In cases where the expectation depends on the run configuration, ``wpt
update-expectations`` is able to generate conditional values. Because
the relevant variables depend on the range of configurations that need
to be covered, it's necessary to specify the list of configuration
variables that should be used. This is done using a ``json`` format
file that can be specified with the ``--properties-file`` command line
argument to ``wpt update-expectations``. When this isn't supplied, the
defaults from ``<metadata root>/update_properties.json`` are used, if
present.

Properties File Format
++++++++++++++++++++++

The file is JSON formatted with two top-level keys:

:``properties``:
 A list of property names to consider for conditionals,
 e.g. ``["product", "os"]``.

:``dependents``:
 An optional dictionary containing properties that
 should only be used as "tie-breakers" when a specific top-level
 property alone has failed to differentiate the results. This is
 useful when the dependent property is always more specific than the
 top-level property, but less understandable when used directly. For
 example, the ``version`` property covering different OS versions is
 typically unique amongst different operating systems, but using it
 when the ``os`` property would do instead is likely to produce
 metadata that's too specific to the current configuration and more
 difficult to read. But where there are multiple versions of the same
 operating system with different results, it can be necessary. So
 specifying ``{"os": ["version"]}`` as a dependent property means that
 the ``version`` property will only be used if the condition already
 contains the ``os`` property and further conditions are required to
 separate the observed results.

So an example ``update_properties.json`` file might look like::

 {
   "properties": ["product", "os"],
   "dependents": {"product": ["browser_channel"], "os": ["version"]}
 }
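
The tie-breaker rule can be sketched as follows (a very rough sketch:
the real updater in wptrunner is considerably more involved, and the
function name here is invented):

```python
# Sketch of which properties are candidates for the next condition
# split, given the properties already used in the condition.

def splitting_properties(condition_props, properties, dependents):
    """Return the properties worth trying next.

    Top-level properties are always candidates; a dependent property
    such as "version" is only offered once its parent (e.g. "os")
    already appears in the condition being built.
    """
    candidates = list(properties)
    for parent, extras in dependents.items():
        if parent in condition_props:
            candidates.extend(extras)
    return candidates

splitting_properties([], ["product", "os"], {"os": ["version"]})
# -> ["product", "os"]
splitting_properties(["os"], ["product", "os"], {"os": ["version"]})
# -> ["product", "os", "version"]
```
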

Examples
~~~~~~~~

Update all the expectations from a set of cross-platform test runs::

 wpt update-expectations --full osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
platform-independent::

 wpt update-expectations tests.log

Why a Custom Format?
--------------------

Given the use of the metadata files in CI systems, it was desirable to
have something with the following properties:

* Human readable

* Human editable

* Machine readable / writable

* Capable of storing key-value pairs

* Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having
multiple expectation files for each platform, or having a way to
express conditional values within a single file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to the requirement:

* Capable of storing result values that are conditional on the platform.

There are few extant formats that clearly meet these requirements. In
particular, although conditional properties could be expressed in many
existing formats, the representation would likely be cumbersome and
error-prone to author by hand. Therefore it was decided that a custom
format offered the best tradeoffs given the requirements.