Test Metadata
=============

Directory Layout
----------------

Metadata files must be stored under the ``metadata`` directory passed
to the test runner. The directory layout follows that of
web-platform-tests, with each test source path having a corresponding
metadata file. Because the metadata path is based on the source file
path, files that generate multiple URLs, e.g. tests with multiple
variants, or multi-global tests generated from an ``any.js`` input
file, share the same metadata file for all their corresponding
tests. The metadata path under the ``metadata`` directory is the same
as the source path under the ``tests`` directory, with an additional
``.ini`` suffix.

For example a test with URL::

  /spec/section/file.html?query=param

generated from a source file with path::

  <tests root>/spec/section/file.html

would have a metadata file::

  <metadata root>/spec/section/file.html.ini

As an optimisation, files which produce only default results
(i.e. ``PASS`` or ``OK``), and which don't have any other associated
metadata, don't require a corresponding metadata file.

Directory Metadata
~~~~~~~~~~~~~~~~~~

In addition to per-test metadata, default metadata can be applied to
all the tests in a given source location, using a ``__dir__.ini``
metadata file. For example, to apply metadata to all tests under
``<tests root>/spec/``, add the metadata in ``<tests
root>/spec/__dir__.ini``.

Metadata Format
---------------

The format of the metadata files is based on the ini format. Files are
divided into sections, each (apart from the root section) having a
heading enclosed in square brackets. Within each section are key-value
pairs. There are several notable differences from standard .ini files,
however:

* Sections may be hierarchically nested, with significant whitespace
  indicating nesting depth.

* Only ``:`` is valid as a key/value separator

A simple example of a metadata file is::

  root_key: root_value

  [section]
    section_key: section_value

    [subsection]
      subsection_key: subsection_value

  [another_section]
    another_key: [list, value]

Conditional Values
~~~~~~~~~~~~~~~~~~

In order to support values that depend on some external data, the
right hand side of a key/value pair can take a set of conditionals
rather than a plain value. These values are placed on a new line
following the key, with significant indentation. Conditional values
are prefixed with ``if`` and terminated with a colon, for example::

  key:
    if cond1: value1
    if cond2: value2
    value3

In this example, the value associated with ``key`` is determined by
first evaluating ``cond1`` against external data. If that is true,
``key`` is assigned the value ``value1``, otherwise ``cond2`` is
evaluated in the same way. If both ``cond1`` and ``cond2`` are false,
the unconditional ``value3`` is used.

Conditions themselves use a Python-like expression syntax. Operands
can either be variables, corresponding to data passed in, numbers
(integer or floating point; exponential notation is not supported), or
quote-delimited strings. Equality is tested using ``==`` and
inequality by ``!=``. The operators ``and``, ``or`` and ``not`` are
used in the expected way. Parentheses can also be used for
grouping. For example::

  key:
    if (a == 2 or a == 3) and b == "abc": value1
    if a == 1 or b != "abc": value2
    value3

Here ``a`` and ``b`` are variables, the values of which will be
supplied when the metadata is used.
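
The resolution order described above can be sketched in Python. This
is an illustrative sketch only: ``resolve`` is a hypothetical helper,
and its use of Python's ``eval`` over a restricted namespace stands in
for the metadata loader's own expression parser, which the conditions'
Python-like syntax happens to make convenient here.

```python
def resolve(conditionals, default, run_info):
    """Return the value of the first conditional that evaluates true.

    Hypothetical sketch: `conditionals` is a list of
    (expression, value) pairs, tried in order; `default` is the
    trailing unconditional value; `run_info` supplies the variables.
    """
    for expression, value in conditionals:
        # Evaluate the condition with the run variables as the only
        # names in scope (no builtins available to the expression).
        if eval(expression, {"__builtins__": {}}, dict(run_info)):
            return value
    return default

# The conditionals from the example above.
conditionals = [
    ('(a == 2 or a == 3) and b == "abc"', "value1"),
    ('a == 1 or b != "abc"', "value2"),
]
print(resolve(conditionals, "value3", {"a": 2, "b": "abc"}))  # value1
print(resolve(conditionals, "value3", {"a": 1, "b": "xyz"}))  # value2
print(resolve(conditionals, "value3", {"a": 4, "b": "abc"}))  # value3
```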

Web-Platform-Tests Metadata
---------------------------

When used for expectation data, metadata files have the following format:

* A section per test URL provided by the corresponding source file,
  with the section heading being the part of the test URL following
  the last ``/`` in the path (this allows multiple tests in a single
  metadata file with the same path part of the URL, but different
  query parts). This may be omitted if there's no non-default
  metadata for the test.

* A subsection per subtest, with the heading being the title of the
  subtest. This may be omitted if there's no non-default metadata for
  the subtest.

* The following known keys:

  :expected:
    The expectation value or values of each (sub)test. In
    the case this value is a list, the first value represents the
    typical expected test outcome, and subsequent values indicate
    known intermittent outcomes, e.g. ``expected: [PASS, ERROR]``
    would indicate a test that usually passes but has a known-flaky
    ``ERROR`` outcome.

  :disabled:
    Any value apart from the special value ``@False``
    indicates that the (sub)test is disabled and should either not be
    run (for tests) or that its results should be ignored (subtests).

  :restart-after:
    Any value apart from the special value ``@False``
    indicates that the runner should restart the browser after running
    this test (e.g. to clear out unwanted state).

  :fuzzy:
    Used for reftests. This is interpreted as a list containing
    entries like ``<meta name=fuzzy>`` content values, each of which
    consists of an optional reference identifier followed by a colon,
    then a range indicating the maximum permitted pixel difference per
    channel, then a semicolon, then a range indicating the maximum
    permitted total number of differing pixels.
    The reference identifier is either a
    single relative URL, resolved against the base test URL, in which
    case the fuzziness applies to any comparison with that URL, or
    takes the form ``<lhs url> <comparison> <rhs url>``, in which case
    the fuzziness only applies to comparisons involving that specific
    pair of URLs. Some illustrative examples are given below.

  :implementation-status:
    One of the values ``implementing``,
    ``not-implementing`` or ``backlog``. This is used in conjunction
    with the ``--skip-implementation-status`` command line argument to
    ``wptrunner`` to ignore certain features where running the test is
    low value.

  :tags:
    A list of labels associated with a given test that can be
    used in conjunction with the ``--tag`` command line argument to
    ``wptrunner`` for test selection.

  In addition there are extra arguments which are currently tied to
  specific implementations. For example, Gecko-based browsers support
  ``min-asserts``, ``max-asserts``, ``prefs``, ``lsan-disabled``,
  ``lsan-allowed``, ``lsan-max-stack-depth``, ``leak-allowed``, and
  ``leak-threshold`` properties.

* Variables taken from the ``RunInfo`` data which describe the
  configuration of the test run. Common properties include:

  :product: A string giving the name of the browser under test
  :browser_channel: A string giving the release channel of the browser under test
  :debug: A Boolean indicating whether the build is a debug build
  :os: A string giving the operating system
  :version: A string indicating the particular version of that operating system
  :processor: A string indicating the processor architecture

  This information is typically provided by :py:mod:`mozinfo`, but
  different environments may add additional information, and not all
  the properties above are guaranteed to be present in all
  environments.
  The definitive list of available properties for a
  specific run may be determined by looking at the ``run_info`` key
  in the ``wptreport.json`` output for the run.

* Top level keys are taken as defaults for the whole file. So, for
  example, a top level key with ``expected: FAIL`` would indicate
  that all tests and subtests in the file are expected to fail,
  unless they have an ``expected`` key of their own.

A simple example metadata file might look like::

  [test.html?variant=basic]
    type: testharness

    [Test something unsupported]
      expected: FAIL

    [Test with intermittent statuses]
      expected: [PASS, TIMEOUT]

  [test.html?variant=broken]
    expected: ERROR

  [test.html?variant=unstable]
    disabled: http://test.bugs.example.org/bugs/12345

A more complex metadata file with conditional properties might be::

  [canvas_test.html]
    expected:
      if os == "mac": FAIL
      if os == "windows" and version == "XP": FAIL
      PASS

Note that ``PASS`` in the above works, but is unnecessary since it's
the default expected result.

A metadata file with fuzzy reftest values might be::

  [reftest.html]
    fuzzy: [10;200, ref1.html:20;200-300, subtest1.html==ref2.html:10-15;20]

In this case the default fuzziness for any comparison would be to
require a maximum difference per channel of less than or equal to 10
and less than or equal to 200 total pixels different. For any
comparison involving ref1.html on the right hand side, the limits
would instead be a difference per channel of not more than 20 and a
total difference count of not less than 200 and not more than 300. For
the specific comparison ``subtest1.html == ref2.html`` (both resolved
against the test URL) these limits would instead be 10 to 15 and 0 to
20, respectively.
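
The fuzzy syntax above can be illustrated with a small parser. This is
a hedged sketch, not wptrunner's actual parser: ``parse_fuzzy`` is a
hypothetical helper that follows the interpretation used in this
section (a bare number ``N`` read as the range 0 to ``N``) and assumes
well-formed entries whose reference identifiers contain no colon.

```python
def parse_fuzzy(entry):
    """Parse one fuzzy entry into (reference, channel_range, count_range).

    Hypothetical sketch; assumes the reference identifier, if present,
    contains no ":" and that each range is "N" or "lo-hi".
    """
    if ":" in entry:
        # Optional reference identifier precedes the last colon.
        ref, _, ranges = entry.rpartition(":")
    else:
        ref, ranges = None, entry
    # Per-channel range, then ";", then total differing-pixel range.
    channel, _, count = ranges.partition(";")

    def parse_range(text):
        lo, sep, hi = text.partition("-")
        # A bare number N is read here as "at most N", i.e. 0-N.
        return (int(lo), int(hi)) if sep else (0, int(lo))

    return ref, parse_range(channel), parse_range(count)

# The three entries from the example above.
print(parse_fuzzy("10;200"))
# (None, (0, 10), (0, 200))
print(parse_fuzzy("ref1.html:20;200-300"))
# ('ref1.html', (0, 20), (200, 300))
print(parse_fuzzy("subtest1.html==ref2.html:10-15;20"))
# ('subtest1.html==ref2.html', (10, 15), (0, 20))
```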

Generating Expectation Files
----------------------------

wpt provides the ``wpt update-expectations`` command to generate
expectation files from the results of a set of test runs. The basic
syntax for this is::

  ./wpt update-expectations [options] [logfile]...

Each ``logfile`` is a wptreport log file from a previous run. These
can be generated from wptrunner using the ``--log-wptreport`` option,
e.g. ``--log-wptreport=wptreport.json``.

``update-expectations`` takes several options:

--full                  Overwrite all the expectation data for any tests
                        that have a result in the passed log files, not
                        just data for the same run configuration.

--disable-intermittent  When updating test results, disable tests that
                        have inconsistent results across many runs. This
                        option can take a message providing a reason why
                        the test was disabled. If no message is provided,
                        ``unstable`` is the default text.

--update-intermittent   When this option is used, the ``expected`` key
                        stores expected intermittent statuses in addition
                        to the primary expected status. If there is more
                        than one status, it appears as a list. The default
                        behaviour of this option is to retain any existing
                        intermittent statuses in the list unless
                        ``--remove-intermittent`` is specified.

--remove-intermittent   This option is used in conjunction with
                        ``--update-intermittent``. When the ``expected``
                        statuses are updated, any obsolete intermittent
                        statuses that did not occur in the specified log
                        files are removed from the list.

Property Configuration
~~~~~~~~~~~~~~~~~~~~~~

In cases where the expectation depends on the run configuration, ``wpt
update-expectations`` is able to generate conditional values.
Because
the relevant variables depend on the range of configurations that need
to be covered, it's necessary to specify the list of configuration
variables that should be used. This is done using a JSON-format
file that can be specified with the ``--properties-file`` command line
argument to ``wpt update-expectations``. When this isn't supplied, the
defaults from ``<metadata root>/update_properties.json`` are used, if
present.

Properties File Format
++++++++++++++++++++++

The file is JSON formatted with two top-level keys:

:``properties``:
  A list of property names to consider for conditionals,
  e.g. ``["product", "os"]``.

:``dependents``:
  An optional dictionary containing properties that
  should only be used as "tie-breakers" when differentiating based on a
  top-level property alone has failed. This is useful when the
  dependent property is always more specific than the top-level
  property, but less understandable when used directly. For example,
  the ``version`` property covering different OS versions is typically
  unique amongst different operating systems, but using it when the
  ``os`` property would do instead is likely to produce metadata that's
  too specific to the current configuration and more difficult to
  read. But where there are multiple versions of the same operating
  system with different results, it can be necessary. So specifying
  ``{"os": ["version"]}`` as a dependent property means that the
  ``version`` property will only be used if the condition already
  contains the ``os`` property and further conditions are required to
  separate the observed results.

So an example ``update_properties.json`` file might look like::

  {
    "properties": ["product", "os"],
    "dependents": {"product": ["browser_channel"], "os": ["version"]}
  }

Examples
~~~~~~~~

Update all the expectations from a set of cross-platform test runs::

  wpt update-expectations --full osx.log linux.log windows.log

Add expectation data for some new tests that are expected to be
platform-independent::

  wpt update-expectations tests.log

Why a Custom Format?
--------------------

Given the use of the metadata files in CI systems, it was desirable to
have something with the following properties:

* Human readable

* Human editable

* Machine readable / writable

* Capable of storing key-value pairs

* Suitable for storing in a version control system (i.e. text-based)

The need for different results per platform means either having
multiple expectation files for each platform, or having a way to
express conditional values within a single file. The former would be
rather cumbersome for humans updating the expectation files, so the
latter approach has been adopted, leading to the requirement:

* Capable of storing result values that are conditional on the platform.

There are few extant formats that clearly meet these requirements. In
particular, although conditional properties could be expressed in many
existing formats, the representation would likely be cumbersome and
error-prone for hand authoring. Therefore it was decided that a custom
format offered the best tradeoffs given the requirements.