.. role:: js(code)
  :language: javascript

=================
Locale management
=================

A locale is a combination of language, region, script, and regional preferences that
the user wants their data formatted in.

There are multiple models of locale data structures in the industry that have varying degrees
of compatibility with each other. Historically, each major platform has used its own,
and many standards bodies have provided conflicting proposals.

Mozilla, along with most modern platforms, follows the Unicode and W3C recommendations
and conforms to a standard known as `BCP 47`_, which describes a low-level textual
representation of a locale known as a `language tag`.

A few examples of language tags: *en-US*, *de*, *ar*, *zh-Hans*, *es-CL*.

Locales and Language Tags
=========================

The locale data structure consists of four primary fields:

- Language (Example: English - *en*, French - *fr*, Serbian - *sr*)
- Script (Example: Latin - *Latn*, Cyrillic - *Cyrl*)
- Region (Example: United States - *US*, Canada - *CA*, Russia - *RU*)
- Variants (Example: Mac OS - *macos*, Windows - *windows*, Linux - *linux*)

`BCP 47`_ specifies the syntax for each of those fields (called subtags) when
represented as a string. The syntax defines the allowed selection of characters,
their capitalization, and the order in which the fields should be defined.

Most of the base subtags are valid ISO codes, such as `ISO 639`_ for
the language subtag, or `ISO 3166-1`_ for the region.

The examples above present language tags with several fields omitted, which is allowed
by the standard.

On top of that, a locale may contain:

- extensions and private fields
    These fields can be used to carry additional information about a locale.
    Mozilla currently has partial support for them in the JS implementation and plans to
    extend support to all APIs.
- extkeys and "grandfathered" tags (unfortunate language, but part of the spec)
    Mozilla does not support these yet.


An example locale can be visualized as:

.. code-block:: javascript

 {
     "language": "sr",
     "script": "Cyrl",
     "region": "RU",
     "variants": [],
     "extensions": {},
     "privateuse": [],
 }

which can then be serialized into a string: **"sr-Cyrl-RU"**.

.. important::

 Since locales are often stored and passed around the codebase as
 language tag strings, it is important to always use an appropriate
 API to parse, manipulate and serialize them.
 Avoid `Do-It-Yourself` solutions which leave your code fragile and may
 break on unexpected language tag structures.

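As a minimal illustration of parsing and serializing through an API rather than by hand,
the sketch below uses the standard JavaScript :js:`Intl.Locale` and
:js:`Intl.getCanonicalLocales` built-ins. It is not Gecko-specific code, and the variable
names are chosen only for this example.

.. code-block:: javascript

   // Parse a language tag with Intl.Locale instead of splitting the string.
   const locale = new Intl.Locale("sr-Cyrl-RU");

   const fields = {
     language: locale.language, // "sr"
     script: locale.script,     // "Cyrl"
     region: locale.region,     // "RU"
   };

   // Build a locale from separate fields and serialize it back to a string.
   const rebuilt = new Intl.Locale(fields.language, {
     script: fields.script,
     region: fields.region,
   }).toString(); // "sr-Cyrl-RU"

   // Canonicalization normalizes the capitalization of each subtag.
   const canonical = Intl.getCanonicalLocales("SR-cyrl-ru")[0]; // "sr-Cyrl-RU"

Both calls throw a :js:`RangeError` for structurally invalid tags, which is usually
preferable to silently carrying a malformed string through the codebase.
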
Locale Fallback Chains
======================

Locale-sensitive operations are always considered "best-effort". That means that it
cannot be assumed that a perfect match will exist between what the user requested and what
the API can provide.

As a result, the best practice is to *always* operate on locale fallback chains -
ordered lists of locales according to the user preference.

An example of a locale fallback chain may be: :js:`["es-CL", "es-ES", "es", "fr", "en"]`.

The above means a request to format the data according to Chilean Spanish if possible,
fall back to Spanish as used in Spain, then any (generic) Spanish, then French, and
eventually English.

.. important::

 It is *always* better to use a locale fallback chain over a single locale.
 In case there's only one locale available, a list with one element will work
 while allowing for future extensions without a costly refactor.

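A consumer of a fallback chain can be sketched as a loop that tries each locale in
order. The resource table and the :js:`pickFromChain` helper below are hypothetical and
exist only to illustrate the idea.

.. code-block:: javascript

   // Hypothetical resource table: only some locales have a translation.
   const messages = new Map([
     ["es", "Hola"],
     ["en", "Hello"],
   ]);

   // Walk the chain in order and return the first locale that has data.
   function pickFromChain(chain, resources) {
     for (const locale of chain) {
       if (resources.has(locale)) {
         return { locale, value: resources.get(locale) };
       }
     }
     return null; // nothing in the chain is available
   }

   const picked = pickFromChain(["es-CL", "es-ES", "es", "fr", "en"], messages);
   // picked.locale is "es", picked.value is "Hola"

   // A single locale is simply a chain with one element.
   const single = pickFromChain(["en"], messages);

Because the helper always takes a list, a caller that starts with one locale can later
pass a longer chain without any change to the consuming code.
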
Language Negotiation
====================

Due to the imperfections in data matching, all operations on locales should always
use a language negotiation algorithm to resolve the best available set of locales,
based on the list of all available locales and an ordered list of requested locales.

Such algorithms may vary in sophistication and number of strategies. Mozilla's
solution is based on modified logic from
`RFC 4647 <https://tools.ietf.org/html/rfc4647>`__ (Matching of Language Tags).

The three lists of locales used in negotiation:

- **Available** - locales that are locally installed
- **Requested** - locales that the user selected in decreasing order of preference
- **Resolved** - result of the negotiation

The result of a negotiation is an ordered list of locales that are available to
the system, and the consumer is expected to attempt using the locales in the
resolved order.

Negotiation should be used in all scenarios, such as selecting language resources,
calendars, or number formatting.

Single Locale Matching
----------------------

Every negotiation strategy goes through a list of steps in an attempt to find the
best possible match between locales.

The exact algorithm is custom and consists of a six-level strategy:

::

 1) Attempt to find an exact match for each requested locale in available
    locales.
    Example: ['en-US'] * ['en-US'] = ['en-US']

 2) Attempt to match a requested locale to an available locale treated
    as a locale range.
    Example: ['en-US'] * ['en'] = ['en']
                           ^^
                           |-- becomes 'en-*-*-*'

 3) Attempt to use the maximized version of the requested locale, to
    find the best match in available locales.
    Example: ['en'] * ['en-GB', 'en-US'] = ['en-US']
               ^^
               |-- ICU likelySubtags expands it to 'en-Latn-US'

 4) Attempt to look for a different variant of the same locale.
    Example: ['ja-JP-win'] * ['ja-JP-mac'] = ['ja-JP-mac']
               ^^^^^^^^^
               |----------- replace variant with range: 'ja-JP-*'

 5) Attempt to look for a maximized version of the requested locale,
    stripped of the region code.
    Example: ['en-CA'] * ['en-ZA', 'en-US'] = ['en-US', 'en-ZA']
               ^^^^^
               |----------- look for likelySubtag of 'en': 'en-Latn-US'

 6) Attempt to look for a different region of the same locale.
    Example: ['en-GB'] * ['en-AU'] = ['en-AU']
               ^^^^^
               |----- replace region with range: 'en-*'

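The flavor of these levels can be sketched with the standard :js:`Intl.Locale` API.
The :js:`matchLocale` function below is a hypothetical simplification, not Gecko's
implementation: it covers only levels 1, 2, 3 and 6, ignores variants, and returns a
single match.

.. code-block:: javascript

   // True if every field that `avail` specifies equals the field in `target`,
   // i.e. `avail` treated as a range ("en" behaves like "en-*-*").
   function inRange(avail, target) {
     const a = new Intl.Locale(avail);
     return (
       a.language === target.language &&
       (!a.script || a.script === target.script) &&
       (!a.region || a.region === target.region)
     );
   }

   function matchLocale(requested, available) {
     const req = new Intl.Locale(requested);

     // Level 1: exact match.
     if (available.includes(requested)) {
       return requested;
     }

     // Level 2: an available locale treated as a range.
     let found = available.find(a => inRange(a, req));
     if (found) {
       return found;
     }

     // Level 3: maximize the requested locale (likely subtags) and retry.
     const maximized = req.maximize();
     found = available.find(a => inRange(a, maximized));
     if (found) {
       return found;
     }

     // Level 6: accept a different region of the same language.
     found = available.find(a => new Intl.Locale(a).language === req.language);
     return found ?? null;
   }

   matchLocale("en-US", ["en"]);          // "en"    (level 2)
   matchLocale("en", ["en-GB", "en-US"]); // "en-US" (level 3)
   matchLocale("en-GB", ["en-AU"]);       // "en-AU" (level 6)

The level 3 result relies on the likely-subtags data shipped with the JavaScript engine,
which expands *en* to *en-Latn-US*.
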
Filtering / Matching / Lookup
-----------------------------

When negotiating between lists of locales, Mozilla's :js:`LocaleService` API
offers three language negotiation strategies:

Filtering
^^^^^^^^^

This is the most common scenario, where there is an advantage in creating the
largest possible list of locales that the user may benefit from.

An example of this scenario:

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   let result = Services.locale.negotiateLanguages(requested, available);

   // result: ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]

In the example above the algorithm was able to match *"fr-CA"* as a perfect match,
but then was able to find other matches as well - generic French is a very
good match, and Swiss French is also very close to the top requested language.

For the second requested locale, American English is unfortunately
not available, but British English and South African English are.

The algorithm is greedy and attempts to match as many locales
as possible. This is usually what the developer wants.

Matching
^^^^^^^^

In less common scenarios the code needs to match a single, best available locale for
each of the requested locales.

An example of this scenario:

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   let result = Services.locale.negotiateLanguages(
     requested,
     available,
     undefined,
     Services.locale.langNegStrategyMatching);

   // result: ["fr-CA", "en-GB"]

The best available locale for *"fr-CA"* is a perfect match, and for *"en-US"*, the
algorithm selected British English.

Lookup
^^^^^^

The third strategy should be used in cases where, no matter what, only one locale
can ever be used. Some third-party APIs don't support fallback, so it doesn't make
sense to continue resolving after finding the first locale.

It is still advised to keep treating the result as a fallback chain list, just in
this case with a single element.

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   let result = Services.locale.negotiateLanguages(
     requested,
     available,
     Services.locale.defaultLocale,
     Services.locale.langNegStrategyLookup);

   // result: ["fr-CA"]

Default Locale
--------------

Besides the *Available*, *Requested* and *Resolved* locale lists, there's also a concept
of *DefaultLocale*, which is a single locale out of the list of available ones that
should be used in case there is no match to be found between available and
requested locales.

Every Firefox build ships with a single default locale - for example
**Firefox zh-CN** has *DefaultLocale* set to *zh-CN* since this locale is guaranteed
to be packaged in, have all the resources, and should be used if the negotiation fails
to return any matches.

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["it", "de", "zh-CN", "pl", "sr-RU"];
   let defaultLocale = "zh-CN";

   let result = Services.locale.negotiateLanguages(requested, available, defaultLocale);

   // result: ["zh-CN"]

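To make the difference between the three strategies and the default-locale fallback
concrete, here is a standalone sketch. The :js:`negotiate` and :js:`candidates` functions
and the string strategy names are assumptions made for this example; the matcher is
deliberately simplified (exact match first, then any locale sharing the language subtag)
and is not the six-level algorithm used by :js:`LocaleService`.

.. code-block:: javascript

   // Simplified matcher: exact matches first, then locales that share the
   // language subtag, in the order they appear in `available`.
   function candidates(requested, available) {
     const lang = new Intl.Locale(requested).language;
     const exact = available.filter(a => a === requested);
     const sameLang = available.filter(
       a => a !== requested && new Intl.Locale(a).language === lang
     );
     return [...exact, ...sameLang];
   }

   function negotiate(requested, available, defaultLocale, strategy = "filtering") {
     const resolved = [];
     for (const req of requested) {
       const found = candidates(req, available).filter(l => !resolved.includes(l));
       if (strategy === "filtering") {
         resolved.push(...found); // keep every match
       } else if (found.length > 0) {
         resolved.push(found[0]); // matching and lookup: best match only
         if (strategy === "lookup") {
           break; // lookup: stop at the first result
         }
       }
     }
     // Fall back to the default locale when nothing matched.
     if (resolved.length === 0 && defaultLocale) {
       resolved.push(defaultLocale);
     }
     return resolved;
   }

   const requested = ["fr-CA", "en-US"];
   const available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   const filtered = negotiate(requested, available);
   // ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]
   const matched = negotiate(requested, available, undefined, "matching");
   // ["fr-CA", "en-GB"]
   const lookedUp = negotiate(requested, available, "en-US", "lookup");
   // ["fr-CA"]
   const fallback = negotiate(requested, ["it", "de", "zh-CN"], "zh-CN");
   // ["zh-CN"]

All four calls return a list, so the caller can treat the result as a fallback chain
regardless of the strategy used.
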
Chained Language Negotiation
----------------------------

In some cases the language selection of one component should be linked to that of another.

For example, a Firefox extension may come with its own list of available locales, which
may include locales that Firefox doesn't have.

In that case, negotiation between user requested locales and the add-on's list may result
in a selection of locales superseding that of Firefox itself.


.. code-block:: none

        Fx Available
       +-------------+
       |  it, fr, ar |
       +-------------+                 Fx Locales
                     |                +--------+
                     +--------------> | fr, ar |
                     |                +--------+
           Requested |
    +----------------+
    | es, fr, pl, ar |
    +----------------+                 Add-on Locales
                     |                +------------+
                     +--------------> | es, fr, ar |
    Add-on Available |                +------------+
   +-----------------+
   |  de, es, fr, ar |
   +-----------------+


In that case, an add-on may end up being displayed in Spanish, while the Firefox UI
uses French. In most cases this results in a bad UX.

To avoid that, one can chain the add-on negotiation: take Firefox's resolved
locales as the `requested` list and negotiate it against the add-on's `available` list.

.. code-block:: none

       Fx Available
      +-------------+
      |  it, ar, fr |
      +-------------+                Fx Locales (as Add-on Requested)
                    |                +--------+
                    +--------------> | fr, ar |
                    |                +--------+
          Requested |                         |                Add-on Locales
   +----------------+                         |                +--------+
   | es, fr, pl, ar |                         +------------->  | fr, ar |
   +----------------+                         |                +--------+
                                              |
                             Add-on Available |
                            +-----------------+
                            |  de, es, ar, fr |
                            +-----------------+

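The two diagrams can be reproduced with a small standalone sketch. The :js:`negotiate`
function here is a hypothetical stand-in that only keeps exact matches; it is enough to
show the effect of chaining with the locale lists used above.

.. code-block:: javascript

   // Minimal negotiation for illustration: keep the requested locales that
   // are available, preserving the requested order (exact matches only).
   function negotiate(requested, available) {
     return requested.filter(locale => available.includes(locale));
   }

   const requested = ["es", "fr", "pl", "ar"];
   const fxAvailable = ["it", "fr", "ar"];
   const addonAvailable = ["de", "es", "fr", "ar"];

   const fxLocales = negotiate(requested, fxAvailable); // ["fr", "ar"]

   // Unchained: the add-on negotiates against the user's list directly.
   const unchained = negotiate(requested, addonAvailable); // ["es", "fr", "ar"]

   // Chained: Firefox's resolved locales become the add-on's requested list.
   const chained = negotiate(fxLocales, addonAvailable); // ["fr", "ar"]

With chaining, the add-on's first locale is the same as Firefox's first locale, so both
render in French.
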
Available Locales
=================

In Gecko, available locales come from the `Packaged Locales` and the installed
`language packs`. Language packs are a variant of WebExtensions providing just
localized resources for one or more languages.

The primary notion of which locales are available is based on which locales Gecko has
UI localization resources for. Other datasets, such as internationalization data, may
carry different lists of available locales.

Requested Locales
=================

The list of requested locales can be read and set using the :js:`LocaleService::requestedLocales` API.

Using the API will perform necessary sanity checks and canonicalize the values.

After the sanitization, the value will be stored in the pref :js:`intl.locale.requested`.
The pref will usually store a comma-separated list of valid BCP 47 locale
codes, but it can also have two special meanings:

- If the pref is not set at all, Gecko will use the default locale as the requested one.
- If the pref is set to an empty string, Gecko will use the OS app locales as the requested ones.

The former is the current default setting for Firefox Desktop, and the latter is the
default setting for Firefox for Android.

If the developer wants to programmatically request the app to follow OS locales,
they can assign :js:`null` to :js:`requestedLocales`.

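The three meanings of the pref value can be sketched as a small standalone function.
:js:`readRequestedLocales` and its parameters are hypothetical names used only for
illustration; the actual logic lives inside :js:`LocaleService`.

.. code-block:: javascript

   // `prefValue` is undefined when the pref is not set at all.
   function readRequestedLocales(prefValue, defaultLocale, osLocales) {
     if (prefValue === undefined) {
       return [defaultLocale]; // pref unset: use the default locale
     }
     if (prefValue === "") {
       return [...osLocales]; // empty string: follow the OS locales
     }
     // Otherwise parse the comma-separated list and canonicalize each tag.
     const tags = prefValue
       .split(",")
       .map(tag => tag.trim())
       .filter(Boolean);
     return Intl.getCanonicalLocales(tags);
   }

   readRequestedLocales(undefined, "en-US", ["pl-PL"]);      // ["en-US"]
   readRequestedLocales("", "en-US", ["pl-PL"]);             // ["pl-PL"]
   readRequestedLocales("fr-ca, EN-us", "en-US", ["pl-PL"]); // ["fr-CA", "en-US"]

In this sketch :js:`Intl.getCanonicalLocales` throws a :js:`RangeError` on a malformed
tag; a real implementation would more likely skip or sanitize such entries.
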
Regional Preferences
====================

Every locale comes with a set of default preferences that are specific to a culture
and region. These include the calendar system, the way time is displayed
(24h vs 12h clock), which day the week starts on, which days constitute a weekend,
and which numbering system and date time formatting a given locale uses
(for example "MM/DD" in en-US vs "DD/MM" in en-AU).

For all such preferences Gecko has a list of default settings for every region,
but there's also a degree of customization every user may want to make.

All major operating systems have a Settings UI for selecting those preferences,
and since Firefox does not provide its own, Gecko looks into the OS for them.

A special API, :js:`mozilla::intl::OSPreferences`, handles communication with the
host operating system, retrieving regional preferences and altering
internationalization formatting with user preferences.

One thing to notice is that the boundary between regional preferences and language
selection is not strong. In many cases the internationalization formats
will contain language-specific terms and literals. For example, a date formatted
in Japanese may look like this - *"2018年3月24日"*, or the date format
may contain names of months or weekdays that need to be translated
("April", "Tuesday" etc.).

For that reason it is tricky to follow regional preferences in a scenario where the
operating system locale selection does not match the Firefox UI locales.

Such behavior might lead to a UI case like "Today is 24 października" in an English Firefox
with Polish date formats.

For that reason, by default, Gecko will *only* look into OS Preferences if the *language*
portion of the locale of the OS and Firefox match.
That means that if Windows is in "**en**-AU" and Firefox is in "**en**-US" Gecko will look
into Windows Regional Preferences, but if Windows is in "**de**-CH" and Firefox
is in "**fr**-FR" it won't.
In order to force Gecko to look into OS preferences regardless of the language match,
set the flag :js:`intl.regional_prefs.use_os_locales` to :js:`true`.

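The decision rule above can be expressed as a short standalone sketch.
:js:`shouldUseOSRegionalPrefs` is a hypothetical helper, and its third parameter stands
in for the value of the :js:`intl.regional_prefs.use_os_locales` flag.

.. code-block:: javascript

   function shouldUseOSRegionalPrefs(osLocale, appLocale, useOSLocalesPref = false) {
     if (useOSLocalesPref) {
       return true; // the flag forces OS preferences regardless of language
     }
     // Compare only the language subtags of the two locales.
     const osLanguage = new Intl.Locale(osLocale).language;
     const appLanguage = new Intl.Locale(appLocale).language;
     return osLanguage === appLanguage;
   }

   shouldUseOSRegionalPrefs("en-AU", "en-US");       // true
   shouldUseOSRegionalPrefs("de-CH", "fr-FR");       // false
   shouldUseOSRegionalPrefs("de-CH", "fr-FR", true); // true
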
UI Direction
------------

Since the UI direction is so tightly coupled with the locale selection, the
main method of testing the directionality of the Gecko app lives in LocaleService.

:js:`LocaleService::IsAppLocaleRTL` returns a boolean indicating if the current
direction of the app UI is right-to-left.

Default and Last Fallback Locales
=================================

Every Gecko application is built with a single locale as the default one. Such a locale
is guaranteed to have all linguistic resources available, should be used
as the default locale in case language negotiation cannot find any match, and also
as the last locale to look for in a fallback chain.

If all else fails, Gecko also supports a notion of a last fallback locale, which is
currently hardcoded to *"en-US"*, and is the very final locale to try in case
nothing else (including the default locale) works.
Notice that Unicode and ICU use *"en-GB"* in that role because more English-speaking
people around the world recognize British regional preferences than American (metric vs.
imperial, Celsius vs. Fahrenheit, etc.).
Mozilla may switch to *"en-GB"* in the future.

Packaged Locales
================

When a Gecko application is packaged, it bundles a selection of locale resources
to be available within it. At the moment, for example, most Firefox for Android
builds come with almost 100 locales packaged into them, while Desktop Firefox usually
comes with just one packaged locale.

There is currently work being done on enabling more flexibility in how
the locales are packaged to allow for bundling applications with different
sets of locales in different areas - dictionaries, hyphenations, product language resources,
installer language resources, etc.

Web Exposed Locales
===================

For anti-tracking or other reasons, Gecko can expose spoofed locales to web content instead
of the default locales. This can be done by setting the pref :js:`intl.locale.privacy.web_exposed`.
The pref is a comma-separated list of locales; an empty string implies the default locales.

The pref has no effect while :js:`privacy.spoof_english` is set to 2, in which case *"en-US"*
will always be returned.

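The interaction between the two prefs can be sketched as follows. :js:`webExposedLocales`
is a hypothetical helper whose parameters stand in for the pref values and the default
locale list; it models only the precedence described above, not Gecko's implementation.

.. code-block:: javascript

   function webExposedLocales(spoofEnglish, webExposedPref, defaultLocales) {
     if (spoofEnglish === 2) {
       return ["en-US"]; // privacy.spoof_english takes precedence
     }
     // Parse the comma-separated pref; an empty string yields an empty list.
     const spoofed = webExposedPref
       .split(",")
       .map(tag => tag.trim())
       .filter(Boolean);
     return spoofed.length > 0 ? spoofed : [...defaultLocales];
   }

   webExposedLocales(0, "", ["pl", "en-US"]);       // ["pl", "en-US"]
   webExposedLocales(0, "de, en", ["pl", "en-US"]); // ["de", "en"]
   webExposedLocales(2, "de, en", ["pl", "en-US"]); // ["en-US"]
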
Multi-Process
=============

Locale management can operate in a client/server model. This allows a Gecko process
to manage locales (server mode) or just receive the locale selection from a parent
process (client mode).

The client mode is currently used by all child processes of Desktop Firefox, and
may be used by, for example, GeckoView to follow locale selection from a parent
process.

To check the mode the process is operating in, the :js:`LocaleService::IsServer` method is available.

Note that :js:`L10nRegistry.registerSources`, :js:`L10nRegistry.updateSources`, and
:js:`L10nRegistry.removeSources` each trigger an IPC synchronization between the parent
process and any extant content processes, which is expensive. If you need to change the
registration of multiple sources, the best way to do so is to coalesce multiple requests
into a single array and then call the method once.

Mozilla Exceptions
==================

Gecko currently makes only a single exception to BCP 47, and that's
the legacy "ja-JP-mac" locale. The "mac" is a variant, and BCP 47 requires all variants
to be 5 to 8 characters long.

Gecko works around the limitation by accepting the 3-letter variant in its APIs and also
provides a special :js:`appLocalesAsLangTags` method which returns this locale in that form.
(:js:`appLocalesAsBCP47` will canonicalize it and turn it into `"ja-JP-macos"`).

Usage of language negotiation etc. shouldn't rely on this behavior.

Events
======

:js:`LocaleService` emits two events: :js:`intl:app-locales-changed` and
:js:`intl:requested-locales-changed` which all code can listen to.

Those events may be broadcast in response to language packs being installed or
uninstalled, or to the user's selection of languages changing.

In most cases, the code should observe :js:`intl:app-locales-changed`
and react only to that event, since it is the one indicating a change
in the currently used language settings that the components should follow.

Testing
=======

Many components may have logic encoded to react to changes in requested, available
or resolved locales.

In order to test the component's behavior, it is important to replicate
the environment in which such a change may happen.

Since in most cases it is advised for a component to tie its
language negotiation to the main application (see `Chained Language Negotiation`_),
it is not enough to add a new locale to trigger the language change.

First, it is necessary to add a new locale to the available ones, then change
the requested ones, and only that will result in a new negotiation and language
change happening.

There are two primary ways to add a locale to the available ones.

Testing Localization
--------------------

If the goal is to test that the correct localization ends up in the correct place,
the developer needs to register a new :js:`L10nFileSource` in :js:`L10nRegistry` and
provide mock cached data to be returned by the API.

It may look like this:

.. code-block:: javascript

   let source = L10nFileSource.createMock(
     "mock-source", "app",
     ["ko-KR", "ar"],
     "resource://mock-addon/localization/{locale}",
     [
       {
         path: "resource://mock-addon/localization/ko-KR/test.ftl",
         source: "key = Value in Korean"
       },
       {
         path: "resource://mock-addon/localization/ar/test.ftl",
         source: "key = Value in Arabic"
       }
     ]
   );

   L10nRegistry.getInstance().registerSources([source]);

   let availableLocales = Services.locale.availableLocales;

   Assert.ok(availableLocales.includes("ko-KR"));
   Assert.ok(availableLocales.includes("ar"));

   Services.locale.requestedLocales = ["ko-KR"];

   let appLocales = Services.locale.appLocalesAsBCP47;
   Assert.equal(appLocales[0], "ko-KR");

From here, a resource :js:`test.ftl` can be added to a `Localization` and for ID :js:`key`
the correct value from the mocked cache will be returned.

Testing Locale Switching
------------------------

The second method is much more limited, as it only mocks the locale availability,
but it is also simpler:

.. code-block:: javascript

   Services.locale.availableLocales = ["ko-KR", "ar"];
   Services.locale.requestedLocales = ["ko-KR"];

   let appLocales = Services.locale.appLocalesAsBCP47;
   Assert.equal(appLocales[0], "ko-KR");

In the future, Mozilla plans to add a third way (`bug 1440969`_) that lets an add-on
disconnect its locales from those of the main application, for either manual or
automated testing.

Testing the outcome
-------------------

Except when testing reactions to locale changes, it is advised to avoid writing
tests that expect a certain locale to be selected, or certain internationalization
or localization data to be used.

Doing so locks down the test infrastructure to be only usable when launched in
a single locale environment and requires those tests to be updated whenever the underlying
data changes.

In the case of testing locale selection it is best to use a fake locale like :js:`x-test`, that
will not be present at the beginning of the test.

In the case of testing for internationalization data it is best to use :js:`resolvedOptions()`,
to verify the right data is being used, rather than comparing the output string.

In the case of localization, it is best to test against the correct :js:`data-l10n-id`
being set or, in edge cases, verify that a given variable is present in the string using
:js:`String.prototype.includes`.

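As a small illustration of the :js:`resolvedOptions()` advice, the sketch below checks the
settings a standard :js:`Intl.DateTimeFormat` resolved to instead of comparing its output.
The chosen locale and options are arbitrary examples.

.. code-block:: javascript

   const formatter = new Intl.DateTimeFormat("en-US", {
     hour: "numeric",
     hourCycle: "h23",
   });

   // Brittle: the formatted string depends on the underlying locale data,
   // so a test comparing it may break when that data is updated.
   const output = formatter.format(new Date(2018, 2, 24, 15));

   // Robust: verify which settings the formatter resolved to.
   const options = formatter.resolvedOptions();
   const usesH23 = options.hourCycle === "h23";
   const usesRequestedLocale = options.locale === "en-US";

A test built on :js:`usesH23` and :js:`usesRequestedLocale` keeps passing when the
formatting patterns change, as long as the formatter is configured as intended.
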
Deep Dive
=========

Below is a list of articles with additional
details on selected subjects:

.. toctree::
  :maxdepth: 1

  locale_env
  locale_startup

Feedback
========

In case of questions, please consult Intl module peers.


.. _RFC 5656: https://tools.ietf.org/html/rfc5656
.. _BCP 47: https://tools.ietf.org/html/bcp47#section-2.1
.. _ISO 639: http://www.loc.gov/standards/iso639-2/php/code_list.php
.. _ISO 3166-1: https://www.iso.org/iso-3166-country-codes.html
.. _Intl.Locale: https://bugzilla.mozilla.org/show_bug.cgi?id=1433303
.. _fluent-locale: https://docs.rs/fluent-locale/
.. _bug 1440969: https://bugzilla.mozilla.org/show_bug.cgi?id=1440969