======================================
Managing the built-in en-US dictionary
======================================

The en-US build of Firefox includes a built-in Hunspell dictionary based on the
`SCOWL`_ dataset. This document describes the process to add new words to the
dictionary, or update it to the current upstream version.

For more information about Hunspell or the affix file format, you can check
`the Ubuntu man page for hunspell
<https://manpages.ubuntu.com/manpages/bionic/man5/hunspell.5.html>`_.

Requesting to add new words to the en-US dictionary
===================================================

If you’d like to add new words to the dictionary, you can add your request to
`this bug <https://bugzilla.mozilla.org/show_bug.cgi?id=enus-dictionary>`_:

* Include all possible forms, e.g. plural and genitive forms for nouns,
  different tenses for verbs.
* Try to provide information on the terms you want to add, in particular
  references to external sources that confirm the usage of the term (e.g.
  Merriam-Webster or Oxford online dictionaries).

.. note::

   If you’re fixing the existing bug with pending requests, make sure to `file a
   new bug`_ and move the alias ``enus-dictionary`` (in the *Details* section)
   from the old bug to the new one.

Adding new words to the en-US dictionary
========================================

This section describes the process for adding new words to the dictionary:

#. Get a clone of mozilla-central (see :ref:`Firefox Contributors' Quick
   Reference`), if you don’t already have one, and make sure you can build it
   successfully.
#. Move into the dictionary sources directory using this command:
   ``cd extensions/spellcheck/locales/en-US/hunspell/dictionary-sources``.
#. Identify the current version of SCOWL by checking the file
   ``README_en_US.txt`` (at the beginning of the file there is a line similar to
   ``Generated from SCOWL Version 2020.12.07``, where ``2020.12.07`` is the
   SCOWL version).
#. Download the same version of the dictionary from the `SCOWL`_ homepage or
   `SourceForge <https://sourceforge.net/projects/wordlist/files/SCOWL/>`__ as
   a tarball (tar.gz) and unpack it in the working directory.
   Rename the resulting folder from ``scowl-YYYY.MM.DD`` to ``scowl``.
#. Use the ``edit-dictionary.sh`` script to edit the dictionary. The script
   only works if the environment variable ``EDITOR`` is set to the executable
   of an editor program. If ``EDITOR`` is already set, run
   ``sh edit-dictionary.sh``; otherwise set it for this invocation, e.g.
   ``EDITOR=vim sh edit-dictionary.sh`` to edit using ``vim`` (or substitute
   another editor).

   Copy and paste the full list of words, then save and quit the editor. It’s
   not necessary to put the words in alphabetical order, as the script corrects
   the order.

   Note: you might need to install ``aspell`` on your system (e.g. via
   ``brew install aspell`` on macOS).
#. Run the script ``sh make-new-dict.sh`` to generate a new dictionary and make
   sure it runs without errors. For more details on this script, see the
   `make-new-dict.sh`_ section.
#. Do a sanity check on the resulting dictionary file ``en_US-mozilla.dic``. For
   example, make sure that the size is about the same as the original dictionary
   (or slightly larger).
#. If everything looks correct, use ``sh install-new-dict.sh`` to copy the
   generated file to the right location.
#. Build Firefox and test your updated dictionary. Once you’re
   satisfied, use the process described in :ref:`write_a_patch` to create a
   patch.
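
Step 3 can also be scripted. The following is a minimal sketch, assuming that
``README_en_US.txt`` contains a line of the form quoted in that step. The
function name ``scowl_version`` is hypothetical and is not part of the scripts
in the repository.

.. code-block:: shell

   # Print the SCOWL version recorded in a README file (hypothetical helper).
   # Assumes a line such as "Generated from SCOWL Version 2020.12.07".
   scowl_version() {
       # -n suppresses normal output; the trailing "p" prints only lines
       # where the substitution matched. head keeps the first match.
       sed -n 's/.*Generated from SCOWL Version \([0-9][0-9.]*\).*/\1/p' "$1" |
           head -n 1
   }

For the README line quoted in step 3, ``scowl_version README_en_US.txt`` prints
``2020.12.07``, which is the version to look for when downloading the tarball.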

Note that the update script will modify two versions of the dictionary, and
both need to be committed:

* ``en-US.dic``: the dictionary actually shipping in the build. It uses
  ISO-8859-1 encoding.
* ``utf8/en-US.dic``: a version of the same dictionary with UTF-8 encoding. This
  is used to work around issues with Phabricator, and it makes it possible to
  display actual changes in the diff.

Exclude words from suggestions
==============================

It’s possible to completely exclude words from suggested alternatives by adding
the affix flag ``!`` at the end of the definition in the ``.dic`` file. For
example:

* ``bum`` would be changed to ``bum/!`` (note the additional forward slash).
* ``bum/MS`` would be changed to ``bum/MS!``.

In order to exclude a word from suggestions, follow the instructions available
in `Adding new words to the en-US dictionary`_. Instead of running the
``edit-dictionary.sh`` script (step 5), use a text editor to edit the file
``en-US.dic`` directly, then proceed with the remaining instructions.

.. warning::

   Make sure to open ``en-US.dic`` with the correct encoding. For example, Visual
   Studio Code will try to open it as ``UTF-8``, and it needs to be reopened with
   encoding ``Western (ISO 8859-1)``.

Upgrading dictionary to a new upstream version of SCOWL
=======================================================

The English dictionary available in mozilla-central is based on the
`SCOWL`_ dictionary. Some scripts distributed with the SCOWL package are
used to generate the files for the en-US dictionary.

The working directory for this process is
``extensions/spellcheck/locales/en-US/hunspell/dictionary-sources``.

#. Download the latest version of the dictionary from the `SCOWL`_ homepage or
   `SourceForge <https://sourceforge.net/projects/wordlist/files/SCOWL/>`__ as
   a tarball (tar.gz) and unpack it in the working directory.
   Rename the resulting folder from ``scowl-YYYY.MM.DD`` to ``scowl``.
#. Run the script ``sh make-new-dict.sh`` to generate a new dictionary and make
   sure it runs without errors. For more details on this script, see the
   `make-new-dict.sh`_ section.
#. Do a sanity check on the resulting dictionary file ``en_US-mozilla.dic``. For
   example, make sure that the size is about the same as the original dictionary
   (or slightly larger).
#. If everything looks correct, use ``sh install-new-dict.sh`` to copy the
   generated file to the right location and use the process described in
   :ref:`write_a_patch` to create a patch.

Info about the file structure
=============================

mozilla-specific.txt
--------------------

This file contains Mozilla-specific words that should not be submitted
upstream. For example, ``Firefox`` should go in this file (see `bug 237921`_).

Note that the file ``5-mozilla-specific.txt`` is generated by expanding
``mozilla-specific.txt`` and should not be edited directly.

utf8 folder
-----------

``dictionary-sources/utf8`` is used to store a copy with UTF-8 encoding of the
dictionary files. This is used to work around limitations in Phabricator, which
treats ISO-8859-1 files as binary and won’t display a diff when updating them.

Info about the included scripts
===============================

make-new-dict.sh
----------------

The dictionary upgrade script ``make-new-dict.sh`` works by expanding (i.e.
“unmunching”) the affix-compressed dictionaries to create wordlists, and then
uses those wordlists to generate a new dictionary.

The upgrade script expects the current upstream version to be kept in the
directory ``orig``.
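
The word-list arithmetic behind this process can be illustrated with standard
tools. The sketch below is not taken from ``make-new-dict.sh``. It assumes
plain text files with one word per line, and the function name
``patch_wordlist`` is hypothetical. It removes one set of words from an
upstream list and adds another set, which mirrors how ``4-patched.txt`` is
described in this section.

.. code-block:: shell

   # patch_wordlist UPSTREAM REMOVED ADDED   (hypothetical helper)
   # Prints the words of UPSTREAM, minus the words listed in REMOVED,
   # plus the words listed in ADDED, sorted and without duplicates.
   patch_wordlist() {
       upstream_sorted=$(mktemp)
       removed_sorted=$(mktemp)
       # comm requires sorted input; LC_ALL=C gives a stable byte order.
       LC_ALL=C sort -u "$1" > "$upstream_sorted"
       LC_ALL=C sort -u "$2" > "$removed_sorted"
       # comm -23 keeps only the lines unique to the first file; the result
       # is then merged with the added words ("-" makes sort read the pipe).
       LC_ALL=C comm -23 "$upstream_sorted" "$removed_sorted" |
           LC_ALL=C sort -u - "$3"
       rm -f "$upstream_sorted" "$removed_sorted"
   }

Swapping ``comm -23`` for ``comm -13`` on two sorted lists gives the words
found only in the second list, which is one way to derive an "added" list.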

The script will create a few files in ``dictionary-sources/support_files`` in
the following order:

* ``0-special.txt`` contains numbers and ordinals expanded from SCOWL
  ``en.dic.supp``.
* ``1-base.txt`` contains words expanded from ``en_US-custom.dic`` in the
  **previous** version of SCOWL (from the ``orig`` folder).
* ``2-mozilla.txt`` contains words expanded from the current Mozilla dictionary.
* ``3-upstream.txt`` contains words expanded from ``en_US-custom.dic`` in the
  **new** version of SCOWL (from the ``scowl/speller`` folder).
* ``2-mozilla-removed.txt`` contains words that are only available in the SCOWL
  dictionary, i.e. removed by Mozilla.
* ``2-mozilla-added.txt`` contains words that are only available in the current
  Mozilla dictionary, i.e. added by Mozilla.
* ``4-patched.txt`` contains words from the new SCOWL dictionary
  (``3-upstream.txt``), with the words in ``2-mozilla-removed.txt`` removed and
  the words in ``2-mozilla-added.txt`` added.
* ``5-mozilla-specific.txt`` is expanded from ``mozilla-specific.txt`` using the
  current affix rules from the Mozilla dictionary.
* ``5-mozilla-removed.txt`` and ``5-mozilla-added.txt`` contain words that are
  respectively removed and added by Mozilla compared to the **new** SCOWL
  version. These files could be used to submit upstream changes, but words
  included in ``5-mozilla-specific.txt`` should be removed from these lists.

The new dictionary is available as ``en_US-mozilla.dic`` and should be copied
over using the ``install-new-dict.sh`` script.

install-new-dict.sh
-------------------

The script:

* Creates a copy of ``orig`` as ``support_files/orig-bk`` and copies the new
  upstream version to ``orig``.
* Copies the existing Mozilla dictionary to ``support_files/mozilla-bk``.
* Converts the dictionary (.dic) generated by ``make-new-dict.sh`` from UTF-8 to
  ISO-8859-1 and moves it to the parent folder.
* Sets the affix file (.aff) to use ``ISO8859-1`` as ``SET`` instead of the
  original ``UTF-8``, and removes ``ICONV`` patterns (input conversion tables).


.. _SCOWL: http://wordlist.aspell.net
.. _file a new bug: https://bugzilla.mozilla.org/show_bug.cgi?id=enus-dictionary
.. _SourceForce: https://sourceforge.net/projects/wordlist/files/SCOWL/
.. _bug 237921: https://bugzilla.mozilla.org/show_bug.cgi?id=237921
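
The encoding conversion and the affix file adjustment listed for
``install-new-dict.sh`` can be approximated with ``iconv`` and ``sed``. This is
a sketch under assumptions, not the contents of the script: it assumes the
affix file declares its encoding with a line reading exactly ``SET UTF-8`` and
that input conversion entries are lines starting with ``ICONV``. The function
names ``to_latin1`` and ``latin1_aff`` are hypothetical.

.. code-block:: shell

   # to_latin1 SRC DST   (hypothetical helper)
   # Convert a UTF-8 encoded dictionary file to ISO-8859-1.
   to_latin1() {
       iconv -f UTF-8 -t ISO-8859-1 "$1" > "$2"
   }

   # latin1_aff SRC DST   (hypothetical helper)
   # Rewrite the SET line of an affix file and drop all ICONV lines,
   # including the ICONV header line that carries the entry count.
   latin1_aff() {
       sed -e 's/^SET UTF-8$/SET ISO8859-1/' -e '/^ICONV /d' "$1" > "$2"
   }

Note that ``iconv`` exits with an error if the input contains a character that
has no ISO-8859-1 equivalent, so a failure at this point usually means a word
in the list needs attention.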