      1 *spell.txt*     Nvim
      2 
      3 
      4 	  VIM REFERENCE MANUAL	  by Bram Moolenaar
      5 
      6 
      7 Spell checking						*spell*
      8 
      9                                      Type |gO| to see the table of contents.
     10 
     11 ==============================================================================
     12 1. Quick start					*spell-quickstart* *E756*
     13 
     14 This command switches on spell checking: >
     15 
     16 :setlocal spell spelllang=en_us
     17 
     18 This switches on the 'spell' option and specifies to check for US English.
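
A minimal sketch, assuming you want spell checking only for prose-like
filetypes.  The group name "example_spell" and the filetype list are
illustrative choices, not something this manual prescribes: >
	augroup example_spell
	  autocmd!
	  " Enable US English spell checking in matching buffers only.
	  autocmd FileType markdown,gitcommit,text setlocal spell spelllang=en_us
	augroup END
<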
     19 
     20 The words that are not recognized are highlighted with one of these:
     21 SpellBad	word not recognized			|hl-SpellBad|
     22 SpellCap	word not capitalised			|hl-SpellCap|
     23 SpellRare	rare word				|hl-SpellRare|
     24 SpellLocal	wrong spelling for selected region	|hl-SpellLocal|
     25 
     26 Vim only checks words for spelling; there is no grammar check.
     27 
     28 If the 'mousemodel' option is set to "popup" and the cursor is on a badly
     29 spelled word or it is "popup_setpos" and the mouse pointer is on a badly
     30 spelled word, then the popup menu will contain a submenu to replace the bad
     31 word.  Note: this slows down the appearance of the popup menu.
     32 
     33 To search for the next misspelled word:
     34 
     35 						*]s*
     36 ]s			Move to next misspelled word after the cursor.
     37 		A count before the command can be used to repeat.
     38 		'wrapscan' applies.
     39 
     40 						*[s*
     41 [s			Like "]s" but search backwards, find the misspelled
     42 		word before the cursor.  Doesn't recognize words
     43 		split over two lines, thus may stop at words that are
     44 		not highlighted as bad.  Does not stop at a word with
     45 		a missing capital at the start of a line.
     46 
     47 						*]S*
     48 ]S			Like "]s" but only stop at bad words, not at rare
     49 		words or words for another region.
     50 
     51 						*[S*
     52 [S			Like "]S" but search backwards.
     53 
     54 						*]r*
     55 ]r			Move to next "rare" word after the cursor.
     56 		A count before the command can be used to repeat.
     57 		'wrapscan' applies.
     58 
     59 						*[r*
     60 [r			Like "]r" but search backwards, find the "rare"
     61 		word before the cursor.  Doesn't recognize words
     62 		split over two lines, thus may stop at words that are
     63 		not highlighted as rare.
     64 
     65 
     66 To add words to your own word list:
     67 
     68 						*zg*
     69 zg			Add word under the cursor as a good word to the first
     70 		name in 'spellfile'.  A count may precede the command
     71 		to indicate the entry in 'spellfile' to be used.  A
     72 		count of two uses the second entry.
     73 
     74 		In Visual mode the selected characters are added as a
     75 		word (including white space!).
     76 		When the cursor is on text that is marked as badly
     77 		spelled then the marked text is used.
     78 		Otherwise the word under the cursor, separated by
     79 		non-word characters, is used.
     80 
     81 		If the word is explicitly marked as bad word in
     82 		another spell file the result is unpredictable.
     83 
     84 						*zG*
     85 zG			Like "zg" but add the word to the internal word list
     86 		|internal-wordlist|.
     87 
     88 						*zw*
     89 zw			Like "zg" but mark the word as a wrong (bad) word.
     90 		If the word already appears in 'spellfile' it is
     91 		turned into a comment line.  See |spellfile-cleanup|
     92 		for getting rid of those.
     93 
     94 						*zW*
     95 zW			Like "zw" but add the word to the internal word list
     96 		|internal-wordlist|.
     97 
     98 zuw							*zug* *zuw*
     99 zug			Undo |zw| and |zg|, remove the word from the entry in
    100 		'spellfile'.  Count used as with |zg|.
    101 
    102 zuW							*zuG* *zuW*
    103 zuG			Undo |zW| and |zG|, remove the word from the internal
    104 		word list.  Count used as with |zg|.
    105 
    106 					*:spe* *:spellgood* *E1280*
    107 :[count]spe[llgood] {word}
    108 		Add {word} as a good word to 'spellfile', like with
    109 		|zg|.  Without count the first name is used, with a
    110 		count of two the second entry, etc.
    111 
    112 :spe[llgood]! {word}	Add {word} as a good word to the internal word list,
    113 		like with |zG|.
    114 
    115 						*:spellw* *:spellwrong*
    116 :[count]spellw[rong] {word}
    117 		Add {word} as a wrong (bad) word to 'spellfile', as
    118 		with |zw|.  Without count the first name is used, with
    119 		a count of two the second entry, etc.
    120 
    121 :spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
    122 		list, like with |zW|.
    123 
    124 						*:spellra* *:spellrare*
    125 :[count]spellra[re] {word}
    126 		Add {word} as a rare word to 'spellfile', similar to
    127 		|zw|.  Without count the first name is used, with
    128 		a count of two the second entry, etc.
    129 
    130 		There are no normal mode commands to mark words as
    131 		rare as this is a fairly uncommon command and all
    132 		intuitive commands for this are already taken.  If you
    133 		want you can add mappings with e.g.: >
    134 	nnoremap z?  :exe ':spellrare  ' .. expand('<cWORD>')<CR>
    135 	nnoremap z/  :exe ':spellrare! ' .. expand('<cWORD>')<CR>
    136 <			|:spellundo|, |zuw|, or |zuW| can be used to undo this.
    137 
    138 :spellra[re]! {word}	Add {word} as a rare word to the internal word
    139 		list, similar to |zW|.
    140 
    141 :[count]spellu[ndo] {word}				*:spellu* *:spellundo*
    142 		Like |zuw|.  [count] used as with |:spellgood|.
    143 
    144 :spellu[ndo]! {word}	Like |zuW|.  [count] used as with |:spellgood|.
    145 
    146 
    147 After adding a word to 'spellfile' with the above commands its associated
    148 ".spl" file will automatically be updated and reloaded.  If you change
    149 'spellfile' manually you need to use the |:mkspell| command.  This sequence of
    150 commands mostly works well: >
    151 :edit <file in 'spellfile'>
    152 <	(make changes to the spell file) >
    153 :mkspell! %
    154 
    155 More details about the 'spellfile' format below |spell-wordlist-format|.
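
As a sketch of how the count selects an entry in 'spellfile': the two file
names and the words below are hypothetical, the directory is assumed to
exist, and each name is assumed to end in ".{encoding}.add" as described
under 'spellfile': >
	set spellfile=~/.config/nvim/spell/en.utf-8.add
	set spellfile+=~/.config/nvim/spell/project.utf-8.add
	" "zg" and ":spellgood" without a count use the first entry.
	spellgood Neovim
	" "2zg" and ":2spellgood" use the second entry.
	2spellgood runtimepath
<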
    156 
    157 						*internal-wordlist*
    158 The internal word list is used for all buffers where 'spell' is set.  It is
    159 not stored; it is lost when you exit Vim.  It is also cleared when 'encoding'
    160 is set.
    161 
    162 
    163 Finding suggestions for bad words:
    164 						*z=*
    165 z=			For the word under/after the cursor suggest correctly
    166 		spelled words.  This also works to find alternatives
    167 		for a word that is not highlighted as a bad word,
    168 		e.g., when the word after it is bad.
    169 		In Visual mode the highlighted text is taken as the
    170 		word to be replaced.
    171 		The results are sorted on similarity to the word being
    172 		replaced.
    173 		This may take a long time.  Hit CTRL-C when you get
    174 		bored.
    175 
    176 		If the command is used without a count the
    177 		alternatives are listed and you can enter the number
    178 		of your choice or press <Enter> if you don't want to
    179 		replace.  You can also use the mouse to click on your
    180 		choice (only works if the mouse can be used in Normal
    181 		mode and when there are no line wraps).  Click on the
    182 		first line (the header) to cancel.
    183 
    184 		The suggestions listed normally replace a highlighted
    185 		bad word.  Sometimes they include other text, in that
    186 		case the replaced text is also listed after a "<".
    187 
    188 		If a count is used that suggestion is used, without
    189 		prompting.  For example, "1z=" always takes the first
    190 		suggestion.
    191 
    192 		If 'verbose' is non-zero a score will be displayed
    193 		with the suggestions to indicate the similarity to the
    194 		badly spelled word (the higher the score the more
    195 		different).
    196 		When a word was replaced the redo command "." will
    197 		repeat the word replacement.  This works like "ciw",
    198 		the good word and <Esc>.  This does NOT work for Thai
    199 		and other languages without spaces between words.
    200 
    201 				*:spellr* *:spellrepall* *E752* *E753*
    202 :spellr[epall]		Repeat the replacement done by |z=| for all matches
    203 		with the replaced word in the current window.
    204 
    205 In Insert mode, when the cursor is after a badly spelled word, you can use
    206 CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
    207 CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|
    208 
    209 The 'spellsuggest' option influences how the list of suggestions is generated
    210 and sorted.  See 'spellsuggest'.
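
For example, a sketch that limits the list to nine entries and maps a key to
accept the first suggestion without prompting.  The choice of <Leader>s is an
assumption; any free key works: >
	set spellsuggest=best,9
	" Replace the word under/after the cursor with the first suggestion.
	nnoremap <Leader>s 1z=
<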
    211 
    212 The 'spellcapcheck' option is used to check that the first word of a sentence
    213 starts with a capital.  This doesn't work for the first word in the file.
    214 When there is a line break right after a sentence the highlighting of the next
    215 line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
    216 how it can be set automatically when 'spelllang' is set.
    217 
    218 The 'spelloptions' option has a few more flags that influence the way spell
    219 checking works.  For example, "camel" splits CamelCased words so that each
    220 part of the word is spell-checked separately.
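
A short sketch of the "camel" flag.  With this setting a made-up identifier
such as "spellChekcer" is expected to be checked as the separate parts "spell"
and "Chekcer": >
	set spelloptions=camel
<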
    221 
    222 Vim counts the number of times a good word is encountered.  This is used to
    223 sort the suggestions: words that have been seen before get a small bonus,
    224 words that have been seen often get a bigger bonus.  The COMMON item in the
    225 affix file can be used to define common words, so that this mechanism also
    226 works in a new or short file |spell-COMMON|.
    227 
    228 ==============================================================================
    229 2. Remarks on spell checking				*spell-remarks*
    230 
    231 PERFORMANCE
    232 
    233 Vim does on-the-fly spell checking.  To make this work fast the word list is
    234 loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
    235 might also be a noticeable delay when the word list is loaded, which happens
    236 when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
    237 To minimize the delay each word list is only loaded once; it is not deleted
    238 when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
    239 all the word lists are reloaded, thus you may notice a delay then too.
    240 
    241 
    242 REGIONS
    243 
    244 A word may be spelled differently in various regions.  For example, English
    245 comes in (at least) these variants:
    246 
    247 en		all regions
    248 en_au		Australia
    249 en_ca		Canada
    250 en_gb		Great Britain
    251 en_nz		New Zealand
    252 en_us		USA
    253 
    254 Words that are not used in one region but are used in another region are
    255 highlighted with SpellLocal |hl-SpellLocal|.
    256 
    257 Always use lowercase letters for the language and region names.
    258 
    259 When adding a word with |zg| or another command it's always added for all
    260 regions.  You can change that by manually editing the 'spellfile'.  See
    261 |spell-wordlist-format|.  Note that the regions as specified in the files in
    262 'spellfile' are only used when all entries in 'spelllang' specify the same
    263 region (not counting files specified by their .spl name).
    264 
    265 						*spell-german*
    266 Specific exception: For German these special regions are used:
    267 de		all German words accepted
    268 de_de		old and new spelling
    269 de_19		old spelling
    270 de_20		new spelling
    271 de_at		Austria
    272 de_ch		Switzerland
    273 
    274 						*spell-russian*
    275 Specific exception: For Russian these special regions are used:
    276 ru		all Russian words accepted
    277 ru_ru		"IE" letter spelling
    278 ru_yo		"YO" letter spelling
    279 
    280 						*spell-yiddish*
    281 Yiddish requires using "utf-8" encoding, because of the special characters
    282 used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
    283 instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
    284 In a table:
    285 'encoding'	'spelllang'
    286 utf-8		yi		Yiddish
    287 latin1		yi		transliterated Yiddish
    288 utf-8		yi-tr		transliterated Yiddish
    289 
    290 						*spell-cjk*
    291 Chinese, Japanese and other East Asian characters are normally marked as
    292 errors, because spell checking of these characters is not supported.  If
    293 'spelllang' includes "cjk", these characters are not marked as errors.  This
    294 is useful when editing text with spell checking while some Asian words are
    295 present.
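
The region and "cjk" settings described above could be selected like this.
This is only a sketch; which languages you combine is up to you: >
	" British English only; words used only in other regions are
	" highlighted with SpellLocal.
	setlocal spell spelllang=en_gb
	" US English, without marking East Asian characters as errors.
	setlocal spell spelllang=en_us,cjk
<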
    296 
    297 
    298 SPELL FILES						*spell-load*
    299 
    300 Vim searches for spell files in the "spell" subdirectory of the directories in
    301 'runtimepath'.  The name is: LL.EEE.spl, where:
    302 LL	the language name
    303 EEE	the value of 'encoding'
    304 
    305 The value for "LL" comes from 'spelllang', but excludes the region name.
    306 Examples:
    307 'spelllang'	LL ~
    308 en_us		en
    309 en-rare		en-rare
    310 medical_ca	medical
    311 
    312 Only the first file is loaded, the one that is first in 'runtimepath'.  If
    313 this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
    314 All the ones that are found are used.
    315 
    316 Additionally, the files related to the names in 'spellfile' are loaded.  These
    317 are the files that |zg| and |zw| add good and wrong words to.
    318 
    319 Exceptions:
    320 - Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
    321  matter for spelling.
    322 - When no spell file for 'encoding' is found "ascii" is tried.  This only
    323  works for languages where nearly all words are ASCII, such as English.  It
    324  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
    325  is being edited.  For the ".add" files the same name as the found main
    326  spell file is used.
    327 
    328 For example, with these values:
    329 'runtimepath' is "~/.config/nvim,/usr/share/nvim/runtime/,~/.config/nvim/after"
    330 'encoding'    is "iso-8859-2"
    331 'spelllang'   is "pl"
    332 
    333 Vim will look for:
    334 1. ~/.config/nvim/spell/pl.iso-8859-2.spl
    335 2. /usr/share/nvim/runtime/spell/pl.iso-8859-2.spl
    336 3. ~/.config/nvim/spell/pl.iso-8859-2.add.spl
    337 4. /usr/share/nvim/runtime/spell/pl.iso-8859-2.add.spl
    338 5. ~/.config/nvim/after/spell/pl.iso-8859-2.add.spl
    339 
    340 This assumes 1. is not found and 2. is found.
    341 
    342 If 'encoding' is "latin1" Vim will look for:
    343 1. ~/.config/nvim/spell/pl.latin1.spl
    344 2. /usr/share/nvim/runtime/spell/pl.latin1.spl
    345 3. ~/.config/nvim/after/spell/pl.latin1.spl
    346 4. ~/.config/nvim/spell/pl.ascii.spl
    347 5. /usr/share/nvim/runtime/spell/pl.ascii.spl
    348 6. ~/.config/nvim/after/spell/pl.ascii.spl
    349 
    350 This assumes none of them are found (Polish doesn't make sense when leaving
    351 out the non-ASCII characters).
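
To see which spell files are actually present, here is a sketch that uses
|globpath()| to list every ".spl" file found in the "spell" subdirectories of
'runtimepath'.  The variable name "spl" is arbitrary: >
	for spl in globpath(&runtimepath, 'spell/*.spl', 0, 1)
	  echo spl
	endfor
<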
    352 
    353 A spell file might not be available in the current 'encoding'.  See
    354 |spell-mkspell| about how to create a spell file.  Converting a spell file
    355 with "iconv" will NOT work!
    356 
    357 					    *spell-sug-file* *E781*
    358 If there is a file with exactly the same name as the ".spl" file but ending in
    359 ".sug", that file will be used for giving better suggestions.  It isn't loaded
    360 before suggestions are made to reduce memory use.
    361 
    362 			    *E758* *E759* *E778* *E779* *E780* *E782*
    363 When loading a spell file Vim checks that it is properly formatted.  If you
    364 get an error the file may be truncated, modified or intended for another Vim
    365 version.
    366 
    367 
    368 SPELLFILE CLEANUP					*spellfile-cleanup*
    369 
    370 The |zw| command turns existing entries in 'spellfile' into comment lines.
    371 This avoids having to write a new file every time, but results in the file
    372 only getting longer, never shorter.  To clean up the comment lines in all
    373 ".add" spell files do this: >
    374 :runtime spell/cleanadd.vim
    375 
    376 This deletes all comment lines, except the ones that start with "##".  Use
    377 "##" lines to add comments that you want to keep.
    378 
    379 You can invoke this script as often as you like.  A variable is provided to
    380 skip updating files that have been changed recently.  Set it to the number of
    381 seconds that has passed since a file was changed before it will be cleaned.
    382 For example, to clean only files that were not changed in the last hour: >
    383      let g:spell_clean_limit = 60 * 60
    384 The default is one second.
    385 
    386 
    387 WORDS
    388 
    389 Vim uses a fixed method to recognize a word.  This is independent of
    390 'iskeyword', so that it also works in help files and for languages that
    391 include characters like '-' in 'iskeyword'.  The word characters do depend on
    392 'encoding'.
    393 
    394 The table with word characters is stored in the main .spl file.  Therefore it
    395 matters what the current locale is when generating it!  A .add.spl file does
    396 not contain a word table though.
    397 
    398 For a word that starts with a digit the digit is ignored, unless the word as a
    399 whole is recognized.  Thus if "3D" is a word and "D" is not then "3D" is
    400 recognized as a word, but if "3D" is not a word then only the "D" is marked as
    401 bad.  Hex numbers in the form 0x12ab and 0X12AB are recognized.
    402 
    403 
    404 WORD COMBINATIONS
    405 
    406 It is possible to spell-check words that include a space.  This is used to
    407 recognize words that are invalid when used by themselves, e.g. for "et al.".
    408 It can also be used to recognize "the the" and highlight it.
    409 
    410 The number of spaces is irrelevant.  In most cases a line break may also
    411 appear.  However, this makes it difficult to find out where to start checking
    412 for spelling mistakes.  When you make a change to one line and only that line
    413 is redrawn Vim won't look in the previous line, thus when "et" is at the end
    414 of the previous line "al." will be flagged as an error.  And when you type
    415 "the<CR>the" the highlighting doesn't appear until the first line is redrawn.
    416 Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
    417 with a line break.
    418 
    419 When encountering a line break Vim skips characters such as "*", '>' and '"',
    420 so that comments in C, shell and Vim code can be spell checked.
    421 
    422 
    423 SYNTAX HIGHLIGHTING					*spell-syntax*
    424 
    425 Files that use syntax highlighting can specify where spell checking should be
    426 done:
    427 
    428 1.  everywhere			   default
    429 2.  in specific items		   use "contains=@Spell"
    430 3.  everywhere but specific items  use "contains=@NoSpell"
    431 
    432 For the second method adding the @NoSpell cluster will disable spell checking
    433 again.  This can be used, for example, to add @Spell to the comments of a
    434 program, and add @NoSpell for items that shouldn't be checked.
    435 Also see |:syn-spell| for text that is not in a syntax item.
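
A minimal sketch of the second method for a made-up file type.  The group
names "exampleComment" and "exampleUrl" and both patterns are illustrative
assumptions: >
	" Check spelling inside comments...
	syntax match exampleComment "#.*$" contains=@Spell,exampleUrl
	" ...but not inside URLs that appear in those comments.
	syntax match exampleUrl "https\?://\S\+" contained contains=@NoSpell
<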
    436 
    437 
    438 VIM SCRIPTS
    439 
    440 If you want to write a Vim script that does something with spelling, you may
    441 find these functions useful:
    442 
    443    spellbadword()	find badly spelled word at the cursor
    444    spellsuggest()	get list of spelling suggestions
    445    soundfold()		get the sound-a-like version of a word
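
A minimal sketch combining these functions.  The function name
"ExampleSpellInfo" is hypothetical, and 'spell' must be set for
spellbadword() to find anything.  Note that spellbadword() moves the cursor
to the start of the bad word it finds: >
	function! ExampleSpellInfo() abort
	  " Returns [word, kind]; word is empty when nothing bad is found
	  " in the cursor line.
	  let [bad, kind] = spellbadword()
	  if bad ==# ''
	    echo 'no badly spelled word found'
	    return
	  endif
	  " Ask for at most five suggestions.
	  let suggestions = spellsuggest(bad, 5)
	  echo printf('%s (%s): %s', bad, kind, join(suggestions, ', '))
	  echo 'sound-folded: ' .. soundfold(bad)
	endfunction
<
It can be tried with ":call ExampleSpellInfo()" while the cursor is on a line
that contains a highlighted word.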
    446 
    447 
    448 SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*
    449 
    450 After the 'spelllang' option has been set successfully, Vim will source the
    451 files "spell/LANG.vim" and "spell/LANG.lua" in 'runtimepath'.  "LANG" is the
    452 value of 'spelllang' up to the first comma, dot or underscore.  This can be
    453 used to set options specifically for the language, especially 'spellcapcheck'.
    454 
    455 The distribution includes a few of these files.  Use this command to see what
    456 they do: >
    457 :next $VIMRUNTIME/spell/*.vim
    458 
    459 Note that the default scripts don't set 'spellcapcheck' if it was changed from
    460 the default value.  This assumes the user prefers another value then.
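
A sketch of what such a file could contain, for a hypothetical language "xx",
saved as "spell/xx.vim" in a 'runtimepath' directory.  Unlike the distributed
scripts, this sketch sets the option unconditionally: >
	" Do not check for a capital at the start of a sentence.
	setlocal spellcapcheck=
<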
    461 
    462 
    463 DOUBLE SCORING						*spell-double-scoring*
    464 
    465 The 'spellsuggest' option can be used to select "double" scoring.  This
    466 mechanism is based on the principle that there are two kinds of spelling
    467 mistakes:
    468 
    469 1. You know how to spell the word, but mistype something.  This results in a
    470   small editing distance (character swapped/omitted/inserted) and possibly a
    471   word that sounds completely different.
    472 
    473 2. You don't know how to spell the word and type something that sounds right.
    474   The edit distance can be big but the word is similar after sound-folding.
    475 
    476 Since scores for these two mistakes will be very different we use a list
    477 for each and mix them.
    478 
    479 The sound-folding is slow and people that know the language won't make the
    480 second kind of mistakes.  Therefore 'spellsuggest' can be set to select the
    481 preferred method for scoring the suggestions.
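
For example, to mix both lists as described above: >
	" Use "double" scoring; expect suggestions to take longer.
	set spellsuggest=double
<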
    482 
    483 ==============================================================================
    484 3. Generating a spell file				*spell-mkspell*
    485 
    486 Vim uses a binary file format for spelling.  This greatly speeds up loading
    487 the word list and keeps it small.
    488 					    *.aff* *.dic* *Myspell*
    489 You can create a Vim spell file from the .aff and .dic files that Myspell
    490 uses.  Myspell is used by OpenOffice.org and Mozilla.  The OpenOffice .oxt
    491 files are zip files which contain the .aff and .dic files.  You should be able
    492 to find them here:
    493 https://extensions.openoffice.org/en/search@f%5B0%5D%3Dfield_project_tags%253A311.html
    494 The older, OpenOffice 2 files may be used if this doesn't work:
    495 http://wiki.services.openoffice.org/wiki/Dictionaries
    496 You can also use a plain word list.  The results are the same, the choice
    497 depends on what word lists you can find.
    498 
    499 Make sure your current locale is set properly, otherwise Vim doesn't know what
    500 characters are upper/lower case letters.  If the locale isn't available (e.g.,
    501 when using an MS-Windows codepage on Unix) add tables to the .aff file
    502 |spell-affix-chars|.  If the .aff file doesn't define a table then the word
    503 table of the currently active spelling is used.  If spelling is not active
    504 then Vim will try to guess.
    505 
    506 						*:mksp* *:mkspell*
    507 :mksp[ell][!] [-ascii] {outname} {inname} ...
    508 		Generate a Vim spell file from word lists.  Example: >
    509 	:mkspell /tmp/nl nl_NL.words
    510 <								*E751*
    511 		When {outname} ends in ".spl" it is used as the output
    512 		file name.  Otherwise it should be a language name,
    513 		such as "en", without the region name.  The file
    514 		written will be "{outname}.{encoding}.spl", where
    515 		{encoding} is the value of the 'encoding' option.
    516 
    517 		When the output file already exists [!] must be used
    518 		to overwrite it.
    519 
    520 		When the [-ascii] argument is present, words with
    521 		non-ascii characters are skipped.  The resulting file
    522 		ends in "ascii.spl".
    523 
    524 		The input can be the Myspell format files {inname}.aff
    525 		and {inname}.dic.  If {inname}.aff does not exist then
    526 		{inname} is used as the file name of a plain word
    527 		list.
    528 
    529 		Multiple {inname} arguments can be given to combine
    530 		regions into one Vim spell file.  Example: >
    531 	:mkspell ~/.config/nvim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
    532 <			This combines the English word lists for US, CA and AU
    533 		into one en.spl file.
    534 		Up to eight regions can be combined. *E754* *E755*
    535 		The REP and SAL items of the first .aff file where
    536 		they appear are used. |spell-REP| |spell-SAL|
    537 							*E845*
    538 		This command uses a lot of memory, required to find
    539 		the optimal word tree (Polish, Italian and Hungarian
    540 		require several hundred Mbyte).  The final result will
    541 		be much smaller, because compression is used.  To
    542 		avoid running out of memory compression will be done
    543 		now and then.  This can be tuned with the 'mkspellmem'
    544 		option.
    545 
    546 		After the spell file was written and it was being used
    547 		in a buffer it will be reloaded automatically.
    548 
    549 :mksp[ell] [-ascii] {name}.{enc}.add
    550 		Like ":mkspell" above, using {name}.{enc}.add as the
    551 		input file and producing an output file in the same
    552 		directory that has ".spl" appended.
    553 
    554 :mksp[ell] [-ascii] {name}
    555 		Like ":mkspell" above, using {name} as the input file
    556 		and producing an output file in the same directory
    557 		that has ".{enc}.spl" appended.
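
As a sketch of the ".add" form, the following regenerates the ".spl" file for
every ".add" file in one directory, for example after editing them by hand.
The function name is hypothetical, and the files are assumed to live in the
"spell" directory below stdpath('config'): >
	function! ExampleRebuildAddFiles() abort
	  for add_file in glob(stdpath('config') .. '/spell/*.add', 0, 1)
	    " "!" overwrites the existing {name}.{enc}.add.spl file.
	    execute 'mkspell! ' .. fnameescape(add_file)
	  endfor
	endfunction
<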
    558 
    559 Vim will report the number of duplicate words.  This might be a mistake in the
    560 list of words.  But sometimes it is used to have different prefixes and
    561 suffixes for the same basic word to avoid them combining (e.g. Czech uses
    562 this).  If you want Vim to report all duplicate words set the 'verbose'
    563 option.
    564 
    565 Since you might want to change a Myspell word list for use with Vim the
    566 following procedure is recommended:
    567 
    568 1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
    569 2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
    570 3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
    571   words, define word characters with FOL/LOW/UPP, etc.  The distributed
    572   "*.diff" files can be used.
    573 4. Start Vim with the right locale and use |:mkspell| to generate the Vim
    574   spell file.
    575 5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
    576   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
    577   wrote it somewhere else.
    578 
    579 When the Myspell files are updated you can merge the differences:
    580 1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
    581 2. Use |diff-mode| to see what changed: >
    582 nvim -d xx_YY.orig.dic xx_YY.new.dic
    583 3. Take over the changes you like in xx_YY.dic.
    584   You may also need to change xx_YY.aff.
    585 4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.
    586 
    587 
    588 SPELL FILE VERSIONS					*E770* *E771* *E772*
    589 
    590 Spell checking is a relatively new feature in Vim, thus it's possible that the
    591 .spl file format will be changed to support more languages.  Vim will check
    592 the validity of the spell file and report anything wrong.
    593 
    594 E771: Old spell file, needs to be updated ~
    595 This spell file is older than your Vim.  You need to update the .spl file.
    596 
    597 E772: Spell file is for newer version of Vim ~
    598 This means the spell file was made for a later version of Vim.  You need to
    599 update Vim.
    600 
    601 E770: Unsupported section in spell file ~
    602 This means the spell file was made for a later version of Vim and contains a
    603 section that is required for the spell file to work.  In this case it's
    604 probably a good idea to upgrade your Vim.
    605 
    606 
    607 SPELL FILE DUMP
    608 
    609 If for some reason you want to check what words are supported by the currently
    610 used spelling files, use this command:
    611 
    612 						*:spelldump* *:spelld*
    613 :spelld[ump]		Open a new window and fill it with all currently valid
    614 		words.  Compound words are not included.
    615 		Note: For some languages the result may be enormous,
    616 		causing Vim to run out of memory.
    617 
    618 :spelld[ump]!		Like ":spelldump" and include the word count.  This is
    619 		the number of times the word was found while
    620 		updating the screen.  Words that are in COMMON items
    621 		get a starting count of 10.
    622 
    623 The dump uses the word list format |spell-wordlist-format|.  You should be
    624 able to read it with ":mkspell" to generate one .spl file that includes all
    625 the words.
    626 
    627 When all entries to 'spelllang' use the same regions or no regions at all then
    628 the region information is included in the dumped words.  Otherwise only words
    629 for the current region are included and no "/regions" line is generated.
    630 
    631 Comment lines with the name of the .spl file are used as a header above the
    632 words that were generated from that .spl file.
    633 
    634 
    635 SPELL FILE MISSING				    *spell-SpellFileMissing*
    636 
    637 If a spell file is missing, the user is asked whether to download it. See
    638 |spellfile.lua|.
    639 
    640 						*E797*
    641 Note that the SpellFileMissing autocommand must not change or destroy the
    642 buffer the user was editing.
    643 
    644 ==============================================================================
    645 4. Spell file format					*spell-file-format*
    646 
    647 This is the format of the files that are used by the person who creates and
    648 maintains a word list.
    649 
    650 Note that we avoid the word "dictionary" here.  That is because the goal of
    651 spell checking differs from writing a dictionary (as in the book).  For
    652 spelling we need a list of words that are OK, thus should not be highlighted.
    653 Person and company names will not appear in a dictionary, but do appear in a
    654 word list.  And some old words are rarely used while they are common
    655 misspellings.  These do appear in a dictionary but not in a word list.
    656 
    657 There are two formats: A straight list of words and a list using affix
    658 compression.  The files with affix compression are used by Myspell (Mozilla
    659 and OpenOffice.org).  This requires two files, one with .aff and one with .dic
    660 extension.
    661 
    662 
    663 FORMAT OF STRAIGHT WORD LIST				*spell-wordlist-format*
    664 
    665 The words must appear one per line.  That is all that is required.
    666 
    667 Additionally the following items are recognized:
    668 
    669 - Empty and blank lines are ignored.
    670 
    671 # comment ~
    672 - Lines starting with a # are ignored (comment lines).
    673 
    674 /encoding=utf-8 ~
    675 - A line starting with "/encoding=", before any word, specifies the encoding
    676  of the file.  After the '=' comes an encoding name.  This tells Vim
    677  to set up conversion from the specified encoding to 'encoding'.  Thus you can
    678  use one word list for several target encodings.
    679 
    680 /regions=usca ~
    681 - A line starting with "/regions=" specifies the region names that are
    682  supported.  Each region name must be two ASCII letters.  The first one is
    683  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
    684  In an addition word list the region names should be equal to the main word
    685  list!
    686 
    687 - Other lines starting with '/' are reserved for future use.  The ones that
    688  are not recognized are ignored.  You do get a warning message, so that you
    689  know something won't work.
    690 
    691 - A "/" may follow the word with the following items:
    692    =		Case must match exactly.
    693    ?		Rare word.
    694    !		Bad (wrong) word.
    695    1 to 9	A region in which the word is valid.  If no regions are
    696 	specified the word is valid in all regions.
    697 
    698 Example:
    699 
    700 # This is an example word list		comment
    701 /encoding=latin1			encoding of the file
    702 /regions=uscagb				regions "us", "ca" and "gb"
    703 example					word for all regions
    704 blah/12					word for regions "us" and "ca"
    705 vim/!					bad word
    706 Campbell/?3				rare word in region 3 "gb"
    707 's mornings/=				keep-case word
    708 
    709 Note that when "/=" is used the same word with all upper-case letters is not
    710 accepted.  This is different from a word with mixed case that is automatically
    711 marked as keep-case, those words may appear in all upper-case letters.
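
The word list rules above can be sketched as a small Python parser.  This is
only an illustration, not how Vim reads the file.  The name parse_word_list
and the layout of the returned entries are invented for this example, and the
sketch assumes that the flags follow the first slash on the line.

```python
def parse_word_list(text):
    """Parse straight word-list text into (encoding, regions, entries)."""
    encoding = None
    regions = []
    entries = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue  # empty, blank and comment lines are ignored
        if line.startswith("/encoding="):
            encoding = line[len("/encoding="):]
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):]
            # Each region name is two ASCII letters; the first is region 1.
            regions = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue  # reserved for future use; Vim warns, this sketch skips
        word, _, flags = line.partition("/")
        entries.append({
            "word": word,
            "keepcase": "=" in flags,
            "rare": "?" in flags,
            "bad": "!" in flags,
            # Region numbers 1 to 9; an empty list means "all regions".
            "regions": [int(c) for c in flags if c in "123456789"],
        })
    return encoding, regions, entries


SAMPLE = """\
# This is an example word list
/encoding=latin1
/regions=uscagb
example
blah/12
vim/!
Campbell/?3
's mornings/=
"""

encoding, regions, entries = parse_word_list(SAMPLE)
```

With this sample the region numbers of "blah" are 1 and 2, which map to "us"
and "ca" through the list taken from the "/regions=" line.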
    712 
    713 
    714 FORMAT WITH .AFF AND .DIC FILES				*aff-dic-format*
    715 
    716 There are two files: the basic word list and an affix file.  The affix file
    717 specifies settings for the language and can contain affixes.  The affixes are
    718 used to modify the basic words to get the full word list.  This significantly
    719 reduces the number of words, especially for a language like Polish.  This is
    720 called affix compression.
    721 
    722 The basic word list and the affix file are combined with the ":mkspell"
    723 command, resulting in a binary spell file.  All the preprocessing has been
    724 done, thus this file loads fast.  The binary spell file format is described in
    725 the source code (src/spell.c).  But only developers need to know about it.
    726 
    727 The preprocessing also allows us to take the Myspell language files and modify
    728 them before the Vim word list is made.  The tools for this can be found in the
    729 "src/spell" directory.
    730 
    731 The format for the affix and word list files is based on what Myspell uses
    732 (the spell checker of Mozilla and OpenOffice.org).  A description can be found
    733 here:
    734 https://lingucomponent.openoffice.org/affix.readme
    735 Note that affixes are case sensitive, this isn't obvious from the description.
    736 
    737 Vim supports quite a few extras.  They are described below |spell-affix-vim|.
    738 Attempts have been made to keep this compatible with other spell checkers, so
    739 that the same files can often be used.  One other project that offers more
    740 than Myspell is Hunspell ( https://hunspell.github.io ).
    741 
    742 
    743 WORD LIST FORMAT				*spell-dic-format*
    744 
    745 A short example, with line numbers:
    746 
    747 1	1234 ~
    748 2	aan ~
    749 3	Als ~
    750 4	Etten-Leur ~
    751 5	et al. ~
    752 6	's-Gravenhage ~
    753 7	's-Gravenhaags ~
    754 8	# word that differs between regions ~
    755 9	kado/1 ~
    756 10	cadeau/2 ~
    757 11	TCP,IP ~
    758 12	/the S affix may add a 's' ~
    759 13	bedel/S ~
    760 
    761 The first line contains the number of words.  Vim ignores it, but you do get
    762 an error message if it's not there.  *E760*
    763 
    764 What follows is one word per line.  White space at the end of the line is
    765 ignored, all other white space matters.  The encoding is specified in the
    766 affix file |spell-SET|.
    767 
    768 Comment lines start with '#' or '/'.  See the example lines 8 and 12.  Note
    769 that putting a comment after a word is NOT allowed:
    770 
    771 	someword   # comment that causes an error! ~
    772 
    773 After the word there is an optional slash and flags.  Most of these flags are
    774 letters that indicate the affixes that can be used with this word.  These are
    775 specified with SFX and PFX lines in the .aff file, see |spell-SFX| and
    776 |spell-PFX|.  Vim allows using other flag types with the FLAG item in the
    777 affix file |spell-FLAG|.
    778 
    779 When the word only has lower-case letters it will also match with the word
    780 starting with an upper-case letter.
    781 
    782 When the word includes an upper-case letter, this means the upper-case letter
    783 is required at this position.  The same word with a lower-case letter at this
    784 position will not match.  When some of the other letters are upper-case it
    785 will not match either.
    786 
    787 The word with all upper-case characters will always be OK.  Examples:
    788 
    789 word list	matches			does not match ~
    790 als		als Als ALS		ALs AlS aLs aLS
    791 Als		Als  ALS		als ALs AlS aLs aLS
    792 ALS		ALS			als Als ALs AlS aLs aLS
    793 AlS		AlS ALS			als Als ALs aLs aLS
    794 
    795 The KEEPCASE affix ID can be used to specifically match a word with identical
    796 case only, see below |spell-KEEPCASE|.
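
The case rules in the table above can be written down as a short Python
function.  This is a sketch of the documented behavior, not Vim's code.  The
name word_matches and the keepcase parameter are made up for the example, and
Python's own lower() and upper() stand in for Vim's case tables.

```python
def word_matches(entry, text, keepcase=False):
    """Return True when `text` is an accepted spelling of word-list `entry`."""
    if keepcase:
        return text == entry          # case must match exactly
    if text == entry:
        return True
    if text.lower() != entry.lower():
        return False                  # different letters, not just case
    if text == entry.upper():
        return True                   # all upper-case is always OK
    if entry == entry.lower():
        # An all lower-case entry also matches with a leading capital.
        return text == entry[:1].upper() + entry[1:]
    return False
```

For example word_matches("Als", "ALS") is true, while
word_matches("Als", "als") is false because the capital is required.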
    797 
    798 Note: in line 5 to 7 non-word characters are used.  You can include any
    799 character in a word.  When checking the text a word still only matches when it
    800 appears with a non-word character before and after it.  For Myspell a word
    801 starting with a non-word character probably won't work.
    802 
    803 In line 11 the word "TCP/IP" is defined.  Since the slash has a special
    804 meaning the comma is used instead.  This is defined with the SLASH item in the
    805 affix file, see |spell-SLASH|.  Note that without this SLASH item the word
    806 will be "TCP,IP".
    807 
    808 
    809 AFFIX FILE FORMAT			*spell-aff-format* *spell-affix-vim*
    810 
    811 						*spell-affix-comment*
    812 Comment lines in the .aff file start with a '#':
    813 
    814 # comment line ~
    815 
    816 Items with a fixed number of arguments can be followed by a comment.  But only
    817 if none of the arguments can contain white space.  The comment must start with
    818 a "#" character.  Example:
    819 
    820 KEEPCASE =  # fix case for words with this flag ~
    821 
    822 
    823 ENCODING							*spell-SET*
    824 
    825 The affix file can be in any encoding that is supported by "iconv".  However,
    826 in some cases the current locale should also be set properly at the time
    827 |:mkspell| is invoked.  Adding FOL/LOW/UPP lines removes this requirement
    828 |spell-FOL|.
    829 
    830 The encoding should be specified before anything where the encoding matters.
    831 The encoding applies both to the affix file and the dictionary file.  It is
    832 done with a SET line:
    833 
    834 SET utf-8 ~
    835 
    836 The encoding can be different from the value of the 'encoding' option at the
    837 time ":mkspell" is used.  Vim will then convert everything to 'encoding' and
    838 generate a spell file for 'encoding'.  If some of the used characters do not
    839 fit in 'encoding' you will get an error message.
    840 						*spell-affix-mbyte*
    841 When using a multibyte encoding it's possible to use more different affix
    842 flags.  But Myspell doesn't support that, thus you may not want to use it
    843 anyway.  For compatibility use an 8-bit encoding.
    844 
    845 
    846 INFORMATION
    847 
    848 These entries in the affix file can be used to add information to the spell
    849 file.  There are no restrictions on the format, but they should be in the
    850 right encoding.
    851 
    852 			*spell-NAME* *spell-VERSION* *spell-HOME*
    853 			*spell-AUTHOR* *spell-EMAIL* *spell-COPYRIGHT*
    854 NAME		Name of the language
    855 VERSION		1.0.1  with fixes
    856 HOME		https://www.example.com
    857 AUTHOR		John Doe
    858 EMAIL		john AT Doe DOT net
    859 COPYRIGHT	LGPL
    860 
    861 These fields are put in the .spl file as-is.  The |:spellinfo| command can be
    862 used to view the info.
    863 
    864 						*:spellinfo* *:spelli*
    865 :spelli[nfo]		Display the information for the spell file(s) used for
    866 		the current buffer.
    867 
    868 
    869 CHARACTER TABLES
    870 						*spell-affix-chars*
    871 When using an 8-bit encoding the affix file should define what characters are
    872 word characters.  This is because the system where ":mkspell" is used may not
    873 support a locale with this encoding and isalpha() won't work.  For example
    874 when using "cp1250" on Unix.
    875 					*E761* *E762* *spell-FOL*
    876 					*spell-LOW* *spell-UPP*
    877 Three lines in the affix file are needed.  Simplistic example:
    878 
    879 FOL  áëñ ~
    880 LOW  áëñ ~
    881 UPP  ÁËÑ ~
    882 
    883 All three lines must have exactly the same number of characters.
    884 
    885 The "FOL" line specifies the case-folded characters.  These are used to
    886 compare words while ignoring case.  For most encodings this is identical to
    887 the lower case line.
    888 
    889 The "LOW" line specifies the characters in lower-case.  Mostly it's equal to
    890 the "FOL" line.
    891 
    892 The "UPP" line specifies the characters with upper-case.  That is, a character
    893 is upper-case where it's different from the character at the same position in
    894 "FOL".
    895 
    896 An exception is made for the German sharp s ß.  The upper-case version is
    897 "SS".  In the FOL/LOW/UPP lines it should be included, so that it's recognized
    898 as a word character, but use the ß character in all three.
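
As a rough illustration of how the three lines relate, the Python sketch below
builds a case-folding map and the set of upper-case characters from them.  It
follows the description above only; the function name build_case_tables and
the returned data layout are assumptions, and Vim stores these tables
differently.

```python
def build_case_tables(fol, low, upp):
    """Return (fold, upper) built from the FOL, LOW and UPP strings.

    fold maps every listed character to its case-folded form; its keys are
    also the extra word characters.  upper holds the upper-case characters.
    """
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError("FOL, LOW and UPP must have the same number of characters")
    fold = {}
    upper = set()
    for f, l, u in zip(fol, low, upp):
        fold[l] = f
        fold[u] = f
        if u != f:
            # A character is upper-case when it differs from the FOL
            # character at the same position.
            upper.add(u)
    return fold, upper


# The sharp s is listed unchanged in all three lines, so it becomes a word
# character without being treated as upper-case.
fold, upper = build_case_tables("áëñß", "áëñß", "ÁËÑß")
```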
    899 
    900 ASCII characters should be omitted, Vim always handles these in the same way.
    901 When the encoding is UTF-8 no word characters need to be specified.
    902 
    903 						*E763*
    904 Vim allows you to use spell checking for several languages in the same file.
    905 You can list them in the 'spelllang' option.  As a consequence all spell files
    906 for the same encoding must use the same word characters, otherwise they can't
    907 be combined without errors.
    908 
    909 If you get an E763 warning that the word tables differ you need to update your
    910 ".spl" spell files.  If you downloaded the files, get the latest version of
    911 all spell files you use.  If you are only using one, e.g., German, then also
    912 download the recent English spell files.  Otherwise generate the .spl file
    913 again with |:mkspell|.  If you still get errors check the FOL, LOW and UPP
    914 lines in the used .aff files.
    915 
    916 The XX.ascii.spl spell file generated with the "-ascii" argument will not
    917 contain the table with characters, so that it can be combined with spell files
    918 for any encoding.  The .add.spl files also do not contain the table.
    919 
    920 
    921 MID-WORD CHARACTERS
    922 						*spell-midword*
    923 Some characters are only to be considered word characters if they are used in
    924 between two ordinary word characters.  An example is the single quote: It is
    925 often used to put text in quotes, thus it can't be recognized as a word
    926 character, but when it appears in between word characters it must be part of
    927 the word.  This is needed to detect a spelling error such as they'are.  That
    928 should be they're, but since "they" and "are" are words themselves that would
    929 go unnoticed.
    930 
    931 These characters are defined with MIDWORD in the .aff file.  Example:
    932 
    933 MIDWORD	'- ~
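
The effect of MIDWORD on finding words can be sketched in Python as below.
This is an illustration under assumptions: split_words is a made-up name, and
str.isalpha() is used as a stand-in for "ordinary word character".

```python
def split_words(text, midword="'-"):
    """Split text into words; MIDWORD characters count only between letters."""
    words = []
    current = []
    for i, ch in enumerate(text):
        is_word = ch.isalpha()
        if not is_word and ch in midword:
            before = i > 0 and text[i - 1].isalpha()
            after = i + 1 < len(text) and text[i + 1].isalpha()
            is_word = before and after
        if is_word:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

With this, "they'are" stays one word and can be flagged as bad, while the
quotes around a quoted phrase are not made part of any word.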
    934 
    935 
    936 FLAG TYPES						*spell-FLAG*
    937 
    938 Flags are used to specify the affixes that can be used with a word and for
    939 other properties of the word.  Normally single-character flags are used.  This
    940 limits the number of possible flags, especially for 8-bit encodings.  The FLAG
    941 item can be used if more affixes are to be used.  Possible values:
    942 
    943 FLAG long	use two-character flags
    944 FLAG num	use numbers, from 1 up to 65000
    945 FLAG caplong	use one-character flags without A-Z and two-character
    946 		flags that start with A-Z
    947 
    948 With "FLAG num" the numbers in a list of affixes need to be separated with a
    949 comma: "234,2143,1435".  This method is inefficient, but useful if the file is
    950 generated with a program.
    951 
    952 When using "caplong" the two-character flags all start with a capital: "Aa",
    953 "B1", "BB", etc.  This is useful to use one-character flags for the most
    954 common items and two-character flags for uncommon items.
    955 
    956 Note: When using utf-8 only characters up to 65000 may be used for flags.
    957 
    958 Note: even when using "num" or "long" the number of flags available to
    959 compounding and prefixes is limited to about 250.
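
How the flags after a word are split up for each FLAG value can be sketched in
Python as follows.  The function name split_flags is invented, and "char" is
used here as a label for the default single-character flags; it is not a
value accepted by the FLAG item.

```python
def split_flags(flags, flag_type="char"):
    """Split the flags after a word according to the FLAG type."""
    if flag_type == "num":
        # Numbers separated by commas: "234,2143,1435"
        return [int(n) for n in flags.split(",") if n]
    if flag_type == "long":
        # Always two characters per flag.
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if flag_type == "caplong":
        # Two characters when the flag starts with A-Z, otherwise one.
        result = []
        i = 0
        while i < len(flags):
            size = 2 if "A" <= flags[i] <= "Z" else 1
            result.append(flags[i:i + size])
            i += size
        return result
    return list(flags)  # default: one character per flag
```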
    960 
    961 
    962 AFFIXES						*spell-PFX* *spell-SFX*
    963 
    964 The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
    965 documentation or the Aspell manual:
    966 http://aspell.net/man-html/Affix-Compression.html).
    967 
    968 Summary:
    969 SFX L Y 2 ~
    970 SFX L 0 re [^x] ~
    971 SFX L 0 ro x ~
    972 
    973 The first line is a header and has four fields:
    974 SFX {flag} {combine} {count}
    975 
    976 {flag}		The name used for the suffix.  Mostly it's a single letter,
    977 	but other characters can be used, see |spell-FLAG|.
    978 
    979 {combine}	Can be 'Y' or 'N'.  When 'Y' then the word plus suffix can
    980 	also have a prefix.  When 'N' then a prefix is not allowed.
    981 
    982 {count}		The number of lines following.  If this is wrong you will get
    983 	an error message.
    984 
    985 For PFX the fields are exactly the same.
    986 
    987 The basic format for the following lines is:
    988 SFX {flag} {strip} {add} {condition} {extra}
    989 
    990 {flag}		Must be the same as the {flag} used in the first line.
    991 
    992 {strip}		Characters removed from the basic word.  There is no check if
    993 	the characters are actually there, only the length is used (in
    994 	bytes).  This should match the {condition}, otherwise strange
    995 	things may happen.  If the {strip} length is equal to or
    996 	longer than the basic word the suffix won't be used.
    997 	When {strip} is 0 (zero) then nothing is stripped.
    998 
    999 {add}		Characters added to the basic word, after removing {strip}.
   1000 	Optionally there is a '/' followed by flags.  The flags apply
   1001 	to the word plus affix.  See |spell-affix-flags|
   1002 
   1003 {condition}	A simplistic pattern.  Only when this matches with a basic
   1004 	word will the suffix be used for that word.  This is normally
   1005 	for using one suffix letter with different {add} and {strip}
   1006 	fields for words with different endings.
   1007 	When {condition} is a . (dot) there is no condition.
   1008 	The pattern may contain:
   1009 	- Literal characters.
   1010 	- A set of characters in []. [abc] matches a, b or c.
   1011 	  A dash is allowed for a range [a-c], but this is
   1012 	  Vim-specific.
   1013 	- A set of characters that starts with a ^, meaning the
   1014 	  complement of the specified characters. [^abc] matches any
   1015 	  character but a, b and c.
   1016 
   1017 {extra}		Optional extra text:
   1018 	    # comment		Comment is ignored
   1019 	    -			Hunspell uses this, ignored
   1020 
   1021 For PFX the fields are the same, but the {strip}, {add} and {condition} apply
   1022 to the start of the word.
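
To make the {strip}, {add} and {condition} fields concrete, here is a Python
sketch that applies one suffix line to a basic word.  It is not Vim's
implementation.  The name apply_suffix is invented, the {condition} is handed
to Python's "re" module as-is (which only works for the simple literal and []
forms listed above), the strip length is counted in characters rather than
bytes, and flags after a '/' in {add} are dropped.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Return the word with the suffix applied, or None if it does not apply."""
    if condition != "." and not re.search("(?:" + condition + ")$", word):
        return None  # the condition does not match the end of the word
    strip_len = 0 if strip == "0" else len(strip)
    if strip_len >= len(word):
        return None  # {strip} is as long as the word or longer: not used
    add_text = add.partition("/")[0]  # ignore flags on the affix
    return word[:len(word) - strip_len] + add_text
```

Usage with the line "SFX F 0 in [^i]n" from the example further down:
apply_suffix("Spion", "0", "in", "[^i]n") gives "Spionin", while the same
call on "Spionin" gives None because the condition does not match.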
   1023 
   1024 Note: Myspell ignores any extra text after the relevant info.  Vim requires
   1025 this text to start with a "#" so that mistakes don't go unnoticed.  Example:
   1026 
   1027 SFX F 0 in   [^i]n      # Spion > Spionin  ~
   1028 SFX F 0 nen  in		# Bauerin > Bauerinnen ~
   1029 
   1030 However, to avoid lots of errors in affix files written for Myspell, you can
   1031 add the IGNOREEXTRA flag.
   1032 
   1033 Apparently Myspell allows an affix name to appear more than once.  Since this
   1034 might also be a mistake, Vim checks for an extra "S".  The affix files for
   1035 Myspell that use this feature apparently have this flag.  Example:
   1036 
   1037 SFX a Y 1 S ~
   1038 SFX a 0 an . ~
   1039 
   1040 SFX a Y 2 S ~
   1041 SFX a 0 en . ~
   1042 SFX a 0 on . ~
   1043 
   1044 
   1045 AFFIX FLAGS						*spell-affix-flags*
   1046 
   1047 This is a feature that comes from Hunspell: The affix may specify flags.  This
   1048 works similarly to flags specified on a basic word.  The flags apply to the
   1049 basic word plus the affix (but there are restrictions).  Example:
   1050 
   1051 SFX S Y 1 ~
   1052 SFX S 0 s . ~
   1053 
   1054 SFX A Y 1 ~
   1055 SFX A 0 able/S . ~
   1056 
   1057 When the dictionary file contains "drink/AS" then these words are possible:
   1058 
   1059 drink
   1060 drinks		uses S suffix
   1061 drinkable	uses A suffix
   1062 drinkables	uses A suffix and then S suffix
   1063 
   1064 Generally the flags of the suffix are added to the flags of the basic word,
   1065 both are used for the word plus suffix.  But the flags of the basic word are
   1066 only used once for affixes, except that both one prefix and one suffix can be
   1067 used when both support combining.
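
The "drink/AS" example can be reproduced with a small Python sketch.  It only
handles suffixes without {strip} or {condition}, and the SUFFIXES table and
the expand function are made up for this illustration.

```python
# flag -> list of (text to add, flags carried by the affix)
SUFFIXES = {
    "S": [("s", "")],
    "A": [("able", "S")],
}


def expand(word, flags):
    """Return all words made from a basic word and its suffix flags."""
    words = {word}
    for flag in flags:
        for add, affix_flags in SUFFIXES.get(flag, []):
            derived = word + add
            words.add(derived)
            # Flags on the affix allow one more suffix, but no further level.
            for second in affix_flags:
                for add2, _ in SUFFIXES.get(second, []):
                    words.add(derived + add2)
    return words
```

Here expand("drink", "AS") yields drink, drinks, drinkable and drinkables.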
   1068 
   1069 Specifically, the affix flags can be used for:
   1070 - Suffixes on suffixes, as in the example above.  This works once, thus you
   1071  can have two suffixes on a word (plus one prefix).
   1072 - Making the word with the affix rare, by using the |spell-RARE| flag.
   1073 - Exclude the word with the affix from compounding, by using the
   1074  |spell-COMPOUNDFORBIDFLAG| flag.
   1075 - Allow the word with the affix to be part of a compound word on the side of
   1076  the affix with the |spell-COMPOUNDPERMITFLAG|.
   1077 - Use the NEEDCOMPOUND flag: word plus affix can only be used as part of a
   1078  compound word. |spell-NEEDCOMPOUND|
   1079 - Compound flags: word plus affix can be part of a compound word at the end,
   1080  middle, start, etc.  The flags are combined with the flags of the basic
   1081  word.  |spell-compound|
   1082 - NEEDAFFIX: another affix is needed to make a valid word.
   1083 - CIRCUMFIX, as explained just below.
   1084 
   1085 
   1086 IGNOREEXTRA						*spell-IGNOREEXTRA*
   1087 
   1088 Normally Vim gives an error for an extra field that does not start with '#'.
   1089 This avoids errors going unnoticed.  However, some files created for Myspell
   1090 or Hunspell may contain many entries with an extra field.  Use the IGNOREEXTRA
   1091 flag to avoid lots of errors.
   1092 
   1093 
   1094 CIRCUMFIX						*spell-CIRCUMFIX*
   1095 
   1096 The CIRCUMFIX flag means a prefix and suffix must be added at the same time.
   1097 If a prefix has the CIRCUMFIX flag then only suffixes with the CIRCUMFIX flag
   1098 can be added, and the other way around.
   1099 An alternative is to only specify the suffix, and give that suffix two flags:
   1100 the required prefix and the NEEDAFFIX flag.  |spell-NEEDAFFIX|
   1101 
   1102 
   1103 PFXPOSTPONE						*spell-PFXPOSTPONE*
   1104 
   1105 When an affix file has very many prefixes that apply to many words it's not
   1106 possible to build the whole word list in memory.  This applies to Hebrew (a
   1107 list with all words is over a Gbyte).  In that case applying prefixes must be
   1108 postponed.  This makes spell checking slower.  It is indicated by this keyword
   1109 in the .aff file:
   1110 
   1111 PFXPOSTPONE ~
   1112 
   1113 Only prefixes without a chop string and without flags can be postponed.
   1114 Prefixes with a chop string or with flags will still be included in the word
   1115 list.  An exception is a one-character chop string that is equal to the last
   1116 character of the added string, but in lower case.  Thus when the chop string
   1117 is used to allow the following word to start with an upper case letter.
   1118 
   1119 
   1120 WORDS WITH A SLASH					*spell-SLASH*
   1121 
   1122 The slash is used in the .dic file to separate the basic word from the affix
   1123 letters and other flags.  Unfortunately, this means you cannot use a slash in
   1124 a word.  Thus "TCP/IP" is not a word but "TCP" with the flags "IP".  To
   1125 include a slash in the word put a backslash before it: "TCP\/IP".  In the rare
   1126 case you want to use a backslash inside a word you need to use two
   1127 backslashes.
   1128 Any other use of the backslash is reserved for future expansion.
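
A Python sketch of splitting a .dic line into the word and its flags with the
backslash rule described above.  The name split_dic_line is invented, and
since other uses of the backslash are reserved, this sketch simply keeps such
a backslash as it is.

```python
def split_dic_line(line):
    """Split a .dic line into (word, flags), honoring \\/ and \\\\ escapes."""
    word = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "/\\":
            word.append(line[i + 1])  # escaped slash or backslash
            i += 2
            continue
        if ch == "/":
            return "".join(word), line[i + 1:]  # first unescaped slash
        word.append(ch)
        i += 1
    return "".join(word), ""
```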
   1129 
   1130 
   1131 KEEP-CASE WORDS						*spell-KEEPCASE*
   1132 
   1133 In the affix file a KEEPCASE line can be used to define the affix name used
   1134 for keep-case words.  Example:
   1135 
   1136 KEEPCASE = ~
   1137 
   1138 This flag is not supported by Myspell.  It has the meaning that case matters.
   1139 This can be used if the word does not have the first letter in upper case at
   1140 the start of a sentence.  Example:
   1141 
   1142    word list	    matches		    does not match ~
   1143    's morgens/=    's morgens		    'S morgens 's Morgens 'S MORGENS
   1144    's Morgens	    's Morgens 'S MORGENS   'S morgens 's morgens
   1145 
   1146 The flag can also be used to avoid that the word matches when it is in all
   1147 upper-case letters.
   1148 
   1149 
   1150 RARE WORDS						*spell-RARE*
   1151 
   1152 In the affix file a RARE line can be used to define the affix name used for
   1153 rare words.  Example:
   1154 
   1155 RARE ? ~
   1156 
   1157 Rare words are highlighted differently from bad words.  This is to be used for
   1158 words that are correct for the language, but are hardly ever used and could be
   1159 a typing mistake anyway.
   1160 
   1161 This flag can also be used on an affix, so that a basic word is not rare but
   1162 the basic word plus affix is rare |spell-affix-flags|.  However, if the word
   1163 also appears as a good word in another way (e.g., in another region) it won't
   1164 be marked as rare.
   1165 
   1166 
   1167 BAD WORDS						*spell-BAD*
   1168 
   1169 In the affix file a BAD line can be used to define the affix name used for
   1170 bad words.  Example:
   1171 
   1172 BAD ! ~
   1173 
   1174 This can be used to exclude words that would otherwise be good.  For example
   1175 "the the" in the .dic file:
   1176 
   1177 the the/! ~
   1178 
   1179 Once a word has been marked as bad it won't be undone by encountering the same
   1180 word as good.
   1181 
   1182 The flag also applies to the word with affixes, thus this can be used to mark
   1183 a whole bunch of related words as bad.
   1184 
   1185 						*spell-FORBIDDENWORD*
   1186 FORBIDDENWORD can be used just like BAD, for compatibility with Hunspell.
   1187 
   1188 						*spell-NEEDAFFIX*
   1189 The NEEDAFFIX flag is used to require that a word is used with an affix.  The
   1190 word itself is not a good word (unless there is an empty affix).  Example:
   1191 
   1192 NEEDAFFIX + ~
   1193 
   1194 
   1195 COMPOUND WORDS						*spell-compound*
   1196 
   1197 A compound word is a longer word made by concatenating words that appear in
   1198 the .dic file.  To specify which words may be concatenated a character is
   1199 used.  This character is put in the list of affixes after the word.  We will
   1200 call this character a flag here.  Obviously these flags must be different from
   1201 any affix IDs used.
   1202 
   1203 						*spell-COMPOUNDFLAG*
   1204 The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.  All
   1205 words with this flag combine in any order.  This means there is no control
   1206 over which word comes first.  Example:
   1207 COMPOUNDFLAG c ~
   1208 
   1209 						*spell-COMPOUNDRULE*
   1210 A more advanced method to specify how compound words can be formed uses
   1211 multiple items with multiple flags.  This is not compatible with Myspell 3.0.
   1212 Let's start with an example:
   1213 COMPOUNDRULE c+ ~
   1214 COMPOUNDRULE se ~
   1215 
   1216 The first line defines that words with the "c" flag can be concatenated in any
   1217 order.  The second line defines compound words that are made of one word with
   1218 the "s" flag and one word with the "e" flag.  With this dictionary:
   1219 bork/c ~
   1220 onion/s ~
   1221 soup/e ~
   1222 
   1223 You can make these words:
   1224 bork
   1225 borkbork
   1226 borkborkbork
   1227 (etc.)
   1228 onion
   1229 soup
   1230 onionsoup
   1231 
   1232 The COMPOUNDRULE item may appear multiple times.  The argument is made out of
   1233 one or more groups, where each group can be:
   1234 one flag			e.g., c
   1235 alternate flags inside []	e.g., [abc]
   1236 Optionally this may be followed by:
   1237 *	the group appears zero or more times, e.g., sm*e
   1238 +	the group appears one or more times, e.g., c+
   1239 ?	the group appears zero times or once, e.g., x?
   1240 
   1241 This is similar to the regexp pattern syntax (but not the same!).  A few
   1242 examples with the sequence of word flags they require:
   1243    COMPOUNDRULE x+	    x xx xxx etc.
   1244    COMPOUNDRULE yz	    yz
   1245    COMPOUNDRULE x+z	    xz xxz xxxz etc.
   1246    COMPOUNDRULE yx+	    yx yxx yxxx etc.
   1247    COMPOUNDRULE xy?z	    xz xyz
   1248 
   1249    COMPOUNDRULE [abc]z    az bz cz
   1250    COMPOUNDRULE [abc]+z   az aaz abaz bz baz bcbz cz caz cbaz etc.
   1251    COMPOUNDRULE a[xyz]+   ax axx axyz ay ayx ayzz az azy azxy etc.
   1252    COMPOUNDRULE sm*e	    se sme smme smmme etc.
   1253    COMPOUNDRULE s[xyz]*e  se sxe sxye sxyxe sye syze sze szye szyxe  etc.
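
The COMPOUNDRULE examples above can be tried out with the following Python
sketch.  It is a simplification and not Vim's algorithm: every word carries
exactly one single-character compound flag, each rule is translated to a
Python regular expression over the sequence of flags, and limits such as
COMPOUNDMIN are ignored.  All names are made up for the example.

```python
import re


def rule_to_regex(rule):
    """Translate a COMPOUNDRULE argument into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "[":
            end = rule.index("]", i)
            inner = "".join(re.escape(c) for c in rule[i + 1:end])
            parts.append("[" + inner + "]")
            i = end + 1
        elif ch in "*+?":
            parts.append(ch)  # repeat markers keep their meaning
            i += 1
        else:
            parts.append(re.escape(ch))  # a single flag
            i += 1
    return re.compile("".join(parts))


def splits(text, words):
    """Yield every way to cut text into consecutive words from `words`."""
    if not text:
        yield []
        return
    for i in range(1, len(text) + 1):
        if text[:i] in words:
            for rest in splits(text[i:], words):
                yield [text[:i]] + rest


def is_compound(text, words, rules):
    """True when text splits into two or more words allowed by a rule."""
    regexes = [rule_to_regex(r) for r in rules]
    for parts in splits(text, words):
        if len(parts) < 2:
            continue
        flags = "".join(words[p] for p in parts)
        if any(rx.fullmatch(flags) for rx in regexes):
            return True
    return False


# The dictionary and rules from the bork/onion/soup example.
WORDS = {"bork": "c", "onion": "s", "soup": "e"}
RULES = ["c+", "se"]
```

With these, "onionsoup" and "borkborkbork" are accepted as compounds, while
"souponion" and "borkonion" are not.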
   1254 
   1255 A specific example: Allow a compound to be made of two words and a dash:
   1256 In the .aff file:
   1257     COMPOUNDRULE sde ~
   1258     NEEDAFFIX x ~
   1259     COMPOUNDWORDMAX 3 ~
   1260     COMPOUNDMIN 1 ~
   1261 In the .dic file:
   1262     start/s ~
   1263     end/e ~
   1264     -/xd ~
   1265 
   1266 This allows for the word "start-end", but not "startend".
   1267 
   1268 An additional implied rule is that, without further flags, a word with a
   1269 prefix cannot be compounded after another word, and a word with a suffix
   1270 cannot be compounded with a following word.  Thus the affix cannot appear
   1271 on the inside of a compound word.  This can be changed with the
   1272 |spell-COMPOUNDPERMITFLAG|.
   1273 
   1274 						*spell-NEEDCOMPOUND*
   1275 The NEEDCOMPOUND flag is used to require that a word is used as part of a
   1276 compound word.  The word itself is not a good word.  Example:
   1277 
   1278 NEEDCOMPOUND & ~
   1279 
   1280 						*spell-ONLYINCOMPOUND*
   1281 The ONLYINCOMPOUND flag does exactly the same as NEEDCOMPOUND.  Supported for
   1282 compatibility with Hunspell.
   1283 
   1284 						*spell-COMPOUNDMIN*
   1285 The minimal character length of a word used for compounding is specified with
   1286 COMPOUNDMIN.  Example:
   1287 COMPOUNDMIN 5 ~
   1288 
   1289 When omitted there is no minimal length.  Obviously you could just leave out
   1290 the compound flag from short words instead, this feature is present for
   1291 compatibility with Myspell.
   1292 
   1293 						*spell-COMPOUNDWORDMAX*
   1294 The maximum number of words that can be concatenated into a compound word is
   1295 specified with COMPOUNDWORDMAX.  Example:
   1296 COMPOUNDWORDMAX 3 ~
   1297 
   1298 When omitted there is no maximum.  It applies to all compound words.
   1299 
   1300 To set a limit for words with specific flags make sure the items in
   1301 COMPOUNDRULE where they appear don't allow too many words.
   1302 
   1303 						*spell-COMPOUNDSYLMAX*
   1304 The maximum number of syllables that a compound word may contain is specified
   1305 with COMPOUNDSYLMAX.  Example:
   1306 COMPOUNDSYLMAX 6 ~
   1307 
   1308 This has no effect if there is no SYLLABLE item.  Without COMPOUNDSYLMAX there
   1309 is no limit on the number of syllables.
   1310 
   1311 If both COMPOUNDWORDMAX and COMPOUNDSYLMAX are defined, a compound word is
   1312 accepted if it fits one of the criteria, thus is either made from up to
   1313 COMPOUNDWORDMAX words or contains up to COMPOUNDSYLMAX syllables.
   1314 
   1315 					    *spell-COMPOUNDFORBIDFLAG*
   1316 The COMPOUNDFORBIDFLAG specifies a flag that can be used on an affix.  It
   1317 means that the word plus affix cannot be used in a compound word.  Example:
   1318 affix file:
   1319 	COMPOUNDFLAG c ~
   1320 	COMPOUNDFORBIDFLAG x ~
   1321 	SFX a Y 2 ~
   1322 	SFX a 0 s   . ~
   1323 	SFX a 0 ize/x . ~
   1324 dictionary:
   1325 	word/c ~
   1326 	util/ac ~
   1327 
   1328 This allows for "wordutil" and "wordutils" but not "wordutilize".
   1329 Note: this doesn't work for postponed prefixes yet.
   1330 
   1331 					    *spell-COMPOUNDPERMITFLAG*
   1332 The COMPOUNDPERMITFLAG specifies a flag that can be used on an affix.  It
   1333 means that the word plus affix can also be used in a compound word in a way
   1334 where the affix ends up halfway through the word.  Without this flag that is
   1335 not allowed.
   1336 Note: this doesn't work for postponed prefixes yet.
   1337 
   1338 					    *spell-COMPOUNDROOT*
   1339 The COMPOUNDROOT flag is used for words in the dictionary that are already a
   1340 compound.  This means it counts for two words when checking the compounding
   1341 rules.  Can also be used for an affix to count the affix as a compounding
   1342 word.
   1343 
   1344 					*spell-CHECKCOMPOUNDPATTERN*
   1345 CHECKCOMPOUNDPATTERN is used to define patterns that, when matching at the
   1346 position where two words are compounded together, forbid the compound.
   1347 For example:
   1348 CHECKCOMPOUNDPATTERN o e ~
   1349 
   1350 This forbids compounding if the first word ends in "o" and the second word
   1351 starts with "e".
   1352 
   1353 The arguments must be plain text, no patterns are actually supported, despite
   1354 the item name.  Case is always ignored.
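
As a sketch of the check described here, in Python and with made-up names:
each CHECKCOMPOUNDPATTERN item is a pair of plain strings, compared without
regard to case (str.lower() stands in for Vim's case folding).

```python
def compound_allowed(first, second, patterns):
    """False when a (end, start) pattern matches where two words are joined."""
    a = first.lower()
    b = second.lower()
    return not any(
        a.endswith(end.lower()) and b.startswith(start.lower())
        for end, start in patterns
    )


# Corresponds to the line "CHECKCOMPOUNDPATTERN o e".
PATTERNS = [("o", "e")]
```

With PATTERNS a hypothetical pair like "foto" + "eind" is rejected, while
"foto" + "album" is allowed.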
   1355 
   1356 The Hunspell feature to use three arguments and flags is not supported.
   1357 
   1358 						*spell-NOCOMPOUNDSUGS*
   1359 This item indicates that using compounding to make suggestions is not a good
   1360 idea.  Use this when compounding is used with very short or one-character
   1361 words.  E.g. to make numbers out of digits.  Without this flag creating
   1362 suggestions would spend most time trying all kinds of weird compound words.
   1363 
   1364 NOCOMPOUNDSUGS ~
   1365 
   1366 						*spell-SYLLABLE*
   1367 The SYLLABLE item defines characters or character sequences that are used to
   1368 count the number of syllables in a word.  Example:
   1369 SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~
   1370 
   1371 Before the first slash is the set of characters that are counted for one
   1372 syllable, also when repeated and mixed, until the next character that is not
   1373 in this set.  After the slash come sequences of characters that are counted
   1374 for one syllable.  These are preferred over using characters from the set.
   1375 With this example "ideeen" has three syllables, counted by "i", "ee" and "e".
   1376 
   1377 Only case-folded letters need to be included.
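
The counting described above can be sketched in Python as follows.  This is
one reading of the description, not a copy of Vim's code: at every position
the longest matching sequence is tried first, and otherwise a run of
characters from the set counts as one syllable.  The name count_syllables is
invented.

```python
def count_syllables(word, syllable_item):
    """Count syllables in a case-folded word using a SYLLABLE argument."""
    chars, *sequences = syllable_item.split("/")
    count = 0
    in_run = False  # inside a run of set characters that was already counted
    i = 0
    while i < len(word):
        # Sequences are preferred over single characters from the set.
        match = max(
            (s for s in sequences if s and word.startswith(s, i)),
            key=len,
            default="",
        )
        if match:
            count += 1
            in_run = False
            i += len(match)
            continue
        if word[i] in chars:
            if not in_run:
                count += 1
                in_run = True
        else:
            in_run = False
        i += 1
    return count


SYLLABLE = "aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui"
```

count_syllables("ideeen", SYLLABLE) returns 3, matching the example in the
text.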
   1378 
   1379 Another way to restrict compounding was mentioned above: Adding the
   1380 |spell-COMPOUNDFORBIDFLAG| flag to an affix causes all words that are made
   1381 with that affix to not be used for compounding.
   1382 
   1383 
   1384 UNLIMITED COMPOUNDING					*spell-NOBREAK*
   1385 
   1386 For some languages, such as Thai, there is no space in between words.  This
   1387 looks like all words are compounded.  To specify this use the NOBREAK item in
   1388 the affix file, without arguments:
   1389 NOBREAK ~
   1390 
   1391 Vim will try to figure out where one word ends and the next starts.  When
   1392 there are spelling mistakes this may not be quite right.
   1393 
   1394 
   1395 						*spell-COMMON*
   1396 Common words can be specified with the COMMON item.  This will give better
   1397 suggestions when editing a short file.  Example:
   1398 
   1399 COMMON  the of to and a in is it you that he she was for on are ~
   1400 
   1401 The words must be separated by white space, up to 25 per line.
   1402 When multiple regions are specified in a ":mkspell" command the common words
   1403 for all regions are combined and used for all regions.
   1404 
   1405 						*spell-NOSPLITSUGS*
   1406 This item indicates that splitting a word to make suggestions is not a good
   1407 idea.  Split-word suggestions will appear only when there are few similar
   1408 words.
   1409 
   1410 NOSPLITSUGS ~
   1411 
   1412 						*spell-NOSUGGEST*
   1413 The flag specified with NOSUGGEST can be used for words that will not be
   1414 suggested.  Can be used for obscene words.
   1415 
   1416 NOSUGGEST % ~
   1417 
   1418 
   1419 REPLACEMENTS						*spell-REP*
   1420 
   1421 In the affix file REP items can be used to define common mistakes.  This is
   1422 used to make spelling suggestions.  The items define the "from" text and the
   1423 "to" replacement.  Example:
   1424 
   1425 REP 4 ~
   1426 REP f ph ~
   1427 REP ph f ~
   1428 REP k ch ~
   1429 REP ch k ~
   1430 
   1431 The first line specifies the number of REP lines following.  Vim ignores the
   1432 number, but it must be there (for compatibility with Myspell).
   1433 
   1434 Don't include simple one-character replacements or swaps.  Vim will try these
   1435 anyway.  You can include whole words if you want to, but you might want to use
   1436 the "file:" item in 'spellsuggest' instead.
   1437 
   1438 You can include a space by using an underscore:
   1439 
   1440 REP the_the the ~
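
As an illustration of how REP items produce candidates, here is a small
Python sketch.  It is not Vim's implementation: the function names are
hypothetical, and a real spell checker would still have to look up each
candidate in the word list and score it.

```python
def parse_rep(lines):
    """Collect (from, to) pairs from REP lines, skipping the count line."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "REP":
            # An underscore stands for a space.
            pairs.append((fields[1].replace("_", " "),
                          fields[2].replace("_", " ")))
    return pairs


def rep_candidates(word, pairs):
    """Apply each replacement at every position where "from" occurs."""
    found = []
    for src, dst in pairs:
        start = word.find(src)
        while start != -1:
            candidate = word[:start] + dst + word[start + len(src):]
            if candidate not in found:
                found.append(candidate)
            start = word.find(src, start + 1)
    return found


AFFIX_LINES = ["REP 4", "REP f ph", "REP ph f", "REP k ch", "REP ch k",
               "REP the_the the"]
PAIRS = parse_rep(AFFIX_LINES)
print(rep_candidates("fone", PAIRS))     # ['phone']
print(rep_candidates("the the", PAIRS))  # ['the']
```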
   1441 
   1442 
   1443 SIMILAR CHARACTERS					*spell-MAP* *E783*
   1444 
   1445 In the affix file MAP items can be used to define letters that are very much
   1446 alike.  This is mostly used for a letter with different accents.  This is used
   1447 to prefer suggestions with these letters substituted.  Example:
   1448 
   1449 MAP 2 ~
   1450 MAP eéëêè ~
   1451 MAP uüùúû ~
   1452 
   1453 The first line specifies the number of MAP lines following.  Vim ignores the
   1454 number, but the line must be there.
   1455 
   1456 Each letter must appear in only one of the MAP items.  It's a bit more
   1457 efficient if the first letter is ASCII or at least one without accents.
   1458 
   1459 
   1460 .SUG FILE						*spell-NOSUGFILE*
   1461 
   1462 When soundfolding is specified in the affix file then ":mkspell" will normally
   1463 produce a .sug file next to the .spl file.  This file is used to find
   1464 suggestions by their sound-a-like form quickly, at the cost of a lot of
   1465 memory (the amount depends on the number of words, |:mkspell| will display an
   1466 estimate when it's done).
   1467 
   1468 To avoid producing a .sug file use this item in the affix file:
   1469 
   1470 NOSUGFILE ~
   1471 
   1472 Users can simply omit the .sug file if they don't want to use it.
   1473 
   1474 
   1475 SOUND-A-LIKE						*spell-SAL*
   1476 
   1477 In the affix file SAL items can be used to define the sound-a-like mechanism
   1478 to be used.  The main items define the "from" text and the "to" replacement.
   1479 Simplistic example:
   1480 
   1481 SAL CIA			 X ~
   1482 SAL CH			 X ~
   1483 SAL C			 K ~
   1484 SAL K			 K ~
   1485 
   1486 There are a few rules and this can become quite complicated.  An explanation
   1487 of how it works can be found in the Aspell manual:
   1488 http://aspell.net/man-html/Phonetic-Code.html.
   1489 
   1490 There are a few special items:
   1491 
   1492 SAL followup		true ~
   1493 SAL collapse_result	true ~
   1494 SAL remove_accents	true ~
   1495 
   1496 "1" has the same meaning as "true".  Any other value means "false".
   1497 
   1498 
   1499 SIMPLE SOUNDFOLDING				*spell-SOFOFROM* *spell-SOFOTO*
   1500 
   1501 The SAL mechanism is complex and slow.  A simpler mechanism is mapping all
   1502 characters to another character, mapping similar sounding characters to the
   1503 same character.  At the same time this does case folding.  You cannot have
   1504 both SAL items and simple soundfolding.
   1505 
   1506 There are two items required: one to specify the characters that are mapped
   1507 and one that specifies the characters they are mapped to.  They must have
   1508 exactly the same number of characters.  Example:
   1509 
   1510    SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
   1511    SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~
   1512 
   1513 In the example all vowels are mapped to the same character 'e'.  Another
   1514 method would be to leave out all vowels.  Some characters that sound nearly
   1515 the same and are often mixed up, such as 'm' and 'n', are mapped to the same
   1516 character.  Don't do this too much, or all words will start looking alike.
   1517 
   1518 Characters that do not appear in SOFOFROM will be left out, except that all
   1519 white space is replaced by one space.  Sequences of the same character in
   1520 SOFOFROM are replaced by one.
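
The mapping can be approximated outside Vim with a short Python sketch.  It
assumes that repeated characters are collapsed after mapping, which is one
reading of the rule above; the function name is hypothetical and the
|soundfold()| function remains the authoritative way to see the real result.

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"

# Both items must have exactly the same number of characters.
assert len(SOFOFROM) == len(SOFOTO)
TABLE = dict(zip(SOFOFROM, SOFOTO))


def simple_soundfold(text, table=TABLE):
    out = []
    for ch in text:
        if ch.isspace():
            folded = " "            # all white space becomes one space
        else:
            folded = table.get(ch)  # None: not in SOFOFROM, left out
        # Skip dropped characters and immediate repeats of the last result.
        if folded is not None and (not out or out[-1] != folded):
            out.append(folded)
    return "".join(out)


print(simple_soundfold("hello"))          # hele
print(simple_soundfold("Hello  World!"))  # hele verlt
```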
   1521 
   1522 You can use the |soundfold()| function to try out the results.  Or set the
   1523 'verbose' option to see the score in the output of the |z=| command.
   1524 
   1525 
   1526 UNSUPPORTED ITEMS				*spell-affix-not-supported*
   1527 
   1528 These items appear in the affix file of other spell checkers.  In Vim they are
   1529 ignored, not supported or defined in another way.
   1530 
   1531 ACCENT		(Hunspell)				*spell-ACCENT*
   1532 	Use MAP instead. |spell-MAP|
   1533 
   1534 BREAK		(Hunspell)				*spell-BREAK*
   1535 	Define break points.  Unclear how it works exactly.
   1536 	Not supported.
   1537 
   1538 CHECKCOMPOUNDCASE  (Hunspell)			*spell-CHECKCOMPOUNDCASE*
   1539 	Disallow uppercase letters at compound word boundaries.
   1540 	Not supported.
   1541 
   1542 CHECKCOMPOUNDDUP  (Hunspell)			*spell-CHECKCOMPOUNDDUP*
   1543 	Disallow using the same word twice in a compound.  Not
   1544 	supported.
   1545 
   1546 CHECKCOMPOUNDREP  (Hunspell)			*spell-CHECKCOMPOUNDREP*
   1547 	Something about using REP items and compound words.  Not
   1548 	supported.
   1549 
   1550 CHECKCOMPOUNDTRIPLE  (Hunspell)			*spell-CHECKCOMPOUNDTRIPLE*
   1551 	Forbid three identical characters when compounding.  Not
   1552 	supported.
   1553 
   1554 CHECKSHARPS  (Hunspell)				*spell-CHECKSHARPS*
   1555 	SS letter pair in uppercased (German) words may be upper case
   1556 	sharp s (ß).  Not supported.
   1557 
   1558 COMPLEXPREFIXES  (Hunspell)				*spell-COMPLEXPREFIXES*
   1559 	Enables using two prefixes.  Not supported.
   1560 
   1561 COMPOUND	(Hunspell)				*spell-COMPOUND*
   1562 	This is one line with the count of COMPOUND items, followed by
   1563 	that many COMPOUND lines with a pattern.
   1564 	Remove the first line with the count and rename the other
   1565 	items to COMPOUNDRULE |spell-COMPOUNDRULE|
   1566 
   1567 COMPOUNDFIRST	(Hunspell)				*spell-COMPOUNDFIRST*
   1568 	Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|
   1569 
   1570 COMPOUNDBEGIN	(Hunspell)				*spell-COMPOUNDBEGIN*
   1571 	Words signed with COMPOUNDBEGIN may be first elements in
   1572 	compound words.
   1573 	Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|
   1574 
   1575 COMPOUNDLAST	(Hunspell)				*spell-COMPOUNDLAST*
   1576 	Words signed with COMPOUNDLAST may be last elements in
   1577 	compound words.
   1578 	Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|
   1579 
   1580 COMPOUNDEND	(Hunspell)				*spell-COMPOUNDEND*
   1581 	Probably the same as COMPOUNDLAST
   1582 
   1583 COMPOUNDMIDDLE	(Hunspell)				*spell-COMPOUNDMIDDLE*
   1584 	Words signed with COMPOUNDMIDDLE may be middle elements in
   1585 	compound words.
   1586 	Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|
   1587 
   1588 COMPOUNDRULES	(Hunspell)				*spell-COMPOUNDRULES*
   1589 	Number of COMPOUNDRULE lines following.  Ignored, but the
   1590 	argument must be a number.
   1591 
   1592 COMPOUNDSYLLABLE  (Hunspell)			*spell-COMPOUNDSYLLABLE*
   1593 	Use SYLLABLE and COMPOUNDSYLMAX instead. |spell-SYLLABLE|
   1594 	|spell-COMPOUNDSYLMAX|
   1595 
   1596 KEY		(Hunspell)				*spell-KEY*
   1597 	Define characters that are close together on the keyboard.
   1598 	Used to give better suggestions.  Not supported.
   1599 
   1600 LANG		(Hunspell)				*spell-LANG*
   1601 	This specifies language-specific behavior.  This actually
   1602 	moves part of the language knowledge into the program,
   1603 	therefore Vim does not support it.  Each language property
   1604 	must be specified separately.
   1605 
   1606 LEMMA_PRESENT	(Hunspell)				*spell-LEMMA_PRESENT*
   1607 	Only needed for morphological analysis.
   1608 
   1609 MAXNGRAMSUGS	(Hunspell)				*spell-MAXNGRAMSUGS*
   1610 	Set number of n-gram suggestions.  Not supported.
   1611 
   1612 PSEUDOROOT	(Hunspell)				*spell-PSEUDOROOT*
   1613 	Use NEEDAFFIX instead. |spell-NEEDAFFIX|
   1614 
   1615 SUGSWITHDOTS	(Hunspell)				*spell-SUGSWITHDOTS*
   1616 	Adds dots to suggestions.  Vim doesn't need this.
   1617 
   1618 SYLLABLENUM	(Hunspell)				*spell-SYLLABLENUM*
   1619 	Not supported.
   1620 
   1621 TRY		(Myspell, Hunspell, others)		*spell-TRY*
   1622 	Vim does not use the TRY item, it is ignored.  For making
   1623 	suggestions the actual characters in the words are used, that
   1624 	is much more efficient.
   1625 
   1626 WORDCHARS	(Hunspell)				*spell-WORDCHARS*
   1627 	Used to recognize words.  Vim doesn't need it, because there
   1628 	is no need to separate words before checking them (using a
   1629 	trie instead of a hashtable).
   1630 
   1631 ==============================================================================
   1632 5. Spell checker design					*develop-spell*
   1633 
   1634 When spell checking was going to be added to Vim a survey was done over the
   1635 available spell checking libraries and programs.  Unfortunately, the result
   1636 was that none of them provided sufficient capabilities to be used as the spell
   1637 checking engine in Vim, for various reasons:
   1638 
   1639 - Missing support for multi-byte encodings.  At least UTF-8 must be supported,
   1640  so that more than one language can be used in the same file.
   1641  Doing on-the-fly conversion is not always possible (would require iconv
   1642  support).
   1643 - For the programs and libraries: Using them as-is would require installing
   1644  them separately from Vim.  That's mostly not impossible, but a drawback.
   1645 - Performance: A few tests showed that it's possible to check spelling on the
   1646  fly (while redrawing), just like syntax highlighting.  But the mechanisms
   1647  used by other code are much slower.  Myspell uses a hashtable, for example.
   1648  The affix compression that most spell checkers use makes it slower too.
   1649 - For using an external program like aspell a communication mechanism would
   1650  have to be set up.  That's complicated to do in a portable way (Unix-only
   1651  would be relatively simple, but that's not good enough).  And performance
   1652  will become a problem (lots of process switching involved).
   1653 - Missing support for words with non-word characters, such as "Etten-Leur" and
   1654  "et al.", would require marking the pieces of them OK, lowering the
   1655  reliability.
   1656 - Missing support for regions or dialects.  Makes it difficult to accept
   1657  all English words and highlight non-Canadian words differently.
   1658 - Missing support for rare words.  Many words are correct but hardly ever used
   1659  and could be a misspelled often-used word.
   1660 - For making suggestions the speed is less important, and having to install
   1661  another program or library would be acceptable.  But the word lists probably
   1662  differ, so the suggestions may be wrong words.
   1663 
   1664 
   1665 Spelling suggestions				*develop-spell-suggestions*
   1666 
   1667 For making suggestions there are two basic mechanisms:
   1668 1. Try changing the bad word a little bit and check for a match with a good
   1669   word.  Or go through the list of good words, change them a little bit and
   1670   check for a match with the bad word.  The changes are deleting a character,
   1671   inserting a character, swapping two characters, etc.
   1672 2. Perform soundfolding on both the bad word and the good words and then find
   1673   matches, possibly with a few changes like with the first mechanism.
   1674 
   1675 The first is good for finding typing mistakes.  After experimenting with
   1676 hashtables and looking at solutions from other spell checkers the conclusion
   1677 was that a trie (a kind of tree structure) is ideal for this.  Both for
   1678 reducing memory use and being able to try sensible changes.  For example, when
   1679 inserting a character only characters that lead to good words need to be
   1680 tried.  Other mechanisms (with hashtables) need to try all possible letters at
   1681 every position in the word.  Also, a hashtable has the requirement that word
   1682 boundaries are identified separately, while a trie does not require this.
   1683 That makes the mechanism a lot simpler.
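
The point about insertions can be shown with a toy trie in Python.  This is a
minimal sketch, not Vim's data structure: the nested-dict trie, the END marker
and the function names are assumptions made for the example.  Only characters
that actually continue a good word at the current node are tried.

```python
END = ""  # marker key: a good word ends at this node


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def insert_suggestions(root, bad):
    """Good words obtained from `bad` by inserting exactly one character."""
    found = set()

    def walk(node, i, used, prefix):
        if i == len(bad):
            if used and END in node:
                found.add(prefix)
        elif bad[i] in node:
            # Keep the next character of the bad word.
            walk(node[bad[i]], i + 1, used, prefix + bad[i])
        if not used:
            # Insert a character, but only one that the trie offers here.
            for ch, child in node.items():
                if ch != END:
                    walk(child, i, True, prefix + ch)

    walk(root, 0, False, "")
    return sorted(found)


TRIE = build_trie(["spell", "spelt", "shell", "sell", "spill"])
print(insert_suggestions(TRIE, "spel"))  # ['spell', 'spelt']
print(insert_suggestions(TRIE, "sel"))   # ['sell']
```

A hashtable-based checker would instead have to build every string with one
extra letter at every position and look each one up.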
   1684 
   1685 Soundfolding is useful when someone knows how a word sounds but doesn't
   1686 know how it is spelled.  For example, the word "dictionary" might be written
   1687 as "daktonerie".  The number of changes that the first method would need to
   1688 try is very big, it's hard to find the good word that way.  After soundfolding
   1689 the words become "tktnr" and "tkxnry", these differ by only two letters.
   1690 
   1691 To find words by their soundfolded equivalent (soundalike word) we need a list
   1692 of all soundfolded words.  A few experiments have been done to find out what
   1693 the best method is.  Alternatives:
   1694 1. Do the sound folding on the fly when looking for suggestions.  This means
   1695   walking through the trie of good words, soundfolding each word and
   1696   checking how different it is from the bad word.  This is very efficient for
   1697   memory use, but takes a long time.  On a fast PC it takes a couple of
   1698   seconds for English, which can be acceptable for interactive use.  But for
   1699   some languages it takes more than ten seconds (e.g., German, Catalan),
   1700   which is unacceptably slow.  For batch processing (automatic corrections)
   1701   it's too slow for all languages.
   1702 2. Use a trie for the soundfolded words, so that searching can be done just
   1703   like how it works without soundfolding.  This requires remembering a list
   1704   of good words for each soundfolded word.  This makes finding matches very
   1705   fast but requires quite a lot of memory, in the order of 1 to 10 Mbyte.
   1706   For some languages more than the original word list.
   1707 3. Like the second alternative, but reduce the amount of memory by using affix
   1708   compression and store only the soundfolded basic word.  This is what Aspell
   1709   does.  Disadvantage is that affixes need to be stripped from the bad word
   1710   before soundfolding it, which means that mistakes at the start and/or end
   1711   of the word will cause the mechanism to fail.  Also, this becomes slow when
   1712   the bad word is quite different from the good word.
   1713 
   1714 The choice made is to use the second mechanism and use a separate file.  This
   1715 way a user with sufficient memory can get very good suggestions while a user
   1716 who is short of memory or just wants the spell checking and no suggestions
   1717 doesn't use so much memory.
   1718 
   1719 
   1720 Word frequency
   1721 
   1722 For sorting suggestions it helps to know which words are common.  In theory we
   1723 could store a word frequency with the word in the dictionary.  However, this
   1724 requires storing a count per word.  That degrades word tree compression a lot.
   1725 And maintaining the word frequency for all languages will be a heavy task.
   1726 Also, it would be nice to prefer words that are already in the text.  This way
   1727 the words that appear in the specific text are preferred for suggestions.
   1728 
   1729 What has been implemented is to count words that have been seen during
   1730 displaying.  A hashtable is used to quickly find the word count.  The count is
   1731 initialized from words listed in COMMON items in the affix file, so that it
   1732 also works when starting a new file.
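
The idea can be sketched in Python with a Counter standing in for the
hashtable.  The starting weight for COMMON words and the function names are
hypothetical, and in Vim the count is only one factor in ordering
suggestions; the sketch sorts on the count alone to keep the example short.

```python
from collections import Counter

COMMON_WEIGHT = 10  # hypothetical starting count for COMMON words


def initial_counts(common_words):
    """Seed the counts so that ranking also works in a new, empty file."""
    return Counter({word: COMMON_WEIGHT for word in common_words})


def note_displayed(counts, text):
    """Count the words of a piece of text that has just been displayed."""
    counts.update(text.split())


def rank(suggestions, counts):
    """Put frequently seen words first; ties keep their original order."""
    return sorted(suggestions, key=lambda word: -counts[word])


counts = initial_counts(["the", "of", "to", "and"])
note_displayed(counts, "then we wait and then we go")
print(rank(["than", "then", "the"], counts))  # ['the', 'then', 'than']
```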
   1733 
   1734 This isn't ideal, because the longer Vim is running the higher the counts
   1735 become.  But in practice it is a noticeable improvement over not using the word
   1736 count.
   1737 
   1738 vim:tw=78:sw=4:ts=8:noet:ft=help:norl: