*spell.txt*	Nvim


		  VIM REFERENCE MANUAL	  by Bram Moolenaar


Spell checking						*spell*

				      Type |gO| to see the table of contents.

==============================================================================
1. Quick start					*spell-quickstart* *E756*

This command switches on spell checking: >

	:setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.
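
To switch spell checking on only for some file types, an autocommand can be
used.  This is a sketch that assumes "markdown" and "gitcommit" are the file
types you care about: >

	" Enable US English spell checking for prose-like file types only.
	autocmd FileType markdown,gitcommit setlocal spell spelllang=en_us
<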

The words that are not recognized are highlighted with one of these:
	SpellBad	word not recognized			|hl-SpellBad|
	SpellCap	word not capitalised			|hl-SpellCap|
	SpellRare	rare word				|hl-SpellRare|
	SpellLocal	wrong spelling for selected region	|hl-SpellLocal|

Vim only checks words for spelling, there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

							*]s*
]s			Move to next misspelled word after the cursor.
			A count before the command can be used to repeat.
			'wrapscan' applies.

							*[s*
[s			Like "]s" but search backwards, find the misspelled
			word before the cursor.  Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as bad.  Does not stop at word with
			missing capital at the start of a line.

							*]S*
]S			Like "]s" but only stop at bad words, not at rare
			words or words for another region.

							*[S*
[S			Like "]S" but search backwards.

							*]r*
]r			Move to next "rare" word after the cursor.
			A count before the command can be used to repeat.
			'wrapscan' applies.

							*[r*
[r			Like "]r" but search backwards, find the "rare"
			word before the cursor.  Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as rare.


To add words to your own word list:

							*zg*
zg			Add word under the cursor as a good word to the first
			name in 'spellfile'.  A count may precede the command
			to indicate the entry in 'spellfile' to be used.  A
			count of two uses the second entry.

			In Visual mode the selected characters are added as a
			word (including white space!).
			When the cursor is on text that is marked as badly
			spelled then the marked text is used.
			Otherwise the word under the cursor, separated by
			non-word characters, is used.

			If the word is explicitly marked as bad word in
			another spell file the result is unpredictable.

							*zG*
zG			Like "zg" but add the word to the internal word list
			|internal-wordlist|.

							*zw*
zw			Like "zg" but mark the word as a wrong (bad) word.
			If the word already appears in 'spellfile' it is
			turned into a comment line.  See |spellfile-cleanup|
			for getting rid of those.

							*zW*
zW			Like "zw" but add the word to the internal word list
			|internal-wordlist|.

zuw							*zug* *zuw*
zug			Undo |zw| and |zg|, remove the word from the entry in
			'spellfile'.  Count used as with |zg|.

zuW							*zuG* *zuW*
zuG			Undo |zW| and |zG|, remove the word from the internal
			word list.  Count used as with |zg|.

						*:spe* *:spellgood* *E1280*
:[count]spe[llgood] {word}
			Add {word} as a good word to 'spellfile', like with
			|zg|.  Without count the first name is used, with a
			count of two the second entry, etc.

:spe[llgood]! {word}	Add {word} as a good word to the internal word list,
			like with |zG|.

						*:spellw* *:spellwrong*
:[count]spellw[rong] {word}
			Add {word} as a wrong (bad) word to 'spellfile', as
			with |zw|.  Without count the first name is used, with
			a count of two the second entry, etc.

:spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
			list, like with |zW|.

						*:spellra* *:spellrare*
:[count]spellra[re] {word}
			Add {word} as a rare word to 'spellfile', similar to
			|zw|.  Without count the first name is used, with
			a count of two the second entry, etc.

			There are no normal mode commands to mark words as
			rare as this is a fairly uncommon command and all
			intuitive commands for this are already taken.  If you
			want you can add mappings with e.g.: >
		nnoremap z? :exe ':spellrare ' .. expand('<cWORD>')<CR>
		nnoremap z/ :exe ':spellrare! ' .. expand('<cWORD>')<CR>
<			|:spellundo|, |zuw|, or |zuW| can be used to undo this.

:spellra[re]! {word}	Add {word} as a rare word to the internal word
			list, similar to |zW|.

:[count]spellu[ndo] {word}				*:spellu* *:spellundo*
			Like |zuw|.  [count] used as with |:spellgood|.

:spellu[ndo]! {word}	Like |zuW|.  [count] used as with |:spellgood|.


After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you edit a file in
'spellfile' manually you need to use the |:mkspell| command.  This sequence of
commands mostly works well: >
	:edit <file in 'spellfile'>
<	(make changes to the spell file) >
	:mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.
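
With more than one entry in 'spellfile', the count given to |zg|, |zw| and
the related commands selects the file.  This sketch assumes your
configuration directory is "~/.config/nvim" and uses two made-up file names.
Each name ends in ".{encoding}.add": >

	" Entry 1 is a personal list, entry 2 a project-specific list.
	set spellfile=~/.config/nvim/spell/en.utf-8.add,~/.config/nvim/spell/work.utf-8.add
	" "zg" adds the word under the cursor to en.utf-8.add,
	" "2zg" adds it to work.utf-8.add; the same with an Ex command:
	2spellgood Neovim
<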

							*internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored, it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
							*z=*
z=			For the word under/after the cursor suggest correctly
			spelled words.  This also works to find alternatives
			for a word that is not highlighted as a bad word,
			e.g., when the word after it is bad.
			In Visual mode the highlighted text is taken as the
			word to be replaced.
			The results are sorted on similarity to the word being
			replaced.
			This may take a long time.  Hit CTRL-C when you get
			bored.

			If the command is used without a count the
			alternatives are listed and you can enter the number
			of your choice or press <Enter> if you don't want to
			replace.  You can also use the mouse to click on your
			choice (only works if the mouse can be used in Normal
			mode and when there are no line wraps).  Click on the
			first line (the header) to cancel.

			The suggestions listed normally replace a highlighted
			bad word.  Sometimes they include other text, in that
			case the replaced text is also listed after a "<".

			If a count is used that suggestion is used, without
			prompting.  For example, "1z=" always takes the first
			suggestion.

			If 'verbose' is non-zero a score will be displayed
			with the suggestions to indicate the likeliness to the
			badly spelled word (the higher the score the more
			different).
			When a word was replaced the redo command "." will
			repeat the word replacement.  This works like "ciw",
			the good word and <Esc>.  This does NOT work for Thai
			and other languages without spaces between words.

					*:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]		Repeat the replacement done by |z=| for all matches
			with the replaced word in the current window.
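
|z=| with a count and |:spellrepall| can be combined to fix every occurrence
of the same misspelling in one go.  This is a sketch of such a mapping, and
the "<Leader>z" key is only an example: >

	" Take the first suggestion for the word under the cursor, then
	" repeat that replacement for all matches in the current window.
	nnoremap <Leader>z 1z=:spellrepall<CR>
<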

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See 'spellsuggest'.
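
For example, this keeps the default "best" method but lists no more than nine
suggestions, so that each one can be picked with a single digit: >

	set spellsuggest=best,9
<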

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.
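
When the check is not wanted, for example in a buffer with notes where
sentences do not start with a capital, the option can be made empty for that
buffer.  An empty 'spellcapcheck' disables the check: >

	setlocal spellcapcheck=
<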

The 'spelloptions' option has a few more flags that influence the way spell
checking works.  For example, "camel" splits CamelCased words so that each
part of the word is spell-checked separately.

Vim counts the number of times a good word is encountered.  This is used to
sort the suggestions: words that have been seen before get a small bonus,
words that have been seen often get a bigger bonus.  The COMMON item in the
affix file can be used to define common words, so that this mechanism also
works in a new or short file |spell-COMMON|.

==============================================================================
2. Remarks on spell checking				*spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

	en		all regions
	en_au		Australia
	en_ca		Canada
	en_gb		Great Britain
	en_nz		New Zealand
	en_us		USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).
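
For example, this checks British English.  Words that are only accepted in
another English region are then highlighted with SpellLocal: >

	setlocal spell spelllang=en_gb
<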

							*spell-german*
Specific exception: For German these special regions are used:
	de		all German words accepted
	de_de		old and new spelling
	de_19		old spelling
	de_20		new spelling
	de_at		Austria
	de_ch		Switzerland

							*spell-russian*
Specific exception: For Russian these special regions are used:
	ru		all Russian words accepted
	ru_ru		"IE" letter spelling
	ru_yo		"YO" letter spelling

							*spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
	'encoding'	'spelllang'
	utf-8		yi		Yiddish
	latin1		yi		transliterated Yiddish
	utf-8		yi-tr		transliterated Yiddish

							*spell-cjk*
Chinese, Japanese and other East Asian characters are normally marked as
errors, because spell checking of these characters is not supported.  If
'spelllang' includes "cjk", these characters are not marked as errors.  This
is useful when editing text with spell checking while some Asian words are
present.
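
For example, this checks English text that also contains some Chinese or
Japanese words, without flagging those words: >

	setlocal spell spelllang=en_us,cjk
<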


SPELL FILES						*spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
	LL	the language name
	EEE	the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
	'spelllang'	LL ~
	en_us		en
	en-rare		en-rare
	medical_ca	medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
	'runtimepath' is "~/.config/nvim,/usr/share/nvim/runtime/,~/.config/nvim/after"
	'encoding'    is "iso-8859-2"
	'spelllang'   is "pl"

Vim will look for:
1. ~/.config/nvim/spell/pl.iso-8859-2.spl
2. /usr/share/nvim/runtime/spell/pl.iso-8859-2.spl
3. ~/.config/nvim/spell/pl.iso-8859-2.add.spl
4. /usr/share/nvim/runtime/spell/pl.iso-8859-2.add.spl
5. ~/.config/nvim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.config/nvim/spell/pl.latin1.spl
2. /usr/share/nvim/runtime/spell/pl.latin1.spl
3. ~/.config/nvim/after/spell/pl.latin1.spl
4. ~/.config/nvim/spell/pl.ascii.spl
5. /usr/share/nvim/runtime/spell/pl.ascii.spl
6. ~/.config/nvim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
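
To see which main spell files exist for your current settings, 'runtimepath'
can be searched with |globpath()|.  This is a sketch only.  It assumes
'spelllang' is not empty, looks only at its first entry, and ignores the
".add.spl" and "ascii" fallbacks.  "LL" is derived by dropping the region
name: >

	" Language name without region, e.g. "en_us" -> "en".
	let lang = substitute(split(&spelllang, ',')[0], '_.*', '', '')
	" All matching main spell files, in 'runtimepath' order.
	echo globpath(&runtimepath, 'spell/' .. lang .. '.' .. &encoding .. '.spl', 0, 1)
<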

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

						    *spell-sug-file* *E781*
If there is a file with exactly the same name as the ".spl" file but ending in
".sug", that file will be used for giving better suggestions.  It isn't loaded
before suggestions are made to reduce memory use.

				    *E758* *E759* *E778* *E779* *E780* *E782*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


SPELLFILE CLEANUP					*spellfile-cleanup*

The |zw| command turns existing entries in 'spellfile' into comment lines.
This avoids having to write a new file every time, but results in the file
only getting longer, never shorter.  To clean up the comment lines in all
".add" spell files do this: >
	:runtime spell/cleanadd.vim

This deletes all comment lines, except the ones that start with "##".  Use
"##" lines to add comments that you want to keep.

You can invoke this script as often as you like.  A variable is provided to
skip updating files that have been changed recently.  Set it to the number of
seconds that must have passed since a file was last changed before it is
cleaned.  For example, to clean only files that were not changed in the last
hour: >
	let g:spell_clean_limit = 60 * 60
The default is one second.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.

For a word that starts with a digit the digit is ignored, unless the word as a
whole is recognized.  Thus if "3D" is a word and "D" is not then "3D" is
recognized as a word, but if "3D" is not a word then only the "D" is marked as
bad.  Hex numbers in the form 0x12ab and 0X12AB are recognized.


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING					*spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere			   default
2.  in specific items		   use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
Also see |:syn-spell| for text that is not in a syntax item.
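
This is a sketch for the syntax file of a hypothetical file type.  Only
comments are spell checked, and a URL inside a comment is excluded again.  The
group names "myComment" and "myUrl" are made up for this example: >

	" URLs are never spell checked (@NoSpell).
	syntax match myUrl "\<https\?://\S\+" contained contains=@NoSpell
	" Comments are spell checked (@Spell) and may contain a URL.
	syntax match myComment "#.*$" contains=@Spell,myUrl
<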


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()	find badly spelled word at the cursor
    spellsuggest()	get list of spelling suggestions
    soundfold()		get the sound-a-like version of a word
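
This small sketch combines the first two functions.  The function name
FirstSpellSuggestion() is hypothetical, and the sketch assumes 'spell' is set
in the current window: >

	function! FirstSpellSuggestion() abort
	  " Without argument spellbadword() uses the bad word under or after
	  " the cursor in the cursor line and moves the cursor to its start.
	  " It returns [word, kind]; the word is empty when none is found.
	  let [word, kind] = spellbadword()
	  if empty(word)
	    return ''
	  endif
	  " Ask for at most one suggestion.
	  let suggestions = spellsuggest(word, 1)
	  return empty(suggestions) ? '' : suggestions[0]
	endfunction
<
Use it with ":echo FirstSpellSuggestion()".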


SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" and "spell/LANG.lua" in 'runtimepath'.  "LANG" is the
value of 'spelllang' up to the first comma, dot or underscore.  This can be
used to set options specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files.  Use this command to see what
they do: >
	:next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value.  This assumes the user prefers another value then.
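
This is a sketch of such a file for a hypothetical language "xx", placed in
your own "spell" directory, e.g. "~/.config/nvim/spell/xx.vim".  It follows
the rule above and leaves a changed 'spellcapcheck' alone.  It assumes the
default value is the one written in the comparison, and the new value is only
an example: >

	" Hypothetical ~/.config/nvim/spell/xx.vim
	" Only change 'spellcapcheck' when it still has the default value.
	if &l:spellcapcheck ==# "[.?!]\\_[\\])'\"\t ]\\+"
	  " Example: a sentence ends at ".", "?" or "!" plus white space.
	  let &l:spellcapcheck = '[.?!]\_s\+'
	endif
<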


DOUBLE SCORING						*spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistake.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
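
For example, the first line below selects double scoring and limits the list
to twenty suggestions.  |soundfold()| shows the sound-folded form that the
second kind of scoring compares.  It requires 'spell' to be set, and the
result depends on the language in use: >

	set spellsuggest=double,20
	echo soundfold('kaught')
<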

==============================================================================
3. Generating a spell file				*spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
						    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  The OpenOffice .oxt
files are zip files which contain the .aff and .dic files.  You should be able
to find them here:
	https://extensions.openoffice.org/en/search@f%5B0%5D%3Dfield_project_tags%253A311.html
The older, OpenOffice 2 files may be used if this doesn't work:
	http://wiki.services.openoffice.org/wiki/Dictionaries
You can also use a plain word list.  The results are the same, the choice
depends on what word lists you can find.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

							*:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
			Generate a Vim spell file from word lists.  Example: >
		:mkspell /tmp/nl nl_NL.words
<								*E751*
			When {outname} ends in ".spl" it is used as the output
			file name.  Otherwise it should be a language name,
			such as "en", without the region name.  The file
			written will be "{outname}.{encoding}.spl", where
			{encoding} is the value of the 'encoding' option.

			When the output file already exists [!] must be used
			to overwrite it.

			When the [-ascii] argument is present, words with
			non-ascii characters are skipped.  The resulting file
			ends in "ascii.spl".

			The input can be the Myspell format files {inname}.aff
			and {inname}.dic.  If {inname}.aff does not exist then
			{inname} is used as the file name of a plain word
			list.

			Multiple {inname} arguments can be given to combine
			regions into one Vim spell file.  Example: >
		:mkspell ~/.config/nvim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<			This combines the English word lists for US, CA and AU
			into one en.spl file.
			Up to eight regions can be combined.  *E754* *E755*
			The REP and SAL items of the first .aff file where
			they appear are used. |spell-REP| |spell-SAL|
							*E845*
			This command uses a lot of memory, required to find
			the optimal word tree (Polish, Italian and Hungarian
			require several hundred Mbyte).  The final result will
			be much smaller, because compression is used.  To
			avoid running out of memory compression will be done
			now and then.  This can be tuned with the 'mkspellmem'
			option.

			After the spell file was written and it was being used
			in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
			Like ":mkspell" above, using {name}.{enc}.add as the
			input file and producing an output file in the same
			directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
			Like ":mkspell" above, using {name} as the input file
			and producing an output file in the same directory
			that has ".{enc}.spl" appended.
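
This is a sketch of building a spell file for your own words from a plain word
list.  It assumes a hypothetical file "~/words/tech.txt" with one word per
line, no "~/words/tech.txt.aff" file, and an existing
"~/.config/nvim/spell" directory in 'runtimepath': >

	" Writes ~/.config/nvim/spell/tech.utf-8.spl when 'encoding' is utf-8.
	mkspell! ~/.config/nvim/spell/tech ~/words/tech.txt
	" Check English and the new word list together.
	setlocal spell spelllang=en_us,tech
<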

Vim will report the number of duplicate words.  This might be a mistake in the
list of words.  But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this).  If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use |diff-mode| to see what changed: >
	nvim -d xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS					*E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages.  Vim will check
the validity of the spell file and report anything wrong.

	E771: Old spell file, needs to be updated ~
This spell file is older than your Vim.  You need to update the .spl file.

	E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim.  You need to
update Vim.

	E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work.  In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

							*:spelldump* *:spelld*
:spelld[ump]		Open a new window and fill it with all currently valid
			words.  Compound words are not included.
			Note: For some languages the result may be enormous,
			causing Vim to run out of memory.

:spelld[ump]!		Like ":spelldump" and include the word count.  This is
			the number of times the word was found while
			updating the screen.  Words that are in COMMON items
			get a starting count of 10.

The dumped words use the format described at |spell-wordlist-format|.  You
should be able to read them with ":mkspell" to generate one .spl file that
includes all the words.

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.
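
This is a sketch of dumping the words and reading them back with ":mkspell".
The file names "/tmp/all.words" and "/tmp/all" are made up: >

	" Dump the words of the current spell files into a new window.
	spelldump
	" Save the dump; the dump window is the current window now.
	write /tmp/all.words
	" Build /tmp/all.{encoding}.spl from the dumped word list.
	mkspell! /tmp/all /tmp/all.words
<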


SPELL FILE MISSING					*spell-SpellFileMissing*

If a spell file is missing, the user is asked whether to download it.  See
|spellfile.lua|.

							*E797*
Note that the SpellFileMissing autocommand must not change or destroy the
buffer the user was editing.
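
This is a sketch of your own SpellFileMissing autocommand that only reports
the missing language.  For this event |<amatch>| is the language name.  It
runs in addition to any handler that is already defined and does not touch
the current buffer, as required above: >

	autocmd SpellFileMissing * echomsg 'No spell file found for ' .. expand('<amatch>')
<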

==============================================================================
4. Spell file format					*spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, thus should not be highlighted.
Person and company names will not appear in a dictionary, but do appear in a
word list.  And some old words are rarely used while they are common
misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.  The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST				*spell-wordlist-format*

The words must appear one per line.  That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

	# comment ~
- Lines starting with a # are ignored (comment lines).

	/encoding=utf-8 ~
- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.  Thus you can use
  one word list for several target encodings.

	/regions=usca ~
- A line starting with "/regions=" specifies the region names that are
  supported.  Each region name must be two ASCII letters.  The first one is
  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be equal to the main word
  list!

- Other lines starting with '/' are reserved for future use.  The ones that
  are not recognized are ignored.  You do get a warning message, so that you
  know something won't work.

- A "/" may follow the word with the following items:
    =		Case must match exactly.
    ?		Rare word.
    !		Bad (wrong) word.
    1 to 9	A region in which the word is valid.  If no regions are
		specified the word is valid in all regions.

Example:

	# This is an example word list		comment
	/encoding=latin1			encoding of the file
	/regions=uscagb				regions "us", "ca" and "gb"
	example					word for all regions
	blah/12					word for regions "us" and "ca"
	vim/!					bad word
	Campbell/?3				rare word in region 3 "gb"
	's mornings/=				keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted.  This is different from a word with mixed case that is automatically
marked as keep-case, those words may appear in all upper-case letters.
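
The sketch below writes a small word list with regions and compiles it, so
the items above can be tried out.  The language name "demo" and the file
names are made up.  The files go into the "spell" directory under
stdpath('config'), which is assumed to be in 'runtimepath': >

	let dir = stdpath('config') .. '/spell'
	call mkdir(dir, 'p')
	" Region 1 is "us", region 2 is "gb"; "vim" is marked as bad.
	call writefile(['/regions=usgb', 'color/1', 'colour/2', 'vim/!'], dir .. '/demo.words')
	" No demo.words.aff exists, so this is read as a plain word list.
	execute 'mkspell!' fnameescape(dir .. '/demo') fnameescape(dir .. '/demo.words')
	" Select region "gb": "colour" is accepted, "color" should be
	" highlighted with SpellLocal and "vim" with SpellBad.
	setlocal spell spelllang=demo_gb
<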


FORMAT WITH .AFF AND .DIC FILES				*aff-dic-format*

There are two files: the basic word list and an affix file.  The affix file
specifies settings for the language and can contain affixes.  The affixes are
used to modify the basic words to get the full word list.  This significantly
reduces the number of words, especially for a language like Polish.  This is
called affix compression.

The basic word list and the affix file are combined with the ":mkspell"
command, which results in a binary spell file.  All the preprocessing has been
done, thus this file loads fast.  The binary spell file format is described in
the source code (src/spell.c).  But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made.  The tools for this can be found in the
"src/spell" directory.

The format for the affix and word list files is based on what Myspell uses
(the spell checker of Mozilla and OpenOffice.org).  A description can be found
here:
	https://lingucomponent.openoffice.org/affix.readme
Note that affixes are case sensitive, this isn't obvious from the description.

Vim supports quite a few extras.  They are described below |spell-affix-vim|.
Attempts have been made to keep this compatible with other spell checkers, so
that the same files can often be used.  One other project that offers more
than Myspell is Hunspell ( https://hunspell.github.io ).


743 WORD LIST FORMAT *spell-dic-format*
744
745 A short example, with line numbers:
746
747 1 1234 ~
748 2 aan ~
749 3 Als ~
750 4 Etten-Leur ~
751 5 et al. ~
752 6 's-Gravenhage ~
753 7 's-Gravenhaags ~
754 8 # word that differs between regions ~
755 9 kado/1 ~
756 10 cadeau/2 ~
757 11 TCP,IP ~
758 12 /the S affix may add a 's' ~
759 13 bedel/S ~
760
761 The first line contains the number of words. Vim ignores it, but you do get
762 an error message if it's not there. *E760*
763
764 What follows is one word per line. White space at the end of the line is
765 ignored, all other white space matters. The encoding is specified in the
766 affix file |spell-SET|.
767
768 Comment lines start with '#' or '/'. See the example lines 8 and 12. Note
769 that putting a comment after a word is NOT allowed:
770
771 someword # comment that causes an error! ~
772
773 After the word there is an optional slash and flags. Most of these flags are
774 letters that indicate the affixes that can be used with this word. These are
775 specified with SFX and PFX lines in the .aff file, see |spell-SFX| and
776 |spell-PFX|. Vim allows using other flag types with the FLAG item in the
777 affix file |spell-FLAG|.
778
779 When the word has only lower-case letters, it will also match the word
780 starting with an upper-case letter.
781
782 When the word includes an upper-case letter, this means the upper-case letter
783 is required at this position. The same word with a lower-case letter at this
784 position will not match. When some of the other letters are upper-case it
785 will not match either.
786
787 The word with all upper-case characters will always be OK:
788
789 word list matches does not match ~
790 als als Als ALS ALs AlS aLs aLS
791 Als Als ALS als ALs AlS aLs aLS
792 ALS ALS als Als ALs AlS aLs aLS
793 AlS AlS ALS als Als ALs aLs aLS
794
795 The KEEPCASE affix ID can be used to specifically match a word with identical
796 case only, see below |spell-KEEPCASE|.
797
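The matching rules in the table above, plus the KEEPCASE variant, can be
tried out with a small Python sketch. This is not Vim's own code. It assumes
that str.upper() gives the all upper-case form of a word, which does not hold
for every language (e.g. for ß). The name case_matches() and its keepcase
argument are made up for this example.

```python
def case_matches(dict_word, text_word, keepcase=False):
    """Sketch: does text_word match dict_word under the case rules above?"""
    if text_word == dict_word:
        return True                  # identical case always matches
    if keepcase:
        return False                 # KEEPCASE: only identical case is OK
    if text_word == dict_word.upper():
        return True                  # the all upper-case form is OK
    if dict_word == dict_word.lower():
        # An all lower-case word also matches with an upper-case first letter.
        return text_word == dict_word[:1].upper() + dict_word[1:]
    return False                     # upper-case letters in the word are required


# Rows of the table above: word list entry -> (matches, does not match)
TABLE = {
    "als": (["als", "Als", "ALS"], ["ALs", "AlS", "aLs", "aLS"]),
    "Als": (["Als", "ALS"], ["als", "ALs", "AlS", "aLs", "aLS"]),
    "ALS": (["ALS"], ["als", "Als", "ALs", "AlS", "aLs", "aLS"]),
    "AlS": (["AlS", "ALS"], ["als", "Als", "ALs", "aLs", "aLS"]),
}
for word, (good, bad) in TABLE.items():
    assert all(case_matches(word, t) for t in good)
    assert not any(case_matches(word, t) for t in bad)
```
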
798 Note: in lines 5 to 7 non-word characters are used. You can include any
799 character in a word. When checking the text a word still only matches when it
800 appears with a non-word character before and after it. For Myspell a word
801 starting with a non-word character probably won't work.
802
803 In line 11 the word "TCP,IP" is defined. Writing "TCP/IP" there would not
804 work, because the slash separates the word from its flags. To include a
805 slash in a word put a backslash before it: "TCP\/IP". See
806 |spell-SLASH|.
807
808
809 AFFIX FILE FORMAT *spell-aff-format* *spell-affix-vim*
810
811 *spell-affix-comment*
812 Comment lines in the .aff file start with a '#':
813
814 # comment line ~
815
816 Items with a fixed number of arguments can be followed by a comment, but only
817 if none of the arguments can contain white space. The comment must start with
818 a "#" character. Example:
819
820 KEEPCASE = # fix case for words with this flag ~
821
822
823 ENCODING *spell-SET*
824
825 The affix file can be in any encoding that is supported by "iconv". However,
826 in some cases the current locale should also be set properly at the time
827 |:mkspell| is invoked. Adding FOL/LOW/UPP lines removes this requirement
828 |spell-FOL|.
829
830 The encoding should be specified before anything where the encoding matters.
831 The encoding applies both to the affix file and the dictionary file. It is
832 done with a SET line:
833
834 SET utf-8 ~
835
836 The encoding can be different from the value of the 'encoding' option at the
837 time ":mkspell" is used. Vim will then convert everything to 'encoding' and
838 generate a spell file for 'encoding'. If some of the used characters do not
839 fit in 'encoding' you will get an error message.
840 *spell-affix-mbyte*
841 When using a multibyte encoding it's possible to use more distinct affix
842 flags. However, Myspell doesn't support that, so you may not want to use it
843 anyway. For compatibility use an 8-bit encoding.
844
845
846 INFORMATION
847
848 These entries in the affix file can be used to add information to the spell
849 file. There are no restrictions on the format, but they should be in the
850 right encoding.
851
852 *spell-NAME* *spell-VERSION* *spell-HOME*
853 *spell-AUTHOR* *spell-EMAIL* *spell-COPYRIGHT*
854 NAME Name of the language
855 VERSION 1.0.1 with fixes
856 HOME https://www.example.com
857 AUTHOR John Doe
858 EMAIL john AT Doe DOT net
859 COPYRIGHT LGPL
860
861 These fields are put in the .spl file as-is. The |:spellinfo| command can be
862 used to view the info.
863
864 *:spellinfo* *:spelli*
865 :spelli[nfo] Display the information for the spell file(s) used for
866 the current buffer.
867
868
869 CHARACTER TABLES
870 *spell-affix-chars*
871 When using an 8-bit encoding the affix file should define what characters are
872 word characters. This is because the system where ":mkspell" is used may not
873 support a locale with this encoding, and then isalpha() won't work. This
874 happens, for example, when using "cp1250" on Unix.
875 *E761* *E762* *spell-FOL*
876 *spell-LOW* *spell-UPP*
877 Three lines in the affix file are needed. Simplistic example:
878
879 FOL áëñ ~
880 LOW áëñ ~
881 UPP ÁËÑ ~
882
883 All three lines must have exactly the same number of characters.
884
885 The "FOL" line specifies the case-folded characters. These are used to
886 compare words while ignoring case. For most encodings this is identical to
887 the lower case line.
888
889 The "LOW" line specifies the characters in lower-case. Mostly it's equal to
890 the "FOL" line.
891
892 The "UPP" line specifies the characters in upper case. That is, a character
893 is upper-case where it's different from the character at the same position in
894 "FOL".
895
896 An exception is made for the German sharp s ß. The upper-case version is
897 "SS". In the FOL/LOW/UPP lines it should be included, so that it's recognized
898 as a word character, but use the ß character in all three.
899
900 ASCII characters should be omitted; Vim always handles these the same way.
901 When the encoding is UTF-8 no word characters need to be specified.
902
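To illustrate how the three lines relate, here is a minimal Python sketch that
checks the lengths and derives a folding map and an upper-case map from them.
It only mirrors the description above; it is not how Vim stores its tables,
and build_case_tables() is a made-up name.

```python
def build_case_tables(fol, low, upp):
    """Sketch: build fold and upper-case maps from FOL, LOW and UPP lines."""
    if not len(fol) == len(low) == len(upp):
        raise ValueError("FOL, LOW and UPP must have the same number of characters")
    fold, upper = {}, {}
    for f, lo, up in zip(fol, low, upp):
        fold[lo] = f                 # compare words ignoring case via FOL
        fold[up] = f
        if up != f:                  # upper-case where UPP differs from FOL
            upper[lo] = up
    return fold, upper


# The simplistic example, plus the German sharp s used in all three lines.
fold, upper = build_case_tables("áëñß", "áëñß", "ÁËÑß")
print(fold["Ñ"], upper["ë"], "ß" in upper)   # ñ Ë False
```
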
903 *E763*
904 Vim allows you to use spell checking for several languages in the same file.
905 You can list them in the 'spelllang' option. As a consequence all spell files
906 for the same encoding must use the same word characters, otherwise they can't
907 be combined without errors.
908
909 If you get an E763 warning that the word tables differ you need to update your
910 ".spl" spell files. If you downloaded the files, get the latest version of
911 all spell files you use. If you are only using one, e.g., German, then also
912 download the recent English spell files. Otherwise generate the .spl file
913 again with |:mkspell|. If you still get errors check the FOL, LOW and UPP
914 lines in the used .aff files.
915
916 The XX.ascii.spl spell file generated with the "-ascii" argument will not
917 contain the table with characters, so that it can be combined with spell files
918 for any encoding. The .add.spl files also do not contain the table.
919
920
921 MID-WORD CHARACTERS
922 *spell-midword*
923 Some characters are only to be considered word characters if they are used in
924 between two ordinary word characters. An example is the single quote: It is
925 often used to put text in quotes, thus it can't be recognized as a word
926 character, but when it appears in between word characters it must be part of
927 the word. This is needed to detect a spelling error such as they'are. That
928 should be they're, but since "they" and "are" are words themselves that would
929 go unnoticed.
930
931 These characters are defined with MIDWORD in the .aff file. Example:
932
933 MIDWORD '- ~
934
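The MIDWORD rule can be sketched in Python as a tokenizer that keeps a
MIDWORD character only when a word character comes before and after it. This
is an illustration, not Vim's code: it assumes str.isalpha() decides what a
word character is, while Vim uses its own character tables. split_words() is
a made-up name.

```python
def split_words(text, midword="'-"):
    """Sketch: split text into words, keeping MIDWORD characters only
    when they sit between two word characters."""
    words, current = [], []
    for i, ch in enumerate(text):
        next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
        if ch.isalpha():
            current.append(ch)
        elif ch in midword and current and next_is_letter:
            current.append(ch)       # e.g. the quote in "they're"
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("He said 'they'are' twice"))
# ['He', 'said', "they'are", 'twice']
```

The quotes around "they'are" are dropped, but the one inside is kept, so the
misspelling is checked as one word.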
935
936 FLAG TYPES *spell-FLAG*
937
938 Flags are used to specify the affixes that can be used with a word and for
939 other properties of the word. Normally single-character flags are used. This
940 limits the number of possible flags, especially for 8-bit encodings. The FLAG
941 item can be used if more affixes are to be used. Possible values:
942
943 FLAG long use two-character flags
944 FLAG num use numbers, from 1 up to 65000
945 FLAG caplong use one-character flags without A-Z and two-character
946 flags that start with A-Z
947
948 With "FLAG num" the numbers in a list of affixes need to be separated with a
949 comma: "234,2143,1435". This method is inefficient, but useful if the file is
950 generated with a program.
951
952 When using "caplong" the two-character flags all start with a capital: "Aa",
953 "B1", "BB", etc. This is useful to use one-character flags for the most
954 common items and two-character flags for uncommon items.
955
956 Note: When using utf-8 only characters up to 65000 may be used for flags.
957
958 Note: even when using "num" or "long" the number of flags available to
959 compounding and prefixes is limited to about 250.
960
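A minimal Python sketch of how the flags after the slash could be split for
each FLAG type. It assumes the flag string has already been separated from the
word; "short" is a made-up name for the default of single-character flags, and
parse_flags() is hypothetical as well.

```python
def parse_flags(flags, flag_type="short"):
    """Sketch: split a flag string according to the FLAG item."""
    if flag_type == "short":
        return list(flags)           # default: one character per flag
    if flag_type == "long":
        if len(flags) % 2:
            raise ValueError("long flags come in pairs of characters")
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if flag_type == "num":
        numbers = [int(n) for n in flags.split(",")]
        if not all(1 <= n <= 65000 for n in numbers):
            raise ValueError("flag numbers must be from 1 up to 65000")
        return numbers
    if flag_type == "caplong":
        result, i = [], 0
        while i < len(flags):
            # Flags starting with A-Z take two characters, others one.
            width = 2 if "A" <= flags[i] <= "Z" else 1
            if i + width > len(flags):
                raise ValueError("two-character flag is cut off")
            result.append(flags[i:i + width])
            i += width
        return result
    raise ValueError("unknown FLAG type: " + flag_type)


print(parse_flags("AaBBx", "caplong"))        # ['Aa', 'BB', 'x']
print(parse_flags("234,2143,1435", "num"))    # [234, 2143, 1435]
```
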
961
962 AFFIXES *spell-PFX* *spell-SFX*
963
964 The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
965 documentation or the Aspell manual:
966 http://aspell.net/man-html/Affix-Compression.html).
967
968 Summary:
969 SFX L Y 2 ~
970 SFX L 0 re [^x] ~
971 SFX L 0 ro x ~
972
973 The first line is a header and has four fields:
974 SFX {flag} {combine} {count}
975
976 {flag} The name used for the suffix. Mostly it's a single letter,
977 but other characters can be used, see |spell-FLAG|.
978
979 {combine} Can be 'Y' or 'N'. When 'Y' then the word plus suffix can
980 also have a prefix. When 'N' then a prefix is not allowed.
981
982 {count} The number of lines following. If this is wrong you will get
983 an error message.
984
985 For PFX the fields are exactly the same.
986
987 The basic format for the following lines is:
988 SFX {flag} {strip} {add} {condition} {extra}
989
990 {flag} Must be the same as the {flag} used in the first line.
991
992 {strip} Characters removed from the basic word. There is no check if
993 the characters are actually there, only the length is used (in
994 bytes). This had better match the {condition}, otherwise strange
995 things may happen. If the {strip} length is equal to or
996 longer than the basic word the suffix won't be used.
997 When {strip} is 0 (zero) then nothing is stripped.
998
999 {add} Characters added to the basic word, after removing {strip}.
1000 Optionally there is a '/' followed by flags. The flags apply
1001 to the word plus affix. See |spell-affix-flags|.
1002
1003 {condition} A simplistic pattern. Only when this matches with a basic
1004 word will the suffix be used for that word. This is normally
1005 for using one suffix letter with different {add} and {strip}
1006 fields for words with different endings.
1007 When {condition} is a . (dot) there is no condition.
1008 The pattern may contain:
1009 - Literal characters.
1010 - A set of characters in []. [abc] matches a, b or c.
1011 A dash is allowed for a range [a-c], but this is
1012 Vim-specific.
1013 - A set of characters that starts with a ^, meaning the
1014 complement of the specified characters. [^abc] matches any
1015 character but a, b and c.
1016
1017 {extra} Optional extra text:
1018 # comment Comment is ignored
1019 - Hunspell uses this, ignored
1020
1021 For PFX the fields are the same, but the {strip}, {add} and {condition} apply
1022 to the start of the word.
1023
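The fields above can be illustrated with a Python sketch that applies one PFX
or SFX line to a basic word. It is not Vim's implementation: {strip} length is
counted in characters here (Vim uses bytes), the {condition} is turned into a
Python regular expression, and condition_regex() and apply_affix() are
made-up names.

```python
import re


def condition_regex(condition, kind):
    """Sketch: turn a {condition} into a regex anchored at the end of the
    word (SFX) or at the start (PFX). Returns None for "."."""
    if condition == ".":
        return None                          # no condition
    parts, i = [], 0
    while i < len(condition):
        if condition[i] == "[":
            j = condition.index("]", i)
            parts.append(condition[i:j + 1])   # [abc], [a-c] and [^abc]
            i = j + 1
        else:
            parts.append(re.escape(condition[i]))  # literal character
            i += 1
    body = "".join(parts)
    return re.compile("^" + body if kind == "PFX" else body + "$")


def apply_affix(word, kind, strip, add, condition):
    """Sketch: apply one PFX/SFX line to a basic word; None if not used."""
    strip = "" if strip == "0" else strip    # 0 means nothing is stripped
    add = add.split("/", 1)[0]               # drop optional flags after '/'
    if len(strip) >= len(word):
        return None                          # strip too long: not used
    pattern = condition_regex(condition, kind)
    if pattern is not None and not pattern.search(word):
        return None                          # condition does not match
    # Only the length of {strip} is used; its characters are not checked.
    if kind == "SFX":
        return word[:len(word) - len(strip)] + add
    return add + word[len(strip):]


print(apply_affix("Spion", "SFX", "0", "in", "[^i]n"))   # Spionin
print(apply_affix("box", "SFX", "0", "re", "[^x]"))      # None
```
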
1024 Note: Myspell ignores any extra text after the relevant info. Vim requires
1025 this text to start with a "#" so that mistakes don't go unnoticed. Example:
1026
1027 SFX F 0 in [^i]n # Spion > Spionin ~
1028 SFX F 0 nen in # Bauerin > Bauerinnen ~
1029
1030 However, to avoid lots of errors in affix files written for Myspell, you can
1031 add the IGNOREEXTRA flag.
1032
1033 Apparently Myspell allows an affix name to appear more than once. Since this
1034 might also be a mistake, Vim checks for an extra "S". The affix files for
1035 Myspell that use this feature apparently have this flag. Example:
1036
1037 SFX a Y 1 S ~
1038 SFX a 0 an . ~
1039
1040 SFX a Y 2 S ~
1041 SFX a 0 en . ~
1042 SFX a 0 on . ~
1043
1044
1045 AFFIX FLAGS *spell-affix-flags*
1046
1047 This is a feature that comes from Hunspell: The affix may specify flags. This
1048 works similarly to flags specified on a basic word. The flags apply to the
1049 basic word plus the affix (but there are restrictions). Example:
1050
1051 SFX S Y 1 ~
1052 SFX S 0 s . ~
1053
1054 SFX A Y 1 ~
1055 SFX A 0 able/S . ~
1056
1057 When the dictionary file contains "drink/AS" then these words are possible:
1058
1059 drink
1060 drinks uses S suffix
1061 drinkable uses A suffix
1062 drinkables uses A suffix and then S suffix
1063
1064 Generally the flags of the suffix are added to the flags of the basic word,
1065 and both are used for the word plus suffix. However, the flags of the basic
1066 word are only used once for affixes, except that one prefix and one suffix
1067 can both be used when both support combining.
1068
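The "drink/AS" example can be reproduced with a small Python sketch. It only
handles suffixes with {strip} 0 and condition ".", as in the example; the
SUFFIXES table and expand() are hypothetical. The flags of the basic word are
each used once, and a flag attached to a suffix allows exactly one more
suffix.

```python
# Hypothetical in-memory form of the two SFX groups above:
# flag -> ({add} text, flags attached to the affix)
SUFFIXES = {
    "S": ("s", ""),        # SFX S 0 s .
    "A": ("able", "S"),    # SFX A 0 able/S .
}


def expand(word, flags):
    """Sketch: list the words a .dic entry allows."""
    words = [word]
    for flag in flags:                       # flags of the basic word
        if flag not in SUFFIXES:
            continue
        add, affix_flags = SUFFIXES[flag]
        derived = word + add
        words.append(derived)
        for extra in affix_flags:            # flags on the suffix itself
            if extra in SUFFIXES:            # one more suffix, only once
                words.append(derived + SUFFIXES[extra][0])
    return words


print(expand("drink", "AS"))
# ['drink', 'drinkable', 'drinkables', 'drinks']
```
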
1069 Specifically, the affix flags can be used for:
1070 - Suffixes on suffixes, as in the example above. This works once, thus you
1071 can have two suffixes on a word (plus one prefix).
1072 - Making the word with the affix rare, by using the |spell-RARE| flag.
1073 - Exclude the word with the affix from compounding, by using the
1074 |spell-COMPOUNDFORBIDFLAG| flag.
1075 - Allow the word with the affix to be part of a compound word on the side of
1076 the affix with the |spell-COMPOUNDPERMITFLAG|.
1077 - Use the NEEDCOMPOUND flag: word plus affix can only be used as part of a
1078 compound word. |spell-NEEDCOMPOUND|
1079 - Compound flags: word plus affix can be part of a compound word at the end,
1080 middle, start, etc. The flags are combined with the flags of the basic
1081 word. |spell-compound|
1082 - NEEDAFFIX: another affix is needed to make a valid word.
1083 - CIRCUMFIX, as explained below |spell-CIRCUMFIX|.
1084
1085
1086 IGNOREEXTRA *spell-IGNOREEXTRA*
1087
1088 Normally Vim gives an error for an extra field that does not start with '#'.
1089 This avoids errors going unnoticed. However, some files created for Myspell
1090 or Hunspell may contain many entries with an extra field. Use the IGNOREEXTRA
1091 flag to avoid lots of errors.
1092
1093
1094 CIRCUMFIX *spell-CIRCUMFIX*
1095
1096 The CIRCUMFIX flag means a prefix and suffix must be added at the same time.
1097 If a prefix has the CIRCUMFIX flag then only suffixes with the CIRCUMFIX flag
1098 can be added, and the other way around.
1099 An alternative is to only specify the suffix, and give that suffix two flags:
1100 the required prefix and the NEEDAFFIX flag. |spell-NEEDAFFIX|
1101
1102
1103 PFXPOSTPONE *spell-PFXPOSTPONE*
1104
1105 When an affix file has very many prefixes that apply to many words it's not
1106 possible to build the whole word list in memory. This applies to Hebrew (a
1107 list with all words is over a Gbyte). In that case applying prefixes must be
1108 postponed. This makes spell checking slower. It is indicated by this keyword
1109 in the .aff file:
1110
1111 PFXPOSTPONE ~
1112
1113 Only prefixes without a chop string and without flags can be postponed.
1114 Prefixes with a chop string or with flags will still be included in the word
1115 list. An exception is made if the chop string is one character and equal to
1116 the last character of the added string, but in lower case, that is, when the
1117 chop string is used to let the following word start with an upper-case letter.
1118
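The rule above can be written as a small Python predicate. This is only a
sketch of the description, not Vim's code; can_postpone() is a made-up name,
and "0" is treated as an empty chop string as in SFX/PFX lines. The "lA"
prefix in the example is invented to show the exception.

```python
def can_postpone(chop, add, has_flags):
    """Sketch of the PFXPOSTPONE rule: may this PFX line be postponed?"""
    if has_flags:
        return False                 # prefixes with flags stay in the list
    if chop in ("", "0"):
        return True                  # no chop string
    # Exception: a one-character chop string that is the lower-case form of
    # the last character of the added string (which is then upper case).
    return (len(chop) == 1 and add != ""
            and chop.upper() != chop and add[-1] == chop.upper())


print(can_postpone("a", "lA", False))   # True
print(can_postpone("a", "la", False))   # False
```
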
1119
1120 WORDS WITH A SLASH *spell-SLASH*
1121
1122 The slash is used in the .dic file to separate the basic word from the affix
1123 letters and other flags. Unfortunately, this means you cannot use a slash in
1124 a word. Thus "TCP/IP" is not a word but "TCP" with the flags "IP". To
1125 include a slash in the word put a backslash before it: "TCP\/IP". In the rare
1126 case you want to use a backslash inside a word you need to use two
1127 backslashes.
1128 Any other use of the backslash is reserved for future expansion.
1129
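A Python sketch of how one word line could be split into the word and its
flags, honoring the backslash escapes described above. It assumes the count
line and comment lines have already been skipped; parse_dic_line() is a
made-up name. A lone backslash before any other character is simply kept,
since that use is reserved.

```python
def parse_dic_line(line):
    """Sketch: split a .dic line into (word, flags)."""
    line = line.rstrip()             # white space at the end is ignored
    word, i = [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "/\\":
            word.append(line[i + 1])  # escaped slash or backslash
            i += 2
        elif ch == "/":
            return "".join(word), line[i + 1:]   # the rest are flags
        else:
            word.append(ch)          # ordinary character
            i += 1
    return "".join(word), ""


print(parse_dic_line("TCP/IP"))      # ('TCP', 'IP')
print(parse_dic_line(r"TCP\/IP"))    # ('TCP/IP', '')
```
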
1130
1131 KEEP-CASE WORDS *spell-KEEPCASE*
1132
1133 In the affix file a KEEPCASE line can be used to define the affix name used
1134 for keep-case words. Example:
1135
1136 KEEPCASE = ~
1137
1138 This flag is not supported by Myspell. It has the meaning that case matters.
1139 This can be used if the word does not have the first letter in upper case at
1140 the start of a sentence. Example:
1141
1142 word list matches does not match ~
1143 's morgens/= 's morgens 'S morgens 's Morgens 'S MORGENS
1144 's Morgens 's Morgens 'S MORGENS 'S morgens 's morgens
1145
1146 The flag can also be used to prevent the word from matching when it is in all
1147 upper-case letters.
1148
1149
1150 RARE WORDS *spell-RARE*
1151
1152 In the affix file a RARE line can be used to define the affix name used for
1153 rare words. Example:
1154
1155 RARE ? ~
1156
1157 Rare words are highlighted differently from bad words. This is to be used for
1158 words that are correct for the language, but are hardly ever used and could be
1159 a typing mistake anyway.
1160
1161 This flag can also be used on an affix, so that a basic word is not rare but
1162 the basic word plus affix is rare |spell-affix-flags|. However, if the word
1163 also appears as a good word in another way (e.g., in another region) it won't
1164 be marked as rare.
1165
1166
1167 BAD WORDS *spell-BAD*
1168
1169 In the affix file a BAD line can be used to define the affix name used for
1170 bad words. Example:
1171
1172 BAD ! ~
1173
1174 This can be used to exclude words that would otherwise be good. For example
1175 "the the" in the .dic file:
1176
1177 the the/! ~
1178
1179 Once a word has been marked as bad it won't be undone by encountering the same
1180 word as good.
1181
1182 The flag also applies to the word with affixes, thus this can be used to mark
1183 a whole bunch of related words as bad.
1184
1185 *spell-FORBIDDENWORD*
1186 FORBIDDENWORD can be used just like BAD. For compatibility with Hunspell.
1187
1188 *spell-NEEDAFFIX*
1189 The NEEDAFFIX flag is used to require that a word is used with an affix. The
1190 word itself is not a good word (unless there is an empty affix). Example:
1191
1192 NEEDAFFIX + ~
1193
1194
1195 COMPOUND WORDS *spell-compound*
1196
1197 A compound word is a longer word made by concatenating words that appear in
1198 the .dic file. To specify which words may be concatenated a character is
1199 used. This character is put in the list of affixes after the word. We will
1200 call this character a flag here. Obviously these flags must be different from
1201 any affix IDs used.
1202
1203 *spell-COMPOUNDFLAG*
1204 The Myspell compatible method uses one flag, specified with COMPOUNDFLAG. All
1205 words with this flag combine in any order. This means there is no control
1206 over which word comes first. Example:
1207 COMPOUNDFLAG c ~
1208
1209 *spell-COMPOUNDRULE*
1210 A more advanced method to specify how compound words can be formed uses
1211 multiple items with multiple flags. This is not compatible with Myspell 3.0.
1212 Let's start with an example:
1213 COMPOUNDRULE c+ ~
1214 COMPOUNDRULE se ~
1215
1216 The first line defines that words with the "c" flag can be concatenated in any
1217 order. The second line defines compound words that are made of one word with
1218 the "s" flag and one word with the "e" flag. With this dictionary:
1219 bork/c ~
1220 onion/s ~
1221 soup/e ~
1222
1223 You can make these words:
1224 bork
1225 borkbork
1226 borkborkbork
1227 (etc.)
1228 onion
1229 soup
1230 onionsoup
1231
1232 The COMPOUNDRULE item may appear multiple times. The argument is made out of
1233 one or more groups, where each group can be:
1234 one flag e.g., c
1235 alternate flags inside [] e.g., [abc]
1236 Optionally this may be followed by:
1237 * the group appears zero or more times, e.g., sm*e
1238 + the group appears one or more times, e.g., c+
1239 ? the group appears zero times or once, e.g., x?
1240
1241 This is similar to the regexp pattern syntax (but not the same!). A few
1242 examples with the sequence of word flags they require:
1243 COMPOUNDRULE x+ x xx xxx etc.
1244 COMPOUNDRULE yz yz
1245 COMPOUNDRULE x+z xz xxz xxxz etc.
1246 COMPOUNDRULE yx+ yx yxx yxxx etc.
1247 COMPOUNDRULE xy?z xz xyz
1248
1249 COMPOUNDRULE [abc]z az bz cz
1250 COMPOUNDRULE [abc]+z az aaz abaz bz baz bcbz cz caz cbaz etc.
1251 COMPOUNDRULE a[xyz]+ ax axx axyz ay ayx ayzz az azy azxy etc.
1252 COMPOUNDRULE sm*e se sme smme smmme etc.
1253 COMPOUNDRULE s[xyz]*e se sxe sxye sxyxe sye syze sze szye szyxe etc.
1254
1255 A specific example: Allow a compound to be made of two words and a dash:
1256 In the .aff file:
1257 COMPOUNDRULE sde ~
1258 NEEDAFFIX x ~
1259 COMPOUNDWORDMAX 3 ~
1260 COMPOUNDMIN 1 ~
1261 In the .dic file:
1262 start/s ~
1263 end/e ~
1264 -/xd ~
1265
1266 This allows for the word "start-end", but not "startend".
1267
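To see how COMPOUNDRULE items select compound words, here is a Python sketch
that cuts a word into dictionary words and checks the sequence of their flags
against the rules. It is not Vim's implementation: it assumes single-character
flags, ignores affixes, NEEDAFFIX, COMPOUNDMIN and COMPOUNDWORDMAX, and the
names rule_to_regex(), splits() and is_good() are made up.

```python
import itertools
import re


def rule_to_regex(rule):
    """Sketch: translate a COMPOUNDRULE value into a Python regex over
    the sequence of word flags."""
    parts, i = [], 0
    while i < len(rule):
        ch = rule[i]
        if ch == "[":
            j = rule.index("]", i)
            parts.append("[" + re.escape(rule[i + 1:j]) + "]")  # alternatives
            i = j + 1
        elif ch in "*+?":
            parts.append(ch)                 # zero-or-more, one-or-more, optional
            i += 1
        else:
            parts.append(re.escape(ch))      # a single flag
            i += 1
    return re.compile("".join(parts))


def splits(word, dic):
    """Yield every way to cut word into words from the dictionary."""
    if not word:
        yield []
        return
    for end in range(1, len(word) + 1):
        if word[:end] in dic:
            for rest in splits(word[end:], dic):
                yield [word[:end]] + rest


def is_good(word, dic, rules):
    """Sketch: accept a dictionary word or a compound allowed by a rule.
    dic maps each word to its flags, e.g. {"bork": "c"}."""
    if word in dic:
        return True
    patterns = [rule_to_regex(rule) for rule in rules]
    for parts in splits(word, dic):
        if len(parts) < 2:
            continue
        # Each word may have several flags: try every combination.
        for combo in itertools.product(*(dic[part] for part in parts)):
            if any(pat.fullmatch("".join(combo)) for pat in patterns):
                return True
    return False


DIC = {"bork": "c", "onion": "s", "soup": "e"}
RULES = ["c+", "se"]
for w in ["borkborkbork", "onionsoup", "souponion"]:
    print(w, is_good(w, DIC, RULES))
```

With the "start-end" example, a dictionary of {"start": "s", "end": "e",
"-": "xd"} and the rule "sde" accept "start-end" but not "startend".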
1268 An additional implied rule is that, without further flags, a word with a
1269 prefix cannot be compounded after another word, and a word with a suffix
1270 cannot be compounded with a following word. Thus the affix cannot appear
1271 on the inside of a compound word. This can be changed with the
1272 |spell-COMPOUNDPERMITFLAG|.
1273
1274 *spell-NEEDCOMPOUND*
1275 The NEEDCOMPOUND flag is used to require that a word is used as part of a
1276 compound word. The word itself is not a good word. Example:
1277
1278 NEEDCOMPOUND & ~
1279
1280 *spell-ONLYINCOMPOUND*
1281 The ONLYINCOMPOUND does exactly the same as NEEDCOMPOUND. Supported for
1282 compatibility with Hunspell.
1283
1284 *spell-COMPOUNDMIN*
1285 The minimal character length of a word used for compounding is specified with
1286 COMPOUNDMIN. Example:
1287 COMPOUNDMIN 5 ~
1288
1289 When omitted there is no minimum length. Obviously you could just leave out
1290 the compound flag from short words instead; this feature is present for
1291 compatibility with Myspell.
1292
1293 *spell-COMPOUNDWORDMAX*
1294 The maximum number of words that can be concatenated into a compound word is
1295 specified with COMPOUNDWORDMAX. Example:
1296 COMPOUNDWORDMAX 3 ~
1297
1298 When omitted there is no maximum. It applies to all compound words.
1299
1300 To set a limit for words with specific flags make sure the items in
1301 COMPOUNDRULE where they appear don't allow too many words.
1302
1303 *spell-COMPOUNDSYLMAX*
1304 The maximum number of syllables that a compound word may contain is specified
1305 with COMPOUNDSYLMAX. Example:
1306 COMPOUNDSYLMAX 6 ~
1307
1308 This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
1309 is no limit on the number of syllables.
1310
1311 If both COMPOUNDWORDMAX and COMPOUNDSYLMAX are defined, a compound word is
1312 accepted if it fits one of the criteria, thus is either made from up to
1313 COMPOUNDWORDMAX words or contains up to COMPOUNDSYLMAX syllables.
1314
1315 *spell-COMPOUNDFORBIDFLAG*
1316 The COMPOUNDFORBIDFLAG specifies a flag that can be used on an affix. It
1317 means that the word plus affix cannot be used in a compound word. Example:
1318 affix file:
1319 COMPOUNDFLAG c ~
1320 COMPOUNDFORBIDFLAG x ~
1321 SFX a Y 2 ~
1322 SFX a 0 s . ~
1323 SFX a 0 ize/x . ~
1324 dictionary:
1325 word/c ~
1326 util/ac ~
1327
1328 This allows for "wordutil" and "wordutils" but not "wordutilize".
1329 Note: this doesn't work for postponed prefixes yet.
1330
1331 *spell-COMPOUNDPERMITFLAG*
1332 The COMPOUNDPERMITFLAG specifies a flag that can be used on an affix. It
1333 means that the word plus affix can also be used in a compound word in a way
1334 where the affix ends up halfway through the word. Without this flag that is
1335 not allowed.
1336 Note: this doesn't work for postponed prefixes yet.
1337
1338 *spell-COMPOUNDROOT*
1339 The COMPOUNDROOT flag is used for words in the dictionary that are already a
1340 compound. This means it counts for two words when checking the compounding
1341 rules. Can also be used for an affix to count the affix as a compounding
1342 word.
1343
1344 *spell-CHECKCOMPOUNDPATTERN*
1345 CHECKCOMPOUNDPATTERN is used to define patterns that, when matching at the
1346 position where two words are compounded together, forbid the compound.
1347 For example:
1348 CHECKCOMPOUNDPATTERN o e ~
1349
1350 This forbids compounding if the first word ends in "o" and the second word
1351 starts with "e".
1352
1353 The arguments must be plain text, no patterns are actually supported, despite
1354 the item name. Case is always ignored.
1355
1356 The Hunspell feature to use three arguments and flags is not supported.
1357
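A minimal Python sketch of the check at the join of two words. It treats the
arguments as plain text, as described above; ignoring case is approximated
with str.casefold(), and compound_forbidden() is a made-up name.

```python
def compound_forbidden(first, second, patterns):
    """Sketch: True when a CHECKCOMPOUNDPATTERN pair matches at the join.
    patterns holds (end of first word, start of second word) pairs."""
    first, second = first.casefold(), second.casefold()
    return any(first.endswith(end.casefold())
               and second.startswith(start.casefold())
               for end, start in patterns)


PATTERNS = [("o", "e")]                                    # CHECKCOMPOUNDPATTERN o e
print(compound_forbidden("Piano", "Ensemble", PATTERNS))   # True
print(compound_forbidden("bork", "soup", PATTERNS))        # False
```
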
1358 *spell-NOCOMPOUNDSUGS*
1359 This item indicates that using compounding to make suggestions is not a good
1360 idea. Use this when compounding is used with very short or one-character
1361 words, e.g. to make numbers out of digits. Without this flag, creating
1362 suggestions would spend most time trying all kinds of weird compound words.
1363
1364 NOCOMPOUNDSUGS ~
1365
1366 *spell-SYLLABLE*
1367 The SYLLABLE item defines characters or character sequences that are used to
1368 count the number of syllables in a word. Example:
1369 SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~
1370
1371 Before the first slash is the set of characters that are counted for one
1372 syllable, also when repeated and mixed, until the next character that is not
1373 in this set. After the slash come sequences of characters that are counted
1374 for one syllable. These are preferred over using characters from the set.
1375 With the example "ideeen" has three syllables, counted by "i", "ee" and "e".
1376
1377 Only case-folded letters need to be included.
1378
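The counting described above can be sketched in Python. This is an
illustration, not Vim's code: it assumes the word is already case-folded, and
count_syllables() is a made-up name. Listed sequences are tried first (the
longest one wins); otherwise a run of characters from the set counts once.

```python
def count_syllables(word, syllable):
    """Sketch: count syllables using the value of a SYLLABLE item."""
    chars, *sequences = syllable.split("/")
    sequences = [s for s in sequences if s]
    count, in_run, i = 0, False, 0
    while i < len(word):
        # Listed sequences are preferred; take the longest one that matches.
        match = max((s for s in sequences if word.startswith(s, i)),
                    key=len, default="")
        if match:
            count += 1
            in_run = False
            i += len(match)
            continue
        if word[i] in chars:
            if not in_run:
                count += 1           # count a run of set characters once
                in_run = True
        else:
            in_run = False
        i += 1
    return count


SYLLABLE = "aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui"
print(count_syllables("ideeen", SYLLABLE))   # 3
```
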
1379 Another way to restrict compounding was mentioned above: Adding the
1380 |spell-COMPOUNDFORBIDFLAG| flag to an affix causes all words that are made
1381 with that affix to not be used for compounding.
1382
1383
1384 UNLIMITED COMPOUNDING *spell-NOBREAK*
1385
1386 For some languages, such as Thai, there is no space in between words. This
1387 looks like all words are compounded. To specify this use the NOBREAK item in
1388 the affix file, without arguments:
1389 NOBREAK ~
1390
1391 Vim will try to figure out where one word ends and the next starts. When there
1392 are spelling mistakes this may not be quite right.
1393
1394
1395 *spell-COMMON*
1396 Common words can be specified with the COMMON item. This will give better
1397 suggestions when editing a short file. Example:
1398
1399 COMMON the of to and a in is it you that he she was for on are ~
1400
1401 The words must be separated by white space, up to 25 per line.
1402 When multiple regions are specified in a ":mkspell" command the common words
1403 for all regions are combined and used for all regions.
1404
1405 *spell-NOSPLITSUGS*
1406 This item indicates that splitting a word to make suggestions is not a good
1407 idea. Split-word suggestions will appear only when there are few similar
1408 words.
1409
1410 NOSPLITSUGS ~
1411
1412 *spell-NOSUGGEST*
1413 The flag specified with NOSUGGEST can be used for words that will not be
1414 suggested. Can be used for obscene words.
1415
1416 NOSUGGEST % ~
1417
1418
1419 REPLACEMENTS *spell-REP*
1420
1421 In the affix file REP items can be used to define common mistakes. This is
1422 used to make spelling suggestions. The items define the "from" text and the
1423 "to" replacement. Example:
1424
1425 REP 4 ~
1426 REP f ph ~
1427 REP ph f ~
1428 REP k ch ~
1429 REP ch k ~
1430
1431 The first line specifies the number of REP lines following. Vim ignores the
1432 number, but it must be there (for compatibility with Myspell).
1433
1434 Don't include simple one-character replacements or swaps. Vim will try these
1435 anyway. You can include whole words if you want to, but you might want to use
1436 the "file:" item in 'spellsuggest' instead.
1437
1438 You can include a space by using an underscore:
1439
1440 REP the_the the ~
1441
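As a rough Python sketch of what REP items contribute, the code below applies
each item once at every place its "from" text occurs, with an underscore
standing for a space. Vim then still has to check and score the candidates;
that part is not shown. rep_candidates() is a made-up name.

```python
def rep_candidates(word, rep_items):
    """Sketch: candidate words from applying one REP item at a time."""
    candidates = []
    for frm, to in rep_items:
        frm, to = frm.replace("_", " "), to.replace("_", " ")
        start = word.find(frm)
        while start != -1:
            candidates.append(word[:start] + to + word[start + len(frm):])
            start = word.find(frm, start + 1)
    return candidates


REP = [("f", "ph"), ("ph", "f"), ("k", "ch"), ("ch", "k")]
print(rep_candidates("fotograf", REP))   # ['photograf', 'fotograph']
```
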
1442
1443 SIMILAR CHARACTERS *spell-MAP* *E783*
1444
1445 In the affix file MAP items can be used to define letters that are very much
1446 alike. This is mostly used for a letter with different accents. This is used
1447 to prefer suggestions with these letters substituted. Example:
1448
1449 MAP 2 ~
1450 MAP eéëêè ~
1451 MAP uüùúû ~
1452
1453 The first line specifies the number of MAP lines following. Vim ignores the
1454 number, but the line must be there.
1455
1456 Each letter must appear in only one of the MAP items. It's a bit more
1457 efficient if the first letter is ASCII or at least one without accents.
1458
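A Python sketch of what MAP items express: two words that differ only in
letters from the same MAP item are considered very much alike. It also checks
that each letter appears in only one item. build_map() and map_similar() are
made-up names; Vim uses this information for scoring suggestions, which is
not shown.

```python
def build_map(map_items):
    """Sketch: map every letter to the index of the MAP item it is in."""
    group = {}
    for index, letters in enumerate(map_items):
        for ch in letters:
            if ch in group:
                raise ValueError(repr(ch) + " appears in more than one MAP item")
            group[ch] = index
    return group


def map_similar(a, b, group):
    """True when a and b differ only in letters from the same MAP item."""
    return len(a) == len(b) and all(
        x == y or (x in group and group.get(y) == group[x])
        for x, y in zip(a, b))


GROUP = build_map(["eéëêè", "uüùúû"])
print(map_similar("cafe", "café", GROUP))   # True
print(map_similar("cafe", "cafu", GROUP))   # False
```
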
1459
1460 .SUG FILE *spell-NOSUGFILE*
1461
1462 When soundfolding is specified in the affix file then ":mkspell" will normally
1463 produce a .sug file next to the .spl file. This file is used to quickly find
1464 suggestions by their sound-a-like form, at the cost of a lot of memory (the
1465 amount depends on the number of words; |:mkspell| will display an estimate
1466 when it's done).
1467
1468 To avoid producing a .sug file use this item in the affix file:
1469
1470 NOSUGFILE ~
1471
1472 Users can simply omit the .sug file if they don't want to use it.
1473
1474
1475 SOUND-A-LIKE *spell-SAL*
1476
1477 In the affix file SAL items can be used to define the sound-a-like mechanism
1478 to be used. The main items define the "from" text and the "to" replacement.
1479 Simplistic example:
1480
1481 SAL CIA X ~
1482 SAL CH X ~
1483 SAL C K ~
1484 SAL K K ~
1485
1486 There are a few rules and this can become quite complicated. An explanation of
1487 how it works can be found in the Aspell manual:
1488 http://aspell.net/man-html/Phonetic-Code.html.
1489
1490 There are a few special items:
1491
1492 SAL followup true ~
1493 SAL collapse_result true ~
1494 SAL remove_accents true ~
1495
1496 "1" has the same meaning as "true". Any other value means "false".
1497
1498
1499 SIMPLE SOUNDFOLDING *spell-SOFOFROM* *spell-SOFOTO*
1500
1501 The SAL mechanism is complex and slow. A simpler mechanism is mapping all
1502 characters to another character, mapping similar sounding characters to the
1503 same character. At the same time this does case folding. You cannot have
1504 both SAL items and simple soundfolding.
1505
1506 There are two items required: one to specify the characters that are mapped
1507 and one that specifies the characters they are mapped to. They must have
1508 exactly the same number of characters. Example:
1509
1510 SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
1511 SOFOTO ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~
1512
1513 In the example all vowels are mapped to the same character 'e'. Another
1514 method would be to leave out all vowels. Some characters that sound nearly
1515 the same and are often mixed up, such as 'm' and 'n', are mapped to the same
1516 character. Don't do this too much, all words will start looking alike.
1517
1518 Characters that do not appear in SOFOFROM will be left out, except that all
1519 white space is replaced by one space. Sequences of the same character in
1520 SOFOFROM are replaced by one.
1521
1522 You can use the |soundfold()| function to try out the results, or set the
1523 'verbose' option to see the score in the output of the |z=| command.
1524
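The simple soundfolding with the example SOFOFROM and SOFOTO lines can be
sketched in Python. This is an assumption-laden illustration, not Vim's code:
it reads "Sequences of the same character are replaced by one" as collapsing
repeated output characters, and sofo_fold() is a made-up name. Compare its
results with |soundfold()| before relying on them.

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"


def sofo_fold(word, sofofrom=SOFOFROM, sofoto=SOFOTO):
    """Sketch of simple soundfolding with SOFOFROM/SOFOTO."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    table = dict(zip(sofofrom, sofoto))
    result = []
    for ch in word:
        if ch.isspace():
            c = " "                  # all white space becomes one space
        elif ch in table:
            c = table[ch]
        else:
            continue                 # characters not in SOFOFROM are left out
        if not result or result[-1] != c:
            result.append(c)         # collapse repeats of the same character
    return "".join(result)


print(sofo_fold("Hello"), sofo_fold("mom"), sofo_fold("nun"))   # hele nen nen
```

Because 'm' and 'n' map to the same character and all vowels map to 'e',
"mom" and "nun" fold to the same sound-a-like form.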
1525
1526 UNSUPPORTED ITEMS *spell-affix-not-supported*
1527
1528 These items appear in the affix file of other spell checkers. In Vim they are
1529 ignored, not supported or defined in another way.
1530
1531 ACCENT (Hunspell) *spell-ACCENT*
1532 Use MAP instead. |spell-MAP|
1533
1534 BREAK (Hunspell) *spell-BREAK*
1535 Define break points. Unclear how it works exactly.
1536 Not supported.
1537
1538 CHECKCOMPOUNDCASE (Hunspell) *spell-CHECKCOMPOUNDCASE*
1539 Disallow uppercase letters at compound word boundaries.
1540 Not supported.
1541
1542 CHECKCOMPOUNDDUP (Hunspell) *spell-CHECKCOMPOUNDDUP*
1543 Disallow using the same word twice in a compound. Not
1544 supported.
1545
1546 CHECKCOMPOUNDREP (Hunspell) *spell-CHECKCOMPOUNDREP*
1547 Something about using REP items and compound words. Not
1548 supported.
1549
1550 CHECKCOMPOUNDTRIPLE (Hunspell) *spell-CHECKCOMPOUNDTRIPLE*
1551 Forbid three identical characters when compounding. Not
1552 supported.
1553
1554 CHECKSHARPS (Hunspell) *spell-CHECKSHARPS*
1555 SS letter pair in uppercased (German) words may be upper case
1556 sharp s (ß). Not supported.
1557
1558 COMPLEXPREFIXES (Hunspell) *spell-COMPLEXPREFIXES*
1559 Enables using two prefixes. Not supported.
1560
1561 COMPOUND (Hunspell) *spell-COMPOUND*
1562 This is one line with the count of COMPOUND items, followed by
1563 that many COMPOUND lines with a pattern.
1564 Remove the first line with the count and rename the other
1565 items to COMPOUNDRULE |spell-COMPOUNDRULE|
1566
COMPOUNDFIRST (Hunspell)				*spell-COMPOUNDFIRST*
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDBEGIN (Hunspell)				*spell-COMPOUNDBEGIN*
		Words marked with the COMPOUNDBEGIN flag may be the first
		element of a compound word.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDLAST (Hunspell)					*spell-COMPOUNDLAST*
		Words marked with the COMPOUNDLAST flag may be the last
		element of a compound word.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDEND (Hunspell)					*spell-COMPOUNDEND*
		Probably the same as COMPOUNDLAST.

COMPOUNDMIDDLE (Hunspell)				*spell-COMPOUNDMIDDLE*
		Words marked with the COMPOUNDMIDDLE flag may be middle
		elements of a compound word.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDRULES (Hunspell)				*spell-COMPOUNDRULES*
		Number of COMPOUNDRULE lines following. Ignored, but the
		argument must be a number.

COMPOUNDSYLLABLE (Hunspell)			*spell-COMPOUNDSYLLABLE*
		Use SYLLABLE and COMPOUNDSYLMAX instead. |spell-SYLLABLE|
		|spell-COMPOUNDSYLMAX|

KEY (Hunspell)							*spell-KEY*
		Define characters that are close together on the keyboard.
		Used to give better suggestions. Not supported.

LANG (Hunspell)							*spell-LANG*
		This specifies language-specific behavior. This actually
		moves part of the language knowledge into the program,
		therefore Vim does not support it. In Vim each language
		property must be specified separately.

LEMMA_PRESENT (Hunspell)				*spell-LEMMA_PRESENT*
		Only needed for morphological analysis.

MAXNGRAMSUGS (Hunspell)					*spell-MAXNGRAMSUGS*
		Set number of n-gram suggestions. Not supported.

PSEUDOROOT (Hunspell)					*spell-PSEUDOROOT*
		Use NEEDAFFIX instead. |spell-NEEDAFFIX|

SUGSWITHDOTS (Hunspell)					*spell-SUGSWITHDOTS*
		Adds dots to suggestions. Vim doesn't need this.

SYLLABLENUM (Hunspell)					*spell-SYLLABLENUM*
		Not supported.

TRY (Myspell, Hunspell, others)					*spell-TRY*
		Vim does not use the TRY item; it is ignored. For making
		suggestions the characters that actually occur in the words
		are used, which is much more efficient.

WORDCHARS (Hunspell)					*spell-WORDCHARS*
		Used to recognize words. Vim doesn't need it, because there
		is no need to separate words before checking them (using a
		trie instead of a hashtable).
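
The COMPOUND conversion described above is mechanical, so it can be scripted
when preparing a Hunspell affix file for Vim. The following is a minimal
Python sketch. It assumes the affix file has already been read into a list of
lines and that the first COMPOUND line holds only the count. The function
name convert_compound() is hypothetical and is not part of Vim or Hunspell.
Items that merely start with "COMPOUND", such as COMPOUNDMIN, are copied
unchanged because the first field must match exactly.

```python
def convert_compound(lines):
    """Rewrite Hunspell COMPOUND items as Vim COMPOUNDRULE items.

    Hypothetical helper: the first COMPOUND line is assumed to hold only
    the number of patterns and is dropped; every later COMPOUND line is
    renamed to COMPOUNDRULE.  Lines whose first field is not exactly
    "COMPOUND" (COMPOUNDMIN, COMPOUNDFLAG, ...) are copied unchanged.
    """
    result = []
    count_removed = False
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "COMPOUND":
            result.append(line)
            continue
        if not count_removed and len(fields) == 2 and fields[1].isdigit():
            count_removed = True  # this is the "COMPOUND <count>" line
            continue
        result.append("COMPOUNDRULE " + " ".join(fields[1:]))
    return result


aff = ["COMPOUND 2", "COMPOUND ABC", "COMPOUND A*B", "COMPOUNDMIN 3"]
print(convert_compound(aff))
# ['COMPOUNDRULE ABC', 'COMPOUNDRULE A*B', 'COMPOUNDMIN 3']
```

The patterns "ABC" and "A*B" in the example are made up. Only the renaming
and the dropped count line matter. A COMPOUNDRULES count line needs no
conversion, since Vim ignores it.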

==============================================================================
5. Spell checker design					*develop-spell*

When spell checking was going to be added to Vim, a survey was done of the
available spell checking libraries and programs. Unfortunately, the result
was that none of them provided sufficient capabilities to be used as the spell
checking engine in Vim, for various reasons:

- Missing support for multi-byte encodings. At least UTF-8 must be supported,
  so that more than one language can be used in the same file.
  Doing on-the-fly conversion is not always possible (would require iconv
  support).
- For the programs and libraries: Using them as-is would require installing
  them separately from Vim. That is usually possible, but it is a drawback.
- Performance: A few tests showed that it's possible to check spelling on the
  fly (while redrawing), just like syntax highlighting. But the mechanisms
  used by other code are much slower. Myspell uses a hashtable, for example.
  The affix compression that most spell checkers use makes it slower too.
- For using an external program like aspell a communication mechanism would
  have to be set up. That's complicated to do in a portable way (Unix-only
  would be relatively simple, but that's not good enough). And performance
  will become a problem (lots of process switching involved).
- Missing support for words with non-word characters, such as "Etten-Leur" and
  "et al.". Working around that would require marking the pieces of them OK,
  lowering the reliability.
- Missing support for regions or dialects. This makes it difficult to accept
  all English words and highlight non-Canadian words differently.
- Missing support for rare words. Many words are correct but hardly ever used
  and could be a misspelled often-used word.
- For making suggestions the speed is less important, and requiring the user
  to install another program or library would be acceptable. But the word
  lists would probably differ, so the suggestions may be words that Vim
  considers wrong.


Spelling suggestions				*develop-spell-suggestions*

For making suggestions there are two basic mechanisms:
1. Try changing the bad word a little bit and check for a match with a good
   word. Or go through the list of good words, change them a little bit and
   check for a match with the bad word. The changes are deleting a character,
   inserting a character, swapping two characters, etc.
2. Perform soundfolding on both the bad word and the good words and then find
   matches, possibly with a few changes like with the first mechanism.
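
The first mechanism can be illustrated with a plain set of good words. The
following Python sketch generates every string that is one change away from
the bad word: delete a character, insert a character, swap two neighbours, or
replace a character. It then keeps the strings that are good words. This is a
toy example, not how Vim does it. The function name and the lowercase ASCII
alphabet are assumptions. Note that it tries every letter at every position,
which is the cost that a trie avoids.

```python
import string


def one_edit_suggestions(bad, good_words, alphabet=string.ascii_lowercase):
    """Return the good words that are one simple change away from bad.

    Every candidate is generated blindly: each letter of the alphabet is
    tried at every position, whether or not it can lead to a good word.
    """
    candidates = set()
    for i in range(len(bad) + 1):
        head, tail = bad[:i], bad[i:]
        if tail:
            candidates.add(head + tail[1:])                      # delete
        if len(tail) > 1:
            candidates.add(head + tail[1] + tail[0] + tail[2:])  # swap
        for ch in alphabet:
            candidates.add(head + ch + tail)                     # insert
            if tail:
                candidates.add(head + ch + tail[1:])             # replace
    return sorted(w for w in candidates if w in good_words)


good = {"halo", "hello", "help", "hero", "the"}
print(one_edit_suggestions("helo", good))  # ['halo', 'hello', 'help', 'hero']
print(one_edit_suggestions("teh", good))   # ['the']
```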

The first is good for finding typing mistakes. After experimenting with
hashtables and looking at solutions from other spell checkers the conclusion
was that a trie (a kind of tree structure) is ideal for this, both for
reducing memory use and for being able to try sensible changes. For example,
when inserting a character only characters that lead to good words need to be
tried. Other mechanisms (with hashtables) need to try all possible letters at
every position in the word. Also, a hashtable requires word boundaries to be
identified separately, while a trie does not. That makes the mechanism a lot
simpler.
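
The trie argument can be illustrated with a common technique. The trie of
good words is walked depth-first while carrying one row of an edit-distance
table. As a result, only characters that actually occur below the current node
are tried, and a branch is dropped once it can no longer reach a good word
within the allowed number of changes. This is a Python sketch under those
assumptions, not Vim's C implementation. The names TrieNode, build_trie() and
trie_suggestions() are hypothetical. Plain Levenshtein distance is used, so
swapping two characters counts as two changes here.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if the path to here spells a good word


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def trie_suggestions(root, bad, max_cost=1):
    """Return (word, cost) pairs for good words within max_cost edits.

    Depth-first walk that carries one row of the Levenshtein table for
    the current trie prefix.  Only characters that exist below the
    current node are tried, and a subtree is skipped as soon as no
    entry in its row is within max_cost.
    """
    results = []

    def walk(node, prefix, prev_row):
        for ch, child in node.children.items():
            row = [prev_row[0] + 1]
            for i in range(1, len(bad) + 1):
                row.append(min(
                    row[i - 1] + 1,                        # extra char in bad
                    prev_row[i] + 1,                       # extra char in word
                    prev_row[i - 1] + (bad[i - 1] != ch),  # match or replace
                ))
            word = prefix + ch
            if child.is_word and row[-1] <= max_cost:
                results.append((word, row[-1]))
            if min(row) <= max_cost:
                walk(child, word, row)

    walk(root, "", list(range(len(bad) + 1)))
    return sorted(results, key=lambda item: (item[1], item[0]))


trie = build_trie(["the", "then", "than", "hen", "tea"])
print(trie_suggestions(trie, "thn"))
# [('than', 1), ('the', 1), ('then', 1)]
```

The row minimum is a lower bound for every word below a node, so pruning on
it never loses a match. Sharing prefixes is what keeps the number of tried
characters small.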

Soundfolding is useful when someone knows how a word sounds but doesn't know
how it is spelled. For example, the word "dictionary" might be written as
"daktonerie". The number of changes that the first method would need to try
is very large, so it's hard to find the good word that way. After
soundfolding the words become "tktnr" and "tkxnry", which differ by only two
letters.
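
The difference described above can be checked with a textbook Levenshtein
distance. This Python sketch uses the two soundfolded strings given in the
text. The edit_distance() helper is a generic illustration, and its name is
hypothetical. It is not necessarily how Vim scores suggestions.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1,              # insert cb
                           prev[j] + 1,                 # delete ca
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


print(edit_distance("tktnr", "tkxnry"))           # 2 (soundfolded forms)
print(edit_distance("dictionary", "daktonerie"))  # 6 (original spellings)
```

Six changes on the original spellings against two after soundfolding shows
why a search over small changes only works well on the folded form.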

To find words by their soundfolded equivalent (soundalike word) we need a list
of all soundfolded words. A few experiments have been done to find out what
the best method is. Alternatives:
1. Do the sound folding on the fly when looking for suggestions. This means
   walking through the trie of good words, soundfolding each word and
   checking how different it is from the bad word. This is very efficient for
   memory use, but takes a long time. On a fast PC it takes a couple of
   seconds for English, which can be acceptable for interactive use. But for
   some languages it takes more than ten seconds (e.g., German, Catalan),
   which is unacceptably slow. For batch processing (automatic corrections)
   it's too slow for all languages.
2. Use a trie for the soundfolded words, so that searching can be done just
   like it is done without soundfolding. This requires remembering a list
   of good words for each soundfolded word. This makes finding matches very
   fast but requires quite a lot of memory, in the order of 1 to 10 Mbyte.
   For some languages this is more than the original word list.
3. Like the second alternative, but reduce the amount of memory by using affix
   compression and store only the soundfolded basic word. This is what Aspell
   does. The disadvantage is that affixes need to be stripped from the bad
   word before soundfolding it, which means that mistakes at the start and/or
   end of the word will cause the mechanism to fail. Also, this becomes slow
   when the bad word is quite different from the good word.

The choice made is to use the second mechanism and store the soundfold data in
a separate file. This way a user with sufficient memory can get very good
suggestions, while a user who is short of memory, or just wants spell checking
without suggestions, doesn't use so much memory.
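
The second alternative can be sketched in Python as follows. This is a
simplified illustration, not Vim's implementation. It keeps, for each
soundfolded word, the list of good words that fold to it. The folding function
toy_fold() is made up for this example and is not Vim's SAL or SOFO
processing. A plain dict scanned with an edit distance also stands in for the
trie, so lookups here are slower than a trie walk would be.

```python
from collections import defaultdict

# Hypothetical folding rules for illustration only; these are NOT Vim's
# SAL or SOFO rules.
SIMILAR = str.maketrans({"c": "k", "q": "k", "z": "s", "v": "f"})


def toy_fold(word):
    """Lowercase, merge a few similar letters, drop vowels after the
    first letter and skip a letter equal to the previously kept one."""
    w = word.lower().translate(SIMILAR)
    out = w[:1]
    for ch in w[1:]:
        if ch not in "aeiouy" and ch != out[-1]:
            out += ch
    return out


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_sound_index(words):
    """Map each soundfolded key to the list of good words that fold to it."""
    index = defaultdict(list)
    for word in words:
        index[toy_fold(word)].append(word)
    return index


def sound_suggestions(index, bad, max_cost=1):
    """Fold the bad word and collect the good words of every key that is
    within max_cost edits of it.  A dict scan replaces the trie walk."""
    key = toy_fold(bad)
    found = []
    for folded, words in index.items():
        if edit_distance(key, folded) <= max_cost:
            found.extend(words)
    return sorted(found)


index = build_sound_index(["dictionary", "diction", "dignitary", "stationery"])
print(sound_suggestions(index, "daktonerie"))  # ['diction', 'dictionary']
```

With this toy folding both "dictionary" and "daktonerie" fold to "dktnr". The
correct word is therefore found even though the spellings are far apart.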


Word frequency

For sorting suggestions it helps to know which words are common. In theory we
could store a word frequency with the word in the dictionary. However, this
requires storing a count per word, which degrades word tree compression a lot.
And maintaining the word frequency for all languages would be a heavy task.
Also, it would be nice to prefer words that already appear in the text being
edited for suggestions.

What has been implemented is to count the words that are seen while
displaying text. A hashtable is used to quickly find the word count. The
count is initialized from words listed in COMMON items in the affix file, so
that it also works when starting a new file.

This isn't ideal, because the longer Vim is running the higher the counts
become. But in practice it is a noticeable improvement over not using the
word count.
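
The counting scheme can be sketched as follows. This is a hypothetical Python
illustration, not Vim's code. A Counter plays the role of the hashtable, and
COMMON words start with a count of one, which is an assumed value. Words are
split with a simple ASCII regular expression. To keep it short, the count is
the only sort key.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")  # simplistic ASCII word splitting


def init_counts(common_words):
    """Seed the table from COMMON words (initial count of 1 is an assumption)."""
    return Counter({word: 1 for word in common_words})


def count_displayed(counts, text):
    """Add one for every word in a piece of text that was displayed."""
    for word in WORD_RE.findall(text):
        counts[word.lower()] += 1


def rank_suggestions(counts, suggestions):
    """Most frequently seen first; sorted() is stable, so ties keep order."""
    return sorted(suggestions, key=lambda word: -counts[word])


counts = init_counts(["the", "then"])
count_displayed(counts, "Then the than then.")
print(rank_suggestions(counts, ["than", "the", "then", "thin"]))
# ['then', 'the', 'than', 'thin']
```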

 vim:tw=78:sw=4:ts=8:noet:ft=help:norl: