*spell.txt*	Nvim


		  VIM REFERENCE MANUAL	  by Bram Moolenaar


Spell checking						*spell*

				      Type |gO| to see the table of contents.

==============================================================================
1. Quick start				*spell-quickstart* *E756*

This command switches on spell checking: >

	:setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
	SpellBad	word not recognized			|hl-SpellBad|
	SpellCap	word not capitalised			|hl-SpellCap|
	SpellRare	rare word				|hl-SpellRare|
	SpellLocal	wrong spelling for selected region	|hl-SpellLocal|

Vim only checks words for spelling; there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word, or it is set to "popup_setpos" and the mouse pointer is on a
badly spelled word, then the popup menu will contain a submenu to replace the
bad word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

							*]s*
]s			Move to next misspelled word after the cursor.
			A count before the command can be used to repeat.
			'wrapscan' applies.

							*[s*
[s			Like "]s" but search backwards, find the misspelled
			word before the cursor.  Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as bad.  Does not stop at a word with
			a missing capital at the start of a line.

							*]S*
]S			Like "]s" but only stop at bad words, not at rare
			words or words for another region.

							*[S*
[S			Like "]S" but search backwards.

							*]r*
]r			Move to next "rare" word after the cursor.
			A count before the command can be used to repeat.
			'wrapscan' applies.

							*[r*
[r			Like "]r" but search backwards, find the "rare"
			word before the cursor.
			Doesn't recognize words split over two lines, thus
			may stop at words that are not highlighted as rare.


To add words to your own word list:

							*zg*
zg			Add word under the cursor as a good word to the first
			name in 'spellfile'.  A count may precede the command
			to indicate the entry in 'spellfile' to be used.  A
			count of two uses the second entry.

			In Visual mode the selected characters are added as a
			word (including white space!).
			When the cursor is on text that is marked as badly
			spelled then the marked text is used.
			Otherwise the word under the cursor, separated by
			non-word characters, is used.

			If the word is explicitly marked as a bad word in
			another spell file the result is unpredictable.

							*zG*
zG			Like "zg" but add the word to the internal word list
			|internal-wordlist|.

							*zw*
zw			Like "zg" but mark the word as a wrong (bad) word.
			If the word already appears in 'spellfile' it is
			turned into a comment line.  See |spellfile-cleanup|
			for getting rid of those.

							*zW*
zW			Like "zw" but add the word to the internal word list
			|internal-wordlist|.

zuw							*zug* *zuw*
zug			Undo |zw| and |zg|, remove the word from the entry in
			'spellfile'.  Count used as with |zg|.

zuW							*zuG* *zuW*
zuG			Undo |zW| and |zG|, remove the word from the internal
			word list.  Count used as with |zg|.

						*:spe* *:spellgood* *E1280*
:[count]spe[llgood] {word}
			Add {word} as a good word to 'spellfile', like with
			|zg|.  Without count the first name is used, with a
			count of two the second entry, etc.

:spe[llgood]! {word}	Add {word} as a good word to the internal word list,
			like with |zG|.

							*:spellw* *:spellwrong*
:[count]spellw[rong] {word}
			Add {word} as a wrong (bad) word to 'spellfile', as
			with |zw|.  Without count the first name is used, with
			a count of two the second entry, etc.

:spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
			list, like with |zW|.

							*:spellra* *:spellrare*
:[count]spellra[re] {word}
			Add {word} as a rare word to 'spellfile', similar to
			|zw|.  Without count the first name is used, with
			a count of two the second entry, etc.

			There are no Normal mode commands to mark words as
			rare, because this is a fairly uncommon action and all
			intuitive key combinations are already taken.  If you
			want, you can add mappings such as: >
	nnoremap z? :exe ':spellrare ' .. expand('<cWORD>')<CR>
	nnoremap z/ :exe ':spellrare! ' .. expand('<cWORD>')<CR>
<			|:spellundo|, |zuw|, or |zuW| can be used to undo this.

:spellra[re]! {word}	Add {word} as a rare word to the internal word
			list, similar to |zW|.

:[count]spellu[ndo] {word}				*:spellu* *:spellundo*
			Like |zuw|.  [count] used as with |:spellgood|.

:spellu[ndo]! {word}	Like |zuW|.  [count] used as with |:spellgood|.


After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you change
'spellfile' manually you need to use the |:mkspell| command.  This sequence of
commands mostly works well: >
	:edit <file in 'spellfile'>
<	(make changes to the spell file) >
	:mkspell! %

More details about the 'spellfile' format are below, see
|spell-wordlist-format|.

							*internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored; it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
							*z=*
z=			For the word under/after the cursor suggest correctly
			spelled words.  This also works to find alternatives
			for a word that is not highlighted as a bad word,
			e.g., when the word after it is bad.
			In Visual mode the highlighted text is taken as the
			word to be replaced.
			The results are sorted on similarity to the word being
			replaced.
			This may take a long time.  Hit CTRL-C when you get
			bored.

			If the command is used without a count the
			alternatives are listed and you can enter the number
			of your choice or press <Enter> if you don't want to
			replace.  You can also use the mouse to click on your
			choice (only works if the mouse can be used in Normal
			mode and when there are no line wraps).  Click on the
			first line (the header) to cancel.

			The suggestions listed normally replace a highlighted
			bad word.  Sometimes they include other text; in that
			case the replaced text is also listed after a "<".

			If a count is given, that suggestion is used without
			prompting.  For example, "1z=" always takes the first
			suggestion.

			If 'verbose' is non-zero a score will be displayed
			with the suggestions to indicate how far each one is
			from the badly spelled word (the higher the score,
			the more different).
			When a word was replaced the redo command "." will
			repeat the word replacement.  This works like "ciw",
			the good word and <Esc>.  This does NOT work for Thai
			and other languages without spaces between words.

					*:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]		Repeat the replacement done by |z=| for all matches
			with the replaced word in the current window.

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See 'spellsuggest'.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how 'spellcapcheck' can be set automatically when 'spelllang' is set.

The 'spelloptions' option has a few more flags that influence the way spell
checking works.  For example, "camel" splits CamelCased words so that each
part of the word is spell-checked separately.

Vim counts the number of times a good word is encountered.  This is used to
sort the suggestions: words that have been seen before get a small bonus,
words that have been seen often get a bigger bonus.  The COMMON item in the
affix file can be used to define common words, so that this mechanism also
works in a new or short file |spell-COMMON|.

==============================================================================
2. Remarks on spell checking				*spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  As a result this uses a lot of memory (1 Mbyte or more).
There might also be a noticeable delay when the word list is loaded, which
happens when 'spell' is set and when 'spelllang' is set while 'spell' was
already set.  To minimize the delay each word list is loaded only once; it is
not unloaded when 'spelllang' is made empty or 'spell' is reset.  When
'encoding' is set all the word lists are reloaded, thus you may notice a delay
then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

	en		all regions
	en_au		Australia
	en_ca		Canada
	en_gb		Great Britain
	en_nz		New Zealand
	en_us		USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.
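A minimal sketch of selecting a region, assuming the English spell file
installed on your system provides the regions listed above: >
	" Accept British spellings; words that are only valid in another
	" English region are highlighted with SpellLocal.
	setlocal spell spelllang=en_gb
	" Accept the words of all English regions instead.
	setlocal spell spelllang=en
<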

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

							*spell-german*
Specific exception: For German these special regions are used:
	de		all German words accepted
	de_de		old and new spelling
	de_19		old spelling
	de_20		new spelling
	de_at		Austria
	de_ch		Switzerland

							*spell-russian*
Specific exception: For Russian these special regions are used:
	ru		all Russian words accepted
	ru_ru		"IE" letter spelling
	ru_yo		"YO" letter spelling

							*spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
	'encoding'	'spelllang'
	utf-8		yi		Yiddish
	latin1		yi		transliterated Yiddish
	utf-8		yi-tr		transliterated Yiddish

							*spell-cjk*
Chinese, Japanese and other East Asian characters are normally marked as
errors, because spell checking of these characters is not supported.  If
'spelllang' includes "cjk", these characters are not marked as errors.  This
is useful when editing text with spell checking while some Asian words are
present.


SPELL FILES						*spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
	LL	the language name
	EEE	the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
	'spelllang'	LL ~
	en_us		en
	en-rare		en-rare
	medical_ca	medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
	'runtimepath' is "~/.config/nvim,/usr/share/nvim/runtime/,~/.config/nvim/after"
	'encoding'    is "iso-8859-2"
	'spelllang'   is "pl"

Vim will look for:
1. ~/.config/nvim/spell/pl.iso-8859-2.spl
2. /usr/share/nvim/runtime/spell/pl.iso-8859-2.spl
3. ~/.config/nvim/spell/pl.iso-8859-2.add.spl
4. /usr/share/nvim/runtime/spell/pl.iso-8859-2.add.spl
5. ~/.config/nvim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.config/nvim/spell/pl.latin1.spl
2. /usr/share/nvim/runtime/spell/pl.latin1.spl
3. ~/.config/nvim/after/spell/pl.latin1.spl
4. ~/.config/nvim/spell/pl.ascii.spl
5. /usr/share/nvim/runtime/spell/pl.ascii.spl
6. ~/.config/nvim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).

A spell file might not be available in the current 'encoding'.
See |spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

						*spell-sug-file* *E781*
If there is a file with exactly the same name as the ".spl" file but ending in
".sug", that file will be used for giving better suggestions.  To reduce
memory use it isn't loaded until suggestions are requested.

				    *E758* *E759* *E778* *E779* *E780* *E782*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


SPELLFILE CLEANUP					*spellfile-cleanup*

The |zw| command turns existing entries in 'spellfile' into comment lines.
This avoids having to write a new file every time, but results in the file
only getting longer, never shorter.  To clean up the comment lines in all
".add" spell files do this: >
	:runtime spell/cleanadd.vim

This deletes all comment lines, except the ones that start with "##".  Use
"##" lines to add comments that you want to keep.

You can invoke this script as often as you like.  A variable is provided to
skip updating files that have been changed recently.  Set it to the number of
seconds that must have passed since a file was last changed before it is
cleaned.  For example, to clean only files that were not changed in the last
hour: >
	let g:spell_clean_limit = 60 * 60
The default is one second.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.
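A minimal sketch of generating a main spell file under a matching locale,
assuming hypothetical input files nl_NL.aff and nl_NL.dic in the current
directory and that a "nl_NL.UTF-8" locale exists on your system (locale names
differ between systems): >
	" Set the character-type locale that decides which characters are
	" upper/lower case letters while the word table is built.
	:language ctype nl_NL.UTF-8
	:mkspell! ~/.config/nvim/spell/nl nl_NL
<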

For a word that starts with a digit the digit is ignored, unless the word as a
whole is recognized.  Thus if "3D" is a word and "D" is not then "3D" is
recognized as a word, but if "3D" is not a word then only the "D" is marked as
bad.  Hex numbers in the form 0x12ab and 0X12AB are recognized.


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn, Vim won't look at the previous line; thus when "et" is at the end
of the previous line, "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as "*", '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING					*spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere			   default
2.  in specific items		   use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
Also see |:syn-spell| for text that is not in a syntax item.
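A minimal sketch of the second method for a hypothetical "mylang" syntax file,
assuming its comments start with "#".  Only comments are spell checked, and
URLs inside them are excluded again with @NoSpell (the group names are
illustrative): >
	" Text that is not in any syntax item is not spell checked.
	syntax spell notoplevel
	" Spell check comments, allowing the URL item inside them.
	syntax match mylangComment "#.*$" contains=@Spell,mylangUrl
	" Do not spell check URLs inside comments.
	syntax match mylangUrl "\<https\?://\S\+" contained contains=@NoSpell
<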


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()	find badly spelled word at the cursor
    spellsuggest()	get list of spelling suggestions
    soundfold()		get the sound-a-like version of a word


SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" and "spell/LANG.lua" in 'runtimepath'.  "LANG" is the
value of 'spelllang' up to the first comma, dot or underscore.  This can be
used to set options specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files.  Use this command to see what
they do: >
	:next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value.  The assumption is that the user prefers the changed value.


DOUBLE SCORING						*spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since the scores for these two kinds of mistakes will be very different, we
use a list for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistake.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.

==============================================================================
3. Generating a spell file				*spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
						    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  The OpenOffice .oxt
files are zip files which contain the .aff and .dic files.  You should be able
to find them here:
	https://extensions.openoffice.org/en/search@f%5B0%5D%3Dfield_project_tags%253A311.html
The older, OpenOffice 2 files may be used if this doesn't work:
	http://wiki.services.openoffice.org/wiki/Dictionaries
You can also use a plain word list.  The results are the same; the choice
depends on what word lists you can find.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

							*:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
			Generate a Vim spell file from word lists.  Example: >
		:mkspell /tmp/nl nl_NL.words
<								*E751*
			When {outname} ends in ".spl" it is used as the output
			file name.  Otherwise it should be a language name,
			such as "en", without the region name.  The file
			written will be "{outname}.{encoding}.spl", where
			{encoding} is the value of the 'encoding' option.

			When the output file already exists [!] must be used
			to overwrite it.

			When the [-ascii] argument is present, words with
			non-ascii characters are skipped.  The resulting file
			ends in "ascii.spl".

			The input can be the Myspell format files {inname}.aff
			and {inname}.dic.
			If {inname}.aff does not exist then
			{inname} is used as the file name of a plain word
			list.

			Multiple {inname} arguments can be given to combine
			regions into one Vim spell file.  Example: >
		:mkspell ~/.config/nvim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<			This combines the English word lists for US, CA and AU
			into one en.spl file.
			Up to eight regions can be combined. *E754* *E755*
			The REP and SAL items of the first .aff file where
			they appear are used. |spell-REP| |spell-SAL|
								*E845*
			This command uses a lot of memory, which is required
			to find the optimal word tree (Polish, Italian and
			Hungarian require several hundred Mbyte).  The final
			result will be much smaller, because compression is
			used.  To avoid running out of memory compression
			will be done now and then.  This can be tuned with
			the 'mkspellmem' option.

			If the spell file was being used in a buffer, it is
			reloaded automatically after it has been written.

:mksp[ell] [-ascii] {name}.{enc}.add
			Like ":mkspell" above, using {name}.{enc}.add as the
			input file and producing an output file in the same
			directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
			Like ":mkspell" above, using {name} as the input file
			and producing an output file in the same directory
			that has ".{enc}.spl" appended.

Vim will report the number of duplicate words.  A duplicate might be a mistake
in the list of words, but sometimes duplicates are used on purpose to give the
same basic word different prefixes and suffixes that must not combine (e.g.
Czech uses this).  If you want Vim to report all duplicate words set the
'verbose' option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use |diff-mode| to see what changed: >
	nvim -d xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS					*E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages.  Vim will check
the validity of the spell file and report anything wrong.

	E771: Old spell file, needs to be updated ~
This spell file is older than your Vim.  You need to update the .spl file.

	E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim.  You need to
update Vim.

	E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work.  In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

							*:spelldump* *:spelld*
:spelld[ump]		Open a new window and fill it with all currently valid
			words.  Compound words are not included.
			Note: For some languages the result may be enormous,
			causing Vim to run out of memory.

:spelld[ump]!		Like ":spelldump" and include the word count.  This is
			the number of times the word was found while
			updating the screen.  Words that are in COMMON items
			get a starting count of 10.

The dumped words use the word list format, see |spell-wordlist-format|.  You
should be able to read it with ":mkspell" to generate one .spl file that
includes all the words.

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.


SPELL FILE MISSING				*spell-SpellFileMissing*

If a spell file is missing, the user is asked whether to download it.  See
|spellfile.lua|.

							*E797*
Note that the SpellFileMissing autocommand must not change or destroy the
buffer the user was editing.

==============================================================================
4. Spell file format					*spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK and thus should not be
highlighted.  Person and company names will not appear in a dictionary, but do
appear in a word list.  And some old words are rarely used while they are
common misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.
The files with affix compression are used by Myspell (Mozilla and
OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST				*spell-wordlist-format*

The words must appear one per line.  That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

	# comment ~
- Lines starting with a # are ignored (comment lines).

	/encoding=utf-8 ~
- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.  Thus you can use
  one word list for several target encodings.

	/regions=usca ~
- A line starting with "/regions=" specifies the region names that are
  supported.  Each region name must be two ASCII letters.  The first one is
  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be the same as in the main
  word list!

- Other lines starting with '/' are reserved for future use.  The ones that
  are not recognized are ignored.  You do get a warning message, so that you
  know something won't work.

- A "/" may follow the word with the following items:
    =		Case must match exactly.
    ?		Rare word.
    !		Bad (wrong) word.
    1 to 9	A region in which the word is valid.  If no regions are
		specified the word is valid in all regions.

Example:

	# This is an example word list		comment
	/encoding=latin1			encoding of the file
	/regions=uscagb				regions "us", "ca" and "gb"
	example					word for all regions
	blah/12					word for regions "us" and "ca"
	vim/!					bad word
	Campbell/?3				rare word in region 3 "gb"
	's mornings/=				keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted.
This is different from a word with mixed case that is automatically
marked as keep-case; those words may appear in all upper-case letters.


FORMAT WITH .AFF AND .DIC FILES				*aff-dic-format*

There are two files: the basic word list and an affix file.  The affix file
specifies settings for the language and can contain affixes.  The affixes are
used to modify the basic words to get the full word list.  This significantly
reduces the number of words, especially for a language like Polish.  This is
called affix compression.

The basic word list and the affix file are combined with the ":mkspell"
command, which results in a binary spell file.  All the preprocessing has been
done, thus this file loads fast.  The binary spell file format is described in
the source code (src/spell.c).  But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made.  The tools for this can be found in the
"src/spell" directory.

The format for the affix and word list files is based on what Myspell uses
(the spell checker of Mozilla and OpenOffice.org).  A description can be found
here:
	https://lingucomponent.openoffice.org/affix.readme
Note that affixes are case sensitive; this isn't obvious from the description.

Vim supports quite a few extras.  They are described below |spell-affix-vim|.
Attempts have been made to keep this compatible with other spell checkers, so
that the same files can often be used.  One other project that offers more
than Myspell is Hunspell ( https://hunspell.github.io ).


WORD LIST FORMAT					*spell-dic-format*

A short example, with line numbers:

	    1	1234 ~
	    2	aan ~
	    3	Als ~
	    4	Etten-Leur ~
	    5	et al. ~
	    6	's-Gravenhage ~
	    7	's-Gravenhaags ~
	    8	# word that differs between regions ~
	    9	kado/1 ~
	   10	cadeau/2 ~
	   11	TCP,IP ~
	   12	/the S affix may add a 's' ~
	   13	bedel/S ~

The first line contains the number of words.  Vim ignores it, but you do get
an error message if it's not there.					*E760*

What follows is one word per line.  White space at the end of the line is
ignored, all other white space matters.  The encoding is specified in the
affix file |spell-SET|.

Comment lines start with '#' or '/'.  See the example lines 8 and 12.  Note
that putting a comment after a word is NOT allowed:

		someword   # comment that causes an error! ~

After the word there is an optional slash and flags.  Most of these flags are
letters that indicate the affixes that can be used with this word.  These are
specified with SFX and PFX lines in the .aff file, see |spell-SFX| and
|spell-PFX|.  Vim allows using other flag types with the FLAG item in the
affix file |spell-FLAG|.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position.  The same word with a lower-case letter at this
position will not match.  When some of the other letters are upper-case it
will not match either.

The word with all upper-case characters will always be OK:

	word list	matches			does not match ~
	als		als Als ALS		ALs AlS aLs aLS
	Als		Als ALS			als ALs AlS aLs aLS
	ALS		ALS			als Als ALs AlS aLs aLS
	AlS		AlS ALS			als Als ALs aLs aLS

The KEEPCASE affix ID can be used to specifically match a word with identical
case only, see below |spell-KEEPCASE|.

Note: in lines 5 to 7 non-word characters are used.  You can include any
character in a word.
When checking the text a word still only matches when it 800 appears with a non-word character before and after it. For Myspell a word 801 starting with a non-word character probably won't work. 802 803 In line 12 the word "TCP/IP" is defined. Since the slash has a special 804 meaning the comma is used instead. This is defined with the SLASH item in the 805 affix file, see |spell-SLASH|. Note that without this SLASH item the word 806 will be "TCP,IP". 807 808 809 AFFIX FILE FORMAT *spell-aff-format* *spell-affix-vim* 810 811 *spell-affix-comment* 812 Comment lines in the .aff file start with a '#': 813 814 # comment line ~ 815 816 Items with a fixed number of arguments can be followed by a comment. But only 817 if none of the arguments can contain white space. The comment must start with 818 a "#" character. Example: 819 820 KEEPCASE = # fix case for words with this flag ~ 821 822 823 ENCODING *spell-SET* 824 825 The affix file can be in any encoding that is supported by "iconv". However, 826 in some cases the current locale should also be set properly at the time 827 |:mkspell| is invoked. Adding FOL/LOW/UPP lines removes this requirement 828 |spell-FOL|. 829 830 The encoding should be specified before anything where the encoding matters. 831 The encoding applies both to the affix file and the dictionary file. It is 832 done with a SET line: 833 834 SET utf-8 ~ 835 836 The encoding can be different from the value of the 'encoding' option at the 837 time ":mkspell" is used. Vim will then convert everything to 'encoding' and 838 generate a spell file for 'encoding'. If some of the used characters to not 839 fit in 'encoding' you will get an error message. 840 *spell-affix-mbyte* 841 When using a multibyte encoding it's possible to use more different affix 842 flags. But Myspell doesn't support that, thus you may not want to use it 843 anyway. For compatibility use an 8-bit encoding. 
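The case rules of the word list format (|spell-dic-format|) and the KEEPCASE
flag can be modeled with a few string comparisons. The following is a minimal
Python sketch under simplifying assumptions, not Vim's C implementation: the
word is assumed to be already found in the word list, and `case_matches()`
and its `keepcase` parameter are hypothetical names.

```python
def _capitalize_first_letter(word: str) -> str:
    """Upper-case the first alphabetic character, e.g. "'s morgens" -> "'S morgens"."""
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + ch.upper() + word[i + 1:]
    return word


def case_matches(dict_word: str, text: str, keepcase: bool = False) -> bool:
    """Return True if `text` is an accepted form of the word-list entry `dict_word`."""
    if text == dict_word:
        return True
    if keepcase:
        # KEEPCASE (e.g. "/=" with "KEEPCASE =" in the .aff file): identical case only.
        return False
    if text == dict_word.upper():
        # The all upper-case form is always accepted.
        return True
    if dict_word == dict_word.lower():
        # A lower-case entry may also appear with a leading capital.
        return text == _capitalize_first_letter(dict_word)
    # An entry with upper-case letters requires exactly those letters.
    return False


if __name__ == "__main__":
    candidates = ("als", "Als", "ALS", "ALs", "AlS", "aLs", "aLS")
    for entry in ("als", "Als", "ALS", "AlS"):
        print(entry, [t for t in candidates if case_matches(entry, t)])
```

Running it reproduces the "matches" column of the table for each entry. With
`keepcase=True` only the identical spelling is accepted, as in the
"'s morgens/=" example of |spell-KEEPCASE|.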


INFORMATION

These entries in the affix file can be used to add information to the spell
file. There are no restrictions on the format, but they should be in the
right encoding.

				*spell-NAME* *spell-VERSION* *spell-HOME*
				*spell-AUTHOR* *spell-EMAIL* *spell-COPYRIGHT*
	NAME		Name of the language
	VERSION		1.0.1 with fixes
	HOME		https://www.example.com
	AUTHOR		John Doe
	EMAIL		john AT Doe DOT net
	COPYRIGHT	LGPL

These fields are put in the .spl file as-is. The |:spellinfo| command can be
used to view the info.

							*:spellinfo* *:spelli*
:spelli[nfo]		Display the information for the spell file(s) used for
			the current buffer.


CHARACTER TABLES
							*spell-affix-chars*
When using an 8-bit encoding the affix file should define what characters are
word characters. This is because the system where ":mkspell" is used may not
support a locale with this encoding and isalpha() won't work. This is the
case, for example, when using "cp1250" on Unix.
						*E761* *E762* *spell-FOL*
						*spell-LOW* *spell-UPP*
Three lines in the affix file are needed. Simplistic example:

	FOL  áëñ ~
	LOW  áëñ ~
	UPP  ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case. That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

An exception is made for the German sharp s ß. The upper-case version is
"SS". In the FOL/LOW/UPP lines it should be included, so that it's recognized
as a word character, but use the ß character in all three.
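To illustrate how the FOL, LOW and UPP lines relate, here is a minimal Python
sketch that builds lookup tables from them. `CharTable` is a hypothetical
helper, not part of Vim. It assumes ASCII letters and digits are handled
separately, since ASCII characters are omitted from these lines, and it treats
each line as a plain string of characters.

```python
from __future__ import annotations


class CharTable:
    """Hypothetical model of the FOL/LOW/UPP character tables of an .aff file."""

    def __init__(self, fol: str, low: str, upp: str) -> None:
        if not len(fol) == len(low) == len(upp):
            raise ValueError("FOL, LOW and UPP must have the same number of characters")
        self.word_chars = set(fol) | set(low) | set(upp)
        # A character is upper-case where UPP differs from FOL at that position.
        self.upper_chars = {u for f, u in zip(fol, upp) if u != f}
        # Case folding maps the LOW and the UPP character to the FOL character.
        self.fold_map: dict[str, str] = {}
        for f, lo, up in zip(fol, low, upp):
            self.fold_map[lo] = f
            self.fold_map[up] = f

    def is_word_char(self, ch: str) -> bool:
        # Assumption: ASCII letters and digits are always word characters.
        return (ch.isascii() and ch.isalnum()) or ch in self.word_chars

    def is_upper(self, ch: str) -> bool:
        return (ch.isascii() and ch.isupper()) or ch in self.upper_chars

    def fold(self, word: str) -> str:
        # ASCII is folded the usual way; other characters via the FOL table.
        return "".join(ch.lower() if ch.isascii() else self.fold_map.get(ch, ch)
                       for ch in word)
```

With the example lines, `CharTable("áëñ", "áëñ", "ÁËÑ").fold("MAÑANA")` gives
"mañana". A table built from "ß" in all three lines treats ß as a word
character that is never upper-case.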

ASCII characters should be omitted; Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.

								*E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option. As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors.

If you get an E763 warning that the word tables differ you need to update your
".spl" spell files. If you downloaded the files, get the latest version of
all spell files you use. If you are only using one, e.g., German, then also
download the recent English spell files. Otherwise generate the .spl file
again with |:mkspell|. If you still get errors check the FOL, LOW and UPP
lines in the used .aff files.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
							*spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters. An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word. This is needed to detect a spelling error such as "they'are". That
should be "they're", but since "they" and "are" are words themselves that
would go unnoticed.

These characters are defined with MIDWORD in the .aff file. Example:

	MIDWORD	'- ~


FLAG TYPES							*spell-FLAG*

Flags are used to specify the affixes that can be used with a word and for
other properties of the word. Normally single-character flags are used.
This limits the number of possible flags, especially for 8-bit encodings. The
FLAG item can be used if more affixes are to be used. Possible values:

	FLAG long	use two-character flags
	FLAG num	use numbers, from 1 up to 65000
	FLAG caplong	use one-character flags without A-Z and two-character
			flags that start with A-Z

With "FLAG num" the numbers in a list of affixes need to be separated with a
comma: "234,2143,1435". This method is inefficient, but useful if the file is
generated with a program.

When using "caplong" the two-character flags all start with a capital: "Aa",
"B1", "BB", etc. This makes it possible to use one-character flags for the
most common items and two-character flags for uncommon items.

Note: When using utf-8 only characters up to 65000 may be used for flags.

Note: even when using "num" or "long" the number of flags available to
compounding and prefixes is limited to about 250.


AFFIXES						    *spell-PFX* *spell-SFX*

The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Summary:
	SFX L Y 2 ~
	SFX L 0 re [^x] ~
	SFX L 0 ro x ~

The first line is a header and has four fields:
	SFX {flag} {combine} {count}

{flag}		The name used for the suffix. Mostly it's a single letter,
		but other characters can be used, see |spell-FLAG|.

{combine}	Can be 'Y' or 'N'. When 'Y' then the word plus suffix can
		also have a prefix. When 'N' then a prefix is not allowed.

{count}		The number of lines following. If this is wrong you will get
		an error message.

For PFX the fields are exactly the same.

The basic format for the following lines is:
	SFX {flag} {strip} {add} {condition} {extra}

{flag}		Must be the same as the {flag} used in the first line.

{strip}		Characters removed from the basic word. There is no check if
		the characters are actually there, only the length is used (in
		bytes). This had better match the {condition}, otherwise
		strange things may happen. If the {strip} length is equal to
		or longer than the basic word the suffix won't be used.
		When {strip} is 0 (zero) then nothing is stripped.

{add}		Characters added to the basic word, after removing {strip}.
		Optionally there is a '/' followed by flags. The flags apply
		to the word plus affix. See |spell-affix-flags|.

{condition}	A simplistic pattern. Only when this matches with a basic
		word will the suffix be used for that word. This is normally
		for using one suffix letter with different {add} and {strip}
		fields for words with different endings.
		When {condition} is a . (dot) there is no condition.
		The pattern may contain:
		- Literal characters.
		- A set of characters in []. [abc] matches a, b or c.
		  A dash is allowed for a range [a-c], but this is
		  Vim-specific.
		- A set of characters that starts with a ^, meaning the
		  complement of the specified characters. [^abc] matches any
		  character but a, b and c.

{extra}		Optional extra text:
		    # comment		Comment is ignored
		    -			Hunspell uses this, ignored

For PFX the fields are the same, but the {strip}, {add} and {condition} apply
to the start of the word.

Note: Myspell ignores any extra text after the relevant info. Vim requires
this text to start with a "#" so that mistakes don't go unnoticed. Example:

	SFX F 0 in   [^i]n	# Spion > Spionin  ~
	SFX F 0 nen  in		# Bauerin > Bauerinnen ~

However, to avoid lots of errors in affix files written for Myspell, you can
add the IGNOREEXTRA flag.

Apparently Myspell allows an affix name to appear more than once. Since this
might also be a mistake, Vim checks for an extra "S".
The affix files for Myspell that use this feature apparently have this flag.
Example:

	SFX a Y 1 S ~
	SFX a 0 an . ~

	SFX a Y 2 S ~
	SFX a 0 en . ~
	SFX a 0 on . ~


AFFIX FLAGS						*spell-affix-flags*

This is a feature that comes from Hunspell: The affix may specify flags. This
works similarly to flags specified on a basic word. The flags apply to the
basic word plus the affix (but there are restrictions). Example:

	SFX S Y 1 ~
	SFX S 0 s . ~

	SFX A Y 1 ~
	SFX A 0 able/S . ~

When the dictionary file contains "drink/AS" then these words are possible:

	drink
	drinks		uses S suffix
	drinkable	uses A suffix
	drinkables	uses A suffix and then S suffix

Generally the flags of the suffix are added to the flags of the basic word,
and both are used for the word plus suffix. But the flags of the basic word
are only used once for affixes, except that both one prefix and one suffix can
be used when both support combining.

Specifically, the affix flags can be used for:
- Suffixes on suffixes, as in the example above. This works once, thus you
  can have two suffixes on a word (plus one prefix).
- Making the word with the affix rare, by using the |spell-RARE| flag.
- Excluding the word with the affix from compounding, by using the
  |spell-COMPOUNDFORBIDFLAG| flag.
- Allowing the word with the affix to be part of a compound word on the side
  of the affix with the |spell-COMPOUNDPERMITFLAG|.
- Using the NEEDCOMPOUND flag: word plus affix can only be used as part of a
  compound word. |spell-NEEDCOMPOUND|
- Compound flags: word plus affix can be part of a compound word at the end,
  middle, start, etc. The flags are combined with the flags of the basic
  word. |spell-compound|
- NEEDAFFIX: another affix is needed to make a valid word.
- CIRCUMFIX, as explained just below.
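The {strip}, {add} and {condition} fields and the affix flags can be tried out
with the sketch below. It is a simplified Python model, not Vim's
implementation, and it makes these assumptions: only SFX lines with
single-character flags are handled, lengths are counted in characters rather
than bytes, and header lines are recognized by their Y/N and count fields. An
affix flag is applied only once, as described for suffixes on suffixes.
`parse_sfx_rules()`, `expand()` and the other names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SuffixRule:
    strip: str                                # removed from the end of the word
    add: str                                  # appended after stripping
    cond: list[tuple[bool, set[str]]] | None  # None means "." (no condition)
    flags: str                                # flags after "/" in {add}


def parse_condition(cond: str) -> list[tuple[bool, set[str]]] | None:
    """Parse a {condition} into one (negated, chars) pair per position."""
    if cond == ".":
        return None
    items = []
    i = 0
    while i < len(cond):
        if cond[i] == "[":
            end = cond.index("]", i)
            body = cond[i + 1:end]
            negated = body.startswith("^")
            if negated:
                body = body[1:]
            chars: set[str] = set()
            j = 0
            while j < len(body):
                if j + 2 < len(body) and body[j + 1] == "-":
                    # Ranges such as [a-c] are a Vim extension.
                    chars.update(chr(c) for c in range(ord(body[j]), ord(body[j + 2]) + 1))
                    j += 3
                else:
                    chars.add(body[j])
                    j += 1
            items.append((negated, chars))
            i = end + 1
        else:
            items.append((False, {cond[i]}))
            i += 1
    return items


def condition_matches(cond, word: str) -> bool:
    """Check a parsed condition against the end of the basic word."""
    if cond is None:
        return True
    if len(cond) > len(word):
        return False
    tail = word[len(word) - len(cond):]
    return all((ch in chars) != negated for ch, (negated, chars) in zip(tail, cond))


def parse_sfx_rules(lines: list[str]) -> dict[str, list[SuffixRule]]:
    """Collect SFX rule lines by flag; header lines are skipped."""
    rules: dict[str, list[SuffixRule]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != "SFX":
            continue
        if fields[2] in ("Y", "N") and fields[3].isdigit():
            continue  # header: SFX {flag} {combine} {count} (simplified check)
        if len(fields) < 5:
            raise ValueError(f"incomplete SFX line: {line!r}")
        flag, strip, add, cond = fields[1:5]
        extra = fields[5:]
        if extra and extra[0] != "-" and not extra[0].startswith("#"):
            # Vim requires extra text to start with "#" (unless IGNOREEXTRA).
            raise ValueError(f"extra text must start with '#': {line!r}")
        add, _, add_flags = add.partition("/")
        rules.setdefault(flag, []).append(SuffixRule(
            "" if strip == "0" else strip, add, parse_condition(cond), add_flags))
    return rules


def apply_suffix(word: str, rule: SuffixRule) -> str | None:
    """Return word plus suffix, or None when the rule does not apply."""
    if len(rule.strip) >= len(word) or not condition_matches(rule.cond, word):
        return None
    return word[:len(word) - len(rule.strip)] + rule.add


def expand(word: str, flags: str, rules: dict[str, list[SuffixRule]]) -> set[str]:
    """All words generated from `word` with its .dic flags."""
    results = {word}
    for flag in flags:
        for rule in rules.get(flag, []):
            derived = apply_suffix(word, rule)
            if derived is None:
                continue
            results.add(derived)
            # Flags on the affix allow one more suffix; this works once.
            for flag2 in rule.flags:
                for rule2 in rules.get(flag2, []):
                    derived2 = apply_suffix(derived, rule2)
                    if derived2 is not None:
                        results.add(derived2)
    return results
```

For example, `expand("drink", "AS", parse_sfx_rules(["SFX S Y 1", "SFX S 0 s .",
"SFX A Y 1", "SFX A 0 able/S ."]))` yields the four words listed above. The
Spion/Bauerin rules show how the {condition} chooses between two lines with
the same flag.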


IGNOREEXTRA						*spell-IGNOREEXTRA*

Normally Vim gives an error for an extra field that does not start with '#'.
This avoids errors going unnoticed. However, some files created for Myspell
or Hunspell may contain many entries with an extra field. Use the IGNOREEXTRA
flag to avoid lots of errors.


CIRCUMFIX						*spell-CIRCUMFIX*

The CIRCUMFIX flag means a prefix and suffix must be added at the same time.
If a prefix has the CIRCUMFIX flag then only suffixes with the CIRCUMFIX flag
can be added, and the other way around.
An alternative is to only specify the suffix, and give that suffix two flags:
the required prefix and the NEEDAFFIX flag. |spell-NEEDAFFIX|


PFXPOSTPONE						*spell-PFXPOSTPONE*

When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory. This applies to Hebrew (a
list with all words is over a Gbyte). In that case applying prefixes must be
postponed. This makes spell checking slower. It is indicated by this keyword
in the .aff file:

	PFXPOSTPONE ~

Only prefixes without a chop string and without flags can be postponed.
Prefixes with a chop string or with flags will still be included in the word
list. An exception is made if the chop string is one character and equal to
the last character of the added string, but in lower case. That is the case
when the chop string is used to allow the following word to start with an
upper case letter.


WORDS WITH A SLASH					*spell-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters and other flags. Unfortunately, this means you cannot use a slash in
a word. Thus "TCP/IP" is not a word but "TCP" with the flags "IP". To
include a slash in the word put a backslash before it: "TCP\/IP".
In the rare case you want to use a backslash inside a word you need to use two
backslashes.
Any other use of the backslash is reserved for future expansion.


KEEP-CASE WORDS						*spell-KEEPCASE*

In the affix file a KEEPCASE line can be used to define the affix name used
for keep-case words. Example:

	KEEPCASE = ~

This flag is not supported by Myspell. It has the meaning that case matters.
This can be used if the word does not have the first letter in upper case at
the start of a sentence. Example:

    word list	    matches		    does not match ~
    's morgens/=    's morgens		    'S morgens 's Morgens 'S MORGENS
    's Morgens	    's Morgens 'S MORGENS   'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.


RARE WORDS							*spell-RARE*

In the affix file a RARE line can be used to define the affix name used for
rare words. Example:

	RARE ? ~

Rare words are highlighted differently from bad words. This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway.

This flag can also be used on an affix, so that a basic word is not rare but
the basic word plus affix is rare |spell-affix-flags|. However, if the word
also appears as a good word in another way (e.g., in another region) it won't
be marked as rare.


BAD WORDS							*spell-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words. Example:

	BAD ! ~

This can be used to exclude words that would otherwise be good. For example
"the the" in the .dic file:

	the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.
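The .dic rules for slashes (|spell-SLASH|) and for the BAD flag can be
combined in a small parser. This Python sketch is illustrative only and makes
these assumptions: `split_dic_line()` and `build_word_table()` are hypothetical
names, single-character flags are assumed, and the BAD flag character is passed
in as a parameter (here "!" as in the example above). A backslash that is not
followed by a slash or another backslash is kept literally, since that use is
reserved.

```python
from __future__ import annotations


def split_dic_line(line: str) -> tuple[str, str]:
    """Split a .dic line into (word, flags); "\\/" and "\\\\" are escapes."""
    word: list[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "/\\":
            word.append(line[i + 1])            # escaped slash or backslash
            i += 2
        elif ch == "/":
            return "".join(word), line[i + 1:]  # everything after it is flags
        else:
            word.append(ch)                     # other backslashes kept as-is
            i += 1
    return "".join(word), ""


def build_word_table(lines: list[str], bad_flag: str = "!") -> dict[str, str]:
    """Map each word to "good" or "bad"; a word once marked bad stays bad."""
    if not lines or not lines[0].strip().isdigit():
        raise ValueError("the first line must contain the number of words")
    table: dict[str, str] = {}
    for line in lines[1:]:
        line = line.rstrip()                    # trailing white space is ignored
        if not line or line[0] in "#/":
            continue                            # empty line or comment line
        word, flags = split_dic_line(line)
        if bad_flag in flags:
            table[word] = "bad"
        else:
            table.setdefault(word, "good")      # never overrides "bad"
    return table
```

Given the lines "3", "the the/!", "# comment", "the the" and "drink/AS", the
table keeps "the the" as bad even though it later appears without the flag.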

The flag also applies to the word with affixes, thus this can be used to mark
a whole bunch of related words as bad.

							*spell-FORBIDDENWORD*
FORBIDDENWORD can be used just like BAD. It is supported for compatibility
with Hunspell.

							*spell-NEEDAFFIX*
The NEEDAFFIX flag is used to require that a word is used with an affix. The
word itself is not a good word (unless there is an empty affix). Example:

	NEEDAFFIX + ~


COMPOUND WORDS						*spell-compound*

A compound word is a longer word made by concatenating words that appear in
the .dic file. To specify which words may be concatenated a character is
used. This character is put in the list of affixes after the word. We will
call this character a flag here. Obviously these flags must be different from
any affix IDs used.

							*spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG. All
words with this flag combine in any order. This means there is no control
over which word comes first. Example:
	COMPOUNDFLAG c ~

							*spell-COMPOUNDRULE*
A more advanced method to specify how compound words can be formed uses
multiple items with multiple flags. This is not compatible with Myspell 3.0.
Let's start with an example:
	COMPOUNDRULE c+ ~
	COMPOUNDRULE se ~

The first line defines that words with the "c" flag can be concatenated in any
order. The second line defines compound words that are made of one word with
the "s" flag and one word with the "e" flag. With this dictionary:
	bork/c ~
	onion/s ~
	soup/e ~

You can make these words:
	bork
	borkbork
	borkborkbork
	(etc.)
	onion
	soup
	onionsoup

The COMPOUNDRULE item may appear multiple times.
The argument is made out of one or more groups, where each group can be:
	one flag			e.g., c
	alternate flags inside []	e.g., [abc]
Optionally this may be followed by:
	*	the group appears zero or more times, e.g., sm*e
	+	the group appears one or more times, e.g., c+
	?	the group appears zero times or once, e.g., x?

This is similar to the regexp pattern syntax (but not the same!). A few
examples with the sequence of word flags they require:
    COMPOUNDRULE x+	    x xx xxx etc.
    COMPOUNDRULE yz	    yz
    COMPOUNDRULE x+z	    xz xxz xxxz etc.
    COMPOUNDRULE yx+	    yx yxx yxxx etc.
    COMPOUNDRULE xy?z	    xz xyz

    COMPOUNDRULE [abc]z    az bz cz
    COMPOUNDRULE [abc]+z   az aaz abaz bz baz bcbz cz caz cbaz etc.
    COMPOUNDRULE a[xyz]+   ax axx axyz ay ayx ayzz az azy azxy etc.
    COMPOUNDRULE sm*e	    se sme smme smmme etc.
    COMPOUNDRULE s[xyz]*e  se sxe sxye sxyxe sye syze sze szye szyxe etc.

A specific example: Allow a compound to be made of two words and a dash:
	In the .aff file:
	    COMPOUNDRULE sde ~
	    NEEDAFFIX x ~
	    COMPOUNDWORDMAX 3 ~
	    COMPOUNDMIN 1 ~
	In the .dic file:
	    start/s ~
	    end/e ~
	    -/xd ~

This allows for the word "start-end", but not "startend".

An additional implied rule is that, without further flags, a word with a
prefix cannot be compounded after another word, and a word with a suffix
cannot be compounded with a following word. Thus the affix cannot appear
on the inside of a compound word. This can be changed with the
|spell-COMPOUNDPERMITFLAG|.

							*spell-NEEDCOMPOUND*
The NEEDCOMPOUND flag is used to require that a word is used as part of a
compound word. The word itself is not a good word. Example:

	NEEDCOMPOUND & ~

							*spell-ONLYINCOMPOUND*
ONLYINCOMPOUND does exactly the same as NEEDCOMPOUND. It is supported for
compatibility with Hunspell.
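Because COMPOUNDRULE resembles a regular expression over word flags, one rough
way to experiment with it is to translate the rule into a Python regular
expression and check every choice of one flag per word. This is a sketch under
assumptions, not how Vim implements compounding. Flags are single characters,
and a word with several flags may use any one of them. COMPOUNDWORDMAX,
COMPOUNDMIN, NEEDAFFIX and the affix restrictions are ignored. Since the
syntax is "not the same" as a regexp, treat any discrepancy as a limitation of
the sketch. `compoundrule_to_regex()` and `compound_allowed()` are
hypothetical names.

```python
from __future__ import annotations

import itertools
import re


def compoundrule_to_regex(rule: str) -> re.Pattern[str]:
    """Translate a COMPOUNDRULE argument into a regex over single-char flags."""
    parts = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "[":
            end = rule.index("]", i)
            # Alternate flags inside [] become a character class.
            parts.append("[" + re.escape(rule[i + 1:end]) + "]")
            i = end + 1
        else:
            # "*", "+" and "?" keep their regex meaning; flags are escaped.
            parts.append(ch if ch in "*+?" else re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def compound_allowed(rule: str, word_flags: list[str]) -> bool:
    """True if choosing one flag per word can match the whole rule.

    `word_flags` holds the flags of each word in the compound, in order.
    """
    pattern = compoundrule_to_regex(rule)
    return any(pattern.fullmatch("".join(choice))
               for choice in itertools.product(*word_flags))


if __name__ == "__main__":
    print(compound_allowed("se", ["s", "e"]))         # onion + soup: True
    print(compound_allowed("sde", ["s", "xd", "e"]))  # start + "-" + end: True
    print(compound_allowed("sde", ["s", "e"]))        # startend: False
```

Trying every flag combination is exponential in the number of words. That is
acceptable for a toy check, but a real checker would match flags
incrementally while splitting the word.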

							*spell-COMPOUNDMIN*
The minimum length in characters of a word used for compounding is specified
with COMPOUNDMIN. Example:
	COMPOUNDMIN 5 ~

When omitted there is no minimum length. Obviously you could just leave out
the compound flag from short words instead; this feature is present for
compatibility with Myspell.

							*spell-COMPOUNDWORDMAX*
The maximum number of words that can be concatenated into a compound word is
specified with COMPOUNDWORDMAX. Example:
	COMPOUNDWORDMAX 3 ~

When omitted there is no maximum. It applies to all compound words.

To set a limit for words with specific flags make sure the items in
COMPOUNDRULE where they appear don't allow too many words.

							*spell-COMPOUNDSYLMAX*
The maximum number of syllables that a compound word may contain is specified
with COMPOUNDSYLMAX. Example:
	COMPOUNDSYLMAX 6 ~

This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
is no limit on the number of syllables.

If both COMPOUNDWORDMAX and COMPOUNDSYLMAX are defined, a compound word is
accepted if it fits either criterion: it is made from up to COMPOUNDWORDMAX
words or it contains up to COMPOUNDSYLMAX syllables.

							*spell-COMPOUNDFORBIDFLAG*
The COMPOUNDFORBIDFLAG specifies a flag that can be used on an affix. It
means that the word plus affix cannot be used in a compound word. Example:
	affix file:
		COMPOUNDFLAG c ~
		COMPOUNDFORBIDFLAG x ~
		SFX a Y 2 ~
		SFX a 0 s . ~
		SFX a 0 ize/x . ~
	dictionary:
		word/c ~
		util/ac ~

This allows for "wordutil" and "wordutils" but not "wordutilize".
Note: this doesn't work for postponed prefixes yet.

							*spell-COMPOUNDPERMITFLAG*
The COMPOUNDPERMITFLAG specifies a flag that can be used on an affix.
It means that the word plus affix can also be used in a compound word in a
way where the affix ends up halfway through the word. Without this flag that
is not allowed.
Note: this doesn't work for postponed prefixes yet.

							*spell-COMPOUNDROOT*
The COMPOUNDROOT flag is used for words in the dictionary that are already a
compound. This means it counts for two words when checking the compounding
rules. It can also be used on an affix to count the affix as a compounding
word.

						*spell-CHECKCOMPOUNDPATTERN*
CHECKCOMPOUNDPATTERN is used to define patterns that, when matching at the
position where two words are compounded together, forbid the compound.
For example:
	CHECKCOMPOUNDPATTERN o e ~

This forbids compounding if the first word ends in "o" and the second word
starts with "e".

The arguments must be plain text; no patterns are actually supported, despite
the item name. Case is always ignored.

The Hunspell feature to use three arguments and flags is not supported.

							*spell-NOCOMPOUNDSUGS*
This item indicates that using compounding to make suggestions is not a good
idea. Use this when compounding is used with very short or one-character
words, for example to make numbers out of digits. Without this flag creating
suggestions would spend most time trying all kinds of weird compound words.

	NOCOMPOUNDSUGS ~

							*spell-SYLLABLE*
The SYLLABLE item defines characters or character sequences that are used to
count the number of syllables in a word. Example:
	SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~

Before the first slash is the set of characters that are counted for one
syllable, also when repeated and mixed, until the next character that is not
in this set. After the slash come sequences of characters that are counted
for one syllable.
These are preferred over using characters from the set.
With the example, "ideeen" has three syllables, counted by "i", "ee" and "e".

Only case-folded letters need to be included.

Another way to restrict compounding was mentioned above: Adding the
|spell-COMPOUNDFORBIDFLAG| flag to an affix excludes all words that are made
with that affix from compounding.


UNLIMITED COMPOUNDING					*spell-NOBREAK*

For some languages, such as Thai, there is no space in between words. This
looks like all words are compounded. To specify this use the NOBREAK item in
the affix file, without arguments:
	NOBREAK ~

Vim will try to figure out where one word ends and the next starts. When there
are spelling mistakes this may not be quite right.


							*spell-COMMON*
Common words can be specified with the COMMON item. This will give better
suggestions when editing a short file. Example:

	COMMON  the of to and a in is it you that he she was for on are ~

The words must be separated by white space, up to 25 per line.
When multiple regions are specified in a ":mkspell" command the common words
for all regions are combined and used for all regions.

							*spell-NOSPLITSUGS*
This item indicates that splitting a word to make suggestions is not a good
idea. Split-word suggestions will appear only when there are few similar
words.

	NOSPLITSUGS ~

							*spell-NOSUGGEST*
The flag specified with NOSUGGEST can be used for words that will not be
suggested. This can be used for obscene words.

	NOSUGGEST % ~


REPLACEMENTS							*spell-REP*

In the affix file REP items can be used to define common mistakes. This is
used to make spelling suggestions. The items define the "from" text and the
"to" replacement.
Example:

	REP 4 ~
	REP f ph ~
	REP ph f ~
	REP k ch ~
	REP ch k ~

The first line specifies the number of REP lines following. Vim ignores the
number, but it must be there (for compatibility with Myspell).

Don't include simple one-character replacements or swaps. Vim will try these
anyway. You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.

You can include a space by using an underscore:

	REP the_the the ~


SIMILAR CHARACTERS					*spell-MAP* *E783*

In the affix file MAP items can be used to define letters that are very much
alike. This is mostly used for a letter with different accents. This is used
to prefer suggestions with these letters substituted. Example:

	MAP 2 ~
	MAP eéëêè ~
	MAP uüùúû ~

The first line specifies the number of MAP lines following. Vim ignores the
number, but the line must be there.

Each letter must appear in only one of the MAP items. It's a bit more
efficient if the first letter is ASCII or at least one without accents.


.SUG FILE						*spell-NOSUGFILE*

When soundfolding is specified in the affix file then ":mkspell" will normally
produce a .sug file next to the .spl file. This file is used to quickly find
suggestions by their sound-a-like form, at the cost of a lot of memory (the
amount depends on the number of words; |:mkspell| will display an estimate
when it's done).

To avoid producing a .sug file use this item in the affix file:

	NOSUGFILE ~

Users can simply omit the .sug file if they don't want to use it.


SOUND-A-LIKE							*spell-SAL*

In the affix file SAL items can be used to define the sound-a-like mechanism
to be used. The main items define the "from" text and the "to" replacement.
Simplistic example:

	SAL CIA X ~
	SAL CH X ~
	SAL C K ~
	SAL K K ~

There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

	SAL followup true ~
	SAL collapse_result true ~
	SAL remove_accents true ~

"1" has the same meaning as "true". Any other value means "false".


SIMPLE SOUNDFOLDING				*spell-SOFOFROM* *spell-SOFOTO*

The SAL mechanism is complex and slow. A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character. At the same time this does case folding. You cannot have
both SAL items and simple soundfolding.

There are two items required: one to specify the characters that are mapped
and one that specifies the characters they are mapped to. They must have
exactly the same number of characters. Example:

    SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
    SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'. Another
method would be to leave out all vowels. Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't overdo this, or all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space. Sequences of the same character in
SOFOFROM are replaced by one.

You can use the |soundfold()| function to try out the results, or set the
'verbose' option to see the score in the output of the |z=| command.
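A minimal Python sketch of simple soundfolding with SOFOFROM and SOFOTO
follows. It assumes that "sequences of the same character" means consecutive
identical characters in the folded result. `make_sofo()` and
`sofo_soundfold()` are hypothetical names, and the output is not guaranteed to
match Vim's |soundfold()| in every case.

```python
from __future__ import annotations


def make_sofo(sofofrom: str, sofoto: str) -> dict[str, str]:
    """Pair up SOFOFROM and SOFOTO characters."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same number of characters")
    return dict(zip(sofofrom, sofoto))


def sofo_soundfold(word: str, table: dict[str, str]) -> str:
    """Soundfold `word` with a SOFOFROM/SOFOTO table."""
    out: list[str] = []
    for ch in word:
        if ch.isspace():
            mapped = " "            # all white space becomes one space
        elif ch in table:
            mapped = table[ch]
        else:
            continue                # characters not in SOFOFROM are left out
        if not out or out[-1] != mapped:
            out.append(mapped)      # assumed: collapse repeated results
    return "".join(out)


SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"

if __name__ == "__main__":
    table = make_sofo(SOFOFROM, SOFOTO)
    print(sofo_soundfold("dictionary", table))  # tectenere
    print(sofo_soundfold("daktonerie", table))  # tektenere
```

With the example table "Mama" and "Nana" fold to the same string, because
'm' and 'n' and all vowels are merged. This shows both the benefit and the
risk of mapping too many characters together.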


UNSUPPORTED ITEMS				*spell-affix-not-supported*

These items appear in the affix file of other spell checkers. In Vim they are
ignored, not supported or defined in another way.

ACCENT		(Hunspell)				*spell-ACCENT*
		Use MAP instead. |spell-MAP|

BREAK		(Hunspell)				*spell-BREAK*
		Define break points. Unclear how it works exactly.
		Not supported.

CHECKCOMPOUNDCASE  (Hunspell)			*spell-CHECKCOMPOUNDCASE*
		Disallow uppercase letters at compound word boundaries.
		Not supported.

CHECKCOMPOUNDDUP  (Hunspell)			*spell-CHECKCOMPOUNDDUP*
		Disallow using the same word twice in a compound. Not
		supported.

CHECKCOMPOUNDREP  (Hunspell)			*spell-CHECKCOMPOUNDREP*
		Forbid a compound word if it could also be a non-compound
		word with a REP fault. Not supported.

CHECKCOMPOUNDTRIPLE  (Hunspell)			*spell-CHECKCOMPOUNDTRIPLE*
		Forbid three identical characters when compounding. Not
		supported.

CHECKSHARPS  (Hunspell)				*spell-CHECKSHARPS*
		SS letter pair in uppercased (German) words may be upper case
		sharp s (ß). Not supported.

COMPLEXPREFIXES  (Hunspell)			*spell-COMPLEXPREFIXES*
		Enables using two prefixes. Not supported.

COMPOUND	(Hunspell)				*spell-COMPOUND*
		This is one line with the count of COMPOUND items, followed by
		that many COMPOUND lines with a pattern.
		Remove the first line with the count and rename the other
		items to COMPOUNDRULE. |spell-COMPOUNDRULE|

COMPOUNDFIRST	(Hunspell)				*spell-COMPOUNDFIRST*
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDBEGIN	(Hunspell)				*spell-COMPOUNDBEGIN*
		Words marked with COMPOUNDBEGIN may be first elements in
		compound words.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDLAST	(Hunspell)				*spell-COMPOUNDLAST*
		Words marked with COMPOUNDLAST may be last elements in
		compound words.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDEND	(Hunspell)				*spell-COMPOUNDEND*
		Words marked with COMPOUNDEND may be last elements in
		compound words, like COMPOUNDLAST.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDMIDDLE	(Hunspell)				*spell-COMPOUNDMIDDLE*
		Words marked with COMPOUNDMIDDLE may be middle elements in
		compound words.
		Use COMPOUNDRULE instead. |spell-COMPOUNDRULE|

COMPOUNDRULES	(Hunspell)				*spell-COMPOUNDRULES*
		Number of COMPOUNDRULE lines following. Ignored, but the
		argument must be a number.

COMPOUNDSYLLABLE  (Hunspell)			*spell-COMPOUNDSYLLABLE*
		Use SYLLABLE and COMPOUNDSYLMAX instead. |spell-SYLLABLE|
		|spell-COMPOUNDSYLMAX|

KEY		(Hunspell)				*spell-KEY*
		Define characters that are close together on the keyboard.
		Used to give better suggestions. Not supported.

LANG		(Hunspell)				*spell-LANG*
		This specifies language-specific behavior. This actually
		moves part of the language knowledge into the program,
		therefore Vim does not support it. Each language property
		must be specified separately.

LEMMA_PRESENT	(Hunspell)				*spell-LEMMA_PRESENT*
		Only needed for morphological analysis.

MAXNGRAMSUGS	(Hunspell)				*spell-MAXNGRAMSUGS*
		Set number of n-gram suggestions. Not supported.

PSEUDOROOT	(Hunspell)				*spell-PSEUDOROOT*
		Use NEEDAFFIX instead. |spell-NEEDAFFIX|

SUGSWITHDOTS	(Hunspell)				*spell-SUGSWITHDOTS*
		Adds dots to suggestions. Vim doesn't need this.

SYLLABLENUM	(Hunspell)				*spell-SYLLABLENUM*
		Not supported.

TRY		(Myspell, Hunspell, others)		*spell-TRY*
		Vim does not use the TRY item; it is ignored. For making
		suggestions the actual characters in the words are used,
		which is much more efficient.

WORDCHARS	(Hunspell)				*spell-WORDCHARS*
		Used to recognize words. Vim doesn't need it, because there
		is no need to separate words before checking them (using a
		trie instead of a hashtable).

==============================================================================
5. Spell checker design						*develop-spell*

When spell checking was about to be added to Vim, a survey was done of the
available spell checking libraries and programs.  Unfortunately, the result
was that none of them provided sufficient capabilities to be used as the spell
checking engine in Vim, for various reasons:

- Missing support for multi-byte encodings.  At least UTF-8 must be supported,
  so that more than one language can be used in the same file.
  Doing on-the-fly conversion is not always possible (would require iconv
  support).
- For the programs and libraries: Using them as-is would require installing
  them separately from Vim.  That's usually possible, but it is a drawback.
- Performance: A few tests showed that it's possible to check spelling on the
  fly (while redrawing), just like syntax highlighting.  But the mechanisms
  used by other code are much slower.  Myspell uses a hashtable, for example.
  The affix compression that most spell checkers use makes it slower too.
- For using an external program like aspell a communication mechanism would
  have to be set up.  That's complicated to do in a portable way (Unix-only
  would be relatively simple, but that's not good enough).  And performance
  would become a problem (lots of process switching involved).
- Missing support for words with non-word characters, such as "Etten-Leur" and
  "et al.", would require marking the pieces of them OK, lowering the
  reliability.
- Missing support for regions or dialects.  This makes it difficult to accept
  all English words while highlighting non-Canadian words differently.
- Missing support for rare words.  Many words are correct but hardly ever used
  and could also be a misspelling of an often-used word.
- For making suggestions the speed is less important, and requiring the user
  to install another program or library would be acceptable.  But the word
  lists would probably differ, so the suggestions may be words that Vim itself
  reports as wrong.


Spelling suggestions				*develop-spell-suggestions*

For making suggestions there are two basic mechanisms:
1. Try changing the bad word a little bit and check for a match with a good
   word.  Or go through the list of good words, change them a little bit and
   check for a match with the bad word.  The changes are deleting a character,
   inserting a character, swapping two characters, etc.
2. Perform soundfolding on both the bad word and the good words and then find
   matches, possibly with a few changes like with the first mechanism.

The first is good for finding typing mistakes.  After experimenting with
hashtables and looking at solutions from other spell checkers, the conclusion
was that a trie (a kind of tree structure) is ideal for this.  A trie helps
both to reduce memory use and to try only sensible changes.  For example,
when inserting a character, only characters that lead to good words need to be
tried.  Other mechanisms (with hashtables) need to try all possible letters at
every position in the word.  Also, a hashtable requires word boundaries to be
identified separately, while a trie does not.  That makes the mechanism a lot
simpler.

Soundfolding is useful when someone knows how a word sounds but doesn't know
how it is spelled.  For example, the word "dictionary" might be written as
"daktonerie".  The first method would need to try a very large number of
changes, so it's hard to find the good word that way.  After soundfolding the
words become "tktnr" and "tkxnry", which differ by only two letters.
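The trie-based search for typing mistakes can be sketched in a few lines.  This
toy version is an illustration, not Vim's C implementation, and its names are
hypothetical.  It assumes a plain nested-dict trie and allows exactly one
change: deleting, inserting or substituting a character, or swapping two
adjacent characters.  Substitution is added here alongside the changes named
above.  Inserted and substituted characters are taken only from the children
of the current trie node, which is the saving described above.

```python
END = "$"  # marks the end of a good word; assumes words never contain "$"


def build_trie(words):
    """Store the good words in a nested-dict trie."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def one_edit_suggestions(trie, bad):
    """Return the good words that are exactly one change away from `bad`.

    Changes: delete, insert or substitute a character, or swap two adjacent
    characters.  Inserted and substituted characters come from the current
    trie node, so only characters that can still lead to a good word are
    tried.
    """
    found = set()

    def walk(node, i, prefix, edited):
        # Whole bad word consumed after one change: is this a good word?
        if i == len(bad) and edited and END in node:
            found.add(prefix)
        # Copy the next character of the bad word unchanged.
        if i < len(bad) and bad[i] in node:
            walk(node[bad[i]], i + 1, prefix + bad[i], edited)
        if edited:
            return  # only one change allowed
        # Delete bad[i]: advance in the bad word, not in the trie.
        if i < len(bad):
            walk(node, i + 1, prefix, True)
        for ch, child in node.items():
            if ch == END:
                continue
            # Insert ch before bad[i].
            walk(child, i, prefix + ch, True)
            # Substitute ch for bad[i].
            if i < len(bad) and ch != bad[i]:
                walk(child, i + 1, prefix + ch, True)
        # Swap bad[i] and bad[i + 1], if the trie has that path.
        if i + 1 < len(bad) and bad[i] != bad[i + 1]:
            a, b = bad[i + 1], bad[i]
            if a in node and b in node[a]:
                walk(node[a][b], i + 2, prefix + a + b, True)

    walk(trie, 0, "", False)
    return sorted(found)


trie = build_trie(["the", "then", "than", "that", "this"])
print(one_edit_suggestions(trie, "teh"))  # ['the']
print(one_edit_suggestions(trie, "thn"))  # ['than', 'the', 'then']
```

For "thn" the insert step only tries "e", "a" and "i" after "th", because those
are the only children of that trie node.  A hashtable-based search would have
to try every letter of the alphabet at that point.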

To find words by their soundfolded equivalent (soundalike word) we need a list
of all soundfolded words.  A few experiments have been done to find out what
the best method is.  Alternatives:
1. Do the soundfolding on the fly when looking for suggestions.  This means
   walking through the trie of good words, soundfolding each word and
   checking how different it is from the bad word.  This is very efficient for
   memory use, but takes a long time.  On a fast PC it takes a couple of
   seconds for English, which can be acceptable for interactive use.  But for
   some languages it takes more than ten seconds (e.g., German, Catalan),
   which is unacceptably slow.  For batch processing (automatic corrections)
   it's too slow for all languages.
2. Use a trie for the soundfolded words, so that searching can be done just
   like how it works without soundfolding.  This requires remembering a list
   of good words for each soundfolded word.  This makes finding matches very
   fast but requires quite a lot of memory, on the order of 1 to 10 Mbyte.
   For some languages that is more than the original word list.
3. Like the second alternative, but reduce the amount of memory by using affix
   compression and storing only the soundfolded basic word.  This is what
   Aspell does.  The disadvantage is that affixes need to be stripped from the
   bad word before soundfolding it, which means that mistakes at the start
   and/or end of the word will cause the mechanism to fail.  Also, this
   becomes slow when the bad word is quite different from the good word.

The choice made is to use the second alternative and a separate file for it.
This way a user with sufficient memory can get very good suggestions, while a
user who is short of memory or just wants spell checking without suggestions
doesn't use so much memory.
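The second alternative can be sketched as a trie built over soundfolded words.
Each node that completes a soundfolded word keeps the list of good words that
fold to it.  The search walks this trie with one Levenshtein row per trie
level.  That is a common textbook technique and not necessarily how Vim scores
matches.  Soundfolding itself is not implemented here.  The sketch assumes that
"dictionary" folds to "tkxnry" and "daktonerie" to "tktnr", matching the two
forms quoted above.  All names are hypothetical.

```python
WORDS = "$words"  # trie key holding the good words for a soundfolded word


def build_sound_trie(pairs):
    """Build a trie keyed on soundfolded words.

    `pairs` holds (good word, soundfolded form) tuples; every trie node that
    ends a soundfolded word keeps the list of good words that fold to it.
    """
    root = {}
    for good, folded in pairs:
        node = root
        for ch in folded:
            node = node.setdefault(ch, {})
        node.setdefault(WORDS, []).append(good)
    return root


def sound_suggestions(trie, folded_bad, max_dist):
    """Return sorted (distance, good word) pairs whose soundfolded form is
    within `max_dist` edits of the soundfolded bad word `folded_bad`.

    Each trie level adds one row of the Levenshtein table, so soundfolded
    words that share a prefix share the work for that prefix.
    """
    results = []

    def walk(node, row):
        # row[j]: edit distance between the trie prefix and folded_bad[:j].
        if WORDS in node and row[-1] <= max_dist:
            results.extend((row[-1], good) for good in node[WORDS])
        # Prune: extending the prefix can never lower the smallest entry.
        if min(row) > max_dist:
            return
        for ch, child in node.items():
            if ch == WORDS:
                continue
            next_row = [row[0] + 1]
            for j in range(1, len(folded_bad) + 1):
                cost = 0 if folded_bad[j - 1] == ch else 1
                next_row.append(min(next_row[j - 1] + 1,  # skip a bad-word letter
                                    row[j] + 1,           # skip a trie letter
                                    row[j - 1] + cost))   # match or replace
            walk(child, next_row)

    walk(trie, list(range(len(folded_bad) + 1)))
    return sorted(results)


# Soundfolded form assumed from the "dictionary" example above.
sound_trie = build_sound_trie([("dictionary", "tkxnry")])
print(sound_suggestions(sound_trie, "tktnr", 2))  # [(2, 'dictionary')]
print(sound_suggestions(sound_trie, "tktnr", 1))  # []
```

The trie needs one node per letter of every soundfolded word plus the word
lists.  That is where the memory cost of the second alternative comes from.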


Word frequency

For sorting suggestions it helps to know which words are common.  In theory we
could store a word frequency with each word in the dictionary.  However, this
requires storing a count per word, which degrades word tree compression a lot.
And maintaining the word frequency for all languages would be a heavy task.
Also, it would be nice to prefer, for suggestions, words that already appear
in the text being edited.

What has been implemented is to count the words that have been seen while
displaying text.  A hashtable is used to quickly find the word count.  The
count is initialized from words listed in COMMON items in the affix file, so
that it also works when starting a new file.

This isn't ideal, because the longer Vim runs, the higher the counts become.
But in practice it is a noticeable improvement over not using the word count.

 vim:tw=78:sw=4:ts=8:noet:ft=help:norl: