      1 *mbyte.txt*     Nvim
      2 
      3 
      4 	  VIM REFERENCE MANUAL	  by Bram Moolenaar et al.
      5 
      6 
      7 Multi-byte support				*multibyte* *multi-byte*
      8 					*Chinese* *Japanese* *Korean*
      9 This is about editing text in languages which have many characters that can
     10 not be represented using one byte (one octet).  Examples are Chinese, Japanese
     11 and Korean.  Unicode is also covered here.
     12 
     13 For an introduction to the most common features, see |usr_45.txt| in the user
     14 manual.
     15 For changing the language of messages and menus see |mlang.txt|.
     16 
     17                                      Type |gO| to see the table of contents.
     18 
     19 ==============================================================================
     20 Getting started						*mbyte-first*
     21 
     22 This is a summary of the multibyte features in Nvim.
     23 
     24 
     25 LOCALE
     26 
     27 First of all, you must make sure your current locale is set correctly.  If
     28 your system has been installed to use the language, it probably works right
     29 away.  If not, you can often make it work by setting the $LANG environment
     30 variable in your shell: >
     31 
     32 setenv LANG ja_JP.EUC
     33 
     34 Unfortunately, the name of the locale depends on your system.  Japanese might
     35 also be called "ja_JP.EUCjp" or just "ja".  To see what is currently used: >
     36 
     37 :language
     38 
     39 To change the locale inside Vim use: >
     40 
     41 :language ja_JP.EUC
     42 
     43 Vim will give an error message if this doesn't work.  This is a good way to
     44 experiment and find the locale name you want to use.  But it's always better
     45 to set the locale in the shell, so that it is used right from the start.
     46 
     47 See |mbyte-locale| for details.
     48 
     49 
     50 ENCODING
     51 
     52 Nvim always uses UTF-8 internally. Thus 'encoding' is always set to "utf-8"
     53 and cannot be changed.
     54 
     55 All the text that is used inside Vim will be in UTF-8. Not only the text in
     56 the buffers, but also in registers, variables, etc.
     57 
     58 You can edit files in different encodings than UTF-8.  Nvim will convert the
     59 file when you read it and convert it back when you write it.
     60 See 'fileencoding', 'fileencodings' and |++enc|.
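
As a sketch, assuming a hypothetical Latin-1 file named "notes.txt", this
reads the file with an explicit encoding and writes it back as UTF-8: >vim
    " Read the file as latin1; the buffer itself holds UTF-8.
    :edit ++enc=latin1 notes.txt
    " Show the encoding that will be used when writing.
    :setlocal fileencoding?
    " Convert to UTF-8 on the next write.
    :setlocal fileencoding=utf-8
    :write
<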
     61 
     62 
     63 DISPLAY AND FONTS
     64 
     65 If you are working in a terminal (emulator) you must make sure it accepts
     66 UTF-8, the encoding which Vim is working with. Otherwise only ASCII can
     67 be displayed and edited correctly.
     68 
     69 For the GUI you must select fonts that work with UTF-8.  You can set 'guifont'
     70 and 'guifontwide'.  'guifont' is used for the single-width characters,
     71 'guifontwide' for the double-width characters. Thus the 'guifontwide' font
     72 must be exactly twice as wide as 'guifont'. Example for UTF-8: >
     73 
     74 :set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
     75 :set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1
     76 
     77 You can also set 'guifont' alone; the Nvim GUI will try to find a matching
     78 'guifontwide' for you.
     79 
     80 
     81 INPUT
     82 
     83 There are several ways to enter multibyte characters:
     84 - Your system IME can be used.
     85 - Keymaps can be used.  See |mbyte-keymap|.
     86 
     87 The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
     88 the different input methods or disable them temporarily.
     89 
     90 ==============================================================================
     91 Locale							*mbyte-locale*
     92 
     93 The easiest setup is when your whole system uses the locale you want to work
     94 in.  But it's also possible to set the locale for one shell you are working
     95 in, or just use a certain locale inside Vim.
     96 
     97 
     98 WHAT IS A LOCALE?					*locale*
     99 
    100 There are many languages in the world, and at least as many different
    101 cultures and environments.  A linguistic environment corresponding to an area
    102 is called a "locale".  This includes information about the language used,
    103 the charset, collating order for sorting,
    104 date format, currency format and so on.  For Vim only the language and charset
    105 really matter.
    106 
    107 You can only use a locale if your system has support for it.  Some systems
    108 have only a few locales, especially in the USA.  The language which you want
    109 to use may not be on your system.  In that case you might be able to install
    110 it as an extra package.  Check your system documentation for how to do that.
    111 
    112 The location in which the locales are installed varies from system to system.
    113 For example, "/usr/share/locale" or "/usr/lib/locale".  See your system's
    114 setlocale() man page.
    115 
    116 Looking in these directories will show you the exact name of each locale.
    117 Upper/lowercase usually matters, thus "ja_JP.EUC" and "ja_jp.euc" are
    118 different.  Some systems have a locale.alias file, which allows translation
    119 from a short name like "nl" to the full name "nl_NL.ISO_8859-1".
    120 
    121 Note that X-windows has its own locale stuff, and unfortunately it uses locale
    122 names different from what is used elsewhere.  This is confusing!  For Vim it
    123 matters what the setlocale() function uses, which is generally NOT the
    124 X-windows stuff.  You might have to do some experiments to find out what
    125 really works.
    126 
    127 						*locale-name*
    128 The (simplified) format of a |locale| name is:
    129 
    130 language
    131 or	language_territory
    132 or	language_territory.codeset
    133 
    134 Territory means the country (or part of it), codeset means the |charset|.  For
    135 example, the locale name "ja_JP.eucJP" means:
    136 ja	the language is Japanese
    137 JP	the country is Japan
    138 eucJP	the codeset is EUC-JP
    139 But it also could be "ja", "ja_JP.EUC", "ja_JP.ujis", etc.  And unfortunately,
    140 the locale name for a specific language, territory and codeset is not unified
    141 and depends on your system.
    142 
    143 Examples of locale names:
    144    charset	    language		  locale name ~
    145    GB2312	    Chinese (simplified)  zh_CN.EUC, zh_CN.GB2312
    146    Big5	    Chinese (traditional) zh_TW.BIG5, zh_TW.Big5
    147    CNS-11643	    Chinese (traditional) zh_TW
    148    EUC-JP	    Japanese		  ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
    149    Shift_JIS	    Japanese		  ja_JP.SJIS, ja_JP.Shift_JIS
    150    EUC-KR	    Korean		  ko, ko_KR.EUC
    151 
    152 
    153 USING A LOCALE
    154 
    155 To start using a locale for the whole system, see the documentation of your
    156 system.  Mostly you need to set it in a configuration file in "/etc".
    157 
    158 To use a locale in a shell, set the $LANG environment variable.  When you
    159 want to use Korean and the |locale| name is "ko", do this:
    160 
    161    sh:    export LANG=ko
    162    csh:   setenv LANG ko
    163 
    164 You can put this in your ~/.profile or ~/.cshrc file to always use it.
    165 
    166 To use a locale in Vim only, use the |:language| command: >
    167 
    168 :language ko
    169 
    170 Put this in your |init.vim| file to use it always.
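
A guarded variant for |init.vim|, as a sketch only: "ko" is the example name
used above and may not exist on your system, so the error is suppressed: >vim
    " v:lang holds the locale currently used for messages.
    if v:lang !~# '^ko'
      " silent! hides the error message when the locale is not available.
      silent! language ko
    endif
<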
    171 
    172 Or specify $LANG when starting Vim:
    173 
    174   sh:    LANG=ko vim {vim-arguments}
    175   csh:	  env LANG=ko vim {vim-arguments}
    176 
    177 You could make a small shell script for this.
    178 
    179 ==============================================================================
    180 Encoding				*mbyte-encoding*
    181 
    182 UTF-8 is always used internally to encode characters. This applies to all the
    183 places where text is used, including buffers (files loaded into memory),
    184 registers and variables.
    185 
    186 						*charset* *codeset*
    187 Charset is another name for encoding.  There are subtle differences, but these
    188 don't matter when using Vim.  "codeset" is another similar name.
    189 
    190 Each character is encoded as one or more bytes.  When all characters are
    191 encoded with one byte, we call this a single-byte encoding.  The most often
    192 used one is called "latin1".  This limits the number of characters to 256.
    193 Some of these are control characters, thus even fewer can be used for text.
    194 
    195 When some characters use two or more bytes, we call this a multibyte
    196 encoding.  This allows using many more than 256 characters, which is required
    197 for most East Asian languages.
    198 
    199 Most multibyte encodings use one byte for the first 128 characters.  These
    200 are equal to ASCII, which makes it easy to exchange plain-ASCII text, no
    201 matter what language is used.  Thus you might see the right text even when the
    202 encoding was set wrong.
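
The difference between bytes, characters and screen cells can be inspected
from Vim script.  A small illustration, using the built-in functions strlen()
(bytes), strchars() (characters) and strdisplaywidth() (screen cells): >vim
    " 1 byte: plain ASCII
    :echo strlen("a")
    " 2 bytes: U+00E9, e with acute accent
    :echo strlen("\u00e9")
    " 3 bytes: U+D55C, a Korean syllable
    :echo strlen("\ud55c")
    " 1 character, even though it takes 3 bytes
    :echo strchars("\ud55c")
    " 2 screen cells: the character is double-width
    :echo strdisplaywidth("\ud55c")
<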
    203 
    204 						*encoding-names*
    205 Vim can edit files in different character encodings.  There are three major groups:
    206 
    207 1   8bit	Single-byte encodings, 256 different characters.  Mostly used
    208 	in USA and Europe.  Example: ISO-8859-1 (Latin1).  All
    209 	characters occupy one screen cell only.
    210 
    211 2   2byte	Double-byte encodings, over 10000 different characters.
    212 	Mostly used in Asian countries.  Example: euc-kr (Korean)
    213 	The number of screen cells is equal to the number of bytes
    214 	(except for euc-jp when the first byte is 0x8e).
    215 
    216 u   Unicode	Universal encoding, can replace all others.  ISO 10646.
    217 	Millions of different characters.  Example: UTF-8.  The
    218 	relation between bytes and screen cells is complex.
    219 
    220 Only UTF-8 is used by Vim internally.  But files in other
    221 encodings can be edited by using conversion, see 'fileencoding'.
    222 
    223 Recognized 'fileencoding' values include:		*encoding-values*
    224 1   latin1	8-bit characters (ISO 8859-1, also used for cp1252)
    225 1   iso-8859-n	ISO_8859 variant (n = 2 to 15)
    226 1   koi8-r	Russian
    227 1   koi8-u	Ukrainian
    228 1   macroman    MacRoman (Macintosh encoding)
    229 1   8bit-{name} any 8-bit encoding (Vim specific name)
    230 1   cp437	similar to iso-8859-1
    231 1   cp737	similar to iso-8859-7
    232 1   cp775	Baltic
    233 1   cp850	similar to iso-8859-4
    234 1   cp852	similar to iso-8859-1
    235 1   cp855	similar to iso-8859-2
    236 1   cp857	similar to iso-8859-5
    237 1   cp860	similar to iso-8859-9
    238 1   cp861	similar to iso-8859-1
    239 1   cp862	similar to iso-8859-1
    240 1   cp863	similar to iso-8859-8
    241 1   cp865	similar to iso-8859-1
    242 1   cp866	similar to iso-8859-5
    243 1   cp869	similar to iso-8859-7
    244 1   cp874	Thai
    245 1   cp1250	Czech, Polish, etc.
    246 1   cp1251	Cyrillic
    247 1   cp1253	Greek
    248 1   cp1254	Turkish
    249 1   cp1255	Hebrew
    250 1   cp1256	Arabic
    251 1   cp1257	Baltic
    252 1   cp1258	Vietnamese
    253 1   cp{number}	MS-Windows: any installed single-byte codepage
    254 2   cp932	Japanese (Windows only)
    255 2   euc-jp	Japanese
    256 2   sjis	Japanese
    257 2   cp949	Korean
    258 2   euc-kr	Korean
    259 2   cp936	simplified Chinese (Windows only)
    260 2   euc-cn	simplified Chinese
    261 2   cp950	traditional Chinese (alias for big5)
    262 2   big5	traditional Chinese (alias for cp950)
    263 2   euc-tw	traditional Chinese
    264 2   2byte-{name} any double-byte encoding (Vim-specific name)
    265 2   cp{number}	MS-Windows: any installed double-byte codepage
    266 u   utf-8	32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
    267 u   ucs-2	16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
    268 u   ucs-2le	like ucs-2, little endian
    269 u   utf-16	ucs-2 extended with double-words for more characters
    270 u   utf-16le	like utf-16, little endian
    271 u   ucs-4	32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
    272 u   ucs-4le	like ucs-4, little endian
    273 
    274 The {name} can be any encoding name that your system supports.  It is passed
    275 to iconv() to convert between UTF-8 and the encoding of the file.
    276 For MS-Windows "cp{number}" means using codepage {number}.
    277 Examples: >
    278 	:set fileencoding=8bit-cp1252
    279 	:set fileencoding=2byte-cp932
    280 
    281 The MS-Windows codepage 1252 is very similar to latin1.  For practical reasons
    282 the same encoding is used and it's called latin1.  'isprint' can be used to
    283 display the characters 0x80 - 0xA0 or not.
    284 
    285 Several aliases can be used, they are translated to one of the names above.
    286 Incomplete list:
    287 
    288 1   ansi	same as latin1 (obsolete, for backward compatibility)
    289 2   japan	Japanese: "euc-jp"
    290 2   korea	Korean: "euc-kr"
    291 2   prc		simplified Chinese: "euc-cn"
    292 2   chinese     same as "prc"
    293 2   taiwan	traditional Chinese: "euc-tw"
    294 u   utf8	same as utf-8
    295 u   unicode	same as ucs-2
    296 u   ucs2be	same as ucs-2 (big endian)
    297 u   ucs-2be	same as ucs-2 (big endian)
    298 u   ucs-4be	same as ucs-4 (big endian)
    299 u   utf-32	same as ucs-4
    300 u   utf-32le	same as ucs-4le
    301    default     the encoding of the current locale.
    302 
    303 For the UCS codes the byte order matters.  This is tricky, use UTF-8 whenever
    304 you can.  The default is to use big-endian (most significant byte comes
    305 first):
    306     name	bytes		char ~
    307     ucs-2	      11 22	    1122
    308     ucs-2le	      22 11	    1122
    309     ucs-4	11 22 33 44	11223344
    310     ucs-4le	44 33 22 11	11223344
    311 
    312 On MS-Windows systems you often want to use "ucs-2le", because it uses little
    313 endian UCS-2.
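
The byte order in the table can be reproduced with a little arithmetic.  A
sketch that splits the 16-bit value 0x1122 into two bytes, in both orders: >vim
    let n = 0x1122
    " Big endian (ucs-2): most significant byte first, prints "11 22".
    echo printf('%02x %02x', n / 256, n % 256)
    " Little endian (ucs-2le): least significant byte first, prints "22 11".
    echo printf('%02x %02x', n % 256, n / 256)
<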
    314 
    315 There are a few encodings which are similar, but not exactly the same.  Vim
    316 treats them as if they were different encodings, so that conversion will be
    317 done when needed.  You might want to use the similar name to avoid conversion
    318 or when conversion is not possible:
    319 
    320 cp932, shift-jis, sjis
    321 cp936, euc-cn
    322 
    323 CONVERSION						*charset-conversion*
    324 
    325 Vim will automatically convert from one encoding to another in several places:
    326 - When reading a file and 'fileencoding' is different from "utf-8"
    327 - When writing a file and 'fileencoding' is different from "utf-8"
    328 - When displaying messages and the encoding used for LC_MESSAGES differs from
    329  "utf-8" (requires a gettext version that supports this).
    330 - When reading a Vim script where |:scriptencoding| is different from
    331  "utf-8".
    332 Most of these require iconv.  Conversion for reading and writing files may
    333 also be specified with the 'charconvert' option.
    334 
    335 Useful utilities for converting the charset:
    336    All:	    iconv
    337 GNU iconv can convert most encodings.  Unicode is used as the
    338 intermediate encoding, which allows conversion from and to all other
    339 encodings.  See https://directory.fsf.org/wiki/Libiconv.
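
As a sketch of the 'charconvert' route, the function below calls the iconv
command-line utility.  The name MyCharConvert() is made up for this example;
the v: variables are the ones set while 'charconvert' is evaluated.  The
built-in conversion is normally preferred, so this expression may only be
used when that conversion is not possible: >vim
    function! MyCharConvert() abort
      " Convert v:fname_in into v:fname_out with the external iconv program.
      call system('iconv -f ' . shellescape(v:charconvert_from)
            \ . ' -t ' . shellescape(v:charconvert_to)
            \ . ' < ' . shellescape(v:fname_in)
            \ . ' > ' . shellescape(v:fname_out))
      " Zero means success, non-zero makes the conversion fail.
      return v:shell_error
    endfunction
    set charconvert=MyCharConvert()
<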
    340 
    341 
    342 						*mbyte-conversion*
    343 When reading and writing files in an encoding different from "utf-8",
    344 conversion needs to be done.  These conversions are supported:
    345 - All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
    346  handled internally.
    347 - For MS-Windows, conversion from and
    348  to any codepage should work.
    349 - Conversion specified with 'charconvert'
    350 - Conversion with the iconv library, if it is available.
    351 Old versions of GNU iconv() may cause the conversion to fail (they
    352 request a very large buffer, more than Vim is willing to provide).
    353 Try getting another iconv() implementation.
    354 
    355 ==============================================================================
    356 Input with a keymap					*mbyte-keymap*
    357 
    358 When the keyboard doesn't produce the characters you want to enter in your
    359 text, you can use the 'keymap' option.  This will translate one or more
    360 (English) characters to another (non-English) character.  This only happens
    361 when typing text, not when typing Vim commands.  This avoids having to switch
    362 between two keyboard settings.
    363 
    364 The value of the 'keymap' option specifies a keymap file to use.  The name of
    365 this file is one of these two:
    366 
    367 keymap/{keymap}_utf-8.vim
    368 keymap/{keymap}.vim
    369 
    370 Here {keymap} is the value of the 'keymap' option.
    371 The file name with "utf-8" included is tried first.
    372 
    373 'runtimepath' is used to find these files.  To see an overview of all
    374 available keymap files, use this: >
    375 :echo globpath(&rtp, "keymap/*.vim")
    376 
    377 In Insert and Command-line mode you can use CTRL-^ to toggle between using the
    378 keyboard map or not. |i_CTRL-^| |c_CTRL-^|
    379 This flag is remembered for Insert mode with the 'iminsert' option.  When
    380 leaving and entering Insert mode the previous value is used.  The same value
    381 is also used for commands that take a single character argument, like |f| and
    382 |r|.
    383 For Command-line mode the flag is NOT remembered.  You are expected to type an
    384 Ex command first, which is ASCII.
    385 For typing search patterns the 'imsearch' option is used.  It can be set to
    386 use the same value as for 'iminsert'.
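
For example, assuming the "accents" keymap from the runtime files is
available: >vim
    " Load the keymap; this also switches it on for Insert mode.
    :set keymap=accents
    " Start with the keymap switched off (toggle with CTRL-^ while typing).
    :set iminsert=0
    " Let search patterns follow the 'iminsert' value.
    :set imsearch=-1
<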
    387 							*lCursor*
    388 It is possible to give the GUI cursor another color when the language mappings
    389 are being used.  This is disabled by default, so that the cursor does not
    390 become invisible when you use a non-standard background color.  Here is an
    391 example that uses a brightly colored cursor: >
    392 :highlight Cursor guifg=NONE guibg=Green
    393 :highlight lCursor guifg=NONE guibg=Cyan
    394 <
    395 	*keymap-file-format* *:loadk* *:loadkeymap* *E105* *E791*
    396 The keymap file looks something like this: >
    397 
    398 " Maintainer:	name <email@address>
    399 " Last Changed:	2001 Jan 1
    400 
    401 let b:keymap_name = "short"
    402 
    403 loadkeymap
    404 a	A
    405 b	B	comment
    406 
    407 The lines starting with a " are comments and will be ignored.  Blank lines are
    408 also ignored.  The lines with the mappings may have a comment after the useful
    409 text.
    410 
    411 The "b:keymap_name" can be set to a short name, which will be shown in the
    412 status line.  The idea is that this takes less room than the value of
    413 'keymap', which might be long to distinguish between different languages,
    414 keyboards and encodings.
    415 
    416 The actual mappings are in the lines below "loadkeymap".  In the example "a"
    417 is mapped to "A" and "b" to "B".  Thus the first item is mapped to the second
    418 item.  This is done for each line, until the end of the file.
    419 These items are exactly the same as what can be used in a |:lmap| command,
    420 using "<buffer>" to make the mappings local to the buffer.
    421 You can check the result with this command: >
    422 :lmap
    423 The two items must be separated by white space.  You cannot include white
    424 space inside an item, use the special names "<Tab>" and "<Space>" instead.
    425 The length of the two items together must not exceed 200 bytes.
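
For illustration, the two mapping lines of the example file above have the
same effect as typing these commands in the buffer: >vim
    :lmap <buffer> a A
    :lmap <buffer> b B
<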
    426 
    427 It's possible to have more than one character in the first column.  This works
    428 like a dead key.  Example: >
    429 'a	á
    430 Since Vim doesn't know if the next character after a quote is really an "a",
    431 it will wait for the next character.  To be able to insert a single quote,
    432 also add this line: >
    433 ''	'
    434 Since the mapping is defined with |:lmap| the resulting quote will not be
    435 used for the start of another character defined in the 'keymap'.
    436 It can be used in a standard |:imap| mapping.
    437 The "accents" keymap uses this.				*keymap-accents*
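
Put together, a minimal keymap file with dead keys could look like this
sketch.  The file name "keymap/myaccents.vim" and the short name "acc" are
made up for the example; the file would have to be placed in a directory of
'runtimepath' and selected with ":set keymap=myaccents": >vim
    let b:keymap_name = "acc"

    loadkeymap
    'a	á
    'e	é
    ''	'
<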
    438 
    439 The first column can also be in |<>| form:
    440 <C-c>		Ctrl-C
    441 <A-c>		Alt-c
    442 <A-C>		Alt-C
    443 Note that the Alt mappings may not work, depending on your keyboard and
    444 terminal.
    445 
    446 Although it's possible to have more than one character in the second column,
    447 this is unusual.  But you can use various ways to specify the character: >
    448 A	a		literal character
    449 A	<char-97>	decimal value
    450 A	<char-0x61>	hexadecimal value
    451 A	<char-0141>	octal value
    452 x	<Space>		special key name
    453 
    454 The characters are assumed to be encoded in UTF-8.
    455 It's possible to use ":scriptencoding" when all characters are given
    456 literally.  That doesn't work when using the <char-> construct, because the
    457 conversion is done on the keymap file, not on the resulting character.
    458 
    459 The lines after "loadkeymap" are interpreted with 'cpoptions' set to "C".
    460 This means that continuation lines are not used and a backslash has a special
    461 meaning in the mappings.  Examples: >
    462 
    463 " a comment line
    464 \"	x	maps " to x
    465 \\	y	maps \ to y
    466 
    467 If you write a keymap file that will be useful for others, consider submitting
    468 it to the Vim maintainer for inclusion in the distribution:
    469 <maintainer@vim.org>
    470 
    471 
    472 HEBREW KEYMAP						*keymap-hebrew*
    473 
    474 This file explains what characters are available in UTF-8 and CP1255
    475 encodings, and what the keymaps are to get those characters:
    476 
    477 glyph   encoding	   keymap ~
    478 Char UTF-8 cp1255  hebrew  hebrewp  name ~
    479 א    0x5d0  0xe0     t	      a     alef
    480 ב    0x5d1  0xe1     c	      b     bet
    481 ג    0x5d2  0xe2     d	      g     gimel
    482 ד    0x5d3  0xe3     s	      d     dalet
    483 ה    0x5d4  0xe4     v	      h     he
    484 ו    0x5d5  0xe5     u	      v     vav
    485 ז    0x5d6  0xe6     z	      z     zayin
    486 ח    0x5d7  0xe7     j	      j     het
    487 ט    0x5d8  0xe8     y	      T     tet
    488 י    0x5d9  0xe9     h	      y     yod
    489 ך    0x5da  0xea     l	      K     kaf sofit
    490 כ    0x5db  0xeb     f	      k     kaf
    491 ל    0x5dc  0xec     k	      l     lamed
    492 ם    0x5dd  0xed     o	      M     mem sofit
    493 מ    0x5de  0xee     n	      m     mem
    494 ן    0x5df  0xef     i	      N     nun sofit
    495 נ    0x5e0  0xf0     b	      n     nun
    496 ס    0x5e1  0xf1     x	      s     samech
    497 ע    0x5e2  0xf2     g	      u     ayin
    498 ף    0x5e3  0xf3     ;	      P     pe sofit
    499 פ    0x5e4  0xf4     p	      p     pe
    500 ץ    0x5e5  0xf5     .	      X     tsadi sofit
    501 צ    0x5e6  0xf6     m	      x     tsadi
    502 ק    0x5e7  0xf7     e	      q     qof
    503 ר    0x5e8  0xf8     r	      r     resh
    504 ש    0x5e9  0xf9     a	      w     shin
    505 ת    0x5ea  0xfa     ,	      t     tav
    506 
    507 Vowel marks and special punctuation:
    508 הְ    0x5b0  0xc0     A:      A:   sheva
    509 הֱ    0x5b1  0xc1     HE      HE   hataf segol
    510 הֲ    0x5b2  0xc2     HA      HA   hataf patah
    511 הֳ    0x5b3  0xc3     HO      HO   hataf qamats
    512 הִ    0x5b4  0xc4     I       I    hiriq
    513 הֵ    0x5b5  0xc5     AY      AY   tsere
    514 הֶ    0x5b6  0xc6     E       E    segol
    515 הַ    0x5b7  0xc7     AA      AA   patah
    516 הָ    0x5b8  0xc8     AO      AO   qamats
    517 הֹ    0x5b9  0xc9     O       O    holam
    518 הֻ    0x5bb  0xcb     U       U    qubuts
    519 כּ    0x5bc  0xcc     D       D    dagesh
    520 הֽ    0x5bd  0xcd     ]T      ]T   meteg
    521 ה־   0x5be  0xce     ]Q      ]Q   maqaf
    522 בֿ    0x5bf  0xcf     ]R      ]R   rafe
    523 ב׀   0x5c0  0xd0     ]p      ]p   paseq
    524 שׁ    0x5c1  0xd1     SR      SR   shin-dot
    525 שׂ    0x5c2  0xd2     SL      SL   sin-dot
    526 ׃    0x5c3  0xd3     ]P      ]P   sof-pasuq
    527 װ    0x5f0  0xd4     VV      VV   double-vav
    528 ױ    0x5f1  0xd5     VY      VY   vav-yod
    529 ײ    0x5f2  0xd6     YY      YY   yod-yod
    530 
    531 The following are only available in UTF-8.
    532 
    533 Cantillation marks:
    534 glyph
    535 Char UTF-8 hebrew name
    536 ב֑    0x591   C:   etnahta
    537 ב֒    0x592   Cs   segol
    538 ב֓    0x593   CS   shalshelet
    539 ב֔    0x594   Cz   zaqef qatan
    540 ב֕    0x595   CZ   zaqef gadol
    541 ב֖    0x596   Ct   tipeha
    542 ב֗    0x597   Cr   revia
    543 ב֘    0x598   Cq   zarqa
    544 ב֙    0x599   Cp   pashta
    545 ב֚    0x59a   C!   yetiv
    546 ב֛    0x59b   Cv   tevir
    547 ב֜    0x59c   Cg   geresh
    548 ב֝    0x59d   C*   geresh qadim
    549 ב֞    0x59e   CG   gershayim
    550 ב֟    0x59f   CP   qarnei-parah
    551 ב֪    0x5aa   Cy   yerach-ben-yomo
    552 ב֫    0x5ab   Co   ole
    553 ב֬    0x5ac   Ci   iluy
    554 ב֭    0x5ad   Cd   dehi
    555 ב֮    0x5ae   Cn   zinor
    556 ב֯    0x5af   CC   masora circle
    557 
    558 Combining forms:
    559 ﬠ    0xfb20  X`   Alternative ayin
    560 ﬡ    0xfb21  X'   Alternative alef
    561 ﬢ    0xfb22  X-d  Alternative dalet
    562 ﬣ    0xfb23  X-h  Alternative he
    563 ﬤ    0xfb24  X-k  Alternative kaf
    564 ﬥ    0xfb25  X-l  Alternative lamed
    565 ﬦ    0xfb26  X-m  Alternative mem-sofit
    566 ﬧ    0xfb27  X-r  Alternative resh
    567 ﬨ    0xfb28  X-t  Alternative tav
    568 ﬩    0xfb29  X-+  Alternative plus
    569 שׁ    0xfb2a  XW   shin+shin-dot
    570 שׂ    0xfb2b  Xw   shin+sin-dot
    571 שּׁ    0xfb2c  X..W  shin+shin-dot+dagesh
    572 שּׂ    0xfb2d  X..w  shin+sin-dot+dagesh
    573 אַ    0xfb2e  XA   alef+patah
    574 אָ    0xfb2f  XO   alef+qamats
    575 אּ    0xfb30  XI   alef+hiriq (mapiq)
    576 בּ    0xfb31  X.b  bet+dagesh
    577 גּ    0xfb32  X.g  gimel+dagesh
    578 דּ    0xfb33  X.d  dalet+dagesh
    579 הּ    0xfb34  X.h  he+dagesh
    580 וּ    0xfb35  Xu  vav+dagesh
    581 זּ    0xfb36  X.z  zayin+dagesh
    582 טּ    0xfb38  X.T  tet+dagesh
    583 יּ    0xfb39  X.y  yud+dagesh
    584 ךּ    0xfb3a  X.K  kaf sofit+dagesh
    585 כּ    0xfb3b  X.k  kaf+dagesh
    586 לּ    0xfb3c  X.l  lamed+dagesh
    587 מּ    0xfb3e  X.m  mem+dagesh
    588 נּ    0xfb40  X.n  nun+dagesh
    589 סּ    0xfb41  X.s  samech+dagesh
    590 ףּ    0xfb43  X.P  pe sofit+dagesh
    591 פּ    0xfb44  X.p  pe+dagesh
    592 צּ    0xfb46  X.x  tsadi+dagesh
    593 קּ    0xfb47  X.q  qof+dagesh
    594 רּ    0xfb48  X.r  resh+dagesh
    595 שּ    0xfb49  X.w  shin+dagesh
    596 תּ    0xfb4a  X.t  tav+dagesh
    597 וֹ    0xfb4b  Xo   vav+holam
    598 בֿ    0xfb4c  XRb  bet+rafe
    599 כֿ    0xfb4d  XRk  kaf+rafe
    600 פֿ    0xfb4e  XRp  pe+rafe
    601 ﭏ    0xfb4f  Xal  alef-lamed
    602 
    603 ==============================================================================
    604 Using UTF-8				*mbyte-utf8* *UTF-8* *utf-8* *utf8*
    605 						*Unicode* *unicode*
    606 The Unicode character set was designed to include all characters from other
    607 character sets.  Therefore it is possible to write text in (almost) any
    608 language using Unicode.  And it's mostly possible to mix these languages in
    609 one file, which is impossible with other encodings.
    610 
    611 Unicode can be encoded in several ways.  The most popular one is UTF-8, which
    612 uses one or more bytes for each character and is backwards compatible with
    613 ASCII.  On MS-Windows UTF-16 is also used (previously UCS-2), which uses
    614 16-bit words.  Nvim supports all of these encodings, but always uses UTF-8
    615 internally.
    616 
    617 Nvim supports double-width characters; this works best with 'guifontwide'.
    618 When using only 'guifont' the wide characters are drawn in the normal width,
    619 followed by a space to fill the gap.
    620 
    621 EMOJI							*emoji*
    622 
    623 You can list emoji characters using this script: >vim
    624    :source $VIMRUNTIME/scripts/emoji_list.lua
    625 <
    626 						*bom-bytes*
    627 When reading a file a BOM (Byte Order Mark) can be used to recognize the
    628 Unicode encoding:
    629 EF BB BF     UTF-8
    630 FE FF        UTF-16 big endian
    631 FF FE        UTF-16 little endian
    632 00 00 FE FF  UTF-32 big endian
    633 FF FE 00 00  UTF-32 little endian
    634 
    635 UTF-8 is the recommended encoding.  Note that it's difficult to tell UTF-16
    636 and UTF-32 apart.  UTF-16 is often used on MS-Windows, UTF-32 is not
    637 widespread as file format.
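
Whether a BOM was found, and which encoding was detected, can be checked
after loading a file.  A sketch using the 'bomb' option, which is not
described in this file; the BOM is only checked when 'fileencodings' contains
"ucs-bom": >vim
    " Show the detected encoding and whether the file started with a BOM.
    :setlocal fileencoding? bomb?
    " Write the file back without a BOM.
    :setlocal nobomb
    :write
<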
    638 
    639 
    640 				*mbyte-combining* *mbyte-composing*
    641 A composing or combining character is used to change the meaning of the
    642 character before it.  The combining characters are drawn on top of the
    643 preceding character.
    644 
    645 Nvim largely follows the definition of extended grapheme clusters in UAX#29
    646 in the Unicode standard, with some modifications: An ASCII char will always
    647 start a new cluster. In addition 'arabicshape' enables the combining of some
    648 Arabic letters, when they are shaped to be displayed together in a single cell.
    649 
    650 Combined characters that are too big cannot be displayed, but they can still be
    651 inspected using the |g8| and |ga| commands described below.
    652 When editing text a composing character is mostly considered part of the
    653 preceding character.  For example "x" will delete a character and its
    654 following composing characters by default.
    655 If the 'delcombine' option is on, then pressing 'x' will delete the combining
    656 characters, one at a time, then the base character.  But when inserting, you
    657 type the first character and the following composing characters separately,
    658 after which they will be joined.  The "r" command will not allow you to type a
    659 combining character, because it doesn't know one is coming.  Use "R" instead.
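
How a composing character is counted can be seen from Vim script.  A small
sketch with "e" followed by U+0301 (combining acute accent): >vim
    let s = "e\u0301"
    " 3 bytes: one for "e", two for U+0301
    echo strlen(s)
    " 2 when composing characters are counted separately
    echo strchars(s)
    " 1 when composing characters are counted with their base character
    echo strchars(s, 1)
<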
    660 
    661 Bytes which are not part of a valid UTF-8 byte sequence are handled like a
    662 single character and displayed as <xx>, where "xx" is the hex value of the
    663 byte.
    664 
    665 Overlong sequences are not handled specially and displayed like a valid
    666 character.  However, search patterns may not match on an overlong sequence.
    667 (an overlong sequence is where more bytes are used than required for the
    668 character.)  An exception is NUL (zero) which is displayed as "<00>".
    669 
    670 In the file and buffer the full range of Unicode characters can be used (31
    671 bits).  However, displaying only works for the characters present in the
    672 selected font.
    673 
    674 Useful commands:
    675 - "ga" shows the decimal, hexadecimal and octal value of the character under
    676  the cursor.  If there are composing characters these are shown too.  (If the
    677  message is truncated, use ":messages").
    678 - "g8" shows the bytes used in a UTF-8 character, also the composing
    679  characters, as hex numbers.
    680 - ":set fileencodings=" forces using UTF-8 for all files.  The
    681  default is to automatically detect the encoding of a file.
    682 
    683 
    684 STARTING VIM
    685 
    686 You might want to select the font used for the menus.  Unfortunately this
    687 doesn't always work.  See the system specific remarks below, and 'langmenu'.
    688 
    689 
    690 USING UTF-8 IN X-WINDOWS				*utf-8-in-xwindows*
    691 
    692 You need to specify a font to be used.  For double-wide characters another
    693 font is required, which is exactly twice as wide.  There are two ways to do
    694 this:
    695 
    696 1. Set 'guifont' and let Nvim find a matching 'guifontwide'
    697 2. Set 'guifont' and 'guifontwide'
    698 
    699 See the documentation for each option for details.  Example: >
    700 
    701   :set guifont=-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1
    702 
    703 You might also want to set the font used for the menus.  This only works for
    704 Motif.  Use the ":hi Menu font={fontname}" command for this. |:highlight|
    705 
    706 
    707 TYPING UTF-8						*utf-8-typing*
    708 
    709 If you are using X-Windows, you should find an input method that supports
    710 UTF-8.
    711 
    712 If your system does not provide support for typing UTF-8, you can use the
    713 'keymap' feature.  This allows writing a keymap file, which defines a UTF-8
    714 character as a sequence of ASCII characters.  See |mbyte-keymap|.
    715 
    716 If everything else fails, you can type any character as four hex digits: >
    717 
    718 CTRL-V u 1234
    719 
    720 "1234" is interpreted as a hex number.  You must type four characters, prepend
    721 a zero if necessary.
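
The same code point can be produced or inspected from Vim script, which can
help to find the four digits to type.  A sketch with U+00E9: >vim
    " Prints the character with code point 0xe9.
    :echo nr2char(0xe9)
    " Prints "00e9", the digits to type after CTRL-V u.
    :echo printf('%04x', char2nr("\u00e9"))
<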
    722 
    723 
    724 COMMAND ARGUMENTS					*utf-8-char-arg*
    725 
    726 Commands like |f|, |F|, |t| and |r| take an argument of one character.  For
    727 UTF-8 this argument may include one or two composing characters.  These need
    728 to be produced together with the base character; Nvim doesn't wait for the next
    729 character to be typed to find out if it is a composing character or not.
    730 Using 'keymap' or |:lmap| is a nice way to type these characters.
    731 
    732 The commands that search for a character in a line handle composing characters
    733 as follows.  When searching for a character without a composing character,
    734 this will find matches in the text with or without composing characters.  When
    735 searching for a character with a composing character, this will only find
    736 matches with that composing character.  It was implemented this way, because
    737 not everybody is able to type a composing character.
    738 
    739 ==============================================================================
    740 Overview of options					*mbyte-options*
    741 
    742 These options are relevant for editing multibyte files.
    743 
    744 'fileencoding'	Encoding of a file.  When it's different from "utf-8"
    745 	conversion is done when reading or writing the file.
    746 
    747 'fileencodings'	List of possible encodings of a file.  When opening a file
    748 	these will be tried and the first one that doesn't cause an
    749 	error is used for 'fileencoding'.
    750 
    751 'charconvert'	Expression used to convert files from one encoding to another.
    752 
    753 'formatoptions' The 'm' flag can be included to have formatting break a line
    754 	at a multibyte character of 256 or higher.  This is useful for
    755 	languages where a sequence of characters can be broken
    756 	anywhere.
    757 
    758 'keymap'	Specify the name of a keyboard mapping.
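
A typical combination, as a sketch; the list of encodings is only an example
and should be adapted to the files you actually edit: >vim
    " Try a BOM first, then UTF-8, then the locale encoding, then latin1.
    :set fileencodings=ucs-bom,utf-8,default,latin1
    " Allow line breaks at multibyte characters when formatting.
    :set formatoptions+=m
<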
    759 
    760 ==============================================================================
    761 
    762 Contributions specifically for the multibyte features by:
    763 Chi-Deok Hwang <hwang@mizi.co.kr>
    764 SungHyun Nam <goweol@gmail.com>
    765 K.Nagano <nagano@atese.advantest.co.jp>
    766 Taro Muraoka  <koron@tka.att.ne.jp>
    767 Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>
    768 
    769 vim:tw=78:ts=8:noet:ft=help:norl: