tor-browser

The Tor Browser
git clone https://git.dasho.dev/tor-browser.git

Hira_Kana.txt (4465B)


      1 # © 2016 and later: Unicode, Inc. and others.
      2 # License & terms of use: http://www.unicode.org/copyright.html
      3 # Generated using tools/cldr/cldr-to-icu/
      4 #
      5 # File: Hira_Kana.txt
      6 # Generated from CLDR
      7 #
      8 
      9 # note: a global filter is more efficient, but MUST include all source chars
     10 :: [[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:Nonspacing_Mark:]]-[\u309B \u309C]];
     11 :: NFKC (NFC);
     12 # Hiragana-Katakana
     13 # This is largely a one-to-one mapping, but it has a
     14 # few kinks:
     15 # 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
     16 # Hiragana equivalents.  We use Hiragana wa/wi/we/wo
     17 # (308F-3092) with a voicing mark (3099), which is
     18 # semantically equivalent.  However, this is a non-
     19 # roundtripping transformation.
     20 # 2. The Katakana small ka/ke (30F5,30F6) have no
     21 # Hiragana equivalents.  We convert them to normal
     22 # Hiragana ka/ke (304B,3051).  This is a one-way
     23 # information-losing transformation and precludes
     24 # round-tripping of 30F5 and 30F6.
     25 # 3. The combining marks 3099-309C are in the Hiragana
     26 # block, but they apply to Katakana as well, so we
     27 # leave them untouched.
     28 # 4. The Katakana prolonged sound mark 30FC doubles the
     29 # preceding vowel.  This is a one-way information-
     30 # losing transformation from Katakana to Hiragana.
     31 # 5. The Katakana middle dot separates words in foreign
     32 # expressions; we leave this unmodified.
     33 # The above points preclude successful round-trip
     34 # transformations of arbitrary input text.  However,
     35 # they provide naturalistic results that should conform
     36 # to user expectations.
     37 # Combining equivalents va/vi/ve/vo
     38 わ\u3099 ↔ ヷ;
     39 ゐ\u3099 ↔ ヸ;
     40 ゑ\u3099 ↔ ヹ;
     41 を\u3099 ↔ ヺ;
     42 # One-to-one mappings, main block
     43 # 3041:3094 ↔ 30A1:30F4
     44 # 309D,E ↔ 30FD,E
     45 ぁ ↔ ァ;
     46 あ ↔ ア;
     47 ぃ ↔ ィ;
     48 い ↔ イ;
     49 ぅ ↔ ゥ;
     50 う ↔ ウ;
     51 ぇ ↔ ェ;
     52 え ↔ エ;
     53 ぉ ↔ ォ;
     54 お ↔ オ;
     55 か ↔ カ;
     56 が ↔ ガ;
     57 き ↔ キ;
     58 ぎ ↔ ギ;
     59 く ↔ ク;
     60 ぐ ↔ グ;
     61 け ↔ ケ;
     62 げ ↔ ゲ;
     63 こ ↔ コ;
     64 ご ↔ ゴ;
     65 さ ↔ サ;
     66 ざ ↔ ザ;
     67 し ↔ シ;
     68 じ ↔ ジ;
     69 す ↔ ス;
     70 ず ↔ ズ;
     71 せ ↔ セ;
     72 ぜ ↔ ゼ;
     73 そ ↔ ソ;
     74 ぞ ↔ ゾ;
     75 た ↔ タ;
     76 だ ↔ ダ;
     77 ち ↔ チ;
     78 ぢ ↔ ヂ;
     79 っ ↔ ッ;
     80 つ ↔ ツ;
     81 づ ↔ ヅ;
     82 て ↔ テ;
     83 で ↔ デ;
     84 と ↔ ト;
     85 ど ↔ ド;
     86 な ↔ ナ;
     87 に ↔ ニ;
     88 ぬ ↔ ヌ;
     89 ね ↔ ネ;
     90 の ↔ ノ;
     91 は ↔ ハ;
     92 ば ↔ バ;
     93 ぱ ↔ パ;
     94 ひ ↔ ヒ;
     95 び ↔ ビ;
     96 ぴ ↔ ピ;
     97 ふ ↔ フ;
     98 ぶ ↔ ブ;
     99 ぷ ↔ プ;
    100 へ ↔ ヘ;
    101 べ ↔ ベ;
    102 ぺ ↔ ペ;
    103 ほ ↔ ホ;
    104 ぼ ↔ ボ;
    105 ぽ ↔ ポ;
    106 ま ↔ マ;
    107 み ↔ ミ;
    108 む ↔ ム;
    109 め ↔ メ;
    110 も ↔ モ;
    111 ゃ ↔ ャ;
    112 や ↔ ヤ;
    113 ゅ ↔ ュ;
    114 ゆ ↔ ユ;
    115 ょ ↔ ョ;
    116 よ ↔ ヨ;
    117 ら ↔ ラ;
    118 り ↔ リ;
    119 る ↔ ル;
    120 れ ↔ レ;
    121 ろ ↔ ロ;
    122 ゎ ↔ ヮ;
    123 わ ↔ ワ;
    124 ゐ ↔ ヰ;
    125 ゑ ↔ ヱ;
    126 を ↔ ヲ;
    127 ん ↔ ン;
    128 ゔ ↔ ヴ;
    129 ゝ ↔ ヽ;
    130 ゞ ↔ ヾ;
    131 # One-way Katakana-Hiragana transformation of small Katakana
    132 # ka/ke to normal Hiragana ka/ke.
    133 か ← ヵ;
    134 け ← ヶ;
    135 # Katakana followed by a prolonged sound mark 30FC has
    136 # its final vowel doubled.  This is a Katakana-Hiragana
    137 # one-way information-losing transformation.  We
    138 # include the small kana (e.g., Hiragana small A 3041) and
    139 # do not distinguish them from their large
    140 # counterparts.  It doesn't make sense to double a
    141 # small counterpart vowel as a small Hiragana vowel, so
    142 # we don't do so.  In natural text this should never
    143 # occur anyway.  If a 30FC is seen without a preceding
    144 # vowel sound (e.g., after n 30F3) we do not change it.
    145 ### $long = ー;
    146 # The following categories are Hiragana, not Katakana
    147 # as might be expected, since by the time we get to the
    148 # 30FC, the preceding character will have already been
    149 # transformed to Hiragana.
    150 # {The following was mechanically generated from the
    151 # Unicode 3.0 data:}
    152 $xa = [
    153 ぁ あ か が さ ざ
    154 た だ な は ば ぱ
    155 ま ゃ や ら ゎ わ
    156 ];
    157 $xi = [
    158 ぃ い き ぎ し じ
    159 ち ぢ に ひ び ぴ
    160 み り ゐ
    161 ];
    162 $xu = [
    163 ぅ う く ぐ す ず
    164 っ つ づ ぬ ふ ぶ
    165 ぷ む ゅ ゆ る ゔ
    166 ];
    167 $xe = [
    168 ぇ え け げ せ ぜ
    169 て で ね へ べ ぺ
    170 め れ ゑ
    171 ];
    172 $xo = [
    173 ぉ お こ ご そ ぞ
    174 と ど の ほ ぼ ぽ
    175 も ょ よ ろ を
    176 ];
    177 あ ← $xa {ー};
    178 い ← $xi {ー};
    179 う ← $xu {ー};
    180 え ← $xe {ー};
    181 お ← $xo {ー};
    182 :: NFC (NFKC) ;
    183 # note: a global filter is more efficient, but MUST include all source chars!!
    184 :: ([[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:Nonspacing_Mark:]]-[\u309B \u309C]]);
    185 # eof
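
The Katakana-to-Hiragana direction of the rules above can be sketched in plain Python. This is an illustrative approximation, not ICU's implementation. The function name `katakana_to_hiragana` and the tables `VOWEL_CLASSES`, `LONG_VOWEL`, and `SPECIAL` are hypothetical names introduced here. The global filter is not modeled, so NFKC is applied to the whole input, and the 0x60 offset relies on the one-to-one block layout 3041:3094 ↔ 30A1:30F4 noted in the rule comments.

```python
import unicodedata

# Vowel classes copied from the $xa..$xo sets. They are Hiragana because the
# character before the prolonged sound mark has already been converted.
VOWEL_CLASSES = {
    "あ": "ぁあかがさざただなはばぱまゃやらゎわ",
    "い": "ぃいきぎしじちぢにひびぴみりゐ",
    "う": "ぅうくぐすずっつづぬふぶぷむゅゆるゔ",
    "え": "ぇえけげせぜてでねへべぺめれゑ",
    "お": "ぉおこごそぞとどのほぼぽもょよろを",
}
LONG_VOWEL = {ch: vowel for vowel, members in VOWEL_CLASSES.items() for ch in members}

# Katakana with no single Hiragana code point: va/vi/ve/vo become
# wa/wi/we/wo plus the combining voicing mark; small ka/ke become ka/ke.
SPECIAL = {
    "ヷ": "わ\u3099", "ヸ": "ゐ\u3099", "ヹ": "ゑ\u3099", "ヺ": "を\u3099",
    "ヵ": "か", "ヶ": "け",
}


def katakana_to_hiragana(text):
    """Approximate the reverse (Katakana-Hiragana) direction of the rules."""
    out = []
    # Assumption: no global filter, so NFKC runs over every character.
    for ch in unicodedata.normalize("NFKC", text):
        cp = ord(ch)
        if ch in SPECIAL:
            out.extend(SPECIAL[ch])
        elif 0x30A1 <= cp <= 0x30F4 or cp in (0x30FD, 0x30FE):
            # Main block and iteration marks sit 0x60 above their Hiragana twins.
            out.append(chr(cp - 0x60))
        elif ch == "ー" and out and out[-1] in LONG_VOWEL:
            # Prolonged sound mark: repeat the vowel of the preceding kana.
            out.append(LONG_VOWEL[out[-1]])
        else:
            # Middle dot, combining marks, ASCII, and anything else pass through.
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Under these assumptions, `katakana_to_hiragana("コーヒー")` gives `こおひい`, while the mark in `ンー` is left in place because ん belongs to none of the vowel sets.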