Hira_Kana.txt (4465B)
# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Generated using tools/cldr/cldr-to-icu/
#
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:Nonspacing_Mark:]]-[\u309B \u309C]];
:: NFKC (NFC);
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
#    Hiragana equivalents. We use Hiragana wa/wi/we/wo
#    (308F-3092) with a voicing mark (3099), which is
#    semantically equivalent. However, this is a
#    non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
#    Hiragana equivalents. We convert them to normal
#    Hiragana ka/ke (304B,3051). This is a one-way
#    information-losing transformation and precludes
#    round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
#    block, but they apply to Katakana as well, so we
#    leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
#    preceding vowel. This is a one-way
#    information-losing transformation from Katakana
#    to Hiragana.
# 5. The Katakana middle dot separates words in foreign
#    expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
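#
# Worked examples for kinks 1 and 2 (an editor's sketch read
# off the rules in this file, not output captured from an
# ICU run):
#   Katakana-Hiragana:  ヷ (30F7) → わ (308F) + 3099
#   Katakana-Hiragana:  ヵ (30F5) → か (304B)
#   Hiragana-Katakana:  か (304B) → カ (30AB), so the
#                       original ヵ is not recovered.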
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small Katakana (e.g., small A 3041) and
# do not distinguish them from their large
# counterparts. It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so. In natural text this should never
# occur anyway. If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
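#
# Worked examples of the prolonged sound mark handling in the
# Katakana-Hiragana direction (an editor's sketch traced by
# hand through the rules and the $xa..$xo sets defined below,
# not output captured from an ICU run):
#   カー     → かあ      (か is in $xa, so ー becomes あ)
#   コーヒー → こおひい  (こ is in $xo, ひ is in $xi)
#   ァー     → ぁあ      (small ぁ is in $xa; the added vowel is large)
#   ンー     → んー      (ん is in no set, so ー is left unchanged)
# In the Hiragana-Katakana direction かあ becomes カア, not カー,
# which is why the mapping does not round-trip.
#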
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [
    ぁ あ か が さ ざ
    た だ な は ば ぱ
    ま ゃ や ら ゎ わ
];
$xi = [
    ぃ い き ぎ し じ
    ち ぢ に ひ び ぴ
    み り ゐ
];
$xu = [
    ぅ う く ぐ す ず
    っ つ づ ぬ ふ ぶ
    ぷ む ゅ ゆ る ゔ
];
$xe = [
    ぇ え け げ せ ぜ
    て で ね へ べ ぺ
    め れ ゑ
];
$xo = [
    ぉ お こ ご そ ぞ
    と ど の ほ ぼ ぽ
    も ょ よ ろ を
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: NFC (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:Nonspacing_Mark:]]-[\u309B \u309C]]);
# eof