# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]-[\u309B \u309C]];
:: NFKC (NFC);
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
#    Hiragana equivalents. We use Hiragana wa/wi/we/wo
#    (308F-3092) with a voicing mark (3099), which is
#    semantically equivalent. However, this is a
#    non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
#    Hiragana equivalents. We convert them to normal
#    Hiragana ka/ke (304B,3051). This is a one-way
#    information-losing transformation and precludes
#    round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
#    block, but they apply to Katakana as well, so we
#    leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
#    preceding vowel. This is a one-way
#    information-losing transformation from Katakana to
#    Hiragana.
# 5. The Katakana middle dot separates words in foreign
#    expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
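# Illustrative examples (traced by hand from the rules in
# this file, not produced by running ICU):
#   Hiragana-Katakana: ひらがな → ヒラガナ
#   Katakana-Hiragana: ヷ → わ\u3099       (kink 1)
#   Katakana-Hiragana: ヵ → か, ヶ → け     (kink 2; か and け
#                      map back to カ and ケ, not ヵ and ヶ)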
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana transform of small Katakana
# ka/ke to normal Hiragana ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small kana (e.g., small a, 3041 once
# converted to Hiragana) and do not distinguish them
# from their large counterparts. It doesn't make sense
# to double a small counterpart vowel as a small
# Hiragana vowel, so we don't do so. In natural text
# this should never occur anyway. If a 30FC is seen
# without a preceding vowel sound (e.g., after n 30F3)
# we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
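# Worked examples of the five rules above (traced by hand
# in the Katakana-Hiragana direction, not produced by
# running ICU):
#   ラーメン → らあめん  (ラ → ら; ー after ら, in $xa, → あ)
#   コーヒー → こおひい  (ー after こ, in $xo, → お;
#                         ー after ひ, in $xi, → い)
#   ンー → んー          (ん is in no vowel set, so ー is
#                         left unchanged)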
:: NFC (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([[\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]-[\u309B \u309C]]);
# eof