Commit | Line | Data |
---|---|---|
2ca993e8 A |
1 | # *************************************************************************** |
2 | # * | |
3 | # * Copyright (C) 2004-2016, International Business Machines | |
4 | # * Corporation; Unicode, Inc.; and others. All Rights Reserved. | |
5 | # * | |
6 | # *************************************************************************** | |
7 | # File: Hira_Kana.txt | |
8 | # Generated from CLDR | |
9 | # | |
10 | ||
11 | # note: a global filter is more efficient, but MUST include all source chars | |
12 | :: [\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ; | |
13 | :: NFKC (); | |
14 | # Hiragana-Katakana | |
15 | # This is largely a one-to-one mapping, but it has a | |
16 | # few kinks: | |
17 | # 1. The Katakana va/vi/ve/vo (30F7-30FA) have no | |
18 | # Hiragana equivalents. We use Hiragana wa/wi/we/wo | |
19 | # (308F-3092) with a voicing mark (3099), which is | |
20 | # semantically equivalent. However, this is a non- | |
21 | # roundtripping transformation. | |
22 | # 2. The Katakana small ka/ke (30F5,30F6) have no | |
23 | # Hiragana equivalents. We convert them to normal | |
24 | # Hiragana ka/ke (304B,3051). This is a one-way | |
25 | # information-losing transformation and precludes | |
26 | # round-tripping of 30F5 and 30F6. | |
27 | # 3. The combining marks 3099-309C are in the Hiragana | |
28 | # block, but they apply to Katakana as well, so we | |
29 | # leave them untouched. | |
30 | # 4. The Katakana prolonged sound mark 30FC doubles the | |
31 | # preceding vowel. This is a one-way information- | |
32 | # losing transformation from Katakana to Hiragana. | |
33 | # 5. The Katakana middle dot separates words in foreign | |
34 | # expressions; we leave this unmodified. | |
35 | # The above points preclude successful round-trip | |
36 | # transformations of arbitrary input text. However, | |
37 | # they provide naturalistic results that should conform | |
38 | # to user expectations. | |
39 | # Combining equivalents va/vi/ve/vo | |
40 | わ\u3099 ↔ ヷ; | |
41 | ゐ\u3099 ↔ ヸ; | |
42 | ゑ\u3099 ↔ ヹ; | |
43 | を\u3099 ↔ ヺ; | |
44 | # One-to-one mappings, main block | |
45 | # 3041:3094 ↔ 30A1:30F4 | |
46 | # 309D,E ↔ 30FD,E | |
47 | ぁ ↔ ァ; | |
48 | あ ↔ ア; | |
49 | ぃ ↔ ィ; | |
50 | い ↔ イ; | |
51 | ぅ ↔ ゥ; | |
52 | う ↔ ウ; | |
53 | ぇ ↔ ェ; | |
54 | え ↔ エ; | |
55 | ぉ ↔ ォ; | |
56 | お ↔ オ; | |
57 | か ↔ カ; | |
58 | が ↔ ガ; | |
59 | き ↔ キ; | |
60 | ぎ ↔ ギ; | |
61 | く ↔ ク; | |
62 | ぐ ↔ グ; | |
63 | け ↔ ケ; | |
64 | げ ↔ ゲ; | |
65 | こ ↔ コ; | |
66 | ご ↔ ゴ; | |
67 | さ ↔ サ; | |
68 | ざ ↔ ザ; | |
69 | し ↔ シ; | |
70 | じ ↔ ジ; | |
71 | す ↔ ス; | |
72 | ず ↔ ズ; | |
73 | せ ↔ セ; | |
74 | ぜ ↔ ゼ; | |
75 | そ ↔ ソ; | |
76 | ぞ ↔ ゾ; | |
77 | た ↔ タ; | |
78 | だ ↔ ダ; | |
79 | ち ↔ チ; | |
80 | ぢ ↔ ヂ; | |
81 | っ ↔ ッ; | |
82 | つ ↔ ツ; | |
83 | づ ↔ ヅ; | |
84 | て ↔ テ; | |
85 | で ↔ デ; | |
86 | と ↔ ト; | |
87 | ど ↔ ド; | |
88 | な ↔ ナ; | |
89 | に ↔ ニ; | |
90 | ぬ ↔ ヌ; | |
91 | ね ↔ ネ; | |
92 | の ↔ ノ; | |
93 | は ↔ ハ; | |
94 | ば ↔ バ; | |
95 | ぱ ↔ パ; | |
96 | ひ ↔ ヒ; | |
97 | び ↔ ビ; | |
98 | ぴ ↔ ピ; | |
99 | ふ ↔ フ; | |
100 | ぶ ↔ ブ; | |
101 | ぷ ↔ プ; | |
102 | へ ↔ ヘ; | |
103 | べ ↔ ベ; | |
104 | ぺ ↔ ペ; | |
105 | ほ ↔ ホ; | |
106 | ぼ ↔ ボ; | |
107 | ぽ ↔ ポ; | |
108 | ま ↔ マ; | |
109 | み ↔ ミ; | |
110 | む ↔ ム; | |
111 | め ↔ メ; | |
112 | も ↔ モ; | |
113 | ゃ ↔ ャ; | |
114 | や ↔ ヤ; | |
115 | ゅ ↔ ュ; | |
116 | ゆ ↔ ユ; | |
117 | ょ ↔ ョ; | |
118 | よ ↔ ヨ; | |
119 | ら ↔ ラ; | |
120 | り ↔ リ; | |
121 | る ↔ ル; | |
122 | れ ↔ レ; | |
123 | ろ ↔ ロ; | |
124 | ゎ ↔ ヮ; | |
125 | わ ↔ ワ; | |
126 | ゐ ↔ ヰ; | |
127 | ゑ ↔ ヱ; | |
128 | を ↔ ヲ; | |
129 | ん ↔ ン; | |
130 | ゔ ↔ ヴ; | |
131 | ゝ ↔ ヽ; | |
132 | ゞ ↔ ヾ; | |
133 | # One-way Katakana-Hiragana xform of small K ka/ke to | |
134 | # normal H ka/ke. | |
135 | か ← ヵ; | |
136 | け ← ヶ; | |
137 | # Katakana followed by a prolonged sound mark 30FC has | |
138 | # its final vowel doubled. This is a Katakana-Hiragana | |
139 | # one-way information-losing transformation. We | |
140 | # include the small Katakana (e.g., small A 30A1) and | |
141 | # do not distinguish them from their large | |
142 | # counterparts. It doesn't make sense to double a | |
143 | # small counterpart vowel as a small Hiragana vowel, so | |
144 | # we don't do so. In natural text this should never | |
145 | # occur anyway. If a 30FC is seen without a preceding | |
146 | # vowel sound (e.g., after n 30F3) we do not change it. | |
147 | ### $long = ー; | |
148 | # The following categories are Hiragana, not Katakana | |
149 | # as might be expected, since by the time we get to the | |
150 | # 30FC, the preceding character will have already been | |
151 | # transformed to Hiragana. | |
152 | # {The following was mechanically generated from the | |
153 | # Unicode 3.0 data:} | |
154 | $xa = [ \ | |
155 | ぁ あ か が さ ざ \ | |
156 | た だ な は ば ぱ \ | |
157 | ま ゃ や ら ゎ わ \ | |
158 | ]; | |
159 | $xi = [ \ | |
160 | ぃ い き ぎ し じ \ | |
161 | ち ぢ に ひ び ぴ \ | |
162 | み り ゐ \ | |
163 | ]; | |
164 | $xu = [ \ | |
165 | ぅ う く ぐ す ず \ | |
166 | っ つ づ ぬ ふ ぶ \ | |
167 | ぷ む ゅ ゆ る ゔ \ | |
168 | ]; | |
169 | $xe = [ \ | |
170 | ぇ え け げ せ ぜ \ | |
171 | て で ね へ べ ぺ \ | |
172 | め れ ゑ \ | |
173 | ]; | |
174 | $xo = [ \ | |
175 | ぉ お こ ご そ ぞ \ | |
176 | と ど の ほ ぼ ぽ \ | |
177 | も ょ よ ろ を \ | |
178 | ]; | |
179 | あ ← $xa {ー}; | |
180 | い ← $xi {ー}; | |
181 | う ← $xu {ー}; | |
182 | え ← $xe {ー}; | |
183 | お ← $xo {ー}; | |
184 | :: (NFKC) ; | |
185 | # note: a global filter is more efficient, but MUST include all source chars!! | |
186 | :: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]); | |
187 | # eof | |
188 |
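
The rule file above describes a mostly one-to-one mapping with a few lossy special cases. As a rough illustration of that behavior outside ICU, here is a minimal pure-Python sketch. It is not ICU's implementation: the function names and table names are hypothetical, and the NFKC passes and the global filter are deliberately left out. The vowel classes are copied from `$xa` through `$xo`.

```python
# Sketch only: mimics the conversion rules of Hira_Kana.txt without ICU.
# Assumption: NFKC normalization and the global filter are omitted.

KANA_OFFSET = 0x60  # U+30A1..U+30F4 sit 0x60 above U+3041..U+3094

# Katakana va/vi/ve/vo <-> Hiragana wa/wi/we/wo plus voicing mark U+3099.
VOICED_W = {"ヷ": "わ\u3099", "ヸ": "ゐ\u3099", "ヹ": "ゑ\u3099", "ヺ": "を\u3099"}
VOICED_W_REVERSED = {hira: kata for kata, hira in VOICED_W.items()}

# One-way: small Katakana ka/ke become normal Hiragana ka/ke.
SMALL_K = {"ヵ": "か", "ヶ": "け"}

# Hiragana grouped by final vowel, copied from $xa..$xo in the rule file.
VOWEL_CLASSES = {
    "あ": "ぁあかがさざただなはばぱまゃやらゎわ",
    "い": "ぃいきぎしじちぢにひびぴみりゐ",
    "う": "ぅうくぐすずっつづぬふぶぷむゅゆるゔ",
    "え": "ぇえけげせぜてでねへべぺめれゑ",
    "お": "ぉおこごそぞとどのほぼぽもょよろを",
}
LONG_VOWEL = {ch: vowel for vowel, chars in VOWEL_CLASSES.items() for ch in chars}


def _is_mapped_katakana(ch: str) -> bool:
    # Main block plus the iteration marks U+30FD and U+30FE.
    return "\u30a1" <= ch <= "\u30f4" or ch in "ヽヾ"


def _is_mapped_hiragana(ch: str) -> bool:
    # Main block plus the iteration marks U+309D and U+309E.
    return "\u3041" <= ch <= "\u3094" or ch in "ゝゞ"


def katakana_to_hiragana(text: str) -> str:
    out = []
    for ch in text:
        if ch in VOICED_W:
            out.append(VOICED_W[ch])
        elif ch in SMALL_K:
            out.append(SMALL_K[ch])
        elif _is_mapped_katakana(ch):
            out.append(chr(ord(ch) - KANA_OFFSET))
        elif ch == "ー" and out and out[-1][-1] in LONG_VOWEL:
            # The preceding character is already Hiragana, so the vowel
            # classes are Hiragana sets, as the rule file notes.
            out.append(LONG_VOWEL[out[-1][-1]])
        else:
            out.append(ch)  # middle dot, combining marks, etc. stay as-is
    return "".join(out)


def hiragana_to_katakana(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in VOICED_W_REVERSED:
            out.append(VOICED_W_REVERSED[pair])
            i += 2
            continue
        ch = text[i]
        out.append(chr(ord(ch) + KANA_OFFSET) if _is_mapped_hiragana(ch) else ch)
        i += 1
    return "".join(out)
```

With this sketch, `katakana_to_hiragana("コーヒー")` gives `"こおひい"`, and converting that back with `hiragana_to_katakana` gives `"コオヒイ"` rather than the original string, which is the round-trip loss the header comments describe. A prolonged sound mark with no preceding vowel sound, as in `"ンー"`, is left unchanged.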