# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents. We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent. However, this is a
# non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents. We convert them to normal
# Hiragana ka/ke (304B,3051). This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
# preceding vowel. This is a one-way
# information-losing transformation from Katakana to
# Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
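# Illustrative examples, hand-traced from the rules in this
# file (a sketch, not captured test output):
#   Hiragana to Katakana:  ひらがな becomes ヒラガナ
#   Katakana to Hiragana:  ラーメン becomes らあめん (point 4)
#   Katakana to Hiragana:  ヶ becomes け, which maps back to
#                          ケ rather than ヶ (point 2)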
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small kana (e.g., small A, 30A1 in
# Katakana and 3041 once converted to Hiragana) and
# do not distinguish them from their large
# counterparts. Doubling a small vowel as another small
# Hiragana vowel would make no sense, so the doubled
# vowel is always full-size. In natural text this should
# never occur anyway. If a 30FC is seen without a
# preceding vowel sound (e.g., after n 30F3) we do not
# change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following was mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof