# ***************************************************************************
# *
# * Copyright (C) 2004-2016, International Business Machines
# * Corporation; Unicode, Inc.; and others. All Rights Reserved.
# *
# ***************************************************************************
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
#    Hiragana equivalents. We use Hiragana wa/wi/we/wo
#    (308F-3092) with a voicing mark (3099), which is
#    semantically equivalent. However, this is a
#    non-roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
#    Hiragana equivalents. We convert them to normal
#    Hiragana ka/ke (304B,3051). This is a one-way
#    information-losing transformation and precludes
#    round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
#    block, but they apply to Katakana as well, so we
#    leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
#    preceding vowel. This is a one-way
#    information-losing transformation from Katakana
#    to Hiragana.
# 5. The Katakana middle dot separates words in foreign
#    expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
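# Illustrative sketch (worked by hand from the four rules above, not
# taken from ICU test output): in the Katakana-to-Hiragana direction
# each precomposed letter should become a base letter followed by the
# combining voicing mark 3099, for example:
#   ヷ → わ + 3099
#   ヺ → を + 3099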
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
# One-way Katakana-Hiragana transformation of small Katakana
# ka/ke to normal Hiragana ka/ke.
か ← ヵ;
け ← ヶ;
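# Illustrative sketch (worked by hand from the rules in this file):
# small ka/ke do not survive a round trip, because the return path
# uses the two-way rules か ↔ カ and け ↔ ケ:
#   ヵ → か → カ
#   ヶ → け → ケ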
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a Katakana-Hiragana
# one-way information-losing transformation. We
# include the small Katakana (e.g., small A 3041) and
# do not distinguish them from their large
# counterparts. It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so. In natural text this should never
# occur anyway. If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
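# Illustrative sketch (worked by hand from the five rules above, not
# taken from ICU test output): in the Katakana-to-Hiragana direction,
# ー is replaced by the vowel of the already-converted preceding kana:
#   カー → かあ          (か is in $xa)
#   コーヒー → こおひい  (こ is in $xo, ひ is in $xi)
#   ンー → んー          (ん is in none of the sets, so ー is kept)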
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof