# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
# This is largely a one-to-one mapping, but it has a
# few exceptions:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
#    Hiragana equivalents. We use Hiragana wa/wi/we/wo
#    (308F-3092) with a voicing mark (3099), which is
#    semantically equivalent. However, this
#    transformation does not round-trip.
# 2. The Katakana small ka/ke (30F5, 30F6) have no
#    Hiragana equivalents. We convert them to normal
#    Hiragana ka/ke (304B, 3051). This is a one-way,
#    information-losing transformation and precludes
#    round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
#    block, but they apply to Katakana as well, so we
#    leave them untouched.
# 4. The Katakana prolonged sound mark 30FC is converted
#    by doubling the preceding vowel. This is a one-way,
#    information-losing transformation from Katakana to
#    Hiragana.
# 5. The Katakana middle dot separates words in foreign
#    expressions; we leave it unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text. However,
# they provide naturalistic results that should conform
# to user expectations.
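# Illustrative round trips, worked by hand from points 2
# and 4 above (a sketch under those stated rules, not
# tool output):
#   ヵ        → か        → カ        (small ka is not restored)
#   コーヒー  → こおひい  → コオヒイ  (30FC is not restored)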
# Combining equivalents va/vi/ve/vo
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# One-way Katakana-to-Hiragana transformation of small
# Katakana ka/ke to normal Hiragana ka/ke.
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled. This is a one-way,
# information-losing Katakana-to-Hiragana
# transformation. We include the small kana (e.g.,
# small A 3041) and do not distinguish them from their
# large counterparts. It doesn't make sense to double a
# small vowel as a small Hiragana vowel, so we don't do
# so. In natural text this should never occur anyway.
# If a 30FC is seen without a preceding vowel sound
# (e.g., after n 30F3), we do not change it.
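# Examples implied by the description above (worked by
# hand for illustration; not generated output):
#   カー (30AB 30FC) → かあ (304B 3042)   vowel of ka is doubled
#   ンー (30F3 30FC) → んー (3093 30FC)   no vowel sound; 30FC kept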
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー 。-゚ー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);