icuSources/data/translit/Hira_Kana.txt

# ***************************************************************************
# *
# *  Copyright (C) 2004-2016, International Business Machines
# *  Corporation; Unicode, Inc.; and others.  All Rights Reserved.
# *
# ***************************************************************************
# File: Hira_Kana.txt
# Generated from CLDR
#

# note: a global filter is more efficient, but MUST include all source chars
:: [\u0000-\u007E 、。 \u3099-゜ ァ-ー ｡-ﾟー[:Hiragana:] [:Katakana:] [:nonspacing mark:]] ;
:: NFKC ();
# Hiragana-Katakana
# This is largely a one-to-one mapping, but it has a
# few kinks:
# 1. The Katakana va/vi/ve/vo (30F7-30FA) have no
# Hiragana equivalents.  We use Hiragana wa/wi/we/wo
# (308F-3092) with a voicing mark (3099), which is
# semantically equivalent.  However, this is a non-
# roundtripping transformation.
# 2. The Katakana small ka/ke (30F5,30F6) have no
# Hiragana equivalents.  We convert them to normal
# Hiragana ka/ke (304B,3051).  This is a one-way
# information-losing transformation and precludes
# round-tripping of 30F5 and 30F6.
# 3. The combining marks 3099-309C are in the Hiragana
# block, but they apply to Katakana as well, so we
# leave them untouched.
# 4. The Katakana prolonged sound mark 30FC doubles the
# preceding vowel.  This is a one-way information-
# losing transformation from Katakana to Hiragana.
# 5. The Katakana middle dot separates words in foreign
# expressions; we leave this unmodified.
# The above points preclude successful round-trip
# transformations of arbitrary input text.  However,
# they provide naturalistic results that should conform
# to user expectations.
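# Illustrative examples of points 2 and 4 (hand-derived from the
# rules below as a sketch; not part of the generated CLDR data):
#   Katakana to Hiragana:  ヶ  becomes  け       (point 2)
#   Hiragana to Katakana:  け  becomes  ケ       (so ヶ is not restored)
#   Katakana to Hiragana:  コーヒー  becomes  こおひい   (point 4)
#   Hiragana to Katakana:  こおひい  becomes  コオヒイ   (so ー is not restored)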
# Combining equivalents va/vi/ve/vo
わ\u3099 ↔ ヷ;
ゐ\u3099 ↔ ヸ;
ゑ\u3099 ↔ ヹ;
を\u3099 ↔ ヺ;
# One-to-one mappings, main block
# 3041:3094 ↔ 30A1:30F4
# 309D,E ↔ 30FD,E
ぁ ↔ ァ;
あ ↔ ア;
ぃ ↔ ィ;
い ↔ イ;
ぅ ↔ ゥ;
う ↔ ウ;
ぇ ↔ ェ;
え ↔ エ;
ぉ ↔ ォ;
お ↔ オ;
か ↔ カ;
が ↔ ガ;
き ↔ キ;
ぎ ↔ ギ;
く ↔ ク;
ぐ ↔ グ;
け ↔ ケ;
げ ↔ ゲ;
こ ↔ コ;
ご ↔ ゴ;
さ ↔ サ;
ざ ↔ ザ;
し ↔ シ;
じ ↔ ジ;
す ↔ ス;
ず ↔ ズ;
せ ↔ セ;
ぜ ↔ ゼ;
そ ↔ ソ;
ぞ ↔ ゾ;
た ↔ タ;
だ ↔ ダ;
ち ↔ チ;
ぢ ↔ ヂ;
っ ↔ ッ;
つ ↔ ツ;
づ ↔ ヅ;
て ↔ テ;
で ↔ デ;
と ↔ ト;
ど ↔ ド;
な ↔ ナ;
に ↔ ニ;
ぬ ↔ ヌ;
ね ↔ ネ;
の ↔ ノ;
は ↔ ハ;
ば ↔ バ;
ぱ ↔ パ;
ひ ↔ ヒ;
び ↔ ビ;
ぴ ↔ ピ;
ふ ↔ フ;
ぶ ↔ ブ;
ぷ ↔ プ;
へ ↔ ヘ;
べ ↔ ベ;
ぺ ↔ ペ;
ほ ↔ ホ;
ぼ ↔ ボ;
ぽ ↔ ポ;
ま ↔ マ;
み ↔ ミ;
む ↔ ム;
め ↔ メ;
も ↔ モ;
ゃ ↔ ャ;
や ↔ ヤ;
ゅ ↔ ュ;
ゆ ↔ ユ;
ょ ↔ ョ;
よ ↔ ヨ;
ら ↔ ラ;
り ↔ リ;
る ↔ ル;
れ ↔ レ;
ろ ↔ ロ;
ゎ ↔ ヮ;
わ ↔ ワ;
ゐ ↔ ヰ;
ゑ ↔ ヱ;
を ↔ ヲ;
ん ↔ ン;
ゔ ↔ ヴ;
ゝ ↔ ヽ;
ゞ ↔ ヾ;
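# Illustrative example of the one-to-one block above (hand-derived
# sketch; not part of the generated CLDR data).  Text made up only of
# these characters should round-trip in both directions:
#   ひらがな  ↔  ヒラガナ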
# One-way Katakana-Hiragana xform of small K ka/ke to
# normal H ka/ke.
か ← ヵ;
け ← ヶ;
# Katakana followed by a prolonged sound mark 30FC has
# its final vowel doubled.  This is a Katakana-Hiragana
# one-way information-losing transformation.  We
# include the small kana (e.g., Hiragana small a 3041) and
# do not distinguish them from their large
# counterparts.  It doesn't make sense to double a
# small counterpart vowel as a small Hiragana vowel, so
# we don't do so.  In natural text this should never
# occur anyway.  If a 30FC is seen without a preceding
# vowel sound (e.g., after n 30F3) we do not change it.
### $long = ー;
# The following categories are Hiragana, not Katakana
# as might be expected, since by the time we get to the
# 30FC, the preceding character will have already been
# transformed to Hiragana.
# {The following mechanically generated from the
# Unicode 3.0 data:}
$xa = [ \
ぁ あ か が さ ざ \
た だ な は ば ぱ \
ま ゃ や ら ゎ わ \
];
$xi = [ \
ぃ い き ぎ し じ \
ち ぢ に ひ び ぴ \
み り ゐ \
];
$xu = [ \
ぅ う く ぐ す ず \
っ つ づ ぬ ふ ぶ \
ぷ む ゅ ゆ る ゔ \
];
$xe = [ \
ぇ え け げ せ ぜ \
て で ね へ べ ぺ \
め れ ゑ \
];
$xo = [ \
ぉ お こ ご そ ぞ \
と ど の ほ ぼ ぽ \
も ょ よ ろ を \
];
あ ← $xa {ー};
い ← $xi {ー};
う ← $xu {ー};
え ← $xe {ー};
お ← $xo {ー};
:: (NFKC) ;
# note: a global filter is more efficient, but MUST include all source chars!!
:: ([\u0000-\u007E 、。 \u3099-゜ ァ-ー ｡-ﾟー[:Hiragana:] [:Katakana:] [:nonspacing mark:]]);
# eof

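# Illustrative examples for the five prolonged-sound-mark rules above
# (hand-derived sketch, Katakana to Hiragana; not part of the generated
# CLDR data).  The context character is already Hiragana when ー is seen:
#   カー    becomes  かあ     (か is in $xa)
#   ケーキ  becomes  けえき   (け is in $xe)
#   ノート  becomes  のおと   (の is in $xo)
#   ファー  becomes  ふぁあ   (small ぁ is in $xa; the doubled vowel is large あ)
#   ンー    becomes  んー     (ん is in none of the sets, so ー is unchanged)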