# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: Han_Spacedhan.txt
# Generated from CLDR
#

# Only intended for internal use.
# Make sure Han characters are NFKC-normalized, including characters whose NFKD decomposition contains them.
# The first set in the filter is computed with http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:tonfkd:/XXX/:]-[:ideographic:]-[:sc=han:]
# where XXX is the resolved [:ideographic:][:sc=han:]. This set must be recomputed for each Unicode release!
:: [[㆒-㆟㈠-㉇㊀-㊰㋀-㋋㍘-㍰㍻-㍿㏠-㏾ 🈐-🈒🈔-🈺🉀-🉈🉐🉑][:ideographic:][:sc=han:]] nfkc;
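# Example (illustrative only; the two characters are picked from the first set
# above): the filtered nfkc step replaces such compatibility characters with
# their Han content, for instance
#   ㈠ (U+3220 PARENTHESIZED IDEOGRAPH ONE)  ->  (一)
#   ㊀ (U+3280 CIRCLED IDEOGRAPH ONE)        ->  一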
:: fullwidth-halfwidth;
。 → '.';
$terminalPunct = [\.\,\:\;\?\!.,:?!。、;[:Pe:][:Pf:]];
$initialPunct = [:Ps:][:Pi:];
# Add a space between any Han character or terminal punctuation and a following
# letter, and between a letter (plus any combining marks) and a following Han
# character or initial punctuation.
[[:Ideographic:] $terminalPunct] {} [:Letter:] → ' ' ;
[:Letter:] [:Mark:]* {} [[:Ideographic:] $initialPunct] → ' ' ;
# Reverse direction: remove the space between an ideograph and an adjacent letter
← [:Ideographic:] { ' ' } [:Letter:] ;
← [:Letter:] [:Mark:]* { ' ' } [:Ideographic:] ;