git.saurik.com Git - apple/icu.git/blame - icuSources/data/translit/Han_Spacedhan.txt
ICU-57131.0.1.tar.gz
# ***************************************************************************
# *
# * Copyright (C) 2004-2016, International Business Machines
# * Corporation; Unicode, Inc.; and others. All Rights Reserved.
# *
# ***************************************************************************
# File: Han_Spacedhan.txt
# Generated from CLDR
#

# Only intended for internal use
# Make sure Han are normalized, including characters that contain them.
# The first set in the filter is computed with http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:tonfkd:/XXX/:]-[:ideographic:]-[:sc=han:]
# Where XXX is the resolved [:ideographic:][:sc=han:]. It needs updating with each Unicode release!
:: [[㆒-㆟㈠-㉇㊀-㊰㋀-㋋㍘-㍰㍻-㍿㏠-㏾ 🈐-🈒🈔-🈺🉀-🉈🉐🉑][:ideographic:][:sc=han:]] nfkc;
:: fullwidth-halfwidth;
。 → '.';
$terminalPunct = [\.\,\:\;\?\!.,:?!。、;[:Pe:][:Pf:]];
$initialPunct = [:Ps:][:Pi:];
# add space between any Han or terminal punctuation and letters, and
# between letters and Han or initial punct
[[:Ideographic:] $terminalPunct] {} [:Letter:] → ' ' ;
[:Letter:] [:Mark:]* {} [[:Ideographic:] $initialPunct] → ' ' ;
# remove spacing between ideographs and other letters
← [:Ideographic:] { ' ' } [:Letter:] ;
← [:Letter:] [:Mark:]* { ' ' } [:Ideographic:] ;