# © 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# File: Han_Spacedhan.txt
# Generated from CLDR
#

# Only intended for internal use.
# Make sure Han characters are normalized, including characters whose NFKD decomposition contains Han.
# The first set in the filter is computed with http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:tonfkd:/XXX/:]-[:ideographic:]-[:sc=han:]
# where XXX is the resolved [:ideographic:][:sc=han:]. It needs updating with each Unicode release!
:: [[㆒-㆟㈠-㉇㊀-㊰㋀-㋋㍘-㍰㍻-㍿㏠-㏾ 🈐-🈒🈔-🈺🉀-🉈🉐🉑][:ideographic:][:sc=han:]] nfkc;
:: fullwidth-halfwidth;
。 → '.';
$terminalPunct = [\.\,\:\;\?\!.,:?!。、;[:Pe:][:Pf:]];
$initialPunct = [:Ps:][:Pi:];
# Add a space between Han or terminal punctuation and a following letter, and
# between a letter (plus any combining marks) and following Han or initial punctuation.
[[:Ideographic:] $terminalPunct] {} [:Letter:] → ' ' ;
[:Letter:] [:Mark:]* {} [[:Ideographic:] $initialPunct] → ' ' ;
# Reverse direction: remove the space between ideographs and other letters.
← [:Ideographic:] { ' ' } [:Letter:] ;
← [:Letter:] [:Mark:]* { ' ' } [:Ideographic:] ;