* Copyright (C) 2004-2015, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation:4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

* New ISO 15924 script codes

Starting with ICU 55, we do not add UScriptCode constants any more until their scripts
are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
Script enum constant names should follow the Unicode script property value aliases,
which are assigned only when the scripts are encoded.
When we add script constants early and guess wrong, we end up with confusing enum constants
and have sometimes had to add aliases.

Exception: Script codes like Latf and Aran that are not subject to separate encoding
can be added at any time.

Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html

Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
- Adlm 166 Adlam
- Aran 161 Arabic (Nastaliq variant)
- Kitl 505 Khitan large script
- Kits 288 Khitan small script
- Marc 332 Marchen
- Osge 219 Osage

Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.

Adlam, Marchen, and Osage are expected to go into Unicode 9;
we should assign Unicode script property value aliases for them
soon after Unicode 8 is released, and add them in ICU 56.

Khitan scripts will be encoded later.

---------------------------------------------------------------------------- ***

Unicode 8.0 update for ICU ??

* UCA issue from 7.0

- U+1DE9 COMBINING LATIN SMALL LETTER BETA
    sorts with Greek Beta, should sort with Latin B?
    + Ken says:
        No, it was deliberate:

        03B2;GREEK SMALL LETTER BETA;Ll;;;;0392;;0392
        1D5D;MODIFIER LETTER SMALL BETA;Lm;<super> 03B2;;;;;
        1DE9;COMBINING LATIN SMALL LETTER BETA;Mn;<sort> 03B2;;;;;
        1D66;GREEK SUBSCRIPT SMALL LETTER BETA;Ll;<sub> 03B2;;;;;

        Note the relationship to U+1D5D.

        When the disunified *Latin* beta base letter shows up in Unicode 8.0:

        U+A7B4 LATIN CAPITAL LETTER BETA
        U+A7B5 LATIN SMALL LETTER BETA

        we could re-evaluate what U+1DE9 equates to, for collation,
        but currently there isn’t any Latin beta to serve that function
        in Unicode 7.0.

- ICU_ROOT=~/svn.icu/trunk
- ICU_SRC_DIR=$ICU_ROOT/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR


---------------------------------------------------------------------------- ***

Unicode 7.0 update for ICU 54

http://www.unicode.org/review/pri271/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-13.html

*** ICU Trac

- ticket 10821: Unicode 7.0, UCA 7.0
- C++ branches/markus/uni70 at r35584 from trunk at r35580
- Java branches/markus/uni70 at r35587 from trunk at r35545

*** CLDR Trac

- ticket 7195: UCA 7.0 CLDR root collation
- branches/markus/uni70 at r10062 from trunk at r10061

- ticket 6762: script metadata for Unicode 7.0 new scripts

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    so that the makefiles see the new version number.

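A quick way to confirm which Unicode version a header declares before running "configure" might look like the sketch below. It assumes the version is written as a quoted U_UNICODE_VERSION macro in uchar.h; the function name and the sample header text are illustrative only and are not part of the ICU build.

```python
import re

def unicode_version_from_header(header_text):
    """Return the quoted U_UNICODE_VERSION value from header text, or None."""
    # Assumption: the header contains a line like
    #   #define U_UNICODE_VERSION "7.0"
    match = re.search(
        r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"',
        header_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None

# Illustrative input, not copied from a real header.
sample = '/* Unicode version */\n#define U_UNICODE_VERSION "7.0"\n'
print(unicode_version_from_header(sample))
```

The same check could be pointed at the real file by reading source/common/unicode/uchar.h and passing its text to the function.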
*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
    includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
    ~/unidata/uni70/20140403$ ../../desuffixucd.py .
    (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Restore TODO diffs in source/data/unidata/UCARules.txt
    cd $ICU_SRC_DIR
    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt

- also: from http://unicode.org/Public/security/7.0.0/ download new
    confusables.txt & confusablesWholeScript.txt
    and copy to $ICU_ROOT/src/source/data/unidata/

* initial preparseucd.py changes
- remove new Unicode scripts from the
    only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
    'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
    'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
    from _scripts_only_in_iso15924
    -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
        and in com.ibm.icu.dev.test.lang.TestUScript.java
- NamesList.txt now has a heading with a non-ASCII character
    + keep ppucd.txt in platform charset, rather than changing tool/test parsers
    + escape non-ASCII characters in heading comments
- gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
    + get the copyright from the first file whose copyright line contains the current year

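The two preparseucd.py adjustments above (escaping non-ASCII heading text, and taking the copyright line from the first file that mentions the current year) can be sketched roughly as follows. This is not the code in preparseucd.py: the function names, the \uXXXX escape style, and the sample copyright strings are assumptions made for illustration.

```python
def escape_non_ascii(text):
    """Return text with every non-ASCII character written as an escape.

    Assumption: \\uXXXX for BMP characters and \\UXXXXXXXX above that;
    the real tool may use a different escape syntax.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def pick_copyright(lines_by_file, year):
    """Return the first copyright line, in file order, that mentions year."""
    for _name, line in lines_by_file:
        if "Copyright" in line and str(year) in line:
            return line
    return None


# Hypothetical inputs for illustration.
candidates = [
    ("PropertyAliases.txt", "# Copyright (c) 1991-2013 Unicode, Inc."),
    ("Blocks.txt", "# Copyright (c) 1991-2014 Unicode, Inc."),
]
print(escape_non_ascii("caf\u00e9 heading"))
print(pick_copyright(candidates, 2014))
```

Keeping the output pure ASCII means the existing tool and test parsers can keep reading ppucd.txt in the platform charset.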
* PropertyValueAliases.txt changes
- 32 new Block (blk) values:
    blk; Bassa_Vah ; Bassa_Vah
    blk; Caucasian_Albanian ; Caucasian_Albanian
    blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
    blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
    blk; Duployan ; Duployan
    blk; Elbasan ; Elbasan
    blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
    blk; Grantha ; Grantha
    blk; Khojki ; Khojki
    blk; Khudawadi ; Khudawadi
    blk; Latin_Ext_E ; Latin_Extended_E
    blk; Linear_A ; Linear_A
    blk; Mahajani ; Mahajani
    blk; Manichaean ; Manichaean
    blk; Mende_Kikakui ; Mende_Kikakui
    blk; Modi ; Modi
    blk; Mro ; Mro
    blk; Myanmar_Ext_B ; Myanmar_Extended_B
    blk; Nabataean ; Nabataean
    blk; Old_North_Arabian ; Old_North_Arabian
    blk; Old_Permic ; Old_Permic
    blk; Ornamental_Dingbats ; Ornamental_Dingbats
    blk; Pahawh_Hmong ; Pahawh_Hmong
    blk; Palmyrene ; Palmyrene
    blk; Pau_Cin_Hau ; Pau_Cin_Hau
    blk; Psalter_Pahlavi ; Psalter_Pahlavi
    blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
    blk; Siddham ; Siddham
    blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
    blk; Sup_Arrows_C ; Supplemental_Arrows_C
    blk; Tirhuta ; Tirhuta
    blk; Warang_Citi ; Warang_Citi
    -> add to uchar.h
        use long property names for enum constants
    -> add to UCharacter.UnicodeBlock IDs
        Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
        replace public static final int \1_ID = \2; \3
    -> add to UCharacter.UnicodeBlock objects
        Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
        replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 28 new Joining_Group (jg) values:
    jg ; Manichaean_Aleph ; Manichaean_Aleph
    jg ; Manichaean_Ayin ; Manichaean_Ayin
    jg ; Manichaean_Beth ; Manichaean_Beth
    jg ; Manichaean_Daleth ; Manichaean_Daleth
    jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
    jg ; Manichaean_Five ; Manichaean_Five
    jg ; Manichaean_Gimel ; Manichaean_Gimel
    jg ; Manichaean_Heth ; Manichaean_Heth
    jg ; Manichaean_Hundred ; Manichaean_Hundred
    jg ; Manichaean_Kaph ; Manichaean_Kaph
    jg ; Manichaean_Lamedh ; Manichaean_Lamedh
    jg ; Manichaean_Mem ; Manichaean_Mem
    jg ; Manichaean_Nun ; Manichaean_Nun
    jg ; Manichaean_One ; Manichaean_One
    jg ; Manichaean_Pe ; Manichaean_Pe
    jg ; Manichaean_Qoph ; Manichaean_Qoph
    jg ; Manichaean_Resh ; Manichaean_Resh
    jg ; Manichaean_Sadhe ; Manichaean_Sadhe
    jg ; Manichaean_Samekh ; Manichaean_Samekh
    jg ; Manichaean_Taw ; Manichaean_Taw
    jg ; Manichaean_Ten ; Manichaean_Ten
    jg ; Manichaean_Teth ; Manichaean_Teth
    jg ; Manichaean_Thamedh ; Manichaean_Thamedh
    jg ; Manichaean_Twenty ; Manichaean_Twenty
    jg ; Manichaean_Waw ; Manichaean_Waw
    jg ; Manichaean_Yodh ; Manichaean_Yodh
    jg ; Manichaean_Zayin ; Manichaean_Zayin
    jg ; Straight_Waw ; Straight_Waw
    -> uchar.h & UCharacter.JoiningGroup
- 23 new Script (sc) values:
    sc ; Aghb ; Caucasian_Albanian
    sc ; Bass ; Bassa_Vah
    sc ; Dupl ; Duployan
    sc ; Elba ; Elbasan
    sc ; Gran ; Grantha
    sc ; Hmng ; Pahawh_Hmong
    sc ; Khoj ; Khojki
    sc ; Lina ; Linear_A
    sc ; Mahj ; Mahajani
    sc ; Mani ; Manichaean
    sc ; Mend ; Mende_Kikakui
    sc ; Modi ; Modi
    sc ; Mroo ; Mro
    sc ; Narb ; Old_North_Arabian
    sc ; Nbat ; Nabataean
    sc ; Palm ; Palmyrene
    sc ; Pauc ; Pau_Cin_Hau
    sc ; Perm ; Old_Permic
    sc ; Phlp ; Psalter_Pahlavi
    sc ; Sidd ; Siddham
    sc ; Sind ; Khudawadi
    sc ; Tirh ; Tirhuta
    sc ; Wara ; Warang_Citi
    -> uscript.h (many were added before)
        comment "Mende Kikakui" for USCRIPT_MENDE
        add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
    -> com.ibm.icu.lang.UScript
        find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
        replace public static final int \1 = \2; \3
- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
    (added 2012-11-01)
    Ahom 338 Ahom
    Hatr 127 Hatran
    Mult 323 Multani
    (added 2013-10-12)
    Modi 324 Modi
    Pauc 263 Pau Cin Hau
    Sidd 302 Siddham
    -> uscript.h (some overlap with additions from Unicode)
    -> com.ibm.icu.lang.UScript
        find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
        replace public static final int \1 = \2; \3
    -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
    -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
        and in com.ibm.icu.dev.test.lang.TestUScript.java

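The Eclipse find/replace patterns listed above for the Block constants can also be applied outside the IDE. The sketch below runs the same two patterns through Python's re.sub; the function names are made up for this example, and the sample enum line (including its value and comment) is illustrative rather than copied from uchar.h.

```python
import re

def block_id_constant(c_enum_line):
    """Turn a C UBLOCK_ enum line into a Java int _ID constant."""
    return re.sub(
        r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
        r"public static final int \1_ID = \2; \3",
        c_enum_line,
    )

def block_object_constant(c_enum_line):
    """Turn a C UBLOCK_ enum line into a Java UnicodeBlock object constant."""
    return re.sub(
        r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        c_enum_line,
    )

# Illustrative enum line; the numeric value and comment are assumptions.
line = "UBLOCK_BASSA_VAH = 221, /*[16AD0]*/"
print(block_id_constant(line))
print(block_object_constant(line))
```

Lines that do not match the pattern pass through unchanged, so the functions can be mapped over a whole pasted enum block.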
* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
    ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
- cd $ICU_ROOT/dbg
- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
- UNIDATA=$ICU_SRC_DIR/source/data/unidata
- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

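The four .nrm commands above follow one pattern: each output is built from nfc.txt plus zero or more additional input files. A small sketch that generates the same argument lists is shown here; the function name is hypothetical, the paths are placeholders, and only the -o and -s options that appear in the commands above are used.

```python
def gennorm2_commands(src_data_in, unidata):
    """Build the gennorm2 argument lists for the four .nrm data files."""
    norm2_dir = unidata + "/norm2"
    # Output file -> input files, in the order given in the steps above.
    targets = [
        ("nfc.nrm", ["nfc.txt"]),
        ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
        ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
        ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
    ]
    return [
        ["bin/gennorm2", "-o", src_data_in + "/" + output, "-s", norm2_dir] + inputs
        for output, inputs in targets
    ]

# Placeholder directories for illustration.
for command in gennorm2_commands("source/data/in", "source/data/unidata"):
    print(" ".join(command))
```

Each list could be handed to subprocess.run from the build directory; the --csource variant for norm2_nfc_data.h is left out of this sketch.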
* build ICU (make install)
    so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* genprops work
- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
    + add second array of Joining_Group values for at most 10800..10FFF
        icutools: unicode/c/genprops/bidipropsbuilder.cpp
        icu: source/common/ubidi_props.h/.c/_data.h
        icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java

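The idea of a second Joining_Group array for a separate code point range can be illustrated with a lookup that tries each range in turn. This is only a sketch of the technique, not the ubidi_props implementation: the function name, the first range's start, and all array values are made up; only the Manichaean start 10AC0 comes from the notes above.

```python
def make_joining_group_lookup(start, values, start2, values2, default=0):
    """Return a lookup function over two dense arrays of per-code-point values."""
    limit = start + len(values)
    limit2 = start2 + len(values2)

    def joining_group(code_point):
        # First array: the original contiguous range.
        if start <= code_point < limit:
            return values[code_point - start]
        # Second array: the new, disjoint range.
        if start2 <= code_point < limit2:
            return values2[code_point - start2]
        # Everything else has the default (no joining group).
        return default

    return joining_group

# Toy data: three entries in a first range, two at the Manichaean block start.
lookup = make_joining_group_lookup(0x0620, [1, 2, 3], 0x10AC0, [7, 8])
print(lookup(0x0621), lookup(0x10AC1), lookup(0x0041))
```

Two dense arrays avoid storing default values for the large gap between the two ranges.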
* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
    sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..7.0: U+2260, U+226E, U+226F
- nothing new in 7.0, no test file to update

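The grep step above can be approximated with a short script that keeps only non-ASCII code points whose status field is disallowed_STD3_valid. The sketch assumes the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status, with '#' comments); the function name and the sample lines are illustrative, not copied from the data file.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges above U+007F with status disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            # Clip ranges that begin in ASCII so that only non-ASCII remains.
            result.append((max(start, 0x80), end))
    return result

# Illustrative lines in the assumed format.
sample = [
    "0000..002C    ; disallowed_STD3_valid   # <control-0000>..COMMA",
    "00E0..00F6    ; valid                   # letters",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]
for start, end in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (start, end))
```

Comparing the result between two Unicode versions shows whether the UTS #46 tests need new cases.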
* run & fix ICU4C tests

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    output:
    ...
    Unicode .icu files built to ./out/build/icudt53l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
- copy the big-endian Unicode data files to another location,
    separate from the other data files
    ICUDT=icudt54b
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cd ~/svn.icu/uni70/dbg/data/out/icu4j
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
- refresh ICU4J
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

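The cp/rm sequence above amounts to a selection rule for the top-level data directory: take confusables.cfu, every *.icu file except cnvalias.icu, and every *.nrm file. A sketch of that rule as a pure function is given below; the function name and the sample file names are assumptions, and the coll/ and brkitr/ subfolders are not covered.

```python
import fnmatch

def unicode_data_files(names):
    """Select the top-level data files that the copy steps above would keep."""
    selected = []
    for name in names:
        if name == "cnvalias.icu":
            continue  # copied by *.icu, then removed again
        if (name == "confusables.cfu"
                or fnmatch.fnmatch(name, "*.icu")
                or fnmatch.fnmatch(name, "*.nrm")):
            selected.append(name)
    return sorted(selected)

# Hypothetical directory listing for illustration.
listing = ["uprops.icu", "cnvalias.icu", "nfc.nrm", "root.res", "confusables.cfu"]
print(unicode_data_files(listing))
```

The listing could come from os.listdir on the icudt directory, with shutil.copy applied to each selected name.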
* update CollationFCD.java
    + copy & paste the initializers of lcccIndex[] etc. from
        ICU4C/source/i18n/collationfcd.cpp to
        ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
- output files are in ~/svn.unitools/Generated/uca/7.0.0/
- review data; compare files, use blankweights.sed or similar
    ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
- cd ~/svn.unitools/Generated/uca/7.0.0/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    (note removing the underscore before "Rules")
    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
    and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
    with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
    (subset of instructions above for properties data refresh, except copies all coll/*)
    ICUDT=icudt54b
    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
    utility/MultithreadTest/TestCollators will fail as well;
    fix the conformance test before looking into the multi-thread test
- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
    ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* run & fix ICU4J tests

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
    This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
    in the working directory.
    (It also generates ScriptRunData.cpp, which is no longer needed.)

    The generated files have a current copyright date and "@stable" statement.
    ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
    for "born stable" Unicode API constants, and to stop parsing ICU version numbers
    which may not contain dots any more.

- diff current <icu>/source/layout files vs. generated ones
    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
    review and manually merge desired changes;
    fix gratuitous changes, incorrect @draft/@stable and missing aliases;
    Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
    fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
    manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
    instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.3 update

http://www.unicode.org/review/pri249/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-11.html

*** ICU Trac

- ticket 10128: update ICU to Unicode 6.3 beta
- ticket 10168: update ICU to Unicode 6.3 final
- C++ branches/markus/uni63 at r33552 from trunk at r33551
- Java branches/markus/uni63 at r33550 from trunk at r33553

- ticket 10142: implement Unicode 6.3 bidi algorithm additions

*** Unicode version numbers
- makedata.mak
- uchar.h
    (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
    includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py:
    parse new file BidiBrackets.txt
    with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
    probably need to keep those commented out.

* PropertyAliases.txt changes
- 1 new Enumerated Property
    bpt ; Bidi_Paired_Bracket_Type
    -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
    -> ubidi_props.h & .c & UBiDiProps.java
        -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
    -> uprops.cpp
    -> change ubidi.icu format version from 2.0 to 2.1
- 1 new Miscellaneous Property
    bpb ; Bidi_Paired_Bracket
    -> uchar.h & UProperty.java
    -> ppucd.h & .cpp

* PropertyValueAliases.txt changes
- 3 Bidi_Paired_Bracket_Type (bpt) values:
    bpt; c ; Close
    bpt; n ; None
    bpt; o ; Open
    -> uchar.h & UCharacter.BidiPairedBracketType
    -> ubidi_props.h & .c & UBiDiProps.java
    -> change ubidi.icu format version from 2.0 to 2.1
- 4 new Bidi_Class (bc) values:
    bc ; FSI ; First_Strong_Isolate
    bc ; LRI ; Left_To_Right_Isolate
    bc ; RLI ; Right_To_Left_Isolate
    bc ; PDI ; Pop_Directional_Isolate
    -> uchar.h & UCharacterEnums.ECharacterDirection
    -> until the bidi code gets updated,
        Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
- 3 new Word_Break (WB) values:
    WB ; HL ; Hebrew_Letter
    WB ; SQ ; Single_Quote
    WB ; DQ ; Double_Quote
    -> uchar.h & UCharacter.WordBreak
    -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
    (added 2012-10-16)
    Aghb 239 Caucasian Albanian
    Mahj 314 Mahajani
    -> uscript.h
    -> com.ibm.icu.lang.UScript
        find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
        replace public static final int \1 = \2;\3
    -> preparseucd.py _scripts_only_in_iso15924
    -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
        and in com.ibm.icu.dev.test.lang.TestUScript.java
    -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
        (not strictly necessary for NOT_ENCODED scripts)

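The Word_Break remark above (17 values no longer fit in 4 bits) is plain arithmetic: n distinct values numbered from 0 need enough bits to hold n-1. A minimal sketch of that calculation follows; the function name is made up, and nothing here describes how ICU actually packs the property bits.

```python
def bits_needed(value_count):
    """Number of bits needed to store the values 0..value_count-1."""
    return max(1, (value_count - 1).bit_length())

# 16 values still fit in 4 bits; a 17th value needs a 5th bit.
for count in (16, 17):
    print(count, "values ->", bits_needed(count), "bits")
```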
* generate normalization data files
- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
    so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
    sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.3: U+2260, U+226E, U+226F
- nothing new in 6.3, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    output:
    ...
    Unicode .icu files built to ./out/build/icudt52l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
- copy the big-endian Unicode data files to another location,
    separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
- refresh ICU4J
    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    (note removing the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
    and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
    with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
    probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
    (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
    utility/MultithreadTest/TestCollators will fail as well;
    fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.3: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
    instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

654Unicode 6.2 update
655
656http://www.unicode.org/review/pri230/
657http://www.unicode.org/versions/beta-6.2.0.html
658http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
659http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
660http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
661http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
662http://www.unicode.org/reports/tr46/tr46-8.html IDNA
663http://unicode.org/Public/idna/6.2.0/
664
665*** ICU Trac
666
667- ticket 9515: Unicode 6.2: final ICU update
668
669- ticket 9514: UCA 6.2: fix UCARules.txt
670
671- ticket 9437: update ICU to Unicode 6.2
672- C++ branches/markus/uni62 at r32050 from trunk at r32041
673- Java branches/markus/uni62 at r32068 from trunk at r32066
674
675*** Unicode version numbers
676- makedata.mak
677- uchar.h
678 (configure.in & configure: have been modified to extract the version from uchar.h)
679- com.ibm.icu.util.VersionInfo
680- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
681
682*** data files & enums & parser code
683
684* file preparation
685
686- download UCD, UCA & IDNA files
687- make sure that the Unicode data folder passed into preparseucd.py
688 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
689- modify preparseucd.py: NamesList.txt is now in UTF-8
690- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
691- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
692- Check test file diffs for previously commented-out, known-failing data lines;
693 probably need to keep those commented out.
694
695* PropertyValueAliases.txt changes
696- 1 new Line_Break (lb) value:
697 lb ; RI ; Regional_Indicator
698 -> uchar.h & UCharacter.LineBreak
699- 1 new Word_Break (WB) value:
700 WB ; RI ; Regional_Indicator
701 -> uchar.h & UCharacter.WordBreak
702- 1 new Grapheme_Cluster_Break (GCB) value:
703 GCB; RI ; Regional_Indicator
704 -> uchar.h & UCharacter.GraphemeClusterBreak
705
706* 3 new numeric values
707 The new value -1, which was really supposed to be NaN but that would have required
708 new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
709 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
710 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
711 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
712 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
713 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
714 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
715 -> uprops.h, uchar.c & UCharacterProperty.java
716 -> cucdtst.c & UCharacterTest.java
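One observation that is not spelled out in the notes above: both large values are small multiples of a power of 60 (216000 = 1 * 60^3, 432000 = 2 * 60^3). The Python sketch below only illustrates that arithmetic. The function name split_base60 is hypothetical, and the code does not reproduce the actual bit layout used in uprops.h.

```python
def split_base60(value):
    """Return (mantissa, exponent) such that value == mantissa * 60**exponent,
    using the largest exponent that divides value evenly."""
    if value <= 0:
        raise ValueError("expected a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


# The two new Unicode 6.2 numeric values from the cuneiform signs above.
for nv in (216000, 432000):
    mantissa, exponent = split_base60(nv)
    print(nv, "=", mantissa, "* 60 **", exponent)
```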
717
718* generate normalization data files
719- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
720- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
721- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
722- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
723- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
724- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
725- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
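The four gennorm2 calls differ only in the output file and the list of input files. As a rough sketch, the same command lines can be built from a small table. The helper name gennorm2_commands and the table NORM2_JOBS are made up for illustration; the -o and -s options and the file names are taken from the commands above. The code only assembles argument lists, it does not run the tool.

```python
import posixpath

# Output .nrm file -> input .txt files, as listed in the commands above.
NORM2_JOBS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata, tool="bin/gennorm2"):
    """Build one argument list per .nrm file (hypothetical helper)."""
    commands = []
    for output, inputs in NORM2_JOBS:
        commands.append(
            [tool, "-o", posixpath.join(src_data_in, output),
             "-s", posixpath.join(unidata, "norm2")] + inputs)
    return commands


for cmd in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(cmd))
```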
726
727* build ICU (make install)
728 so that the tools build can pick up the new definitions from the installed header files.
729* build Unicode tools using CMake+make
730
731* generate core properties data files
732- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
733- in initial bootstrapping, change the UCA version
734 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
735- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
736- rebuild ICU (make install) & tools
737 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
738 check if the UCA version in FractionalUCA.txt matches the new Unicode version
739 (see step above)
740- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
741- rebuild ICU (make install) & tools
742
743* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
744 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
745- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
746- Unicode 6.0..6.2: U+2260, U+226E, U+226F
747- nothing new in 6.2, no test file to update
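The grep step can also be sketched in Python, which makes the "non-ASCII only" filter explicit. This assumes the usual IdnaMappingTable.txt layout (code point or range, then status, separated by semicolons, with a trailing # comment). The function name non_ascii_with_status is hypothetical, and the sample lines are hand-written in the style of the file, not copied from it.

```python
def non_ascii_with_status(lines, status="disallowed_STD3_valid"):
    """Return (first, last) code point ranges that have the given status
    and reach above U+007F."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != status:
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            found.append((start, end))
    return found


# Illustrative sample lines (assumption: same field layout as the real file).
SAMPLE = [
    "003D          ; disallowed_STD3_valid                  # EQUALS SIGN",
    "00A0          ; disallowed_STD3_mapped ; 0020          # NO-BREAK SPACE",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]

for start, end in non_ascii_with_status(SAMPLE):
    print("U+%04X..U+%04X" % (start, end))
```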
748
749* update Java data files
750- refresh just the UCD-related files, just to be safe
751- see (ICU4C)/source/data/icu4j-readme.txt
752- mkdir /tmp/icu4j
753- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
754 output:
755 ...
756 Unicode .icu files built to ./out/build/icudt50l
757 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
758 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
759 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
760 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
761 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
762 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
763 mkdir -p /tmp/icu4j/main/shared/data
764 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
765 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
766 mkdir -p /tmp/icu4j/main/shared/data
767 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
768 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
769- copy the big-endian Unicode data files to another location,
770 separate from the other data files
771 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
772 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
773 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
774 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
775 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
776 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
777 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
778- refresh ICU4J
779 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
780
781* refresh Java test .txt files
782- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
783
784* UCA
785
786- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
787- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
788- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
789- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
790 (note removing the underscore before "Rules")
791- update (ICU4C)/source/test/testdata/CollationTest_*.txt
792 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
793 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
794- check test file diffs for previously commented-out, known-failing data lines;
795 probably need to keep those commented out
796- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
797- run genuca, see command line above
798- rebuild ICU4C
799- refresh ICU4J collation data:
800 (subset of instructions above for properties data refresh, except copies all coll/*)
801 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
802 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
803 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
804 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
805- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
806- note on intltest: if collate/UCAConformanceTest fails, then
807 utility/MultithreadTest/TestCollators will fail as well;
808 fix the conformance test before looking into the multi-thread test
809
810* test ICU, fix test code where necessary
811
812* When refreshing all of ICU4J data from ICU4C
813- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
814- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
815or
816- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
817
818*** LayoutEngine script information
819- skipped for Unicode 6.2: no new scripts
820
821*** merge the Unicode update branches back onto the trunk
822- do not merge the icudata.jar and testdata.jar,
823 instead rebuild them from merged & tested ICU4C
824
825---------------------------------------------------------------------------- ***
826
827Future Unicode update
828
829Tools simplified since the Unicode 6.1 update. See
830- http://site.icu-project.org/design/props/ppucd
831- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
832
833* Unicode version numbers
834- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
835
836* file preparation
837- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
838- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
839- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
840- Check test file diffs for previously commented-out, known-failing data lines;
841 probably need to keep those commented out.
842
843* PropertyValueAliases.txt changes
844- Script codes that are in ISO 15924 but not in Unicode are now listed in
845 preparseucd.py, in the _scripts_only_in_iso15924 variable.
846 If there are new ISO codes, then add them.
847 If Unicode adds some of them, then remove them from the .py variable.
848
849* UnicodeData.txt changes
850- No more manual changes for CJK ranges for algorithmic names;
851 those are now written to ppucd.txt and genprops reads them from there.
852
853* generate core properties data files (makeprops.sh was deleted)
854- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
855
856* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
857- it is now generated by preparseucd.py
858
859* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
860- it is now generated by preparseucd.py
861- make sure that the Unicode data folder passed into preparseucd.py
862 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
863 (can be in some subfolder)
864
865* generate normalization data files
866- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
867- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
868- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
869- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
870- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
871- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
872- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
873
874* build ICU (make install)
875* build Unicode tools using CMake+make
876
877* new way to call genuca (makeuca.sh was deleted)
878- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
879
880---------------------------------------------------------------------------- ***
881
882Unicode 6.1 update
883
884*** ICU Trac
885
886- ticket 8995 final update to Unicode 6.1
887- ticket 8994 regenerate source/layout/CanonData.cpp
888
889- ticket 8961 support Unicode "Age" value *names*
890- ticket 8963 support multiple character name aliases & types
891
892- ticket 8827 "update ICU to Unicode 6.1"
893- C++ branches/markus/uni61 at r30864 from trunk at r30843
894- Java branches/markus/uni61 at r30865 from trunk at r30863
895
896*** Unicode version numbers
897- makedata.mak
898- uchar.h
899 (configure.in & configure: have been modified to extract the version from uchar.h)
900- com.ibm.icu.util.VersionInfo
901- icutools/unicode/makedefs.sh
902 + also review & update other definitions in that file,
903 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
904
905*** data files & enums & parser code
906
907* file preparation
908
909~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
910- This prepares both unidata and testdata files in respective output subfolders.
911- Check test file diffs for previously commented-out, known-failing data lines;
912 probably need to keep those commented out.
913
914* PropertyValueAliases.txt changes
915- 11 new block names:
916 Arabic_Extended_A
917 Arabic_Mathematical_Alphabetic_Symbols
918 Chakma
919 Meetei_Mayek_Extensions
920 Meroitic_Cursive
921 Meroitic_Hieroglyphs
922 Miao
923 Sharada
924 Sora_Sompeng
925 Sundanese_Supplement
926 Takri
927 -> add to uchar.h
928 -> add to UCharacter.UnicodeBlock IDs
929 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
930 replace public static final int \1_ID = \2; \3
931 -> add to UCharacter.UnicodeBlock objects
932 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
933 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
934- 1 new Joining_Group (jg) value:
935 Rohingya_Yeh
936 -> uchar.h & UCharacter.JoiningGroup
937- 2 new Line_Break (lb) values:
938 CJ=Conditional_Japanese_Starter
939 HL=Hebrew_Letter
940 -> uchar.h & UCharacter.LineBreak
941- 7 new scripts:
942 sc ; Cakm ; Chakma
943 sc ; Merc ; Meroitic_Cursive
944 sc ; Mero ; Meroitic_Hieroglyphs
945 sc ; Plrd ; Miao
946 sc ; Shrd ; Sharada
947 sc ; Sora ; Sora_Sompeng
948 sc ; Takr ; Takri
949 -> remove these from SyntheticPropertyValueAliases.txt
950 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
951 and in com.ibm.icu.dev.test.lang.TestUScript.java
952- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
953 (added 2011-06-21)
954 Khoj 322 Khojki
955 Tirh 326 Tirhuta
956 and another one added 2011-12-09
957 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
958 -> uscript.h
959 -> com.ibm.icu.lang.UScript
960 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
961 replace public static final int \1 = \2;\3
962 -> SyntheticPropertyValueAliases.txt
963 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
964 and in com.ibm.icu.dev.test.lang.TestUScript.java
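The Eclipse find/replace patterns above can be tried out with Python's re module, which accepts the same \1-style back-references. This is only a sketch: the rule names and the convert helper are hypothetical, and the input lines are made-up examples rather than real lines from uchar.h or uscript.h.

```python
import re

# (pattern, replacement) pairs copied from the find/replace notes above.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_CODE = (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
               r"public static final int \1 = \2;\3")


def convert(line, rule):
    """Apply one find/replace rule to a single C enum line."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, line)


# Illustrative input lines; names, values and comments are invented.
c_block = "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/"
c_script = "USCRIPT_EXAMPLE = 999,/* Exmp */"

print(convert(c_block, BLOCK_ID))
print(convert(c_block, BLOCK_OBJECT))
print(convert(c_script, SCRIPT_CODE))
```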
965
966* UnicodeData.txt changes
967- the last Unihan code point changes from U+9FCB to U+9FCC
968 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
969 + do change gennames.c
970 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
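A minimal sketch of the case-insensitive 9FC[BC] search, for checking which lines of a source file mention the old end or limit value. The helper name find_hits and the sample text are hypothetical; the sample is not taken from gennames.c or ucol.cpp.

```python
import re

# Same regex as in the note above, matched case-insensitively.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_hits(text):
    """Return (line number, line) pairs that mention 9FCB or 9FCC."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if UNIHAN_END_OR_LIMIT.search(line)]


# Made-up sample text for illustration only.
SAMPLE = """\
#define EXAMPLE_CJK_END 0x9fcb
#define EXAMPLE_CJK_LIMIT (0x9FCC)
#define EXAMPLE_OTHER 0x9fa5
"""

for number, line in find_hits(SAMPLE):
    print(number, line)
```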
971
972* DerivedBidiClass.txt changes
973- 2 new default-AL blocks:
974# Arabic Extended-A: U+08A0 - U+08FF (was default-R)
975# Arabic Mathematical Alphabetic Symbols:
976# U+1EE00 - U+1EEFF (was default-R)
977- 2 new default-R blocks:
978# Meroitic Hieroglyphs:
979# U+10980 - U+1099F
980# Meroitic Cursive: U+109A0 - U+109FF
981 -> should be picked up by the explicit data in the file
982
983* NameAliases.txt changes
984- from
985 # Each line has two fields
986 # First field: Code point
987 # Second field: Alias
988- to
989 # Each line has three fields, as described here:
990 #
991 # First field: Code point
992 # Second field: Alias
993 # Third field: Type
994- Also, the file format previously allowed multiple aliases per code point, but only now
995 does the file actually contain them, including several of the same type. For example,
996 FEFF;BYTE ORDER MARK;alternate
997 FEFF;BOM;abbreviation
998 FEFF;ZWNBSP;abbreviation
999- This breaks our gennames parser, unames.icu data structure, and API.
1000 Fix gennames to only pick up "correction" aliases.
1001 New ticket #8963 for further changes.
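The "only pick up correction aliases" rule can be sketched as a small filter over the three-field lines. This is an illustration in Python, not the gennames code; the function name correction_aliases is hypothetical. The FEFF lines are the ones quoted above, and the 01A2 line is added here as an example of a "correction" entry.

```python
def correction_aliases(lines):
    """Map code point -> list of aliases whose type field is 'correction'."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # skip comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3:
            continue  # not in the three-field format
        code, alias, alias_type = fields
        if alias_type == "correction":
            result.setdefault(int(code, 16), []).append(alias)
    return result


SAMPLE = [
    "# Each line has three fields",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]

print(correction_aliases(SAMPLE))
```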
1002
1003* run genpname/preparse.pl (on Linux)
1004 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1005 + make sure that data.h is writable
1006 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1007 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1008
1009* build ICU (make install)
1010 so that the tools build can pick up the new definitions from the installed header files.
1011* build Unicode tools (at least genpname) using CMake+make
1012
1013* run genpname
1014 (builds both pnames.icu and propname_data.h)
1015- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1016- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1017
1018* build ICU (make install)
1019* build Unicode tools using CMake+make
1020
1021* update source/data/unidata/norm2/nfkc_cf.txt
1022- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1023
1024* update source/data/unidata/norm2/uts46.txt
1025- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1026 to ~/svn.icu/tools/trunk/src/unicode/py
1027- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1028- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1029- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1030
1031* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1032 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1033- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1034- Unicode 6.0..6.1: U+2260, U+226E, U+226F
1035- nothing new in 6.1, no test file to update
1036
1037* generate core properties data files
1038- in initial bootstrapping, change the UCA version
1039 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1040- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1041- rebuild ICU & tools
1042 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1043 check if the UCA version in FractionalUCA.txt matches the new Unicode version
1044 (see step above)
1045- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1046 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1047- rebuild ICU & tools
1048
1049* update Java data files
1050- refresh just the UCD-related files, just to be safe
1051- see (ICU4C)/source/data/icu4j-readme.txt
1052- mkdir /tmp/icu4j
1053- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1054 output:
1055 ...
1056 Unicode .icu files built to ./out/build/icudt49l
1057 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1058 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1059 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1060 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1061 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1062 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1063 mkdir -p /tmp/icu4j/main/shared/data
1064 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1065 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1066 mkdir -p /tmp/icu4j/main/shared/data
1067 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1068 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1069- copy the big-endian Unicode data files to another location,
1070 separate from the other data files
1071 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1072 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1073 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1074 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1075 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1076 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1077 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1078- refresh ICU4J
1079 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1080
1081* refresh Java test .txt files
1082- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1083
1084* test ICU so far, fix test code where necessary
1085- temporarily ignore collation issues that look like UCA/UCD mismatches,
1086 until UCA data is updated
1087
1088* UCA
1089
1090- get output from Mark's tools; look in
1091 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1092- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1093- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1094 (note removing the underscore before "Rules")
1095- update (ICU)/source/test/testdata/CollationTest_*.txt
1096 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1097 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1098- check test file diffs for previously commented-out, known-failing data lines;
1099 probably need to keep those commented out
1100- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1101- run makeuca.sh:
1102 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1103- rebuild ICU4C
1104- refresh ICU4J collation data:
1105 (subset of instructions above for properties data refresh, except copies all coll/*)
1106 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1107 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1108 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1109 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1110- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1111- note on intltest: if collate/UCAConformanceTest fails, then
1112 utility/MultithreadTest/TestCollators will fail as well;
1113 fix the conformance test before looking into the multi-thread test
1114
1115* When refreshing all of ICU4J data from ICU4C
1116- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1117- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1118or
1119- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1120
1121*** LayoutEngine script information
1122
1123(For details see the Unicode 5.2 change log below.)
1124
1125* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1126 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1127 in the working directory.
1128 (It also generates ScriptRunData.cpp, which is no longer needed.)
1129
1130 The generated files have a current copyright date and "@draft" statement.
1131
1132- diff current <icu>/source/layout files vs. generated ones
1133 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1134 review and manually merge desired changes;
1135 fix gratuitous changes, incorrect @draft and missing aliases;
1136 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1137- if you just copy the above files, then
1138 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1139 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1140
1141*** merge the Unicode update branches back onto the trunk
1142- do not merge the icudata.jar and testdata.jar,
1143 instead rebuild them from merged & tested ICU4C
1144
1145---------------------------------------------------------------------------- ***
1146
1147ICU 4.8 (no Unicode update, just new script codes)
1148
1149* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1150 (added 2010-12-21)
1151 Afak 439 Afaka
1152 Jurc 510 Jurchen
1153 Mroo 199 Mro, Mru
1154 Nshu 499 Nüshu
1155 Shrd 319 Sharada, Śāradā
1156 Sora 398 Sora Sompeng
1157 Takr 321 Takri, Ṭākrī, Ṭāṅkrī
1158 Tang 520 Tangut
1159 Wole 480 Woleai
1160 -> uscript.h
1161 -> com.ibm.icu.lang.UScript
1162 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1163 replace public static final int \1 = \2;\3
1164 -> genpname/SyntheticPropertyValueAliases.txt
1165 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1166 and in com.ibm.icu.dev.test.lang.TestUScript.java
1167
1168* run genpname/preparse.pl (on Linux)
1169 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1170 + make sure that data.h is writable
1171 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1172 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1173
1174* rebuild Unicode tools (at least genpname) using make
1175- You might first need to "make install" ICU so that the tools build can pick
1176 up the new definitions from the installed header files.
1177
1178* run genpname
1179 (builds both pnames.icu and propname_data.h)
1180- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1181- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1182- rebuild ICU & tools
1183
1184* run genprops
1185- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1186- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1187- rebuild ICU & tools
1188
1189* update Java data files
1190- refresh just the UCD-related files, just to be safe
1191- see (ICU4C)/source/data/icu4j-readme.txt
1192- mkdir /tmp/icu4j
1193- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1194- copy the big-endian Unicode data files to another location,
1195 separate from the other data files
1196 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1197 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1198 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1199- refresh ICU4J
1200 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1201
1202* should have updated the layout engine script codes but forgot
1203
1204---------------------------------------------------------------------------- ***
1205
1206Unicode 6.0 update
1207
1208*** related ICU Trac tickets
1209
1210- ticket 7264: Unicode 6.0 Update
1211
1212*** Unicode version numbers
1213- makedata.mak
1214- uchar.h
1215 (configure.in & configure: have been modified to extract the version from uchar.h)
1216- com.ibm.icu.util.VersionInfo
1217
1218*** data files & enums & parser code
1219
1220* file preparation
1221
1222~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1223- This now prepares both unidata and testdata files in respective output subfolders.
1224
1225* PropertyAliases.txt changes
1226- new Script_Extensions property defined in the new ScriptExtensions.txt file
1227 but not listed in PropertyAliases.txt; reported to unicode.org;
1228 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1229 scx; Script_Extensions
1230 -> uchar.h with new UProperty section
1231 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1232
1233* PropertyValueAliases.txt changes
1234- 12 new block names:
1235 Alchemical_Symbols
1236 Bamum_Supplement
1237 Batak
1238 Brahmi
1239 CJK_Unified_Ideographs_Extension_D
1240 Emoticons
1241 Ethiopic_Extended_A
1242 Kana_Supplement
1243 Mandaic
1244 Miscellaneous_Symbols_And_Pictographs
1245 Playing_Cards
1246 Transport_And_Map_Symbols
1247 -> add to uchar.h
1248 -> add to UCharacter.UnicodeBlock
1249 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1250 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1251- Joining_Group (jg) values:
1252 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1253 -> uchar.h & UCharacter.JoiningGroup
1254- 3 new scripts:
1255 sc ; Batk ; Batak
1256 sc ; Brah ; Brahmi
1257 sc ; Mand ; Mandaic
1258 -> remove these from SyntheticPropertyValueAliases.txt
1259 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1260 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1261 and in com.ibm.icu.dev.test.lang.TestUScript.java
1262- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1263 (added 2009-11-11..2010-07-18)
1264 Bass 259 Bassa Vah
1265 Dupl 755 Duployan shorthand
1266 Elba 226 Elbasan
1267 Gran 343 Grantha
1268 Kpel 436 Kpelle
1269 Loma 437 Loma
1270 Mend 438 Mende
1271 Merc 101 Meroitic Cursive
1272 Narb 106 Old North Arabian
1273 Nbat 159 Nabataean
1274 Palm 126 Palmyrene
1275 Sind 318 Sindhi
1276 Wara 262 Warang Citi
1277 -> uscript.h
1278 -> com.ibm.icu.lang.UScript
1279 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1280 replace public static final int \1 = \2;\3
1281 -> SyntheticPropertyValueAliases.txt
1282 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1283 and in com.ibm.icu.dev.test.lang.TestUScript.java
1284- ISO 15924 name change
1285 Mero 100 Meroitic Hieroglyphs (was Meroitic)
1286 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
1287- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
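
The two find/replace recipes above (UBLOCK_ constants to UCharacter.UnicodeBlock instances, USCRIPT_ constants to UScript ints) can also be tried outside an IDE. The following is a minimal sketch using Python's re module with the same patterns; the two input lines are illustrative examples of the header format, not copied from uchar.h or uscript.h.

```python
import re

# Patterns transcribed from the find/replace notes above.
BLOCK_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
BLOCK_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

SCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
SCRIPT_REPLACE = r"public static final int \1 = \2;\3"


def block_to_java(c_line):
    """Turn one C UBLOCK_ enum line into a Java UnicodeBlock declaration."""
    return BLOCK_FIND.sub(BLOCK_REPLACE, c_line)


def script_to_java(c_line):
    """Turn one C USCRIPT_ enum line into a Java int constant."""
    return SCRIPT_FIND.sub(SCRIPT_REPLACE, c_line)


# Illustrative input lines (assumed format and values).
print(block_to_java("UBLOCK_BATAK = 199, /*[1BC0]*/"))
print(script_to_java("USCRIPT_BASSA_VAH = 134,/* Bass */"))
```

The Java side still needs the matching _ID constants and an updated COUNT, which this sketch does not generate.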
1288
1289* UnicodeData.txt changes
1290- new CJK block:
1291 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
1292 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
1293 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
1294
1295* build Unicode tools using CMake+make
1296
1297* run genpname/preparse.pl (on Linux)
1298 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1299 + make sure that data.h is writable
1300 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1301 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1302
1303* rebuild Unicode tools (at least genpname) using make
1304- You might first need to "make install" ICU so that the tools build can pick
1305 up the new definitions from the installed header files.
1306
1307* run genpname
1308- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1309- rebuild ICU & tools
1310
1311* update source/data/unidata/norm2/nfkc_cf.txt
1312- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1313
1314* update source/data/unidata/norm2/uts46.txt
1315- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
1316 to ~/svn.icu/tools/trunk/src/unicode/py
1317- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
1318- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1319- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1320
1321* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1322 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1323- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1324- Unicode 6.0: U+2260, U+226E, U+226F
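
As an alternative to grep, the check for non-ASCII characters with status disallowed_STD3_valid could be scripted. This is a rough sketch that assumes the usual "code point or range ; status # comment" line layout; the sample lines are an illustrative excerpt, not the real IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        found.extend(cp for cp in range(start, end + 1) if cp > 0x7F)
    return found


# Illustrative excerpt only (assumed layout).
SAMPLE = [
    "# illustrative excerpt, not the real file",
    "003D          ; disallowed_STD3_valid  # EQUALS SIGN",
    "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
    "00E0          ; valid                  # LATIN SMALL LETTER A WITH GRAVE",
]

print(["U+%04X" % cp for cp in non_ascii_std3_valid(SAMPLE)])
```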
1325
1326* generate core properties data files
1327- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1328- rebuild ICU & tools
1329- run makeuca.sh so that genuca picks up the new nfc.nrm:
1330 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1331- rebuild ICU & tools
1332
1333* implement new Script_Extensions property (provisional)
1334- parser & generator: genprops & uprops.icu
1335- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
1336- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
1337
1338* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
1339- (one-time change)
1340- genbidi/gencase/genprops tools changes
1341- re-run makeprops.sh (see above)
1342- UCharacterProperty.java, UCharacterTypeIterator.java,
1343 UBiDiProps.java, UCaseProps.java, and several others with minor changes;
1344 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
1345
1346* update Java data files
1347- refresh just the UCD-related files, just to be safe
1348- see (ICU4C)/source/data/icu4j-readme.txt
1349- mkdir /tmp/icu4j
1350- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1351 output:
1352 ...
1353 Unicode .icu files built to ./out/build/icudt45l
1354 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1355 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1356 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1357 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
1358 mkdir -p /tmp/icu4j/main/shared/data
1359 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1360- copy the big-endian Unicode data files to another location,
1361 separate from the other data files
1362 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1363 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1364 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1365 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
1366 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1367 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1368 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1369- refresh ICU4J
1370 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1371
1372* refresh Java test .txt files
1373- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1374
1375* un-hardcode normalization skippable (NF*_Inert) test data
1376- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
1377
1378* copy updated break iterator test files
1379- now handled by early ucdcopy.py and
1380 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
1381 (old instructions:
1382 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
1383 to ~/svn.icu/trunk/src/source/test/testdata)
1384- they are not used in ICU4J
1385
1386* UCA
1387
1388- get output from Mark's tools; look in
1389 http://www.unicode.org/~book/incoming/mark/uca6.0.0/
1390 http://www.macchiato.com/unicode/utc/additional-uca-files
1391 http://www.unicode.org/Public/UCA/6.0.0/
1392 http://www.unicode.org/~mdavis/uca/
1393- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1394- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1395- update Han-implicit ranges for new CJK extensions:
1396 swapCJK() in ucol.cpp & ImplicitCEGenerator.java
1397- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
1398 do not add it into invuca so that tailoring primary-after an ignorable works
1399- genuca: permit space between [variable top] bytes
1400- ucol.cpp: treat noncharacters like unassigned rather than ignorable
1401- run makeuca.sh:
1402 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1403- rebuild ICU4C
1404- refresh ICU4J collation data:
1405 (subset of instructions above for properties data refresh, except copies all coll/*)
1406 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1407 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1408 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1409 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1410- update (ICU)/source/test/testdata/CollationTest_*.txt
1411 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1412 with output from Mark's Unicode tools
1413- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1414- note on intltest: if collate/UCAConformanceTest fails, then
1415 utility/MultithreadTest/TestCollators will fail as well;
1416 fix the conformance test before looking into the multi-thread test
1417
1418* When refreshing all of ICU4J data from ICU4C
1419- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1420- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1421or
1422- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1423
1424*** LayoutEngine script information
1425
1426(For details see the Unicode 5.2 change log below.)
1427
1428* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1429ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1430ScriptRunData.cpp, which is no longer needed.)
1431
1432The generated files have a current copyright date and "@draft" statement.
1433
1434* copy the above files into <icu>/source/layout, replacing the old files.
1435* fix mixed line endings
1436* review the diffs and fix incorrect @draft and missing aliases;
1437 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1438* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1439
1440---------------------------------------------------------------------------- ***
1441
1442Unicode 5.2 update
1443
1444*** related ICU Trac tickets
1445
1446 7084 Unicode 5.2
1447
1448 7167 verify collation bytes
1449 7235 Java test NAME_ALIAS
1450 7236 Java DerivedCoreProperties.txt test
1451 7237 Java BidiTest.txt
1452 7238 UTrie2 in core unidata
1453 7239 test for tailoring gaps
1454 7240 Java fix CollationMiscTest
1455 7243 update layout engine for Unicode 5.2
1456
1457*** Unicode version numbers
1458- makedata.mak
1459- uchar.h
1460- configure.in & configure
1461- update ucdVersion in gennames.c if an algorithmic range changes
1462
1463*** data files & enums & parser code
1464
1465* file preparation
1466
1467python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
1468- includes finding files regardless of version numbers,
1469 copying them, and performing the equivalent processing of the
1470 ucdstrip and ucdmerge tools on the desired set of files
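
For orientation, the "equivalent processing" of ucdstrip and ucdmerge can be pictured as two small passes: remove comments, then merge adjacent code point ranges that carry the same value. The sketch below is an assumption about that behavior, not the code in ucdcopy.py; the real tools may, for example, preserve header comments.

```python
def strip_comments(lines):
    """Drop '#' comments and blank lines, keeping only the data fields."""
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            yield data


def merge_ranges(lines):
    """Merge adjacent code point ranges that carry the same value."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        range_part, _, value = line.partition(";")
        value = value.strip()
        first, _, last = range_part.strip().partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    for start, end, value in merged:
        if start == end:
            yield f"{start:04X};{value}"
        else:
            yield f"{start:04X}..{end:04X};{value}"


# Illustrative LineBreak-style lines (assumed format).
SAMPLE = [
    "# illustrative excerpt",
    "0030;NU # DIGIT ZERO",
    "0031;NU # DIGIT ONE",
    "0032..0039;NU # DIGIT TWO..DIGIT NINE",
    "",
    "003A;IS # COLON",
]

for out_line in merge_ranges(strip_comments(SAMPLE)):
    print(out_line)
```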
1471
1472* notes on changes
1473- PropertyAliases.txt
1474 moved from numeric to enumerated:
1475 ccc ; Canonical_Combining_Class
1476 new string properties:
1477 NFKC_CF ; NFKC_Casefold
1478 Name_Alias; Name_Alias
1479 new binary properties:
1480 Cased ; Cased
1481 CI ; Case_Ignorable
1482 CWCF ; Changes_When_Casefolded
1483 CWCM ; Changes_When_Casemapped
1484 CWKCF ; Changes_When_NFKC_Casefolded
1485 CWL ; Changes_When_Lowercased
1486 CWT ; Changes_When_Titlecased
1487 CWU ; Changes_When_Uppercased
1488 new CJK Unihan properties (not supported by ICU)
1489- PropertyValueAliases.txt
1490 new block names
1491 new scripts
1492 one script code change:
1493 sc ; Qaai ; Inherited
1494 ->
1495 sc ; Zinh ; Inherited ; Qaai
1496 new Line_Break (lb) value:
1497 lb ; CP ; Close_Parenthesis
1498 new Joining_Group (jg) values: Farsi_Yeh, Nya
1499 other new values:
1500 ccc; 214; ATA ; Attached_Above
1501- DerivedBidiClass.txt
1502 new default-R range: U+1E800 - U+1EFFF
1503- UnicodeData.txt
1504 all of the ISO comments are gone
1505 new CJK block end:
1506 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
1507 new CJK block:
1508 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
1509 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
1510
1511* genpname
1512- run preparse.pl
1513 + cd \svn\icuproj\icu\trunk\source\tools\genpname
1514 + make sure that data.h is writable
1515 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
1516 + preparse.pl complains with errors like the following:
1517 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
1518 This is because ICU 4.0 had scripts from ISO 15924 which are now
1519 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
1520 and PropertyValueAliases.txt.
1521 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1522 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
1523 + preparse.pl complains with errors about block names missing from uchar.h; add them
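
The script conflicts reported by preparse.pl can be predicted before running it by listing the script codes that appear in both alias files. A minimal sketch, assuming both files use "sc ; short ; long" lines; the sample entries are illustrative, and the helper names are hypothetical.

```python
def script_codes(lines):
    """Map script short codes to long names from 'sc ; Xxxx ; Name' lines."""
    codes = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes[fields[1]] = fields[2]
    return codes


def duplicated_scripts(official_lines, synthetic_lines):
    """Return codes defined in both files; these should leave the synthetic file."""
    official = script_codes(official_lines)
    synthetic = script_codes(synthetic_lines)
    return sorted(code for code in synthetic if code in official)


# Illustrative entries (assumed file contents).
PVA = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Zinh ; Inherited ; Qaai"]
SYNTHETIC = ["sc ; Egyp ; Egyp", "sc ; Blis ; Blis"]

print(duplicated_scripts(PVA, SYNTHETIC))
```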
1524
1525* uchar.h & uscript.h & uprops.h & uprops.c & genprops
1526- new block & script values
1527 + 26 new blocks
1528 copy new blocks from Blocks.txt
1529 MS VC++ 2008 regular expression:
1530 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
1531 replace with " UBLOCK_\3 = 172, /*[\1]*/"
1532 + several new script values already added in ICU 4.0 for ISO 15924 coverage
1533 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
1534 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
1535 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
1536 (added to SyntheticPropertyValueAliases.txt)
1537- new Joining Group (JG) values: Farsi_Yeh, Nya
1538- new Line_Break (lb) value:
1539 lb ; CP ; Close_Parenthesis
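
The Visual C++ find/replace for new blocks leaves the block name and the constant value to be fixed by hand. As a sketch of the same idea in Python, the helper below also upper-cases the name and numbers the constants sequentially; those two extra steps and the starting value are assumptions for illustration, not part of the original recipe.

```python
import re

# Same shape as the editor regex above: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def ublock_lines(blocks_txt_lines, first_value):
    """Build UBLOCK_ enum lines from Blocks.txt lines, numbering from first_value."""
    out = []
    value = first_value
    for line in blocks_txt_lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue  # comments, blank lines
        start, _end, name = m.groups()
        const = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        out.append(f"    UBLOCK_{const} = {value}, /*[{start}]*/")
        value += 1
    return out


# Block lines as quoted elsewhere in this log; the enum values are illustrative.
NEW_BLOCKS = [
    "# comment line",
    "A960..A97F; Hangul Jamo Extended-A",
    "D7B0..D7FF; Hangul Jamo Extended-B",
]

for enum_line in ublock_lines(NEW_BLOCKS, 172):
    print(enum_line)
```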
1540
1541* hardcoded Unihan range end/limit
1542- Unihan range end moves from 9FC3 to 9FCB
1543 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
1544 + do change gennames.c
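
The search for the old range end and limit can be scripted so that it works for any release. A small sketch: it derives the limit (end + 1) from the old end value and scans text case-insensitively; the sample source lines are hypothetical.

```python
import re


def find_hardcoded(text, end_hex="9FC3"):
    """Return (line number, line) pairs mentioning the old range end or limit."""
    end = int(end_hex, 16)
    # For 9FC3 this builds the pattern 9FC3|9FC4, matched case-insensitively.
    pattern = re.compile(f"{end:X}|{end + 1:X}", re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if pattern.search(line)
    ]


# Hypothetical source lines for illustration.
SOURCE = "if (c <= 0x9fc3) {\n#define HAN_LIMIT 0x9FC4\nx = 0x9FA5;\n"

for number, line in find_hardcoded(SOURCE):
    print(number, line)
```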
1545
1546* Compare definitions of new binary properties with what we used to use
1547 in algorithms, to see if the definitions changed.
1548- Verified that definitions for Cased and Case_Ignorable are unchanged.
1549 The gencase tool now parses the newly public Case_Ignorable values
1550 in case the definition changes in the future.
1551
1552* uchar.c & uprops.h & uprops.c & genprops
1553- new numeric values that didn't exist in Unicode data before:
1554 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
1555 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
1556 therefore redesign the encoding of numeric types and values for formatVersion 6;
1557 design for simple numbers up to at least 144 ("one gross"),
1558 large values up to at least 10^20,
1559 and fractions with numerators -1..17 and denominators 1..16
1560 to cover current and expected future values
1561 (e.g., more Han numeric values, Meroitic twelfths)
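
To make the fraction requirement concrete: numerators -1..17 and denominators 1..16 give 19 * 16 = 304 combinations. The sketch below packs them into a single small integer. It is a hypothetical packing chosen for illustration and should not be read as the actual uprops.icu formatVersion 6 layout.

```python
NUM_MIN, NUM_MAX = -1, 17
DEN_MIN, DEN_MAX = 1, 16
DEN_COUNT = DEN_MAX - DEN_MIN + 1  # 16 possible denominators


def pack(numerator, denominator):
    """Pack a fraction into one integer code (hypothetical layout)."""
    if not (NUM_MIN <= numerator <= NUM_MAX and DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError("fraction outside the supported range")
    return (numerator - NUM_MIN) * DEN_COUNT + (denominator - DEN_MIN)


def unpack(code):
    """Inverse of pack(): return (numerator, denominator)."""
    n, d = divmod(code, DEN_COUNT)
    return n + NUM_MIN, d + DEN_MIN


# The new Unicode 5.2 fractions listed above.
NEW_FRACTIONS = [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]

for fraction in NEW_FRACTIONS:
    print(fraction, pack(*fraction))
```

All six new values round-trip, including those with denominators above 9 that formatVersion 5 could not represent.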
1562
1563* reimplement Hangul_Syllable_Type for new Jamo characters
1564- the old code assumed that all Jamo characters are in the 11xx block
1565- Unicode 5.2 fills holes there and adds new Jamo characters in
1566 A960..A97F; Hangul Jamo Extended-A
1567 and in
1568 D7B0..D7FF; Hangul Jamo Extended-B
1569- Hangul_Syllable_Type can be trivially derived from a subset of
1570 Grapheme_Cluster_Break values
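
The derivation mentioned in the last item is a direct lookup: the five Hangul-related Grapheme_Cluster_Break values carry over unchanged, and everything else is NA. A minimal sketch, using a tiny hand-written GCB table for illustration (a real implementation would read the value from the property data).

```python
# GCB values that correspond one-to-one to Hangul_Syllable_Type values.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Derive Hangul_Syllable_Type from a Grapheme_Cluster_Break value."""
    return GCB_TO_HST.get(gcb_value, "NA")


# Illustrative GCB values for a few code points.
SAMPLE_GCB = {
    0x1100: "L",    # Hangul Jamo block
    0xA960: "L",    # Hangul Jamo Extended-A
    0xD7B0: "V",    # Hangul Jamo Extended-B
    0xAC00: "LV",
    0xAC01: "LVT",
    0x0301: "Extend",
}

for cp, gcb in SAMPLE_GCB.items():
    print("U+%04X" % cp, hangul_syllable_type(gcb))
```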
1571
1572* build Unicode data source code for hardcoding core data
1573C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
1574
1575ICU data make path is \svn\icuproj\icu\trunk\source\data\
1576ICU root path is \svn\icuproj\icu\trunk
1577Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1578Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
1579Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
1580Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
1581Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
1582Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
1583Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
1584Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
1585Creating data file for Unicode Property Names
1586Creating data file for Unicode Character Properties
1587Creating data file for Unicode Case Mapping Properties
1588Creating data file for Unicode BiDi/Shaping Properties
1589Creating data file for Unicode Normalization
1590Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
1591Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
1592
1593- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
1594 and rebuild the common library
1595
1596*** UCA
1597
1598- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
1599- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
1600- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
1601[ Begin obsolete instructions:
1602 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
1603 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
1604 on Windows:
1605 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
1606 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
1607 End obsolete instructions]
1608- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1609 not just the *_STUB.txt files
1610- note on intltest: if collate/UCAConformanceTest fails, then
1611 utility/MultithreadTest/TestCollators will fail as well;
1612 fix the conformance test before looking into the multi-thread test
1613
1614*** Implement Cased & Case_Ignorable properties
1615- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
1616- Problem: These properties should be disjoint, but aren't
1617- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
1618- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
1619
1620*** Implement Changes_When_Xyz properties
1621- without stored data
1622
1623*** Implement Name_Alias property
1624- add it as another name field in unames.icu
1625- make it available via u_charName() and UCharNameChoice
1626- consider it in u_charFromName()
1627
1628*** Break iterators
1629
1630* Update break iterator rules to new UAX versions and new property values
1631* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
1632
1633*** new BidiTest file
1634- review format and data
1635- copy BidiTest.txt to source/test/testdata
1636- write test code using this data
1637- fix ICU code where it fails the conformance test
1638
1639*** Java
1640- generally, find and update code corresponding to C/C++
1641- UCharacter.UnicodeBlock constants:
1642 a) add an _ID integer per new block, update COUNT
1643 b) add a class instance per new block
1644 Visual Studio regex:
1645 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
1646 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1647- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
1648
1649- port test changes to Java
1650
1651*** LayoutEngine script information
1652
1653(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
1654
1655* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1656ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1657ScriptRunData.cpp, which is no longer needed.)
1658
1659The generated files have a current copyright date and "@draft" statement.
1660
1661-> Eric Mader wrote in email on 20090930:
1662 "I think the tool has been modified to update @draft to @stable for
1663 older scripts and to add @draft for new scripts.
1664 (I worked with an intern on this last year.)
1665 You should check the output after you run it."
1666
1667* copy the above files into <icu>/source/layout, replacing the old files.
1668* fix mixed line endings
1669* review the diffs and fix incorrect @draft and missing aliases
1670* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1671
1672Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1673and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1674
1675-> Eric Mader wrote in email on 20090930:
1676 "This is just a matter of making sure that all the per-script tables have
1677 entries for any new scripts that were added.
1678 If any new Indic characters were added, then the class tables in
1679 IndicClassTables.cpp should be updated to reflect this.
1680 John Emmons should know how to do this if it's required."
1681
1682* rebuild the layout and layoutex libraries.
1683
1684*** Documentation
1685- Update User Guide
1686 + Jamo_Short_Name, sfc->scf, binary property value aliases
1687
1688---------------------------------------------------------------------------- ***
1689
1690Unicode 5.1 update
1691
1692*** related ICU Trac tickets
1693
1694 5696 Update to Unicode 5.1
1695
1696*** Unicode version numbers
1697- makedata.mak
1698- uchar.h
1699- configure.in & configure
1700- update ucdVersion in gennames.c if an algorithmic range changes
1701
1702*** data files & enums & parser code
1703
1704* file preparation
1705- ucdstrip:
1706 DerivedCoreProperties.txt
1707 DerivedNormalizationProps.txt
1708 NormalizationTest.txt
1709 PropList.txt
1710 Scripts.txt
1711 GraphemeBreakProperty.txt
1712 SentenceBreakProperty.txt
1713 WordBreakProperty.txt
1714- ucdstrip and ucdmerge:
1715 EastAsianWidth.txt
1716 LineBreak.txt
1717
1718* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
1719copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
1720copy 5.1.0\ucd\Blocks.txt ..\unidata\
1721copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
1722copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
1723copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
1724copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
1725copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
1726copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
1727copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
1728copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
1729copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
1730copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
1731copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
1732
1733ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
1734ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
1735ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
1736ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
1737ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
1738ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
1739ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
1740ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
1741ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
1742ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
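
Part of what makes the batch file above version-dependent is that some UCD files carry a version suffix in their names. A possible way to compute the unversioned target name is sketched below; the suffix pattern (for example "-5.1.0" or "-5.1.0d12" before ".txt") is an assumption, and the helper name is hypothetical.

```python
import re

# Assumed suffix shape: -<digits>(.<digits>)* with an optional draft marker d<digits>.
VERSION_SUFFIX = re.compile(r"-\d+(?:\.\d+)*(?:d\d+)?(?=\.txt$)")


def unversioned_name(filename):
    """Strip a trailing version suffix from a UCD file name, if present."""
    return VERSION_SUFFIX.sub("", filename)


for name in ["LineBreakTest-6.0.0.txt", "DerivedAge-5.1.0d12.txt", "UnicodeData.txt"]:
    print(name, "->", unversioned_name(name))
```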
1743
1744* genpname
1745- run preparse.pl
1746 + cd \svn\icuproj\icu\uni51\source\tools\genpname
1747 + make sure that data.h is writable
1748 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
1749 + preparse.pl complains with errors like the following:
1750 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
1751 This is because ICU 3.8 had scripts from ISO 15924 which are now
1752 added to Unicode 5.1, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
1753 and PropertyValueAliases.txt.
1754 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1755 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
1756 + PropertyValueAliases.txt now explicitly contains values for boolean properties:
1757 N/Y, No/Yes, F/T, False/True
1758 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
1759 It will use further values from the file if present.
1760
1761* uchar.h & uscript.h & uprops.h & uprops.c & genprops
1762- new block & script values
1763 + 17 new blocks
1764 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
1765 (removed from SyntheticPropertyValueAliases.txt)
1766 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
1767 (added to SyntheticPropertyValueAliases.txt)
1768- uprops.icu (uprops.h) only provides 7 bits for script codes.
1769 ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
1770 None of the codes above 127 is yet the script code of an
1771 assigned Unicode character, so ICU 4.0 uprops.icu does not store any
1772 script code values greater than 127.
1773 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
1774 in a parallel bit field, and that overflows now.
1775 Also, future values >=128 would be incompatible anyway.
1776 uprops.h is modified to move around several of the bit fields
1777 in the properties vector words, and now uses 8 bits for the script code.
1778 Two other bit fields also grow to accommodate future growth:
1779 Block (current count: 172) grows from 8 to 9 bits,
1780 and Word_Break grows from 4 to 5 bits.
1781- renamed property Simple_Case_Folding (sfc->scf)
1782 + nothing to be done: handled as normal alias
1783- new property JSN Jamo_Short_Name
1784 + no new API: only contributes to the Name property
1785- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
1786- new Joining Group (JG) value: Burushashki_Yeh_Barree
1787- new Sentence_Break (SB) values:
1788 SB ; CR ; CR
1789 SB ; EX ; Extend
1790 SB ; LF ; LF
1791 SB ; SC ; SContinue
1792- new Word_Break (WB) values:
1793 WB ; CR ; CR
1794 WB ; Extend ; Extend
1795 WB ; LF ; LF
1796 WB ; MB ; MidNumLet
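
The bit-field overflow described above for script codes comes down to simple arithmetic: a field of n bits holds values below 2^n. A short sketch of that check, using the numbers quoted in the note (the assumption that block values run from 0 to count-1 is mine).

```python
def bits_needed(max_value):
    """Smallest field width in bits that can hold max_value."""
    return max(1, max_value.bit_length())


def fits(max_value, bits):
    """True if max_value can be stored in an unsigned field of the given width."""
    return max_value < (1 << bits)


USCRIPT_CODE_LIMIT = 130
max_script = USCRIPT_CODE_LIMIT - 1  # 129, the value stored in the parallel field

print(fits(max_script, 7), fits(max_script, 8), bits_needed(max_script))

# Block count 172 (assumed values 0..171) still fits in 8 bits,
# so the move to 9 bits is headroom for future growth.
print(fits(171, 8), bits_needed(171))
```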
1797
1798* Further changes in the 2008-02-29 update:
1799- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
1800 because they should not normally be invisible.
1801- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
1802- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
1803- new Word_Break (WB) value: NL=Newline
1804
1805* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
1806- Unihan range end moves from 9FBB to 9FC3
1807 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
1808 + do change gennames.c
1809
1810* build Unicode data source code for hardcoding core data
1811C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
1812
1813ICU data make path is \svn\icuproj\icu\uni51\source\data\
1814ICU root path is \svn\icuproj\icu\uni51
1815Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1816Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
1817Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
1818Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
1819Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
1820Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
1821Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
1822Creating data file for Unicode Character Properties
1823Creating data file for Unicode Case Mapping Properties
1824Creating data file for Unicode BiDi/Shaping Properties
1825Creating data file for Unicode Normalization
1826Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
1827Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
1828
1829- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
1830 and rebuild the common library
1831
1832*** Break iterators
1833
1834* Update break iterator rules to new UAX versions and new property values
1835
1836*** UCA
1837
1838* update FractionalUCA.txt and UCARules.txt with new canonical closure
1839
1840*** Test suites
1841- Test that APIs using Unicode property value aliases (like UnicodeSet)
1842 support all of the boolean values N/Y, No/Yes, F/T, False/True
1843 -> TestBinaryValues() tests in both cintltst and intltest
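
The eight boolean spellings that such a test has to accept can be summarized in a small lookup. This is a sketch of the idea only, with loose matching (case, whitespace, underscores and hyphens ignored) as an assumption; it is not the code of TestBinaryValues() or of ICU's property name matching.

```python
import re

TRUE_ALIASES = {"y", "yes", "t", "true"}
FALSE_ALIASES = {"n", "no", "f", "false"}


def parse_binary_value(text):
    """Map a binary property value alias to True/False, matching loosely."""
    key = re.sub(r"[\s_\-]", "", text).lower()
    if key in TRUE_ALIASES:
        return True
    if key in FALSE_ALIASES:
        return False
    raise ValueError(f"not a binary property value alias: {text!r}")


for alias in ["N", "Y", "No", "Yes", "F", "T", "False", "True"]:
    print(alias, parse_binary_value(alias))
```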
1844
1845*** LayoutEngine script information
1846* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1847ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1848ScriptRunData.cpp, which is no longer needed.)
1849
1850The generated files have a current copyright date and "@draft" statement.
1851
1852* copy the above files into <icu>/source/layout, replacing the old files.
1853
1854Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1855and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1856
1857* rebuild the layout and layoutex libraries.
1858
1859*** Documentation
1860- Update User Guide
1861 + Jamo_Short_Name, sfc->scf, binary property value aliases
1862
1863---------------------------------------------------------------------------- ***
1864
1865Unicode 5.0 update
1866
1867*** related Jitterbugs
1868
1869 5084 RFE: Update to Unicode 5.0
1870
1871*** data files & enums & parser code
1872
1873* file preparation
1874- ucdstrip:
1875 DerivedCoreProperties.txt
1876 DerivedNormalizationProps.txt
1877 NormalizationTest.txt
1878 PropList.txt
1879 Scripts.txt
1880 GraphemeBreakProperty.txt
1881 SentenceBreakProperty.txt
1882 WordBreakProperty.txt
1883- ucdstrip and ucdmerge:
1884 EastAsianWidth.txt
1885 LineBreak.txt
1886
1887* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
1888copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
1889copy 5.0.0\ucd\Blocks.txt ..\unidata\
1890copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
1891copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
1892copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
1893copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
1894copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
1895copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
1896copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
1897copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
1898copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
1899copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
1900copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
1901
1902ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
1903ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
1904ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
1905ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
1906ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
1907ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
1908ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
1909ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
1910ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
1911ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
 + make sure that data.h is writable
 + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
 + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
 and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
 DerivedCoreProperties.txt
 DerivedNormalizationProps.txt
 NormalizationTest.txt
 GraphemeBreakProperty.txt
 SentenceBreakProperty.txt
 WordBreakProperty.txt
- ucdstrip and ucdmerge:
 EastAsianWidth.txt
 LineBreak.txt

* add new files to the repository
 GraphemeBreakProperty.txt
 SentenceBreakProperty.txt
 WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
 + Pattern_Syntax
 + Pattern_White_Space
- new enumerated properties
 + Grapheme_Cluster_Break
 + Sentence_Break
 + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
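
The D47a condition above can be sketched in Python as follows. This is only an illustration, not the gencase implementation: Python's unicodedata module does not expose Word_Break, so the MidLetter set is passed in by the caller, and SAMPLE_MID_LETTER is a small illustrative subset rather than the full Unicode 4.1 list. The general categories also come from whatever Unicode version the Python runtime ships, not from 4.1.

```python
import unicodedata

# General categories named in D47a.
CASE_IGNORABLE_CATEGORIES = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})

# Illustrative subset only (assumption): APOSTROPHE and
# RIGHT SINGLE QUOTATION MARK; the real set comes from WordBreakProperty.txt.
SAMPLE_MID_LETTER = frozenset({0x0027, 0x2019})


def is_case_ignorable(cp, mid_letter):
    """Return True if cp has Word_Break=MidLetter or a D47a general category.

    mid_letter is a set of code points with Word_Break=MidLetter.
    """
    if cp in mid_letter:
        return True
    return unicodedata.category(chr(cp)) in CASE_IGNORABLE_CATEGORIES
```

For example, is_case_ignorable(0x0301, SAMPLE_MID_LETTER) is true because U+0301 is a nonspacing mark (Mn), while U+0027 qualifies only through the MidLetter set.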

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
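
The round-trip check on u_charMirror() amounts to verifying that mirroring a character twice gives the original character back. A minimal Python sketch of that idea follows; it does not call ICU, and SAMPLE_MIRRORS is a small hand-written table (an assumption for illustration), whereas the real test would iterate over the mirrored code points in the built data.

```python
# Hand-written sample of mirror pairs (assumption, for illustration only).
SAMPLE_MIRRORS = {
    0x0028: 0x0029, 0x0029: 0x0028,  # ( )
    0x003C: 0x003E, 0x003E: 0x003C,  # < >
    0x005B: 0x005D, 0x005D: 0x005B,  # [ ]
    0x007B: 0x007D, 0x007D: 0x007B,  # { }
    0x00AB: 0x00BB, 0x00BB: 0x00AB,  # double angle quotation marks
}


def char_mirror(cp, table):
    """Return the mirror of cp, or cp itself when it has no mirror."""
    return table.get(cp, cp)


def find_round_trip_failures(table):
    """Return the code points whose double mirror is not the identity."""
    return [
        cp for cp in sorted(table)
        if char_mirror(char_mirror(cp, table), table) != cp
    ]
```

An empty result from find_round_trip_failures(SAMPLE_MIRRORS) means every entry round-trips; a one-way mapping shows up as a failure on both of the code points involved.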

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
 + do not modify BOCU/BOCSU code because that would change the encoding
 and break binary compatibility!
 + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
 NamePrepProfile.txt
 + ignore trietest.c: test data is arbitrary
 + ignore tstnorm.cpp: test optimization, not important
 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
 + do change line_th.txt and word_th.txt
 by replacing hardcoded ranges with the new property values
 + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
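
The search described above (regex 9FA[56], case-insensitive) can be reproduced with a few lines of Python. This is a sketch under the assumption that the file contents are already available as strings; the function name find_hardcoded_ranges is hypothetical. Each hit still needs the manual review described in the list above, since several matches must deliberately stay unchanged.

```python
import re

# Matches the old Unihan range end (9FA5) and limit (9FA6) in any letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hardcoded_ranges(name, text):
    """Return (name, line number, stripped line) for every matching line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if UNIHAN_END_OR_LIMIT.search(line):
            hits.append((name, lineno, line.strip()))
    return hits
```

The result has the same shape as the three hits listed above: file name, line number, and the matching line.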

* case mappings
- compare new special casing context conditions with previous ones
 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
 according to PRI #26
 http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
 instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
 formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
 + put short name comment only on line with new constant
 for genpname perl script parser
- new binary properties
 + STerm
 + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c: track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
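
The two format changes above can be illustrated with a small Python parser that accepts either spelling and always returns the 4.0.1 form. This is a sketch, not the gennorm.c code. The helper names are hypothetical, and the handling of the "etc." cases assumes that the other old names follow the same <form>_NO / <form>_MAYBE pattern (for example NFKC_MAYBE -> "NFKC_QC; M").

```python
import re

# Old single-field quick-check names such as NFD_NO or NFKC_MAYBE (assumed pattern).
_OLD_QUICK_CHECK = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")


def modernize_fields(fields):
    """Map old property fields onto their Unicode 4.0.1 spelling."""
    if not fields:
        return fields
    name, rest = fields[0], fields[1:]
    if name == "FNC":
        return ["FC_NFKC"] + rest
    match = _OLD_QUICK_CHECK.match(name)
    if match:
        # "NO" -> "N", "MAYBE" -> "M"
        return [match.group(1) + "_QC", match.group(2)[0]] + rest
    return fields


def parse_line(line):
    """Split one data line into (code point range, property fields)."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # comment-only or empty line
    fields = [field.strip() for field in data.split(";")]
    return fields[0], modernize_fields(fields[1:])
```

With this, parse_line("00C0..00C5 ; NFD_NO") and parse_line("00C0..00C5 ; NFD_QC; N") both return ("00C0..00C5", ["NFD_QC", "N"]).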

* genprops/props2.c: track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
 + assume that the type is always numeric for Han characters,
 and that only those are added in addition to what UnicodeData.txt lists
2101
2102*** Unicode version numbers
2103- makedata.mak
2104- uchar.h
2105- configure.in
2106
2107*** tests
2108- update test of default bidi classes according to PRI #28
2109 /tsutil/cucdtst/TestUnicodeData
2110 http://www.unicode.org/review/resolved-pri.html#pri28
2111- bidi tests: change exemplar character for ES depending on Unicode version
2112- change hardcoded expected property values where they change
2113
2114*** other code
2115
2116* name matching
2117- read UCD.html
2118
2119* scripts
2120- use new Hrkt=Katakana_Or_Hiragana
2121
2122* ZWJ & ZWNJ
2123- are now part of combining character sequences
2124- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ