1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
2 * License & terms of use: http://www.unicode.org/copyright.html
3 * Copyright (C) 2004-2016, International Business Machines
4 * Corporation and others. All Rights Reserved.
6 * file name: changes.txt
8 * tab size: 8 (not used)
11 * created on: 2004may06
12 * created by: Markus W. Scherer
14 * change log for Unicode updates
16 * For each new Unicode version, during the beta period,
17 * I copy the change log for the previous version to the top of this file.
18 * I adjust the versions, tickets, URLs, and paths.
19 * I work my way through the steps listed in the log, top to bottom,
20 * adjusting the log as necessary.
21 * I report problems to the UTC and/or CLDR and/or ICU.
22 * Before the data is final, I "turn the crank" several more times,
23 * using appropriate subsets of the steps.
25 ---------------------------------------------------------------------------- ***
27 * New ISO 15924 script codes
29 Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
30 until they are encoded in Unicode,
31 or can be assumed to be encoded in the next Unicode version.
32 Script enum constant names should follow the Unicode script property value aliases,
33 which are assigned only when the scripts are encoded.
34 When we add script constants early and guess wrong, we end up with confusing enum constants
35 and have sometimes had to add aliases.
37 Variant script codes like Latf and Aran that are not subject to separate encoding
38 can be added at any time.
39 (For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
41 We add script codes used in CLDR or in the spoof checker.
42 This includes combination/alias codes like Hanb and Jamo.
43 See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
44 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
46 We add special Z* script codes like Zsye.
48 For new script codes see http://www.unicode.org/iso15924/codechanges.html
50 ---------------------------------------------------------------------------- ***
52 Unicode 12.1 update for ICU 64.2
54 ** This is an abbreviated update with one new character for the new
55 ** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
56 https://en.wikipedia.org/wiki/Reiwa_period
58 http://www.unicode.org/versions/Unicode12.1.0/
60 ICU-20497 Unicode 12.1
62 cldrbug 11978: Unicode 12.1
64 * Command-line environment setup
66 UNICODE_DATA=~/unidata/uni121/20190403
67 CLDR_SRC=~/svn.cldr/uni
71 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
72 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
73 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
75 *** Unicode version numbers
78 - com.ibm.icu.util.VersionInfo
79 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
81 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
82 so that the makefiles see the new version number.
83 cd $ICU_ROOT/dbg/icu4c
84 ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
86 *** data files & enums & parser code
89 - mkdir -p $UNICODE_DATA
90 - download Unicode files into $UNICODE_DATA
91 + subfolders: emoji, idna, security, ucd, uca
92 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
94 * for manual diffs and for Unicode Tools input data updates:
95 remove version suffixes from the file names
96 ~$ unidata/desuffixucd.py $UNICODE_DATA
97 (see https://sites.google.com/site/unicodetools/inputdata)
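The suffix removal described above can be sketched as follows. This is not the actual desuffixucd.py; it is a minimal illustration that assumes suffixes of the form "-12.1.0" or "-12.1.0d3" directly before the file extension. The function name strip_version_suffix is hypothetical.

```python
import re

# Assumed suffix shape: "-<major>.<minor>[.<update>]", optionally followed by
# a draft marker such as "d3", immediately before the file extension.
_VERSION_SUFFIX = re.compile(r"-\d+(?:\.\d+)+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name: str) -> str:
    """Return file_name without a trailing Unicode version suffix, if any."""
    return _VERSION_SUFFIX.sub("", file_name, count=1)


if __name__ == "__main__":
    for name in ("UnicodeData-12.1.0d3.txt", "Blocks-12.1.0.txt", "emoji-data.txt"):
        print(name, "->", strip_version_suffix(name))
```

Names without a numeric suffix, such as emoji-data.txt, are left unchanged because the pattern requires digits right after the hyphen.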
99 * process and/or copy files
100 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
101 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
102 + For debugging, and tweaking how ppucd.txt is written,
103 the tool has an --only_ppucd option:
104 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
106 - cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
108 * build ICU (make install)
109 so that the tools build can pick up the new definitions from the installed header files.
111 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
113 * update spoof checker UnicodeSet initializers:
114 inclusionPat & recommendedPat in uspoof.cpp
115 INCLUSION & RECOMMENDED in SpoofChecker.java
116 - make sure that the Unicode Tools tree contains the latest security data files
117 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
118 - update the hardcoded version number there in the DIRECTORY path
119 - run the tool (no special environment variables needed)
120 - copy & paste from the Console output into the .cpp & .java files
122 * generate normalization data files
123 cd $ICU_ROOT/dbg/icu4c
124 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
125 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
126 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
127 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
128 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
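The four .nrm commands above follow one pattern: every output layers its own input file(s) on top of nfc.txt. A small sketch that builds the same argument lists from a table, using only the -o and -s options shown above (the helper name gennorm2_commands is hypothetical, and the --csource variant is left out):

```python
# Output file -> ordered list of input files, as in the commands above.
NORM_BUILDS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(data_in, unidata):
    """Return one argv list per .nrm file; the lists are not executed here."""
    return [
        ["bin/gennorm2", "-o", f"{data_in}/{out}", "-s", f"{unidata}/norm2", *inputs]
        for out, inputs in NORM_BUILDS
    ]


if __name__ == "__main__":
    for argv in gennorm2_commands("$ICU4C_DATA_IN", "$ICU4C_UNIDATA"):
        print(" ".join(argv))
```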
130 * build ICU (make install)
131 so that the tools build can pick up the new definitions from the installed header files.
133 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
135 * build Unicode tools using CMake+make
137 $ICU_SRC/tools/unicode/c/icudefs.txt:
139 # Location (--prefix) of where ICU was installed.
140 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
141 # Location of the ICU4C source tree.
142 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
145 mkdir -p tools/unicode/c
148 $ICU_ROOT/dbg/tools/unicode/c$
149 cmake ../../../../src/tools/unicode/c
152 * generate core properties data files
153 $ICU_ROOT/dbg/tools/unicode/c$
154 genprops/genprops $ICU_SRC/icu4c
155 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
156 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
157 - rebuild ICU (make install) & tools
159 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
160 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
161 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
162 - Unicode 6.0..12.1: U+2260, U+226E, U+226F
163 - nothing new in this Unicode version, no test file to update
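The grep step above can also be done with a short script. This is a sketch that assumes the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status, with "#" starting a comment); the function name is hypothetical and the sample lines are illustrative.

```python
def non_ascii_disallowed_std3_valid(lines):
    """Yield (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0]          # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            yield (max(start, 0x80), end)


if __name__ == "__main__":
    sample = [
        "003D          ; disallowed_STD3_valid   # EQUALS SIGN",
        "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for start, end in non_ascii_disallowed_std3_valid(sample):
        print("U+%04X..U+%04X" % (start, end))
```

Any range reported here that is not already covered by the tests is a candidate for uts46test.cpp and UTS46Test.java.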
165 * run & fix ICU4C tests
166 - Andy handles RBBI & spoof check test failures
168 * collation: CLDR collation root, UCA DUCET
170 - UCA DUCET goes into Mark's Unicode tools, see
171 https://sites.google.com/site/unicodetools/home#TOC-UCA
172 diff the main mapping file, look for bad changes
173 (for example, more bytes per weight for common characters)
174 ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
175 ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
177 - CLDR root data files are checked into $CLDR_SRC/common/uca/
178 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
180 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
181 cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
182 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
183 cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
184 (note removing the underscore before "Rules")
185 cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
186 - restore TODO diffs in UCARules.txt
187 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
188 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
189 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
190 from the CLDR root files (..._CLDR_..._SHORT.txt)
191 cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
192 cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
193 cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
194 - if CLDR common/uca/unihan-index.txt changes, then update
195 CLDR common/collation/root.xml <collation type="private-unihan">
196 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
198 - run genuca, see command line above
202 https://sites.google.com/site/unicodetools/unihan
204 org.unicode.draft.GenerateUnihanCollators
207 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
208 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
209 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
210 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
213 org.unicode.draft.GenerateUnihanCollatorFiles
214 with the same arguments
217 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
218 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
221 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
222 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
223 - run CLDR unit tests, commit to CLDR
224 - generate ICU zh collation data: run CLDR
225 org.unicode.cldr.icu.NewLdml2IcuConverter
226 with program arguments
228 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
229 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
230 -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
231 -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
235 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
238 * run & fix ICU4C tests, now with new CLDR collation root data
239 - run all tests with the collation test data *_SHORT.txt or the full files
240 (the full ones have comments, useful for debugging)
241 - note on intltest: if collate/UCAConformanceTest fails, then
242 utility/MultithreadTest/TestCollators will fail as well;
243 fix the conformance test before looking into the multi-thread test
245 * update Java data files
246 - refresh just the UCD/UCA-related/derived files, just to be safe
247 - see (ICU4C)/source/data/icu4j-readme.txt
248 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
249 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
252 make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
253 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
254 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
255 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
256 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
257 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
258 mkdir -p /tmp/icu4j/main/shared/data
259 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
260 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
261 mkdir -p /tmp/icu4j/main/shared/data
262 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
263 make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
264 - copy the big-endian Unicode data files to another location,
265 separate from the other data files,
266 and then refresh ICU4J
267 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
268 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
269 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
270 cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
271 cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
272 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
273 cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
274 cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
275 cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
276 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
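The selection made by the cp/rm commands above can be summarized in a short sketch: confusables.cfu, all top-level *.icu files except cnvalias.icu, all *.nrm files, and everything directly inside coll/ and brkitr/. The function name unicode_data_files is hypothetical, and the sketch only lists files; it does not copy anything.

```python
from pathlib import Path


def unicode_data_files(data_dir):
    """Return sorted relative paths of the Unicode-related data files."""
    root = Path(data_dir)
    picked = list(root.glob("confusables.cfu"))
    # cnvalias.icu is converter data, not Unicode data, so it is excluded.
    picked += [p for p in root.glob("*.icu") if p.name != "cnvalias.icu"]
    picked += list(root.glob("*.nrm"))
    for sub in ("coll", "brkitr"):
        picked += [p for p in (root / sub).glob("*") if p.is_file()]
    return sorted(p.relative_to(root).as_posix() for p in picked)


if __name__ == "__main__":
    import sys
    for rel in unicode_data_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(rel)
```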
278 * When refreshing all of ICU4J data from ICU4C
279 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
280 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
282 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
284 * update CollationFCD.java
285 + copy & paste the initializers of lcccIndex[] etc. from
286 ICU4C/source/i18n/collationfcd.cpp to
287 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
289 * refresh Java test .txt files
290 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
291 cd $ICU_SRC/icu4c/source/data/unidata
292 cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
293 cd ../../test/testdata
294 cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
295 cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
297 * run & fix ICU4J tests
300 - send notice to icu-design about new born-@stable API (enum constants etc.)
302 *** CLDR numbering systems
303 - look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
304 for example, look for
305 ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
306 in new blocks (Blocks.txt)
307 Unicode 12: using Unicode 12 CLDR ticket #11478
308 hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
309 wcho 1E2F0..1E2F9 Wancho
310 Unicode 11: using Unicode 11 CLDR ticket #10978
311 rohg 10D30..10D39 Hanifi_Rohingya
312 gong 11DA0..11DA9 Gunjala_Gondi
313 Earlier: CLDR tickets specific to adding new numbering systems.
314 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
315 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
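The egrep above finds the digit FOUR of each decimal digit set; the whole set is then the ten code points from four-4 to four+5. A sketch of that derivation, assuming that matching ppucd.txt lines start with "cp;<hex code point>;" (an assumption about the file layout) and using a hypothetical function name:

```python
import re

# Same idea as the egrep pattern above, with an added boundary after "nv=4".
_DIGIT_FOUR = re.compile(r";gc=Nd.+;nv=4(?:;|$)")


def decimal_digit_ranges(ppucd_lines):
    """Return (zero, nine) code point pairs derived from each digit FOUR."""
    ranges = []
    for line in ppucd_lines:
        line = line.rstrip("\n")
        if not line.startswith("cp;") or not _DIGIT_FOUR.search(line):
            continue
        code = line.split(";")[1]
        if ".." in code:        # skip range lines; a digit four is a single cp
            continue
        four = int(code, 16)
        ranges.append((four - 4, four + 5))
    return ranges


if __name__ == "__main__":
    sample = ["cp;1E144;gc=Nd;nt=De;nv=4"]   # illustrative line, not copied from ppucd.txt
    for zero, nine in decimal_digit_ranges(sample):
        print("%04X..%04X" % (zero, nine))
```

The resulting ranges still need to be compared against Blocks.txt by hand to see which ones are in new blocks.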
317 *** merge the Unicode update branches back onto the trunk
318 - do not merge the icudata.jar and testdata.jar,
319 instead rebuild them from merged & tested ICU4C
320 - make sure that changes to Unicode tools are checked in:
321 http://www.unicode.org/utility/trac/log/trunk/unicodetools
323 ---------------------------------------------------------------------------- ***
325 Unicode 12.0 update for ICU 64
327 http://www.unicode.org/versions/Unicode12.0.0/
328 http://unicode.org/versions/beta-12.0.0.html
329 https://www.unicode.org/review/pri389/
330 http://www.unicode.org/reports/uax-proposed-updates.html
331 http://www.unicode.org/reports/tr44/tr44-23.html
335 ICU-20111 move text layout properties data into a data file
337 cldrbug 11478: Unicode 12
338 Accidentally used ^/trunk instead of ^/branches/markus/uni12
340 * Command-line environment setup
342 UNICODE_DATA=~/unidata/uni12/20190309
343 CLDR_SRC=~/svn.cldr/uni
345 ICU_SRC=$ICU_ROOT/src
347 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
348 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
349 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
351 *** Unicode version numbers
354 - com.ibm.icu.util.VersionInfo
355 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
357 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
358 so that the makefiles see the new version number.
360 *** data files & enums & parser code
363 - mkdir -p $UNICODE_DATA
364 - download Unicode files into $UNICODE_DATA
365 + subfolders: emoji, idna, security, ucd, uca
366 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
368 * for manual diffs and for Unicode Tools input data updates:
369 remove version suffixes from the file names
370 ~$ unidata/desuffixucd.py $UNICODE_DATA
371 (see https://sites.google.com/site/unicodetools/inputdata)
373 * process and/or copy files
374 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
375 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
376 + For debugging, and tweaking how ppucd.txt is written,
377 the tool has an --only_ppucd option:
378 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
380 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
382 * build ICU (make install)
383 so that the tools build can pick up the new definitions from the installed header files.
385 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
387 * new constants for new property values
388 - preparseucd.py error:
389 ValueError: missing uchar.h enum constants for some property values:
390 [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
391 u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
392 u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
393 (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
394 = PropertyValueAliases.txt new property values (diff old & new .txt files)
395 blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
396 blk; Elymaic ; Elymaic
397 blk; Nandinagari ; Nandinagari
398 blk; Nyiakeng_Puachue_Hmong ; Nyiakeng_Puachue_Hmong
399 blk; Ottoman_Siyaq_Numbers ; Ottoman_Siyaq_Numbers
400 blk; Small_Kana_Ext ; Small_Kana_Extension
401 blk; Symbols_And_Pictographs_Ext_A ; Symbols_And_Pictographs_Extended_A
402 blk; Tamil_Sup ; Tamil_Supplement
405 use long property names for enum constants,
406 for the trailing comment get the block start code point: diff old & new Blocks.txt
407 -> add to UCharacter.UnicodeBlock IDs
408 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
409 replace public static final int \1_ID = \2; \3
410 -> add to UCharacter.UnicodeBlock objects
411 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
412 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
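The two Eclipse find/replace steps can also be scripted. The sketch below uses one three-group pattern for both conversions, so the trailing comment is group 3 in each replacement. The function names and the sample block name and values are hypothetical.

```python
import re

# Groups: 1 = block name, 2 = numeric value, 3 = trailing comment.
_UBLOCK = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_id(c_line):
    """Convert a uchar.h UBLOCK_ enum line to a UnicodeBlock ID constant."""
    return _UBLOCK.sub(r"public static final int \1_ID = \2; \3", c_line)


def to_java_object(c_line):
    """Convert a uchar.h UBLOCK_ enum line to a UnicodeBlock object constant."""
    return _UBLOCK.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        c_line,
    )


if __name__ == "__main__":
    line = "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/"   # made-up sample line
    print(to_java_id(line))
    print(to_java_object(line))
```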
415 sc ; Hmnp ; Nyiakeng_Puachue_Hmong
416 sc ; Nand ; Nandinagari
418 -> uscript.h & com.ibm.icu.lang.UScript
419 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
420 and in com.ibm.icu.dev.test.lang.TestUScript.java
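The "diff old & new .txt files" step for PropertyValueAliases.txt can be approximated with a set difference. This sketch assumes the semicolon-separated layout shown in the lines above (property; short value name; long value name) and uses hypothetical function names.

```python
def parse_value_aliases(lines):
    """Return a set of (property, short value name, long value name) tuples."""
    values = set()
    for line in lines:
        data = line.split("#", 1)[0]          # drop comments
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3:
            values.add((fields[0], fields[1], fields[2]))
    return values


def new_property_values(old_lines, new_lines):
    """Return the entries present in the new file but not in the old one."""
    return sorted(parse_value_aliases(new_lines) - parse_value_aliases(old_lines))


if __name__ == "__main__":
    old = ["blk; Tamil ; Tamil"]
    new = old + ["blk; Tamil_Sup ; Tamil_Supplement", "sc ; Nand ; Nandinagari"]
    for prop, short, long_name in new_property_values(old, new):
        print(prop, short, long_name)
```

Each reported blk or sc entry corresponds to a constant that needs to be added to uchar.h or uscript.h and to the Java counterparts.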
422 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
423 (not strictly necessary for NOT_ENCODED scripts)
424 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
426 * update spoof checker UnicodeSet initializers:
427 inclusionPat & recommendedPat in uspoof.cpp
428 INCLUSION & RECOMMENDED in SpoofChecker.java
429 - make sure that the Unicode Tools tree contains the latest security data files
430 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
431 - update the hardcoded version number there in the DIRECTORY path
432 - run the tool (no special environment variables needed)
433 - copy & paste from the Console output into the .cpp & .java files
435 * generate normalization data files
436 cd $ICU_ROOT/dbg/icu4c
437 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
438 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
439 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
440 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
441 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
443 * build ICU (make install)
444 so that the tools build can pick up the new definitions from the installed header files.
446 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
448 * build Unicode tools using CMake+make
450 $ICU_SRC/tools/unicode/c/icudefs.txt:
452 # Location (--prefix) of where ICU was installed.
453 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
454 # Location of the ICU4C source tree.
455 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
458 mkdir -p tools/unicode/c
461 $ICU_ROOT/dbg/tools/unicode/c$
462 cmake ../../../../src/tools/unicode/c
465 * generate core properties data files
466 $ICU_ROOT/dbg/tools/unicode/c$
467 genprops/genprops $ICU_SRC/icu4c
468 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
469 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
470 - rebuild ICU (make install) & tools
472 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
473 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
474 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
475 - Unicode 6.0..12.0: U+2260, U+226E, U+226F
476 - nothing new in this Unicode version, no test file to update
478 * run & fix ICU4C tests
479 - update test of default bidi classes:
480 Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
481 see diffs in DerivedBidiClass.txt
482 + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
483 + UCharacterTest.java TestIteration() defaultBidi[]
484 - Andy handles RBBI & spoof check test failures
486 * collation: CLDR collation root, UCA DUCET
488 - UCA DUCET goes into Mark's Unicode tools, see
489 https://sites.google.com/site/unicodetools/home#TOC-UCA
490 diff the main mapping file, look for bad changes
491 (for example, more bytes per weight for common characters)
492 ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
493 ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
495 - CLDR root data files are checked into $CLDR_SRC/common/uca/
496 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
498 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
499 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
500 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
501 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
502 (note removing the underscore before "Rules")
503 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
504 - restore TODO diffs in UCARules.txt
505 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
506 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
507 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
508 from the CLDR root files (..._CLDR_..._SHORT.txt)
509 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
510 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
511 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
512 - if CLDR common/uca/unihan-index.txt changes, then update
513 CLDR common/collation/root.xml <collation type="private-unihan">
514 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
516 - run genuca, see command line above;
518 Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
519 FDD1 119CE; [71 CD 02, 05, 05] # Nandinagari first primary (compressible)
520 (add the character to genuca.cpp sampleCharsToScripts[])
521 + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
522 and cache its values.
523 This works as long as the script metadata is updated before the collation data.
527 https://sites.google.com/site/unicodetools/unihan
529 org.unicode.draft.GenerateUnihanCollators
532 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
533 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
534 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
535 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
538 org.unicode.draft.GenerateUnihanCollatorFiles
539 with the same arguments
542 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
543 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
546 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
547 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
548 - run CLDR unit tests, commit to CLDR
549 - generate ICU zh collation data: run CLDR
550 org.unicode.cldr.icu.NewLdml2IcuConverter
551 with program arguments
553 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
554 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
555 -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
556 -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
560 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
563 * run & fix ICU4C tests, now with new CLDR collation root data
564 - run all tests with the collation test data *_SHORT.txt or the full files
565 (the full ones have comments, useful for debugging)
566 - note on intltest: if collate/UCAConformanceTest fails, then
567 utility/MultithreadTest/TestCollators will fail as well;
568 fix the conformance test before looking into the multi-thread test
570 * update Java data files
571 - refresh just the UCD/UCA-related/derived files, just to be safe
572 - see (ICU4C)/source/data/icu4j-readme.txt
573 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
574 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
577 Unicode .icu files built to ./out/build/icudt63l
578 echo timestamp > uni-core-data
579 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
580 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
581 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
582 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
583 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
584 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
585 mkdir -p /tmp/icu4j/main/shared/data
586 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
587 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
588 mkdir -p /tmp/icu4j/main/shared/data
589 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
590 make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
591 - copy the big-endian Unicode data files to another location,
592 separate from the other data files,
593 and then refresh ICU4J
594 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
595 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
596 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
597 cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
598 cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
599 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
600 cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
601 cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
602 cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
603 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
605 * When refreshing all of ICU4J data from ICU4C
606 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
607 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
609 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
611 * update CollationFCD.java
612 + copy & paste the initializers of lcccIndex[] etc. from
613 ICU4C/source/i18n/collationfcd.cpp to
614 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
616 * refresh Java test .txt files
617 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
618 cd $ICU_SRC/icu4c/source/data/unidata
619 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
620 cd ../../test/testdata
621 cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
622 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
624 * run & fix ICU4J tests
627 - send notice to icu-design about new born-@stable API (enum constants etc.)
629 *** CLDR numbering systems
630 - look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
631 for example, look for
632 ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
633 in new blocks (Blocks.txt)
634 Unicode 12: using Unicode 12 CLDR ticket #11478
635 hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
636 wcho 1E2F0..1E2F9 Wancho
637 Unicode 11: using Unicode 11 CLDR ticket #10978
638 rohg 10D30..10D39 Hanifi_Rohingya
639 gong 11DA0..11DA9 Gunjala_Gondi
640 Earlier: CLDR tickets specific to adding new numbering systems.
641 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
642 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
644 *** merge the Unicode update branches back onto the trunk
645 - do not merge the icudata.jar and testdata.jar,
646 instead rebuild them from merged & tested ICU4C
647 - make sure that changes to Unicode tools are checked in:
648 http://www.unicode.org/utility/trac/log/trunk/unicodetools
650 ---------------------------------------------------------------------------- ***
652 ICU 63 addition of ICU support of text layout properties InPC, InSC, vo
654 * Command-line environment setup
656 UNICODE_DATA=~/unidata/uni11/20180609
657 CLDR_SRC=~/svn.cldr/uni
659 ICU_SRC=$ICU_ROOT/src
661 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
662 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
663 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
667 https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
668 https://unicode-org.atlassian.net/browse/ICU-12850 vo
670 *** data files & enums & parser code
673 - for each of the three new enumerated properties
674 + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
675 + uchar.h: update UCHAR_INT_LIMIT
676 + uchar.h: add the enum U<long prop name>
677 with constants U_<short prop name>_<long value name>
678 + UProperty.java: add the constant <long prop name>
679 + UProperty.java: update INT_LIMIT
680 + UCharacter.java: add the interface <long prop name>
681 with constants <long value name>
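The C naming scheme in the list above can be illustrated with a short sketch. It assumes that the names are simply uppercased, which matches constants such as UCHAR_INDIC_POSITIONAL_CATEGORY and U_INPC_TOP; the helper names are hypothetical.

```python
def c_property_constant(long_prop_name):
    """UProperty constant: UCHAR_<long prop name>, uppercased."""
    return "UCHAR_" + long_prop_name.upper()


def c_value_constant(short_prop_name, long_value_name):
    """Value constant: U_<short prop name>_<long value name>, uppercased."""
    return "U_{}_{}".format(short_prop_name.upper(), long_value_name.upper())


if __name__ == "__main__":
    print(c_property_constant("Indic_Positional_Category"))
    print(c_value_constant("InPC", "Top_And_Bottom"))
    print(c_value_constant("vo", "Rotated"))
```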
683 * process and/or copy files
684 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
685 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
686 + It also writes tools/unicode/c/genprops/pnames_data.h with property and value names and aliases.
688 + For debugging, and tweaking how ppucd.txt is written,
689 the tool has an --only_ppucd option:
690 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
692 * preparseucd.py changes
693 - add new property short names (uppercase) to _prop_and_value_re
694 so that ParseUCharHeader() parses the new enum constants
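
Why this step is needed can be shown with a small Python sketch. The pattern below is a hypothetical stand-in, not the actual _prop_and_value_re from preparseucd.py: it recognizes only enum constants whose property short name is listed, so constants of a new property are skipped until its short name is added.

```python
import re


def make_enum_constant_re(prop_short_names):
    """Build a pattern for enum lines like '    U_INPC_BOTTOM,'."""
    alternatives = "|".join(re.escape(name) for name in prop_short_names)
    return re.compile(r"\s*U_(" + alternatives + r")_([A-Z0-9_]+)\b")


def parse_enum_constant(pattern, line):
    """Return (property short name, value name) or None if the line does not match."""
    match = pattern.match(line)
    return (match.group(1), match.group(2)) if match else None


# Hypothetical "before" and "after" lists of upper-case property short names.
old_re = make_enum_constant_re(["GCB", "JG", "WB"])
new_re = make_enum_constant_re(["GCB", "JG", "WB", "INPC", "INSC", "VO"])

if __name__ == "__main__":
    line = "    U_INPC_BOTTOM_AND_LEFT,"
    print(parse_enum_constant(old_re, line))
    print(parse_enum_constant(new_re, line))
```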
696 * build ICU (make install)
697 so that the tools build can pick up the new definitions from the installed header files.
699 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
701 * build Unicode tools using CMake+make
703 $ICU_SRC/tools/unicode/c/icudefs.txt:
705 # Location (--prefix) of where ICU was installed.
706 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
707 # Location of the ICU4C source tree.
708 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
711 mkdir -p tools/unicode/c
714 $ICU_ROOT/dbg/tools/unicode/c$
715 cmake ../../../../src/tools/unicode/c
718 * generate core properties data files
719 $ICU_ROOT/dbg/tools/unicode/c$
720 genprops/genprops $ICU_SRC/icu4c
721 - rebuild ICU (make install) & tools
723 * write data for runtime, hardcoded for now
724 - add genprops/layoutpropsbuilder.cpp with pieces from sibling files
725 - generate new icu4c/source/common/ulayout_props_data.h
726 - for each of the three new enumerated properties
727 + int property max value
728 + small, 8-bit UCPTrie
729 (A small 16-bit trie with bit fields for these three properties
730 is very nearly the same size as the sum of the three.)
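
To make the alternative in the parenthetical note concrete, here is a hedged Python sketch of how three small enumerated values could share one 16-bit trie value via bit fields. The field widths and positions are assumptions chosen for illustration only; the generated data uses three separate 8-bit tries, as noted above.

```python
# Hypothetical layout of one 16-bit value (13 bits used):
# vo in bits 0..1, InPC in bits 2..6, InSC in bits 7..12.
VO_SHIFT, VO_BITS = 0, 2
INPC_SHIFT, INPC_BITS = 2, 5
INSC_SHIFT, INSC_BITS = 7, 6


def _mask(bits):
    return (1 << bits) - 1


def pack(inpc, insc, vo):
    """Combine three property values into one integer; reject values that do not fit."""
    for value, bits in ((inpc, INPC_BITS), (insc, INSC_BITS), (vo, VO_BITS)):
        if not 0 <= value <= _mask(bits):
            raise ValueError("value does not fit its bit field")
    return (insc << INSC_SHIFT) | (inpc << INPC_SHIFT) | (vo << VO_SHIFT)


def unpack(packed):
    """Return (inpc, insc, vo) from a packed value."""
    return ((packed >> INPC_SHIFT) & _mask(INPC_BITS),
            (packed >> INSC_SHIFT) & _mask(INSC_BITS),
            (packed >> VO_SHIFT) & _mask(VO_BITS))


if __name__ == "__main__":
    packed = pack(inpc=14, insc=35, vo=3)
    print(hex(packed), unpack(packed))
```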
733 - uprops.cpp: #include ulayout_props_data.h
734 - uprops.cpp: add getInPC() etc. functions
735 - uprops.cpp: add lines to intProps[], include max values
736 - uprops.h: add UPropertySource constants
737 - uprops.cpp: add uprops_addPropertyStarts(src)
738 - uniset_props.cpp: add to UnicodeSet_initInclusion()
739 - intltest/ucdtest.cpp: write unit tests
741 * update Java data files
742 - refresh just the pnames.icu file with the new property [value] names, just to be safe
743 - see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
744 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
745 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
746 - copy the big-endian Unicode data files to another location,
747 separate from the other data files,
748 and then refresh ICU4J
749 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
750 cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
751 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
754 - UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
755 - UCharacterProperty.java: for each new property
756 + create a nested class to hold its CodePointTrie
757 + initialize it from a string literal
758 + paste in the initializer printed by genprops
759 + add a new IntProperty object to the intProps[] array
760 + use the correct max int value for each property, also printed by genprops
761 - UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
762 - UnicodeSet.java: add to getInclusions()
763 - UCharacterTest.java: write unit tests
765 ---------------------------------------------------------------------------- ***
767 Unicode 11.0 update for ICU 62
769 http://www.unicode.org/versions/Unicode11.0.0/
770 http://unicode.org/versions/beta-11.0.0.html
771 https://www.unicode.org/review/pri372/
772 http://www.unicode.org/reports/uax-proposed-updates.html
773 http://www.unicode.org/reports/tr44/tr44-21.html
775 * Command-line environment setup
777 UNICODE_DATA=~/unidata/uni11/20180521
778 CLDR_SRC=~/svn.cldr/uni
779 ICU_ROOT=~/svn.icu/uni
780 ICU_SRC=$ICU_ROOT/src
782 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
783 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
784 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
788 - ticket:13630: Unicode 11
789 - ^/branches/markus/uni11
793 - cldrbug 10978: Unicode 11
794 - ^/branches/markus/uni11
796 *** Unicode version numbers
799 - com.ibm.icu.util.VersionInfo
800 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
802 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
803 so that the makefiles see the new version number.
805 *** data files & enums & parser code
808 - mkdir -p $UNICODE_DATA
809 - download Unicode files into $UNICODE_DATA
810 + subfolders: emoji, idna, security, ucd, uca
811 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
813 * for manual diffs and for Unicode Tools input data updates:
814 remove version suffixes from the file names
815 ~$ unidata/desuffixucd.py $UNICODE_DATA
816 (see https://sites.google.com/site/unicodetools/inputdata)
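
A minimal Python sketch of the renaming idea, for illustration. This is not the logic of desuffixucd.py; it assumes suffixes of the form "-11.0.0", optionally followed by a draft marker such as "d12", directly before the file extension.

```python
import re

# Assumed suffix shape: "-<major>.<minor>.<update>" plus optional "d<number>",
# immediately before the final file extension.
_VERSION_SUFFIX_RE = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name):
    """Return the file name without its version suffix, or unchanged if it has none."""
    return _VERSION_SUFFIX_RE.sub("", file_name)


if __name__ == "__main__":
    for name in ("UnicodeData-11.0.0d12.txt", "Blocks-11.0.0.txt", "emoji-data.txt"):
        print(name, "->", strip_version_suffix(name))
```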
818 * process and/or copy files
819 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
820 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
821 + For debugging, and tweaking how ppucd.txt is written,
822 the tool has an --only_ppucd option:
823 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
825 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
827 * build ICU (make install)
828 so that the tools build can pick up the new definitions from the installed header files.
830 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
832 * preparseucd.py changes
834 NameError: unknown property Extended_Pictographic
835 -> add Extended_Pictographic binary property
836 -> add new short names for all Emoji properties
838 * new constants for new property values
839 - preparseucd.py error:
840 ValueError: missing uchar.h enum constants for some property values:
841 [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
842 u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
843 u'Indic_Siyaq_Numbers'])),
844 (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
845 (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
846 (u'GCB', set([u'LinkC', u'Virama'])),
847 (u'WB', set([u'WSegSpace']))]
848 = PropertyValueAliases.txt new property values (diff old & new .txt files)
849 blk; Chess_Symbols ; Chess_Symbols
851 blk; Georgian_Ext ; Georgian_Extended
852 blk; Gunjala_Gondi ; Gunjala_Gondi
853 blk; Hanifi_Rohingya ; Hanifi_Rohingya
854 blk; Indic_Siyaq_Numbers ; Indic_Siyaq_Numbers
855 blk; Makasar ; Makasar
856 blk; Mayan_Numerals ; Mayan_Numerals
857 blk; Medefaidrin ; Medefaidrin
858 blk; Old_Sogdian ; Old_Sogdian
859 blk; Sogdian ; Sogdian
861 use the long property value names for the enum constants,
862 for the trailing comment get the block start code point: diff old & new Blocks.txt
863 -> add to UCharacter.UnicodeBlock IDs
864 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
865 replace public static final int \1_ID = \2; \3
866 -> add to UCharacter.UnicodeBlock objects
867 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
868 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
870 GCB; LinkC ; LinkingConsonant
872 -> uchar.h & UCharacter.GraphemeClusterBreak
873 -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
875 InSC; Consonant_Initial_Postfixed ; Consonant_Initial_Postfixed
876 -> ignore: ICU does not yet support this property
878 jg ; Hanifi_Rohingya_Kinna_Ya ; Hanifi_Rohingya_Kinna_Ya
879 jg ; Hanifi_Rohingya_Pa ; Hanifi_Rohingya_Pa
880 -> uchar.h & UCharacter.JoiningGroup
883 sc ; Gong ; Gunjala_Gondi
885 sc ; Medf ; Medefaidrin
886 sc ; Rohg ; Hanifi_Rohingya
888 sc ; Sogo ; Old_Sogdian
889 -> uscript.h & com.ibm.icu.lang.UScript
890 -> Nushu had been added already
891 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
892 and in com.ibm.icu.dev.test.lang.TestUScript.java
894 WB ; WSegSpace ; WSegSpace
895 -> uchar.h & UCharacter.WordBreak
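
The two Eclipse find/replace pairs listed in this step for UCharacter.UnicodeBlock can also be tried outside Eclipse. The following Python sketch applies the same patterns with re.sub; the sample input line and its numeric value are illustrative, so take real lines from uchar.h.

```python
import re

# Same patterns and replacements as the Eclipse find/replace pairs.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJECT_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJECT_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(c_line):
    """Turn a uchar.h UBLOCK_ enum line into a Java block ID constant."""
    return ID_FIND.sub(ID_REPLACE, c_line)


def to_java_object(c_line):
    """Turn a uchar.h UBLOCK_ enum line into a Java UnicodeBlock object."""
    return OBJECT_FIND.sub(OBJECT_REPLACE, c_line)


# Illustrative input line in the style of uchar.h.
sample = "UBLOCK_CHESS_SYMBOLS = 281, /*[1FA00]*/"

if __name__ == "__main__":
    print(to_java_id(sample))
    print(to_java_object(sample))
```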
897 * New short names for emoji properties
899 - short names set in preparseucd.py
902 - boolean emoji property Extended_Pictographic
903 -> added in preparseucd.py
904 -> uchar.h & UProperty.java
905 - misc. property Equivalent_Unified_Ideograph (EqUIdeo)
906 as shown in PropertyValueAliases.txt
909 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
910 (not strictly necessary for NOT_ENCODED scripts)
911 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
913 * update spoof checker UnicodeSet initializers:
914 inclusionPat & recommendedPat in uspoof.cpp
915 INCLUSION & RECOMMENDED in SpoofChecker.java
916 - make sure that the Unicode Tools tree contains the latest security data files
917 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
918 - update the hardcoded version number there in the DIRECTORY path
919 - run the tool (no special environment variables needed)
920 - copy & paste from the Console output into the .cpp & .java files
922 * generate normalization data files
923 cd $ICU_ROOT/dbg/icu4c
924 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
925 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
926 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
927 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
928 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
930 * build ICU (make install)
931 so that the tools build can pick up the new definitions from the installed header files.
933 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
935 * build Unicode tools using CMake+make
937 $ICU_SRC/tools/unicode/c/icudefs.txt:
939 # Location (--prefix) of where ICU was installed.
940 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
941 # Location of the ICU4C source tree.
942 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
945 mkdir -p tools/unicode/c
948 $ICU_ROOT/dbg/tools/unicode/c$
949 cmake ../../../../src/tools/unicode/c
952 * generate core properties data files
953 $ICU_ROOT/dbg/tools/unicode/c$
954 genprops/genprops $ICU_SRC/icu4c
955 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
956 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
957 - rebuild ICU (make install) & tools
960 genprops error: casepropsbuilder: too many exceptions words
961 genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
962 - With the addition of Georgian Mtavruli capital letters,
963 there are now too many simple case mappings with big mapping deltas
964 that yield uncompressible exceptions.
965 - Changed the data structure (now formatVersion 4):
966 added one bit for no-simple-case-folding (for Cherokee), and
967 one optional slot for a big delta (for most faraway mappings),
968 together with another bit for whether that delta is negative.
969 This makes most Cherokee & Georgian etc. case mappings compressible,
970 reducing the number of exceptions words.
971 - Further changes to gain one more bit for the exceptions index,
972 for future growth. For details see casepropsbuilder.cpp.
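
A small Python sketch of what "big mapping deltas" means here. The code point pairs are real case pairs; the 9-bit width of the inline delta field is an assumption for illustration, and casepropsbuilder.cpp remains the authority on the actual layout.

```python
# Assumption for illustration only: width of the signed inline delta field.
ASSUMED_DELTA_BITS = 9

# (uppercase, lowercase) code point pairs.
PAIRS = {
    "Latin A": (0x0041, 0x0061),
    "Georgian Mtavruli An": (0x1C90, 0x10D0),
    "Cherokee A": (0x13A0, 0xAB70),
}


def fits_signed(delta, bits):
    """True if delta fits a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= delta < (1 << (bits - 1))


def lowercase_deltas():
    """Map each name to (lowercase code point - uppercase code point)."""
    return {name: lower - upper for name, (upper, lower) in PAIRS.items()}


if __name__ == "__main__":
    for name, delta in lowercase_deltas().items():
        kind = "inline delta" if fits_signed(delta, ASSUMED_DELTA_BITS) else "needs big delta"
        print("{}: {:+d} ({})".format(name, delta, kind))
```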
974 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
975 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
976 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
977 - Unicode 6.0..11.0: U+2260, U+226E, U+226F
978 - nothing new in this Unicode version, no test file to update
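
As an alternative to grep, a short Python sketch can list the non-ASCII entries directly. It assumes the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status); the sample lines are illustrative and not copied from a specific file version.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges with status disallowed_STD3_valid above U+007F."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            result.append((start, end))
    return result


# Illustrative lines in the file's format.
SAMPLE_LINES = [
    "0000..002C    ; disallowed_STD3_valid   # 1.1  <control>..COMMA",
    "002D..002E    ; valid                   # 1.1  HYPHEN-MINUS..FULL STOP",
    "2260          ; disallowed_STD3_valid   # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]

if __name__ == "__main__":
    for start, end in non_ascii_std3_valid(SAMPLE_LINES):
        print("U+{:04X}..U+{:04X}".format(start, end))
```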
980 * run & fix ICU4C tests
981 - Andy handles RBBI & spoof check test failures
983 - Errors in char.txt, word.txt, word_POSIX.txt like
984 createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET" at line 46, column 16
985 because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
986 -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
987 not empty, just to get ICU building.
988 -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
989 and properties together with the rules that used them (GB 10, WB 14).
990 -> Andy adjusts the rule sets further to sync with
991 Unicode 11 grapheme, word, and line break spec changes.
993 * collation: CLDR collation root, UCA DUCET
995 - UCA DUCET goes into Mark's Unicode tools, see
996 https://sites.google.com/site/unicodetools/home#TOC-UCA
997 diff the main mapping file, look for bad changes
998 (for example, more bytes per weight for common characters)
999 ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
1000 ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
1002 - CLDR root data files are checked into $CLDR_SRC/common/uca/
1003 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1005 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1006 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1007 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1008 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1009 (note removing the underscore before "Rules")
1010 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1011 - restore TODO diffs in UCARules.txt
1012 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1013 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1014 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1015 from the CLDR root files (..._CLDR_..._SHORT.txt)
1016 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1017 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1018 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1019 - if CLDR common/uca/unihan-index.txt changes, then update
1020 CLDR common/collation/root.xml <collation type="private-unihan">
1021 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1023 - run genuca, see command line above;
1025 Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1026 FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)
1027 (add the character to genuca.cpp sampleCharsToScripts[])
1028 + look up the USCRIPT_ code for the new sample characters
1029 (should be obvious from the comment in the error output)
1030 + *add* mappings to sampleCharsToScripts[], do not replace them
1031 (in case the script sample characters flip-flop)
1032 + insert new scripts in DUCET script order, see the top_byte table
1033 at the beginning of FractionalUCA.txt
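
For looking up the sample character and script, a hedged Python sketch that pulls both out of the FractionalUCA.txt line quoted in the genuca error. The USCRIPT_ name it derives is only a guess from the comment text and must be checked against uscript.h; the helper names are hypothetical.

```python
import re

# Matches first-primary lines such as
# "FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)".
_FIRST_PRIMARY_RE = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*# (\S+) first primary")


def parse_first_primary(line):
    """Return (sample code point, script name) or None if the line does not match."""
    match = _FIRST_PRIMARY_RE.match(line)
    if not match:
        return None
    return int(match.group(1), 16), match.group(2)


def uscript_guess(script_name):
    """Guess the UScriptCode constant name; verify it in uscript.h."""
    return "USCRIPT_" + script_name.upper()


if __name__ == "__main__":
    line = "FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)"
    code_point, script = parse_first_primary(line)
    print("U+{:04X} {}".format(code_point, uscript_guess(script)))
```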
1037 https://sites.google.com/site/unicodetools/unihan
1039 org.unicode.draft.GenerateUnihanCollators
1042 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1043 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1044 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1045 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1048 org.unicode.draft.GenerateUnihanCollatorFiles
1049 with the same arguments
1052 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1053 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1056 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1057 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1058 - run CLDR unit tests, commit to CLDR
1059 - generate ICU zh collation data: run CLDR
1060 org.unicode.cldr.icu.NewLdml2IcuConverter
1061 with program arguments
1063 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1064 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1065 -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
1066 -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
1070 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1073 * run & fix ICU4C tests, now with new CLDR collation root data
1074 - run all tests with the collation test data *_SHORT.txt or the full files
1075 (the full ones have comments, useful for debugging)
1076 - note on intltest: if collate/UCAConformanceTest fails, then
1077 utility/MultithreadTest/TestCollators will fail as well;
1078 fix the conformance test before looking into the multi-thread test
1080 * update Java data files
1081 - refresh just the UCD/UCA-related/derived files, just to be safe
1082 - see (ICU4C)/source/data/icu4j-readme.txt
1083 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1084 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1087 Unicode .icu files built to ./out/build/icudt61l
1088 echo timestamp > uni-core-data
1089 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1090 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
1091 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1092 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1093 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
1094 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
1095 mkdir -p /tmp/icu4j/main/shared/data
1096 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1097 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
1098 mkdir -p /tmp/icu4j/main/shared/data
1099 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1100 make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
1101 - copy the big-endian Unicode data files to another location,
1102 separate from the other data files,
1103 and then refresh ICU4J
1104 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1105 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1106 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1107 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1108 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1109 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1110 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1111 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1112 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1113 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1115 * When refreshing all of ICU4J data from ICU4C
1116 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1117 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1119 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1121 * update CollationFCD.java
1122 + copy & paste the initializers of lcccIndex[] etc. from
1123 ICU4C/source/i18n/collationfcd.cpp to
1124 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1126 * refresh Java test .txt files
1127 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1128 cd $ICU_SRC/icu4c/source/data/unidata
1129 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1130 cd ../../test/testdata
1131 cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1132 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1134 * run & fix ICU4J tests
1137 - send notice to icu-design about new born-@stable API (enum constants etc.)
1139 *** CLDR numbering systems
1140 - look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
1141 Unicode 11: using Unicode 11 CLDR ticket #10978
1142 rohg 10D30..10D39 Hanifi_Rohingya
1143 gong 11DA0..11DA9 Gunjala_Gondi
1144 Earlier: CLDR tickets specific to adding new numbering systems.
1145 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1146 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
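
A quick way to look for sets of decimal digits is a scan like the following Python sketch. It uses Python's bundled unicodedata module, so it reflects the Unicode version of the Python build, not the files in $UNICODE_DATA; newly encoded digits only show up once Python itself has been updated. Unlike the nv=4 spot check above, it tests all ten values.

```python
import sys
import unicodedata


def decimal_digit_ranges():
    """Return (start, end) ranges of ten consecutive gc=Nd characters with values 0..9."""
    ranges = []
    cp = 0
    while cp + 9 <= sys.maxunicode:
        if all(unicodedata.category(chr(cp + i)) == "Nd"
               and unicodedata.decimal(chr(cp + i), -1) == i
               for i in range(10)):
            ranges.append((cp, cp + 9))
            cp += 10
        else:
            cp += 1
    return ranges


if __name__ == "__main__":
    print("Unicode version of this Python:", unicodedata.unidata_version)
    for start, end in decimal_digit_ranges():
        print("{:04X}..{:04X}".format(start, end))
```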
1148 *** merge the Unicode update branches back onto the trunk
1149 - do not merge the icudata.jar and testdata.jar,
1150 instead rebuild them from merged & tested ICU4C
1151 - make sure that changes to Unicode tools are checked in:
1152 http://www.unicode.org/utility/trac/log/trunk/unicodetools
1154 ---------------------------------------------------------------------------- ***
1156 Unicode 10.0 update for ICU 60
1158 http://www.unicode.org/versions/Unicode10.0.0/
1159 http://www.unicode.org/versions/beta-10.0.0.html
1160 http://blog.unicode.org/2017/03/unicode-100-beta-review.html
1161 http://www.unicode.org/review/pri350/
1162 http://www.unicode.org/reports/uax-proposed-updates.html
1163 http://www.unicode.org/reports/tr44/tr44-19.html
1165 * Command-line environment setup
1167 UNICODE_DATA=~/unidata/uni10/20170605
1168 CLDR_SRC=~/svn.cldr/uni10
1169 ICU_ROOT=~/svn.icu/uni10
1170 ICU_SRC=$ICU_ROOT/src
1172 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1173 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1174 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1178 - ticket:12985: Unicode 10
1179 - ticket:13061: undo hacks from emoji 5.0 update
1180 - ticket:13062: add Emoji_Component property
1181 - ^/branches/markus/uni10
1185 - cldrbug 10055: Unicode 10
1186 - cldrbug 9882: Unicode 10 script metadata
1187 - cldrbug 10219: numbering systems for Unicode 10
1189 *** Unicode version numbers
1192 - com.ibm.icu.util.VersionInfo
1193 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1195 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1196 so that the makefiles see the new version number.
1198 *** data files & enums & parser code
1201 - mkdir -p $UNICODE_DATA
1202 - download Unicode 10.0 files into $UNICODE_DATA
1203 + subfolders: ucd, uca, idna, security
1204 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1205 - download emoji 5.0 files into $UNICODE_DATA/emoji
1207 * for manual diffs: remove version suffixes from the file names
1208 ~$ unidata/desuffixucd.py $UNICODE_DATA
1209 (see https://sites.google.com/site/unicodetools/inputdata)
1211 * process and/or copy files
1212 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1213 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1214 + For debugging, and tweaking how ppucd.txt is written,
1215 the tool has an --only_ppucd option:
1216 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1218 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1220 * build ICU (make install)
1221 so that the tools build can pick up the new definitions from the installed header files.
1223 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1225 * preparseucd.py changes
1226 - remove or add new Unicode scripts from/to the
1227 only-in-ISO-15924 list according to the error messages:
1228 ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
1229 -> adjust _scripts_only_in_iso15924 as indicated
1231 Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
1232 -> add vo=Vertical_Orientation to _ignored_properties
1233 -> later removed again, parsing the file, even though we do not yet store data for runtime use
1235 * new constants for new property values
1236 - preparseucd.py error:
1237 ValueError: missing uchar.h enum constants for some property values:
1238 [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
1239 u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
1240 (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
1241 u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
1242 u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
1243 (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
1244 = PropertyValueAliases.txt new property values (diff old & new .txt files)
1245 blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
1246 blk; Kana_Ext_A ; Kana_Extended_A
1247 blk; Masaram_Gondi ; Masaram_Gondi
1249 blk; Soyombo ; Soyombo
1250 blk; Syriac_Sup ; Syriac_Supplement
1251 blk; Zanabazar_Square ; Zanabazar_Square
1253 use the long property value names for the enum constants,
1254 for the trailing comment get the block start code point: diff old & new Blocks.txt
1255 -> add to UCharacter.UnicodeBlock IDs
1256 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1257 replace public static final int \1_ID = \2; \3
1258 -> add to UCharacter.UnicodeBlock objects
1259 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1260 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1262 jg ; Malayalam_Bha ; Malayalam_Bha
1263 jg ; Malayalam_Ja ; Malayalam_Ja
1264 jg ; Malayalam_Lla ; Malayalam_Lla
1265 jg ; Malayalam_Llla ; Malayalam_Llla
1266 jg ; Malayalam_Nga ; Malayalam_Nga
1267 jg ; Malayalam_Nna ; Malayalam_Nna
1268 jg ; Malayalam_Nnna ; Malayalam_Nnna
1269 jg ; Malayalam_Nya ; Malayalam_Nya
1270 jg ; Malayalam_Ra ; Malayalam_Ra
1271 jg ; Malayalam_Ssa ; Malayalam_Ssa
1272 jg ; Malayalam_Tta ; Malayalam_Tta
1273 -> uchar.h & UCharacter.JoiningGroup
1275 sc ; Gonm ; Masaram_Gondi
1278 sc ; Zanb ; Zanabazar_Square
1279 -> uscript.h & com.ibm.icu.lang.UScript
1280 -> Nushu had been added already
1281 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1282 and in com.ibm.icu.dev.test.lang.TestUScript.java
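
The "diff old & new .txt files" step for PropertyValueAliases.txt can be approximated with a set difference, as in this Python sketch. It is an illustration, not part of preparseucd.py: it keeps only the first three fields of each line and does not treat the extra numeric field of ccc lines specially.

```python
def parse_value_aliases(lines):
    """Return a set of (property, short value name, long value name) tuples."""
    aliases = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) >= 3:
            aliases.add((fields[0], fields[1], fields[2]))
    return aliases


def new_values(old_lines, new_lines):
    """Return the sorted entries that are in the new file but not in the old one."""
    return sorted(parse_value_aliases(new_lines) - parse_value_aliases(old_lines))


if __name__ == "__main__":
    old = ["sc ; Nshu ; Nushu"]
    new = ["sc ; Nshu ; Nushu",
           "sc ; Gonm ; Masaram_Gondi",
           "blk; Soyombo ; Soyombo"]
    for prop, short_name, long_name in new_values(old, new):
        print(prop, short_name, long_name)
```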
1284 * New properties as shown in PropertyValueAliases.txt changes
1285 - boolean Emoji_Component from emoji 5
1286 -> uchar.h & UProperty.java
1288 # Regional_Indicator (RI)
1290 RI ; N ; No ; F ; False
1291 RI ; Y ; Yes ; T ; True
1292 -> uchar.h & UProperty.java
1293 -> single immutable range, to be hardcoded
1295 # Prepended_Concatenation_Mark (PCM)
1297 PCM; N ; No ; F ; False
1298 PCM; Y ; Yes ; T ; True
1299 -> was new in Unicode 9
1300 -> uchar.h & UProperty.java
1302 # Vertical_Orientation (vo)
1305 vo ; Tr ; Transformed_Rotated
1306 vo ; Tu ; Transformed_Upright
1308 -> only pre-parsed for now, but not yet stored for runtime use
1310 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1311 (not strictly necessary for NOT_ENCODED scripts)
1312 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1314 * generate normalization data files
1315 cd $ICU_ROOT/dbg/icu4c
1316 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1317 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
1318 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1319 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1320 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1322 * build ICU (make install)
1323 so that the tools build can pick up the new definitions from the installed header files.
1325 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1327 * build Unicode tools using CMake+make
1329 $ICU_SRC/tools/unicode/c/icudefs.txt:
1331 # Location (--prefix) of where ICU was installed.
1332 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1333 # Location of the ICU4C source tree.
1334 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
1336 $ICU_ROOT/dbg/tools/unicode/c$
1337 cmake ../../../../src/tools/unicode/c
1340 * generate core properties data files
1341 $ICU_ROOT/dbg/tools/unicode/c$
1342 genprops/genprops $ICU_SRC/icu4c
1343 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1344 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1345 - rebuild ICU (make install) & tools
1347 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1348 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1349 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1350 - Unicode 6.0..10.0: U+2260, U+226E, U+226F
1351 - nothing new in this Unicode version, no test file to update
1353 * run & fix ICU4C tests
1354 - Andy handles RBBI & spoof check test failures
1356 * collation: CLDR collation root, UCA DUCET
1358 - UCA DUCET goes into Mark's Unicode tools, see
1359 https://sites.google.com/site/unicodetools/home#TOC-UCA
1360 - CLDR root data files are checked into $CLDR_SRC/common/uca/
1361 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1363 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1364 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1365 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1366 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1367 (note removing the underscore before "Rules")
1368 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1369 - restore TODO diffs in UCARules.txt
1370 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1371 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1372 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1373 from the CLDR root files (..._CLDR_..._SHORT.txt)
1374 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1375 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1376 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1377 - if CLDR common/uca/unihan-index.txt changes, then update
1378 CLDR common/collation/root.xml <collation type="private-unihan">
1379 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1381 - run genuca, see command line above;
1383 Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
1384 FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)
1385 (add the character to genuca.cpp sampleCharsToScripts[])
1386 + look up the USCRIPT_ code for the new sample characters
1387 (should be obvious from the comment in the error output)
1388 + *add* mappings to sampleCharsToScripts[], do not replace them
1389 (in case the script sample characters flip-flop)
1390 + insert new scripts in DUCET script order, see the top_byte table
1391 at the beginning of FractionalUCA.txt
1395 https://sites.google.com/site/unicodetools/unihan
1397 org.unicode.draft.GenerateUnihanCollators
1400 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1401 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1402 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1403 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1406 org.unicode.draft.GenerateUnihanCollatorFiles
1407 with the same arguments
1410 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1411 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1414 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1415 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1416 - run CLDR unit tests, commit to CLDR
1417 - generate ICU zh collation data: run CLDR
1418 org.unicode.cldr.icu.NewLdml2IcuConverter
1419 with program arguments
1421 -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
1422 -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
1423 -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
1424 -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
1428 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1431 * run & fix ICU4C tests, now with new CLDR collation root data
1432 - run all tests with the collation test data *_SHORT.txt or the full files
1433 (the full ones have comments, useful for debugging)
1434 - note on intltest: if collate/UCAConformanceTest fails, then
1435 utility/MultithreadTest/TestCollators will fail as well;
1436 fix the conformance test before looking into the multi-thread test
1438 * update Java data files
1439 - refresh just the UCD/UCA-related/derived files, just to be safe
1440 - see (ICU4C)/source/data/icu4j-readme.txt
1441 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1442 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1445 Unicode .icu files built to ./out/build/icudt60l
1446 echo timestamp > uni-core-data
1447 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1448 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
1449 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1450 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1451 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
1452 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
1453 mkdir -p /tmp/icu4j/main/shared/data
1454 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1455 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
1456 mkdir -p /tmp/icu4j/main/shared/data
1457 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1458 make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
1459 - copy the big-endian Unicode data files to another location,
1460 separate from the other data files,
1461 and then refresh ICU4J
1462 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1463 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1464 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1465 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1466 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1467 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1468 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1469 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1470 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1471 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
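The file selection made by the cp/rm commands above can be read as a simple filter. This is only a sketch of that reading; the function name is hypothetical and the paths are relative to the com/ibm/icu/impl/data/$ICUDT folder.

```python
from fnmatch import fnmatch


def is_unicode_data_file(rel_path):
    """Hypothetical filter mirroring the cp/rm steps above (not an ICU tool)."""
    if "/" in rel_path:
        # cp coll/* and cp brkitr/* copy only direct children of those folders.
        folder, name = rel_path.split("/", 1)
        return folder in ("coll", "brkitr") and "/" not in name
    if rel_path == "cnvalias.icu":
        # Copied by *.icu and then removed again.
        return False
    return (rel_path == "confusables.cfu"
            or fnmatch(rel_path, "*.icu")
            or fnmatch(rel_path, "*.nrm"))


for path in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/root.res", "zoneinfo64.res"]:
    print(path, is_unicode_data_file(path))
```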
1473 * When refreshing all of ICU4J data from ICU4C
1474 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1475 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1477 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1479 * update CollationFCD.java
1480 + copy & paste the initializers of lcccIndex[] etc. from
1481 ICU4C/source/i18n/collationfcd.cpp to
1482 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1484 * refresh Java test .txt files
1485 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1486 cd $ICU_SRC/icu4c/source/data/unidata
1487 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1488 cd ../../test/testdata
1489 cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1490 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1492 * run & fix ICU4J tests
1495 - send notice to icu-design about new born-@stable API (enum constants etc.)
1497 *** CLDR numbering systems
1498 - look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
1499 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1500 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
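One possible way to list the candidates, as a sketch: scan for characters with gc=Nd and decimal value 4 using Python's unicodedata module. Caveat: this sees only the Unicode version bundled with the Python in use, which may well be older than the beta being integrated, so it is a cross-check rather than a replacement for looking at the new UCD files. The function name is made up for this note.

```python
import sys
import unicodedata


def digit_fours():
    """Return all code points with gc=Nd and decimal value 4."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            result.append(cp)
    return result


fours = digit_fours()
# Each hit stands for one set of decimal digits; compare with the CLDR numbering systems.
for cp in fours[:5]:
    print("U+%04X" % cp, unicodedata.name(chr(cp)))
print(len(fours), "digit sets in Unicode", unicodedata.unidata_version)
```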
1502 *** merge the Unicode update branches back onto the trunk
1503 - do not merge the icudata.jar and testdata.jar,
1504 instead rebuild them from merged & tested ICU4C
1505 - make sure that changes to Unicode tools are checked in:
1506 http://www.unicode.org/utility/trac/log/trunk/unicodetools
1508 ---------------------------------------------------------------------------- ***
1510 Emoji 5.0 update for ICU 59
1511 - ICU 59 mostly remains on Unicode 9.0
1512 - exception: the bidi and segmentation data are updated to Unicode 10 beta
1514 First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
1516 * Command-line environment setup
1518 ICU_ROOT=~/svn.icu/trunk
1519 ICU_SRC_DIR=$ICU_ROOT/src
1520 ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
1522 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1523 SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
1524 UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
1528 - ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
1529 - changes directly on trunk
1531 *** data files & enums & parser code
1535 - download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
1536 - download emoji 5.0 beta files into the same uni90e50 folder
1537 - download Unicode 10.0 beta files: ucd
1538 + copy Unicode 10 bidi files to the uni90e50/ucd folder:
1540 BidiCharacterTest.txt
1543 extracted/DerivedBidiClass.txt
1544 + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
1548 * preparseucd.py changes
1549 - adjust for combined trunks
1550 - write new copyright lines
1551 - ignore new Emoji_Component property for now
1553 * process and/or copy files
1554 - ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
1555 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1557 - cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
1559 * build ICU (make install)
1560 so that the tools build can pick up the new definitions from the installed header files.
1562 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1564 * build Unicode tools using CMake+make
1566 ~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
1568 # Location (--prefix) of where ICU was installed.
1569 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1570 # Location of the ICU4C source tree.
1571 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
1573 ~/svn.icu/trunk/dbg/tools/unicode/c$
1574 cmake ../../../../src/tools/unicode/c
1577 * generate core properties data files
1578 ~/svn.icu/trunk/dbg/tools/unicode/c$
1579 genprops/genprops $ICU4C_SRC_DIR
1580 - rebuild ICU (make install) & tools
1582 * run & fix ICU4C tests
1583 - Andy handles RBBI & spoof check test failures
1585 * update Java data files
1586 - refresh just the UCD/UCA-related/derived files, just to be safe
1587 - see (ICU4C)/source/data/icu4j-readme.txt
1589 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1592 Unicode .icu files built to ./out/build/icudt59l
1593 echo timestamp > uni-core-data
1594 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1595 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
1596 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1597 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1598 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
1599 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
1600 mkdir -p /tmp/icu4j/main/shared/data
1601 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1602 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
1603 mkdir -p /tmp/icu4j/main/shared/data
1604 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1605 make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
1606 - copy the big-endian Unicode data files to another location,
1607 separate from the other data files,
1608 and then refresh ICU4J
1609 cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
1610 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1611 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1612 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1613 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1614 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1615 jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1617 * When refreshing all of ICU4J data from ICU4C
1618 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1619 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
1621 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
1623 * refresh Java test .txt files
1624 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1625 cd $ICU4C_SRC_DIR/source/data/unidata
1626 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1627 cd ../../test/testdata
1628 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1629 cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1631 * run & fix ICU4J tests
1633 ---------------------------------------------------------------------------- ***
1635 Unicode 9.0 update for ICU 58
1637 * Command-line environment setup
1639 ICU_ROOT=~/svn.icu/trunk
1640 ICU_SRC_DIR=$ICU_ROOT/src
1642 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1643 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1644 UNIDATA=$ICU_SRC_DIR/source/data/unidata
1646 http://www.unicode.org/review/pri323/ -- beta review
1647 http://www.unicode.org/reports/uax-proposed-updates.html
1648 http://www.unicode.org/versions/beta-9.0.0.html
1649 http://www.unicode.org/versions/Unicode9.0.0/
1650 http://www.unicode.org/reports/tr44/tr44-17.html
1654 - ticket:12526: integrate Unicode 9
1655 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
1656 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
1660 - cldrbug 9414: UCA 9
1661 - ^/branches/markus/uni90 at r11518 from trunk at r11517
1663 - cldrbug 8745: Unicode 9.0 script metadata
1665 *** Unicode version numbers
1668 - com.ibm.icu.util.VersionInfo
1669 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1671 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1672 so that the makefiles see the new version number.
1674 *** data files & enums & parser code
1678 - download UCD & IDNA files
1679 - make sure that the Unicode data folder passed into preparseucd.py
1680 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1681 - only for manual diffs: remove version suffixes from the file names
1682 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
1683 (see https://sites.google.com/site/unicodetools/inputdata)
1684 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1685 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
1686 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1688 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
1689 and copy to $UNIDATA
1690 cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
1692 * preparseucd.py changes
1693 - remove or add new Unicode scripts from/to the
1694 only-in-ISO-15924 list according to the error messages:
1695 ValueError: remove ['Tang'] from _scripts_only_in_iso15924
1696 ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
1697 ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
1698 ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
1699 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1700 and in com.ibm.icu.dev.test.lang.TestUScript.java
1701 - DerivedNumericValues.txt new numeric values
1702 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
1703 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
1704 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
1705 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
1706 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
1707 -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
1708 uchar.c, UCharacterProperty.java
1709 to support a new series of values
1710 - adjust preparseucd.py for Tangut algorithmic names
1712 algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
1714 algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
1715 - avoid block-compressing most String/Miscellaneous property values;
1716 this was triggered by genprops not coping with a multi-code point Case_Folding on
1717 block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
1718 but keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
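A quick sanity check for the new Malayalam fraction values listed above, as a sketch: parse the DerivedNumericValues.txt lines and confirm that the decimal field and the rational field agree. The helper name is made up for this note, and the check says nothing about how uprops.h encodes the values; it only looks at the input data.

```python
from fractions import Fraction

# Lines copied from the DerivedNumericValues.txt excerpt in this log.
LINES = [
    "0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH",
    "0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH",
    "0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS",
    "0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH",
    "0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS",
]


def parse_numeric_line(line):
    """Hypothetical helper: return (code point, decimal field, rational field)."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    return int(fields[0], 16), Fraction(fields[1]), Fraction(fields[3])


for line in LINES:
    cp, decimal_value, rational_value = parse_numeric_line(line)
    assert decimal_value == rational_value, line
    print("U+%04X" % cp, rational_value, "denominator", rational_value.denominator)
```

The denominators (160, 40, 80, 20) are what make these values not fit the older encodings.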
1720 * PropertyAliases.txt changes
1721 - 1 new property PCM=Prepended_Concatenation_Mark
1722 Ignore: Only useful for layout engines.
1723 Ok to list in ppucd.txt.
1725 * PropertyValueAliases.txt new property values
1727 blk; Bhaiksuki ; Bhaiksuki
1728 blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
1729 blk; Glagolitic_Sup ; Glagolitic_Supplement
1730 blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
1731 blk; Marchen ; Marchen
1732 blk; Mongolian_Sup ; Mongolian_Supplement
1735 blk; Tangut ; Tangut
1736 blk; Tangut_Components ; Tangut_Components
1738 use long property names for enum constants
1739 -> add to UCharacter.UnicodeBlock IDs
1740 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1741 replace public static final int \1_ID = \2; \3
1742 -> add to UCharacter.UnicodeBlock objects
1743 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1744 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
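The two Eclipse find/replace patterns above can also be tried outside the IDE. A sketch with Python's re module follows; the input line is an assumed sample in the style of a uchar.h UBlockCode entry (check the real header for the actual value and comment), and the function names are made up for this note.

```python
import re

# Assumed sample input, in the style of a uchar.h block constant line.
SAMPLE = "UBLOCK_TANGUT = 272, /*[17000]*/"


def to_block_id(line):
    """Same pattern as the first Eclipse find/replace above."""
    return re.sub(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                  r"public static final int \1_ID = \2; \3", line)


def to_block_object(line):
    """Same pattern as the second Eclipse find/replace above."""
    return re.sub(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                  r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
                  line)


print(to_block_id(SAMPLE))
print(to_block_object(SAMPLE))
```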
1747 GCB; EBG ; E_Base_GAZ
1748 GCB; EM ; E_Modifier
1749 GCB; GAZ ; Glue_After_Zwj
1751 -> uchar.h & UCharacter.GraphemeClusterBreak
1753 jg ; African_Feh ; African_Feh
1754 jg ; African_Noon ; African_Noon
1755 jg ; African_Qaf ; African_Qaf
1756 -> uchar.h & UCharacter.JoiningGroup
1759 lb ; EM ; E_Modifier
1761 -> uchar.h & UCharacter.LineBreak
1764 sc ; Bhks ; Bhaiksuki
1769 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
1772 WB ; EBG ; E_Base_GAZ
1773 WB ; EM ; E_Modifier
1774 WB ; GAZ ; Glue_After_Zwj
1776 -> uchar.h & UCharacter.WordBreak
1778 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1779 (not strictly necessary for NOT_ENCODED scripts)
1780 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
1782 * generate normalization data files
1784 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1785 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1786 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1787 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1788 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1790 * build ICU (make install)
1791 so that the tools build can pick up the new definitions from the installed header files.
1793 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
1795 * build Unicode tools using CMake+make
1797 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1799 # Location (--prefix) of where ICU was installed.
1800 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
1801 # Location of the ICU source tree.
1802 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
1804 ~/svn.icutools/trunk/dbg/unicode/c$
1805 cmake ../../../src/unicode/c
1808 * generate core properties data files
1809 ~/svn.icutools/trunk/dbg/unicode/c$
1810 genprops/genprops $ICU_SRC_DIR
1811 genuca/genuca --hanOrder implicit $ICU_SRC_DIR
1812 genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
1813 - rebuild ICU (make install) & tools
1815 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1816 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1817 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1818 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
1819 - nothing new in 9.0, no test file to update
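The grep step above, sketched as a small filter: keep only lines whose status is disallowed_STD3_valid and whose code point (or range start) is outside ASCII. The sample lines are made up in the style of IdnaMappingTable.txt, not copied from a real file, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return the code point fields of non-ASCII disallowed_STD3_valid entries."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start = int(fields[0].split("..")[0], 16)
        if start > 0x7F:
            hits.append(fields[0])
    return hits


# Made-up sample lines in the style of the data file.
SAMPLE = [
    "# comment line",
    "003D          ; disallowed_STD3_valid                  # EQUALS SIGN",
    "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
print(non_ascii_std3_valid(SAMPLE))
```

Any hit beyond U+2260, U+226E and U+226F would be a new character to add to the tests.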
1821 * run & fix ICU4C tests
1822 - Andy handles RBBI & spoof check test failures
1824 * collation: CLDR collation root, UCA DUCET
1826 - UCA DUCET goes into Mark's Unicode tools, see
1827 https://sites.google.com/site/unicodetools/home#TOC-UCA
1828 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
1829 cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
1831 - cd (CLDR UCA branch)/common/uca/
1832 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1833 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1834 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1835 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
1836 (note: the copy below drops the underscore before "Rules" in the file name)
1837 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1838 - restore TODO diffs in UCARules.txt
1839 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1840 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1841 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1842 from the CLDR root files (..._CLDR_..._SHORT.txt)
1843 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1844 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1845 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1846 - if CLDR common/uca/unihan-index.txt changes, then update
1847 CLDR common/collation/root.xml <collation type="private-unihan">
1848 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
1850 - run genuca, see command line above;
1852 Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
1853 FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
1854 (add the character to genuca.cpp sampleCharsToScripts[])
1855 + look up the USCRIPT_ code for the new sample characters
1856 (should be obvious from the comment in the error output)
1857 + *add* mappings to sampleCharsToScripts[], do not replace them
1858 (in case the script sample characters flip-flop)
1859 + insert new scripts in DUCET script order, see the top_byte table
1860 at the beginning of FractionalUCA.txt
1865 org.unicode.draft.GenerateUnihanCollators
1867 -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
1868 -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
1869 -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
1870 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
1874 org.unicode.draft.GenerateUnihanCollatorFiles
1875 with the same arguments
1878 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1879 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1882 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1883 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1885 - generate ICU zh collation data: run CLDR
1886 org.unicode.cldr.icu.NewLdml2IcuConverter
1887 with program arguments
1889 -s /home/mscherer/svn.cldr/trunk/common/collation
1890 -m /home/mscherer/svn.cldr/trunk/common/supplemental
1891 -d /home/mscherer/svn.icu/trunk/src/source/data/coll
1892 -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
1895 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
1898 * run & fix ICU4C tests, now with new CLDR collation root data
1899 - run all tests with the collation test data *_SHORT.txt or the full files
1900 (the full ones have comments, useful for debugging)
1901 - note on intltest: if collate/UCAConformanceTest fails, then
1902 utility/MultithreadTest/TestCollators will fail as well;
1903 fix the conformance test before looking into the multi-thread test
1905 * update Java data files
1906 - refresh just the UCD/UCA-related/derived files, just to be safe
1907 - see (ICU4C)/source/data/icu4j-readme.txt
1909 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1912 Unicode .icu files built to ./out/build/icudt58l
1913 echo timestamp > uni-core-data
1914 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
1915 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
1916 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1917 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
1918 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
1919 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
1920 mkdir -p /tmp/icu4j/main/shared/data
1921 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1922 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
1923 mkdir -p /tmp/icu4j/main/shared/data
1924 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1925 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
1926 - copy the big-endian Unicode data files to another location,
1927 separate from the other data files,
1928 and then refresh ICU4J
1929 cd ~/svn.icu/trunk/dbg/data/out/icu4j
1930 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1931 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1932 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1933 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1934 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1935 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1936 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1937 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1938 jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1940 * When refreshing all of ICU4J data from ICU4C
1941 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1942 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1944 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1946 * update CollationFCD.java
1947 + copy & paste the initializers of lcccIndex[] etc. from
1948 ICU4C/source/i18n/collationfcd.cpp to
1949 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1951 * refresh Java test .txt files
1952 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1953 cd $ICU_SRC_DIR/source/data/unidata
1954 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1955 cd ../../test/testdata
1956 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1957 cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1959 * run & fix ICU4J tests
1961 *** LayoutEngine script information
1963 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1964 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1965 in the working directory.
1967 (It also generates ScriptRunData.cpp, which is no longer needed.)
1969 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
1971 which maps ICU versions to the numbers of script/language constants
1972 that were added then.
1973 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
1975 The generated files have a current copyright date and "@deprecated" statement.
1977 * Review changes, fix Java tool if necessary, and copy to ICU4C
1978 cd ~/svn.icu4j/trunk/src
1979 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1980 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
1981 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
1984 - send notice to icu-design about new born-@stable API (enum constants etc.)
1986 *** merge the Unicode update branches back onto the trunk
1987 - do not merge the icudata.jar and testdata.jar,
1988 instead rebuild them from merged & tested ICU4C
1989 - make sure that changes to Unicode tools & ICU tools are checked in:
1990 http://www.unicode.org/utility/trac/log/trunk/unicodetools
1991 http://bugs.icu-project.org/trac/log/tools/trunk
1993 ---------------------------------------------------------------------------- ***
1995 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
1998 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
1999 - new combination/alias codes: Hanb, Jamo
2000 - used in CLDR 29 and in spoof checker
2003 Add new codes to uscript.h & UScript.java, see Unicode update logs.
2004 -> com.ibm.icu.lang.UScript
2005 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2006 replace public static final int \1 = \2; \3
2008 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
2009 add new script codes.
2010 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
2012 Note: If we have to run preparseucd.py again before the Unicode 9 update,
2013 then we need to manually keep/restore the new script codes.
2015 ICU_ROOT=~/svn.icu/trunk
2016 ICU_SRC_DIR=$ICU_ROOT/src
2018 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2019 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2020 UNIDATA=$ICU_SRC_DIR/source/data/unidata
2022 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
2023 see http://bugs.icu-project.org/trac/ticket/12141
2025 make install, then icutools cmake & make, then
2026 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2028 Generate Java data as usual, only update pnames.icu & uprops.icu.
2030 *** LayoutEngine script information
2032 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2033 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2034 in the working directory.
2036 (It also generates ScriptRunData.cpp, which is no longer needed.)
2038 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2040 which maps ICU versions to the numbers of script/language constants
2041 that were added then.
2042 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2044 The generated files have a current copyright date and "@deprecated" statement.
2046 * Review changes, fix Java tool if necessary, and copy to ICU4C
2047 cd ~/svn.icu4j/trunk/src
2048 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2049 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2050 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2052 ---------------------------------------------------------------------------- ***
2054 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
2056 Edit preparseucd.py to add & parse new properties.
2057 They share the UCD property namespace but are not listed in PropertyAliases.txt.
2059 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
2060 Initial data from emoji/2.0/
2062 ICU_ROOT=~/svn.icu/trunk
2063 ICU_SRC_DIR=$ICU_ROOT/src
2065 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2066 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2067 UNIDATA=$ICU_SRC_DIR/source/data/unidata
2069 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
2071 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2072 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
2074 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
2076 make install, then icutools cmake & make, then
2077 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2079 Generate Java data as usual, only update pnames.icu & uprops.icu.
2081 ---------------------------------------------------------------------------- ***
2083 Unicode 8.0 update for ICU 56
2085 * Command-line environment setup
2087 ICU_ROOT=~/svn.icu/trunk
2088 ICU_SRC_DIR=$ICU_ROOT/src
2090 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2091 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2092 UNIDATA=$ICU_SRC_DIR/source/data/unidata
2094 http://www.unicode.org/review/pri297/ -- beta review
2095 http://www.unicode.org/reports/uax-proposed-updates.html
2096 http://unicode.org/versions/beta-8.0.0.html
2097 http://www.unicode.org/versions/Unicode8.0.0/
2098 http://www.unicode.org/reports/tr44/tr44-15.html
2102 - ticket:11574: Unicode 8
2103 - C++ branches/markus/uni80 at r37351 from trunk at r37343
2104 - Java branches/markus/uni80 at r37352 from trunk at r37338
2108 - cldrbug 8311: UCA 8
2109 - branches/markus/uni80 at r11518 from trunk at r11517
2111 - cldrbug 8109: Unicode 8.0 script metadata
2112 - cldrbug 8418: Updated segmentation for Unicode 8.0
2114 *** Unicode version numbers
2117 - com.ibm.icu.util.VersionInfo
2118 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2120 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2121 so that the makefiles see the new version number.
2123 *** data files & enums & parser code
2127 - download UCD & IDNA files
2128 - make sure that the Unicode data folder passed into preparseucd.py
2129 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2130 - only for manual diffs: remove version suffixes from the file names
2131 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2132 (see https://sites.google.com/site/unicodetools/inputdata)
2133 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2134 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2135 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2137 - also: from http://unicode.org/Public/security/8.0.0/ download new
2138 confusables.txt & confusablesWholeScript.txt
2139 and copy to $UNIDATA
2140 ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
2141 ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
2143 * initial preparseucd.py changes
2144 - remove new Unicode scripts from the
2145 only-in-ISO-15924 list according to the error message:
2146 ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
2147 from _scripts_only_in_iso15924
2148 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2149 and in com.ibm.icu.dev.test.lang.TestUScript.java
2150 - property and file name change:
2151 IndicMatraCategory -> IndicPositionalCategory
2152 - UnicodeData.txt unusual numeric values (unreduced fractions)
2153 109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
2154 109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
2155 109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
2156 109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
2157 109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
2158 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
2159 109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
2160 109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
2161 109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
2162 109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
2163 -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6)
2164 which are listed in DerivedNumericValues.txt;
2165 keeps storage in data file simple
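A minimal sketch of that mapping, assuming the numeric value is handled as a plain string field; reduce_numeric_value is a hypothetical name and not the actual preparseucd.py code:

```python
from fractions import Fraction


def reduce_numeric_value(nv: str) -> str:
    """Return a UnicodeData.txt numeric value with any fraction in lowest terms."""
    if "/" not in nv:
        return nv  # integers and empty fields pass through unchanged
    value = Fraction(nv)  # Fraction("2/12") is normalized to 1/6
    if value.denominator == 1:
        return str(value.numerator)
    return f"{value.numerator}/{value.denominator}"
```

Values that are already in lowest terms, such as 1/12, come back unchanged, so the function can be applied to every numeric value field.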
2167 * PropertyValueAliases.txt changes
2168 - 10 new Block (blk) values:
2169 blk; Ahom ; Ahom
2170 blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
2171 blk; Cherokee_Sup ; Cherokee_Supplement
2172 blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
2173 blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
2174 blk; Hatran ; Hatran
2175 blk; Multani ; Multani
2176 blk; Old_Hungarian ; Old_Hungarian
2177 blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
2178 blk; Sutton_SignWriting ; Sutton_SignWriting
2180 use long property names for enum constants
2181 -> add to UCharacter.UnicodeBlock IDs
2182 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2183 replace public static final int \1_ID = \2; \3
2184 -> add to UCharacter.UnicodeBlock objects
2185 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2186 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2187 - 6 new Script (sc) values:
2188 sc ; Ahom ; Ahom
2189 sc ; Hatr ; Hatran
2190 sc ; Hluw ; Anatolian_Hieroglyphs
2191 sc ; Hung ; Old_Hungarian
2192 sc ; Mult ; Multani
2193 sc ; Sgnw ; SignWriting
2194 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
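The Eclipse find/replace patterns for UCharacter.UnicodeBlock above could also be applied with a small script. This is only a sketch: it uses one regular expression with three groups for both replacements, and the sample enum line in the usage note is illustrative, not copied from uchar.h.

```python
import re

# Same pattern as the first Eclipse "find" above: name, number, trailing comment.
BLOCK_ENUM = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_block_id(c_line: str) -> str:
    """Turn a uchar.h UBLOCK_ enum line into a UCharacter.UnicodeBlock ID constant."""
    return BLOCK_ENUM.sub(r"public static final int \1_ID = \2; \3", c_line)


def to_java_block_object(c_line: str) -> str:
    """Turn the same enum line into the matching UnicodeBlock object declaration."""
    return BLOCK_ENUM.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        c_line,
    )
```

For a line shaped like `UBLOCK_AHOM = 253, /*[11700]*/` the first function yields the `AHOM_ID` constant and the second the `AHOM` object; lines that do not match are returned unchanged.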
2196 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2197 (not strictly necessary for NOT_ENCODED scripts)
2198 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2200 * generate normalization data files
2202 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2203 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2204 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2205 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2206 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
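The five gennorm2 invocations differ only in output file, input list and the --csource flag. A sketch that builds the same argument lists from a table (gennorm2_commands is a hypothetical helper; it only constructs the commands and does not run them):

```python
def gennorm2_commands(icu_src_dir: str) -> list[list[str]]:
    """Build the gennorm2 command lines from the log, in the same order."""
    unidata = f"{icu_src_dir}/source/data/unidata"  # $UNIDATA
    data_in = f"{icu_src_dir}/source/data/in"       # $SRC_DATA_IN
    # (output file, input files relative to $UNIDATA/norm2, extra options)
    jobs = [
        (f"{icu_src_dir}/source/common/norm2_nfc_data.h", ["nfc.txt"], ["--csource"]),
        (f"{data_in}/nfc.nrm", ["nfc.txt"], []),
        (f"{data_in}/nfkc.nrm", ["nfc.txt", "nfkc.txt"], []),
        (f"{data_in}/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
        (f"{data_in}/uts46.nrm", ["nfc.txt", "uts46.txt"], []),
    ]
    return [
        ["bin/gennorm2", "-o", out, "-s", f"{unidata}/norm2", *inputs, *extra]
        for out, inputs, extra in jobs
    ]
```

The table makes the layering visible: every data file starts from nfc.txt, and nfkc_cf.nrm additionally stacks nfkc.txt and nfkc_cf.txt on top.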
2208 * build ICU (make install)
2209 so that the tools build can pick up the new definitions from the installed header files.
2211 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2213 * build Unicode tools using CMake+make
2215 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2217 # Location (--prefix) of where ICU was installed.
2218 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2219 # Location of the ICU source tree.
2220 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2222 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2223 ~/svn.icutools/trunk/dbg/unicode/c$ make
2225 * generate core properties data files
2226 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2227 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2228 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2229 - rebuild ICU (make install) & tools
2230 - run genuca again (see step above) so that it picks up the new nfc.nrm
2231 - rebuild ICU (make install) & tools
2233 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2234 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2235 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2236 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
2237 - nothing new in 8.0, no test file to update
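Instead of grep, the check for "disallowed_STD3_valid" on non-ASCII characters could be scripted. A sketch, assuming the usual semicolon-separated layout with a code point or range in the first field, the status in the second, and "#" comments; non_ascii_std3_valid is a hypothetical name:

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) code point ranges above U+007F with status disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start_cp = int(start, 16)
        end_cp = int(end, 16) if end else start_cp
        if end_cp > 0x7F:
            # Clip ranges that begin in ASCII so that only the non-ASCII part is reported.
            result.append((max(start_cp, 0x80), end_cp))
    return result
```

Comparing the result against the list of characters already known from earlier versions shows whether the UTS #46 tests need new cases.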
2239 * run & fix ICU4C tests
2240 - bad Cherokee case folding due to difference in fallbacks:
2241 UCD case folding falls back to no mapping,
2242 ICU runtime case folding falls back to lowercasing;
2243 fixed casepropsbuilder.cpp to generate scf mappings to self
2244 when there is an slc mapping but no scf
2245 - Andy handles RBBI & spoof check test failures
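The Cherokee case folding item above comes down to two different fallbacks. A sketch of the problem and the fix, using plain dicts for the simple lowercase (slc) and simple case folding (scf) mappings; both function names are hypothetical and this is not the casepropsbuilder.cpp code:

```python
def runtime_fold(c, scf, slc):
    """Runtime-style lookup as described above: without an scf entry, fall back to slc."""
    if c in scf:
        return scf[c]
    return slc.get(c, c)


def add_explicit_self_foldings(scf, slc):
    """Builder-side fix: store c -> c for every c that has an slc mapping but no scf."""
    fixed = dict(scf)
    for c in slc:
        if c not in fixed:
            fixed[c] = c
    return fixed
```

With slc = {0x13A0: 0xAB70} and scf = {0xAB70: 0x13A0}, the unfixed lookup folds U+13A0 to U+AB70 although the UCD data implies that U+13A0 folds to itself; after add_explicit_self_foldings() both characters fold to U+13A0.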
2247 * collation: CLDR collation root, UCA DUCET
2249 - UCA DUCET goes into Mark's Unicode tools, see
2250 https://sites.google.com/site/unicodetools/home#TOC-UCA
2251 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2252 - cd (CLDR UCA branch)/common/uca/
2253 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2254 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2255 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2256 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2257 (note removing the underscore before "Rules")
2258 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2259 - restore TODO diffs in UCARules.txt
2260 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2261 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2262 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2263 from the CLDR root files (..._CLDR_..._SHORT.txt)
2264 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2265 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2266 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2267 - if CLDR common/uca/unihan-index.txt changes, then update
2268 CLDR common/collation/root.xml <collation type="private-unihan">
2269 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2270 - run genuca, see command line above; deal with errors such as
2272 Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
2273 (add the character to genuca.cpp sampleCharsToScripts[])
2274 + look up the script for the new sample characters
2275 (e.g., in FractionalUCA.txt)
2276 + *add* mappings to sampleCharsToScripts[], do not replace them
2277 (in case the script sample characters flip-flop)
2278 + insert new scripts in DUCET script order, see the top_byte table
2279 at the beginning of FractionalUCA.txt
2282 * run & fix ICU4C tests, now with new CLDR collation root data
2283 - run all tests with the collation test data *_SHORT.txt or the full files
2284 (the full ones have comments, useful for debugging)
2285 - note on intltest: if collate/UCAConformanceTest fails, then
2286 utility/MultithreadTest/TestCollators will fail as well;
2287 fix the conformance test before looking into the multi-thread test
2288 - fixed bug in CollationWeights::getWeightRanges()
2289 exposed by new data and CollationTest::TestRootElements
2291 * update Java data files
2292 - refresh just the UCD/UCA-related/derived files, just to be safe
2293 - see (ICU4C)/source/data/icu4j-readme.txt
2295 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2298 Unicode .icu files built to ./out/build/icudt56l
2299 echo timestamp > uni-core-data
2300 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2301 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
2302 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2303 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2304 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
2305 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
2306 mkdir -p /tmp/icu4j/main/shared/data
2307 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2308 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
2309 mkdir -p /tmp/icu4j/main/shared/data
2310 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2311 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2312 - copy the big-endian Unicode data files to another location,
2313 separate from the other data files,
2314 and then refresh ICU4J
2315 cd ~/svn.icu/trunk/dbg/data/out/icu4j
2316 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2317 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2318 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2319 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2320 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2321 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2322 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2323 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2324 jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
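Which files the cp/rm sequence above ends up copying can be expressed as a predicate on paths relative to com/ibm/icu/impl/data/$ICUDT. This is a sketch only; is_copied is a hypothetical helper that mirrors the shell globs and does not touch the file system:

```python
import fnmatch
import posixpath

TOP_LEVEL_PATTERNS = ["confusables.cfu", "*.icu", "*.nrm"]
TOP_LEVEL_EXCLUDED = {"cnvalias.icu"}  # removed again by the rm step
WHOLE_SUBDIRS = {"coll", "brkitr"}     # copied with coll/* and brkitr/*


def is_copied(rel_path: str) -> bool:
    """Return True if the steps above would copy this file to /tmp/icu4j."""
    directory, name = posixpath.split(rel_path)
    if directory in WHOLE_SUBDIRS:
        return True
    if directory:
        return False  # other subfolders and nested folders are not copied
    if name in TOP_LEVEL_EXCLUDED:
        return False
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in TOP_LEVEL_PATTERNS)
```

Filtering a listing of the icu4j output folder through is_copied() gives the set of files that the final "jar uf" refreshes in icudata.jar.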
2326 * When refreshing all of ICU4J data from ICU4C
2327 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2328 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2330 - or install directly into the ICU4J tree: ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2332 * update CollationFCD.java
2333 + copy & paste the initializers of lcccIndex[] etc. from
2334 ICU4C/source/i18n/collationfcd.cpp to
2335 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2337 * refresh Java test .txt files
2338 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2339 cd $ICU_SRC_DIR/source/data/unidata
2340 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2341 cd ../../test/testdata
2342 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2343 cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2345 * run & fix ICU4J tests
2347 *** LayoutEngine script information
2349 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
2350 because the layout engine was deprecated in ICU 54.
2351 Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
2352 to write lines that we used to add manually.
2354 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2355 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2356 in the working directory.
2358 (It also generates ScriptRunData.cpp, which is no longer needed.)
2360 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2362 which maps ICU versions to the numbers of script/language constants
2363 that were added then.
2364 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2366 The generated files have a current copyright date and "@deprecated" statement.
2368 * Review changes, fix Java tool if necessary, and copy to ICU4C
2369 cd ~/svn.icu4j/trunk/src
2370 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2371 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2372 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2375 - send notice to icu-design about new born-@stable API (enum constants etc.)
2377 *** merge the Unicode update branches back onto the trunk
2378 - do not merge the icudata.jar and testdata.jar,
2379 instead rebuild them from merged & tested ICU4C
2380 - make sure that changes to Unicode tools & ICU tools are checked in
2381 http://www.unicode.org/utility/trac/log/trunk/unicodetools
2382 http://bugs.icu-project.org/trac/log/tools/trunk
2384 ---------------------------------------------------------------------------- ***
2386 Unicode 7.0 update for ICU 54
2388 http://www.unicode.org/review/pri271/ -- beta review
2389 http://www.unicode.org/reports/uax-proposed-updates.html
2390 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
2391 http://www.unicode.org/reports/tr44/tr44-13.html
2395 - ticket 10821: Unicode 7.0, UCA 7.0
2396 - C++ branches/markus/uni70 at r35584 from trunk at r35580
2397 - Java branches/markus/uni70 at r35587 from trunk at r35545
2401 - ticket 7195: UCA 7.0 CLDR root collation
2402 - branches/markus/uni70 at r10062 from trunk at r10061
2404 - ticket 6762: script metadata for Unicode 7.0 new scripts
2406 *** Unicode version numbers
2409 - com.ibm.icu.util.VersionInfo
2410 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2412 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2413 so that the makefiles see the new version number.
2415 *** data files & enums & parser code
2419 - download UCD & IDNA files
2420 - make sure that the Unicode data folder passed into preparseucd.py
2421 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2422 - only for manual diffs: remove version suffixes from the file names
2423 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2424 (see https://sites.google.com/site/unicodetools/inputdata)
2425 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2426 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2427 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2428 - Restore TODO diffs in source/data/unidata/UCARules.txt
2430 meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
2431 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
2433 - also: from http://unicode.org/Public/security/7.0.0/ download new
2434 confusables.txt & confusablesWholeScript.txt
2435 and copy to $ICU_ROOT/src/source/data/unidata/
2437 * initial preparseucd.py changes
2438 - remove new Unicode scripts from the
2439 only-in-ISO-15924 list according to the error message:
2440 ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
2441 'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
2442 'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
2443 from _scripts_only_in_iso15924
2444 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2445 and in com.ibm.icu.dev.test.lang.TestUScript.java
2446 - NamesList.txt now has a heading with a non-ASCII character
2447 + keep ppucd.txt in platform charset, rather than changing tool/test parsers
2448 + escape non-ASCII characters in heading comments
2449 - gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
2450 + get the copyright from the first file whose copyright line contains the current year
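A sketch of the copyright selection described in the last item, assuming each input file is available as a list of lines and the files are visited in parsing order; pick_copyright is a hypothetical name, not the function in preparseucd.py:

```python
def pick_copyright(files, year):
    """Return the first copyright line that mentions the given year, or None.

    files is an iterable of (file name, list of lines) pairs in parsing order.
    """
    year_text = str(year)
    for _name, lines in files:
        for line in lines:
            if "Copyright" in line and year_text in line:
                return line.strip()
    return None
```

A file whose header still carries an older year is skipped, and the line from the next file with the current year is used instead.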
2452 * PropertyValueAliases.txt changes
2453 - 32 new Block (blk) values:
2454 blk; Bassa_Vah ; Bassa_Vah
2455 blk; Caucasian_Albanian ; Caucasian_Albanian
2456 blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
2457 blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
2458 blk; Duployan ; Duployan
2459 blk; Elbasan ; Elbasan
2460 blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
2461 blk; Grantha ; Grantha
2462 blk; Khojki ; Khojki
2463 blk; Khudawadi ; Khudawadi
2464 blk; Latin_Ext_E ; Latin_Extended_E
2465 blk; Linear_A ; Linear_A
2466 blk; Mahajani ; Mahajani
2467 blk; Manichaean ; Manichaean
2468 blk; Mende_Kikakui ; Mende_Kikakui
2469 blk; Modi ; Modi
2470 blk; Mro ; Mro
2471 blk; Myanmar_Ext_B ; Myanmar_Extended_B
2472 blk; Nabataean ; Nabataean
2473 blk; Old_North_Arabian ; Old_North_Arabian
2474 blk; Old_Permic ; Old_Permic
2475 blk; Ornamental_Dingbats ; Ornamental_Dingbats
2476 blk; Pahawh_Hmong ; Pahawh_Hmong
2477 blk; Palmyrene ; Palmyrene
2478 blk; Pau_Cin_Hau ; Pau_Cin_Hau
2479 blk; Psalter_Pahlavi ; Psalter_Pahlavi
2480 blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
2481 blk; Siddham ; Siddham
2482 blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
2483 blk; Sup_Arrows_C ; Supplemental_Arrows_C
2484 blk; Tirhuta ; Tirhuta
2485 blk; Warang_Citi ; Warang_Citi
2487 use long property names for enum constants
2488 -> add to UCharacter.UnicodeBlock IDs
2489 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2490 replace public static final int \1_ID = \2; \3
2491 -> add to UCharacter.UnicodeBlock objects
2492 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2493 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2494 - 28 new Joining_Group (jg) values:
2495 jg ; Manichaean_Aleph ; Manichaean_Aleph
2496 jg ; Manichaean_Ayin ; Manichaean_Ayin
2497 jg ; Manichaean_Beth ; Manichaean_Beth
2498 jg ; Manichaean_Daleth ; Manichaean_Daleth
2499 jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
2500 jg ; Manichaean_Five ; Manichaean_Five
2501 jg ; Manichaean_Gimel ; Manichaean_Gimel
2502 jg ; Manichaean_Heth ; Manichaean_Heth
2503 jg ; Manichaean_Hundred ; Manichaean_Hundred
2504 jg ; Manichaean_Kaph ; Manichaean_Kaph
2505 jg ; Manichaean_Lamedh ; Manichaean_Lamedh
2506 jg ; Manichaean_Mem ; Manichaean_Mem
2507 jg ; Manichaean_Nun ; Manichaean_Nun
2508 jg ; Manichaean_One ; Manichaean_One
2509 jg ; Manichaean_Pe ; Manichaean_Pe
2510 jg ; Manichaean_Qoph ; Manichaean_Qoph
2511 jg ; Manichaean_Resh ; Manichaean_Resh
2512 jg ; Manichaean_Sadhe ; Manichaean_Sadhe
2513 jg ; Manichaean_Samekh ; Manichaean_Samekh
2514 jg ; Manichaean_Taw ; Manichaean_Taw
2515 jg ; Manichaean_Ten ; Manichaean_Ten
2516 jg ; Manichaean_Teth ; Manichaean_Teth
2517 jg ; Manichaean_Thamedh ; Manichaean_Thamedh
2518 jg ; Manichaean_Twenty ; Manichaean_Twenty
2519 jg ; Manichaean_Waw ; Manichaean_Waw
2520 jg ; Manichaean_Yodh ; Manichaean_Yodh
2521 jg ; Manichaean_Zayin ; Manichaean_Zayin
2522 jg ; Straight_Waw ; Straight_Waw
2523 -> uchar.h & UCharacter.JoiningGroup
2524 - 23 new Script (sc) values:
2525 sc ; Aghb ; Caucasian_Albanian
2526 sc ; Bass ; Bassa_Vah
2527 sc ; Dupl ; Duployan
2528 sc ; Elba ; Elbasan
2529 sc ; Gran ; Grantha
2530 sc ; Hmng ; Pahawh_Hmong
2531 sc ; Khoj ; Khojki
2532 sc ; Lina ; Linear_A
2533 sc ; Mahj ; Mahajani
2534 sc ; Mani ; Manichaean
2535 sc ; Mend ; Mende_Kikakui
2536 sc ; Modi ; Modi
2537 sc ; Mroo ; Mro
2538 sc ; Narb ; Old_North_Arabian
2539 sc ; Nbat ; Nabataean
2540 sc ; Palm ; Palmyrene
2541 sc ; Pauc ; Pau_Cin_Hau
2542 sc ; Perm ; Old_Permic
2543 sc ; Phlp ; Psalter_Pahlavi
2544 sc ; Sidd ; Siddham
2545 sc ; Sind ; Khudawadi
2546 sc ; Tirh ; Tirhuta
2547 sc ; Wara ; Warang_Citi
2548 -> uscript.h (many were added before)
2549 comment "Mende Kikakui" for USCRIPT_MENDE
2550 add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
2551 -> com.ibm.icu.lang.UScript
2552 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2553 replace public static final int \1 = \2; \3
2554 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2561 Pauc 263 Pau Cin Hau
2563 -> uscript.h (some overlap with additions from Unicode)
2564 -> com.ibm.icu.lang.UScript
2565 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2566 replace public static final int \1 = \2; \3
2567 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
2568 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2569 and in com.ibm.icu.dev.test.lang.TestUScript.java
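The "N new ... values" lists in this section can be derived by diffing the old and new PropertyValueAliases.txt. A sketch, assuming lines of the form "prop ; short ; long" as quoted above; both function names are hypothetical, and properties with a different field layout (such as ccc) would need extra handling:

```python
def property_values(lines, prop):
    """Map short name -> long name for one property."""
    values = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == prop:
            values[fields[1]] = fields[2]
    return values


def new_values(old_lines, new_lines, prop):
    """Return the values of prop that appear only in the new file."""
    old = property_values(old_lines, prop)
    new = property_values(new_lines, prop)
    return {short: long for short, long in new.items() if short not in old}
```

Calling new_values() once each for blk, jg and sc gives the lists of values that need new constants in uchar.h, uscript.h and their Java counterparts.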
2571 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2572 (not strictly necessary for NOT_ENCODED scripts)
2573 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2575 * generate normalization data files
2577 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2578 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2579 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
2580 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2581 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2582 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2583 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2584 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2586 * build ICU (make install)
2587 so that the tools build can pick up the new definitions from the installed header files.
2589 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2591 * build Unicode tools using CMake+make
2593 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2595 # Location (--prefix) of where ICU was installed.
2596 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
2597 # Location of the ICU source tree.
2598 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
2600 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2601 ~/svn.icutools/trunk/dbg/unicode/c$ make
2604 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
2605 + add second array of Joining_Group values for at most 10800..10FFF
2606 icutools: unicode/c/genprops/bidipropsbuilder.cpp
2607 icu: source/common/ubidi_props.h/.c/_data.h
2608 icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
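The idea of a second Joining_Group array can be sketched as a lookup over two dense ranges with a default outside both. The class below and the values used with it are illustrative only, not the ubidi_props data layout:

```python
class JoiningGroups:
    """Joining_Group lookup backed by two dense arrays for two code point ranges."""

    NO_JOINING_GROUP = 0

    def __init__(self, start, values, start2, values2):
        self.start, self.values = start, list(values)
        self.start2, self.values2 = start2, list(values2)

    def get(self, c):
        """Return the stored value for c, or NO_JOINING_GROUP outside both ranges."""
        if self.start <= c < self.start + len(self.values):
            return self.values[c - self.start]
        if self.start2 <= c < self.start2 + len(self.values2):
            return self.values2[c - self.start2]
        return self.NO_JOINING_GROUP
```

Keeping two short arrays avoids one very long, mostly empty array that would have to span everything between the Arabic range and the Manichaean range at 10AC0..10AFF.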
2610 * generate core properties data files
2611 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2612 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
2613 - rebuild ICU (make install) & tools
2614 - run genuca again (see step above) so that it picks up the new nfc.nrm
2615 - rebuild ICU (make install) & tools
2617 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2618 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2619 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2620 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
2621 - nothing new in 7.0, no test file to update
2623 * run & fix ICU4C tests
2625 * update Java data files
2626 - refresh just the UCD-related files, just to be safe
2627 - see (ICU4C)/source/data/icu4j-readme.txt
2629 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2632 Unicode .icu files built to ./out/build/icudt53l
2633 echo timestamp > uni-core-data
2634 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2635 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
2636 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2637 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2638 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
2639 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
2640 mkdir -p /tmp/icu4j/main/shared/data
2641 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2642 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
2643 mkdir -p /tmp/icu4j/main/shared/data
2644 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2645 make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
2646 - copy the big-endian Unicode data files to another location,
2647 separate from the other data files
2649 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2650 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2651 cd ~/svn.icu/uni70/dbg/data/out/icu4j
2652 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2653 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2654 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2655 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2656 cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2657 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2659 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2661 * update CollationFCD.java
2662 + copy & paste the initializers of lcccIndex[] etc. from
2663 ICU4C/source/i18n/collationfcd.cpp to
2664 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2666 * refresh Java test .txt files
2667 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2668 cd $ICU_SRC_DIR/source/data/unidata
2669 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2670 cd ../../test/testdata
2671 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2672 cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2676 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
2677 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
2678 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
2679 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
2680 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
2681 - review data; compare files, use blankweights.sed or similar
2682 ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
2683 - cd ~/svn.unitools/Generated/uca/7.0.0/
2684 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2685 cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2686 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2687 (note removing the underscore before "Rules")
2688 cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2689 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2690 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2691 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2692 cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2693 cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2694 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2695 - run genuca, see command line above
2697 - refresh ICU4J collation data:
2698 (subset of instructions above for properties data refresh, except copies all coll/*)
2700 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2701 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2702 ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2703 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2704 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
2705 - note on intltest: if collate/UCAConformanceTest fails, then
2706 utility/MultithreadTest/TestCollators will fail as well;
2707 fix the conformance test before looking into the multi-thread test
2708 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
2709 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
2710 ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
2712 * When refreshing all of ICU4J data from ICU4C
2713 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2714 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2716 - or install directly into the ICU4J tree: ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2718 * run & fix ICU4J tests
2720 *** LayoutEngine script information
2722 (For details see the Unicode 5.2 change log below.)
2724 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2725 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2726 in the working directory.
2727 (It also generates ScriptRunData.cpp, which is no longer needed.)
2729 The generated files have a current copyright date and "@stable" statement.
2730 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
2731 for "born stable" Unicode API constants, and to stop parsing ICU version numbers
2732 which may not contain dots any more.
2734 - diff current <icu>/source/layout files vs. generated ones
2735 ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2736 review and manually merge desired changes;
2737 fix gratuitous changes, incorrect @draft/@stable and missing aliases;
2738 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2739 - if you just copy the above files, then
2740 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
2741 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2744 - send notice to icu-design about new born-@stable API (enum constants etc.)
2746 *** merge the Unicode update branches back onto the trunk
2747 - do not merge the icudata.jar and testdata.jar,
2748 instead rebuild them from merged & tested ICU4C
2750 ---------------------------------------------------------------------------- ***
2754 http://www.unicode.org/review/pri249/ -- beta review
2755 http://www.unicode.org/reports/uax-proposed-updates.html
2756 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
2757 http://www.unicode.org/reports/tr44/tr44-11.html
2761 - ticket 10128: update ICU to Unicode 6.3 beta
2762 - ticket 10168: update ICU to Unicode 6.3 final
2763 - C++ branches/markus/uni63 at r33552 from trunk at r33551
2764 - Java branches/markus/uni63 at r33550 from trunk at r33553
2766 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
2768 *** Unicode version numbers
2771 (configure.in & configure: have been modified to extract the version from uchar.h)
2772 - com.ibm.icu.util.VersionInfo
2773 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2775 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2776 so that the makefiles see the new version number.
2778 *** data files & enums & parser code
2782 - download UCD, UCA & IDNA files
2783 - make sure that the Unicode data folder passed into preparseucd.py
2784 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2785 - modify preparseucd.py:
2786 parse new file BidiBrackets.txt
2787 with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
2788 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
2789 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2790 - Check test file diffs for previously commented-out, known-failing data lines;
2791 probably need to keep those commented out.
2793 * PropertyAliases.txt changes
2794 - 1 new Enumerated Property
2795 bpt ; Bidi_Paired_Bracket_Type
2796 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
2797 -> ubidi_props.h & .c & UBiDiProps.java
2798 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
2800 -> change ubidi.icu format version from 2.0 to 2.1
2801 - 1 new Miscellaneous Property
2802 bpb ; Bidi_Paired_Bracket
2803 -> uchar.h & UProperty.java
2806 * PropertyValueAliases.txt changes
2807 - 3 Bidi_Paired_Bracket_Type (bpt) values:
2811 -> uchar.h & UCharacter.BidiPairedBracketType
2812 -> ubidi_props.h & .c & UBiDiProps.java
2813 -> change ubidi.icu format version from 2.0 to 2.1
2814 - 4 new Bidi_Class (bc) values:
2815 bc ; FSI ; First_Strong_Isolate
2816 bc ; LRI ; Left_To_Right_Isolate
2817 bc ; RLI ; Right_To_Left_Isolate
2818 bc ; PDI ; Pop_Directional_Isolate
2819 -> uchar.h & UCharacterEnums.ECharacterDirection
2820 -> until the bidi code gets updated,
2821 Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
2822 - 3 new Word_Break (WB) values:
2823 WB ; HL ; Hebrew_Letter
2824 WB ; SQ ; Single_Quote
2825 WB ; DQ ; Double_Quote
2826 -> uchar.h & UCharacter.WordBreak
2827 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
2828 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2830 Aghb 239 Caucasian Albanian
2833 -> com.ibm.icu.lang.UScript
2834 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2835 replace public static final int \1 = \2;\3
2836 -> preparseucd.py _scripts_only_in_iso15924
2837 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2838 and in com.ibm.icu.dev.test.lang.TestUScript.java
2839 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2840 (not strictly necessary for NOT_ENCODED scripts)
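The USCRIPT_ find/replace pair above can also be run outside an editor. The following is a minimal Python sketch of the same substitution; the helper name c_enum_to_java and the sample constant are made up for illustration and are not part of ICU.

```python
import re

# Same pattern and replacement as the find/replace pair in the log,
# written in Python regex syntax.
USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2;\3"


def c_enum_to_java(line):
    """Rewrite one C enum line as a Java int constant (hypothetical helper).

    Lines that do not match the pattern are returned unchanged.
    """
    return USCRIPT_FIND.sub(USCRIPT_REPLACE, line)


if __name__ == "__main__":
    # Made-up constant and value, not a real script code.
    sample = "    USCRIPT_EXAMPLE_SCRIPT            = 999,/* Xxxx */"
    print(c_enum_to_java(sample))
```

Note that the pattern requires at least one character after the comma, so an enum line without a trailing comment is left untouched.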
2842 * generate normalization data files
2843 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
2844 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
2845 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
2846 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2847 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2848 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2849 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
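The four gennorm2 calls above differ only in the output name and the list of input files. As a sketch, they could be generated from one table. The helper name gennorm2_commands is hypothetical, only the -o and -s options shown in the log are used, and the commands are printed rather than executed.

```python
# Output file -> ordered list of input files, copied from the log above.
NORM2_JOBS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata, tool="bin/gennorm2"):
    """Return one argv list per .nrm file (hypothetical helper, dry run only)."""
    commands = []
    for out_name, inputs in NORM2_JOBS.items():
        argv = [tool, "-o", src_data_in + "/" + out_name,
                "-s", unidata + "/norm2"] + inputs
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Placeholder paths; substitute the real SRC_DATA_IN and UNIDATA values.
    for argv in gennorm2_commands("source/data/in", "source/data/unidata"):
        print(" ".join(argv))
```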
2851 * build ICU (make install)
2852 so that the tools build can pick up the new definitions from the installed header files.
2854 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2856 * build Unicode tools using CMake+make
2858 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2860 # Location (--prefix) of where ICU was installed.
2861 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
2862 # Location of the ICU source tree.
2863 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
2865 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2866 ~/svn.icutools/trunk/dbg/unicode/c$ make
2868 * generate core properties data files
2869 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
2870 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
2871 - rebuild ICU (make install) & tools
2872 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
2873 - rebuild ICU (make install) & tools
2875 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2876 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2877 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2878 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
2879 - nothing new in 6.3, no test file to update
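The grep step above can be sketched in Python. This assumes the IdnaMappingTable.txt layout "code point or range ; status [; mapping] # comment"; the function name and the sample lines are illustrative, not copied from a specific file version.

```python
def non_ascii_std3_valid(lines):
    """Yield (first, last) code point ranges whose status is
    disallowed_STD3_valid and which start above U+007F."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if first > 0x7F:
            yield (first, last)


# Illustrative lines in the assumed format.
SAMPLE = [
    "0000..002C    ; disallowed_STD3_valid   # <control-0000>..COMMA",
    "0061..007A    ; valid                   # LATIN SMALL LETTER A..Z",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]

if __name__ == "__main__":
    for first, last in non_ascii_std3_valid(SAMPLE):
        print("U+%04X..U+%04X" % (first, last))
```

Any range reported here that is not already covered by uts46test.cpp and UTS46Test.java would be a candidate for a new test case.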
2881 * update Java data files
2882 - refresh just the UCD-related files, just to be safe
2883 - see (ICU4C)/source/data/icu4j-readme.txt
2885 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2888 Unicode .icu files built to ./out/build/icudt52l
2889 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
2890 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
2891 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2892 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
2893 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
2894 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
2895 mkdir -p /tmp/icu4j/main/shared/data
2896 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2897 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
2898 mkdir -p /tmp/icu4j/main/shared/data
2899 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2900 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
2901 - copy the big-endian Unicode data files to another location,
2902 separate from the other data files
2903 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
2904 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
2905 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
2906 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
2907 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
2908 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
2909 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
2911 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
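The mkdir/cp/rm sequence above (everything before the final "jar uf") could be scripted roughly as follows. This is a sketch under assumptions: the function name stage_unicode_data is hypothetical, and cnvalias.icu is skipped during the copy instead of being copied and then removed.

```python
import shutil
from pathlib import Path

# Glob patterns mirroring the cp commands in the log, relative to the
# icudt<version>b directory.
PATTERNS = ["*.icu", "*.nrm", "coll/*.icu", "brkitr/*"]


def stage_unicode_data(src, dst):
    """Copy the Unicode-related data files from src to dst.

    Returns the relative paths that were copied (hypothetical helper).
    """
    src, dst = Path(src), Path(dst)
    (dst / "coll").mkdir(parents=True, exist_ok=True)
    (dst / "brkitr").mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in PATTERNS:
        for path in sorted(src.glob(pattern)):
            # The log removes cnvalias.icu after copying; skip it up front.
            if not path.is_file() or path.name == "cnvalias.icu":
                continue
            rel = path.relative_to(src)
            shutil.copy2(path, dst / rel)
            copied.append(rel.as_posix())
    return copied
```

The returned list can be reviewed before running the "jar uf" step on the staged directory.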
2913 * refresh Java test .txt files
2914 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2916 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
2918 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
2919 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
2920 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2921 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2922 (note: the target file name drops the underscore before "Rules")
2923 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2924 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2925 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2926 - check test file diffs for previously commented-out, known-failing data lines;
2927 probably need to keep those commented out
2928 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
2929 - run genuca, see command line above
2931 - refresh ICU4J collation data:
2932 (subset of instructions above for properties data refresh, except copies all coll/*)
2933 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2934 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
2935 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
2936 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
2937 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
2938 - note on intltest: if collate/UCAConformanceTest fails, then
2939 utility/MultithreadTest/TestCollators will fail as well;
2940 fix the conformance test before looking into the multi-thread test
2942 * test ICU, fix test code where necessary
2944 * When refreshing all of ICU4J data from ICU4C
2945 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2946 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2948 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2950 *** LayoutEngine script information
2951 - skipped for Unicode 6.3: no new scripts
2953 *** merge the Unicode update branches back onto the trunk
2954 - do not merge the icudata.jar and testdata.jar,
2955 instead rebuild them from merged & tested ICU4C
2957 ---------------------------------------------------------------------------- ***
2961 http://www.unicode.org/review/pri230/
2962 http://www.unicode.org/versions/beta-6.2.0.html
2963 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
2964 http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
2965 http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
2966 http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
2967 http://www.unicode.org/reports/tr46/tr46-8.html IDNA
2968 http://unicode.org/Public/idna/6.2.0/
2972 - ticket 9515: Unicode 6.2: final ICU update
2974 - ticket 9514: UCA 6.2: fix UCARules.txt
2976 - ticket 9437: update ICU to Unicode 6.2
2977 - C++ branches/markus/uni62 at r32050 from trunk at r32041
2978 - Java branches/markus/uni62 at r32068 from trunk at r32066
2980 *** Unicode version numbers
2983 (configure.in & configure: have been modified to extract the version from uchar.h)
2984 - com.ibm.icu.util.VersionInfo
2985 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2987 *** data files & enums & parser code
2991 - download UCD, UCA & IDNA files
2992 - make sure that the Unicode data folder passed into preparseucd.py
2993 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2994 - modify preparseucd.py: NamesList.txt is now in UTF-8
2995 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
2996 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2997 - Check test file diffs for previously commented-out, known-failing data lines;
2998 probably need to keep those commented out.
3000 * PropertyValueAliases.txt changes
3001 - 1 new Line_Break (lb) value:
3002 lb ; RI ; Regional_Indicator
3003 -> uchar.h & UCharacter.LineBreak
3004 - 1 new Word_Break (WB) value:
3005 WB ; RI ; Regional_Indicator
3006 -> uchar.h & UCharacter.WordBreak
3007 - 1 new Grapheme_Cluster_Break (GCB) value:
3008 GCB; RI ; Regional_Indicator
3009 -> uchar.h & UCharacter.GraphemeClusterBreak
3011 * 3 new numeric values
3012 The new value -1 was really supposed to be NaN, but that would have required
3013 new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
3014 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
3015 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
3016 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
3017 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
3018 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
3019 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
3020 -> uprops.h, uchar.c & UCharacterProperty.java
3021 -> cucdtst.c & UCharacterTest.java
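For orientation, the two large values are sexagesimal: 216000 = 60^3 and 432000 = 2 * 60^3, and -1 is exactly the fraction -1/1. The sketch below only checks that arithmetic; it does not reproduce the numeric-value encoding in uprops.h, and the helper name split_base60 is made up.

```python
from fractions import Fraction


def split_base60(value):
    """Return (mantissa, exponent) with value == mantissa * 60**exponent
    and mantissa not divisible by 60. Illustrative arithmetic only."""
    if value <= 0:
        raise ValueError("expects a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


if __name__ == "__main__":
    for nv in (216000, 432000):
        mantissa, exponent = split_base60(nv)
        print("%d = %d * 60^%d" % (nv, mantissa, exponent))
    # -1 written as a numerator/denominator pair.
    minus_one = Fraction(-1, 1)
    print(minus_one.numerator, minus_one.denominator)
```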
3023 * generate normalization data files
3024 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
3025 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
3026 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
3027 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3028 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3029 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3030 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3032 * build ICU (make install)
3033 so that the tools build can pick up the new definitions from the installed header files.
3034 * build Unicode tools using CMake+make
3036 * generate core properties data files
3037 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
3038 - in initial bootstrapping, change the UCA version
3039 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3040 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
3041 - rebuild ICU (make install) & tools
3042 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
3043 check if the UCA version in FractionalUCA.txt matches the new Unicode version
3045 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3046 - rebuild ICU (make install) & tools
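The version check mentioned for the genrb failure can be done before building. The sketch below assumes that FractionalUCA.txt carries a line of the form "[UCA version = x.y.z]"; that format and the helper names are assumptions, not taken from this log.

```python
import re

# Assumed format of the version line in FractionalUCA.txt.
UCA_VERSION_RE = re.compile(r"\[UCA version\s*=\s*([0-9.]+)\]")


def uca_version(text):
    """Return the UCA version string found in text, or None."""
    match = UCA_VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(fractional_uca_text, unicode_version):
    """True if the file's UCA version equals the given Unicode version."""
    return uca_version(fractional_uca_text) == unicode_version


if __name__ == "__main__":
    sample = "# made-up header\n[UCA version = 6.2.0]\n"
    print(uca_version(sample), versions_match(sample, "6.2.0"))
```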
3048 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3049 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3050 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3051 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
3052 - nothing new in 6.2, no test file to update
3054 * update Java data files
3055 - refresh just the UCD-related files, just to be safe
3056 - see (ICU4C)/source/data/icu4j-readme.txt
3058 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3061 Unicode .icu files built to ./out/build/icudt50l
3062 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3063 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
3064 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3065 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3066 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
3067 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
3068 mkdir -p /tmp/icu4j/main/shared/data
3069 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3070 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
3071 mkdir -p /tmp/icu4j/main/shared/data
3072 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3073 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
3074 - copy the big-endian Unicode data files to another location,
3075 separate from the other data files
3076 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3077 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3078 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3079 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
3080 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3081 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3082 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3084 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3086 * refresh Java test .txt files
3087 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3091 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3092 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3093 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3094 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3095 (note: the target file name drops the underscore before "Rules")
3096 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
3097 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3098 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3099 - check test file diffs for previously commented-out, known-failing data lines;
3100 probably need to keep those commented out
3101 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3102 - run genuca, see command line above
3104 - refresh ICU4J collation data:
3105 (subset of instructions above for properties data refresh, except copies all coll/*)
3106 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3107 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3108 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3109 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3110 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3111 - note on intltest: if collate/UCAConformanceTest fails, then
3112 utility/MultithreadTest/TestCollators will fail as well;
3113 fix the conformance test before looking into the multi-thread test
3115 * test ICU, fix test code where necessary
3117 * When refreshing all of ICU4J data from ICU4C
3118 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3119 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3121 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3123 *** LayoutEngine script information
3124 - skipped for Unicode 6.2: no new scripts
3126 *** merge the Unicode update branches back onto the trunk
3127 - do not merge the icudata.jar and testdata.jar,
3128 instead rebuild them from merged & tested ICU4C
3130 ---------------------------------------------------------------------------- ***
3132 Future Unicode update
3134 Tools simplified since the Unicode 6.1 update. See
3135 - http://site.icu-project.org/design/props/ppucd
3136 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
3138 * Unicode version numbers
3139 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
3142 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
3143 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
3144 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3145 - Check test file diffs for previously commented-out, known-failing data lines;
3146 probably need to keep those commented out.
3148 * PropertyValueAliases.txt changes
3149 - Script codes that are in ISO 15924 but not in Unicode are now listed in
3150 preparseucd.py, in the _scripts_only_in_iso15924 variable.
3151 If there are new ISO codes, then add them.
3152 If Unicode adds some of them, then remove them from the .py variable.
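The add/remove rule above is a simple set update. The sketch below models the list as a Python set; the real layout of _scripts_only_in_iso15924 in preparseucd.py may differ, the function name is hypothetical, and the codes are sample data.

```python
def update_iso_only(iso_only, new_iso_codes, now_in_unicode):
    """Add newly registered ISO 15924 codes and drop codes that
    Unicode has since encoded (hypothetical helper)."""
    return (set(iso_only) | set(new_iso_codes)) - set(now_in_unicode)


if __name__ == "__main__":
    # Sample data only.
    before = {"Latf", "Merc", "Mero", "Sora"}
    after = update_iso_only(before,
                            new_iso_codes={"Hluw"},
                            now_in_unicode={"Merc", "Mero", "Sora"})
    print(sorted(after))
```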
3154 * UnicodeData.txt changes
3155 - No more manual changes for CJK ranges for algorithmic names;
3156 those are now written to ppucd.txt and genprops reads them from there.
3158 * generate core properties data files (makeprops.sh was deleted)
3159 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
3161 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
3162 - it is now generated by preparseucd.py
3164 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
3165 - it is now generated by preparseucd.py
3166 - make sure that the Unicode data folder passed into preparseucd.py
3167 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3168 (can be in some subfolder)
3170 * generate normalization data files
3171 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
3172 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
3173 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
3174 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3175 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3176 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3177 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3179 * build ICU (make install)
3180 * build Unicode tools using CMake+make
3182 * new way to call genuca (makeuca.sh was deleted)
3183 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
3185 ---------------------------------------------------------------------------- ***
3191 - ticket 8995 final update to Unicode 6.1
3192 - ticket 8994 regenerate source/layout/CanonData.cpp
3194 - ticket 8961 support Unicode "Age" value *names*
3195 - ticket 8963 support multiple character name aliases & types
3197 - ticket 8827 "update ICU to Unicode 6.1"
3198 - C++ branches/markus/uni61 at r30864 from trunk at r30843
3199 - Java branches/markus/uni61 at r30865 from trunk at r30863
3201 *** Unicode version numbers
3204 (configure.in & configure: have been modified to extract the version from uchar.h)
3205 - com.ibm.icu.util.VersionInfo
3206 - icutools/unicode/makedefs.sh
3207 + also review & update other definitions in that file,
3208 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
3210 *** data files & enums & parser code
3214 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
3215 - This prepares both unidata and testdata files in respective output subfolders.
3216 - Check test file diffs for previously commented-out, known-failing data lines;
3217 probably need to keep those commented out.
3219 * PropertyValueAliases.txt changes
3220 - 11 new block names:
3222 Arabic_Mathematical_Alphabetic_Symbols
3224 Meetei_Mayek_Extensions
3226 Meroitic_Hieroglyphs
3230 Sundanese_Supplement
3233 -> add to UCharacter.UnicodeBlock IDs
3234 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3235 replace public static final int \1_ID = \2; \3
3236 -> add to UCharacter.UnicodeBlock objects
3237 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
3238 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3239 - 1 new Joining_Group (jg) value:
3241 -> uchar.h & UCharacter.JoiningGroup
3242 - 2 new Line_Break (lb) values:
3243 CJ=Conditional_Japanese_Starter
3245 -> uchar.h & UCharacter.LineBreak
3248 sc ; Merc ; Meroitic_Cursive
3249 sc ; Mero ; Meroitic_Hieroglyphs
3252 sc ; Sora ; Sora_Sompeng
3254 -> remove these from SyntheticPropertyValueAliases.txt
3255 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3256 and in com.ibm.icu.dev.test.lang.TestUScript.java
3257 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3261 and another one added 2011-12-09
3262 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
3264 -> com.ibm.icu.lang.UScript
3265 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3266 replace public static final int \1 = \2;\3
3267 -> SyntheticPropertyValueAliases.txt
3268 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3269 and in com.ibm.icu.dev.test.lang.TestUScript.java
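The two Eclipse find/replace pairs for UBLOCK_ constants above can be reproduced with Python's re module, as in the sketch below. The function names and the sample enum line are invented for illustration.

```python
import re

# Python spellings of the two Eclipse find/replace pairs in the log.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = (r'public static final UnicodeBlock \1 = '
               r'new UnicodeBlock("\1", \1_ID); \2')


def block_id_line(c_line):
    """C enum line -> Java block ID constant (unchanged if no match)."""
    return ID_FIND.sub(ID_REPLACE, c_line)


def block_object_line(c_line):
    """C enum line -> Java UnicodeBlock object declaration."""
    return OBJ_FIND.sub(OBJ_REPLACE, c_line)


if __name__ == "__main__":
    # Made-up block name and value.
    sample = "    UBLOCK_EXAMPLE_BLOCK = 999, /*[0000]*/"
    print(block_id_line(sample))
    print(block_object_line(sample))
```

Both patterns expect exactly one space around "=" and a comment starting with "/" after the comma, as in the find expressions in the log.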
3271 * UnicodeData.txt changes
3272 - the last Unihan code point changes from U+9FCB to U+9FCC
3273 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
3274 + do change gennames.c
3275 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
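The search for the old end and the new limit can be sketched as a small line scanner using the same regex, case-insensitive. The function name and the sample lines are made up.

```python
import re

# Same regex as in the log: matches 9FCB or 9FCC in any letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_unihan_hits(lines):
    """Return (line_number, text) for lines mentioning 9FCB or 9FCC."""
    return [(number, text.rstrip("\n"))
            for number, text in enumerate(lines, start=1)
            if UNIHAN_END_OR_LIMIT.search(text)]


if __name__ == "__main__":
    # Made-up source lines.
    sample = ["end = 0x9fcb;", "limit = 0x9FCC;", "other = 0x9FCA;"]
    for number, text in find_unihan_hits(sample):
        print(number, text)
```

The pattern is unanchored, so it also matches longer hex strings that merely contain these digits; hits still need manual review.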
3277 * DerivedBidiClass.txt changes
3278 - 2 new default-AL blocks:
3279 # Arabic Extended-A: U+08A0 - U+08FF (was default-R)
3280 # Arabic Mathematical Alphabetic Symbols:
3281 # U+1EE00 - U+1EEFF (was default-R)
3282 - 2 new default-R blocks:
3283 # Meroitic Hieroglyphs:
3285 # Meroitic Cursive: U+109A0 - U+109FF
3286 -> should be picked up by the explicit data in the file
3288 * NameAliases.txt changes
3290 # Each line has two fields
3291 # First field: Code point
3292 # Second field: Alias
3294 # Each line has three fields, as described here:
3296 # First field: Code point
3297 # Second field: Alias
3299 - Also, the file previously allowed multiple aliases per code point, but only now
3300 does it actually provide them, including several of the same type. For example,
3301 FEFF;BYTE ORDER MARK;alternate
3302 FEFF;BOM;abbreviation
3303 FEFF;ZWNBSP;abbreviation
3304 - This breaks our gennames parser, unames.icu data structure, and API.
3305 Fix gennames to only pick up "correction" aliases.
3306 New ticket #8963 for further changes.
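The interim fix (pick up only "correction" aliases from the three-field format) can be illustrated with a short Python sketch. It is not the gennames code; the function name is hypothetical, and the sample combines the FEFF lines quoted above with one correction-type line.

```python
def correction_aliases(lines):
    """Map code point -> alias for lines whose third field is 'correction'.

    Lines without exactly three fields are ignored.
    """
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3:
            continue
        code, alias, kind = fields
        if kind == "correction":
            result[int(code, 16)] = alias
    return result


SAMPLE = [
    "# First field: Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]

if __name__ == "__main__":
    for cp, alias in sorted(correction_aliases(SAMPLE).items()):
        print("U+%04X %s" % (cp, alias))
```

A dict keeps at most one alias per code point, which matches the single-alias assumption that the multiple FEFF entries break.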
3308 * run genpname/preparse.pl (on Linux)
3309 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3310 + make sure that data.h is writable
3311 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3312 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3314 * build ICU (make install)
3315 so that the tools build can pick up the new definitions from the installed header files.
3316 * build Unicode tools (at least genpname) using CMake+make
3319 (builds both pnames.icu and propname_data.h)
3320 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3321 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3323 * build ICU (make install)
3324 * build Unicode tools using CMake+make
3326 * update source/data/unidata/norm2/nfkc_cf.txt
3327 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3329 * update source/data/unidata/norm2/uts46.txt
3330 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3331 to ~/svn.icu/tools/trunk/src/unicode/py
3332 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
3333 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3334 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3336 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3337 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3338 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3339 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
3340 - nothing new in 6.1, no test file to update
3342 * generate core properties data files
3343 - in initial bootstrapping, change the UCA version
3344 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3345 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3346 - rebuild ICU & tools
3347 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
3348 check if the UCA version in FractionalUCA.txt matches the new Unicode version
3350 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
3351 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3352 - rebuild ICU & tools
3354 * update Java data files
3355 - refresh just the UCD-related files, just to be safe
3356 - see (ICU4C)/source/data/icu4j-readme.txt
3358 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3361 Unicode .icu files built to ./out/build/icudt49l
3362 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3363 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
3364 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3365 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3366 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
3367 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
3368 mkdir -p /tmp/icu4j/main/shared/data
3369 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3370 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
3371 mkdir -p /tmp/icu4j/main/shared/data
3372 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3373 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
3374 - copy the big-endian Unicode data files to another location,
3375 separate from the other data files
3376 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3377 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3378 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3379 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
3380 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3381 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3382 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3384 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3386 * refresh Java test .txt files
3387 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3389 * test ICU so far, fix test code where necessary
3390 - temporarily ignore collation issues that look like UCA/UCD mismatches,
3391 until UCA data is updated
3395 - get output from Mark's tools; look in
3396 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
3397 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3398 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3399 (note removing the underscore before "Rules")
3400 - update (ICU)/source/test/testdata/CollationTest_*.txt
3401 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3402 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3403 - check test file diffs for previously commented-out, known-failing data lines;
3404 probably need to keep those commented out
3405 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3407 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3409 - refresh ICU4J collation data:
3410 (subset of instructions above for properties data refresh, except copies all coll/*)
3411 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3412 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3413 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3414 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3415 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3416 - note on intltest: if collate/UCAConformanceTest fails, then
3417 utility/MultithreadTest/TestCollators will fail as well;
3418 fix the conformance test before looking into the multi-thread test
3420 * When refreshing all of ICU4J data from ICU4C
3421 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3422 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3424 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3426 *** LayoutEngine script information
3428 (For details see the Unicode 5.2 change log below.)
3430 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3431 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3432 in the working directory.
3433 (It also generates ScriptRunData.cpp, which is no longer needed.)
3435 The generated files have a current copyright date and "@draft" statement.
3437 - diff current <icu>/source/layout files vs. generated ones
3438 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3439 review and manually merge desired changes;
3440 fix gratuitous changes, incorrect @draft and missing aliases;
3441 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3442 - if you just copy the above files, then
3443 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3444 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3446 *** merge the Unicode update branches back onto the trunk
3447 - do not merge the icudata.jar and testdata.jar,
3448 instead rebuild them from merged & tested ICU4C
3450 ---------------------------------------------------------------------------- ***
3452 ICU 4.8 (no Unicode update, just new script codes)
3454 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3460 Shrd 319 Sharada, Śāradā
3461 Sora 398 Sora Sompeng
3462 Takr 321 Takri, Ṭākrī, Ṭāṅkrī
3466 -> com.ibm.icu.lang.UScript
3467 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3468 replace public static final int \1 = \2;\3
3469 -> genpname/SyntheticPropertyValueAliases.txt
3470 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3471 and in com.ibm.icu.dev.test.lang.TestUScript.java
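 Note: the find/replace above can also be run as a small script rather than
 in an editor. This is only a sketch, assuming Python's re module; the helper
 name and the sample enum line are illustrative and not taken from uscript.h.

```python
import re

# Same find/replace as in the notes above, as a Python regular expression.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \1 = \2;\3"


def c_enum_to_java(line: str) -> str:
    """Turn one C enum line into a Java int constant line (hypothetical helper)."""
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, line)


# Illustrative input only; the numeric value is a placeholder.
sample = "      USCRIPT_SHARADA = 151,/* Shrd */"
converted = c_enum_to_java(sample)
print(converted)
```

 Lines that do not match the pattern pass through unchanged, so the script
 can be applied to a whole pasted enum section.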
3473 * run genpname/preparse.pl (on Linux)
3474 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3475 + make sure that data.h is writable
3476 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3477 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3479 * rebuild Unicode tools (at least genpname) using make
3480 - You might first need to "make install" ICU so that the tools build can pick
3481 up the new definitions from the installed header files.
3484 (builds both pnames.icu and propname_data.h)
3485 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3486 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3487 - rebuild ICU & tools
3490 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3491 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3492 - rebuild ICU & tools
3494 * update Java data files
3495 - refresh just the UCD-related files, just to be safe
3496 - see (ICU4C)/source/data/icu4j-readme.txt
3498 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3499 - copy the big-endian Unicode data files to another location,
3500 separate from the other data files
3501 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3502 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3503 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3505 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
3507 * should have updated the layout engine script codes but forgot
3509 ---------------------------------------------------------------------------- ***
3513 *** related ICU Trac tickets
3515 7264 Unicode 6.0 Update
3517 *** Unicode version numbers
3520 (configure.in & configure: have been modified to extract the version from uchar.h)
3521 - com.ibm.icu.util.VersionInfo
3523 *** data files & enums & parser code
3527 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
3528 - This now prepares both unidata and testdata files in respective output subfolders.
3530 * PropertyAliases.txt changes
3531 - new Script_Extensions property defined in the new ScriptExtensions.txt file
3532 but not listed in PropertyAliases.txt; reported to unicode.org;
3533 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
3534 scx; Script_Extensions
3535 -> uchar.h with new UProperty section
3536 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
3538 * PropertyValueAliases.txt changes
3539 - 12 new block names:
3544 CJK_Unified_Ideographs_Extension_D
3549 Miscellaneous_Symbols_And_Pictographs
3551 Transport_And_Map_Symbols
3553 -> add to UCharacter.UnicodeBlock
3554 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
3555 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3556 - Joining_Group (jg) values:
3557 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal, which becomes an alias
3558 -> uchar.h & UCharacter.JoiningGroup
3563 -> remove these from SyntheticPropertyValueAliases.txt
3564 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
3565 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3566 and in com.ibm.icu.dev.test.lang.TestUScript.java
3567 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3568 (added 2009-11-11..2010-07-18)
3570 Dupl 755 Duployan shorthand
3576 Merc 101 Meroitic Cursive
3577 Narb 106 Old North Arabian
3581 Wara 262 Warang Citi
3583 -> com.ibm.icu.lang.UScript
3584 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3585 replace public static final int \1 = \2;\3
3586 -> SyntheticPropertyValueAliases.txt
3587 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3588 and in com.ibm.icu.dev.test.lang.TestUScript.java
3589 - ISO 15924 name change
3590 Mero 100 Meroitic Hieroglyphs (was Meroitic)
3591 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
3592 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
3594 * UnicodeData.txt changes
3596 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
3597 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
3598 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
3600 * build Unicode tools using CMake+make
3602 * run genpname/preparse.pl (on Linux)
3603 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3604 + make sure that data.h is writable
3605 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3606 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3608 * rebuild Unicode tools (at least genpname) using make
3609 - You might first need to "make install" ICU so that the tools build can pick
3610 up the new definitions from the installed header files.
3613 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3614 - rebuild ICU & tools
3616 * update source/data/unidata/norm2/nfkc_cf.txt
3617 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3619 * update source/data/unidata/norm2/uts46.txt
3620 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
3621 to ~/svn.icu/tools/trunk/src/unicode/py
3622 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
3623 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3624 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3626 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3627 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3628 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3629 - Unicode 6.0: U+2260, U+226E, U+226F
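 Note: the grep step above could be scripted roughly as follows. This is a
 sketch that assumes the usual semicolon-separated layout with "#" comments;
 the function name and the sample lines are hypothetical, not copied from
 IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges marked disallowed_STD3_valid above U+007F."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            # Clip ranges that begin in ASCII to their non-ASCII part.
            yield (max(start, 0x80), end)


# Hypothetical sample lines in the assumed format.
sample_lines = [
    "0000..002C    ; disallowed_STD3_valid          # sample",
    "2260          ; disallowed_STD3_valid          # sample",
    "00C0          ; mapped                 ; 00E0  # sample",
]
hits = list(non_ascii_std3_valid(sample_lines))
for start, end in hits:
    print(f"U+{start:04X}..U+{end:04X}")
```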
3631 * generate core properties data files
3632 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3633 - rebuild ICU & tools
3634 - run makeuca.sh so that genuca picks up the new nfc.nrm:
3635 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3636 - rebuild ICU & tools
3638 * implement new Script_Extensions property (provisional)
3639 - parser & generator: genprops & uprops.icu
3640 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
3641 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
3643 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
3645 - genbidi/gencase/genprops tools changes
3646 - re-run makeprops.sh (see above)
3647 - UCharacterProperty.java, UCharacterTypeIterator.java,
3648 UBiDiProps.java, UCaseProps.java, and several others with minor changes;
3649 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
3651 * update Java data files
3652 - refresh just the UCD-related files, just to be safe
3653 - see (ICU4C)/source/data/icu4j-readme.txt
3655 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3658 Unicode .icu files built to ./out/build/icudt45l
3659 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3660 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3661 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3662 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
3663 mkdir -p /tmp/icu4j/main/shared/data
3664 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3665 - copy the big-endian Unicode data files to another location,
3666 separate from the other data files
3667 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3668 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3669 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3670 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
3671 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3672 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3673 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3675 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
3677 * refresh Java test .txt files
3678 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3680 * un-hardcode normalization skippable (NF*_Inert) test data
3681 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
3683 * copy updated break iterator test files
3684 - now handled by early ucdcopy.py and
3685 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
3687 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
3688 to ~/svn.icu/trunk/src/source/test/testdata)
3689 - they are not used in ICU4J
3693 - get output from Mark's tools; look in
3694 http://www.unicode.org/~book/incoming/mark/uca6.0.0/
3695 http://www.macchiato.com/unicode/utc/additional-uca-files
3696 http://www.unicode.org/Public/UCA/6.0.0/
3697 http://www.unicode.org/~mdavis/uca/
3698 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3699 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3700 - update Han-implicit ranges for new CJK extensions:
3701 swapCJK() in ucol.cpp & ImplicitCEGenerator.java
3702 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
3703 do not add it into invuca so that tailoring primary-after an ignorable works
3704 - genuca: permit space between [variable top] bytes
3705 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
3707 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3709 - refresh ICU4J collation data:
3710 (subset of instructions above for properties data refresh, except copies all coll/*)
3711 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3712 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3713 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3714 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
3715 - update (ICU)/source/test/testdata/CollationTest_*.txt
3716 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3717 with output from Mark's Unicode tools
3718 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
3719 - note on intltest: if collate/UCAConformanceTest fails, then
3720 utility/MultithreadTest/TestCollators will fail as well;
3721 fix the conformance test before looking into the multi-thread test
3723 * When refreshing all of ICU4J data from ICU4C
3724 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3725 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3727 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3729 *** LayoutEngine script information
3731 (For details see the Unicode 5.2 change log below.)
3733 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
3734 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
3735 ScriptRunData.cpp, which is no longer needed.)
3737 The generated files have a current copyright date and "@draft" statement.
3739 * copy the above files into <icu>/source/layout, replacing the old files.
3740 * fix mixed line endings
3741 * review the diffs and fix incorrect @draft and missing aliases;
3742 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3743 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3745 ---------------------------------------------------------------------------- ***
3749 *** related ICU Trac tickets
3753 7167 verify collation bytes
3754 7235 Java test NAME_ALIAS
3755 7236 Java DerivedCoreProperties.txt test
3756 7237 Java BidiTest.txt
3757 7238 UTrie2 in core unidata
3758 7239 test for tailoring gaps
3759 7240 Java fix CollationMiscTest
3760 7243 update layout engine for Unicode 5.2
3762 *** Unicode version numbers
3765 - configure.in & configure
3766 - update ucdVersion in gennames.c if an algorithmic range changes
3768 *** data files & enums & parser code
3772 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
3773 - includes finding files regardless of version numbers,
3774 copying them, and performing the equivalent processing of the
3775 ucdstrip and ucdmerge tools on the desired set of files
3778 - PropertyAliases.txt
3779 moved from numeric to enumerated:
3780 ccc ; Canonical_Combining_Class
3781 new string properties:
3782 NFKC_CF ; NFKC_Casefold
3783 Name_Alias; Name_Alias
3784 new binary properties:
3787 CWCF ; Changes_When_Casefolded
3788 CWCM ; Changes_When_Casemapped
3789 CWKCF ; Changes_When_NFKC_Casefolded
3790 CWL ; Changes_When_Lowercased
3791 CWT ; Changes_When_Titlecased
3792 CWU ; Changes_When_Uppercased
3793 new CJK Unihan properties (not supported by ICU)
3794 - PropertyValueAliases.txt
3797 one script code change:
3798 sc ; Qaai ; Inherited
3800 sc ; Zinh ; Inherited ; Qaai
3801 new Line_Break (lb) value:
3802 lb ; CP ; Close_Parenthesis
3803 new Joining_Group (jg) values: Farsi_Yeh, Nya
3805 ccc; 214; ATA ; Attached_Above
3806 - DerivedBidiClass.txt
3807 new default-R range: U+1E800 - U+1EFFF
3809 all of the ISO comments are gone
3811 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
3813 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
3814 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
3818 + cd \svn\icuproj\icu\trunk\source\tools\genpname
3819 + make sure that data.h is writable
3820 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
3821 + preparse.pl complains with errors like the following:
3822 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
3823 This is because ICU 4.0 had scripts from ISO 15924 which are now
3824 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
3825 and PropertyValueAliases.txt.
3826 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
3827 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
3828 + preparse.pl complains with errors about block names missing from uchar.h; add them
3830 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
3831 - new block & script values
3833 copy new blocks from Blocks.txt
3834 MS VC++ 2008 regular expression:
3835 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
3836 replace with " UBLOCK_\3 = 172, /*[\1]*/"
3837 + several new script values already added in ICU 4.0 for ISO 15924 coverage
3838 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
3839 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
3840 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
3841 (added to SyntheticPropertyValueAliases.txt)
3842 - new Joining Group (JG) values: Farsi_Yeh, Nya
3843 - new Line_Break (lb) value:
3844 lb ; CP ; Close_Parenthesis
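 Note: the Blocks.txt find/replace in the uchar.h step above uses the
 MS VC++ regex dialect. A rough Python equivalent is sketched below. The
 identifier cleanup (uppercasing, underscores) goes beyond the original
 regex, and the numeric value passed in is a placeholder chosen by the caller.

```python
import re

# VC++ "{...}" groups become "(...)" groups in Python.
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_line_to_enum(line, value):
    """Build a UBLOCK_ enum line from one Blocks.txt line (hypothetical helper)."""
    match = BLOCK_LINE.match(line)
    if match is None:
        return None  # comment, blank line, or unexpected format
    start, _end, name = match.groups()
    # Extra step, not in the original regex: make the name a C identifier.
    ident = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
    return f"    UBLOCK_{ident} = {value}, /*[{start}]*/"


enum_line = blocks_line_to_enum("A960..A97F; Hangul Jamo Extended-A", 180)
print(enum_line)
```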
3846 * hardcoded Unihan range end/limit
3847 - Unihan range end moves from 9FC3 to 9FCB
3848 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
3849 + do change gennames.c
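 Note: a minimal sketch of the case-insensitive search for the old range end
 and limit, assuming Python's re module. The sample source lines are made up
 for illustration.

```python
import re

# 9FC3 is the old range end, 9FC4 the old limit; match either, any case.
OLD_UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)


def find_old_range_constants(text):
    """Return (line number, line) pairs that mention the old end or limit."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if OLD_UNIHAN_END.search(line)
    ]


# Hypothetical source lines.
sample = "\n".join([
    "if (c <= 0x9fc3) {",
    "#define CJK_LIMIT 0x9FC4",
    "start = 0x4E00;",
])
matches = find_old_range_constants(sample)
for number, line in matches:
    print(number, line)
```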
3851 * Compare definitions of new binary properties with what we used to use
3852 in algorithms, to see if the definitions changed.
3853 - Verified that definitions for Cased and Case_Ignorable are unchanged.
3854 The gencase tool now parses the newly public Case_Ignorable values
3855 in case the definition changes in the future.
3857 * uchar.c & uprops.h & uprops.c & genprops
3858 - new numeric values that didn't exist in Unicode data before:
3859 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
3860 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
3861 therefore redesign the encoding of numeric types and values for formatVersion 6;
3862 design for simple numbers up to at least 144 ("one gross"),
3863 large values up to at least 10^20,
3864 and fractions with numerators -1..17 and denominators 1..16
3865 to cover current and expected future values
3866 (e.g., more Han numeric values, Meroitic twelfths)
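 Note: to illustrate the fraction part of the design above, here is one
 possible way to pack a numerator in -1..17 and a denominator in 1..16 into
 a single small integer. This is an assumed layout for illustration only; it
 is not claimed to be the actual uprops.icu formatVersion 6 encoding.

```python
NUMERATOR_MIN, NUMERATOR_MAX = -1, 17
DENOMINATOR_MIN, DENOMINATOR_MAX = 1, 16


def encode_fraction(numerator, denominator):
    """Pack a fraction into one integer: numerator in the high bits,
    denominator in the low 4 bits (assumed layout)."""
    if not NUMERATOR_MIN <= numerator <= NUMERATOR_MAX:
        raise ValueError("numerator out of range")
    if not DENOMINATOR_MIN <= denominator <= DENOMINATOR_MAX:
        raise ValueError("denominator out of range")
    return ((numerator - NUMERATOR_MIN) << 4) | (denominator - DENOMINATOR_MIN)


def decode_fraction(code):
    """Inverse of encode_fraction()."""
    return (code >> 4) + NUMERATOR_MIN, (code & 0xF) + DENOMINATOR_MIN


# The numeric values listed above as new in Unicode 5.2.
new_values = [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]
codes = [encode_fraction(n, d) for n, d in new_values]
decoded = [decode_fraction(code) for code in codes]
print(list(zip(new_values, codes)))
```

 With 4 bits for the denominator, values such as 1/10 and 3/16 fit, which a
 single-digit denominator field could not hold.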
3868 * reimplement Hangul_Syllable_Type for new Jamo characters
3869 - the old code assumed that all Jamo characters are in the 11xx block
3870 - Unicode 5.2 fills holes there and adds new Jamo characters in
3871 A960..A97F; Hangul Jamo Extended-A
3873 D7B0..D7FF; Hangul Jamo Extended-B
3874 - Hangul_Syllable_Type can be trivially derived from a subset of
3875 Grapheme_Cluster_Break values
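 Note: the derivation mentioned above can be sketched as a simple lookup.
 The short GCB value names are the standard property value aliases; the
 function name is hypothetical and this is not the ICU implementation.

```python
# Grapheme_Cluster_Break values that correspond to a Hangul_Syllable_Type.
GCB_TO_HST = {
    "L": "Leading_Jamo",
    "V": "Vowel_Jamo",
    "T": "Trailing_Jamo",
    "LV": "LV_Syllable",
    "LVT": "LVT_Syllable",
}


def hangul_syllable_type(gcb):
    """Derive Hangul_Syllable_Type from a GCB short value name."""
    return GCB_TO_HST.get(gcb, "Not_Applicable")


print(hangul_syllable_type("LV"))
print(hangul_syllable_type("EX"))
```

 Because the lookup does not depend on code point ranges, it keeps working
 when new Jamo are added outside the 11xx block.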
3877 * build Unicode data source code for hardcoding core data
3878 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
3880 ICU data make path is \svn\icuproj\icu\trunk\source\data\
3881 ICU root path is \svn\icuproj\icu\trunk
3882 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3883 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
3884 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
3885 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
3886 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
3887 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
3888 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
3889 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
3890 Creating data file for Unicode Property Names
3891 Creating data file for Unicode Character Properties
3892 Creating data file for Unicode Case Mapping Properties
3893 Creating data file for Unicode BiDi/Shaping Properties
3894 Creating data file for Unicode Normalization
3895 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
3896 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
3898 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
3899 and rebuild the common library
3903 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
3904 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
3905 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
3906 [ Begin obsolete instructions:
3907 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
3908 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
3910 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
3911 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
3912 End obsolete instructions]
3913 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
3914 not just the *_STUB.txt files
3915 - note on intltest: if collate/UCAConformanceTest fails, then
3916 utility/MultithreadTest/TestCollators will fail as well;
3917 fix the conformance test before looking into the multi-thread test
3919 *** Implement Cased & Case_Ignorable properties
3920 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
3921 - Problem: These properties should be disjoint, but aren't
3922 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
3923 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
3925 *** Implement Changes_When_Xyz properties
3926 - without stored data
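 Note: "without stored data" works because these properties are defined in
 terms of case mappings on the NFD form. The sketch below uses Python's
 unicodedata and str methods as stand-ins for the ICU functions; results
 depend on the Unicode version of the Python build, and titlecasing is left
 out for brevity.

```python
import unicodedata


def _nfd(ch):
    return unicodedata.normalize("NFD", ch)


def changes_when_lowercased(ch):
    """True if lowercasing the NFD form changes it."""
    nfd = _nfd(ch)
    return nfd.lower() != nfd


def changes_when_uppercased(ch):
    """True if uppercasing the NFD form changes it."""
    nfd = _nfd(ch)
    return nfd.upper() != nfd


def changes_when_casefolded(ch):
    """True if case folding the NFD form changes it."""
    nfd = _nfd(ch)
    return nfd.casefold() != nfd


for ch in ("A", "a", "\u00df", "1"):
    print(repr(ch),
          changes_when_lowercased(ch),
          changes_when_uppercased(ch),
          changes_when_casefolded(ch))
```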
3928 *** Implement Name_Alias property
3929 - add it as another name field in unames.icu
3930 - make it available via u_charName() and UCharNameChoice
3931 - consider it in u_charFromName()
3935 * Update break iterator rules to new UAX versions and new property values
3936 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
3938 *** new BidiTest file
3939 - review format and data
3940 - copy BidiTest.txt to source/test/testdata
3941 - write test code using this data
3942 - fix ICU code where it fails the conformance test
3945 - generally, find and update code corresponding to C/C++
3946 - UCharacter.UnicodeBlock constants:
3947 a) add an _ID integer per new block, update COUNT
3948 b) add a class instance per new block
3949 Visual Studio regex:
3950 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
3951 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3952 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
3954 - port test changes to Java
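 Note: the Visual Studio find/replace for UCharacter.UnicodeBlock above can
 be approximated in Python as sketched here. The "{...}" groups become
 "(...)" groups; the sample enum line and its value are illustrative only.

```python
import re

UBLOCK_ENUM = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
# \g<1> is used instead of \1 so the reference is unambiguous before "_ID".
JAVA_INSTANCE = (
    r'public static final UnicodeBlock \g<1> = '
    r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'
)


def ublock_to_java(line):
    """Turn one UBLOCK_ enum line into a UnicodeBlock instance line."""
    return UBLOCK_ENUM.sub(JAVA_INSTANCE, line)


# Illustrative input only.
java_line = ublock_to_java("    UBLOCK_SAMARITAN = 172, /*[0800]*/")
print(java_line)
```

 The matching _ID integer constants and the COUNT update still have to be
 added separately, as listed in step a) above.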
3956 *** LayoutEngine script information
3958 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
3960 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
3961 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
3962 ScriptRunData.cpp, which is no longer needed.)
3964 The generated files have a current copyright date and "@draft" statement.
3966 -> Eric Mader wrote in email on 20090930:
3967 "I think the tool has been modified to update @draft to @stable for
3968 older scripts and to add @draft for new scripts.
3969 (I worked with an intern on this last year.)
3970 You should check the output after you run it."
3972 * copy the above files into <icu>/source/layout, replacing the old files.
3973 * fix mixed line endings
3974 * review the diffs and fix incorrect @draft and missing aliases
3975 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3977 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
3978 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
3980 -> Eric Mader wrote in email on 20090930:
3981 "This is just a matter of making sure that all the per-script tables have
3982 entries for any new scripts that were added.
3983 If any new Indic characters were added, then the class tables in
3984 IndicClassTables.cpp should be updated to reflect this.
3985 John Emmons should know how to do this if it's required."
3987 * rebuild the layout and layoutex libraries.
3991 + Jamo_Short_Name, sfc->scf, binary property value aliases
3993 ---------------------------------------------------------------------------- ***
3997 *** related ICU Trac tickets
3999 5696 Update to Unicode 5.1
4001 *** Unicode version numbers
4004 - configure.in & configure
4005 - update ucdVersion in gennames.c if an algorithmic range changes
4007 *** data files & enums & parser code
4011 DerivedCoreProperties.txt
4012 DerivedNormalizationProps.txt
4013 NormalizationTest.txt
4016 GraphemeBreakProperty.txt
4017 SentenceBreakProperty.txt
4018 WordBreakProperty.txt
4019 - ucdstrip and ucdmerge:
4023 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4024 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
4025 copy 5.1.0\ucd\Blocks.txt ..\unidata\
4026 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
4027 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
4028 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4029 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4030 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4031 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4032 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
4033 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
4034 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
4035 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
4036 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
4038 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4039 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4040 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4041 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
4042 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4043 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4044 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4045 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4046 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4047 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4051 + cd \svn\icuproj\icu\uni51\source\tools\genpname
4052 + make sure that data.h is writable
4053 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
4054 + preparse.pl complains with errors like the following:
4055 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
4056 This is because ICU 3.8 had scripts from ISO 15924 which are now
4057 added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
4058 and PropertyValueAliases.txt.
4059 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4060 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
4061 + PropertyValueAliases.txt now explicitly contains values for boolean properties:
4062 N/Y, No/Yes, F/T, False/True
4063 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
4064 It will use further values from the file if present.
4066 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4067 - new block & script values
4069 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
4070 (removed from SyntheticPropertyValueAliases.txt)
4071 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
4072 (added to SyntheticPropertyValueAliases.txt)
4073 - uprops.icu (uprops.h) only provides 7 bits for script codes.
4074 In ICU 4.0 there are now USCRIPT_CODE_LIMIT=130 script codes.
4075 No script code above 127 is yet the script of an
4076 assigned Unicode character, so ICU 4.0 uprops.icu does not store any
4077 script code values greater than 127.
4078 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
4079 in a parallel bit field, and that now overflows.
4080 Also, future values >=128 would be incompatible anyway.
4081 uprops.h is modified to move around several of the bit fields
4082 in the properties vector words, and now uses 8 bits for the script code.
4083 Two other bit fields also grow to accommodate future growth:
4084 Block (current count: 172) grows from 8 to 9 bits,
4085 and Word_Break grows from 4 to 5 bits.
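The arithmetic behind the wider script field can be checked with a short sketch. The helper names are hypothetical; only USCRIPT_CODE_LIMIT=130 and the field widths come from the notes above.

```python
def bits_needed(max_value):
    """Smallest number of bits that can hold max_value (at least 1)."""
    return max(1, max_value.bit_length())


def fits(value, bits):
    """True if value can be stored in an unsigned field of the given width."""
    return 0 <= value < (1 << bits)


USCRIPT_CODE_LIMIT = 130
max_script = USCRIPT_CODE_LIMIT - 1

print(fits(127, 7))             # True: the largest value a 7-bit field can hold
print(fits(max_script, 7))      # False: 129 overflows the old 7-bit field
print(bits_needed(max_script))  # 8
```

The same check shows that the 9-bit Block field and the 5-bit Word_Break field are headroom for future growth rather than an immediate need, assuming Block values run from 0 to 171.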
4086 - renamed short alias of property Simple_Case_Folding (sfc->scf)
4087 + nothing to be done: handled as normal alias
4088 - new property JSN Jamo_Short_Name
4089 + no new API: only contributes to the Name property
4090 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
4091 - new Joining Group (JG) value: Burushashki_Yeh_Barree
4092 - new Sentence_Break (SB) values:
4097 - new Word_Break (WB) values:
4099 WB ; Extend ; Extend
4103 * Further changes in the 2008-02-29 update:
4104 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
4105 because they should not normally be invisible.
4106 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
4107 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
4108 - new Word_Break (WB) value: NL=Newline
4110 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
4111 - Unihan range end moves from 9FBB to 9FC3
4112 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
4113 + do change gennames.c
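The search for the old range end and limit can be sketched as a small scan over source lines. The function names and sample lines are hypothetical; the pattern it builds for 0x9FBB matches the same strings as the regex 9FB[BC] above.

```python
import re


def end_or_limit_pattern(old_end):
    """Case-insensitive regex for the old range end and the limit (end + 1)."""
    codes = ("%04X" % old_end, "%04X" % (old_end + 1))
    return re.compile("|".join(codes), re.IGNORECASE)


def find_hardcoded(lines, old_end):
    """Return (line number, line) pairs that mention the old end or limit."""
    pattern = end_or_limit_pattern(old_end)
    return [(n, line) for n, line in enumerate(lines, 1) if pattern.search(line)]


# Made-up sample lines for illustration only.
sample = ["    0x4e00, 0x9fbb,", "if (c < 0x9FBC) {", "    0x3400, 0x4db5,"]
for number, line in find_hardcoded(sample, 0x9FBB):
    print(number, line.strip())
```

Each hit still needs manual review, because some hardcoded ranges must stay unchanged for binary compatibility.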
4115 * build Unicode data source code for hardcoding core data
4116 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
4118 ICU data make path is \svn\icuproj\icu\uni51\source\data\
4119 ICU root path is \svn\icuproj\icu\uni51
4120 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4121 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4122 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4123 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4124 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4125 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4126 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4127 Creating data file for Unicode Character Properties
4128 Creating data file for Unicode Case Mapping Properties
4129 Creating data file for Unicode BiDi/Shaping Properties
4130 Creating data file for Unicode Normalization
4131 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
4132 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
4134 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
4135 and rebuild the common library
4139 * Update break iterator rules to new UAX versions and new property values
4143 * update FractionalUCA.txt and UCARules.txt with new canonical closure
4146 - Test that APIs using Unicode property value aliases (like UnicodeSet)
4147 support all of the boolean values N/Y, No/Yes, F/T, False/True
4148 -> TestBinaryValues() tests in both cintltst and intltest
4150 *** LayoutEngine script information
4151 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4152 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4153 ScriptRunData.cpp, which is no longer needed.)
4155 The generated files have a current copyright date and "@draft" statement.
4157 * copy the above files into <icu>/source/layout, replacing the old files.
4159 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4160 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4162 * rebuild the layout and layoutex libraries.
4166 + Jamo_Short_Name, sfc->scf, binary property value aliases
4168 ---------------------------------------------------------------------------- ***
4172 *** related Jitterbugs
4174 5084 RFE: Update to Unicode 5.0
4176 *** data files & enums & parser code
4180 DerivedCoreProperties.txt
4181 DerivedNormalizationProps.txt
4182 NormalizationTest.txt
4185 GraphemeBreakProperty.txt
4186 SentenceBreakProperty.txt
4187 WordBreakProperty.txt
4188 - ucdstrip and ucdmerge:
4192 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4193 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
4194 copy 5.0.0\ucd\Blocks.txt ..\unidata\
4195 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
4196 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
4197 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4198 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4199 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4200 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4201 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
4202 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
4203 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
4204 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
4205 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
4207 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4208 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4209 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4210 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
4211 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4212 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4213 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4214 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4215 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4216 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4218 * update FractionalUCA.txt and UCARules.txt with new canonical closure
4222 + make sure that data.h is writable
4223 + perl preparse.pl \cvs\oss\icu > out.txt
4225 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4226 - new block & script values
4227 + script values already added in ICU 3.6 because all of ISO 15924 is now covered
4229 * build Unicode data source code for hardcoding core data
4230 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
4232 ICU data make path is \cvs\oss\icu\source\data\
4233 ICU root path is \cvs\oss\icu
4234 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4236 Creating data file for Unicode Character Properties
4237 Creating data file for Unicode Case Mapping Properties
4238 Creating data file for Unicode BiDi/Shaping Properties
4239 Creating data file for Unicode Normalization
4240 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
4241 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
4243 - copy the .c source files to C:\cvs\oss\icu\source\common
4244 and rebuild the common library
4246 *** Unicode version numbers
4251 *** LayoutEngine script information
4252 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4253 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4254 ScriptRunData.cpp, which is no longer needed.)
4256 The generated files have a current copyright date and "@draft" statement.
4258 * copy the above files into <icu>/source/layout, replacing the old files.
4260 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4261 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4263 * rebuild the layout and layoutex libraries.
4265 ---------------------------------------------------------------------------- ***
4269 *** related Jitterbugs
4271 4332 RFE: Update to Unicode 4.1
4272 4157 RBBI, TR29 4.1 updates
4274 *** data files & enums & parser code
4278 DerivedCoreProperties.txt
4279 DerivedNormalizationProps.txt
4280 NormalizationTest.txt
4281 GraphemeBreakProperty.txt
4282 SentenceBreakProperty.txt
4283 WordBreakProperty.txt
4284 - ucdstrip and ucdmerge:
4288 * add new files to the repository
4289 GraphemeBreakProperty.txt
4290 SentenceBreakProperty.txt
4291 WordBreakProperty.txt
4293 * update FractionalUCA.txt and UCARules.txt with new canonical closure
4296 - handle new enumerated properties in sub read_uchar
4299 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4300 - new binary properties
4302 + Pattern_White_Space
4303 - new enumerated properties
4304 + Grapheme_Cluster_Break
4307 - new block & script & line break values
4310 - case-ignorable changes
4311 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4312 now: (D47a) Word_Break=MidLetter or General_Category=Mn, Me, Cf, Lm, Sk
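A minimal sketch of the D47a case-ignorable test, assuming that Mn, Me, Cf, Lm and Sk are General_Category values. Python's unicodedata module does not expose Word_Break, so the MidLetter set is passed in; the sample set below is an assumed, incomplete subset, and the real set must come from WordBreakProperty.txt. The results also depend on the Unicode version of the Python runtime, not on Unicode 4.1.

```python
import unicodedata

CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Assumption: illustrative subset of Word_Break=MidLetter characters
# (U+00B7 MIDDLE DOT, U+2027 HYPHENATION POINT).
SAMPLE_MID_LETTER = {"\u00B7", "\u2027"}


def is_case_ignorable(ch, mid_letter=SAMPLE_MID_LETTER):
    """D47a: Word_Break=MidLetter, or General_Category in Mn, Me, Cf, Lm, Sk."""
    return ch in mid_letter or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES


for ch in ("a", "\u0301", "\u00AD", "\u00B7"):
    print("U+%04X" % ord(ch), is_case_ignorable(ch))
```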
4314 *** Unicode version numbers
4320 - verify that u_charMirror() round-trips
4321 - test all new properties and some new values of old properties
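The round-trip check for mirroring can be sketched directly against BidiMirroring.txt-style lines ("0028; 0029 # LEFT PARENTHESIS"). Here char_mirror is a hypothetical stand-in for u_charMirror(), not the ICU API, and the sample data is made up to include one deliberately broken pair.

```python
def parse_mirroring(lines):
    """Parse 'XXXX; YYYY # comment' lines into a code point mapping."""
    pairs = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        source, target = (int(f.strip(), 16) for f in line.split(";")[:2])
        pairs[source] = target
    return pairs


def char_mirror(pairs, c):
    """Return the mirrored code point, or c itself if there is none."""
    return pairs.get(c, c)


def round_trip_failures(pairs):
    """List code points for which mirroring twice does not return the original."""
    return [c for c in sorted(pairs)
            if char_mirror(pairs, char_mirror(pairs, c)) != c]


sample = [
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "003C; 003E # LESS-THAN SIGN",  # reverse mapping deliberately missing
]
print(["%04X" % c for c in round_trip_failures(parse_mirroring(sample))])
```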
4325 * hardcoded Unihan range end/limit
4326 - Unihan range end moves from 9FA5 to 9FBB
4327 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
4328 + do not modify BOCU/BOCSU code because that would change the encoding
4329 and break binary compatibility!
4330 + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
4332 + ignore trietest.c: test data is arbitrary
4333 + ignore tstnorm.cpp: test optimization, not important
4334 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
4335 + do change line_th.txt and word_th.txt
4336 by replacing hardcoded ranges with the new property values
4337 + do change gennames.c
4339 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4340 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4341 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
4344 - compare new special casing context conditions with previous ones
4345 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4348 - consider storing only the short name if it is the same as the long name
4351 - UAX #29 changes (grapheme/word/sentence breaks)
4352 - UAX #14 changes (line breaks)
4353 - Pattern_Syntax & Pattern_White_Space
4355 ---------------------------------------------------------------------------- ***
4357 Unicode 4.0.1 update
4359 *** related Jitterbugs
4361 3170 RFE: Update to Unicode 4.0.1
4362 3171 Add new Unicode 4.0.1 properties
4363 3520 use Unicode 4.0.1 updates for break iteration
4365 *** data files & enums & parser code
4368 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
4369 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
4372 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
4373 according to PRI #26
4374 http://www.unicode.org/review/resolved-pri.html#pri26
4375 - undone again because no corrigendum in sight;
4376 instead modified tests to not check consistency on this for Unicode 4.0.1
4379 - update from http://www.unicode.org/copyright.html
4380 formatted for plain text
4382 * uchar.h & uprops.h & uprops.c & genprops
4383 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
4384 - add U_LB_INSEPARABLE due to a spelling fix
4385 + put short name comment only on line with new constant
4386 for genpname perl script parser
4387 - new binary properties
4389 + Variation_Selector
4392 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
4393 - perl script: correctly calculate the maximum number of fields per row
4396 - new script code Hrkt=Katakana_Or_Hiragana
4398 * gennorm.c track changes in DerivedNormalizationProps.txt
4399 - "FNC" -> "FC_NFKC"
4400 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
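The two format changes can be illustrated with a small conversion sketch. The function name is hypothetical and this is not the gennorm.c code. It also assumes that the "etc." above covers the other forms (NFC, NFKD, NFKC) and that an old _MAYBE suffix becomes the value M.

```python
import re

# Old-style single fields such as NFD_NO or NFKC_MAYBE.
_OLD_QUICK_CHECK = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")


def convert_property_field(old_field):
    """Return the new-style fields for one old-style property field."""
    if old_field == "FNC":
        return ["FC_NFKC"]
    match = _OLD_QUICK_CHECK.match(old_field)
    if match:
        form, answer = match.groups()
        return [form + "_QC", answer[0]]  # NO -> N, MAYBE -> M (assumed)
    return [old_field]


print("; ".join(convert_property_field("NFD_NO")))
print("; ".join(convert_property_field("FNC")))
```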
4402 * genprops/props2.c track changes in DerivedNumericValues.txt
4403 - changed from 3 columns to 2, dropping the numeric type
4404 + assume that the type is always numeric for Han characters,
4405 and that only those are added in addition to what UnicodeData.txt lists
4407 *** Unicode version numbers
4413 - update test of default bidi classes according to PRI #28
4414 /tsutil/cucdtst/TestUnicodeData
4415 http://www.unicode.org/review/resolved-pri.html#pri28
4416 - bidi tests: change exemplar character for ES depending on Unicode version
4417 - change hardcoded expected property values where they change
4425 - use new Hrkt=Katakana_Or_Hiragana
4428 - are now part of combining character sequences
4429 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ