1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
2 * License & terms of use: http://www.unicode.org/copyright.html
3 * Copyright (C) 2004-2016, International Business Machines
4 * Corporation and others. All Rights Reserved.
6 * file name: changes.txt
8 * tab size: 8 (not used)
11 * created on: 2004may06
12 * created by: Markus W. Scherer
14 * change log for Unicode updates
16 * For each new Unicode version, during the beta period,
17 * I copy the change log for the previous version to the top of this file.
18 * I adjust the versions, tickets, URLs, and paths.
19 * I work my way through the steps listed in the log, top to bottom,
20 * adjusting the log as necessary.
21 * I report problems to the UTC and/or CLDR and/or ICU.
22 * Before the data is final, I "turn the crank" several more times,
23 * using appropriate subsets of the steps.
25 ---------------------------------------------------------------------------- ***
27 * New ISO 15924 script codes
29 Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
30 until they are encoded in Unicode,
31 or can be assumed to be encoded in the next Unicode version.
32 Script enum constant names want to follow the Unicode script property value aliases,
33 which are assigned only when the scripts are encoded.
34 When we encode scripts early and guess wrong, then we have confusing enum constants
35 and have sometimes added aliases.
37 Variant script codes like Latf and Aran that are not subject to separate encoding
38 can be added at any time.
39 (For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
41 We add script codes used in CLDR or in the spoof checker.
42 This includes combination/alias codes like Hanb and Jamo.
43 See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
44 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
46 We add special Z* script codes like Zsye.
48 For new script codes see http://www.unicode.org/iso15924/codechanges.html
50 ---------------------------------------------------------------------------- ***
52 Unicode 13.0 update for ICU 66
54 https://www.unicode.org/versions/Unicode13.0.0/
55 https://www.unicode.org/versions/beta-13.0.0.html
56 https://www.unicode.org/Public/13.0.0/ucd/
57 https://www.unicode.org/reports/uax-proposed-updates.html
58 https://www.unicode.org/reports/tr44/tr44-25.html
60 https://unicode-org.atlassian.net/browse/CLDR-13387
61 https://unicode-org.atlassian.net/browse/ICU-20893
63 * Command-line environment setup
65 UNICODE_DATA=~/unidata/uni13/20200212
66 CLDR_SRC=~/cldr/uni/src
70 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
71 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
72 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
74 *** Unicode version numbers
77 - com.ibm.icu.util.VersionInfo
78 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
80 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
81 so that the makefiles see the new version number.
82 cd $ICU_ROOT/dbg/icu4c
83 ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
85 *** data files & enums & parser code
88 - mkdir -p $UNICODE_DATA
89 - download Unicode files into $UNICODE_DATA
90 + subfolders: emoji, idna, security, ucd, uca
91 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
92 + split Unihan into single-property files
93 ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
94 + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
95 or from the ucd/cldr/ output folder of the Unicode Tools:
96 Since Unicode 12/CLDR 35/ICU 64 CLDR uses modified break rules.
97 cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
99 * for manual diffs and for Unicode Tools input data updates:
100 remove version suffixes from the file names
101 ~$ unidata/desuffixucd.py $UNICODE_DATA
102 (see https://sites.google.com/site/unicodetools/inputdata)
104 * process and/or copy files
105 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
106 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
107 + For debugging, and tweaking how ppucd.txt is written,
108 the tool has an --only_ppucd option:
109 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
111 - cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
113 * new constants for new property values
114 - preparseucd.py error:
115 ValueError: missing uchar.h enum constants for some property values:
116 [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
117 u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
118 (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
119 (u'InPC', set([u'Top_And_Bottom_And_Left']))]
120 = PropertyValueAliases.txt new property values (diff old & new .txt files)
121 blk; Chorasmian ; Chorasmian
122 blk; CJK_Ext_G ; CJK_Unified_Ideographs_Extension_G
123 blk; Dives_Akuru ; Dives_Akuru
124 blk; Khitan_Small_Script ; Khitan_Small_Script
125 blk; Lisu_Sup ; Lisu_Supplement
126 blk; Symbols_For_Legacy_Computing ; Symbols_For_Legacy_Computing
127 blk; Tangut_Sup ; Tangut_Supplement
129 -> add to uchar.h before UBLOCK_COUNT
130 use long property names for enum constants,
131 for the trailing comment get the block start code point: diff old & new Blocks.txt
132 -> add to UCharacter.UnicodeBlock IDs
133 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
134 replace public static final int \1_ID = \2; \3
135 -> add to UCharacter.UnicodeBlock objects
136 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
137 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
139 sc ; Chrs ; Chorasmian
140 sc ; Diak ; Dives_Akuru
141 sc ; Kits ; Khitan_Small_Script
143 -> uscript.h & com.ibm.icu.lang.UScript
144 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
145 and in com.ibm.icu.dev.test.lang.TestUScript.java
147 InPC; Top_And_Bottom_And_Left ; Top_And_Bottom_And_Left
148 -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
150 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
151 (not strictly necessary for NOT_ENCODED scripts)
152 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
154 * build ICU (make install)
155 to make sure that there are no syntax errors, and
156 so that the tools build can pick up the new definitions from the installed header files.
158 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
160 * update spoof checker UnicodeSet initializers:
161 inclusionPat & recommendedPat in i18n/uspoof.cpp
162 INCLUSION & RECOMMENDED in SpoofChecker.java
163 - make sure that the Unicode Tools tree contains the latest security data files
164 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
165 - update the hardcoded version number there in the DIRECTORY path
166 - run the tool (no special environment variables needed)
167 - copy & paste from the Console output into the .cpp & .java files
169 * generate normalization data files
170 cd $ICU_ROOT/dbg/icu4c
171 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
172 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
173 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
174 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
175 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
177 * build ICU (make install)
178 so that the tools build can pick up the new definitions from the installed header files.
180 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
182 * build Unicode tools using CMake+make
184 $ICU_SRC/tools/unicode/c/icudefs.txt:
186 # Location (--prefix) of where ICU was installed.
187 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
188 # Location of the ICU4C source tree.
189 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
192 mkdir -p tools/unicode/c
195 $ICU_ROOT/dbg/tools/unicode/c$
196 cmake ../../../../src/tools/unicode/c
199 * generate core properties data files
200 $ICU_ROOT/dbg/tools/unicode/c$
201 genprops/genprops $ICU_SRC/icu4c
203 genprops: Script_Extensions indexes overflow bit field
204 genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
205 -> uprops.icu data file format :
206 add two more bits to store a script code or Script_Extensions index
207 -> generator code, C++ & Java runtime, uprops.icu format version 7.7
208 - rebuild ICU (make install) & tools
210 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
211 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
212 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
213 - Unicode 6.0..13.0: U+2260, U+226E, U+226F
214 - nothing new in this Unicode version, no test file to update
216 * run & fix ICU4C tests
217 - fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
218 - Andy helps with RBBI & spoof check test failures
220 * collation: CLDR collation root, UCA DUCET
222 - UCA DUCET goes into Mark's Unicode tools, see
223 https://sites.google.com/site/unicodetools/home#TOC-UCA
224 diff the main mapping file, look for bad changes
225 (for example, more bytes per weight for common characters)
226 ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
227 ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt
229 - CLDR root data files are checked into $CLDR_SRC/common/uca/
230 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
232 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
233 cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
234 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
235 cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
236 (note removing the underscore before "Rules")
237 cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
238 - restore TODO diffs in UCARules.txt
239 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
240 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
241 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
242 from the CLDR root files (..._CLDR_..._SHORT.txt)
243 cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
244 cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
245 cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
246 - if CLDR common/uca/unihan-index.txt changes, then update
247 CLDR common/collation/root.xml <collation type="private-unihan">
248 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
251 $ICU_ROOT/dbg/tools/unicode/c$
252 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
253 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
257 https://sites.google.com/site/unicodetools/unihan
259 org.unicode.draft.GenerateUnihanCollators
262 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
263 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
264 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
265 -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
268 org.unicode.draft.GenerateUnihanCollatorFiles
269 with the same arguments
272 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
273 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
276 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
277 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
278 - run CLDR unit tests, commit to CLDR
279 - generate ICU zh collation data: run CLDR
280 org.unicode.cldr.icu.NewLdml2IcuConverter
281 with program arguments
283 -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
284 -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
285 -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
286 -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
290 -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
293 * run & fix ICU4C tests, now with new CLDR collation root data
294 - run all tests with the collation test data *_SHORT.txt or the full files
295 (the full ones have comments, useful for debugging)
296 - note on intltest: if collate/UCAConformanceTest fails, then
297 utility/MultithreadTest/TestCollators will fail as well;
298 fix the conformance test before looking into the multi-thread test
300 * update Java data files
301 - refresh just the UCD/UCA-related/derived files, just to be safe
302 - see (ICU4C)/source/data/icu4j-readme.txt
303 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
304 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
307 make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
308 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
309 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
310 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
311 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
312 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
313 mkdir -p /tmp/icu4j/main/shared/data
314 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
315 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
316 mkdir -p /tmp/icu4j/main/shared/data
317 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
318 make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
319 - copy the big-endian Unicode data files to another location,
320 separate from the other data files,
321 and then refresh ICU4J
322 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
323 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
324 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
325 cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
326 cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
327 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
328 cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
329 cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
330 cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
331 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
333 * When refreshing all of ICU4J data from ICU4C
334 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
335 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
337 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
339 * update CollationFCD.java
340 + copy & paste the initializers of lcccIndex[] etc. from
341 ICU4C/source/i18n/collationfcd.cpp to
342 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
344 * refresh Java test .txt files
345 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
346 cd $ICU_SRC/icu4c/source/data/unidata
347 cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
348 cd ../../test/testdata
349 cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
350 cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
352 * run & fix ICU4J tests
355 - send notice to icu-design about new born-@stable API (enum constants etc.)
357 *** CLDR numbering systems
358 - look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
359 for example, look for
360 ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
361 in new blocks (Blocks.txt)
363 diak 11950..11959 Dives_Akuru
365 *** merge the Unicode update branches back onto the trunk
366 - do not merge the icudata.jar and testdata.jar,
367 instead rebuild them from merged & tested ICU4C
368 - make sure that changes to Unicode tools are checked in:
369 http://www.unicode.org/utility/trac/log/trunk/unicodetools
371 ---------------------------------------------------------------------------- ***
373 Unicode 12.1 update for ICU 64.2
375 ** This is an abbreviated update with one new character for the new
376 ** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
377 https://en.wikipedia.org/wiki/Reiwa_period
379 http://www.unicode.org/versions/Unicode12.1.0/
381 ICU-20497 Unicode 12.1
383 cldrbug 11978: Unicode 12.1
385 * Command-line environment setup
387 UNICODE_DATA=~/unidata/uni121/20190403
388 CLDR_SRC=~/svn.cldr/uni
390 ICU_SRC=$ICU_ROOT/src
392 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
393 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
394 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
396 *** Unicode version numbers
399 - com.ibm.icu.util.VersionInfo
400 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
402 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
403 so that the makefiles see the new version number.
404 cd $ICU_ROOT/dbg/icu4c
405 ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
407 *** data files & enums & parser code
410 - mkdir -p $UNICODE_DATA
411 - download Unicode files into $UNICODE_DATA
412 + subfolders: emoji, idna, security, ucd, uca
413 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
415 * for manual diffs and for Unicode Tools input data updates:
416 remove version suffixes from the file names
417 ~$ unidata/desuffixucd.py $UNICODE_DATA
418 (see https://sites.google.com/site/unicodetools/inputdata)
420 * process and/or copy files
421 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
422 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
423 + For debugging, and tweaking how ppucd.txt is written,
424 the tool has an --only_ppucd option:
425 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
427 - cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
429 * build ICU (make install)
430 so that the tools build can pick up the new definitions from the installed header files.
432 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
434 * update spoof checker UnicodeSet initializers:
435 inclusionPat & recommendedPat in uspoof.cpp
436 INCLUSION & RECOMMENDED in SpoofChecker.java
437 - make sure that the Unicode Tools tree contains the latest security data files
438 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
439 - update the hardcoded version number there in the DIRECTORY path
440 - run the tool (no special environment variables needed)
441 - copy & paste from the Console output into the .cpp & .java files
443 * generate normalization data files
444 cd $ICU_ROOT/dbg/icu4c
445 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
446 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
447 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
448 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
449 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
451 * build ICU (make install)
452 so that the tools build can pick up the new definitions from the installed header files.
454 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
456 * build Unicode tools using CMake+make
458 $ICU_SRC/tools/unicode/c/icudefs.txt:
460 # Location (--prefix) of where ICU was installed.
461 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
462 # Location of the ICU4C source tree.
463 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
466 mkdir -p tools/unicode/c
469 $ICU_ROOT/dbg/tools/unicode/c$
470 cmake ../../../../src/tools/unicode/c
473 * generate core properties data files
474 $ICU_ROOT/dbg/tools/unicode/c$
475 genprops/genprops $ICU_SRC/icu4c
476 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
477 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
478 - rebuild ICU (make install) & tools
480 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
481 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
482 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
483 - Unicode 6.0..12.1: U+2260, U+226E, U+226F
484 - nothing new in this Unicode version, no test file to update
486 * run & fix ICU4C tests
487 - Andy handles RBBI & spoof check test failures
489 * collation: CLDR collation root, UCA DUCET
491 - UCA DUCET goes into Mark's Unicode tools, see
492 https://sites.google.com/site/unicodetools/home#TOC-UCA
493 diff the main mapping file, look for bad changes
494 (for example, more bytes per weight for common characters)
495 ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
496 ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
498 - CLDR root data files are checked into $CLDR_SRC/common/uca/
499 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
501 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
502 cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
503 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
504 cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
505 (note removing the underscore before "Rules")
506 cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
507 - restore TODO diffs in UCARules.txt
508 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
509 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
510 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
511 from the CLDR root files (..._CLDR_..._SHORT.txt)
512 cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
513 cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
514 cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
515 - if CLDR common/uca/unihan-index.txt changes, then update
516 CLDR common/collation/root.xml <collation type="private-unihan">
517 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
519 - run genuca, see command line above
523 https://sites.google.com/site/unicodetools/unihan
525 org.unicode.draft.GenerateUnihanCollators
528 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
529 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
530 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
531 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
534 org.unicode.draft.GenerateUnihanCollatorFiles
535 with the same arguments
538 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
539 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
542 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
543 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
544 - run CLDR unit tests, commit to CLDR
545 - generate ICU zh collation data: run CLDR
546 org.unicode.cldr.icu.NewLdml2IcuConverter
547 with program arguments
549 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
550 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
551 -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
552 -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
556 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
559 * run & fix ICU4C tests, now with new CLDR collation root data
560 - run all tests with the collation test data *_SHORT.txt or the full files
561 (the full ones have comments, useful for debugging)
562 - note on intltest: if collate/UCAConformanceTest fails, then
563 utility/MultithreadTest/TestCollators will fail as well;
564 fix the conformance test before looking into the multi-thread test
566 * update Java data files
567 - refresh just the UCD/UCA-related/derived files, just to be safe
568 - see (ICU4C)/source/data/icu4j-readme.txt
569 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
570 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
573 make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
574 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
575 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
576 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
577 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
578 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
579 mkdir -p /tmp/icu4j/main/shared/data
580 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
581 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
582 mkdir -p /tmp/icu4j/main/shared/data
583 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
584 make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
585 - copy the big-endian Unicode data files to another location,
586 separate from the other data files,
587 and then refresh ICU4J
588 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
589 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
590 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
591 cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
592 cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
593 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
594 cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
595 cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
596 cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
597 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
599 * When refreshing all of ICU4J data from ICU4C
600 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
601 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
603 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
605 * update CollationFCD.java
606 + copy & paste the initializers of lcccIndex[] etc. from
607 ICU4C/source/i18n/collationfcd.cpp to
608 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
610 * refresh Java test .txt files
611 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
612 cd $ICU_SRC/icu4c/source/data/unidata
613 cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
614 cd ../../test/testdata
615 cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
616 cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
618 * run & fix ICU4J tests
621 - send notice to icu-design about new born-@stable API (enum constants etc.)
623 *** CLDR numbering systems
624 - look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
625 for example, look for
626 ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
627 in new blocks (Blocks.txt)
628 Unicode 12: using Unicode 12 CLDR ticket #11478
629 hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
630 wcho 1E2F0..1E2F9 Wancho
631 Unicode 11: using Unicode 11 CLDR ticket #10978
632 rohg 10D30..10D39 Hanifi_Rohingya
633 gong 11DA0..11DA9 Gunjala_Gondi
634 Earlier: CLDR tickets specific to adding new numbering systems.
635 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
636 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
638 *** merge the Unicode update branches back onto the trunk
639 - do not merge the icudata.jar and testdata.jar,
640 instead rebuild them from merged & tested ICU4C
641 - make sure that changes to Unicode tools are checked in:
642 http://www.unicode.org/utility/trac/log/trunk/unicodetools
644 ---------------------------------------------------------------------------- ***
646 Unicode 12.0 update for ICU 64
648 http://www.unicode.org/versions/Unicode12.0.0/
649 http://unicode.org/versions/beta-12.0.0.html
650 https://www.unicode.org/review/pri389/
651 http://www.unicode.org/reports/uax-proposed-updates.html
652 http://www.unicode.org/reports/tr44/tr44-23.html
656 ICU-20111 move text layout properties data into a data file
658 cldrbug 11478: Unicode 12
659 Accidentally used ^/trunk instead of ^/branches/markus/uni12
661 * Command-line environment setup
663 UNICODE_DATA=~/unidata/uni12/20190309
664 CLDR_SRC=~/svn.cldr/uni
666 ICU_SRC=$ICU_ROOT/src
668 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
669 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
670 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
672 *** Unicode version numbers
675 - com.ibm.icu.util.VersionInfo
676 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
678 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
679 so that the makefiles see the new version number.
681 *** data files & enums & parser code
684 - mkdir -p $UNICODE_DATA
685 - download Unicode files into $UNICODE_DATA
686 + subfolders: emoji, idna, security, ucd, uca
687 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
689 * for manual diffs and for Unicode Tools input data updates:
690 remove version suffixes from the file names
691 ~$ unidata/desuffixucd.py $UNICODE_DATA
692 (see https://sites.google.com/site/unicodetools/inputdata)
694 * process and/or copy files
695 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
696 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
697 + For debugging, and tweaking how ppucd.txt is written,
698 the tool has an --only_ppucd option:
699 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
701 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
703 * build ICU (make install)
704 so that the tools build can pick up the new definitions from the installed header files.
706 $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
708 * new constants for new property values
709 - preparseucd.py error:
710 ValueError: missing uchar.h enum constants for some property values:
711 [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
712 u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
713 u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
714 (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
715 = PropertyValueAliases.txt new property values (diff old & new .txt files)
716 blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
717 blk; Elymaic ; Elymaic
718 blk; Nandinagari ; Nandinagari
719 blk; Nyiakeng_Puachue_Hmong ; Nyiakeng_Puachue_Hmong
720 blk; Ottoman_Siyaq_Numbers ; Ottoman_Siyaq_Numbers
721 blk; Small_Kana_Ext ; Small_Kana_Extension
722 blk; Symbols_And_Pictographs_Ext_A ; Symbols_And_Pictographs_Extended_A
723 blk; Tamil_Sup ; Tamil_Supplement
726 use long property names for enum constants,
727 for the trailing comment get the block start code point: diff old & new Blocks.txt
728 -> add to UCharacter.UnicodeBlock IDs
729 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
730 replace public static final int \1_ID = \2; \3
731 -> add to UCharacter.UnicodeBlock objects
732 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
733 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3
736 sc ; Hmnp ; Nyiakeng_Puachue_Hmong
737 sc ; Nand ; Nandinagari
739 -> uscript.h & com.ibm.icu.lang.UScript
740 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
741 and in com.ibm.icu.dev.test.lang.TestUScript.java
743 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
744 (not strictly necessary for NOT_ENCODED scripts)
745 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
747 * update spoof checker UnicodeSet initializers:
748 inclusionPat & recommendedPat in uspoof.cpp
749 INCLUSION & RECOMMENDED in SpoofChecker.java
750 - make sure that the Unicode Tools tree contains the latest security data files
751 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
752 - update the hardcoded version number there in the DIRECTORY path
753 - run the tool (no special environment variables needed)
754 - copy & paste from the Console output into the .cpp & .java files
756 * generate normalization data files
757 cd $ICU_ROOT/dbg/icu4c
758 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
759 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
760 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
761 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
762 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
764 * build ICU (make install)
765 so that the tools build can pick up the new definitions from the installed header files.
767 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
769 * build Unicode tools using CMake+make
771 $ICU_SRC/tools/unicode/c/icudefs.txt:
773 # Location (--prefix) of where ICU was installed.
774 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
775 # Location of the ICU4C source tree.
776 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
779 mkdir -p tools/unicode/c
782 $ICU_ROOT/dbg/tools/unicode/c$
783 cmake ../../../../src/tools/unicode/c
786 * generate core properties data files
787 $ICU_ROOT/dbg/tools/unicode/c$
788 genprops/genprops $ICU_SRC/icu4c
789 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
790 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
791 - rebuild ICU (make install) & tools
793 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
794 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
795 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
796 - Unicode 6.0..12.0: U+2260, U+226E, U+226F
797 - nothing new in this Unicode version, no test file to update
799 * run & fix ICU4C tests
800 - update test of default bidi classes:
801 Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
802 see diffs in DerivedBidiClass.txt
803 + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
804 + UCharacterTest.java TestIteration() defaultBidi[]
805 - Andy handles RBBI & spoof check test failures
807 * collation: CLDR collation root, UCA DUCET
809 - UCA DUCET goes into Mark's Unicode tools, see
810 https://sites.google.com/site/unicodetools/home#TOC-UCA
811 diff the main mapping file, look for bad changes
812 (for example, more bytes per weight for common characters)
813 ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
814 ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
816 - CLDR root data files are checked into $CLDR_SRC/common/uca/
817 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
819 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
820 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
821 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
822 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
823 (note removing the underscore before "Rules")
824 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
825 - restore TODO diffs in UCARules.txt
826 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
827 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
828 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
829 from the CLDR root files (..._CLDR_..._SHORT.txt)
830 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
831 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
832 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
833 - if CLDR common/uca/unihan-index.txt changes, then update
834 CLDR common/collation/root.xml <collation type="private-unihan">
835 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
837 - run genuca, see command line above;
839 Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
840 FDD1 119CE; [71 CD 02, 05, 05] # Nandinagari first primary (compressible)
841 (add the character to genuca.cpp sampleCharsToScripts[])
842 + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
843 and cache its values.
844 Works as long as the script metadata is updated before the collation data.
848 https://sites.google.com/site/unicodetools/unihan
850 org.unicode.draft.GenerateUnihanCollators
853 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
854 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
855 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
856 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
859 org.unicode.draft.GenerateUnihanCollatorFiles
860 with the same arguments
863 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
864 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
867 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
868 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
869 - run CLDR unit tests, commit to CLDR
870 - generate ICU zh collation data: run CLDR
871 org.unicode.cldr.icu.NewLdml2IcuConverter
872 with program arguments
874 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
875 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
876 -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
877 -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
881 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
884 * run & fix ICU4C tests, now with new CLDR collation root data
885 - run all tests with the collation test data *_SHORT.txt or the full files
886 (the full ones have comments, useful for debugging)
887 - note on intltest: if collate/UCAConformanceTest fails, then
888 utility/MultithreadTest/TestCollators will fail as well;
889 fix the conformance test before looking into the multi-thread test
891 * update Java data files
892 - refresh just the UCD/UCA-related/derived files, just to be safe
893 - see (ICU4C)/source/data/icu4j-readme.txt
894 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
895 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
898 Unicode .icu files built to ./out/build/icudt63l
899 echo timestamp > uni-core-data
900 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
901 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
902 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
903 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
904 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
905 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
906 mkdir -p /tmp/icu4j/main/shared/data
907 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
908 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
909 mkdir -p /tmp/icu4j/main/shared/data
910 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
911 make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
912 - copy the big-endian Unicode data files to another location,
913 separate from the other data files,
914 and then refresh ICU4J
915 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
916 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
917 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
918 cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
919 cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
920 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
921 cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
922 cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
923 cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
924 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
926 * When refreshing all of ICU4J data from ICU4C
927 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
928 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
930 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
932 * update CollationFCD.java
933 + copy & paste the initializers of lcccIndex[] etc. from
934 ICU4C/source/i18n/collationfcd.cpp to
935 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
937 * refresh Java test .txt files
938 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
939 cd $ICU_SRC/icu4c/source/data/unidata
940 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
941 cd ../../test/testdata
942 cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
943 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
945 * run & fix ICU4J tests
948 - send notice to icu-design about new born-@stable API (enum constants etc.)
950 *** CLDR numbering systems
951 - look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
952 for example, look for
953 ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
954 in new blocks (Blocks.txt)
955 Unicode 12: using Unicode 12 CLDR ticket #11478
956 hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
957 wcho 1E2F0..1E2F9 Wancho
958 Unicode 11: using Unicode 11 CLDR ticket #10978
959 rohg 10D30..10D39 Hanifi_Rohingya
960 gong 11DA0..11DA9 Gunjala_Gondi
961 Earlier: CLDR tickets specific to adding new numbering systems.
962 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
963 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
965 *** merge the Unicode update branches back onto the trunk
966 - do not merge the icudata.jar and testdata.jar,
967 instead rebuild them from merged & tested ICU4C
968 - make sure that changes to Unicode tools are checked in:
969 http://www.unicode.org/utility/trac/log/trunk/unicodetools
971 ---------------------------------------------------------------------------- ***
973 ICU 63 addition of ICU support of text layout properties InPC, InSC, vo
975 * Command-line environment setup
977 UNICODE_DATA=~/unidata/uni11/20180609
978 CLDR_SRC=~/svn.cldr/uni
980 ICU_SRC=$ICU_ROOT/src
982 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
983 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
984 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
988 https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
989 https://unicode-org.atlassian.net/browse/ICU-12850 vo
991 *** data files & enums & parser code
994 - for each of the three new enumerated properties
995 + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
996 + uchar.h: update UCHAR_INT_LIMIT
997 + uchar.h: add the enum U<long prop name>
998 with constants U_<short prop name>_<long value name>
999 + UProperty.java: add the constant <long prop name>
1000 + UProperty.java: update INT_LIMIT
1001 + UCharacter.java: add the interface <long prop name>
1002 with constants <long value name>
1004 * process and/or copy files
1005 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1006 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1007 + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
1009 + For debugging, and tweaking how ppucd.txt is written,
1010 the tool has an --only_ppucd option:
1011 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1013 * preparseucd.py changes
1014 - add new property short names (uppercase) to _prop_and_value_re
1015 so that ParseUCharHeader() parses the new enum constants
1017 * build ICU (make install)
1018 so that the tools build can pick up the new definitions from the installed header files.
1020 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1022 * build Unicode tools using CMake+make
1024 $ICU_SRC/tools/unicode/c/icudefs.txt:
1026 # Location (--prefix) of where ICU was installed.
1027 set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1028 # Location of the ICU4C source tree.
1029 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
1032 mkdir -p tools/unicode/c
1035 $ICU_ROOT/dbg/tools/unicode/c$
1036 cmake ../../../../../src/tools/unicode/c
1039 * generate core properties data files
1040 $ICU_ROOT/dbg/tools/unicode/c$
1041 genprops/genprops $ICU_SRC/icu4c
1042 - rebuild ICU (make install) & tools
1044 * write data for runtime, hardcoded for now
1045 - add genprops/layoutpropsbuilder.cpp with pieces from sibling files
1046 - generate new icu4c/source/common/ulayout_props_data.h
1047 - for each of the three new enumerated properties
1048 + int property max value
1049 + small, 8-bit UCPTrie
1050 (A small 16-bit trie with bit fields for these three properties
1051 is very nearly the same size as the sum of the three.)
1054 - uprops.cpp: #include ulayout_props_data.h
1055 - uprops.cpp: add getInPC() etc. functions
1056 - uprops.cpp: add lines to intProps[], include max values
1057 - uprops.h: add UPropertySource constants
1058 - uprops.cpp: add uprops_addPropertyStarts(src)
1059 - uniset_props.cpp: add to UnicodeSet_initInclusion()
1060 - intltest/ucdtest.cpp: write unit tests
1062 * update Java data files
1063 - refresh just the pnames.icu file with the new property [value] names, just to be safe
1064 - see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
1065 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1066 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1067 - copy the big-endian Unicode data files to another location,
1068 separate from the other data files,
1069 and then refresh ICU4J
1070 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1071 cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1072 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1075 - UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
1076 - UCharacterProperty.java: for each new property
1077 + create a nested class to hold its CodePointTrie
1078 + initialize it from a string literal
1079 + paste in the initializer printed by genprops
1080 + add a new IntProperty object to the intProps[] array
1081 + use the correct max int value for each property, also printed by genprops
1082 - UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
1083 - UnicodeSet.java: add to getInclusions()
1084 - UCharacterTest.java: write unit tests
1086 ---------------------------------------------------------------------------- ***
1088 Unicode 11.0 update for ICU 62
1090 http://www.unicode.org/versions/Unicode11.0.0/
1091 http://unicode.org/versions/beta-11.0.0.html
1092 https://www.unicode.org/review/pri372/
1093 http://www.unicode.org/reports/uax-proposed-updates.html
1094 http://www.unicode.org/reports/tr44/tr44-21.html
1096 * Command-line environment setup
1098 UNICODE_DATA=~/unidata/uni11/20180521
1099 CLDR_SRC=~/svn.cldr/uni
1100 ICU_ROOT=~/svn.icu/uni
1101 ICU_SRC=$ICU_ROOT/src
1103 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1104 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1105 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1109 - ticket:13630: Unicode 11
1110 - ^/branches/markus/uni11
1114 - cldrbug 10978: Unicode 11
1115 - ^/branches/markus/uni11
1117 *** Unicode version numbers
1120 - com.ibm.icu.util.VersionInfo
1121 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1123 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1124 so that the makefiles see the new version number.
1126 *** data files & enums & parser code
1129 - mkdir -p $UNICODE_DATA
1130 - download Unicode files into $UNICODE_DATA
1131 + subfolders: emoji, idna, security, ucd, uca
1132 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1134 * for manual diffs and for Unicode Tools input data updates:
1135 remove version suffixes from the file names
1136 ~$ unidata/desuffixucd.py $UNICODE_DATA
1137 (see https://sites.google.com/site/unicodetools/inputdata)
1139 * process and/or copy files
1140 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1141 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1142 + For debugging, and tweaking how ppucd.txt is written,
1143 the tool has an --only_ppucd option:
1144 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1146 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1148 * build ICU (make install)
1149 so that the tools build can pick up the new definitions from the installed header files.
1151 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1153 * preparseucd.py changes
1155 NameError: unknown property Extended_Pictographic
1156 -> add Extended_Pictographic binary property
1157 -> add new short names for all Emoji properties
1159 * new constants for new property values
1160 - preparseucd.py error:
1161 ValueError: missing uchar.h enum constants for some property values:
1162 [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
1163 u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
1164 u'Indic_Siyaq_Numbers'])),
1165 (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
1166 (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
1167 (u'GCB', set([u'LinkC', u'Virama'])),
1168 (u'WB', set([u'WSegSpace']))]
1169 = PropertyValueAliases.txt new property values (diff old & new .txt files)
1170 blk; Chess_Symbols ; Chess_Symbols
1172 blk; Georgian_Ext ; Georgian_Extended
1173 blk; Gunjala_Gondi ; Gunjala_Gondi
1174 blk; Hanifi_Rohingya ; Hanifi_Rohingya
1175 blk; Indic_Siyaq_Numbers ; Indic_Siyaq_Numbers
1176 blk; Makasar ; Makasar
1177 blk; Mayan_Numerals ; Mayan_Numerals
1178 blk; Medefaidrin ; Medefaidrin
1179 blk; Old_Sogdian ; Old_Sogdian
1180 blk; Sogdian ; Sogdian
1182 use long property names for enum constants,
1183 for the trailing comment get the block start code point: diff old & new Blocks.txt
1184 -> add to UCharacter.UnicodeBlock IDs
1185 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1186 replace public static final int \1_ID = \2; \3
1187 -> add to UCharacter.UnicodeBlock objects
1188 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1189 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1191 GCB; LinkC ; LinkingConsonant
1192 GCB; Virama ; Virama
1193 -> uchar.h & UCharacter.GraphemeClusterBreak
1194 -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
1196 InSC; Consonant_Initial_Postfixed ; Consonant_Initial_Postfixed
1197 -> ignore: ICU does not yet support this property
1199 jg ; Hanifi_Rohingya_Kinna_Ya ; Hanifi_Rohingya_Kinna_Ya
1200 jg ; Hanifi_Rohingya_Pa ; Hanifi_Rohingya_Pa
1201 -> uchar.h & UCharacter.JoiningGroup
1204 sc ; Gong ; Gunjala_Gondi
1206 sc ; Medf ; Medefaidrin
1207 sc ; Rohg ; Hanifi_Rohingya
1209 sc ; Sogo ; Old_Sogdian
1210 -> uscript.h & com.ibm.icu.lang.UScript
1211 -> Nushu had been added already
1212 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1213 and in com.ibm.icu.dev.test.lang.TestUScript.java
1215 WB ; WSegSpace ; WSegSpace
1216 -> uchar.h & UCharacter.WordBreak
1218 * New short names for emoji properties
1220 - short names set in preparseucd.py
1223 - boolean emoji property Extended_Pictographic
1224 -> added in preparseucd.py
1225 -> uchar.h & UProperty.java
1226 - misc. property Equivalent_Unified_Ideograph (EqUIdeo)
1227 as shown in PropertyValueAliases.txt
1230 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1231 (not strictly necessary for NOT_ENCODED scripts)
1232 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1234 * update spoof checker UnicodeSet initializers:
1235 inclusionPat & recommendedPat in uspoof.cpp
1236 INCLUSION & RECOMMENDED in SpoofChecker.java
1237 - make sure that the Unicode Tools tree contains the latest security data files
1238 - go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1239 - update the hardcoded version number there in the DIRECTORY path
1240 - run the tool (no special environment variables needed)
1241 - copy & paste from the Console output into the .cpp & .java files
1243 * generate normalization data files
1244 cd $ICU_ROOT/dbg/icu4c
1245 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1246 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
1247 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1248 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1249 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1251 * build ICU (make install)
1252 so that the tools build can pick up the new definitions from the installed header files.
1254 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1256 * build Unicode tools using CMake+make
1258 $ICU_SRC/tools/unicode/c/icudefs.txt:
1260 # Location (--prefix) of where ICU was installed.
1261 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1262 # Location of the ICU4C source tree.
1263 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
1266 mkdir -p tools/unicode/c
1269 $ICU_ROOT/dbg/tools/unicode/c$
1270 cmake ../../../../src/tools/unicode/c
1273 * generate core properties data files
1274 $ICU_ROOT/dbg/tools/unicode/c$
1275 genprops/genprops $ICU_SRC/icu4c
1276 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1277 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1278 - rebuild ICU (make install) & tools
1281 genprops error: casepropsbuilder: too many exceptions words
1282 genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
1283 - With the addition of Georgian Mtavruli capital letters,
1284 there are now too many simple case mappings with big mapping deltas
1285 that yield uncompressible exceptions.
1286 - Changing the data structure (now formatVersion 4),
1287 adding one bit for no-simple-case-folding (for Cherokee), and
1288 one optional slot for a big delta (for most faraway mappings),
1289 together with another bit for whether that is negative.
1290 This makes most Cherokee & Georgian etc. case mappings compressible,
1291 reducing the number of exceptions words.
1292 - Further changes to gain one more bit for the exceptions index,
1293 for future growth. Details see casepropsbuilder.cpp.
1295 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1296 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1297 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1298 - Unicode 6.0..11.0: U+2260, U+226E, U+226F
1299 - nothing new in this Unicode version, no test file to update
1301 * run & fix ICU4C tests
1302 - Andy handles RBBI & spoof check test failures
1304 - Errors in char.txt, word.txt, word_POSIX.txt like
1305 createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET" at line 46, column 16
1306 because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
1307 -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
1308 not empty, just to get ICU building.
1309 -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
1310 and properties together with the rules that used them (GB 10, WB 14).
1311 -> Andy adjusts the rule sets further to sync with
1312 Unicode 11 grapheme, word, and line break spec changes.
1314 * collation: CLDR collation root, UCA DUCET
1316 - UCA DUCET goes into Mark's Unicode tools, see
1317 https://sites.google.com/site/unicodetools/home#TOC-UCA
1318 diff the main mapping file, look for bad changes
1319 (for example, more bytes per weight for common characters)
1320 ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
1321 ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
1323 - CLDR root data files are checked into $CLDR_SRC/common/uca/
1324 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1326 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1327 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1328 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1329 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1330 (note removing the underscore before "Rules")
1331 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1332 - restore TODO diffs in UCARules.txt
1333 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1334 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1335 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1336 from the CLDR root files (..._CLDR_..._SHORT.txt)
1337 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1338 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1339 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1340 - if CLDR common/uca/unihan-index.txt changes, then update
1341 CLDR common/collation/root.xml <collation type="private-unihan">
1342 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1344 - run genuca, see command line above;
1346 Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1347 FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)
1348 (add the character to genuca.cpp sampleCharsToScripts[])
1349 + look up the USCRIPT_ code for the new sample characters
1350 (should be obvious from the comment in the error output)
1351 + *add* mappings to sampleCharsToScripts[], do not replace them
1352 (in case the script sample characters flip-flop)
1353 + insert new scripts in DUCET script order, see the top_byte table
1354 at the beginning of FractionalUCA.txt
1358 https://sites.google.com/site/unicodetools/unihan
1360 org.unicode.draft.GenerateUnihanCollators
1363 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1364 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1365 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1366 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1369 org.unicode.draft.GenerateUnihanCollatorFiles
1370 with the same arguments
1373 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1374 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1377 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1378 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1379 - run CLDR unit tests, commit to CLDR
1380 - generate ICU zh collation data: run CLDR
1381 org.unicode.cldr.icu.NewLdml2IcuConverter
1382 with program arguments
1384 -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1385 -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1386 -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
1387 -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
1391 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1394 * run & fix ICU4C tests, now with new CLDR collation root data
1395 - run all tests with the collation test data *_SHORT.txt or the full files
1396 (the full ones have comments, useful for debugging)
1397 - note on intltest: if collate/UCAConformanceTest fails, then
1398 utility/MultithreadTest/TestCollators will fail as well;
1399 fix the conformance test before looking into the multi-thread test
1401 * update Java data files
1402 - refresh just the UCD/UCA-related/derived files, just to be safe
1403 - see (ICU4C)/source/data/icu4j-readme.txt
1404 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1405 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1408 Unicode .icu files built to ./out/build/icudt61l
1409 echo timestamp > uni-core-data
1410 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1411 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
1412 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1413 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1414 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
1415 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
1416 mkdir -p /tmp/icu4j/main/shared/data
1417 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1418 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
1419 mkdir -p /tmp/icu4j/main/shared/data
1420 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1421 make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
1422 - copy the big-endian Unicode data files to another location,
1423 separate from the other data files,
1424 and then refresh ICU4J
1425 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1426 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1427 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1428 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1429 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1430 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1431 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1432 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1433 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1434 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1436 * When refreshing all of ICU4J data from ICU4C
1437 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1438 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1440 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1442 * update CollationFCD.java
1443 + copy & paste the initializers of lcccIndex[] etc. from
1444 ICU4C/source/i18n/collationfcd.cpp to
1445 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1447 * refresh Java test .txt files
1448 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1449 cd $ICU_SRC/icu4c/source/data/unidata
1450 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1451 cd ../../test/testdata
1452 cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1453 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1455 * run & fix ICU4J tests
1458 - send notice to icu-design about new born-@stable API (enum constants etc.)
1460 *** CLDR numbering systems
1461 - look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
1462 Unicode 11: using Unicode 11 CLDR ticket #10978
1463 rohg 10D30..10D39 Hanifi_Rohingya
1464 gong 11DA0..11DA9 Gunjala_Gondi
1465 Earlier: CLDR tickets specific to adding new numbering systems.
1466 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1467 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1469 *** merge the Unicode update branches back onto the trunk
1470 - do not merge the icudata.jar and testdata.jar,
1471 instead rebuild them from merged & tested ICU4C
1472 - make sure that changes to Unicode tools are checked in:
1473 http://www.unicode.org/utility/trac/log/trunk/unicodetools
1475 ---------------------------------------------------------------------------- ***
1477 Unicode 10.0 update for ICU 60
1479 http://www.unicode.org/versions/Unicode10.0.0/
1480 http://www.unicode.org/versions/beta-10.0.0.html
1481 http://blog.unicode.org/2017/03/unicode-100-beta-review.html
1482 http://www.unicode.org/review/pri350/
1483 http://www.unicode.org/reports/uax-proposed-updates.html
1484 http://www.unicode.org/reports/tr44/tr44-19.html
1486 * Command-line environment setup
1488 UNICODE_DATA=~/unidata/uni10/20170605
1489 CLDR_SRC=~/svn.cldr/uni10
1490 ICU_ROOT=~/svn.icu/uni10
1491 ICU_SRC=$ICU_ROOT/src
1493 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1494 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1495 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1499 - ticket:12985: Unicode 10
1500 - ticket:13061: undo hacks from emoji 5.0 update
1501 - ticket:13062: add Emoji_Component property
1502 - ^/branches/markus/uni10
1506 - cldrbug 10055: Unicode 10
1507 - cldrbug 9882: Unicode 10 script metadata
1508 - cldrbug 10219: numbering systems for Unicode 10
1510 *** Unicode version numbers
1513 - com.ibm.icu.util.VersionInfo
1514 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1516 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1517 so that the makefiles see the new version number.
1519 *** data files & enums & parser code
1522 - mkdir -p $UNICODE_DATA
1523 - download Unicode 10.0 files into $UNICODE_DATA
1524 + subfolders: ucd, uca, idna, security
1525 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1526 - download emoji 5.0 files into $UNICODE_DATA/emoji
1528 * for manual diffs: remove version suffixes from the file names
1529 ~$ unidata/desuffixucd.py $UNICODE_DATA
1530 (see https://sites.google.com/site/unicodetools/inputdata)
1532 * process and/or copy files
1533 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1534 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1535 + For debugging, and tweaking how ppucd.txt is written,
1536 the tool has an --only_ppucd option:
1537 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1539 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1541 * build ICU (make install)
1542 so that the tools build can pick up the new definitions from the installed header files.
1544 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1546 * preparseucd.py changes
1547 - remove or add new Unicode scripts from/to the
1548 only-in-ISO-15924 list according to the error messages:
1549 ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
1550 -> adjust _scripts_only_in_iso15924 as indicated
1552 Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
1553 -> add vo=Vertical_Orientation to _ignored_properties
1554 -> later removed again, parsing the file, even though we do not yet store data for runtime use
1556 * new constants for new property values
1557 - preparseucd.py error:
1558 ValueError: missing uchar.h enum constants for some property values:
1559 [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
1560 u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
1561 (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
1562 u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
1563 u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
1564 (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
1565 = PropertyValueAliases.txt new property values (diff old & new .txt files)
1566 blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
1567 blk; Kana_Ext_A ; Kana_Extended_A
1568 blk; Masaram_Gondi ; Masaram_Gondi
1570 blk; Soyombo ; Soyombo
1571 blk; Syriac_Sup ; Syriac_Supplement
1572 blk; Zanabazar_Square ; Zanabazar_Square
1574 use long property names for enum constants,
1575 for the trailing comment get the block start code point: diff old & new Blocks.txt
1576 -> add to UCharacter.UnicodeBlock IDs
1577 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1578 replace public static final int \1_ID = \2; \3
1579 -> add to UCharacter.UnicodeBlock objects
1580 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1581 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1583 jg ; Malayalam_Bha ; Malayalam_Bha
1584 jg ; Malayalam_Ja ; Malayalam_Ja
1585 jg ; Malayalam_Lla ; Malayalam_Lla
1586 jg ; Malayalam_Llla ; Malayalam_Llla
1587 jg ; Malayalam_Nga ; Malayalam_Nga
1588 jg ; Malayalam_Nna ; Malayalam_Nna
1589 jg ; Malayalam_Nnna ; Malayalam_Nnna
1590 jg ; Malayalam_Nya ; Malayalam_Nya
1591 jg ; Malayalam_Ra ; Malayalam_Ra
1592 jg ; Malayalam_Ssa ; Malayalam_Ssa
1593 jg ; Malayalam_Tta ; Malayalam_Tta
1594 -> uchar.h & UCharacter.JoiningGroup
1596 sc ; Gonm ; Masaram_Gondi
1599 sc ; Zanb ; Zanabazar_Square
1600 -> uscript.h & com.ibm.icu.lang.UScript
1601 -> Nushu had been added already
1602 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1603 and in com.ibm.icu.dev.test.lang.TestUScript.java
1605 * New properties as shown in PropertyValueAliases.txt changes
1606 - boolean Emoji_Component from emoji 5
1607 -> uchar.h & UProperty.java
1609 # Regional_Indicator (RI)
1611 RI ; N ; No ; F ; False
1612 RI ; Y ; Yes ; T ; True
1613 -> uchar.h & UProperty.java
1614 -> single immutable range, to be hardcoded
1616 # Prepended_Concatenation_Mark (PCM)
1618 PCM; N ; No ; F ; False
1619 PCM; Y ; Yes ; T ; True
1620 -> was new in Unicode 9
1621 -> uchar.h & UProperty.java
1623 # Vertical_Orientation (vo)
1626 vo ; Tr ; Transformed_Rotated
1627 vo ; Tu ; Transformed_Upright
1629 -> only pre-parsed for now, but not yet stored for runtime use
1631 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1632 (not strictly necessary for NOT_ENCODED scripts)
1633 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1635 * generate normalization data files
1636 cd $ICU_ROOT/dbg/icu4c
1637 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1638 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
1639 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1640 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1641 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1643 * build ICU (make install)
1644 so that the tools build can pick up the new definitions from the installed header files.
1646 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1648 * build Unicode tools using CMake+make
1650 $ICU_SRC/tools/unicode/c/icudefs.txt:
1652 # Location (--prefix) of where ICU was installed.
1653 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1654 # Location of the ICU4C source tree.
1655 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
1657 $ICU_ROOT/dbg/tools/unicode/c$
1658 cmake ../../../../src/tools/unicode/c
1661 * generate core properties data files
1662 $ICU_ROOT/dbg/tools/unicode/c$
1663 genprops/genprops $ICU_SRC/icu4c
1664 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1665 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1666 - rebuild ICU (make install) & tools
1668 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1669 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1670 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1671 - Unicode 6.0..10.0: U+2260, U+226E, U+226F
1672 - nothing new in this Unicode version, no test file to update
1674 * run & fix ICU4C tests
1675 - Andy handles RBBI & spoof check test failures
1677 * collation: CLDR collation root, UCA DUCET
1679 - UCA DUCET goes into Mark's Unicode tools, see
1680 https://sites.google.com/site/unicodetools/home#TOC-UCA
1681 - CLDR root data files are checked into $CLDR_SRC/common/uca/
1682 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1684 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1685 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1686 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1687 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1688 (note removing the underscore before "Rules")
1689 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1690 - restore TODO diffs in UCARules.txt
1691 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1692 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1693 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1694 from the CLDR root files (..._CLDR_..._SHORT.txt)
1695 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1696 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1697 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1698 - if CLDR common/uca/unihan-index.txt changes, then update
1699 CLDR common/collation/root.xml <collation type="private-unihan">
1700 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1702 - run genuca, see command line above;
1704 Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
1705 FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)
1706 (add the character to genuca.cpp sampleCharsToScripts[])
1707 + look up the USCRIPT_ code for the new sample characters
1708 (should be obvious from the comment in the error output)
1709 + *add* mappings to sampleCharsToScripts[], do not replace them
1710 (in case the script sample characters flip-flop)
1711 + insert new scripts in DUCET script order, see the top_byte table
1712 at the beginning of FractionalUCA.txt
1716 https://sites.google.com/site/unicodetools/unihan
1718 org.unicode.draft.GenerateUnihanCollators
1721 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1722 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1723 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1724 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1727 org.unicode.draft.GenerateUnihanCollatorFiles
1728 with the same arguments
1731 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1732 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1735 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1736 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1737 - run CLDR unit tests, commit to CLDR
1738 - generate ICU zh collation data: run CLDR
1739 org.unicode.cldr.icu.NewLdml2IcuConverter
1740 with program arguments
1742 -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
1743 -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
1744 -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
1745 -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
1749 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1752 * run & fix ICU4C tests, now with new CLDR collation root data
1753 - run all tests with the collation test data *_SHORT.txt or the full files
1754 (the full ones have comments, useful for debugging)
1755 - note on intltest: if collate/UCAConformanceTest fails, then
1756 utility/MultithreadTest/TestCollators will fail as well;
1757 fix the conformance test before looking into the multi-thread test
1759 * update Java data files
1760 - refresh just the UCD/UCA-related/derived files, just to be safe
1761 - see (ICU4C)/source/data/icu4j-readme.txt
1762 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1763 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1766 Unicode .icu files built to ./out/build/icudt60l
1767 echo timestamp > uni-core-data
1768 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1769 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
1770 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1771 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1772 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
1773 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
1774 mkdir -p /tmp/icu4j/main/shared/data
1775 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1776 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
1777 mkdir -p /tmp/icu4j/main/shared/data
1778 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1779 make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
1780 - copy the big-endian Unicode data files to another location,
1781 separate from the other data files,
1782 and then refresh ICU4J
1783 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1784 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1785 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1786 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1787 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1788 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1789 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1790 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1791 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1792 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1794 * When refreshing all of ICU4J data from ICU4C
1795 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1796 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1798 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1800 * update CollationFCD.java
1801 + copy & paste the initializers of lcccIndex[] etc. from
1802 ICU4C/source/i18n/collationfcd.cpp to
1803 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1805 * refresh Java test .txt files
1806 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1807 cd $ICU_SRC/icu4c/source/data/unidata
1808 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1809 cd ../../test/testdata
1810 cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1811 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1813 * run & fix ICU4J tests
1816 - send notice to icu-design about new born-@stable API (enum constants etc.)
1818 *** CLDR numbering systems
1819 - look for new sets of decimal digits (gc=ND & nv=4) and submit a CLDR ticket
1820 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1821 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1823 *** merge the Unicode update branches back onto the trunk
1824 - do not merge the icudata.jar and testdata.jar,
1825 instead rebuild them from merged & tested ICU4C
1826 - make sure that changes to Unicode tools are checked in:
1827 http://www.unicode.org/utility/trac/log/trunk/unicodetools
1829 ---------------------------------------------------------------------------- ***
1831 Emoji 5.0 update for ICU 59
1832 - ICU 59 mostly remains on Unicode 9.0
1833 - except updates bidi and segmentation data to Unicode 10 beta
1835 First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
1837 * Command-line environment setup
1839 ICU_ROOT=~/svn.icu/trunk
1840 ICU_SRC_DIR=$ICU_ROOT/src
1841 ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
1843 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1844 SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
1845 UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
1849 - ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
1850 - changes directly on trunk
1852 *** data files & enums & parser code
1856 - download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
1857 - download emoji 5.0 beta files into the same uni90e50 folder
1858 - download Unicode 10.0 beta files: ucd
1859 + copy Unicode 10 bidi files to the uni90e50/ucd folder:
1861 BidiCharacterTest.txt
1864 extracted/DerivedBidiClass.txt
1865 + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
1869 * preparseucd.py changes
1870 - adjust for combined trunks
1871 - write new copyright lines
1872 - ignore new Emoji_Component property for now
1874 * process and/or copy files
1875 - ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
1876 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1878 - cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
1880 * build ICU (make install)
1881 so that the tools build can pick up the new definitions from the installed header files.
1883 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1885 * build Unicode tools using CMake+make
1887 ~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
1889 # Location (--prefix) of where ICU was installed.
1890 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1891 # Location of the ICU4C source tree.
1892 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
1894 ~/svn.icu/trunk/dbg/tools/unicode/c$
1895 cmake ../../../../src/tools/unicode/c
1898 * generate core properties data files
1899 ~/svn.icu/trunk/dbg/tools/unicode/c$
1900 genprops/genprops $ICU4C_SRC_DIR
1901 - rebuild ICU (make install) & tools
1903 * run & fix ICU4C tests
1904 - Andy handles RBBI & spoof check test failures
1906 * update Java data files
1907 - refresh just the UCD/UCA-related/derived files, just to be safe
1908 - see (ICU4C)/source/data/icu4j-readme.txt
1910 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1913 Unicode .icu files built to ./out/build/icudt59l
1914 echo timestamp > uni-core-data
1915 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1916 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
1917 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1918 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1919 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
1920 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
1921 mkdir -p /tmp/icu4j/main/shared/data
1922 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1923 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
1924 mkdir -p /tmp/icu4j/main/shared/data
1925 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1926 make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
1927 - copy the big-endian Unicode data files to another location,
1928 separate from the other data files,
1929 and then refresh ICU4J
1930 cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
1931 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1932 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1933 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1934 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1935 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1936 jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1938 * When refreshing all of ICU4J data from ICU4C
1939 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1940 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
1942 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
1944 * refresh Java test .txt files
1945 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1946 cd $ICU4C_SRC_DIR/source/data/unidata
1947 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1948 cd ../../test/testdata
1949 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1950 cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1952 * run & fix ICU4J tests
1954 ---------------------------------------------------------------------------- ***
1956 Unicode 9.0 update for ICU 58
1958 * Command-line environment setup
1960 ICU_ROOT=~/svn.icu/trunk
1961 ICU_SRC_DIR=$ICU_ROOT/src
1963 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1964 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1965 UNIDATA=$ICU_SRC_DIR/source/data/unidata
1967 http://www.unicode.org/review/pri323/ -- beta review
1968 http://www.unicode.org/reports/uax-proposed-updates.html
1969 http://www.unicode.org/versions/beta-9.0.0.html
1970 http://www.unicode.org/versions/Unicode9.0.0/
1971 http://www.unicode.org/reports/tr44/tr44-17.html
1975 - ticket:12526: integrate Unicode 9
1976 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
1977 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
1981 - cldrbug 9414: UCA 9
1982 - ^/branches/markus/uni90 at r11518 from trunk at r11517
1984 - cldrbug 8745: Unicode 9.0 script metadata
1986 *** Unicode version numbers
1989 - com.ibm.icu.util.VersionInfo
1990 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1992 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1993 so that the makefiles see the new version number.
1995 *** data files & enums & parser code
1999 - download UCD & IDNA files
2000 - make sure that the Unicode data folder passed into preparseucd.py
2001 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2002 - only for manual diffs: remove version suffixes from the file names
2003 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2004 (see https://sites.google.com/site/unicodetools/inputdata)
2005 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2006 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2007 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2009 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
2010 and copy to $UNIDATA
2011 cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
2013 * preparseucd.py changes
2014 - remove or add new Unicode scripts from/to the
2015 only-in-ISO-15924 list according to the error messages:
2016 ValueError: remove ['Tang'] from _scripts_only_in_iso15924
2017 ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
2018 ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
2019 ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
2020 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2021 and in com.ibm.icu.dev.test.lang.TestUScript.java
2022 - DerivedNumericValues.txt new numeric values
2023 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
2024 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
2025 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
2026 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
2027 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
2028 -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
2029 uchar.c, UCharacterProperty.java
2030 to support a new series of values
2031 - adjust preparseucd.py for Tangut algorithmic names
2033 algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
2035 algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
2036 - avoid block-compressing most String/Miscellaneous property values,
2037 triggered by genprops not coping with a multi-code point Case_Folding on
2038 block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
2039 keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
2041 * PropertyAliases.txt changes
2042 - 1 new property PCM=Prepended_Concatenation_Mark
2043 Ignore: Only useful for layout engines.
2044 Ok to list in ppucd.txt.
2046 * PropertyValueAliases.txt new property values
2048 blk; Bhaiksuki ; Bhaiksuki
2049 blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
2050 blk; Glagolitic_Sup ; Glagolitic_Supplement
2051 blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
2052 blk; Marchen ; Marchen
2053 blk; Mongolian_Sup ; Mongolian_Supplement
2056 blk; Tangut ; Tangut
2057 blk; Tangut_Components ; Tangut_Components
2059 use long property names for enum constants
2060 -> add to UCharacter.UnicodeBlock IDs
2061 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2062 replace public static final int \1_ID = \2; \3
2063 -> add to UCharacter.UnicodeBlock objects
2064 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2065 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2068 GCB; EBG ; E_Base_GAZ
2069 GCB; EM ; E_Modifier
2070 GCB; GAZ ; Glue_After_Zwj
2072 -> uchar.h & UCharacter.GraphemeClusterBreak
2074 jg ; African_Feh ; African_Feh
2075 jg ; African_Noon ; African_Noon
2076 jg ; African_Qaf ; African_Qaf
2077 -> uchar.h & UCharacter.JoiningGroup
2080 lb ; EM ; E_Modifier
2082 -> uchar.h & UCharacter.LineBreak
2085 sc ; Bhks ; Bhaiksuki
2090 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2093 WB ; EBG ; E_Base_GAZ
2094 WB ; EM ; E_Modifier
2095 WB ; GAZ ; Glue_After_Zwj
2097 -> uchar.h & UCharacter.WordBreak
2099 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2100 (not strictly necessary for NOT_ENCODED scripts)
2101 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2103 * generate normalization data files
2105 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2106 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2107 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2108 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2109 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2111 * build ICU (make install)
2112 so that the tools build can pick up the new definitions from the installed header files.
2114 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
2116 * build Unicode tools using CMake+make
2118 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2120 # Location (--prefix) of where ICU was installed.
2121 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2122 # Location of the ICU source tree.
2123 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2125 ~/svn.icutools/trunk/dbg/unicode/c$
2126 cmake ../../../src/unicode/c
2129 * generate core properties data files
2130 ~/svn.icutools/trunk/dbg/unicode/c$
2131 genprops/genprops $ICU_SRC_DIR
2132 genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2133 genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2134 - rebuild ICU (make install) & tools
2136 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2137 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2138 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2139 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
2140 - nothing new in 9.0, no test file to update
2142 * run & fix ICU4C tests
2143 - Andy handles RBBI & spoof check test failures
2145 * collation: CLDR collation root, UCA DUCET
2147 - UCA DUCET goes into Mark's Unicode tools, see
2148 https://sites.google.com/site/unicodetools/home#TOC-UCA
2149 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2150 cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
2152 - cd (CLDR UCA branch)/common/uca/
2153 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2154 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2155 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2156 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2157 (note removing the underscore before "Rules")
2158 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2159 - restore TODO diffs in UCARules.txt
2160 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2161 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2162 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2163 from the CLDR root files (..._CLDR_..._SHORT.txt)
2164 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2165 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2166 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2167 - if CLDR common/uca/unihan-index.txt changes, then update
2168 CLDR common/collation/root.xml <collation type="private-unihan">
2169 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2171 - run genuca, see command line above;
2173 Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
2174 FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
2175 (add the character to genuca.cpp sampleCharsToScripts[])
2176 + look up the USCRIPT_ code for the new sample characters
2177 (should be obvious from the comment in the error output)
2178 + *add* mappings to sampleCharsToScripts[], do not replace them
2179 (in case the script sample characters flip-flop)
2180 + insert new scripts in DUCET script order, see the top_byte table
2181 at the beginning of FractionalUCA.txt
2186 org.unicode.draft.GenerateUnihanCollators
2188 -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
2189 -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
2190 -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
2191 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2195 org.unicode.draft.GenerateUnihanCollatorFiles
2196 with the same arguments
2199 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2200 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2203 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2204 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2206 - generate ICU zh collation data: run CLDR
2207 org.unicode.cldr.icu.NewLdml2IcuConverter
2208 with program arguments
2210 -s /home/mscherer/svn.cldr/trunk/common/collation
2211 -m /home/mscherer/svn.cldr/trunk/common/supplemental
2212 -d /home/mscherer/svn.icu/trunk/src/source/data/coll
2213 -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
2216 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2219 * run & fix ICU4C tests, now with new CLDR collation root data
2220 - run all tests with the collation test data *_SHORT.txt or the full files
2221 (the full ones have comments, useful for debugging)
2222 - note on intltest: if collate/UCAConformanceTest fails, then
2223 utility/MultithreadTest/TestCollators will fail as well;
2224 fix the conformance test before looking into the multi-thread test
2226 * update Java data files
2227 - refresh just the UCD/UCA-related/derived files, just to be safe
2228 - see (ICU4C)/source/data/icu4j-readme.txt
2230 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2233 Unicode .icu files built to ./out/build/icudt58l
2234 echo timestamp > uni-core-data
2235 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2236 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
2237 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2238 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2239 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
2240 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
2241 mkdir -p /tmp/icu4j/main/shared/data
2242 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2243 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
2244 mkdir -p /tmp/icu4j/main/shared/data
2245 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2246 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2247 - copy the big-endian Unicode data files to another location,
2248 separate from the other data files,
2249 and then refresh ICU4J
2250 cd ~/svn.icu/trunk/dbg/data/out/icu4j
2251 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2252 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2253 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2254 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2255 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2256 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2257 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2258 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2259 jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2261 * When refreshing all of ICU4J data from ICU4C
2262 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2263 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2265 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2267 * update CollationFCD.java
2268 + copy & paste the initializers of lcccIndex[] etc. from
2269 ICU4C/source/i18n/collationfcd.cpp to
2270 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2272 * refresh Java test .txt files
2273 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2274 cd $ICU_SRC_DIR/source/data/unidata
2275 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2276 cd ../../test/testdata
2277 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2278 cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2280 * run & fix ICU4J tests
2282 *** LayoutEngine script information
2284 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2285 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2286 in the working directory.
2288 (It also generates ScriptRunData.cpp, which is no longer needed.)
2290 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2292 which maps ICU versions to the numbers of script/language constants
2293 that were added then.
2294 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2296 The generated files have a current copyright date and "@deprecated" statement.
2298 * Review changes, fix Java tool if necessary, and copy to ICU4C
2299 cd ~/svn.icu4j/trunk/src
2300 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2301 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2302 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2305 - send notice to icu-design about new born-@stable API (enum constants etc.)
2307 *** merge the Unicode update branches back onto the trunk
2308 - do not merge the icudata.jar and testdata.jar,
2309 instead rebuild them from merged & tested ICU4C
2310 - make sure that changes to Unicode tools & ICU tools are checked in
2311 http://www.unicode.org/utility/trac/log/trunk/unicodetools
2312 http://bugs.icu-project.org/trac/log/tools/trunk
2314 ---------------------------------------------------------------------------- ***
2316 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
2319 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
2320 - new combination/alias codes: Hanb, Jamo
2321 - used in CLDR 29 and in spoof checker
2324 Add new codes to uscript.h & UScript.java, see Unicode update logs.
2325 -> com.ibm.icu.lang.UScript
2326 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2327 replace public static final int \1 = \2; \3
2329 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
2330 add new script codes.
2331 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
2333 Note: If we have to run preparseucd.py again before the Unicode 9 update,
2334 then we need to manually keep/restore the new script codes.
2336 ICU_ROOT=~/svn.icu/trunk
2337 ICU_SRC_DIR=$ICU_ROOT/src
2339 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2340 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2341 UNIDATA=$ICU_SRC_DIR/source/data/unidata
2343 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
2344 see http://bugs.icu-project.org/trac/ticket/12141
2346 make install, then icutools cmake & make, then
2347 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2349 Generate Java data as usual, only update pnames.icu & uprops.icu.
2351 *** LayoutEngine script information
2353 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2354 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2355 in the working directory.
2357 (It also generates ScriptRunData.cpp, which is no longer needed.)
2359 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2361 which maps ICU versions to the numbers of script/language constants
2362 that were added then.
2363 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2365 The generated files have a current copyright date and "@deprecated" statement.
2367 * Review changes, fix Java tool if necessary, and copy to ICU4C
2368 cd ~/svn.icu4j/trunk/src
2369 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2370 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2371 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2373 ---------------------------------------------------------------------------- ***
2375 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
2377 Edit preparseucd.py to add & parse new properties.
2378 They share the UCD property namespace but are not listed in PropertyAliases.txt.
2380 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
2381 Initial data from emoji/2.0/
2383 ICU_ROOT=~/svn.icu/trunk
2384 ICU_SRC_DIR=$ICU_ROOT/src
2386 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2387 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2388 UNIDATA=$ICU_SRC_DIR/source/data/unidata
2390 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
2392 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2393 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
2395 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
2397 make install, then icutools cmake & make, then
2398 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2400 Generate Java data as usual, only update pnames.icu & uprops.icu.
2402 ---------------------------------------------------------------------------- ***
2404 Unicode 8.0 update for ICU 56
2406 * Command-line environment setup
2408 ICU_ROOT=~/svn.icu/trunk
2409 ICU_SRC_DIR=$ICU_ROOT/src
2411 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2412 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2413 UNIDATA=$ICU_SRC_DIR/source/data/unidata
2415 http://www.unicode.org/review/pri297/ -- beta review
2416 http://www.unicode.org/reports/uax-proposed-updates.html
2417 http://unicode.org/versions/beta-8.0.0.html
2418 http://www.unicode.org/versions/Unicode8.0.0/
2419 http://www.unicode.org/reports/tr44/tr44-15.html
2423 - ticket:11574: Unicode 8
2424 - C++ branches/markus/uni80 at r37351 from trunk at r37343
2425 - Java branches/markus/uni80 at r37352 from trunk at r37338
2429 - cldrbug 8311: UCA 8
2430 - branches/markus/uni80 at r11518 from trunk at r11517
2432 - cldrbug 8109: Unicode 8.0 script metadata
2433 - cldrbug 8418: Updated segmentation for Unicode 8.0
2435 *** Unicode version numbers
2438 - com.ibm.icu.util.VersionInfo
2439 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2441 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2442 so that the makefiles see the new version number.
2444 *** data files & enums & parser code
2448 - download UCD & IDNA files
2449 - make sure that the Unicode data folder passed into preparseucd.py
2450 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2451 - only for manual diffs: remove version suffixes from the file names
2452 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2453 (see https://sites.google.com/site/unicodetools/inputdata)
2454 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2455 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2456 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2458 - also: from http://unicode.org/Public/security/8.0.0/ download new
2459 confusables.txt & confusablesWholeScript.txt
2460 and copy to $UNIDATA
2461 ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
2462 ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
2464 * initial preparseucd.py changes
2465 - remove new Unicode scripts from the
2466 only-in-ISO-15924 list according to the error message:
2467 ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
2468 from _scripts_only_in_iso15924
2469 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2470 and in com.ibm.icu.dev.test.lang.TestUScript.java
2471 - property and file name change:
2472 IndicMatraCategory -> IndicPositionalCategory
2473 - UnicodeData.txt unusual numeric values (improper fractions)
2474 109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
2475 109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
2476 109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
2477 109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
2478 109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
2479 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
2480 109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
2481 109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
2482 109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
2483 109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
2484 -> change preparseucd.py to map them to proper fractions (e.g., 1/6)
2485 which are listed in DerivedNumericValues.txt;
2486 keeps storage in data file simple
2488 * PropertyValueAliases.txt changes
2489 - 10 new Block (blk) values:
2491 blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
2492 blk; Cherokee_Sup ; Cherokee_Supplement
2493 blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
2494 blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
2495 blk; Hatran ; Hatran
2496 blk; Multani ; Multani
2497 blk; Old_Hungarian ; Old_Hungarian
2498 blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
2499 blk; Sutton_SignWriting ; Sutton_SignWriting
2501 use long property names for enum constants
2502 -> add to UCharacter.UnicodeBlock IDs
2503 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2504 replace public static final int \1_ID = \2; \3
2505 -> add to UCharacter.UnicodeBlock objects
2506 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2507 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2508 - 6 new Script (sc) values:
2511 sc ; Hluw ; Anatolian_Hieroglyphs
2512 sc ; Hung ; Old_Hungarian
2514 sc ; Sgnw ; SignWriting
2515 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2517 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2518 (not strictly necessary for NOT_ENCODED scripts)
2519 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2521 * generate normalization data files
2523 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2524 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2525 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2526 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2527 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2529 * build ICU (make install)
2530 so that the tools build can pick up the new definitions from the installed header files.
2532 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2534 * build Unicode tools using CMake+make
2536 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2538 # Location (--prefix) of where ICU was installed.
2539 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2540 # Location of the ICU source tree.
2541 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2543 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2544 ~/svn.icutools/trunk/dbg/unicode/c$ make
2546 * generate core properties data files
2547 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2548 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2549 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2550 - rebuild ICU (make install) & tools
2551 - run genuca again (see step above) so that it picks up the new nfc.nrm
2552 - rebuild ICU (make install) & tools
2554 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2555 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2556 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2557 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
2558 - nothing new in 8.0, no test file to update
2560 * run & fix ICU4C tests
2561 - bad Cherokee case folding due to difference in fallbacks:
2562 UCD case folding falls back to no mapping,
2563 ICU runtime case folding falls back to lowercasing;
2564 fixed casepropsbuilder.cpp to generate scf mappings to self
2565 when there is an slc mapping but no scf
2566 - Andy handles RBBI & spoof check test failures
2568 * collation: CLDR collation root, UCA DUCET
2570 - UCA DUCET goes into Mark's Unicode tools, see
2571 https://sites.google.com/site/unicodetools/home#TOC-UCA
2572 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2573 - cd (CLDR UCA branch)/common/uca/
2574 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2575 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2576 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2577 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2578 (note removing the underscore before "Rules")
2579 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2580 - restore TODO diffs in UCARules.txt
2581 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2582 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
2583 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2584 from the CLDR root files (..._CLDR_..._SHORT.txt)
2585 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2586 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2587 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2588 - if CLDR common/uca/unihan-index.txt changes, then update
2589 CLDR common/collation/root.xml <collation type="private-unihan">
2590 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2591 - run genuca, see command line above;
2593 Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
2594 (add the character to genuca.cpp sampleCharsToScripts[])
2595 + look up the script for the new sample characters
2596 (e.g., in FractionalUCA.txt)
2597 + *add* mappings to sampleCharsToScripts[], do not replace them
2598 (in case the script sample characters flip-flop)
2599 + insert new scripts in DUCET script order, see the top_byte table
2600 at the beginning of FractionalUCA.txt
2603 * run & fix ICU4C tests, now with new CLDR collation root data
2604 - run all tests with the collation test data *_SHORT.txt or the full files
2605 (the full ones have comments, useful for debugging)
2606 - note on intltest: if collate/UCAConformanceTest fails, then
2607 utility/MultithreadTest/TestCollators will fail as well;
2608 fix the conformance test before looking into the multi-thread test
2609 - fixed bug in CollationWeights::getWeightRanges()
2610 exposed by new data and CollationTest::TestRootElements
2612 * update Java data files
2613 - refresh just the UCD/UCA-related/derived files, just to be safe
2614 - see (ICU4C)/source/data/icu4j-readme.txt
2616 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2619 Unicode .icu files built to ./out/build/icudt56l
2620 echo timestamp > uni-core-data
2621 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2622 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
2623 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2624 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2625 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
2626 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
2627 mkdir -p /tmp/icu4j/main/shared/data
2628 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2629 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
2630 mkdir -p /tmp/icu4j/main/shared/data
2631 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2632 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2633 - copy the big-endian Unicode data files to another location,
2634 separate from the other data files,
2635 and then refresh ICU4J
2636 cd ~/svn.icu/trunk/dbg/data/out/icu4j
2637 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2638 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2639 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2640 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2641 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2642 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2643 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2644 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2645 jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2647 * When refreshing all of ICU4J data from ICU4C
2648 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2649 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2651 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2653 * update CollationFCD.java
2654 + copy & paste the initializers of lcccIndex[] etc. from
2655 ICU4C/source/i18n/collationfcd.cpp to
2656 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2658 * refresh Java test .txt files
2659 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2660 cd $ICU_SRC_DIR/source/data/unidata
2661 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2662 cd ../../test/testdata
2663 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2664 cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2666 * run & fix ICU4J tests
2668 *** LayoutEngine script information
2670 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
2671 because the layout engine was deprecated in ICU 54.
2672 Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
2673 to write lines that we used to add manually.
2675 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2676 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2677 in the working directory.
2679 (It also generates ScriptRunData.cpp, which is no longer needed.)
2681 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2683 which maps ICU versions to the numbers of script/language constants
2684 that were added then.
2685 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2687 The generated files have a current copyright date and "@deprecated" statement.
2689 * Review changes, fix Java tool if necessary, and copy to ICU4C
2690 cd ~/svn.icu4j/trunk/src
2691 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2692 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2693 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2696 - send notice to icu-design about new born-@stable API (enum constants etc.)
2698 *** merge the Unicode update branches back onto the trunk
2699 - do not merge the icudata.jar and testdata.jar,
2700 instead rebuild them from merged & tested ICU4C
2701 - make sure that changes to Unicode tools & ICU tools are checked in
2702 http://www.unicode.org/utility/trac/log/trunk/unicodetools
2703 http://bugs.icu-project.org/trac/log/tools/trunk
2705 ---------------------------------------------------------------------------- ***
2707 Unicode 7.0 update for ICU 54
2709 http://www.unicode.org/review/pri271/ -- beta review
2710 http://www.unicode.org/reports/uax-proposed-updates.html
2711 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
2712 http://www.unicode.org/reports/tr44/tr44-13.html
2716 - ticket 10821: Unicode 7.0, UCA 7.0
2717 - C++ branches/markus/uni70 at r35584 from trunk at r35580
2718 - Java branches/markus/uni70 at r35587 from trunk at r35545
2722 - ticket 7195: UCA 7.0 CLDR root collation
2723 - branches/markus/uni70 at r10062 from trunk at r10061
2725 - ticket 6762: script metadata for Unicode 7.0 new scripts
2727 *** Unicode version numbers
2730 - com.ibm.icu.util.VersionInfo
2731 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2733 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2734 so that the makefiles see the new version number.
2736 *** data files & enums & parser code
2740 - download UCD & IDNA files
2741 - make sure that the Unicode data folder passed into preparseucd.py
2742 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2743 - only for manual diffs: remove version suffixes from the file names
2744 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2745 (see https://sites.google.com/site/unicodetools/inputdata)
2746 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2747 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2748 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2749 - Restore TODO diffs in source/data/unidata/UCARules.txt
2751 meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
2752 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
2754 - also: from http://unicode.org/Public/security/7.0.0/ download new
2755 confusables.txt & confusablesWholeScript.txt
2756 and copy to $ICU_ROOT/src/source/data/unidata/
2758 * initial preparseucd.py changes
2759 - remove new Unicode scripts from the
2760 only-in-ISO-15924 list according to the error message:
2761 ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
2762 'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
2763 'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
2764 from _scripts_only_in_iso15924
2765 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2766 and in com.ibm.icu.dev.test.lang.TestUScript.java
2767 - NamesList.txt now has a heading with a non-ASCII character
2768 + keep ppucd.txt in platform charset, rather than changing tool/test parsers
2769 + escape non-ASCII characters in heading comments
2770 - gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
2771 + get the copyright from the first file whose copyright line contains the current year
2773 * PropertyValueAliases.txt changes
2774 - 32 new Block (blk) values:
2775 blk; Bassa_Vah ; Bassa_Vah
2776 blk; Caucasian_Albanian ; Caucasian_Albanian
2777 blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
2778 blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
2779 blk; Duployan ; Duployan
2780 blk; Elbasan ; Elbasan
2781 blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
2782 blk; Grantha ; Grantha
2783 blk; Khojki ; Khojki
2784 blk; Khudawadi ; Khudawadi
2785 blk; Latin_Ext_E ; Latin_Extended_E
2786 blk; Linear_A ; Linear_A
2787 blk; Mahajani ; Mahajani
2788 blk; Manichaean ; Manichaean
2789 blk; Mende_Kikakui ; Mende_Kikakui
2792 blk; Myanmar_Ext_B ; Myanmar_Extended_B
2793 blk; Nabataean ; Nabataean
2794 blk; Old_North_Arabian ; Old_North_Arabian
2795 blk; Old_Permic ; Old_Permic
2796 blk; Ornamental_Dingbats ; Ornamental_Dingbats
2797 blk; Pahawh_Hmong ; Pahawh_Hmong
2798 blk; Palmyrene ; Palmyrene
2799 blk; Pau_Cin_Hau ; Pau_Cin_Hau
2800 blk; Psalter_Pahlavi ; Psalter_Pahlavi
2801 blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
2802 blk; Siddham ; Siddham
2803 blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
2804 blk; Sup_Arrows_C ; Supplemental_Arrows_C
2805 blk; Tirhuta ; Tirhuta
2806 blk; Warang_Citi ; Warang_Citi
2808 use long property names for enum constants
2809 -> add to UCharacter.UnicodeBlock IDs
2810 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2811 replace public static final int \1_ID = \2; \3
2812 -> add to UCharacter.UnicodeBlock objects
2813 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2814 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2815 - 28 new Joining_Group (jg) values:
2816 jg ; Manichaean_Aleph ; Manichaean_Aleph
2817 jg ; Manichaean_Ayin ; Manichaean_Ayin
2818 jg ; Manichaean_Beth ; Manichaean_Beth
2819 jg ; Manichaean_Daleth ; Manichaean_Daleth
2820 jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
2821 jg ; Manichaean_Five ; Manichaean_Five
2822 jg ; Manichaean_Gimel ; Manichaean_Gimel
2823 jg ; Manichaean_Heth ; Manichaean_Heth
2824 jg ; Manichaean_Hundred ; Manichaean_Hundred
2825 jg ; Manichaean_Kaph ; Manichaean_Kaph
2826 jg ; Manichaean_Lamedh ; Manichaean_Lamedh
2827 jg ; Manichaean_Mem ; Manichaean_Mem
2828 jg ; Manichaean_Nun ; Manichaean_Nun
2829 jg ; Manichaean_One ; Manichaean_One
2830 jg ; Manichaean_Pe ; Manichaean_Pe
2831 jg ; Manichaean_Qoph ; Manichaean_Qoph
2832 jg ; Manichaean_Resh ; Manichaean_Resh
2833 jg ; Manichaean_Sadhe ; Manichaean_Sadhe
2834 jg ; Manichaean_Samekh ; Manichaean_Samekh
2835 jg ; Manichaean_Taw ; Manichaean_Taw
2836 jg ; Manichaean_Ten ; Manichaean_Ten
2837 jg ; Manichaean_Teth ; Manichaean_Teth
2838 jg ; Manichaean_Thamedh ; Manichaean_Thamedh
2839 jg ; Manichaean_Twenty ; Manichaean_Twenty
2840 jg ; Manichaean_Waw ; Manichaean_Waw
2841 jg ; Manichaean_Yodh ; Manichaean_Yodh
2842 jg ; Manichaean_Zayin ; Manichaean_Zayin
2843 jg ; Straight_Waw ; Straight_Waw
2844 -> uchar.h & UCharacter.JoiningGroup
2845 - 23 new Script (sc) values:
2846 sc ; Aghb ; Caucasian_Albanian
2847 sc ; Bass ; Bassa_Vah
2848 sc ; Dupl ; Duployan
2851 sc ; Hmng ; Pahawh_Hmong
2853 sc ; Lina ; Linear_A
2854 sc ; Mahj ; Mahajani
2855 sc ; Mani ; Manichaean
2856 sc ; Mend ; Mende_Kikakui
2859 sc ; Narb ; Old_North_Arabian
2860 sc ; Nbat ; Nabataean
2861 sc ; Palm ; Palmyrene
2862 sc ; Pauc ; Pau_Cin_Hau
2863 sc ; Perm ; Old_Permic
2864 sc ; Phlp ; Psalter_Pahlavi
2866 sc ; Sind ; Khudawadi
2868 sc ; Wara ; Warang_Citi
2869 -> uscript.h (many were added before)
2870 comment "Mende Kikakui" for USCRIPT_MENDE
2871 add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
2872 -> com.ibm.icu.lang.UScript
2873 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2874 replace public static final int \1 = \2; \3
2875 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2882 Pauc 263 Pau Cin Hau
2884 -> uscript.h (some overlap with additions from Unicode)
2885 -> com.ibm.icu.lang.UScript
2886 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2887 replace public static final int \1 = \2; \3
2888 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
2889 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2890 and in com.ibm.icu.dev.test.lang.TestUScript.java
2892 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2893 (not strictly necessary for NOT_ENCODED scripts)
2894 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2896 * generate normalization data files
2898 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2899 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2900 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
2901 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2902 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2903 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2904 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2905 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2907 * build ICU (make install)
2908 so that the tools build can pick up the new definitions from the installed header files.
2910 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2912 * build Unicode tools using CMake+make
2914 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2916 # Location (--prefix) of where ICU was installed.
2917 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
2918 # Location of the ICU source tree.
2919 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
2921 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2922 ~/svn.icutools/trunk/dbg/unicode/c$ make
2925 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
2926 + add second array of Joining_Group values for at most 10800..10FFF
2927 icutools: unicode/c/genprops/bidipropsbuilder.cpp
2928 icu: source/common/ubidi_props.h/.c/_data.h
2929 icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
2931 * generate core properties data files
2932 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2933 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
2934 - rebuild ICU (make install) & tools
2935 - run genuca again (see step above) so that it picks up the new nfc.nrm
2936 - rebuild ICU (make install) & tools
2938 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2939 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2940 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2941 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
2942 - nothing new in 7.0, no test file to update
2944 * run & fix ICU4C tests
2946 * update Java data files
2947 - refresh just the UCD-related files, just to be safe
2948 - see (ICU4C)/source/data/icu4j-readme.txt
2950 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2953 Unicode .icu files built to ./out/build/icudt53l
2954 echo timestamp > uni-core-data
2955 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2956 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
2957 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2958 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2959 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
2960 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
2961 mkdir -p /tmp/icu4j/main/shared/data
2962 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2963 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
2964 mkdir -p /tmp/icu4j/main/shared/data
2965 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2966 make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
2967 - copy the big-endian Unicode data files to another location,
2968 separate from the other data files
2970 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2971 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2972 cd ~/svn.icu/uni70/dbg/data/out/icu4j
2973 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2974 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2975 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2976 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2977 cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2978 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2980 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2982 * update CollationFCD.java
2983 + copy & paste the initializers of lcccIndex[] etc. from
2984 ICU4C/source/i18n/collationfcd.cpp to
2985 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2987 * refresh Java test .txt files
2988 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2989 cd $ICU_SRC_DIR/source/data/unidata
2990 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2991 cd ../../test/testdata
2992 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2993 cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2997 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
2998 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
2999 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
3000 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
3001 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
3002 - review data; compare files, use blankweights.sed or similar
3003 ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
3004 - cd ~/svn.unitools/Generated/uca/7.0.0/
3005 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3006 cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
3007 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3008 (note removing the underscore before "Rules")
3009 cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3010 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
3011 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3012 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3013 cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3014 cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3015 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3016 - run genuca, see command line above
3018 - refresh ICU4J collation data:
3019 (subset of instructions above for properties data refresh, except copies all coll/*)
3021 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3022 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3023 ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3024 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3025 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3026 - note on intltest: if collate/UCAConformanceTest fails, then
3027 utility/MultithreadTest/TestCollators will fail as well;
3028 fix the conformance test before looking into the multi-thread test
3029 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
3030 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
3031 ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
3033 * When refreshing all of ICU4J data from ICU4C
3034 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3035 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3037 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3039 * run & fix ICU4J tests
3041 *** LayoutEngine script information
3043 (For details see the Unicode 5.2 change log below.)
3045 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3046 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3047 in the working directory.
3048 (It also generates ScriptRunData.cpp, which is no longer needed.)
3050 The generated files have a current copyright date and "@stable" statement.
3051 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
3052 for "born stable" Unicode API constants, and to stop parsing ICU version numbers
3053 which may not contain dots any more.
3055 - diff current <icu>/source/layout files vs. generated ones
3056 ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3057 review and manually merge desired changes;
3058 fix gratuitous changes, incorrect @draft/@stable and missing aliases;
3059 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3060 - if you just copy the above files, then
3061 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3062 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3065 - send notice to icu-design about new born-@stable API (enum constants etc.)
3067 *** merge the Unicode update branches back onto the trunk
3068 - do not merge the icudata.jar and testdata.jar,
3069 instead rebuild them from merged & tested ICU4C
3071 ---------------------------------------------------------------------------- ***
3075 http://www.unicode.org/review/pri249/ -- beta review
3076 http://www.unicode.org/reports/uax-proposed-updates.html
3077 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
3078 http://www.unicode.org/reports/tr44/tr44-11.html
3082 - ticket 10128: update ICU to Unicode 6.3 beta
3083 - ticket 10168: update ICU to Unicode 6.3 final
3084 - C++ branches/markus/uni63 at r33552 from trunk at r33551
3085 - Java branches/markus/uni63 at r33550 from trunk at r33553
3087 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
3089 *** Unicode version numbers
3092 (configure.in & configure: have been modified to extract the version from uchar.h)
3093 - com.ibm.icu.util.VersionInfo
3094 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3096 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3097 so that the makefiles see the new version number.
3099 *** data files & enums & parser code
3103 - download UCD, UCA & IDNA files
3104 - make sure that the Unicode data folder passed into preparseucd.py
3105 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3106 - modify preparseucd.py:
3107 parse new file BidiBrackets.txt
3108 with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
3109 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
3110 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3111 - Check test file diffs for previously commented-out, known-failing data lines;
3112 probably need to keep those commented out.
3114 * PropertyAliases.txt changes
3115 - 1 new Enumerated Property
3116 bpt ; Bidi_Paired_Bracket_Type
3117 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
3118 -> ubidi_props.h & .c & UBiDiProps.java
3119 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
3121 -> change ubidi.icu format version from 2.0 to 2.1
3122 - 1 new Miscellaneous Property
3123 bpb ; Bidi_Paired_Bracket
3124 -> uchar.h & UProperty.java
3127 * PropertyValueAliases.txt changes
3128 - 3 Bidi_Paired_Bracket_Type (bpt) values:
3132 -> uchar.h & UCharacter.BidiPairedBracketType
3133 -> ubidi_props.h & .c & UBiDiProps.java
3134 -> change ubidi.icu format version from 2.0 to 2.1
3135 - 4 new Bidi_Class (bc) values:
3136 bc ; FSI ; First_Strong_Isolate
3137 bc ; LRI ; Left_To_Right_Isolate
3138 bc ; RLI ; Right_To_Left_Isolate
3139 bc ; PDI ; Pop_Directional_Isolate
3140 -> uchar.h & UCharacterEnums.ECharacterDirection
3141 -> until the bidi code gets updated,
3142 Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
3143 - 3 new Word_Break (WB) values:
3144 WB ; HL ; Hebrew_Letter
3145 WB ; SQ ; Single_Quote
3146 WB ; DQ ; Double_Quote
3147 -> uchar.h & UCharacter.WordBreak
3148 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
3149 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3151 Aghb 239 Caucasian Albanian
3154 -> com.ibm.icu.lang.UScript
3155 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3156 replace public static final int \1 = \2;\3
3157 -> preparseucd.py _scripts_only_in_iso15924
3158 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3159 and in com.ibm.icu.dev.test.lang.TestUScript.java
3160 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3161 (not strictly necessary for NOT_ENCODED scripts)
3163 * generate normalization data files
3164 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
3165 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
3166 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
3167 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3168 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3169 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3170 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3172 * build ICU (make install)
3173 so that the tools build can pick up the new definitions from the installed header files.
3175 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3177 * build Unicode tools using CMake+make
3179 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3181 # Location (--prefix) of where ICU was installed.
3182 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
3183 # Location of the ICU source tree.
3184 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
3186 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3187 ~/svn.icutools/trunk/dbg/unicode/c$ make
3189 * generate core properties data files
3190 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
3191 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
3192 - rebuild ICU (make install) & tools
3193 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3194 - rebuild ICU (make install) & tools
3196 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3197 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3198 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3199 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
3200 - nothing new in 6.3, no test file to update
3202 * update Java data files
3203 - refresh just the UCD-related files, just to be safe
3204 - see (ICU4C)/source/data/icu4j-readme.txt
3206 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3209 Unicode .icu files built to ./out/build/icudt52l
3210 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3211 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
3212 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3213 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3214 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
3215 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
3216 mkdir -p /tmp/icu4j/main/shared/data
3217 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3218 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
3219 mkdir -p /tmp/icu4j/main/shared/data
3220 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3221 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
3222 - copy the big-endian Unicode data files to another location,
3223 separate from the other data files
3224 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3225 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3226 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3227 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
3228 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3229 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3230 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3232 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3234 * refresh Java test .txt files
3235 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3237 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
3239 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3240 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3241 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3242 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3243 (note removing the underscore before "Rules")
3244 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
3245 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3246 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3247 - check test file diffs for previously commented-out, known-failing data lines;
3248 probably need to keep those commented out
3249 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3250 - run genuca, see command line above
3252 - refresh ICU4J collation data:
3253 (subset of instructions above for properties data refresh, except copies all coll/*)
3254 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3255 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3256 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3257 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3258 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3259 - note on intltest: if collate/UCAConformanceTest fails, then
3260 utility/MultithreadTest/TestCollators will fail as well;
3261 fix the conformance test before looking into the multi-thread test
3263 * test ICU, fix test code where necessary
3265 * When refreshing all of ICU4J data from ICU4C
3266 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3267 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3269 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3271 *** LayoutEngine script information
3272 - skipped for Unicode 6.3: no new scripts
3274 *** merge the Unicode update branches back onto the trunk
3275 - do not merge the icudata.jar and testdata.jar,
3276 instead rebuild them from merged & tested ICU4C
3278 ---------------------------------------------------------------------------- ***
3282 http://www.unicode.org/review/pri230/
3283 http://www.unicode.org/versions/beta-6.2.0.html
3284 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
3285 http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
3286 http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
3287 http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
3288 http://www.unicode.org/reports/tr46/tr46-8.html IDNA
3289 http://unicode.org/Public/idna/6.2.0/
3293 - ticket 9515: Unicode 6.2: final ICU update
3295 - ticket 9514: UCA 6.2: fix UCARules.txt
3297 - ticket 9437: update ICU to Unicode 6.2
3298 - C++ branches/markus/uni62 at r32050 from trunk at r32041
3299 - Java branches/markus/uni62 at r32068 from trunk at r32066
3301 *** Unicode version numbers
3304 (configure.in & configure: have been modified to extract the version from uchar.h)
3305 - com.ibm.icu.util.VersionInfo
3306 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3308 *** data files & enums & parser code
3312 - download UCD, UCA & IDNA files
3313 - make sure that the Unicode data folder passed into preparseucd.py
3314 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3315 - modify preparseucd.py: NamesList.txt is now in UTF-8
3316 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
3317 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3318 - Check test file diffs for previously commented-out, known-failing data lines;
3319 probably need to keep those commented out.
3321 * PropertyValueAliases.txt changes
3322 - 1 new Line_Break (lb) value:
3323 lb ; RI ; Regional_Indicator
3324 -> uchar.h & UCharacter.LineBreak
3325 - 1 new Word_Break (WB) value:
3326 WB ; RI ; Regional_Indicator
3327 -> uchar.h & UCharacter.WordBreak
3328 - 1 new Grapheme_Cluster_Break (GCB) value:
3329 GCB; RI ; Regional_Indicator
3330 -> uchar.h & UCharacter.GraphemeClusterBreak
3332 * 3 new numeric values
3333 The new value -1, which was really supposed to be NaN but that would have required
3334 new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
3335 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
3336 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
3337 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
3338 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
3339 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
3340 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
3341 -> uprops.h, uchar.c & UCharacterProperty.java
3342 -> cucdtst.c & UCharacterTest.java
3344 * generate normalization data files
3345 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
3346 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
3347 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
3348 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3349 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3350 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3351 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3353 * build ICU (make install)
3354 so that the tools build can pick up the new definitions from the installed header files.
3355 * build Unicode tools using CMake+make
3357 * generate core properties data files
3358 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
3359 - in initial bootstrapping, change the UCA version
3360 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3361 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
3362 - rebuild ICU (make install) & tools
3363 + if genrb fails to build coll/root.res with an U_INVALID_FORMAT_ERROR,
3364 check if the UCA version in FractionalUCA.txt matches the new Unicode version
3366 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3367 - rebuild ICU (make install) & tools
3369 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3370 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3371 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3372 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
3373 - nothing new in 6.2, no test file to update
3375 * update Java data files
3376 - refresh just the UCD-related files, just to be safe
3377 - see (ICU4C)/source/data/icu4j-readme.txt
3379 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3382 Unicode .icu files built to ./out/build/icudt50l
3383 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3384 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
3385 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3386 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3387 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
3388 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
3389 mkdir -p /tmp/icu4j/main/shared/data
3390 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3391 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
3392 mkdir -p /tmp/icu4j/main/shared/data
3393 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3394 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
3395 - copy the big-endian Unicode data files to another location,
3396 separate from the other data files
3397 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3398 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3399 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3400 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
3401 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3402 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3403 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3405 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3407 * refresh Java test .txt files
3408 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3412 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3413 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3414 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3415 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3416 (note removing the underscore before "Rules")
3417 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
3418 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3419 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3420 - check test file diffs for previously commented-out, known-failing data lines;
3421 probably need to keep those commented out
3422 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3423 - run genuca, see command line above
3425 - refresh ICU4J collation data:
3426 (subset of instructions above for properties data refresh, except copies all coll/*)
3427 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3428 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3429 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3430 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3431 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3432 - note on intltest: if collate/UCAConformanceTest fails, then
3433 utility/MultithreadTest/TestCollators will fail as well;
3434 fix the conformance test before looking into the multi-thread test
3436 * test ICU, fix test code where necessary
3438 * When refreshing all of ICU4J data from ICU4C
3439 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3440 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3442 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3444 *** LayoutEngine script information
3445 - skipped for Unicode 6.2: no new scripts
3447 *** merge the Unicode update branches back onto the trunk
3448 - do not merge the icudata.jar and testdata.jar,
3449 instead rebuild them from merged & tested ICU4C
3451 ---------------------------------------------------------------------------- ***
3453 Future Unicode update
3455 Tools simplified since the Unicode 6.1 update. See
3456 - http://site.icu-project.org/design/props/ppucd
3457 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
3459 * Unicode version numbers
3460 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
3463 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
3464 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
3465 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3466 - Check test file diffs for previously commented-out, known-failing data lines;
3467 probably need to keep those commented out.
3469 * PropertyValueAliases.txt changes
3470 - Script codes that are in ISO 15924 but not in Unicode are now listed in
3471 preparseucd.py, in the _scripts_only_in_iso15924 variable.
3472 If there are new ISO codes, then add them.
3473 If Unicode adds some of them, then remove them from the .py variable.
3475 * UnicodeData.txt changes
3476 - No more manual changes for CJK ranges for algorithmic names;
3477 those are now written to ppucd.txt and genprops reads them from there.
3479 * generate core properties data files (makeprops.sh was deleted)
3480 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
3482 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
3483 - it is now generated by preparseucd.py
3485 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
3486 - it is now generated by preparseucd.py
3487 - make sure that the Unicode data folder passed into preparseucd.py
3488 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3489 (can be in some subfolder)
3491 * generate normalization data files
3492 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
3493 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
3494 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
3495 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3496 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3497 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3498 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3500 * build ICU (make install)
3501 * build Unicode tools using CMake+make
3503 * new way to call genuca (makeuca.sh was deleted)
3504 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
3506 ---------------------------------------------------------------------------- ***
3512 - ticket 8995 final update to Unicode 6.1
3513 - ticket 8994 regenerate source/layout/CanonData.cpp
3515 - ticket 8961 support Unicode "Age" value *names*
3516 - ticket 8963 support multiple character name aliases & types
3518 - ticket 8827 "update ICU to Unicode 6.1"
3519 - C++ branches/markus/uni61 at r30864 from trunk at r30843
3520 - Java branches/markus/uni61 at r30865 from trunk at r30863
3522 *** Unicode version numbers
3525 (configure.in & configure: have been modified to extract the version from uchar.h)
3526 - com.ibm.icu.util.VersionInfo
3527 - icutools/unicode/makedefs.sh
3528 + also review & update other definitions in that file,
3529 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
3531 *** data files & enums & parser code
3535 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
3536 - This prepares both unidata and testdata files in respective output subfolders.
3537 - Check test file diffs for previously commented-out, known-failing data lines;
3538 probably need to keep those commented out.
3540 * PropertyValueAliases.txt changes
3541 - 11 new block names:
3543 Arabic_Mathematical_Alphabetic_Symbols
3545 Meetei_Mayek_Extensions
3547 Meroitic_Hieroglyphs
3551 Sundanese_Supplement
3554 -> add to UCharacter.UnicodeBlock IDs
3555 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3556 replace public static final int \1_ID = \2; \3
3557 -> add to UCharacter.UnicodeBlock objects
3558 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
3559 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3560 - 1 new Joining_Group (jg) value:
3562 -> uchar.h & UCharacter.JoiningGroup
3563 - 2 new Line_Break (lb) values:
3564 CJ=Conditional_Japanese_Starter
3566 -> uchar.h & UCharacter.LineBreak
3569 sc ; Merc ; Meroitic_Cursive
3570 sc ; Mero ; Meroitic_Hieroglyphs
3573 sc ; Sora ; Sora_Sompeng
3575 -> remove these from SyntheticPropertyValueAliases.txt
3576 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3577 and in com.ibm.icu.dev.test.lang.TestUScript.java
3578 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3582 and another one added 2011-12-09
3583 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
3585 -> com.ibm.icu.lang.UScript
3586 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3587 replace public static final int \1 = \2;\3
3588 -> SyntheticPropertyValueAliases.txt
3589 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3590 and in com.ibm.icu.dev.test.lang.TestUScript.java
3592 * UnicodeData.txt changes
3593 - the last Unihan code point changes from U+9FCB to U+9FCC
3594 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
3595 + do change gennames.c
3596 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
3598 * DerivedBidiClass.txt changes
3599 - 2 new default-AL blocks:
3600 # Arabic Extended-A: U+08A0 - U+08FF (was default-R)
3601 # Arabic Mathematical Alphabetic Symbols:
3602 # U+1EE00 - U+1EEFF (was default-R)
3603 - 2 new default-R blocks:
3604 # Meroitic Hieroglyphs:
3606 # Meroitic Cursive: U+109A0 - U+109FF
3607 -> should be picked up by the explicit data in the file
3609 * NameAliases.txt changes
3611 # Each line has two fields
3612 # First field: Code point
3613 # Second field: Alias
3615 # Each line has three fields, as described here:
3617 # First field: Code point
3618 # Second field: Alias
3620 - Also, the file previously allowed multiple aliases but only now does it
3621 actually provide multiple, even multiple of the same type. For example,
3622 FEFF;BYTE ORDER MARK;alternate
3623 FEFF;BOM;abbreviation
3624 FEFF;ZWNBSP;abbreviation
3625 - This breaks our gennames parser, unames.icu data structure, and API.
3626 Fix gennames to only pick up "correction" aliases.
3627 New ticket #8963 for further changes.
3629 * run genpname/preparse.pl (on Linux)
3630 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3631 + make sure that data.h is writable
3632 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3633 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3635 * build ICU (make install)
3636 so that the tools build can pick up the new definitions from the installed header files.
3637 * build Unicode tools (at least genpname) using CMake+make
3640 (builds both pnames.icu and propname_data.h)
3641 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3642 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3644 * build ICU (make install)
3645 * build Unicode tools using CMake+make
3647 * update source/data/unidata/norm2/nfkc_cf.txt
3648 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3650 * update source/data/unidata/norm2/uts46.txt
3651 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3652 to ~/svn.icu/tools/trunk/src/unicode/py
3653 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
3654 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3655 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3657 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3658 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3659 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3660 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
3661 - nothing new in 6.1, no test file to update
3663 * generate core properties data files
3664 - in initial bootstrapping, change the UCA version
3665 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3666 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3667 - rebuild ICU & tools
3668 + if genrb fails to build coll/root.res with an U_INVALID_FORMAT_ERROR,
3669 check if the UCA version in FractionalUCA.txt matches the new Unicode version
3671 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
3672 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3673 - rebuild ICU & tools
3675 * update Java data files
3676 - refresh just the UCD-related files, just to be safe
3677 - see (ICU4C)/source/data/icu4j-readme.txt
3679 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3682 Unicode .icu files built to ./out/build/icudt49l
3683 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3684 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
3685 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3686 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3687 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
3688 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
3689 mkdir -p /tmp/icu4j/main/shared/data
3690 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3691 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
3692 mkdir -p /tmp/icu4j/main/shared/data
3693 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3694 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
3695 - copy the big-endian Unicode data files to another location,
3696 separate from the other data files
3697 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3698 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3699 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3700 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
3701 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3702 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3703 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3705 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3707 * refresh Java test .txt files
3708 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3710 * test ICU so far, fix test code where necessary
3711 - temporarily ignore collation issues that look like UCA/UCD mismatches,
3712 until UCA data is updated
3716 - get output from Mark's tools; look in
3717 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
3718 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3719 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3720 (note removing the underscore before "Rules")
3721 - update (ICU)/source/test/testdata/CollationTest_*.txt
3722 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3723 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3724 - check test file diffs for previously commented-out, known-failing data lines;
3725 probably need to keep those commented out
3726 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3728 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3730 - refresh ICU4J collation data:
3731 (subset of instructions above for properties data refresh, except copies all coll/*)
3732 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3733 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3734 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3735 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3736 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3737 - note on intltest: if collate/UCAConformanceTest fails, then
3738 utility/MultithreadTest/TestCollators will fail as well;
3739 fix the conformance test before looking into the multi-thread test
3741 * When refreshing all of ICU4J data from ICU4C
3742 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3743 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3745 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3747 *** LayoutEngine script information
3749 (For details see the Unicode 5.2 change log below.)
3751 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3752 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3753 in the working directory.
3754 (It also generates ScriptRunData.cpp, which is no longer needed.)
3756 The generated files have a current copyright date and "@draft" statement.
3758 - diff current <icu>/source/layout files vs. generated ones
3759 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3760 review and manually merge desired changes;
3761 fix gratuitous changes, incorrect @draft and missing aliases;
3762 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3763 - if you just copy the above files, then
3764 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3765 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3767 *** merge the Unicode update branches back onto the trunk
3768 - do not merge the icudata.jar and testdata.jar,
3769 instead rebuild them from merged & tested ICU4C
3771 ---------------------------------------------------------------------------- ***
3773 ICU 4.8 (no Unicode update, just new script codes)
3775 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3781 Shrd 319 Sharada, Śāradā
3782 Sora 398 Sora Sompeng
3783 Takr 321 Takri, Ṭākrī, Ṭāṅkrī
3787 -> com.ibm.icu.lang.UScript
3788 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3789 replace public static final int \1 = \2;\3
3790 -> genpname/SyntheticPropertyValueAliases.txt
3791 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3792 and in com.ibm.icu.dev.test.lang.TestUScript.java
3794 * run genpname/preparse.pl (on Linux)
3795 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3796 + make sure that data.h is writable
3797 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3798 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3800 * rebuild Unicode tools (at least genpname) using make
3801 - You might first need to "make install" ICU so that the tools build can pick
3802 up the new definitions from the installed header files.
3805 (builds both pnames.icu and propname_data.h)
3806 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3807 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3808 - rebuild ICU & tools
3811 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3812 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3813 - rebuild ICU & tools
3815 * update Java data files
3816 - refresh just the UCD-related files, just to be safe
3817 - see (ICU4C)/source/data/icu4j-readme.txt
3819 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3820 - copy the big-endian Unicode data files to another location,
3821 separate from the other data files
3822 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3823 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3824 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3826 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
3828 * should have updated the layout engine script codes but forgot
3830 ---------------------------------------------------------------------------- ***
3834 *** related ICU Trac tickets
3836 7264 Unicode 6.0 Update
3838 *** Unicode version numbers
3841 (configure.in & configure: have been modified to extract the version from uchar.h)
3842 - com.ibm.icu.util.VersionInfo
3844 *** data files & enums & parser code
3848 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
3849 - This now prepares both unidata and testdata files in respective output subfolders.
3851 * PropertyAliases.txt changes
3852 - new Script_Extensions property defined in the new ScriptExtensions.txt file
3853 but not listed in PropertyAliases.txt; reported to unicode.org;
3854 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
3855 scx; Script_Extensions
3856 -> uchar.h with new UProperty section
3857 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
3859 * PropertyValueAliases.txt changes
3860 - 12 new block names:
3865 CJK_Unified_Ideographs_Extension_D
3870 Miscellaneous_Symbols_And_Pictographs
3872 Transport_And_Map_Symbols
3874 -> add to UCharacter.UnicodeBlock
3875 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
3876 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3877 - Joining_Group (jg) values:
3878 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
3879 -> uchar.h & UCharacter.JoiningGroup
3884 -> remove these from SyntheticPropertyValueAliases.txt
3885 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
3886 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3887 and in com.ibm.icu.dev.test.lang.TestUScript.java
3888 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3889 (added 2009-11-11..2010-07-18)
3891 Dupl 755 Duployan shortand
3897 Merc 101 Meroitic Cursive
3898 Narb 106 Old North Arabian
3902 Wara 262 Warang Citi
3904 -> com.ibm.icu.lang.UScript
3905 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3906 replace public static final int \1 = \2;\3
3907 -> SyntheticPropertyValueAliases.txt
3908 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3909 and in com.ibm.icu.dev.test.lang.TestUScript.java
3910 - ISO 15924 name change
3911 Mero 100 Meroitic Hieroglyphs (was Meroitic)
3912 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
3913 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
3915 * UnicodeData.txt changes
3917 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
3918 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
3919 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
3921 * build Unicode tools using CMake+make
3923 * run genpname/preparse.pl (on Linux)
3924 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3925 + make sure that data.h is writable
3926 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3927 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3929 * rebuild Unicode tools (at least genpname) using make
3930 - You might first need to "make install" ICU so that the tools build can pick
3931 up the new definitions from the installed header files.
3934 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3935 - rebuild ICU & tools
3937 * update source/data/unidata/norm2/nfkc_cf.txt
3938 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3940 * update source/data/unidata/norm2/uts46.txt
3941 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
3942 to ~/svn.icu/tools/trunk/src/unicode/py
3943 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
3944 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3945 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3947 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3948 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3949 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3950 - Unicode 6.0: U+2260, U+226E, U+226F
3952 * generate core properties data files
3953 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3954 - rebuild ICU & tools
3955 - run makeuca.sh so that genuca picks up the new nfc.nrm:
3956 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3957 - rebuild ICU & tools
3959 * implement new Script_Extensions property (provisional)
3960 - parser & generator: genprops & uprops.icu
3961 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
3962 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
3964 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
3966 - genbidi/gencase/genprops tools changes
3967 - re-run makeprops.sh (see above)
3968 - UCharacterProperty.java, UCharacterTypeIterator.java,
3969 UBiDiProps.java, UCaseProps.java, and several others with minor changes;
3970 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
3972 * update Java data files
3973 - refresh just the UCD-related files, just to be safe
3974 - see (ICU4C)/source/data/icu4j-readme.txt
3976 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3979 Unicode .icu files built to ./out/build/icudt45l
3980 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3981 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3982 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3983 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
3984 mkdir -p /tmp/icu4j/main/shared/data
3985 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3986 - copy the big-endian Unicode data files to another location,
3987 separate from the other data files
3988 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3989 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3990 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3991 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
3992 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3993 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3994 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3996 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
3998 * refresh Java test .txt files
3999 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4001 * un-hardcode normalization skippable (NF*_Inert) test data
4002 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
4004 * copy updated break iterator test files
4005 - now handled by early ucdcopy.py and
4006 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
4008 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
4009 to ~/svn.icu/trunk/src/source/test/testdata)
4010 - they are not used in ICU4J
4014 - get output from Mark's tools; look in
4015 http://www.unicode.org/~book/incoming/mark/uca6.0.0/
4016 http://www.macchiato.com/unicode/utc/additional-uca-files
4017 http://www.unicode.org/Public/UCA/6.0.0/
4018 http://www.unicode.org/~mdavis/uca/
4019 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4020 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4021 - update Han-implicit ranges for new CJK extensions:
4022 swapCJK() in ucol.cpp & ImplicitCEGenerator.java
4023 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
4024 do not add it into invuca so that tailoring primary-after an ignorable works
4025 - genuca: permit space between [variable top] bytes
4026 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
4028 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4030 - refresh ICU4J collation data:
4031 (subset of instructions above for properties data refresh, except copies all coll/*)
4032 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4033 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4034 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4035 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4036 - update (ICU)/source/test/testdata/CollationTest_*.txt
4037 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4038 with output from Mark's Unicode tools
4039 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4040 - note on intltest: if collate/UCAConformanceTest fails, then
4041 utility/MultithreadTest/TestCollators will fail as well;
4042 fix the conformance test before looking into the multi-thread test
4044 * When refreshing all of ICU4J data from ICU4C
4045 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4046 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4048 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4050 *** LayoutEngine script information
4052 (For details see the Unicode 5.2 change log below.)
4054 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4055 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4056 ScriptRunData.cpp, which is no longer needed.)
4058 The generated files have a current copyright date and "@draft" statement.
4060 * copy the above files into <icu>/source/layout, replacing the old files.
4061 * fix mixed line endings
4062 * review the diffs and fix incorrect @draft and missing aliases;
4063 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4064 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4066 ---------------------------------------------------------------------------- ***
4070 *** related ICU Trac tickets
4074 7167 verify collation bytes
4075 7235 Java test NAME_ALIAS
4076 7236 Java DerivedCoreProperties.txt test
4077 7237 Java BidiTest.txt
4078 7238 UTrie2 in core unidata
4079 7239 test for tailoring gaps
4080 7240 Java fix CollationMiscTest
4081 7243 update layout engine for Unicode 5.2
4083 *** Unicode version numbers
4086 - configure.in & configure
4087 - update ucdVersion in gennames.c if an algorithmic range changes
4089 *** data files & enums & parser code
4093 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
4094 - includes finding files regardless of version numbers,
4095 copying them, and performing the equivalent processing of the
4096 ucdstrip and ucdmerge tools on the desired set of files
4099 - PropertyAliases.txt
4100 moved from numeric to enumerated:
4101 ccc ; Canonical_Combining_Class
4102 new string properties:
4103 NFKC_CF ; NFKC_Casefold
4104 Name_Alias; Name_Alias
4105 new binary properties:
4108 CWCF ; Changes_When_Casefolded
4109 CWCM ; Changes_When_Casemapped
4110 CWKCF ; Changes_When_NFKC_Casefolded
4111 CWL ; Changes_When_Lowercased
4112 CWT ; Changes_When_Titlecased
4113 CWU ; Changes_When_Uppercased
4114 new CJK Unihan properties (not supported by ICU)
4115 - PropertyValueAliases.txt
4118 one script code change:
4119 sc ; Qaai ; Inherited
4121 sc ; Zinh ; Inherited ; Qaai
4122 new Line_Break (lb) value:
4123 lb ; CP ; Close_Parenthesis
4124 new Joining_Group (jg) values: Farsi_Yeh, Nya
4126 ccc; 214; ATA ; Attached_Above
4127 - DerivedBidiClass.txt
4128 new default-R range: U+1E800 - U+1EFFF
4130 all of the ISO comments are gone
4132 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
4134 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
4135 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
4139 + cd \svn\icuproj\icu\trunk\source\tools\genpname
4140 + make sure that data.h is writable
4141 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
4142 + preparse.pl complains with errors like the following:
4143 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
4144 This is because ICU 4.0 had scripts from ISO 15924 which are now
4145 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
4146 and PropertyValueAliases.txt.
4147 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4148 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
4149 + preparse.pl complains with errors about block names missing from uchar.h; add them
4151 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4152 - new block & script values
4154 copy new blocks from Blocks.txt
4155 MS VC++ 2008 regular expression:
4156 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
4157 replace with " UBLOCK_\3 = 172, /*[\1]*/"
4158 + several new script values already added in ICU 4.0 for ISO 15924 coverage
4159 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
4160 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
4161 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
4162 (added to SyntheticPropertyValueAliases.txt)
4163 - new Joining Group (JG) values: Farsi_Yeh, Nya
4164 - new Line_Break (lb) value:
4165 lb ; CP ; Close_Parenthesis
4167 * hardcoded Unihan range end/limit
4168 - Unihan range end moves from 9FC3 to 9FCB
4169 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
4170 + do change gennames.c
4172 * Compare definitions of new binary properties with what we used to use
4173 in algorithms, to see if the definitions changed.
4174 - Verified that definitions for Cased and Case_Ignorable are unchanged.
4175 The gencase tool now parses the newly public Case_Ignorable values
4176 in case the definition changes in the future.
4178 * uchar.c & uprops.h & uprops.c & genprops
4179 - new numeric values that didn't exist in Unicode data before:
4180 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
4181 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
4182 therefore redesign the encoding of numeric types and values for formatVersion 6;
4183 design for simple numbers up to at least 144 ("one gross"),
4184 large values up to at least 10^20,
4185 and fractions with numerators -1..17 and denominators 1..16
4186 to cover current and expected future values
4187 (e.g., more Han numeric values, Meroitic twelfths)
4189 * reimplement Hangul_Syllable_Type for new Jamo characters
4190 - the old code assumed that all Jamo characters are in the 11xx block
4191 - Unicode 5.2 fills holes there and adds new Jamo characters in
4192 A960..A97F; Hangul Jamo Extended-A
4194 D7B0..D7FF; Hangul Jamo Extended-B
4195 - Hangul_Syllable_Type can be trivially derived from a subset of
4196 Grapheme_Cluster_Break values
4198 * build Unicode data source code for hardcoding core data
4199 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
4201 ICU data make path is \svn\icuproj\icu\trunk\source\data\
4202 ICU root path is \svn\icuproj\icu\trunk
4203 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4204 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4205 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4206 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4207 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4208 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4209 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4210 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
4211 Creating data file for Unicode Property Names
4212 Creating data file for Unicode Character Properties
4213 Creating data file for Unicode Case Mapping Properties
4214 Creating data file for Unicode BiDi/Shaping Properties
4215 Creating data file for Unicode Normalization
4216 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
4217 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
4219 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
4220 and rebuild the common library
4224 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
4225 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
4226 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
4227 [ Begin obsolete instructions:
4228 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
4229 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
4231 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
4232 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
4233 End obsolete instructions]
4234 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4235 not just the *_STUB.txt files
4236 - note on intltest: if collate/UCAConformanceTest fails, then
4237 utility/MultithreadTest/TestCollators will fail as well;
4238 fix the conformance test before looking into the multi-thread test
4240 *** Implement Cased & Case_Ignorable properties
4241 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
4242 - Problem: These properties should be disjoint, but aren't
4243 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
4244 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
4246 *** Implement Changes_When_Xyz properties
4247 - without stored data
4249 *** Implement Name_Alias property
4250 - add it as another name field in unames.icu
4251 - make it available via u_charName() and UCharNameChoice and
4252 - consider it in u_charFromName()
4256 * Update break iterator rules to new UAX versions and new property values
4257 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
4259 *** new BidiTest file
4260 - review format and data
4261 - copy BidiTest.txt to source/test/testdata
4262 - write test code using this data
4263 - fix ICU code where it fails the conformance test
4266 - generally, find and update code corresponding to C/C++
4267 - UCharacter.UnicodeBlock constants:
4268 a) add an _ID integer per new block, update COUNT
4269 b) add a class instance per new block
4270 Visual Studio regex:
4271 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
4272 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4273 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
4275 - port test changes to Java
4277 *** LayoutEngine script information
4279 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
4281 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4282 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4283 ScriptRunData.cpp, which is no longer needed.)
4285 The generated files have a current copyright date and "@draft" statement.
4287 -> Eric Mader wrote in email on 20090930:
4288 "I think the tool has been modified to update @draft to @stable for
4289 older scripts and to add @draft for new scripts.
4290 (I worked with an intern on this last year.)
4291 You should check the output after you run it."
4293 * copy the above files into <icu>/source/layout, replacing the old files.
4294 * fix mixed line endings
4295 * review the diffs and fix incorrect @draft and missing aliases
4296 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4298 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4299 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4301 -> Eric Mader wrote in email on 20090930:
4302 "This is just a matter of making sure that all the per-script tables have
4303 entries for any new scripts that were added.
4304 If any new Indic characters were added, then the class tables in
4305 IndicClassTables.cpp should be updated to reflect this.
4306 John Emmons should know how to do this if it's required."
4308 * rebuild the layout and layoutex libraries.
4312 + Jamo_Short_Name, sfc->scf, binary property value aliases
4314 ---------------------------------------------------------------------------- ***
4318 *** related ICU Trac tickets
4320 5696 Update to Unicode 5.1
4322 *** Unicode version numbers
4325 - configure.in & configure
4326 - update ucdVersion in gennames.c if an algorithmic range changes
4328 *** data files & enums & parser code
4332 DerivedCoreProperties.txt
4333 DerivedNormalizationProps.txt
4334 NormalizationTest.txt
4337 GraphemeBreakProperty.txt
4338 SentenceBreakProperty.txt
4339 WordBreakProperty.txt
4340 - ucdstrip and ucdmerge:
4344 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4345 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
4346 copy 5.1.0\ucd\Blocks.txt ..\unidata\
4347 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
4348 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
4349 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4350 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4351 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4352 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4353 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
4354 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
4355 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
4356 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
4357 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
4359 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4360 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4361 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4362 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
4363 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4364 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4365 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4366 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4367 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4368 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4372 + cd \svn\icuproj\icu\uni51\source\tools\genpname
4373 + make sure that data.h is writable
4374 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
4375 + preparse.pl complains with errors like the following:
4376 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
4377 This is because ICU 3.8 had scripts from ISO 15924 which are now
4378 added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
4379 and PropertyValueAliases.txt.
4380 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4381 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
4382 + PropertyValueAliases.txt now explicitly contains values for boolean properties:
4383 N/Y, No/Yes, F/T, False/True
4384 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
4385 It will use further values from the file if present.
4387 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4388 - new block & script values
4390 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
4391 (removed from SyntheticPropertyValueAliases.txt)
4392 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
4393 (added to SyntheticPropertyValueAliases.txt)
4394 - uprops.icu (uprops.h) only provides 7 bits for script codes.
4395 In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
4396 There is none above 127 yet which is the script code for an
4397 assigned Unicode character, so ICU 4.0 uprops.icu does not store any
4398 script code values greater than 127.
4399 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
4400 in a parallel bit field, and that overflows now.
4401 Also, future values >=128 would be incompatible anyway.
4402 uprops.h is modified to move around several of the bit fields
4403 in the properties vector words, and now uses 8 bits for the script code.
4404 Two other bit fields also grow to accommodate future growth:
4405 Block (current count: 172) grows from 8 to 9 bits,
4406 and Word_Break grows from 4 to 5 bits.
4407 - renamed property Simple_Case_Folding (sfc->scf)
4408 + nothing to be done: handled as normal alias
4409 - new property JSN Jamo_Short_Name
4410 + no new API: only contributes to the Name property
4411 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
4412 - new Joining Group (JG) value: Burushashki_Yeh_Barree
4413 - new Sentence_Break (SB) values:
4418 - new Word_Break (WB) values:
4420 WB ; Extend ; Extend
4424 * Further changes in the 2008-02-29 update:
4425 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
4426 because they should not normally be invisible.
4427 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
4428 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
4429 - new Word_Break (WB) value: NL=Newline
4431 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
4432 - Unihan range end moves from 9FBB to 9FC3
4433 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
4434 + do change gennames.c
4436 * build Unicode data source code for hardcoding core data
4437 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
4439 ICU data make path is \svn\icuproj\icu\uni51\source\data\
4440 ICU root path is \svn\icuproj\icu\uni51
4441 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4442 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4443 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4444 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4445 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4446 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4447 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4448 Creating data file for Unicode Character Properties
4449 Creating data file for Unicode Case Mapping Properties
4450 Creating data file for Unicode BiDi/Shaping Properties
4451 Creating data file for Unicode Normalization
4452 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
4453 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
4455 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
4456 and rebuild the common library
4460 * Update break iterator rules to new UAX versions and new property values
4464 * update FractionalUCA.txt and UCARules.txt with new canonical closure
4467 - Test that APIs using Unicode property value aliases (like UnicodeSet)
4468 support all of the boolean values N/Y, No/Yes, F/T, False/True
4469 -> TestBinaryValues() tests in both cintltst and intltest
4471 *** LayoutEngine script information
4472 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4473 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
4474 ScriptRunData.cpp, which is no longer needed.)
4476 The generated files have a current copyright date and "@draft" statement.
4478 * copy the above files into <icu>/source/layout, replacing the old files.
4480 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4481 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4483 * rebuild the layout and layoutex libraries.
4487 + Jamo_Short_Name, sfc->scf, binary property value aliases
4489 ---------------------------------------------------------------------------- ***
4493 *** related Jitterbugs
4495 5084 RFE: Update to Unicode 5.0
4497 *** data files & enums & parser code
4501 DerivedCoreProperties.txt
4502 DerivedNormalizationProps.txt
4503 NormalizationTest.txt
4506 GraphemeBreakProperty.txt
4507 SentenceBreakProperty.txt
4508 WordBreakProperty.txt
4509 - ucdstrip and ucdmerge:
4513 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4514 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
4515 copy 5.0.0\ucd\Blocks.txt ..\unidata\
4516 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
4517 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
4518 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4519 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4520 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4521 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4522 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
4523 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
4524 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
4525 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
4526 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
4528 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4529 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4530 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4531 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
4532 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4533 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4534 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4535 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4536 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4537 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4539 * update FractionalUCA.txt and UCARules.txt with new canonical closure
4543 + make sure that data.h is writable
4544 + perl preparse.pl \cvs\oss\icu > out.txt
4546 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4547 - new block & script values
4548 + script values already added in ICU 3.6 because all of ISO 15924 is now covered
4550 * build Unicode data source code for hardcoding core data
4551 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
4553 ICU data make path is \cvs\oss\icu\source\data\
4554 ICU root path is \cvs\oss\icu
4555 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4557 Creating data file for Unicode Character Properties
4558 Creating data file for Unicode Case Mapping Properties
4559 Creating data file for Unicode BiDi/Shaping Properties
4560 Creating data file for Unicode Normalization
4561 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
4562 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
4564 - copy the .c source files to C:\cvs\oss\icu\source\common
4565 and rebuild the common library
4567 *** Unicode version numbers
4572 *** LayoutEngine script information
4573 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4574 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
4575 ScriptRunData.cpp, which is no longer needed.)
4577 The generated files have a current copyright date and "@draft" statement.
4579 * copy the above files into <icu>/source/layout, replacing the old files.
4581 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4582 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4584 * rebuild the layout and layoutex libraries.
4586 ---------------------------------------------------------------------------- ***
4590 *** related Jitterbugs
4592 4332 RFE: Update to Unicode 4.1
4593 4157 RBBI, TR29 4.1 updates
4595 *** data files & enums & parser code
4599 DerivedCoreProperties.txt
4600 DerivedNormalizationProps.txt
4601 NormalizationTest.txt
4602 GraphemeBreakProperty.txt
4603 SentenceBreakProperty.txt
4604 WordBreakProperty.txt
4605 - ucdstrip and ucdmerge:
4609 * add new files to the repository
4610 GraphemeBreakProperty.txt
4611 SentenceBreakProperty.txt
4612 WordBreakProperty.txt
4614 * update FractionalUCA.txt and UCARules.txt with new canonical closure
4617 - handle new enumerated properties in sub read_uchar
4620 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
4621 - new binary properties
4623 + Pattern_White_Space
4624 - new enumerated properties
4625 + Grapheme_Cluster_Break
4628 - new block & script & line break values
4631 - case-ignorable changes
4632 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4633 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
4635 *** Unicode version numbers
4641 - verify that u_charMirror() round-trips
4642 - test all new properties and some new values of old properties
4646 * hardcoded Unihan range end/limit
4647 - Unihan range end moves from 9FA5 to 9FBB
4648 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
4649 + do not modify BOCU/BOCSU code because that would change the encoding
4650 and break binary compatibility!
4651 + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
4653 + ignore trietest.c: test data is arbitrary
4654 + ignore tstnorm.cpp: test optimization, not important
4655 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
4656 + do change line_th.txt and word_th.txt
4657 by replacing hardcoded ranges with the new property values
4658 + do change gennames.c
4660 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4661 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4662 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
4665 - compare new special casing context conditions with previous ones
4666 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4669 - consider storing only the short name if it is the same as the long name
4672 - UAX #29 changes (grapheme/word/sentence breaks)
4673 - UAX #14 changes (line breaks)
4674 - Pattern_Syntax & Pattern_White_Space
4676 ---------------------------------------------------------------------------- ***
4678 Unicode 4.0.1 update
4680 *** related Jitterbugs
4682 3170 RFE: Update to Unicode 4.0.1
4683 3171 Add new Unicode 4.0.1 properties
4684 3520 use Unicode 4.0.1 updates for break iteration
4686 *** data files & enums & parser code
4689 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
4690 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
4693 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
4694 according to PRI #26
4695 http://www.unicode.org/review/resolved-pri.html#pri26
4696 - undone again because no corrigendum in sight;
4697 instead modified tests to not check consistency on this for Unicode 4.0.1
4700 - update from http://www.unicode.org/copyright.html
4701 formatted for plain text
4703 * uchar.h & uprops.h & uprops.c & genprops
4704 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
4705 - add U_LB_INSEPARABLE due to a spelling fix
4706 + put short name comment only on line with new constant
4707 for genpname perl script parser
4708 - new binary properties
4710 + Variation_Selector
4713 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
4714 - perl script: correctly calculate the maximum number of fields per row
4717 - new script code Hrkt=Katakana_Or_Hiragana
4719 * gennorm.c track changes in DerivedNormalizationProps.txt
4720 - "FNC" -> "FC_NFKC"
4721 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
4723 * genprops/props2.c track changes in DerivedNumericValues.txt
4724 - changed from 3 columns to 2, dropping the numeric type
4725 + assume that the type is always numeric for Han characters,
4726 and that only those are added in addition to what UnicodeData.txt lists
4728 *** Unicode version numbers
4734 - update test of default bidi classes according to PRI #28
4735 /tsutil/cucdtst/TestUnicodeData
4736 http://www.unicode.org/review/resolved-pri.html#pri28
4737 - bidi tests: change exemplar character for ES depending on Unicode version
4738 - change hardcoded expected property values where they change
4746 - use new Hrkt=Katakana_Or_Hiragana
4749 - are now part of combining character sequences
4750 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ