1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
2 * License & terms of use: http://www.unicode.org/copyright.html
3 * Copyright (C) 2004-2016, International Business Machines
4 * Corporation and others. All Rights Reserved.
6 * file name: changes.txt
8 * tab size: 8 (not used)
11 * created on: 2004may06
12 * created by: Markus W. Scherer
14 * change log for Unicode updates
16 * For each new Unicode version, during the beta period,
17 * I copy the change log for the previous version to the top of this file.
18 * I adjust the versions, tickets, URLs, and paths.
19 * I work my way through the steps listed in the log, top to bottom,
20 * adjusting the log as necessary.
21 * I report problems to the UTC and/or CLDR and/or ICU.
22 * Before the data is final, I "turn the crank" several more times,
23 * using appropriate subsets of the steps.
25 ---------------------------------------------------------------------------- ***
27 * New ISO 15924 script codes
29 Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
30 until they are encoded in Unicode,
31 or can be assumed to be encoded in the next Unicode version.
32 Script enum constant names should follow the Unicode script property value aliases,
33 which are assigned only when the scripts are encoded.
34 When we add constants early and guess wrong, then we have confusing enum constants
35 and have sometimes added aliases.
37 Variant script codes like Latf and Aran that are not subject to separate encoding
38 can be added at any time.
39 (For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
41 We add script codes used in CLDR or in the spoof checker.
42 This includes combination/alias codes like Hanb and Jamo.
43 See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
44 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
46 We add special Z* script codes like Zsye.
48 For new script codes see http://www.unicode.org/iso15924/codechanges.html
50 ---------------------------------------------------------------------------- ***
52 Unicode 10.0 update for ICU 60
54 http://www.unicode.org/versions/Unicode10.0.0/
55 http://www.unicode.org/versions/beta-10.0.0.html
56 http://blog.unicode.org/2017/03/unicode-100-beta-review.html
57 http://www.unicode.org/review/pri350/
58 http://www.unicode.org/reports/uax-proposed-updates.html
59 http://www.unicode.org/reports/tr44/tr44-19.html
61 * Command-line environment setup
63 UNICODE_DATA=~/unidata/uni10/20170605
64 CLDR_SRC=~/svn.cldr/uni10
65 ICU_ROOT=~/svn.icu/uni10
66 ICU_SRC=$ICU_ROOT/src
67 ICUDT=icudt60b
68 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
69 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
70 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
74 - ticket:12985: Unicode 10
75 - ticket:13061: undo hacks from emoji 5.0 update
76 - ticket:13062: add Emoji_Component property
77 - ^/branches/markus/uni10
81 - cldrbug 10055: Unicode 10
82 - cldrbug 9882: Unicode 10 script metadata
83 - cldrbug 10219: numbering systems for Unicode 10
85 *** Unicode version numbers
88 - com.ibm.icu.util.VersionInfo
89 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
91 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
92 so that the makefiles see the new version number.
94 *** data files & enums & parser code
97 - mkdir -p $UNICODE_DATA
98 - download Unicode 10.0 files into $UNICODE_DATA
99 + subfolders: ucd, uca, idna, security
100 + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
101 - download emoji 5.0 files into $UNICODE_DATA/emoji
103 * for manual diffs: remove version suffixes from the file names
104 ~$ unidata/desuffixucd.py $UNICODE_DATA
105 (see https://sites.google.com/site/unicodetools/inputdata)
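The suffix removal above can be pictured with a small sketch. This is not the actual desuffixucd.py; it assumes the suffix has the shape "-<major>.<minor>.<update>" with an optional draft tag such as "d3", and the function name strip_version_suffix is hypothetical.

```python
import re

# Assumed suffix shape: "-10.0.0" or "-10.0.0d3", directly before the file extension.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def strip_version_suffix(name: str) -> str:
    """Return the file name without a trailing Unicode version suffix."""
    return _VERSION_SUFFIX.sub("", name)


if __name__ == "__main__":
    for name in ("UnicodeData-10.0.0d3.txt", "DerivedAge-10.0.0.txt", "Blocks.txt"):
        print(name, "->", strip_version_suffix(name))
```

Names without a suffix pass through unchanged, so the function can be applied to every file in a folder before diffing old and new versions.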
107 * process and/or copy files
108 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
109 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
110 + For debugging, and tweaking how ppucd.txt is written,
111 the tool has an --only_ppucd option:
112 py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
114 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
116 * build ICU (make install)
117 so that the tools build can pick up the new definitions from the installed header files.
119 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
121 * preparseucd.py changes
122 - remove or add new Unicode scripts from/to the
123 only-in-ISO-15924 list according to the error messages:
124 ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
125 -> adjust _scripts_only_in_iso15924 as indicated
127 Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
128 -> add vo=Vertical_Orientation to _ignored_properties
129 -> later removed again, parsing the file, even though we do not yet store data for runtime use
131 * new constants for new property values
132 - preparseucd.py error:
133 ValueError: missing uchar.h enum constants for some property values:
134 [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
135 u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
136 (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
137 u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
138 u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
139 (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
140 = PropertyValueAliases.txt new property values (diff old & new .txt files)
141 blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
142 blk; Kana_Ext_A ; Kana_Extended_A
143 blk; Masaram_Gondi ; Masaram_Gondi
145 blk; Soyombo ; Soyombo
146 blk; Syriac_Sup ; Syriac_Supplement
147 blk; Zanabazar_Square ; Zanabazar_Square
149 use long property names for enum constants,
150 for the trailing comment get the block start code point: diff old & new Blocks.txt
151 -> add to UCharacter.UnicodeBlock IDs
152 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
153 replace public static final int \1_ID = \2; \3
154 -> add to UCharacter.UnicodeBlock objects
155 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
156 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
158 jg ; Malayalam_Bha ; Malayalam_Bha
159 jg ; Malayalam_Ja ; Malayalam_Ja
160 jg ; Malayalam_Lla ; Malayalam_Lla
161 jg ; Malayalam_Llla ; Malayalam_Llla
162 jg ; Malayalam_Nga ; Malayalam_Nga
163 jg ; Malayalam_Nna ; Malayalam_Nna
164 jg ; Malayalam_Nnna ; Malayalam_Nnna
165 jg ; Malayalam_Nya ; Malayalam_Nya
166 jg ; Malayalam_Ra ; Malayalam_Ra
167 jg ; Malayalam_Ssa ; Malayalam_Ssa
168 jg ; Malayalam_Tta ; Malayalam_Tta
169 -> uchar.h & UCharacter.JoiningGroup
171 sc ; Gonm ; Masaram_Gondi
174 sc ; Zanb ; Zanabazar_Square
175 -> uscript.h & com.ibm.icu.lang.UScript
176 -> Nushu had been added already
177 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
178 and in com.ibm.icu.dev.test.lang.TestUScript.java
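The two Eclipse find/replace patterns for the UBLOCK_ constants above also work outside of Eclipse. The sketch below applies the same regular expressions with Python's re module; the input line is a made-up example (block name, number, and comment are placeholders, not real uchar.h content).

```python
import re

# Patterns copied from the Eclipse find/replace steps in this log.
FIND_ID = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
REPLACE_ID = r"public static final int \1_ID = \2; \3"
FIND_OBJ = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
REPLACE_OBJ = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(line: str) -> str:
    """Turn a C enum line into the Java *_ID constant."""
    return re.sub(FIND_ID, REPLACE_ID, line)


def to_java_object(line: str) -> str:
    """Turn a C enum line into the Java UnicodeBlock object declaration."""
    return re.sub(FIND_OBJ, REPLACE_OBJ, line)


if __name__ == "__main__":
    sample = "UBLOCK_EXAMPLE_BLOCK = 999, /*[1F000]*/"  # hypothetical input line
    print(to_java_id(sample))
    print(to_java_object(sample))
```

Running both functions over the new enum lines gives the two groups of Java declarations to paste into UCharacter.UnicodeBlock.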
180 * New properties as shown in PropertyValueAliases.txt changes
181 - boolean Emoji_Component from emoji 5
182 -> uchar.h & UProperty.java
184 # Regional_Indicator (RI)
186 RI ; N ; No ; F ; False
187 RI ; Y ; Yes ; T ; True
188 -> uchar.h & UProperty.java
189 -> single immutable range, to be hardcoded
191 # Prepended_Concatenation_Mark (PCM)
193 PCM; N ; No ; F ; False
194 PCM; Y ; Yes ; T ; True
195 -> was new in Unicode 9
196 -> uchar.h & UProperty.java
198 # Vertical_Orientation (vo)
201 vo ; Tr ; Transformed_Rotated
202 vo ; Tu ; Transformed_Upright
204 -> only pre-parsed for now, but not yet stored for runtime use
206 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
207 (not strictly necessary for NOT_ENCODED scripts)
208 $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
210 * generate normalization data files
211 cd $ICU_ROOT/dbg/icu4c
212 bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
213 bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
214 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
215 bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
216 bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
218 * build ICU (make install)
219 so that the tools build can pick up the new definitions from the installed header files.
221 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
223 * build Unicode tools using CMake+make
225 $ICU_SRC/tools/unicode/c/icudefs.txt:
227 # Location (--prefix) of where ICU was installed.
228 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/uni10/inst/icu4c)
229 # Location of the ICU4C source tree.
230 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
232 $ICU_ROOT/dbg/tools/unicode/c$
233 cmake ../../../../src/tools/unicode/c
236 * generate core properties data files
237 $ICU_ROOT/dbg/tools/unicode/c$
238 genprops/genprops $ICU_SRC/icu4c
239 genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
240 genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
241 - rebuild ICU (make install) & tools
243 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
244 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
245 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
246 - Unicode 6.0..10.0: U+2260, U+226E, U+226F
247 - nothing new in this Unicode version, no test file to update
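As an alternative to the grep above, the following sketch filters IdnaMappingTable.txt-style lines for the status "disallowed_STD3_valid" on non-ASCII code points. The function name and the abbreviated sample lines are illustrative assumptions; only the semicolon-separated field layout with "#" comments is relied on.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges above U+007F whose status is disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        if hi > 0x7F:
            yield (lo, hi)


if __name__ == "__main__":
    # Abbreviated sample lines in the file's format, for illustration only.
    sample = [
        "0000..002C    ; disallowed_STD3_valid  # <control-0000>..COMMA",
        "0041          ; mapped ; 0061           # LATIN CAPITAL LETTER A",
        "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for lo, hi in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (lo, hi))
```

Comparing the output for the old and new files shows whether the test files need new characters.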
249 * run & fix ICU4C tests
250 - Andy handles RBBI & spoof check test failures
252 * collation: CLDR collation root, UCA DUCET
254 - UCA DUCET goes into Mark's Unicode tools, see
255 https://sites.google.com/site/unicodetools/home#TOC-UCA
256 - CLDR root data files are checked into $CLDR_SRC/common/uca/
257 cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
259 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
260 cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
261 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
262 cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
263 (note removing the underscore before "Rules")
264 cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
265 - restore TODO diffs in UCARules.txt
266 meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
267 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
268 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
269 from the CLDR root files (..._CLDR_..._SHORT.txt)
270 cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
271 cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
272 cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
273 - if CLDR common/uca/unihan-index.txt changes, then update
274 CLDR common/collation/root.xml <collation type="private-unihan">
275 and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
277 - run genuca, see command line above;
279 Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
280 FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)
281 (add the character to genuca.cpp sampleCharsToScripts[])
282 + look up the USCRIPT_ code for the new sample characters
283 (should be obvious from the comment in the error output)
284 + *add* mappings to sampleCharsToScripts[], do not replace them
285 (in case the script sample characters flip-flop)
286 + insert new scripts in DUCET script order, see the top_byte table
287 at the beginning of FractionalUCA.txt
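To collect the sample characters that need new sampleCharsToScripts[] entries, the "first primary" lines of FractionalUCA.txt can be scanned up front. The sketch below is an assumption-laden helper, not part of genuca: the regular expression is inferred from the single line quoted in the error output above, and script names containing spaces would not match.

```python
import re

# Inferred from: "FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)"
_FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*# (\S+) first primary")


def parse_first_primary(line: str):
    """Return (sample code point, script name) or None if the line does not match."""
    m = _FIRST_PRIMARY.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2)


if __name__ == "__main__":
    line = "FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)"
    cp, script = parse_first_primary(line)
    print("U+%04X %s" % (cp, script))
```

The script name in the comment then points to the USCRIPT_ constant to look up.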
291 https://sites.google.com/site/unicodetools/unihan
293 org.unicode.draft.GenerateUnihanCollators
296 -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
297 -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
298 -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
299 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
302 org.unicode.draft.GenerateUnihanCollatorFiles
303 with the same arguments
306 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
307 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
310 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
311 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
312 - run CLDR unit tests, commit to CLDR
313 - generate ICU zh collation data: run CLDR
314 org.unicode.cldr.icu.NewLdml2IcuConverter
315 with program arguments
317 -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
318 -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
319 -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
320 -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
324 -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
327 * run & fix ICU4C tests, now with new CLDR collation root data
328 - run all tests with the collation test data *_SHORT.txt or the full files
329 (the full ones have comments, useful for debugging)
330 - note on intltest: if collate/UCAConformanceTest fails, then
331 utility/MultithreadTest/TestCollators will fail as well;
332 fix the conformance test before looking into the multi-thread test
334 * update Java data files
335 - refresh just the UCD/UCA-related/derived files, just to be safe
336 - see (ICU4C)/source/data/icu4j-readme.txt
337 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
338 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
341 Unicode .icu files built to ./out/build/icudt60l
342 echo timestamp > uni-core-data
343 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
344 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
345 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
346 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
347 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
348 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
349 mkdir -p /tmp/icu4j/main/shared/data
350 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
351 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
352 mkdir -p /tmp/icu4j/main/shared/data
353 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
354 make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
355 - copy the big-endian Unicode data files to another location,
356 separate from the other data files,
357 and then refresh ICU4J
358 cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
359 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
360 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
361 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
362 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
363 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
364 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
365 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
366 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
367 jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
369 * When refreshing all of ICU4J data from ICU4C
370 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
371 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
373 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
375 * update CollationFCD.java
376 + copy & paste the initializers of lcccIndex[] etc. from
377 ICU4C/source/i18n/collationfcd.cpp to
378 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
380 * refresh Java test .txt files
381 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
382 cd $ICU_SRC/icu4c/source/data/unidata
383 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
384 cd ../../test/testdata
385 cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
386 cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
388 * run & fix ICU4J tests
391 - send notice to icu-design about new born-@stable API (enum constants etc.)
393 *** CLDR numbering systems
394 - look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
395 Unicode 10: http://unicode.org/cldr/trac/ticket/10219
396 Unicode 9: http://unicode.org/cldr/trac/ticket/9692
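The query for decimal digits with numeric value 4 can be sketched with Python's unicodedata module. Note the assumption: unicodedata reflects the Unicode version bundled with the Python interpreter, not the beta data, so this only illustrates the query; for a new Unicode version the same filter would have to run over the new UCD files.

```python
import sys
import unicodedata


def digit_four_characters():
    """Return code points with General_Category=Nd and decimal value 4."""
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            result.append(cp)
    return result


if __name__ == "__main__":
    print("Python's Unicode version:", unicodedata.unidata_version)
    for cp in digit_four_characters():
        # Each digit four identifies one set of ten digits starting at cp - 4.
        print("U+%04X  zero digit at U+%04X" % (cp, cp - 4))
```

Each hit stands for one set of decimal digits; sets that are not yet listed in the CLDR numbering systems are candidates for the ticket.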
398 *** merge the Unicode update branches back onto the trunk
399 - do not merge the icudata.jar and testdata.jar,
400 instead rebuild them from merged & tested ICU4C
401 - make sure that changes to Unicode tools are checked in:
402 http://www.unicode.org/utility/trac/log/trunk/unicodetools
404 ---------------------------------------------------------------------------- ***
406 Emoji 5.0 update for ICU 59
407 - ICU 59 mostly remains on Unicode 9.0
408 - except that it updates bidi and segmentation data to Unicode 10 beta
410 First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
412 * Command-line environment setup
414 ICU_ROOT=~/svn.icu/trunk
415 ICU_SRC_DIR=$ICU_ROOT/src
416 ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
418 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
419 SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
420 UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
424 - ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
425 - changes directly on trunk
427 *** data files & enums & parser code
431 - download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
432 - download emoji 5.0 beta files into the same uni90e50 folder
433 - download Unicode 10.0 beta files: ucd
434 + copy Unicode 10 bidi files to the uni90e50/ucd folder:
436 BidiCharacterTest.txt
439 extracted/DerivedBidiClass.txt
440 + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
444 * preparseucd.py changes
445 - adjust for combined trunks
446 - write new copyright lines
447 - ignore new Emoji_Component property for now
449 * process and/or copy files
450 - ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
451 + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
453 - cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
455 * build ICU (make install)
456 so that the tools build can pick up the new definitions from the installed header files.
458 $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
460 * build Unicode tools using CMake+make
462 ~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
464 # Location (--prefix) of where ICU was installed.
465 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
466 # Location of the ICU4C source tree.
467 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
469 ~/svn.icu/trunk/dbg/tools/unicode/c$
470 cmake ../../../../src/tools/unicode/c
473 * generate core properties data files
474 ~/svn.icu/trunk/dbg/tools/unicode/c$
475 genprops/genprops $ICU4C_SRC_DIR
476 - rebuild ICU (make install) & tools
478 * run & fix ICU4C tests
479 - Andy handles RBBI & spoof check test failures
481 * update Java data files
482 - refresh just the UCD/UCA-related/derived files, just to be safe
483 - see (ICU4C)/source/data/icu4j-readme.txt
485 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
488 Unicode .icu files built to ./out/build/icudt59l
489 echo timestamp > uni-core-data
490 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
491 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
492 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
493 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
494 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
495 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
496 mkdir -p /tmp/icu4j/main/shared/data
497 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
498 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
499 mkdir -p /tmp/icu4j/main/shared/data
500 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
501 make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
502 - copy the big-endian Unicode data files to another location,
503 separate from the other data files,
504 and then refresh ICU4J
505 cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
506 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
507 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
508 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
509 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
510 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
511 jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
513 * When refreshing all of ICU4J data from ICU4C
514 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
515 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
517 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
519 * refresh Java test .txt files
520 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
521 cd $ICU4C_SRC_DIR/source/data/unidata
522 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
523 cd ../../test/testdata
524 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
525 cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
527 * run & fix ICU4J tests
529 ---------------------------------------------------------------------------- ***
531 Unicode 9.0 update for ICU 58
533 * Command-line environment setup
535 ICU_ROOT=~/svn.icu/trunk
536 ICU_SRC_DIR=$ICU_ROOT/src
538 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
539 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
540 UNIDATA=$ICU_SRC_DIR/source/data/unidata
542 http://www.unicode.org/review/pri323/ -- beta review
543 http://www.unicode.org/reports/uax-proposed-updates.html
544 http://www.unicode.org/versions/beta-9.0.0.html
545 http://www.unicode.org/versions/Unicode9.0.0/
546 http://www.unicode.org/reports/tr44/tr44-17.html
550 - ticket:12526: integrate Unicode 9
551 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
552 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
556 - cldrbug 9414: UCA 9
557 - ^/branches/markus/uni90 at r11518 from trunk at r11517
559 - cldrbug 8745: Unicode 9.0 script metadata
561 *** Unicode version numbers
564 - com.ibm.icu.util.VersionInfo
565 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
567 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
568 so that the makefiles see the new version number.
570 *** data files & enums & parser code
574 - download UCD & IDNA files
575 - make sure that the Unicode data folder passed into preparseucd.py
576 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
577 - only for manual diffs: remove version suffixes from the file names
578 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
579 (see https://sites.google.com/site/unicodetools/inputdata)
580 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
581 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
582 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
584 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
586 cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
588 * preparseucd.py changes
589 - remove or add new Unicode scripts from/to the
590 only-in-ISO-15924 list according to the error messages:
591 ValueError: remove ['Tang'] from _scripts_only_in_iso15924
592 ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
593 ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
594 ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
595 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
596 and in com.ibm.icu.dev.test.lang.TestUScript.java
597 - DerivedNumericValues.txt new numeric values
598 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
599 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
600 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
601 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
602 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
603 -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
604 uchar.c, UCharacterProperty.java
605 to support a new series of values
606 - adjust preparseucd.py for Tangut algorithmic names
608 algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
610 algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
611 - avoid block-compressing most String/Miscellaneous property values,
612 triggered by genprops not coping with a multi-code point Case_Folding on
613 block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
614 keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
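A quick consistency check for the new Malayalam fraction values listed above: the decimal and the rational column of DerivedNumericValues.txt should agree exactly. The sketch uses Python's fractions module; the table is copied from the lines quoted in this log, and the helper name is hypothetical. It says nothing about how ICU encodes these values internally.

```python
from fractions import Fraction

# (decimal column, rational column) as quoted from DerivedNumericValues.txt above.
NEW_VALUES = {
    0x0D58: ("0.00625", "1/160"),
    0x0D59: ("0.025", "1/40"),
    0x0D5A: ("0.0375", "3/80"),
    0x0D5B: ("0.05", "1/20"),
    0x0D5D: ("0.15", "3/20"),
}


def columns_agree(decimal_text: str, rational_text: str) -> bool:
    """Compare both columns as exact rationals, avoiding floating-point rounding."""
    return Fraction(decimal_text) == Fraction(rational_text)


if __name__ == "__main__":
    for cp, (dec, rat) in sorted(NEW_VALUES.items()):
        print("U+%04X %s %s %s" % (cp, dec, rat, columns_agree(dec, rat)))
```

All five new values have denominators 20, 40, 80, or 160, which is the "new series" that the property storage needs to support.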
616 * PropertyAliases.txt changes
617 - 1 new property PCM=Prepended_Concatenation_Mark
618 Ignore: Only useful for layout engines.
619 Ok to list in ppucd.txt.
621 * PropertyValueAliases.txt new property values
623 blk; Bhaiksuki ; Bhaiksuki
624 blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
625 blk; Glagolitic_Sup ; Glagolitic_Supplement
626 blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
627 blk; Marchen ; Marchen
628 blk; Mongolian_Sup ; Mongolian_Supplement
632 blk; Tangut_Components ; Tangut_Components
634 use long property names for enum constants
635 -> add to UCharacter.UnicodeBlock IDs
636 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
637 replace public static final int \1_ID = \2; \3
638 -> add to UCharacter.UnicodeBlock objects
639 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
640 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
643 GCB; EBG ; E_Base_GAZ
645 GCB; GAZ ; Glue_After_Zwj
647 -> uchar.h & UCharacter.GraphemeClusterBreak
649 jg ; African_Feh ; African_Feh
650 jg ; African_Noon ; African_Noon
651 jg ; African_Qaf ; African_Qaf
652 -> uchar.h & UCharacter.JoiningGroup
657 -> uchar.h & UCharacter.LineBreak
660 sc ; Bhks ; Bhaiksuki
665 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
668 WB ; EBG ; E_Base_GAZ
670 WB ; GAZ ; Glue_After_Zwj
672 -> uchar.h & UCharacter.WordBreak
674 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
675 (not strictly necessary for NOT_ENCODED scripts)
676 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
678 * generate normalization data files
680 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
681 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
682 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
683 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
684 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
686 * build ICU (make install)
687 so that the tools build can pick up the new definitions from the installed header files.
689 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
691 * build Unicode tools using CMake+make
693 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
695 # Location (--prefix) of where ICU was installed.
696 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
697 # Location of the ICU source tree.
698 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
700 ~/svn.icutools/trunk/dbg/unicode/c$
701 cmake ../../../src/unicode/c
704 * generate core properties data files
705 ~/svn.icutools/trunk/dbg/unicode/c$
706 genprops/genprops $ICU_SRC_DIR
707 genuca/genuca --hanOrder implicit $ICU_SRC_DIR
708 genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
709 - rebuild ICU (make install) & tools
711 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
712 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
713 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
714 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
715 - nothing new in 9.0, no test file to update
717 * run & fix ICU4C tests
718 - Andy handles RBBI & spoof check test failures
720 * collation: CLDR collation root, UCA DUCET
722 - UCA DUCET goes into Mark's Unicode tools, see
723 https://sites.google.com/site/unicodetools/home#TOC-UCA
724 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
725 cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
727 - cd (CLDR UCA branch)/common/uca/
728 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
729 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
730 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
731 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
732 (note removing the underscore before "Rules")
733 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
734 - restore TODO diffs in UCARules.txt
735 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
736 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
737 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
738 from the CLDR root files (..._CLDR_..._SHORT.txt)
739 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
740 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
741 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
742 - if CLDR common/uca/unihan-index.txt changes, then update
743 CLDR common/collation/root.xml <collation type="private-unihan">
744 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
746 - run genuca, see command line above; it may stop with an error like this one:
748 Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
749 FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
750 (add the character to genuca.cpp sampleCharsToScripts[])
751 + look up the USCRIPT_ code for the new sample characters
752 (should be obvious from the comment in the error output)
753 + *add* mappings to sampleCharsToScripts[], do not replace them
754 (in case the script sample characters flip-flop)
755 + insert new scripts in DUCET script order, see the top_byte table
756 at the beginning of FractionalUCA.txt
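Sketch only: the real table is the C++ array sampleCharsToScripts[] in genuca.cpp. This Python illustration shows how the sample character and the script name can be read off the FractionalUCA.txt line quoted in the error, and what "add, do not replace" means for the table. The function names, the dict representation, and the existing table entry are hypothetical.

```python
import re

# "FDD1 <sample char>; [...] # <script> first primary ..."
_FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(.+?) first primary")

def parse_first_primary(line):
    """Returns (sample code point, script name from the comment), or None."""
    m = _FIRST_PRIMARY.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2)

def add_sample_char(table, code_point, script):
    """Adds a mapping; mappings for older sample characters stay in place."""
    table.setdefault(code_point, script)
    return table

line = "FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)"
table = {0x07D8: "Nko"}  # hypothetical existing entry
code_point, script = parse_first_primary(line)
add_sample_char(table, code_point, script)
print(sorted("U+%04X %s" % item for item in table.items()))
```

Keeping the old entries means that the table still works if a later UCA version switches a script back to an earlier sample character.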
761 org.unicode.draft.GenerateUnihanCollators
763 -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
764 -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
765 -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
766 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
770 org.unicode.draft.GenerateUnihanCollatorFiles
771 with the same arguments
774 meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
775 meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
778 cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
779 cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
781 - generate ICU zh collation data: run CLDR
782 org.unicode.cldr.icu.NewLdml2IcuConverter
783 with program arguments
785 -s /home/mscherer/svn.cldr/trunk/common/collation
786 -m /home/mscherer/svn.cldr/trunk/common/supplemental
787 -d /home/mscherer/svn.icu/trunk/src/source/data/coll
788 -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
791 -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
794 * run & fix ICU4C tests, now with new CLDR collation root data
795 - run all tests with the collation test data *_SHORT.txt or the full files
796 (the full ones have comments, useful for debugging)
797 - note on intltest: if collate/UCAConformanceTest fails, then
798 utility/MultithreadTest/TestCollators will fail as well;
799 fix the conformance test before looking into the multi-thread test
801 * update Java data files
802 - refresh just the UCD/UCA-related/derived files, just to be safe
803 - see (ICU4C)/source/data/icu4j-readme.txt
805 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
808 Unicode .icu files built to ./out/build/icudt58l
809 echo timestamp > uni-core-data
810 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
811 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
812 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
813 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
814 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
815 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
816 mkdir -p /tmp/icu4j/main/shared/data
817 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
818 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
819 mkdir -p /tmp/icu4j/main/shared/data
820 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
821 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
822 - copy the big-endian Unicode data files to another location,
823 separate from the other data files,
824 and then refresh ICU4J
825 cd ~/svn.icu/trunk/dbg/data/out/icu4j
826 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
827 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
828 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
829 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
830 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
831 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
832 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
833 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
834 jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
836 * When refreshing all of ICU4J data from ICU4C
837 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
838 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
840 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
842 * update CollationFCD.java
843 + copy & paste the initializers of lcccIndex[] etc. from
844 ICU4C/source/i18n/collationfcd.cpp to
845 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
847 * refresh Java test .txt files
848 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
849 cd $ICU_SRC_DIR/source/data/unidata
850 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
851 cd ../../test/testdata
852 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
853 cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
855 * run & fix ICU4J tests
857 *** LayoutEngine script information
859 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
860 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
861 in the working directory.
863 (It also generates ScriptRunData.cpp, which is no longer needed.)
865 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
867 which maps ICU versions to the numbers of script/language constants
868 that were added then.
869 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
871 The generated files have a current copyright date and "@deprecated" statement.
873 * Review changes, fix Java tool if necessary, and copy to ICU4C
874 cd ~/svn.icu4j/trunk/src
875 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
876 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
877 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
880 - send notice to icu-design about new born-@stable API (enum constants etc.)
882 *** merge the Unicode update branches back onto the trunk
883 - do not merge the icudata.jar and testdata.jar,
884 instead rebuild them from merged & tested ICU4C
885 - make sure that changes to Unicode tools & ICU tools are checked in
886 http://www.unicode.org/utility/trac/log/trunk/unicodetools
887 http://bugs.icu-project.org/trac/log/tools/trunk
889 ---------------------------------------------------------------------------- ***
891 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
894 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
895 - new combination/alias codes: Hanb, Jamo
896 - used in CLDR 29 and in spoof checker
899 Add new codes to uscript.h & UScript.java, see Unicode update logs.
900 -> com.ibm.icu.lang.UScript
901 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
902 replace public static final int \1 = \2; \3
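Sketch only: the same find/replace, applied with Python's re module instead of an editor. The input line is an illustrative uscript.h-style line, and the function name is made up.

```python
import re

# Same pattern and replacement as in the find/replace above.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2; \3"

def c_enum_to_java(line):
    """Turns one uscript.h enum line into a UScript.java constant line."""
    return FIND.sub(REPLACE, line)

# Illustrative input line; leading indentation is left untouched.
c_line = "      USCRIPT_ADLAM                         = 167,/* Adlm */"
print(c_enum_to_java(c_line))
```

Lines that do not match the pattern pass through unchanged, so the conversion can be run over a whole pasted block of enum constants.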
904 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
905 add new script codes.
906 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
908 Note: If we have to run preparseucd.py again before the Unicode 9 update,
909 then we need to manually keep/restore the new script codes.
911 ICU_ROOT=~/svn.icu/trunk
912 ICU_SRC_DIR=$ICU_ROOT/src
914 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
915 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
916 UNIDATA=$ICU_SRC_DIR/source/data/unidata
918 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
919 see http://bugs.icu-project.org/trac/ticket/12141
921 make install, then icutools cmake & make, then
922 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
924 Generate Java data as usual, only update pnames.icu & uprops.icu.
926 *** LayoutEngine script information
928 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
929 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
930 in the working directory.
932 (It also generates ScriptRunData.cpp, which is no longer needed.)
934 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
936 which maps ICU versions to the numbers of script/language constants
937 that were added then.
938 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
940 The generated files have a current copyright date and "@deprecated" statement.
942 * Review changes, fix Java tool if necessary, and copy to ICU4C
943 cd ~/svn.icu4j/trunk/src
944 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
945 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
946 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
948 ---------------------------------------------------------------------------- ***
950 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
952 Edit preparseucd.py to add & parse new properties.
953 They share the UCD property namespace but are not listed in PropertyAliases.txt.
955 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
956 Initial data from emoji/2.0/
958 ICU_ROOT=~/svn.icu/trunk
959 ICU_SRC_DIR=$ICU_ROOT/src
961 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
962 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
963 UNIDATA=$ICU_SRC_DIR/source/data/unidata
965 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
967 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
968 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
970 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
972 make install, then icutools cmake & make, then
973 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
975 Generate Java data as usual, only update pnames.icu & uprops.icu.
977 ---------------------------------------------------------------------------- ***
979 Unicode 8.0 update for ICU 56
981 * Command-line environment setup
983 ICU_ROOT=~/svn.icu/trunk
984 ICU_SRC_DIR=$ICU_ROOT/src
986 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
987 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
988 UNIDATA=$ICU_SRC_DIR/source/data/unidata
990 http://www.unicode.org/review/pri297/ -- beta review
991 http://www.unicode.org/reports/uax-proposed-updates.html
992 http://unicode.org/versions/beta-8.0.0.html
993 http://www.unicode.org/versions/Unicode8.0.0/
994 http://www.unicode.org/reports/tr44/tr44-15.html
998 - ticket:11574: Unicode 8
999 - C++ branches/markus/uni80 at r37351 from trunk at r37343
1000 - Java branches/markus/uni80 at r37352 from trunk at r37338
1004 - cldrbug 8311: UCA 8
1005 - branches/markus/uni80 at r11518 from trunk at r11517
1007 - cldrbug 8109: Unicode 8.0 script metadata
1008 - cldrbug 8418: Updated segmentation for Unicode 8.0
1010 *** Unicode version numbers
1013 - com.ibm.icu.util.VersionInfo
1014 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1016 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1017 so that the makefiles see the new version number.
1019 *** data files & enums & parser code
1023 - download UCD & IDNA files
1024 - make sure that the Unicode data folder passed into preparseucd.py
1025 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1026 - only for manual diffs: remove version suffixes from the file names
1027 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
1028 (see https://sites.google.com/site/unicodetools/inputdata)
1029 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1030 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
1031 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1033 - also: from http://unicode.org/Public/security/8.0.0/ download new
1034 confusables.txt & confusablesWholeScript.txt
1035 and copy to $UNIDATA
1036 ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
1037 ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
1039 * initial preparseucd.py changes
1040 - remove new Unicode scripts from the
1041 only-in-ISO-15924 list according to the error message:
1042 ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
1043 from _scripts_only_in_iso15924
1044 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1045 and in com.ibm.icu.dev.test.lang.TestUScript.java
1046 - property and file name change:
1047 IndicMatraCategory -> IndicPositionalCategory
1048 - UnicodeData.txt unusual numeric values (fractions not in lowest terms)
1049 109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
1050 109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
1051 109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
1052 109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
1053 109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
1054 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
1055 109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
1056 109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
1057 109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
1058 109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
1059 -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6)
1060 which are listed in DerivedNumericValues.txt;
1061 this keeps storage in the data file simple
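Sketch only, not the actual preparseucd.py code: one way to bring such numeric values to lowest terms is Python's fractions module. The function name is made up for illustration.

```python
from fractions import Fraction

def reduce_numeric_value(nv):
    """Returns a numeric value string with any fraction in lowest terms."""
    if "/" not in nv:
        return nv  # integers and other values are left alone
    return str(Fraction(nv))

for raw in ["1/12", "2/12", "6/12", "10/12", "7"]:
    print(raw, "->", reduce_numeric_value(raw))
```

With this mapping the Meroitic twelfths land on values such as 1/6, 1/4 and 1/2, so the data file needs no entries for unreduced fractions.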
1063 * PropertyValueAliases.txt changes
1064 - 10 new Block (blk) values:
1066 blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
1067 blk; Cherokee_Sup ; Cherokee_Supplement
1068 blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
1069 blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
1070 blk; Hatran ; Hatran
1071 blk; Multani ; Multani
1072 blk; Old_Hungarian ; Old_Hungarian
1073 blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
1074 blk; Sutton_SignWriting ; Sutton_SignWriting
1076 use long property names for enum constants
1077 -> add to UCharacter.UnicodeBlock IDs
1078 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1079 replace public static final int \1_ID = \2; \3
1080 -> add to UCharacter.UnicodeBlock objects
1081 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1082 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
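Sketch only: the two Eclipse find/replace passes, reproduced with Python's re module. Each uchar.h line yields one _ID constant and one UnicodeBlock object. The input line is illustrative.

```python
import re

# Pass 1: integer IDs.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

# Pass 2: UnicodeBlock objects that refer to the IDs from pass 1.
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

# Illustrative input line in the style of the uchar.h block enum.
c_line = "UBLOCK_HATRAN = 267, /*[108E0]*/"
print(ID_FIND.sub(ID_REPLACE, c_line))
print(OBJ_FIND.sub(OBJ_REPLACE, c_line))
```

Both passes start from the same C lines, so the pasted enum block has to be copied twice into UCharacter.java, once per section.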
1083 - 6 new Script (sc) values:
1086 sc ; Hluw ; Anatolian_Hieroglyphs
1087 sc ; Hung ; Old_Hungarian
1089 sc ; Sgnw ; SignWriting
1090 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
1092 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1093 (not strictly necessary for NOT_ENCODED scripts)
1094 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
1096 * generate normalization data files
1098 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1099 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1100 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1101 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1102 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1104 * build ICU (make install)
1105 so that the tools build can pick up the new definitions from the installed header files.
1107 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1109 * build Unicode tools using CMake+make
1111 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1113 # Location (--prefix) of where ICU was installed.
1114 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
1115 # Location of the ICU source tree.
1116 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
1118 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1119 ~/svn.icutools/trunk/dbg/unicode/c$ make
1121 * generate core properties data files
1122 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1123 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
1124 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
1125 - rebuild ICU (make install) & tools
1126 - run genuca again (see step above) so that it picks up the new nfc.nrm
1127 - rebuild ICU (make install) & tools
1129 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1130 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1131 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1132 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
1133 - nothing new in 8.0, no test file to update
1135 * run & fix ICU4C tests
1136 - bad Cherokee case folding due to difference in fallbacks:
1137 UCD case folding falls back to no mapping,
1138 ICU runtime case folding falls back to lowercasing;
1139 fixed casepropsbuilder.cpp to generate scf mappings to self
1140 when there is an slc mapping but no scf
1141 - Andy handles RBBI & spoof check test failures
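Sketch only, not the casepropsbuilder.cpp code: a small Python model of the Cherokee case folding problem noted above. The dicts stand in for the simple lowercase (slc) and simple case folding (scf) mappings, and the function names are made up. The sample data assumes that U+13A0 lowercases to U+AB70 and that U+AB70 case-folds to U+13A0.

```python
def runtime_fold(c, slc, scf):
    """Runtime-style folding: use scf if present, else fall back to lowercasing."""
    if c in scf:
        return scf[c]
    return slc.get(c, c)

def add_scf_self_mappings(slc, scf):
    """Returns scf plus self-mappings for characters with an slc but no scf."""
    result = dict(scf)
    for c in slc:
        if c not in result:
            result[c] = c
    return result

# Cherokee letter A (assumed data, see lead-in).
slc = {0x13A0: 0xAB70}
scf = {0xAB70: 0x13A0}

# Without the fix, the fallback lowercases U+13A0 instead of leaving it alone.
print(hex(runtime_fold(0x13A0, slc, scf)))

# With explicit self-mappings, U+13A0 folds to itself, as in the UCD.
fixed = add_scf_self_mappings(slc, scf)
print(hex(runtime_fold(0x13A0, slc, fixed)))
```

The explicit self-mapping costs one data entry per affected character but makes the runtime result independent of its lowercasing fallback.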
1143 * collation: CLDR collation root, UCA DUCET
1145 - UCA DUCET goes into Mark's Unicode tools, see
1146 https://sites.google.com/site/unicodetools/home#TOC-UCA
1147 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
1148 - cd (CLDR UCA branch)/common/uca/
1149 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1150 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1151 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1152 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
1153 (note: the ICU file name drops the underscore before "Rules")
1154 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1155 - restore TODO diffs in UCARules.txt
1156 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1157 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1158 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1159 from the CLDR root files (..._CLDR_..._SHORT.txt)
1160 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1161 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1162 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1163 - if CLDR common/uca/unihan-index.txt changes, then update
1164 CLDR common/collation/root.xml <collation type="private-unihan">
1165 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
1166 - run genuca, see command line above; it may stop with an error like this one:
1168 Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
1169 (add the character to genuca.cpp sampleCharsToScripts[])
1170 + look up the script for the new sample characters
1171 (e.g., in FractionalUCA.txt)
1172 + *add* mappings to sampleCharsToScripts[], do not replace them
1173 (in case the script sample characters flip-flop)
1174 + insert new scripts in DUCET script order, see the top_byte table
1175 at the beginning of FractionalUCA.txt
1178 * run & fix ICU4C tests, now with new CLDR collation root data
1179 - run all tests with the collation test data *_SHORT.txt or the full files
1180 (the full ones have comments, useful for debugging)
1181 - note on intltest: if collate/UCAConformanceTest fails, then
1182 utility/MultithreadTest/TestCollators will fail as well;
1183 fix the conformance test before looking into the multi-thread test
1184 - fixed bug in CollationWeights::getWeightRanges()
1185 exposed by new data and CollationTest::TestRootElements
1187 * update Java data files
1188 - refresh just the UCD/UCA-related/derived files, just to be safe
1189 - see (ICU4C)/source/data/icu4j-readme.txt
1191 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1194 Unicode .icu files built to ./out/build/icudt56l
1195 echo timestamp > uni-core-data
1196 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
1197 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
1198 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1199 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
1200 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
1201 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
1202 mkdir -p /tmp/icu4j/main/shared/data
1203 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1204 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
1205 mkdir -p /tmp/icu4j/main/shared/data
1206 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1207 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
1208 - copy the big-endian Unicode data files to another location,
1209 separate from the other data files,
1210 and then refresh ICU4J
1211 cd ~/svn.icu/trunk/dbg/data/out/icu4j
1212 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1213 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1214 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1215 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1216 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1217 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1218 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1219 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1220 jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1222 * When refreshing all of ICU4J data from ICU4C
1223 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1224 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1226 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1228 * update CollationFCD.java
1229 + copy & paste the initializers of lcccIndex[] etc. from
1230 ICU4C/source/i18n/collationfcd.cpp to
1231 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1233 * refresh Java test .txt files
1234 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1235 cd $ICU_SRC_DIR/source/data/unidata
1236 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1237 cd ../../test/testdata
1238 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1239 cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1241 * run & fix ICU4J tests
1243 *** LayoutEngine script information
1245 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
1246 because the layout engine was deprecated in ICU 54.
1247 Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
1248 to write lines that we used to add manually.
1250 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1251 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1252 in the working directory.
1254 (It also generates ScriptRunData.cpp, which is no longer needed.)
1256 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
1258 which maps ICU versions to the numbers of script/language constants
1259 that were added then.
1260 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
1262 The generated files have a current copyright date and "@deprecated" statement.
1264 * Review changes, fix Java tool if necessary, and copy to ICU4C
1265 cd ~/svn.icu4j/trunk/src
1266 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1267 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
1268 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
1271 - send notice to icu-design about new born-@stable API (enum constants etc.)
1273 *** merge the Unicode update branches back onto the trunk
1274 - do not merge the icudata.jar and testdata.jar,
1275 instead rebuild them from merged & tested ICU4C
1276 - make sure that changes to Unicode tools & ICU tools are checked in
1277 http://www.unicode.org/utility/trac/log/trunk/unicodetools
1278 http://bugs.icu-project.org/trac/log/tools/trunk
1280 ---------------------------------------------------------------------------- ***
1282 Unicode 7.0 update for ICU 54
1284 http://www.unicode.org/review/pri271/ -- beta review
1285 http://www.unicode.org/reports/uax-proposed-updates.html
1286 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
1287 http://www.unicode.org/reports/tr44/tr44-13.html
1291 - ticket 10821: Unicode 7.0, UCA 7.0
1292 - C++ branches/markus/uni70 at r35584 from trunk at r35580
1293 - Java branches/markus/uni70 at r35587 from trunk at r35545
1297 - ticket 7195: UCA 7.0 CLDR root collation
1298 - branches/markus/uni70 at r10062 from trunk at r10061
1300 - ticket 6762: script metadata for Unicode 7.0 new scripts
1302 *** Unicode version numbers
1305 - com.ibm.icu.util.VersionInfo
1306 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1308 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1309 so that the makefiles see the new version number.
1311 *** data files & enums & parser code
1315 - download UCD & IDNA files
1316 - make sure that the Unicode data folder passed into preparseucd.py
1317 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1318 - only for manual diffs: remove version suffixes from the file names
1319 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
1320 (see https://sites.google.com/site/unicodetools/inputdata)
1321 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1322 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
1323 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1324 - Restore TODO diffs in source/data/unidata/UCARules.txt
1326 meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
1327 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
1329 - also: from http://unicode.org/Public/security/7.0.0/ download new
1330 confusables.txt & confusablesWholeScript.txt
1331 and copy to $ICU_ROOT/src/source/data/unidata/
1333 * initial preparseucd.py changes
1334 - remove new Unicode scripts from the
1335 only-in-ISO-15924 list according to the error message:
1336 ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
1337 'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
1338 'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
1339 from _scripts_only_in_iso15924
1340 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1341 and in com.ibm.icu.dev.test.lang.TestUScript.java
1342 - NamesList.txt now has a heading with a non-ASCII character
1343 + keep ppucd.txt in platform charset, rather than changing tool/test parsers
1344 + escape non-ASCII characters in heading comments
1345 - the tool gets the Unicode copyright line from PropertyAliases.txt, whose copyright year is currently still 2013
1346 + instead, get the copyright from the first file whose copyright line contains the current year
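Sketch only, not the actual preparseucd.py code: two small helpers for the heading and copyright items above. The function names, the escape syntax (\uXXXX and \UXXXXXXXX) and the sample copyright lines are assumptions for illustration.

```python
def escape_non_ascii(s):
    """Replaces each non-ASCII character with a backslash escape."""
    out = []
    for ch in s:
        c = ord(ch)
        if c <= 0x7F:
            out.append(ch)
        elif c <= 0xFFFF:
            out.append("\\u%04X" % c)
        else:
            out.append("\\U%08X" % c)
    return "".join(out)

def pick_copyright(lines_by_file, year):
    """Returns the first copyright line that mentions the given year, or None."""
    for _name, lines in lines_by_file:
        for line in lines:
            if "Copyright" in line and str(year) in line:
                return line
    return None

# Hypothetical sample input: (file name, first lines of the file).
files = [
    ("PropertyAliases.txt", ["# Copyright (c) 1991-2013 Unicode, Inc."]),
    ("Blocks.txt", ["# Copyright (c) 1991-2014 Unicode, Inc."]),
]
print(pick_copyright(files, 2014))
print(escape_non_ascii("caf\u00e9"))
```

Escaping keeps ppucd.txt in the platform charset, so the tool and test parsers do not need to change.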
1348 * PropertyValueAliases.txt changes
1349 - 32 new Block (blk) values:
1350 blk; Bassa_Vah ; Bassa_Vah
1351 blk; Caucasian_Albanian ; Caucasian_Albanian
1352 blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
1353 blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
1354 blk; Duployan ; Duployan
1355 blk; Elbasan ; Elbasan
1356 blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
1357 blk; Grantha ; Grantha
1358 blk; Khojki ; Khojki
1359 blk; Khudawadi ; Khudawadi
1360 blk; Latin_Ext_E ; Latin_Extended_E
1361 blk; Linear_A ; Linear_A
1362 blk; Mahajani ; Mahajani
1363 blk; Manichaean ; Manichaean
1364 blk; Mende_Kikakui ; Mende_Kikakui
1367 blk; Myanmar_Ext_B ; Myanmar_Extended_B
1368 blk; Nabataean ; Nabataean
1369 blk; Old_North_Arabian ; Old_North_Arabian
1370 blk; Old_Permic ; Old_Permic
1371 blk; Ornamental_Dingbats ; Ornamental_Dingbats
1372 blk; Pahawh_Hmong ; Pahawh_Hmong
1373 blk; Palmyrene ; Palmyrene
1374 blk; Pau_Cin_Hau ; Pau_Cin_Hau
1375 blk; Psalter_Pahlavi ; Psalter_Pahlavi
1376 blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
1377 blk; Siddham ; Siddham
1378 blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
1379 blk; Sup_Arrows_C ; Supplemental_Arrows_C
1380 blk; Tirhuta ; Tirhuta
1381 blk; Warang_Citi ; Warang_Citi
1383 use long property names for enum constants
1384 -> add to UCharacter.UnicodeBlock IDs
1385 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1386 replace public static final int \1_ID = \2; \3
1387 -> add to UCharacter.UnicodeBlock objects
1388 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1389 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1390 - 28 new Joining_Group (jg) values:
1391 jg ; Manichaean_Aleph ; Manichaean_Aleph
1392 jg ; Manichaean_Ayin ; Manichaean_Ayin
1393 jg ; Manichaean_Beth ; Manichaean_Beth
1394 jg ; Manichaean_Daleth ; Manichaean_Daleth
1395 jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
1396 jg ; Manichaean_Five ; Manichaean_Five
1397 jg ; Manichaean_Gimel ; Manichaean_Gimel
1398 jg ; Manichaean_Heth ; Manichaean_Heth
1399 jg ; Manichaean_Hundred ; Manichaean_Hundred
1400 jg ; Manichaean_Kaph ; Manichaean_Kaph
1401 jg ; Manichaean_Lamedh ; Manichaean_Lamedh
1402 jg ; Manichaean_Mem ; Manichaean_Mem
1403 jg ; Manichaean_Nun ; Manichaean_Nun
1404 jg ; Manichaean_One ; Manichaean_One
1405 jg ; Manichaean_Pe ; Manichaean_Pe
1406 jg ; Manichaean_Qoph ; Manichaean_Qoph
1407 jg ; Manichaean_Resh ; Manichaean_Resh
1408 jg ; Manichaean_Sadhe ; Manichaean_Sadhe
1409 jg ; Manichaean_Samekh ; Manichaean_Samekh
1410 jg ; Manichaean_Taw ; Manichaean_Taw
1411 jg ; Manichaean_Ten ; Manichaean_Ten
1412 jg ; Manichaean_Teth ; Manichaean_Teth
1413 jg ; Manichaean_Thamedh ; Manichaean_Thamedh
1414 jg ; Manichaean_Twenty ; Manichaean_Twenty
1415 jg ; Manichaean_Waw ; Manichaean_Waw
1416 jg ; Manichaean_Yodh ; Manichaean_Yodh
1417 jg ; Manichaean_Zayin ; Manichaean_Zayin
1418 jg ; Straight_Waw ; Straight_Waw
1419 -> uchar.h & UCharacter.JoiningGroup
1420 - 23 new Script (sc) values:
1421 sc ; Aghb ; Caucasian_Albanian
1422 sc ; Bass ; Bassa_Vah
1423 sc ; Dupl ; Duployan
1426 sc ; Hmng ; Pahawh_Hmong
1428 sc ; Lina ; Linear_A
1429 sc ; Mahj ; Mahajani
1430 sc ; Mani ; Manichaean
1431 sc ; Mend ; Mende_Kikakui
1434 sc ; Narb ; Old_North_Arabian
1435 sc ; Nbat ; Nabataean
1436 sc ; Palm ; Palmyrene
1437 sc ; Pauc ; Pau_Cin_Hau
1438 sc ; Perm ; Old_Permic
1439 sc ; Phlp ; Psalter_Pahlavi
1441 sc ; Sind ; Khudawadi
1443 sc ; Wara ; Warang_Citi
1444 -> uscript.h (many were added before)
1445 comment "Mende Kikakui" for USCRIPT_MENDE
1446 add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
1447 -> com.ibm.icu.lang.UScript
1448 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1449 replace public static final int \1 = \2; \3
1450 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1457 Pauc 263 Pau Cin Hau
1459 -> uscript.h (some overlap with additions from Unicode)
1460 -> com.ibm.icu.lang.UScript
1461 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1462 replace public static final int \1 = \2; \3
1463 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
1464 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1465 and in com.ibm.icu.dev.test.lang.TestUScript.java
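The Eclipse find/replace steps listed above for UCharacter.UnicodeBlock and UScript can also be scripted. The following is only a sketch in Python that reuses the same regular expressions; the helper name apply_rule and the two input lines are illustrative assumptions, not text copied from uchar.h or uscript.h.

```python
import re

# Each rule is (compiled find pattern, replacement), taken from the steps above.
UBLOCK_ID = (re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"),
             r"public static final int \1_ID = \2; \3")
UBLOCK_OBJ = (re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"),
              r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
USCRIPT = (re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)"),
           r"public static final int \1 = \2; \3")

def apply_rule(rule, line):
    """Apply one find/replace rule to one line of C enum source (hypothetical helper)."""
    pattern, replacement = rule
    return pattern.sub(replacement, line)

# Illustrative input lines; the numeric values are placeholders.
block_line = "UBLOCK_BASSA_VAH = 221, /*[16AD0]*/"
script_line = "USCRIPT_MAHAJANI = 160,/* Mahj */"

block_id = apply_rule(UBLOCK_ID, block_line)
block_obj = apply_rule(UBLOCK_OBJ, block_line)
script_const = apply_rule(USCRIPT, script_line)
print(block_id)
print(block_obj)
print(script_const)
```

The results still need a manual review, just as with the Eclipse replacements.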
1467 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1468 (not strictly necessary for NOT_ENCODED scripts)
1469 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
1471 * generate normalization data files
1473 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1474 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1475 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
1476 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1477 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1478 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1479 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1480 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1482 * build ICU (make install)
1483 so that the tools build can pick up the new definitions from the installed header files.
1485 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1487 * build Unicode tools using CMake+make
1489 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1491 # Location (--prefix) of where ICU was installed.
1492 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
1493 # Location of the ICU source tree.
1494 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
1496 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1497 ~/svn.icutools/trunk/dbg/unicode/c$ make
1500 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
1501 + add second array of Joining_Group values for at most 10800..10FFF
1502 icutools: unicode/c/genprops/bidipropsbuilder.cpp
1503 icu: source/common/ubidi_props.h/.c/_data.h
1504 icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
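The second Joining_Group array described above amounts to a lookup that tries a second code point range when the first one does not match. A rough sketch in Python follows; the first range's bounds, the table contents, and all names are placeholder assumptions, and the real tables are in the files listed above.

```python
# Hypothetical two-range Joining_Group lookup; all table contents are placeholders.
JG_START = 0x620            # assumed start of the existing (first) range
JG_ARRAY = [0] * 0x280      # placeholder values for the first range
JG_START2 = 0x10AC0         # start of the new range (Manichaean block 10AC0..10AFF)
JG_ARRAY2 = [0] * 0x40      # placeholder values for 10AC0..10AFF

NO_JOINING_GROUP = 0

def joining_group(c):
    """Return the Joining_Group value for code point c, or NO_JOINING_GROUP."""
    if JG_START <= c < JG_START + len(JG_ARRAY):
        return JG_ARRAY[c - JG_START]
    if JG_START2 <= c < JG_START2 + len(JG_ARRAY2):
        return JG_ARRAY2[c - JG_START2]
    return NO_JOINING_GROUP

# Store a made-up value for U+10AC0 to show that the second range is consulted.
JG_ARRAY2[0] = 101
print(joining_group(0x10AC0), joining_group(0x41), joining_group(0x10B00))
```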
1506 * generate core properties data files
1507 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1508 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
1509 - rebuild ICU (make install) & tools
1510 - run genuca again (see step above) so that it picks up the new nfc.nrm
1511 - rebuild ICU (make install) & tools
1513 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1514 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1515 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1516 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
1517 - nothing new in 7.0, no test file to update
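The grep step above can be narrowed to non-ASCII characters with a small script. This is a sketch in Python that assumes the semicolon-separated layout of IdnaMappingTable.txt (code point or range; status; optional fields; then a # comment). The sample lines are illustrative, the function name is hypothetical, and uts46.txt uses a different format that this sketch does not handle.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges with status disallowed_STD3_valid that reach above U+007F."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            result.append((start, end))
    return result

# Illustrative lines in the assumed file format.
SAMPLE = [
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA",
    "0041          ; mapped                 ; 0061          # 1.1  LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]

for start, end in non_ascii_std3_valid(SAMPLE):
    print("U+%04X..U+%04X" % (start, end))
```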
1519 * run & fix ICU4C tests
1521 * update Java data files
1522 - refresh just the UCD-related files, just to be safe
1523 - see (ICU4C)/source/data/icu4j-readme.txt
1525 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1528 Unicode .icu files built to ./out/build/icudt53l
1529 echo timestamp > uni-core-data
1530 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1531 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
1532 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1533 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1534 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
1535 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
1536 mkdir -p /tmp/icu4j/main/shared/data
1537 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1538 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
1539 mkdir -p /tmp/icu4j/main/shared/data
1540 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1541 make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
1542 - copy the big-endian Unicode data files to another location,
1543 separate from the other data files
1545 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1546 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1547 cd ~/svn.icu/uni70/dbg/data/out/icu4j
1548 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1549 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1550 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1551 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1552 cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1553 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1555 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
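The cp and rm commands above select a specific subset of the generated files. As a cross-check, the same selection can be written as a predicate; this is a sketch in Python with a hypothetical function name, taking paths relative to the $ICUDT folder, and the sample file names are illustrative.

```python
import fnmatch
import posixpath

def is_copied(rel_path):
    """True if the cp/rm sequence above would leave this file in the target folder."""
    directory, name = posixpath.split(rel_path)
    if directory == "":
        if name == "cnvalias.icu":          # copied by *.icu, then removed again
            return False
        return (name == "confusables.cfu"
                or fnmatch.fnmatchcase(name, "*.icu")
                or fnmatch.fnmatchcase(name, "*.nrm"))
    if directory == "coll":
        return fnmatch.fnmatchcase(name, "*.icu")
    if directory == "brkitr":
        return True                          # brkitr/* copies everything
    return False

# Illustrative file names.
names = ["uprops.icu", "cnvalias.icu", "nfc.nrm", "confusables.cfu",
         "coll/ucadata.icu", "coll/root.res", "brkitr/word.brk", "zoneinfo64.res"]
selected = [n for n in names if is_copied(n)]
print(selected)
```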
1557 * update CollationFCD.java
1558 + copy & paste the initializers of lcccIndex[] etc. from
1559 ICU4C/source/i18n/collationfcd.cpp to
1560 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1562 * refresh Java test .txt files
1563 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1564 cd $ICU_SRC_DIR/source/data/unidata
1565 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1566 cd ../../test/testdata
1567 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1568 cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1572 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
1573 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
1574 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
1575 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
1576 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
1577 - review data; compare files, use blankweights.sed or similar
1578 ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
1579 - cd ~/svn.unitools/Generated/uca/7.0.0/
1580 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1581 cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1582 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1583 (note removing the underscore before "Rules")
1584 cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1585 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1586 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1587 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1588 cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1589 cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1590 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1591 - run genuca, see command line above
1593 - refresh ICU4J collation data:
1594 (subset of instructions above for properties data refresh, except copies all coll/*)
1596 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1597 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1598 ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1599 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1600 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1601 - note on intltest: if collate/UCAConformanceTest fails, then
1602 utility/MultithreadTest/TestCollators will fail as well;
1603 fix the conformance test before looking into the multi-thread test
1604 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
1605 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
1606 ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
1608 * When refreshing all of ICU4J data from ICU4C
1609 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1610 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1612 - or, to install directly into the ICU4J tree: ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1614 * run & fix ICU4J tests
1616 *** LayoutEngine script information
1618 (For details see the Unicode 5.2 change log below.)
1620 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1621 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1622 in the working directory.
1623 (It also generates ScriptRunData.cpp, which is no longer needed.)
1625 The generated files have a current copyright date and "@stable" statement.
1626 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
1627 for "born stable" Unicode API constants, and to stop parsing ICU version numbers
1628 which may not contain dots any more.
1630 - diff current <icu>/source/layout files vs. generated ones
1631 ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1632 review and manually merge desired changes;
1633 fix gratuitous changes, incorrect @draft/@stable and missing aliases;
1634 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1635 - if you just copy the above files, then
1636 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1637 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1640 - send notice to icu-design about new born-@stable API (enum constants etc.)
1642 *** merge the Unicode update branches back onto the trunk
1643 - do not merge the icudata.jar and testdata.jar,
1644 instead rebuild them from merged & tested ICU4C
1646 ---------------------------------------------------------------------------- ***
1650 http://www.unicode.org/review/pri249/ -- beta review
1651 http://www.unicode.org/reports/uax-proposed-updates.html
1652 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
1653 http://www.unicode.org/reports/tr44/tr44-11.html
1657 - ticket 10128: update ICU to Unicode 6.3 beta
1658 - ticket 10168: update ICU to Unicode 6.3 final
1659 - C++ branches/markus/uni63 at r33552 from trunk at r33551
1660 - Java branches/markus/uni63 at r33550 from trunk at r33553
1662 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
1664 *** Unicode version numbers
1667 (configure.in & configure: have been modified to extract the version from uchar.h)
1668 - com.ibm.icu.util.VersionInfo
1669 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1671 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1672 so that the makefiles see the new version number.
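Because the version is read from uchar.h, it can help to confirm what the header currently says before running configure. The sketch below, in Python, extracts the U_UNICODE_VERSION macro from header text; it is not the logic that configure uses, and the function name and sample text are assumptions for illustration.

```python
import re

def unicode_version_from_header(header_text):
    """Return the string value of U_UNICODE_VERSION, or None if the macro is absent."""
    match = re.search(r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([^"]+)"',
                      header_text, re.MULTILINE)
    return match.group(1) if match else None

# Illustrative header excerpt.
sample_header = '/* excerpt */\n#define U_UNICODE_VERSION "6.3"\n'
print(unicode_version_from_header(sample_header))
```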
1674 *** data files & enums & parser code
1678 - download UCD, UCA & IDNA files
1679 - make sure that the Unicode data folder passed into preparseucd.py
1680 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1681 - modify preparseucd.py:
1682 parse new file BidiBrackets.txt
1683 with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
1684 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
1685 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1686 - Check test file diffs for previously commented-out, known-failing data lines;
1687 probably need to keep those commented out.
1689 * PropertyAliases.txt changes
1690 - 1 new Enumerated Property
1691 bpt ; Bidi_Paired_Bracket_Type
1692 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
1693 -> ubidi_props.h & .c & UBiDiProps.java
1694 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
1696 -> change ubidi.icu format version from 2.0 to 2.1
1697 - 1 new Miscellaneous Property
1698 bpb ; Bidi_Paired_Bracket
1699 -> uchar.h & UProperty.java
1702 * PropertyValueAliases.txt changes
1703 - 3 Bidi_Paired_Bracket_Type (bpt) values:
1707 -> uchar.h & UCharacter.BidiPairedBracketType
1708 -> ubidi_props.h & .c & UBiDiProps.java
1709 -> change ubidi.icu format version from 2.0 to 2.1
1710 - 4 new Bidi_Class (bc) values:
1711 bc ; FSI ; First_Strong_Isolate
1712 bc ; LRI ; Left_To_Right_Isolate
1713 bc ; RLI ; Right_To_Left_Isolate
1714 bc ; PDI ; Pop_Directional_Isolate
1715 -> uchar.h & UCharacterEnums.ECharacterDirection
1716 -> until the bidi code gets updated,
1717 Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
1718 - 3 new Word_Break (WB) values:
1719 WB ; HL ; Hebrew_Letter
1720 WB ; SQ ; Single_Quote
1721 WB ; DQ ; Double_Quote
1722 -> uchar.h & UCharacter.WordBreak
1723 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
1724 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1726 Aghb 239 Caucasian Albanian
1729 -> com.ibm.icu.lang.UScript
1730 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1731 replace public static final int \1 = \2;\3
1732 -> preparseucd.py _scripts_only_in_iso15924
1733 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1734 and in com.ibm.icu.dev.test.lang.TestUScript.java
1735 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1736 (not strictly necessary for NOT_ENCODED scripts)
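The remark above that Word_Break constants now exceed 4 bits is simple arithmetic: 4 bits hold 16 distinct values (0..15), so 17 values need a fifth bit. A small check in Python, with a hypothetical helper name:

```python
def bits_needed(value_count):
    """Number of bits needed to store the values 0..value_count-1."""
    return max(1, (value_count - 1).bit_length())

print(bits_needed(16))  # 4
print(bits_needed(17))  # 5
```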
1738 * generate normalization data files
1739 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
1740 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
1741 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
1742 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1743 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1744 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1745 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1747 * build ICU (make install)
1748 so that the tools build can pick up the new definitions from the installed header files.
1750 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1752 * build Unicode tools using CMake+make
1754 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1756 # Location (--prefix) of where ICU was installed.
1757 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
1758 # Location of the ICU source tree.
1759 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
1761 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1762 ~/svn.icutools/trunk/dbg/unicode/c$ make
1764 * generate core properties data files
1765 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
1766 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
1767 - rebuild ICU (make install) & tools
1768 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1769 - rebuild ICU (make install) & tools
1771 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1772 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1773 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1774 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
1775 - nothing new in 6.3, no test file to update
1777 * update Java data files
1778 - refresh just the UCD-related files, just to be safe
1779 - see (ICU4C)/source/data/icu4j-readme.txt
1781 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1784 Unicode .icu files built to ./out/build/icudt52l
1785 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1786 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
1787 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1788 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1789 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
1790 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
1791 mkdir -p /tmp/icu4j/main/shared/data
1792 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1793 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
1794 mkdir -p /tmp/icu4j/main/shared/data
1795 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1796 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
1797 - copy the big-endian Unicode data files to another location,
1798 separate from the other data files
1799 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1800 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1801 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1802 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
1803 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1804 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1805 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1807 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1809 * refresh Java test .txt files
1810 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1812 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
1814 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1815 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1816 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1817 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1818 (note removing the underscore before "Rules")
1819 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1820 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1821 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1822 - check test file diffs for previously commented-out, known-failing data lines;
1823 probably need to keep those commented out
1824 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1825 - run genuca, see command line above
1827 - refresh ICU4J collation data:
1828 (subset of instructions above for properties data refresh, except copies all coll/*)
1829 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1830 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1831 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1832 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1833 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1834 - note on intltest: if collate/UCAConformanceTest fails, then
1835 utility/MultithreadTest/TestCollators will fail as well;
1836 fix the conformance test before looking into the multi-thread test
1838 * test ICU, fix test code where necessary
1840 * When refreshing all of ICU4J data from ICU4C
1841 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1842 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1844 - or, to install directly into the ICU4J tree: ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1846 *** LayoutEngine script information
1847 - skipped for Unicode 6.3: no new scripts
1849 *** merge the Unicode update branches back onto the trunk
1850 - do not merge the icudata.jar and testdata.jar,
1851 instead rebuild them from merged & tested ICU4C
1853 ---------------------------------------------------------------------------- ***
1857 http://www.unicode.org/review/pri230/
1858 http://www.unicode.org/versions/beta-6.2.0.html
1859 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
1860 http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
1861 http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
1862 http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
1863 http://www.unicode.org/reports/tr46/tr46-8.html IDNA
1864 http://unicode.org/Public/idna/6.2.0/
1868 - ticket 9515: Unicode 6.2: final ICU update
1870 - ticket 9514: UCA 6.2: fix UCARules.txt
1872 - ticket 9437: update ICU to Unicode 6.2
1873 - C++ branches/markus/uni62 at r32050 from trunk at r32041
1874 - Java branches/markus/uni62 at r32068 from trunk at r32066
1876 *** Unicode version numbers
1879 (configure.in & configure: have been modified to extract the version from uchar.h)
1880 - com.ibm.icu.util.VersionInfo
1881 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1883 *** data files & enums & parser code
1887 - download UCD, UCA & IDNA files
1888 - make sure that the Unicode data folder passed into preparseucd.py
1889 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1890 - modify preparseucd.py: NamesList.txt is now in UTF-8
1891 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
1892 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1893 - Check test file diffs for previously commented-out, known-failing data lines;
1894 probably need to keep those commented out.
1896 * PropertyValueAliases.txt changes
1897 - 1 new Line_Break (lb) value:
1898 lb ; RI ; Regional_Indicator
1899 -> uchar.h & UCharacter.LineBreak
1900 - 1 new Word_Break (WB) value:
1901 WB ; RI ; Regional_Indicator
1902 -> uchar.h & UCharacter.WordBreak
1903 - 1 new Grapheme_Cluster_Break (GCB) value:
1904 GCB; RI ; Regional_Indicator
1905 -> uchar.h & UCharacter.GraphemeClusterBreak
1907 * 3 new numeric values
1908 The new value -1 was really meant to be NaN, but NaN would have required
1909 new UnicodeData.txt syntax. The value -1 can already be represented as a "fraction" of -1/1,
1910 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
1911 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
1912 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
1913 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
1914 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
1915 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
1916 -> uprops.h, uchar.c & UCharacterProperty.java
1917 -> cucdtst.c & UCharacterTest.java
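The arithmetic behind the three new values can be checked quickly. The sketch below, in Python, only illustrates the numbers themselves: -1 equals the fraction -1/1, and the two large values are small multiples of a power of 60. It does not describe the encoding actually implemented in the files listed above, and the helper name is hypothetical.

```python
from fractions import Fraction

# -1 written as a numerator/denominator pair.
minus_one = Fraction(-1, 1)

def split_base60(n):
    """Return (m, e) with n == m * 60**e and e as large as possible (n > 0)."""
    exponent = 0
    while n % 60 == 0:
        n //= 60
        exponent += 1
    return n, exponent

print(minus_one == -1)
print(split_base60(216000))   # 216000 == 1 * 60**3
print(split_base60(432000))   # 432000 == 2 * 60**3
```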
1919 * generate normalization data files
1920 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
1921 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
1922 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
1923 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1924 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1925 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1926 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1928 * build ICU (make install)
1929 so that the tools build can pick up the new definitions from the installed header files.
1930 * build Unicode tools using CMake+make
1932 * generate core properties data files
1933 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
1934 - in initial bootstrapping, change the UCA version
1935 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1936 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
1937 - rebuild ICU (make install) & tools
1938 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1939 check if the UCA version in FractionalUCA.txt matches the new Unicode version
1941 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1942 - rebuild ICU (make install) & tools
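The version mismatch mentioned above can be checked before rebuilding. This sketch, in Python, assumes that FractionalUCA.txt carries a line of the form "[UCA version = x.y.z]"; that format, the function names, and the sample text are assumptions to verify against the actual file.

```python
import re

def uca_version(fractional_uca_text):
    """Return the version string from an assumed '[UCA version = x.y.z]' line, or None."""
    match = re.search(r"\[UCA version\s*=\s*([0-9.]+)\]", fractional_uca_text)
    return match.group(1) if match else None

def versions_match(uca, unicode_version):
    """Compare the major.minor parts of the two version strings."""
    if uca is None:
        return False
    return uca.split(".")[:2] == unicode_version.split(".")[:2]

# Illustrative file excerpt.
sample = "# Fractional UCA Table\n[UCA version = 6.2.0]\n"
print(versions_match(uca_version(sample), "6.2"))
print(versions_match(uca_version(sample), "6.3"))
```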
1944 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1945 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1946 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1947 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
1948 - nothing new in 6.2, no test file to update
1950 * update Java data files
1951 - refresh just the UCD-related files, just to be safe
1952 - see (ICU4C)/source/data/icu4j-readme.txt
1954 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1957 Unicode .icu files built to ./out/build/icudt50l
1958 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1959 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
1960 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1961 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1962 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
1963 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
1964 mkdir -p /tmp/icu4j/main/shared/data
1965 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1966 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
1967 mkdir -p /tmp/icu4j/main/shared/data
1968 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1969 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
1970 - copy the big-endian Unicode data files to another location,
1971 separate from the other data files
1972 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1973 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1974 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1975 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
1976 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1977 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1978 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1980 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1982 * refresh Java test .txt files
1983 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1987 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1988 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1989 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1990 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1991 (note removing the underscore before "Rules")
1992 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1993 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1994 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1995 - check test file diffs for previously commented-out, known-failing data lines;
1996 probably need to keep those commented out
1997 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1998 - run genuca, see command line above
2000 - refresh ICU4J collation data:
2001 (subset of instructions above for properties data refresh, except copies all coll/*)
2002 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2003 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
2004 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
2005 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
2006 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
2007 - note on intltest: if collate/UCAConformanceTest fails, then
2008 utility/MultithreadTest/TestCollators will fail as well;
2009 fix the conformance test before looking into the multi-thread test
2011 * test ICU, fix test code where necessary
2013 * When refreshing all of ICU4J data from ICU4C
2014 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2015 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2017 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2019 *** LayoutEngine script information
2020 - skipped for Unicode 6.2: no new scripts
2022 *** merge the Unicode update branches back onto the trunk
2023 - do not merge the icudata.jar and testdata.jar,
2024 instead rebuild them from merged & tested ICU4C
2026 ---------------------------------------------------------------------------- ***
2028 Future Unicode update
2030 Tools simplified since the Unicode 6.1 update. See
2031 - http://site.icu-project.org/design/props/ppucd
2032 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
2034 * Unicode version numbers
2035 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
2038 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
2039 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
2040 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2041 - Check test file diffs for previously commented-out, known-failing data lines;
2042 probably need to keep those commented out.
2044 * PropertyValueAliases.txt changes
2045 - Script codes that are in ISO 15924 but not in Unicode are now listed in
2046 preparseucd.py, in the _scripts_only_in_iso15924 variable.
2047 If there are new ISO codes, then add them.
2048 If Unicode adds some of them, then remove them from the .py variable.
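The maintenance rule above (drop a code from the ISO-only list once Unicode defines it) can be sketched as follows. This is only an illustration: prune_iso_only is a hypothetical helper, not part of preparseucd.py, and the real _scripts_only_in_iso15924 variable may be structured differently. The sample codes come from the Unicode 6.1 log in this file.

```python
def prune_iso_only(iso_only, unicode_scripts):
    """Return the ISO-only codes that Unicode still does not define, in order."""
    encoded = set(unicode_scripts)
    return tuple(code for code in iso_only if code not in encoded)


# Illustrative data only: Merc, Mero and Sora gained Unicode property value
# aliases in Unicode 6.1; Hluw was then listed only in ISO 15924.
iso_only = ("Hluw", "Merc", "Mero", "Sora")
print(prune_iso_only(iso_only, ["Merc", "Mero", "Sora"]))
```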
2050 * UnicodeData.txt changes
2051 - No more manual changes for CJK ranges for algorithmic names;
2052 those are now written to ppucd.txt and genprops reads them from there.
2054 * generate core properties data files (makeprops.sh was deleted)
2055 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
2057 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
2058 - it is now generated by preparseucd.py
2060 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
2061 - it is now generated by preparseucd.py
2062 - make sure that the Unicode data folder passed into preparseucd.py
2063 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
2064 (can be in some subfolder)
2066 * generate normalization data files
2067 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
2068 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
2069 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
2070 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2071 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2072 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2073 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
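The four gennorm2 invocations above differ only in the output file and the list of input files. A minimal sketch that builds the same argument lists from a table, assuming only the -o and -s options shown above; the function name and table are hypothetical, and nothing is executed here:

```python
import posixpath

# Output file -> input files, in the order used on the command lines above.
NORM2_BUILDS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata, gennorm2="bin/gennorm2"):
    """Return one argv list per .nrm file; the caller decides how to run them."""
    norm2_dir = posixpath.join(unidata, "norm2")
    return [
        [gennorm2, "-o", posixpath.join(src_data_in, out), "-s", norm2_dir] + inputs
        for out, inputs in NORM2_BUILDS
    ]


for cmd in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(cmd))
```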
2075 * build ICU (make install)
2076 * build Unicode tools using CMake+make
2078 * new way to call genuca (makeuca.sh was deleted)
2079 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
2081 ---------------------------------------------------------------------------- ***
2087 - ticket 8995 final update to Unicode 6.1
2088 - ticket 8994 regenerate source/layout/CanonData.cpp
2090 - ticket 8961 support Unicode "Age" value *names*
2091 - ticket 8963 support multiple character name aliases & types
2093 - ticket 8827 "update ICU to Unicode 6.1"
2094 - C++ branches/markus/uni61 at r30864 from trunk at r30843
2095 - Java branches/markus/uni61 at r30865 from trunk at r30863
2097 *** Unicode version numbers
2100 (configure.in & configure have been modified to extract the version from uchar.h)
2101 - com.ibm.icu.util.VersionInfo
2102 - icutools/unicode/makedefs.sh
2103 + also review & update other definitions in that file,
2104 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
2106 *** data files & enums & parser code
2110 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
2111 - This prepares both unidata and testdata files in respective output subfolders.
2112 - Check test file diffs for previously commented-out, known-failing data lines;
2113 probably need to keep those commented out.
2115 * PropertyValueAliases.txt changes
2116 - 11 new block names:
2118 Arabic_Mathematical_Alphabetic_Symbols
2120 Meetei_Mayek_Extensions
2122 Meroitic_Hieroglyphs
2126 Sundanese_Supplement
2129 -> add to UCharacter.UnicodeBlock IDs
2130 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2131 replace public static final int \1_ID = \2; \3
2132 -> add to UCharacter.UnicodeBlock objects
2133 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2134 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2135 - 1 new Joining_Group (jg) value:
2137 -> uchar.h & UCharacter.JoiningGroup
2138 - 2 new Line_Break (lb) values:
2139 CJ=Conditional_Japanese_Starter
2141 -> uchar.h & UCharacter.LineBreak
2144 sc ; Merc ; Meroitic_Cursive
2145 sc ; Mero ; Meroitic_Hieroglyphs
2148 sc ; Sora ; Sora_Sompeng
2150 -> remove these from SyntheticPropertyValueAliases.txt
2151 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2152 and in com.ibm.icu.dev.test.lang.TestUScript.java
2153 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2157 and another one added 2011-12-09
2158 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
2160 -> com.ibm.icu.lang.UScript
2161 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2162 replace public static final int \1 = \2;\3
2163 -> SyntheticPropertyValueAliases.txt
2164 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2165 and in com.ibm.icu.dev.test.lang.TestUScript.java
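The two Eclipse find/replace patterns for UCharacter.UnicodeBlock given earlier in this list can also be applied outside the IDE. A sketch with Python's re module, using the same patterns; the sample uchar.h line is illustrative and its numeric value should not be taken as authoritative:

```python
import re

# Same patterns as the Eclipse find/replace steps for UBLOCK_ constants.
UBLOCK_ID = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
UBLOCK_OBJ = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def to_block_id(line):
    """uchar.h enum line -> Java int constant with an _ID suffix."""
    return UBLOCK_ID.sub(r"public static final int \g<1>_ID = \g<2>; \g<3>", line)


def to_block_object(line):
    """uchar.h enum line -> Java UnicodeBlock object declaration."""
    return UBLOCK_OBJ.sub(
        r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        line,
    )


# Illustrative input line (value and comment are assumptions).
sample = "    UBLOCK_SUNDANESE_SUPPLEMENT = 219, /*[1CC0]*/"
print(to_block_id(sample))
print(to_block_object(sample))
```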
2167 * UnicodeData.txt changes
2168 - the last Unihan code point changes from U+9FCB to U+9FCC
2169 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
2170 + do change gennames.c
2171 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
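The search for hardcoded Unihan end/limit values described above can be sketched like this. The regex is the one from the notes; the sample source lines are hypothetical and only show what a hit looks like:

```python
import re

# Regex from the notes: old end 9FCB and old limit 9FCC, case-insensitive.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_unihan_limits(lines):
    """Yield (line_number, text) for each line that mentions 9FCB or 9FCC."""
    for number, text in enumerate(lines, start=1):
        if UNIHAN_END_OR_LIMIT.search(text):
            yield number, text.rstrip("\n")


# Hypothetical source lines, not copied from ICU.
sample = [
    "static const UChar32 CJK_END = 0x9fcb;\n",
    "int unrelated = 0;\n",
    "if (c < 0x9FCC) { /* limit */ }\n",
]
for hit in find_unihan_limits(sample):
    print(hit)
```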
2173 * DerivedBidiClass.txt changes
2174 - 2 new default-AL blocks:
2175 # Arabic Extended-A: U+08A0 - U+08FF (was default-R)
2176 # Arabic Mathematical Alphabetic Symbols:
2177 # U+1EE00 - U+1EEFF (was default-R)
2178 - 2 new default-R blocks:
2179 # Meroitic Hieroglyphs:
2181 # Meroitic Cursive: U+109A0 - U+109FF
2182 -> should be picked up by the explicit data in the file
2184 * NameAliases.txt changes
2186 # Each line has two fields
2187 # First field: Code point
2188 # Second field: Alias
2190 # Each line has three fields, as described here:
2192 # First field: Code point
2193 # Second field: Alias
2195 - Also, the file format previously allowed multiple aliases per code point, but only now does
2196 the file actually contain several, even several of the same type. For example,
2197 FEFF;BYTE ORDER MARK;alternate
2198 FEFF;BOM;abbreviation
2199 FEFF;ZWNBSP;abbreviation
2200 - This breaks our gennames parser, unames.icu data structure, and API.
2201 Fix gennames to only pick up "correction" aliases.
2202 New ticket #8963 for further changes.
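A sketch of the "only pick up correction aliases" rule for the three-field format, in Python rather than the C code of gennames. The function name is hypothetical; the FEFF lines are the ones quoted above, and the 01A2 line is a correction alias from NameAliases.txt:

```python
def parse_name_aliases(lines, wanted_types=("correction",)):
    """Return (code_point, alias) pairs whose type field is in wanted_types."""
    aliases = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 3:
            raise ValueError("expected 3 fields: %r" % line)
        code, alias, alias_type = fields
        if alias_type in wanted_types:
            aliases.append((int(code, 16), alias))
    return aliases


sample = [
    "# First field: Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
print(parse_name_aliases(sample))
```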
2204 * run genpname/preparse.pl (on Linux)
2205 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2206 + make sure that data.h is writable
2207 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2208 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2210 * build ICU (make install)
2211 so that the tools build can pick up the new definitions from the installed header files.
2212 * build Unicode tools (at least genpname) using CMake+make
2215 (builds both pnames.icu and propname_data.h)
2216 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2217 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
2219 * build ICU (make install)
2220 * build Unicode tools using CMake+make
2222 * update source/data/unidata/norm2/nfkc_cf.txt
2223 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2225 * update source/data/unidata/norm2/uts46.txt
2226 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
2227 to ~/svn.icu/tools/trunk/src/unicode/py
2228 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
2229 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2230 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2232 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2233 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2234 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2235 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
2236 - nothing new in 6.1, no test file to update
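The grep for "disallowed_STD3_valid" on non-ASCII characters can be sketched as a small filter. This assumes the semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status); the sample lines are modeled on that layout and their comments are approximate:

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) for disallowed_STD3_valid entries reaching above U+007F."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start = int(start, 16)
        end = int(end, 16) if end else start
        if end > 0x7F:
            yield start, end


sample = [
    "0000..002C    ; disallowed_STD3_valid                  # <control-0000>..COMMA",
    "00A0          ; disallowed_STD3_mapped ; 0020          # NO-BREAK SPACE",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
for start, end in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (start, end))
```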
2238 * generate core properties data files
2239 - in initial bootstrapping, change the UCA version
2240 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
2241 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2242 - rebuild ICU & tools
2243 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
2244 check if the UCA version in FractionalUCA.txt matches the new Unicode version
2246 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
2247 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2248 - rebuild ICU & tools
2250 * update Java data files
2251 - refresh just the UCD-related files, just to be safe
2252 - see (ICU4C)/source/data/icu4j-readme.txt
2254 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2257 Unicode .icu files built to ./out/build/icudt49l
2258 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
2259 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
2260 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2261 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
2262 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
2263 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
2264 mkdir -p /tmp/icu4j/main/shared/data
2265 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2266 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
2267 mkdir -p /tmp/icu4j/main/shared/data
2268 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2269 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
2270 - copy the big-endian Unicode data files to another location,
2271 separate from the other data files
2272 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2273 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
2274 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
2275 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
2276 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
2277 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2278 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
2280 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
2282 * refresh Java test .txt files
2283 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2285 * test ICU so far, fix test code where necessary
2286 - temporarily ignore collation issues that look like UCA/UCD mismatches,
2287 until UCA data is updated
2291 - get output from Mark's tools; look in
2292 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
2293 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2294 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2295 (note removing the underscore before "Rules")
2296 - update (ICU)/source/test/testdata/CollationTest_*.txt
2297 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2298 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2299 - check test file diffs for previously commented-out, known-failing data lines;
2300 probably need to keep those commented out
2301 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
2303 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2305 - refresh ICU4J collation data:
2306 (subset of instructions above for properties data refresh, except copies all coll/*)
2307 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2308 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2309 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2310 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
2311 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
2312 - note on intltest: if collate/UCAConformanceTest fails, then
2313 utility/MultithreadTest/TestCollators will fail as well;
2314 fix the conformance test before looking into the multi-thread test
2316 * When refreshing all of ICU4J data from ICU4C
2317 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2318 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2320 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2322 *** LayoutEngine script information
2324 (For details see the Unicode 5.2 change log below.)
2326 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2327 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2328 in the working directory.
2329 (It also generates ScriptRunData.cpp, which is no longer needed.)
2331 The generated files have a current copyright date and "@draft" statement.
2333 - diff current <icu>/source/layout files vs. generated ones
2334 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2335 review and manually merge desired changes;
2336 fix gratuitous changes, incorrect @draft and missing aliases;
2337 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2338 - if you just copy the above files, then
2339 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
2340 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2342 *** merge the Unicode update branches back onto the trunk
2343 - do not merge the icudata.jar and testdata.jar,
2344 instead rebuild them from merged & tested ICU4C
2346 ---------------------------------------------------------------------------- ***
2348 ICU 4.8 (no Unicode update, just new script codes)
2350 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2356 Shrd 319 Sharada, Śāradā
2357 Sora 398 Sora Sompeng
2358 Takr 321 Takri, Ṭākrī, Ṭāṅkrī
2362 -> com.ibm.icu.lang.UScript
2363 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2364 replace public static final int \1 = \2;\3
2365 -> genpname/SyntheticPropertyValueAliases.txt
2366 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2367 and in com.ibm.icu.dev.test.lang.TestUScript.java
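The find/replace for com.ibm.icu.lang.UScript above translates directly to a scripted regex substitution. A sketch in Python using the same pattern; the sample uscript.h line is illustrative and its numeric value is an assumption:

```python
import re

# Same pattern as the find/replace step for USCRIPT_ constants.
USCRIPT = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def to_java_constant(line):
    """uscript.h enum line -> Java int constant for UScript."""
    return USCRIPT.sub(r"public static final int \g<1> = \g<2>;\g<3>", line)


# Illustrative input line (value and comment are assumptions).
sample = "      USCRIPT_SHARADA                       = 151,/* Shrd */"
print(to_java_constant(sample))
```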
2369 * run genpname/preparse.pl (on Linux)
2370 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2371 + make sure that data.h is writable
2372 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2373 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2375 * rebuild Unicode tools (at least genpname) using make
2376 - You might first need to "make install" ICU so that the tools build can pick
2377 up the new definitions from the installed header files.
2380 (builds both pnames.icu and propname_data.h)
2381 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2382 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
2383 - rebuild ICU & tools
2386 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
2387 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
2388 - rebuild ICU & tools
2390 * update Java data files
2391 - refresh just the UCD-related files, just to be safe
2392 - see (ICU4C)/source/data/icu4j-readme.txt
2394 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2395 - copy the big-endian Unicode data files to another location,
2396 separate from the other data files
2397 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2398 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2399 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2401 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
2403 * should have updated the layout engine script codes but forgot
2405 ---------------------------------------------------------------------------- ***
2409 *** related ICU Trac tickets
2411 7264 Unicode 6.0 Update
2413 *** Unicode version numbers
2416 (configure.in & configure have been modified to extract the version from uchar.h)
2417 - com.ibm.icu.util.VersionInfo
2419 *** data files & enums & parser code
2423 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
2424 - This now prepares both unidata and testdata files in respective output subfolders.
2426 * PropertyAliases.txt changes
2427 - new Script_Extensions property defined in the new ScriptExtensions.txt file
2428 but not listed in PropertyAliases.txt; reported to unicode.org;
2429 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
2430 scx; Script_Extensions
2431 -> uchar.h with new UProperty section
2432 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
2434 * PropertyValueAliases.txt changes
2435 - 12 new block names:
2440 CJK_Unified_Ideographs_Extension_D
2445 Miscellaneous_Symbols_And_Pictographs
2447 Transport_And_Map_Symbols
2449 -> add to UCharacter.UnicodeBlock
2450 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2451 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2452 - Joining_Group (jg) values:
2453 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
2454 -> uchar.h & UCharacter.JoiningGroup
2459 -> remove these from SyntheticPropertyValueAliases.txt
2460 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
2461 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2462 and in com.ibm.icu.dev.test.lang.TestUScript.java
2463 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2464 (added 2009-11-11..2010-07-18)
2466 Dupl 755 Duployan shorthand
2472 Merc 101 Meroitic Cursive
2473 Narb 106 Old North Arabian
2477 Wara 262 Warang Citi
2479 -> com.ibm.icu.lang.UScript
2480 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2481 replace public static final int \1 = \2;\3
2482 -> SyntheticPropertyValueAliases.txt
2483 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2484 and in com.ibm.icu.dev.test.lang.TestUScript.java
2485 - ISO 15924 name change
2486 Mero 100 Meroitic Hieroglyphs (was Meroitic)
2487 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
2488 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
2490 * UnicodeData.txt changes
2492 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2493 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
2494 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
2496 * build Unicode tools using CMake+make
2498 * run genpname/preparse.pl (on Linux)
2499 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2500 + make sure that data.h is writable
2501 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2502 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2504 * rebuild Unicode tools (at least genpname) using make
2505 - You might first need to "make install" ICU so that the tools build can pick
2506 up the new definitions from the installed header files.
2509 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2510 - rebuild ICU & tools
2512 * update source/data/unidata/norm2/nfkc_cf.txt
2513 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2515 * update source/data/unidata/norm2/uts46.txt
2516 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
2517 to ~/svn.icu/tools/trunk/src/unicode/py
2518 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
2519 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2520 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2522 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2523 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2524 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2525 - Unicode 6.0: U+2260, U+226E, U+226F
2527 * generate core properties data files
2528 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2529 - rebuild ICU & tools
2530 - run makeuca.sh so that genuca picks up the new nfc.nrm:
2531 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2532 - rebuild ICU & tools
2534 * implement new Script_Extensions property (provisional)
2535 - parser & generator: genprops & uprops.icu
2536 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
2537 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
2539 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
2541 - genbidi/gencase/genprops tools changes
2542 - re-run makeprops.sh (see above)
2543 - UCharacterProperty.java, UCharacterTypeIterator.java,
2544 UBiDiProps.java, UCaseProps.java, and several others with minor changes;
2545 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
2547 * update Java data files
2548 - refresh just the UCD-related files, just to be safe
2549 - see (ICU4C)/source/data/icu4j-readme.txt
2551 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2554 Unicode .icu files built to ./out/build/icudt45l
2555 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2556 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2557 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2558 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
2559 mkdir -p /tmp/icu4j/main/shared/data
2560 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2561 - copy the big-endian Unicode data files to another location,
2562 separate from the other data files
2563 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2564 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2565 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2566 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
2567 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2568 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2569 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2571 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2573 * refresh Java test .txt files
2574 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2576 * un-hardcode normalization skippable (NF*_Inert) test data
2577 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
2579 * copy updated break iterator test files
2580 - now handled by early ucdcopy.py and
2581 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
2583 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
2584 to ~/svn.icu/trunk/src/source/test/testdata)
2585 - they are not used in ICU4J
2589 - get output from Mark's tools; look in
2590 http://www.unicode.org/~book/incoming/mark/uca6.0.0/
2591 http://www.macchiato.com/unicode/utc/additional-uca-files
2592 http://www.unicode.org/Public/UCA/6.0.0/
2593 http://www.unicode.org/~mdavis/uca/
2594 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2595 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2596 - update Han-implicit ranges for new CJK extensions:
2597 swapCJK() in ucol.cpp & ImplicitCEGenerator.java
2598 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
2599 do not add it into invuca so that tailoring primary-after an ignorable works
2600 - genuca: permit space between [variable top] bytes
2601 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
2603 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2605 - refresh ICU4J collation data:
2606 (subset of instructions above for properties data refresh, except copies all coll/*)
2607 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2608 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2609 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2610 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2611 - update (ICU)/source/test/testdata/CollationTest_*.txt
2612 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2613 with output from Mark's Unicode tools
2614 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2615 - note on intltest: if collate/UCAConformanceTest fails, then
2616 utility/MultithreadTest/TestCollators will fail as well;
2617 fix the conformance test before looking into the multi-thread test
2619 * When refreshing all of ICU4J data from ICU4C
2620 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2621 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2623 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2625 *** LayoutEngine script information
2627 (For details see the Unicode 5.2 change log below.)
2629 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2630 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2631 ScriptRunData.cpp, which is no longer needed.)
2633 The generated files have a current copyright date and "@draft" statement.
2635 * copy the above files into <icu>/source/layout, replacing the old files.
2636 * fix mixed line endings
2637 * review the diffs and fix incorrect @draft and missing aliases;
2638 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2639 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2641 ---------------------------------------------------------------------------- ***
2645 *** related ICU Trac tickets
2649 7167 verify collation bytes
2650 7235 Java test NAME_ALIAS
2651 7236 Java DerivedCoreProperties.txt test
2652 7237 Java BidiTest.txt
2653 7238 UTrie2 in core unidata
2654 7239 test for tailoring gaps
2655 7240 Java fix CollationMiscTest
2656 7243 update layout engine for Unicode 5.2
2658 *** Unicode version numbers
2661 - configure.in & configure
2662 - update ucdVersion in gennames.c if an algorithmic range changes
2664 *** data files & enums & parser code
2668 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
2669 - includes finding files regardless of version numbers,
2670 copying them, and performing the equivalent processing of the
2671 ucdstrip and ucdmerge tools on the desired set of files
2674 - PropertyAliases.txt
2675 moved from numeric to enumerated:
2676 ccc ; Canonical_Combining_Class
2677 new string properties:
2678 NFKC_CF ; NFKC_Casefold
2679 Name_Alias; Name_Alias
2680 new binary properties:
2683 CWCF ; Changes_When_Casefolded
2684 CWCM ; Changes_When_Casemapped
2685 CWKCF ; Changes_When_NFKC_Casefolded
2686 CWL ; Changes_When_Lowercased
2687 CWT ; Changes_When_Titlecased
2688 CWU ; Changes_When_Uppercased
2689 new CJK Unihan properties (not supported by ICU)
2690 - PropertyValueAliases.txt
2693 one script code change:
2694 sc ; Qaai ; Inherited
2696 sc ; Zinh ; Inherited ; Qaai
2697 new Line_Break (lb) value:
2698 lb ; CP ; Close_Parenthesis
2699 new Joining_Group (jg) values: Farsi_Yeh, Nya
2701 ccc; 214; ATA ; Attached_Above
2702 - DerivedBidiClass.txt
2703 new default-R range: U+1E800 - U+1EFFF
2705 all of the ISO comments are gone
2707 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
2709 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2710 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2714 + cd \svn\icuproj\icu\trunk\source\tools\genpname
2715 + make sure that data.h is writable
2716 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
2717 + preparse.pl complains with errors like the following:
2718 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
2719 This is because ICU 4.0 had scripts from ISO 15924 which are now
2720 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
2721 and PropertyValueAliases.txt.
2722 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2723 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
2724 + preparse.pl complains with errors about block names missing from uchar.h; add them
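A minimal sketch (not part of preparse.pl) of how the duplicate script entries noted above could be found before running the Perl script. It assumes both files use "sc ; Short ; Long" lines with "#" comments; the helper names and the sample data are hypothetical.

```python
def parse_script_aliases(lines):
    """Return {short_code: long_name} for lines of the form 'sc ; Short ; Long'."""
    aliases = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


def find_duplicate_scripts(official_lines, synthetic_lines):
    """Short script codes that are defined in both alias files, sorted."""
    official = parse_script_aliases(official_lines)
    synthetic = parse_script_aliases(synthetic_lines)
    return sorted(set(official) & set(synthetic))


# Illustrative sample data only (assumed file contents).
official = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Java ; Javanese", "sc ; Latn ; Latin"]
synthetic = ["# ISO 15924 codes not in Unicode", "sc ; Egyp ; Egyp", "sc ; Java ; Java", "sc ; Blis ; Blis"]
print(find_duplicate_scripts(official, synthetic))
```

Entries reported this way are the ones to remove from SyntheticPropertyValueAliases.txt.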
2726 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2727 - new block & script values
2729 copy new blocks from Blocks.txt
2730 MS VC++ 2008 regular expression:
2731 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
2732 replace with " UBLOCK_\3 = 172, /*[\1]*/"
2733 + several new script values already added in ICU 4.0 for ISO 15924 coverage
2734 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
2735 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
2736 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
2737 (added to SyntheticPropertyValueAliases.txt)
2738 - new Joining Group (JG) values: Farsi_Yeh, Nya
2739 - new Line_Break (lb) value:
2740 lb ; CP ; Close_Parenthesis
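The Blocks.txt find/replace above can also be sketched outside the Visual C++ editor. This is an illustration only: the function name is hypothetical, the starting ID must be supplied by the caller (the 172 in the regex above is a fixed placeholder that gets adjusted by hand), and turning the block name into an identifier by upper-casing and replacing non-alphanumerics with underscores is an assumption about the manual cleanup.

```python
import re

# Same shape as the editor regex: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def block_constants(lines, first_id):
    """Turn Blocks.txt lines into C enum lines, numbering from first_id."""
    out = []
    next_id = first_id
    for line in lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue  # comments, blank lines, anything else
        start, _end, name = m.groups()
        ident = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (ident, next_id, start))
        next_id += 1
    return out


for enum_line in block_constants(
        ["A960..A97F; Hangul Jamo Extended-A", "D7B0..D7FF; Hangul Jamo Extended-B"],
        first_id=172):  # 172 is illustrative, not the real constant value
    print(enum_line)
```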
2742 * hardcoded Unihan range end/limit
2743 - Unihan range end moves from 9FC3 to 9FCB
2744 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
2745 + do change gennames.c
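A small sketch of the search described above: given the old range end, look for both the end and the limit (end+1), case-insensitively. The function name and the in-memory sample are hypothetical; a real run would read the source tree from disk.

```python
import re


def find_hardcoded_range(text_by_file, end_hex):
    """Return (file, line number, line) for lines mentioning end or end+1."""
    limit_hex = "%X" % (int(end_hex, 16) + 1)
    pattern = re.compile(
        "|".join(re.escape(h) for h in (end_hex, limit_hex)), re.IGNORECASE)
    hits = []
    for name, text in sorted(text_by_file.items()):
        for lineno, line in enumerate(text.splitlines(), 1):
            if pattern.search(line):
                hits.append((name, lineno, line.strip()))
    return hits


# Illustrative sample only.
sample = {
    "gennames.c": "/* ranges */\n    0x4e00, 0x9fc3,\n",
    "other.c": "limit=0x9FC4;\nunrelated\n",
}
for hit in find_hardcoded_range(sample, "9FC3"):
    print(hit)
```

Each hit still needs a manual decision, since some occurrences must stay unchanged.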
2747 * Compare definitions of new binary properties with what we used to use
2748 in algorithms, to see if the definitions changed.
2749 - Verified that definitions for Cased and Case_Ignorable are unchanged.
2750 The gencase tool now parses the newly public Case_Ignorable values
2751 in case the definition changes in the future.
2753 * uchar.c & uprops.h & uprops.c & genprops
2754 - new numeric values that didn't exist in Unicode data before:
2755 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
2756 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
2757 therefore redesign the encoding of numeric types and values for formatVersion 6;
2758 design for simple numbers up to at least 144 ("one gross"),
2759 large values up to at least 10^20,
2760 and fractions with numerators -1..17 and denominators 1..16
2761 to cover current and expected future values
2762 (e.g., more Han numeric values, Meroitic twelfths)
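To illustrate why the stated ranges are cheap to store, here is a hypothetical packing of a fraction with numerator -1..17 and denominator 1..16 into one small integer. This is a sketch of the idea only and is not claimed to be the actual uprops.icu formatVersion 6 layout; all names are made up.

```python
from fractions import Fraction

MIN_NUM, MAX_NUM = -1, 17   # numerator range from the notes above
MIN_DEN, MAX_DEN = 1, 16    # denominator range from the notes above


def encode_fraction(numerator, denominator):
    """Pack into (numerator offset << 4) | (denominator offset); hypothetical layout."""
    if not MIN_NUM <= numerator <= MAX_NUM:
        raise ValueError("numerator out of range")
    if not MIN_DEN <= denominator <= MAX_DEN:
        raise ValueError("denominator out of range")
    return ((numerator - MIN_NUM) << 4) | (denominator - MIN_DEN)


def decode_fraction(code):
    """Inverse of encode_fraction; returns a (reduced) Fraction."""
    return Fraction((code >> 4) + MIN_NUM, (code & 0xF) + MIN_DEN)


# The new Unicode 5.2 values listed above all round-trip.
for n, d in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    assert decode_fraction(encode_fraction(n, d)) == Fraction(n, d)
```

With this layout the denominator takes 4 bits and the 19 possible numerators take 5 bits, so every fraction fits in 9 bits.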
2764 * reimplement Hangul_Syllable_Type for new Jamo characters
2765 - the old code assumed that all Jamo characters are in the 11xx block
2766 - Unicode 5.2 fills holes there and adds new Jamo characters in
2767 A960..A97F; Hangul Jamo Extended-A
2769 D7B0..D7FF; Hangul Jamo Extended-B
2770 - Hangul_Syllable_Type can be trivially derived from a subset of
2771 Grapheme_Cluster_Break values
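A sketch of the derivation mentioned above, assuming the Unicode 5.2-era correspondence in which the Grapheme_Cluster_Break values L, V, T, LV and LVT coincide with the Hangul syllable types and everything else is Not_Applicable. Later Unicode versions may not preserve this correspondence. The function name is hypothetical and this is not ICU code.

```python
# GCB short value -> Hangul_Syllable_Type short value (assumed 5.2-era mapping)
_GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type_from_gcb(gcb):
    """Return the hst short name for a GCB short name, or 'NA' if not Hangul."""
    return _GCB_TO_HST.get(gcb, "NA")


print(hangul_syllable_type_from_gcb("LV"))      # LV
print(hangul_syllable_type_from_gcb("Extend"))  # NA
```

The point of the design is that no per-block assumption (such as "all Jamo are in 11xx") is needed once the GCB value is available.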
2773 * build Unicode data source code for hardcoding core data
2774 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
2776 ICU data make path is \svn\icuproj\icu\trunk\source\data\
2777 ICU root path is \svn\icuproj\icu\trunk
2778 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2779 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2780 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2781 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2782 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2783 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2784 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2785 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
2786 Creating data file for Unicode Property Names
2787 Creating data file for Unicode Character Properties
2788 Creating data file for Unicode Case Mapping Properties
2789 Creating data file for Unicode BiDi/Shaping Properties
2790 Creating data file for Unicode Normalization
2791 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
2792 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
2794 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
2795 and rebuild the common library
2799 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
2800 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
2801 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
2802 [ Begin obsolete instructions:
2803 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
2804 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
2806 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
2807 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
2808 End obsolete instructions]
2809 - run all tests with the *_SHORT.txt or the full files (the full ones have comments),
2810 not just the *_STUB.txt files
2811 - note on intltest: if collate/UCAConformanceTest fails, then
2812 utility/MultithreadTest/TestCollators will fail as well;
2813 fix the conformance test before looking into the multi-thread test
2815 *** Implement Cased & Case_Ignorable properties
2816 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
2817 - Problem: These properties should be disjoint, but aren't
2818 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
2819 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
2821 *** Implement Changes_When_Xyz properties
2822 - without stored data
2824 *** Implement Name_Alias property
2825 - add it as another name field in unames.icu
2826 - make it available via u_charName() and UCharNameChoice
2827 - consider it in u_charFromName()
2831 * Update break iterator rules to new UAX versions and new property values
2832 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
2834 *** new BidiTest file
2835 - review format and data
2836 - copy BidiTest.txt to source/test/testdata
2837 - write test code using this data
2838 - fix ICU code where it fails the conformance test
2841 - generally, find and update code corresponding to C/C++
2842 - UCharacter.UnicodeBlock constants:
2843 a) add an _ID integer per new block, update COUNT
2844 b) add a class instance per new block
2845 Visual Studio regex:
2846 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
2847 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2848 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
2850 - port test changes to Java
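The Visual Studio find/replace for UnicodeBlock instances (step b above) can be sketched as a script. The function name is hypothetical, and the sample input line uses an illustrative ID.

```python
import re

# Mirrors the editor regex: UBLOCK_<name> = <id>, <comment>
C_ENUM_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def java_block_constant(c_line):
    """Convert one C enum line into a Java UnicodeBlock declaration, or None."""
    m = C_ENUM_LINE.search(c_line)
    if m is None:
        return None
    name, comment = m.groups()
    return ('public static final UnicodeBlock %s = new UnicodeBlock("%s", %s_ID); %s'
            % (name, name, name, comment))


# Illustrative input; the numeric ID is ignored by the conversion.
print(java_block_constant("    UBLOCK_SAMARITAN = 172, /*[0800]*/"))
```

The matching _ID integers (step a) and the COUNT update still have to be added separately.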
2852 *** LayoutEngine script information
2854 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
2856 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2857 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2858 ScriptRunData.cpp, which is no longer needed.)
2860 The generated files have a current copyright date and "@draft" statement.
2862 -> Eric Mader wrote in email on 20090930:
2863 "I think the tool has been modified to update @draft to @stable for
2864 older scripts and to add @draft for new scripts.
2865 (I worked with an intern on this last year.)
2866 You should check the output after you run it."
2868 * copy the above files into <icu>/source/layout, replacing the old files.
2869 * fix mixed line endings
2870 * review the diffs and fix incorrect @draft and missing aliases
2871 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2873 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2874 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2876 -> Eric Mader wrote in email on 20090930:
2877 "This is just a matter of making sure that all the per-script tables have
2878 entries for any new scripts that were added.
2879 If any new Indic characters were added, then the class tables in
2880 IndicClassTables.cpp should be updated to reflect this.
2881 John Emmons should know how to do this if it's required."
2883 * rebuild the layout and layoutex libraries.
2887 + Jamo_Short_Name, sfc->scf, binary property value aliases
2889 ---------------------------------------------------------------------------- ***
2893 *** related ICU Trac tickets
2895 5696 Update to Unicode 5.1
2897 *** Unicode version numbers
2900 - configure.in & configure
2901 - update ucdVersion in gennames.c if an algorithmic range changes
2903 *** data files & enums & parser code
2907 DerivedCoreProperties.txt
2908 DerivedNormalizationProps.txt
2909 NormalizationTest.txt
2912 GraphemeBreakProperty.txt
2913 SentenceBreakProperty.txt
2914 WordBreakProperty.txt
2915 - ucdstrip and ucdmerge:
2919 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2920 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
2921 copy 5.1.0\ucd\Blocks.txt ..\unidata\
2922 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
2923 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
2924 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2925 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2926 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2927 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2928 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
2929 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
2930 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
2931 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
2932 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
2934 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2935 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2936 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2937 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
2938 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2939 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2940 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2941 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2942 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2943 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
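For orientation, a rough model of the two processing steps in the batch file above. It assumes that ucdstrip drops "#" comments and empty lines, and that ucdmerge joins adjacent code point ranges that carry the same value; this is a simplified sketch with hypothetical function names, not the tools' actual implementation.

```python
def strip_comments(lines):
    """Remove '#' comments and blank lines (assumed ucdstrip behavior)."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            out.append(data)
    return out


def merge_ranges(lines):
    """Merge adjacent 'start[..end];value' lines with equal values (assumed ucdmerge behavior)."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        rng, value = [f.strip() for f in line.split(";", 1)]
        start_hex, _, end_hex = rng.partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    return [("%04X;%s" % (s, v)) if s == e else ("%04X..%04X;%s" % (s, e, v))
            for s, e, v in merged]


# Illustrative LineBreak-style sample.
raw = ["# sample", "0041..005A;AL # letters", "", "0061;AL", "0062..007A;AL", "007B;OP"]
print(merge_ranges(strip_comments(raw)))
```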
2947 + cd \svn\icuproj\icu\uni51\source\tools\genpname
2948 + make sure that data.h is writable
2949 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
2950 + preparse.pl complains with errors like the following:
2951 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
2952 This is because ICU 3.8 had scripts from ISO 15924 which are now
2953 added to Unicode 5.1, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
2954 and PropertyValueAliases.txt.
2955 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2956 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
2957 + PropertyValueAliases.txt now explicitly contains values for boolean properties:
2958 N/Y, No/Yes, F/T, False/True
2959 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
2960 It will use further values from the file if present.
2962 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2963 - new block & script values
2965 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
2966 (removed from SyntheticPropertyValueAliases.txt)
2967 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
2968 (added to SyntheticPropertyValueAliases.txt)
2969 - uprops.icu (uprops.h) only provides 7 bits for script codes.
2970 ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
2971 No script code above 127 is yet the script code of an
2972 assigned Unicode character, so ICU 4.0 uprops.icu does not store any
2973 script code values greater than 127.
2974 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
2975 in a parallel bit field, and that now overflows the 7 bits.
2976 Also, future values >=128 would be incompatible with a 7-bit field anyway.
2977 uprops.h is modified to move around several of the bit fields
2978 in the properties vector words, and now uses 8 bits for the script code.
2979 Two other bit fields also grow to accommodate future growth:
2980 Block (current count: 172) grows from 8 to 9 bits,
2981 and Word_Break grows from 4 to 5 bits.
2982 - renamed property Simple_Case_Folding (sfc->scf)
2983 + nothing to be done: handled as normal alias
2984 - new property JSN Jamo_Short_Name
2985 + no new API: only contributes to the Name property
2986 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
2987 - new Joining Group (JG) value: Burushashki_Yeh_Barree
2988 - new Sentence_Break (SB) values:
2993 - new Word_Break (WB) values:
2995 WB ; Extend ; Extend
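The script-code overflow described in the uprops.h item above comes down to simple bit-width arithmetic. A quick check, with hypothetical helper names:

```python
def bits_needed(max_value):
    """Smallest field width in bits that can hold max_value (at least 1)."""
    return max(1, max_value.bit_length())


def fits(max_value, width):
    """True if max_value can be stored in an unsigned field of the given width."""
    return max_value < (1 << width)


USCRIPT_CODE_LIMIT = 130            # value stated in the notes above
max_script = USCRIPT_CODE_LIMIT - 1

print(fits(127, 7))                 # True: assigned script codes still fit in 7 bits
print(fits(max_script, 7))          # False: the stored maximum (129) overflows
print(bits_needed(max_script))      # 8
```

The same check shows that an 8-bit field tops out at 255, which is the headroom the wider Block and Word_Break fields are meant to provide in their own ranges.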
2999 * Further changes in the 2008-02-29 update:
3000 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
3001 because they should not normally be invisible.
3002 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
3003 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
3004 - new Word_Break (WB) value: NL=Newline
3006 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
3007 - Unihan range end moves from 9FBB to 9FC3
3008 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
3009 + do change gennames.c
3011 * build Unicode data source code for hardcoding core data
3012 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
3014 ICU data make path is \svn\icuproj\icu\uni51\source\data\
3015 ICU root path is \svn\icuproj\icu\uni51
3016 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3017 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
3018 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
3019 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
3020 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
3021 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
3022 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
3023 Creating data file for Unicode Character Properties
3024 Creating data file for Unicode Case Mapping Properties
3025 Creating data file for Unicode BiDi/Shaping Properties
3026 Creating data file for Unicode Normalization
3027 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
3028 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
3030 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
3031 and rebuild the common library
3035 * Update break iterator rules to new UAX versions and new property values
3039 * update FractionalUCA.txt and UCARules.txt with new canonical closure
3042 - Test that APIs using Unicode property value aliases (like UnicodeSet)
3043 support all of the boolean values N/Y, No/Yes, F/T, False/True
3044 -> TestBinaryValues() tests in both cintltst and intltest
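A standalone sketch of what accepting all four spellings of each binary value means. This does not call ICU; the function name is hypothetical, and matching here is only case-insensitive with surrounding whitespace ignored.

```python
_BINARY_VALUES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool; raise ValueError otherwise."""
    key = alias.strip().lower()
    if key not in _BINARY_VALUES:
        raise ValueError("not a binary property value alias: %r" % alias)
    return _BINARY_VALUES[key]


for alias in ["N", "Y", "No", "Yes", "F", "T", "False", "True"]:
    print(alias, parse_binary_value(alias))
```

A test along the lines of TestBinaryValues() would loop over all eight aliases for a given property and compare the resulting sets.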
3046 *** LayoutEngine script information
3047 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
3048 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
3049 ScriptRunData.cpp, which is no longer needed.)
3051 The generated files have a current copyright date and "@draft" statement.
3053 * copy the above files into <icu>/source/layout, replacing the old files.
3055 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
3056 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
3058 * rebuild the layout and layoutex libraries.
3062 + Jamo_Short_Name, sfc->scf, binary property value aliases
3064 ---------------------------------------------------------------------------- ***
3068 *** related Jitterbugs
3070 5084 RFE: Update to Unicode 5.0
3072 *** data files & enums & parser code
3076 DerivedCoreProperties.txt
3077 DerivedNormalizationProps.txt
3078 NormalizationTest.txt
3081 GraphemeBreakProperty.txt
3082 SentenceBreakProperty.txt
3083 WordBreakProperty.txt
3084 - ucdstrip and ucdmerge:
3088 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
3089 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
3090 copy 5.0.0\ucd\Blocks.txt ..\unidata\
3091 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
3092 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
3093 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
3094 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
3095 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
3096 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
3097 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
3098 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
3099 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
3100 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
3101 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
3103 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
3104 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
3105 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
3106 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
3107 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
3108 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
3109 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
3110 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
3111 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
3112 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
3114 * update FractionalUCA.txt and UCARules.txt with new canonical closure
3118 + make sure that data.h is writable
3119 + perl preparse.pl \cvs\oss\icu > out.txt
3121 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
3122 - new block & script values
3123 + script values already added in ICU 3.6 because all of ISO 15924 is now covered
3125 * build Unicode data source code for hardcoding core data
3126 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
3128 ICU data make path is \cvs\oss\icu\source\data\
3129 ICU root path is \cvs\oss\icu
3130 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3132 Creating data file for Unicode Character Properties
3133 Creating data file for Unicode Case Mapping Properties
3134 Creating data file for Unicode BiDi/Shaping Properties
3135 Creating data file for Unicode Normalization
3136 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
3137 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
3139 - copy the .c source files to C:\cvs\oss\icu\source\common
3140 and rebuild the common library
3142 *** Unicode version numbers
3147 *** LayoutEngine script information
3148 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
3149 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
3150 ScriptRunData.cpp, which is no longer needed.)
3152 The generated files have a current copyright date and "@draft" statement.
3154 * copy the above files into <icu>/source/layout, replacing the old files.
3156 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
3157 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
3159 * rebuild the layout and layoutex libraries.
3161 ---------------------------------------------------------------------------- ***
3165 *** related Jitterbugs
3167 4332 RFE: Update to Unicode 4.1
3168 4157 RBBI, TR29 4.1 updates
3170 *** data files & enums & parser code
3174 DerivedCoreProperties.txt
3175 DerivedNormalizationProps.txt
3176 NormalizationTest.txt
3177 GraphemeBreakProperty.txt
3178 SentenceBreakProperty.txt
3179 WordBreakProperty.txt
3180 - ucdstrip and ucdmerge:
3184 * add new files to the repository
3185 GraphemeBreakProperty.txt
3186 SentenceBreakProperty.txt
3187 WordBreakProperty.txt
3189 * update FractionalUCA.txt and UCARules.txt with new canonical closure
3192 - handle new enumerated properties in sub read_uchar
3195 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
3196 - new binary properties
3198 + Pattern_White_Space
3199 - new enumerated properties
3200 + Grapheme_Cluster_Break
3203 - new block & script & line break values
3206 - case-ignorable changes
3207 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
3208 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
3210 *** Unicode version numbers
3216 - verify that u_charMirror() round-trips
3217 - test all new properties and some new values of old properties
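The round-trip check for mirroring can be modeled on the data alone. The sketch below assumes BidiMirroring.txt-style lines ("0028; 0029 # comment") and uses hypothetical helper names; it does not call u_charMirror() itself.

```python
def parse_mirroring(lines):
    """Parse 'source; target # comment' lines into {source: target}."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        source, target = [int(f.strip(), 16) for f in data.split(";")]
        pairs[source] = target
    return pairs


def mirror(pairs, c):
    """Mirrored code point, or c itself if it has no mirror mapping."""
    return pairs.get(c, c)


def non_round_tripping(pairs):
    """Code points c with mirror(mirror(c)) != c, sorted."""
    return sorted(c for c in pairs if mirror(pairs, mirror(pairs, c)) != c)


good = parse_mirroring(["0028; 0029 # LEFT PARENTHESIS", "0029; 0028 # RIGHT PARENTHESIS"])
print(non_round_tripping(good))  # []
```

An empty result means every mapped character mirrors back to itself after two steps.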
3221 * hardcoded Unihan range end/limit
3222 - Unihan range end moves from 9FA5 to 9FBB
3223 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
3224 + do not modify BOCU/BOCSU code because that would change the encoding
3225 and break binary compatibility!
3226 + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
3228 + ignore trietest.c: test data is arbitrary
3229 + ignore tstnorm.cpp: test optimization, not important
3230 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
3231 + do change line_th.txt and word_th.txt
3232 by replacing hardcoded ranges with the new property values
3233 + do change gennames.c
3235 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
3236 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
3237 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
3240 - compare new special casing context conditions with previous ones
3241 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
3244 - consider storing only the short name if it is the same as the long name
3247 - UAX #29 changes (grapheme/word/sentence breaks)
3248 - UAX #14 changes (line breaks)
3249 - Pattern_Syntax & Pattern_White_Space
3251 ---------------------------------------------------------------------------- ***
3253 Unicode 4.0.1 update
3255 *** related Jitterbugs
3257 3170 RFE: Update to Unicode 4.0.1
3258 3171 Add new Unicode 4.0.1 properties
3259 3520 use Unicode 4.0.1 updates for break iteration
3261 *** data files & enums & parser code
3264 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
3265 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
3268 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
3269 according to PRI #26
3270 http://www.unicode.org/review/resolved-pri.html#pri26
3271 - undone again because no corrigendum in sight;
3272 instead modified tests to not check consistency on this for Unicode 4.0.1
3275 - update from http://www.unicode.org/copyright.html
3276 formatted for plain text
3278 * uchar.h & uprops.h & uprops.c & genprops
3279 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
3280 - add U_LB_INSEPARABLE due to a spelling fix
3281 + put short name comment only on line with new constant
3282 for genpname perl script parser
3283 - new binary properties
3285 + Variation_Selector
3288 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
3289 - perl script: correctly calculate the maximum number of fields per row
3292 - new script code Hrkt=Katakana_Or_Hiragana
3294 * gennorm.c track changes in DerivedNormalizationProps.txt
3295 - "FNC" -> "FC_NFKC"
3296 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
3298 * genprops/props2.c track changes in DerivedNumericValues.txt
3299 - changed from 3 columns to 2, dropping the numeric type
3300 + assume that the type is always numeric for Han characters,
3301 and that only those are added in addition to what UnicodeData.txt lists
3303 *** Unicode version numbers
3309 - update test of default bidi classes according to PRI #28
3310 /tsutil/cucdtst/TestUnicodeData
3311 http://www.unicode.org/review/resolved-pri.html#pri28
3312 - bidi tests: change exemplar character for ES depending on Unicode version
3313 - change hardcoded expected property values where they change
3321 - use new Hrkt=Katakana_Or_Hiragana
3324 - are now part of combining character sequences
3325 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ