1 * Copyright (C) 2004-2016, International Business Machines
2 * Corporation and others. All Rights Reserved.
4 * file name: changes.txt
6 * tab size: 8 (not used)
9 * created on: 2004may06
10 * created by: Markus W. Scherer
12 * change log for Unicode updates
14 ---------------------------------------------------------------------------- ***
16 * New ISO 15924 script codes
18 Starting with ICU 55, we do not add UScriptCode constants any more until their scripts
19 are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
20 Script enum constant names should follow the Unicode script property value aliases,
21 which are assigned only when the scripts are encoded.
22 When we add constants early and guess the names wrong, we end up with confusing enum constants
23 and have sometimes had to add aliases.
25 Exception: Script codes like Latf and Aran that are not subject to separate encoding
26 can be added at any time.
28 Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html
30 Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
32 - Aran 161 Arabic (Nastaliq variant)
33 - Kitl 505 Khitan large script
34 - Kits 288 Khitan small script
38 Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.
40 Adlam, Marchen, and Osage are expected to go into Unicode 9;
41 we should assign Unicode script property value aliases for them
42 soon after Unicode 8 is released, and add them in ICU 56.
44 Khitan scripts will be encoded later.
46 ---------------------------------------------------------------------------- ***
48 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
50 Edit preparseucd.py to add & parse new properties.
51 They share the UCD property namespace but are not listed in PropertyAliases.txt.
53 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
54 Initial data from emoji/2.0/
56 ICU_ROOT=~/svn.icu/trunk
57 ICU_SRC_DIR=$ICU_ROOT/src
59 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
60 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
61 UNIDATA=$ICU_SRC_DIR/source/data/unidata
63 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
65 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
66 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
68 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
70 make install, then icutools cmake & make, then
71 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
73 Generate Java data as usual, but update only pnames.icu & uprops.icu.
75 ---------------------------------------------------------------------------- ***
77 Unicode 8.0 update for ICU 56
79 * Command-line environment setup
81 ICU_ROOT=~/svn.icu/trunk
82 ICU_SRC_DIR=$ICU_ROOT/src
84 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
85 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
86 UNIDATA=$ICU_SRC_DIR/source/data/unidata
88 http://www.unicode.org/review/pri297/ -- beta review
89 http://www.unicode.org/reports/uax-proposed-updates.html
90 http://unicode.org/versions/beta-8.0.0.html
91 http://www.unicode.org/versions/Unicode8.0.0/
92 http://www.unicode.org/reports/tr44/tr44-15.html
96 - ticket:11574: Unicode 8
97 - C++ branches/markus/uni80 at r37351 from trunk at r37343
98 - Java branches/markus/uni80 at r37352 from trunk at r37338
102 - cldrbug 8311: UCA 8
103 - branches/markus/uni80 at r11518 from trunk at r11517
105 - cldrbug 8109: Unicode 8.0 script metadata
106 - cldrbug 8418: Updated segmentation for Unicode 8.0
108 *** Unicode version numbers
111 - com.ibm.icu.util.VersionInfo
112 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
114 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
115 so that the makefiles see the new version number.
117 *** data files & enums & parser code
121 - download UCD & IDNA files
122 - make sure that the Unicode data folder passed into preparseucd.py
123 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
124 - only for manual diffs: remove version suffixes from the file names
125 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
126 (see https://sites.google.com/site/unicodetools/inputdata)
127 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
128 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
129 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
131 - also: from http://unicode.org/Public/security/8.0.0/ download new
132 confusables.txt & confusablesWholeScript.txt
134 ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
135 ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
137 * initial preparseucd.py changes
138 - remove new Unicode scripts from the
139 only-in-ISO-15924 list according to the error message:
140 ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
141 from _scripts_only_in_iso15924
142 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
143 and in com.ibm.icu.dev.test.lang.TestUScript.java
144 - property and file name change:
145 IndicMatraCategory -> IndicPositionalCategory
146 - UnicodeData.txt unusual numeric values (twelfths, some not in lowest terms)
147 109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
148 109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
149 109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
150 109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
151 109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
152 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
153 109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
154 109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
155 109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
156 109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
157 -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6)
158 as listed in DerivedNumericValues.txt;
159 this keeps storage in the data file simple
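
Mapping values such as 2/12 to the 1/6 form listed in DerivedNumericValues.txt amounts to reducing each fraction to lowest terms. A minimal sketch, not the actual preparseucd.py code; the function name normalize_numeric_value is hypothetical:

```python
from fractions import Fraction


def normalize_numeric_value(nv):
    """Reduce a UnicodeData.txt numeric value such as "2/12" to lowest terms.

    Hypothetical helper: integer values pass through unchanged, and
    fractions that are already reduced come back as they went in.
    """
    if "/" not in nv:
        return nv
    f = Fraction(nv)  # Fraction parses "n/d" strings and reduces them
    if f.denominator == 1:
        return str(f.numerator)
    return "%d/%d" % (f.numerator, f.denominator)


if __name__ == "__main__":
    for raw in ("1/12", "2/12", "6/12", "10/12"):
        print(raw, "->", normalize_numeric_value(raw))
```

With this approach the data file only ever sees one spelling per numeric value, which is the point of the change.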
161 * PropertyValueAliases.txt changes
162 - 10 new Block (blk) values:
163 blk; Ahom ; Ahom
164 blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
165 blk; Cherokee_Sup ; Cherokee_Supplement
166 blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
167 blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
168 blk; Hatran ; Hatran
169 blk; Multani ; Multani
170 blk; Old_Hungarian ; Old_Hungarian
171 blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
172 blk; Sutton_SignWriting ; Sutton_SignWriting
174 use long property names for enum constants
175 -> add to UCharacter.UnicodeBlock IDs
176 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
177 replace public static final int \1_ID = \2; \3
178 -> add to UCharacter.UnicodeBlock objects
179 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
180 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
181 - 6 new Script (sc) values:
182 sc ; Ahom ; Ahom
183 sc ; Hatr ; Hatran
184 sc ; Hluw ; Anatolian_Hieroglyphs
185 sc ; Hung ; Old_Hungarian
186 sc ; Mult ; Multani
187 sc ; Sgnw ; SignWriting
188 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
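
The two Eclipse find/replace steps for the block constants can also be tried outside the IDE. A sketch using Python's re module with the same patterns; the sample uchar.h-style line (value and comment) is illustrative, and the function names are hypothetical:

```python
import re

# Same patterns as the Eclipse find expressions above.
ID_PATTERN = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
OBJ_PATTERN = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def to_java_id(line):
    """Turn a C enum line into a UCharacter.UnicodeBlock ..._ID constant."""
    return ID_PATTERN.sub(r"public static final int \1_ID = \2; \3", line)


def to_java_object(line):
    """Turn a C enum line into a UCharacter.UnicodeBlock object declaration."""
    return OBJ_PATTERN.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        line)


if __name__ == "__main__":
    # Illustrative input only; check the real line in uchar.h.
    sample = "UBLOCK_OLD_HUNGARIAN = 258, /*[10C80]*/"
    print(to_java_id(sample))
    print(to_java_object(sample))
```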
190 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
191 (not strictly necessary for NOT_ENCODED scripts)
192 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
194 * generate normalization data files
196 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
197 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
198 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
199 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
200 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
202 * build ICU (make install)
203 so that the tools build can pick up the new definitions from the installed header files.
205 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
207 * build Unicode tools using CMake+make
209 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
211 # Location (--prefix) of where ICU was installed.
212 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
213 # Location of the ICU source tree.
214 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
216 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
217 ~/svn.icutools/trunk/dbg/unicode/c$ make
219 * generate core properties data files
220 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
221 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
222 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
223 - rebuild ICU (make install) & tools
224 - run genuca again (see step above) so that it picks up the new nfc.nrm
225 - rebuild ICU (make install) & tools
227 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
228 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
229 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
230 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
231 - nothing new in 8.0, no test file to update
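
The grep for "disallowed_STD3_valid" on non-ASCII characters can be scripted. A sketch, assuming the usual semicolon-separated layout of IdnaMappingTable.txt with '#' comments; the function names are hypothetical and the sample lines are illustrative, not copied from the file:

```python
def parse_code_points(field):
    """Parse "2260" or "226E..226F" into an inclusive (start, end) pair."""
    if ".." in field:
        start, end = field.split("..")
    else:
        start = end = field
    return int(start, 16), int(end, 16)


def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, end = parse_code_points(fields[0])
        if end > 0x7F:
            yield max(start, 0x80), end


if __name__ == "__main__":
    # Illustrative lines in the IdnaMappingTable.txt layout.
    sample = [
        "003D          ; disallowed_STD3_valid  # EQUALS SIGN",
        "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for start, end in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (start, end))
```

Any range printed beyond the known U+2260, U+226E, U+226F would call for a test update.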
233 * run & fix ICU4C tests
234 - bad Cherokee case folding due to difference in fallbacks:
235 UCD case folding falls back to no mapping,
236 ICU runtime case folding falls back to lowercasing;
237 fixed casepropsbuilder.cpp to generate scf mappings to self
238 when there is an slc mapping but no scf
239 - Andy handles RBBI & spoof check test failures
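
The Cherokee case folding fix can be modeled in a few lines. This is a sketch of the idea only, not the casepropsbuilder.cpp code; the dictionaries and function names are hypothetical. It uses the Unicode 8 Cherokee mappings: U+13A0 lowercases to U+AB70, while U+AB70 case-folds to U+13A0.

```python
def runtime_fold(c, slc, scf):
    """Model of a runtime that falls back to lowercasing when scf has no entry."""
    if c in scf:
        return scf[c]
    return slc.get(c, c)


def add_self_folding(slc, scf):
    """Return a copy of scf with a mapping to self for every code point
    that has an slc mapping but no scf mapping."""
    result = dict(scf)
    for c in slc:
        if c not in scf:
            result[c] = c
    return result


if __name__ == "__main__":
    slc = {0x13A0: 0xAB70}  # simple lowercase mappings
    scf = {0xAB70: 0x13A0}  # simple case foldings as listed in the UCD
    # Without the fix, the lowercasing fallback folds U+13A0 to U+AB70.
    print("before: U+%04X" % runtime_fold(0x13A0, slc, scf))
    fixed = add_self_folding(slc, scf)
    print("after:  U+%04X" % runtime_fold(0x13A0, slc, fixed))
```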
241 * collation: CLDR collation root, UCA DUCET
243 - UCA DUCET goes into Mark's Unicode tools, see
244 https://sites.google.com/site/unicodetools/home#TOC-UCA
245 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
246 - cd (CLDR UCA branch)/common/uca/
247 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
248 cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
249 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
250 cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
251 (note removing the underscore before "Rules")
252 cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
253 - restore TODO diffs in UCARules.txt
254 meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
255 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
256 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
257 from the CLDR root files (..._CLDR_..._SHORT.txt)
258 cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
259 cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
260 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
261 - if CLDR common/uca/unihan-index.txt changes, then update
262 CLDR common/collation/root.xml <collation type="private-unihan">
263 and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
264 - run genuca, see command line above;
266 Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
267 (add the character to genuca.cpp sampleCharsToScripts[])
268 + look up the script for the new sample characters
269 (e.g., in FractionalUCA.txt)
270 + *add* mappings to sampleCharsToScripts[], do not replace them
271 (in case the script sample characters flip-flop)
272 + insert new scripts in DUCET script order, see the top_byte table
273 at the beginning of FractionalUCA.txt
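
The "add, do not replace" rule for sample characters can be illustrated with a small lookup table. This is a Python stand-in for the C++ sampleCharsToScripts[] array, not the genuca.cpp code; the existing entry U+07CA and the function names are assumptions made for the example:

```python
def script_for_sample_char(table, c):
    """Look up the script for a first-primary sample character."""
    try:
        return table[c]
    except KeyError:
        raise ValueError(
            "Unknown script for first-primary sample character U+%04x" % c)


def add_sample_char(table, c, script):
    """Add a mapping and keep all older sample characters for the script,
    in case the sample character flips back in a later version."""
    table.setdefault(c, script)


if __name__ == "__main__":
    # Hypothetical table contents; U+07CA is an assumed earlier sample character.
    table = {0x07CA: "Nkoo"}
    add_sample_char(table, 0x07D8, "Nkoo")
    print(script_for_sample_char(table, 0x07D8))
    print(script_for_sample_char(table, 0x07CA))
```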
276 * run & fix ICU4C tests, now with new CLDR collation root data
277 - run all tests with the collation test data *_SHORT.txt or the full files
278 (the full ones have comments, useful for debugging)
279 - note on intltest: if collate/UCAConformanceTest fails, then
280 utility/MultithreadTest/TestCollators will fail as well;
281 fix the conformance test before looking into the multi-thread test
282 - fixed bug in CollationWeights::getWeightRanges()
283 exposed by new data and CollationTest::TestRootElements
285 * update Java data files
286 - refresh just the UCD/UCA-related/derived files, just to be safe
287 - see (ICU4C)/source/data/icu4j-readme.txt
289 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
292 Unicode .icu files built to ./out/build/icudt56l
293 echo timestamp > uni-core-data
294 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
295 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
296 echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
297 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
298 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
299 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
300 mkdir -p /tmp/icu4j/main/shared/data
301 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
302 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
303 mkdir -p /tmp/icu4j/main/shared/data
304 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
305 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
306 - copy the big-endian Unicode data files to another location,
307 separate from the other data files,
308 and then refresh ICU4J
309 cd ~/svn.icu/trunk/dbg/data/out/icu4j
310 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
311 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
312 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
313 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
314 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
315 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
316 cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
317 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
318 jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
320 * When refreshing all of ICU4J data from ICU4C
321 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
322 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
324 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
326 * update CollationFCD.java
327 + copy & paste the initializers of lcccIndex[] etc. from
328 ICU4C/source/i18n/collationfcd.cpp to
329 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
331 * refresh Java test .txt files
332 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
333 cd $ICU_SRC_DIR/source/data/unidata
334 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
335 cd ../../test/testdata
336 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
337 cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
339 * run & fix ICU4J tests
341 *** LayoutEngine script information
343 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
344 because the layout engine was deprecated in ICU 54.
345 Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
346 to write lines that we used to add manually.
348 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
349 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
350 in the working directory.
352 (It also generates ScriptRunData.cpp, which is no longer needed.)
354 It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
356 which maps ICU versions to the numbers of script/language constants
357 that were added then.
358 (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
360 The generated files have a current copyright date and "@deprecated" statement.
362 * Review changes, fix Java tool if necessary, and copy to ICU4C
363 cd ~/svn.icu4j/trunk/src
364 meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
365 cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
366 cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
369 - send notice to icu-design about new born-@stable API (enum constants etc.)
371 *** merge the Unicode update branches back onto the trunk
372 - do not merge the icudata.jar and testdata.jar,
373 instead rebuild them from merged & tested ICU4C
374 - make sure that changes to Unicode tools & ICU tools are checked in
375 http://www.unicode.org/utility/trac/log/trunk/unicodetools
376 http://bugs.icu-project.org/trac/log/tools/trunk
378 ---------------------------------------------------------------------------- ***
380 Unicode 7.0 update for ICU 54
382 http://www.unicode.org/review/pri271/ -- beta review
383 http://www.unicode.org/reports/uax-proposed-updates.html
384 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
385 http://www.unicode.org/reports/tr44/tr44-13.html
389 - ticket 10821: Unicode 7.0, UCA 7.0
390 - C++ branches/markus/uni70 at r35584 from trunk at r35580
391 - Java branches/markus/uni70 at r35587 from trunk at r35545
395 - ticket 7195: UCA 7.0 CLDR root collation
396 - branches/markus/uni70 at r10062 from trunk at r10061
398 - ticket 6762: script metadata for Unicode 7.0 new scripts
400 *** Unicode version numbers
403 - com.ibm.icu.util.VersionInfo
404 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
406 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
407 so that the makefiles see the new version number.
409 *** data files & enums & parser code
413 - download UCD & IDNA files
414 - make sure that the Unicode data folder passed into preparseucd.py
415 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
416 - only for manual diffs: remove version suffixes from the file names
417 ~/unidata/uni70/20140403$ ../../desuffixucd.py .
418 (see https://sites.google.com/site/unicodetools/inputdata)
419 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
420 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
421 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
422 - Restore TODO diffs in source/data/unidata/UCARules.txt
424 meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
425 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
427 - also: from http://unicode.org/Public/security/7.0.0/ download new
428 confusables.txt & confusablesWholeScript.txt
429 and copy to $ICU_ROOT/src/source/data/unidata/
431 * initial preparseucd.py changes
432 - remove new Unicode scripts from the
433 only-in-ISO-15924 list according to the error message:
434 ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
435 'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
436 'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
437 from _scripts_only_in_iso15924
438 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
439 and in com.ibm.icu.dev.test.lang.TestUScript.java
440 - NamesList.txt now has a heading with a non-ASCII character
441 + keep ppucd.txt in platform charset, rather than changing tool/test parsers
442 + escape non-ASCII characters in heading comments
443 - gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
444 + get the copyright from the first file whose copyright line contains the current year
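
The ValueError about _scripts_only_in_iso15924 suggests a consistency check of roughly this shape. A sketch under that assumption, not the preparseucd.py source; the function name and parameters are hypothetical:

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if a code on the ISO-only list is now a Unicode script.

    only_in_iso: script codes believed to exist only in ISO 15924
    unicode_scripts: script codes found in the Unicode data being parsed
    """
    stale = sorted(code for code in only_in_iso if code in unicode_scripts)
    if stale:
        raise ValueError(
            "remove {} from _scripts_only_in_iso15924".format(stale))


if __name__ == "__main__":
    try:
        check_scripts_only_in_iso15924(
            ["Hmng", "Kitl", "Lina"], {"Hmng", "Latn", "Lina"})
    except ValueError as e:
        print("ValueError:", e)
```

A check like this fails loudly on the first run against new data, which is what prompts the manual list update described above.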
446 * PropertyValueAliases.txt changes
447 - 32 new Block (blk) values:
448 blk; Bassa_Vah ; Bassa_Vah
449 blk; Caucasian_Albanian ; Caucasian_Albanian
450 blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
451 blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
452 blk; Duployan ; Duployan
453 blk; Elbasan ; Elbasan
454 blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
455 blk; Grantha ; Grantha
456 blk; Khojki ; Khojki
457 blk; Khudawadi ; Khudawadi
458 blk; Latin_Ext_E ; Latin_Extended_E
459 blk; Linear_A ; Linear_A
460 blk; Mahajani ; Mahajani
461 blk; Manichaean ; Manichaean
462 blk; Mende_Kikakui ; Mende_Kikakui
463 blk; Modi ; Modi
464 blk; Mro ; Mro
465 blk; Myanmar_Ext_B ; Myanmar_Extended_B
466 blk; Nabataean ; Nabataean
467 blk; Old_North_Arabian ; Old_North_Arabian
468 blk; Old_Permic ; Old_Permic
469 blk; Ornamental_Dingbats ; Ornamental_Dingbats
470 blk; Pahawh_Hmong ; Pahawh_Hmong
471 blk; Palmyrene ; Palmyrene
472 blk; Pau_Cin_Hau ; Pau_Cin_Hau
473 blk; Psalter_Pahlavi ; Psalter_Pahlavi
474 blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
475 blk; Siddham ; Siddham
476 blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
477 blk; Sup_Arrows_C ; Supplemental_Arrows_C
478 blk; Tirhuta ; Tirhuta
479 blk; Warang_Citi ; Warang_Citi
481 use long property names for enum constants
482 -> add to UCharacter.UnicodeBlock IDs
483 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
484 replace public static final int \1_ID = \2; \3
485 -> add to UCharacter.UnicodeBlock objects
486 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
487 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
488 - 28 new Joining_Group (jg) values:
489 jg ; Manichaean_Aleph ; Manichaean_Aleph
490 jg ; Manichaean_Ayin ; Manichaean_Ayin
491 jg ; Manichaean_Beth ; Manichaean_Beth
492 jg ; Manichaean_Daleth ; Manichaean_Daleth
493 jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
494 jg ; Manichaean_Five ; Manichaean_Five
495 jg ; Manichaean_Gimel ; Manichaean_Gimel
496 jg ; Manichaean_Heth ; Manichaean_Heth
497 jg ; Manichaean_Hundred ; Manichaean_Hundred
498 jg ; Manichaean_Kaph ; Manichaean_Kaph
499 jg ; Manichaean_Lamedh ; Manichaean_Lamedh
500 jg ; Manichaean_Mem ; Manichaean_Mem
501 jg ; Manichaean_Nun ; Manichaean_Nun
502 jg ; Manichaean_One ; Manichaean_One
503 jg ; Manichaean_Pe ; Manichaean_Pe
504 jg ; Manichaean_Qoph ; Manichaean_Qoph
505 jg ; Manichaean_Resh ; Manichaean_Resh
506 jg ; Manichaean_Sadhe ; Manichaean_Sadhe
507 jg ; Manichaean_Samekh ; Manichaean_Samekh
508 jg ; Manichaean_Taw ; Manichaean_Taw
509 jg ; Manichaean_Ten ; Manichaean_Ten
510 jg ; Manichaean_Teth ; Manichaean_Teth
511 jg ; Manichaean_Thamedh ; Manichaean_Thamedh
512 jg ; Manichaean_Twenty ; Manichaean_Twenty
513 jg ; Manichaean_Waw ; Manichaean_Waw
514 jg ; Manichaean_Yodh ; Manichaean_Yodh
515 jg ; Manichaean_Zayin ; Manichaean_Zayin
516 jg ; Straight_Waw ; Straight_Waw
517 -> uchar.h & UCharacter.JoiningGroup
518 - 23 new Script (sc) values:
519 sc ; Aghb ; Caucasian_Albanian
520 sc ; Bass ; Bassa_Vah
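521 sc ; Dupl ; Duployan
522 sc ; Elba ; Elbasan
523 sc ; Gran ; Grantha
524 sc ; Hmng ; Pahawh_Hmong
525 sc ; Khoj ; Khojki
526 sc ; Lina ; Linear_A
527 sc ; Mahj ; Mahajani
528 sc ; Mani ; Manichaean
529 sc ; Mend ; Mende_Kikakui
530 sc ; Modi ; Modi
531 sc ; Mroo ; Mro
532 sc ; Narb ; Old_North_Arabian
533 sc ; Nbat ; Nabataean
534 sc ; Palm ; Palmyrene
535 sc ; Pauc ; Pau_Cin_Hau
536 sc ; Perm ; Old_Permic
537 sc ; Phlp ; Psalter_Pahlavi
538 sc ; Sidd ; Siddham
539 sc ; Sind ; Khudawadi
540 sc ; Tirh ; Tirhuta
541 sc ; Wara ; Warang_Citi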
542 -> uscript.h (many were added before)
543 comment "Mende Kikakui" for USCRIPT_MENDE
544 add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
545 -> com.ibm.icu.lang.UScript
546 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
547 replace public static final int \1 = \2; \3
548 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
557 -> uscript.h (some overlap with additions from Unicode)
558 -> com.ibm.icu.lang.UScript
559 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
560 replace public static final int \1 = \2; \3
561 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
562 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
563 and in com.ibm.icu.dev.test.lang.TestUScript.java
565 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
566 (not strictly necessary for NOT_ENCODED scripts)
567 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
569 * generate normalization data files
571 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
572 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
573 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
574 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
575 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
576 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
577 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
578 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
580 * build ICU (make install)
581 so that the tools build can pick up the new definitions from the installed header files.
583 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
585 * build Unicode tools using CMake+make
587 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
589 # Location (--prefix) of where ICU was installed.
590 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
591 # Location of the ICU source tree.
592 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
594 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
595 ~/svn.icutools/trunk/dbg/unicode/c$ make
598 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
599 + add second array of Joining_Group values for at most 10800..10FFF
600 icutools: unicode/c/genprops/bidipropsbuilder.cpp
601 icu: source/common/ubidi_props.h/.c/_data.h
602 icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
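
The second Joining_Group array amounts to a lookup over two separate code point ranges. A sketch of that idea, not the ubidi_props code; the class name, the start of the first range, and all array values are illustrative assumptions (only 10AC0 for the Manichaean range comes from the note above):

```python
NO_JOINING_GROUP = 0


class JoiningGroupTable:
    """Joining_Group lookup over two dense arrays, one per code point range."""

    def __init__(self, start, values, start2, values2):
        self.start, self.values = start, values
        self.start2, self.values2 = start2, values2

    def get(self, c):
        i = c - self.start
        if 0 <= i < len(self.values):
            return self.values[i]
        i = c - self.start2
        if 0 <= i < len(self.values2):
            return self.values2[i]
        return NO_JOINING_GROUP  # outside both ranges


if __name__ == "__main__":
    # Placeholder values; real arrays come from the generated data.
    table = JoiningGroupTable(0x620, [1, 2, 3], 0x10AC0, [7, 8])
    print(table.get(0x621), table.get(0x10AC1), table.get(0x41))
```

Two short arrays avoid one long, mostly empty array spanning the gap between the BMP range and the supplementary range.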
604 * generate core properties data files
605 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
606 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
607 - rebuild ICU (make install) & tools
608 - run genuca again (see step above) so that it picks up the new nfc.nrm
609 - rebuild ICU (make install) & tools
611 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
612 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
613 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
614 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
615 - nothing new in 7.0, no test file to update
617 * run & fix ICU4C tests
619 * update Java data files
620 - refresh just the UCD-related files, just to be safe
621 - see (ICU4C)/source/data/icu4j-readme.txt
623 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
626 Unicode .icu files built to ./out/build/icudt53l
627 echo timestamp > uni-core-data
628 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
629 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
630 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
631 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
632 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
633 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
634 mkdir -p /tmp/icu4j/main/shared/data
635 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
636 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
637 mkdir -p /tmp/icu4j/main/shared/data
638 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
639 make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
640 - copy the big-endian Unicode data files to another location,
641 separate from the other data files
643 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
644 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
645 cd ~/svn.icu/uni70/dbg/data/out/icu4j
646 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
647 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
648 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
649 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
650 cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
651 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
653 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
655 * update CollationFCD.java
656 + copy & paste the initializers of lcccIndex[] etc. from
657 ICU4C/source/i18n/collationfcd.cpp to
658 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
660 * refresh Java test .txt files
661 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
662 cd $ICU_SRC_DIR/source/data/unidata
663 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
664 cd ../../test/testdata
665 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
666 cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
670 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
671 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
672 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
673 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
674 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
675 - review data; compare files, use blankweights.sed or similar
676 ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
677 - cd ~/svn.unitools/Generated/uca/7.0.0/
678 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
679 cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
680 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
681 (note removing the underscore before "Rules")
682 cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
683 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
684 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
685 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
686 cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
687 cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
688 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
689 - run genuca, see command line above
691 - refresh ICU4J collation data:
692 (subset of instructions above for properties data refresh, except copies all coll/*)
694 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
695 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
696 ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
697 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
698 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
699 - note on intltest: if collate/UCAConformanceTest fails, then
700 utility/MultithreadTest/TestCollators will fail as well;
701 fix the conformance test before looking into the multi-thread test
702 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
703 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
704 ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
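The renaming copies in the UCA steps above can be summarized as a small table. This is only a sketch in Python: the function name uca_copy_plan and the idea of returning (source, destination) pairs are assumptions made for illustration; the file names and relative paths are the ones used in this log.

```python
from pathlib import PurePosixPath

# Tool output name (under CollationAuxiliary/) -> path below <ICU source>/source.
# The data files drop "_SHORT" (and the underscore before "Rules");
# the test files drop "_CLDR" but keep "_SHORT".
RENAMES = {
    "FractionalUCA_SHORT.txt": "data/unidata/FractionalUCA.txt",
    "UCA_Rules_SHORT.txt": "data/unidata/UCARules.txt",
    "CollationTest_CLDR_NON_IGNORABLE_SHORT.txt":
        "test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt",
    "CollationTest_CLDR_SHIFTED_SHORT.txt":
        "test/testdata/CollationTest_SHIFTED_SHORT.txt",
}


def uca_copy_plan(generated_dir, icu_src_dir):
    """Return (source, destination) path pairs; nothing is copied here."""
    aux = PurePosixPath(generated_dir) / "CollationAuxiliary"
    dst_root = PurePosixPath(icu_src_dir) / "source"
    return [(str(aux / name), str(dst_root / rel))
            for name, rel in RENAMES.items()]
```

Each pair could then be passed to shutil.copyfile(); keeping the plan separate from the copy makes it easy to review the renames first.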
706 * When refreshing all of ICU4J data from ICU4C
707 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
708 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
710 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
712 * run & fix ICU4J tests
714 *** LayoutEngine script information
716 (For details see the Unicode 5.2 change log below.)
718 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
719 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
720 in the working directory.
721 (It also generates ScriptRunData.cpp, which is no longer needed.)
723 The generated files have a current copyright date and "@stable" statement.
724 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
725 for "born stable" Unicode API constants, and to stop parsing ICU version numbers
726 which may not contain dots any more.
728 - diff current <icu>/source/layout files vs. generated ones
729 ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
730 review and manually merge desired changes;
731 fix gratuitous changes, incorrect @draft/@stable and missing aliases;
732 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
733 - if you just copy the above files, then
734 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
735 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
738 - send notice to icu-design about new born-@stable API (enum constants etc.)
740 *** merge the Unicode update branches back onto the trunk
741 - do not merge the icudata.jar and testdata.jar,
742 instead rebuild them from merged & tested ICU4C
744 ---------------------------------------------------------------------------- ***
748 http://www.unicode.org/review/pri249/ -- beta review
749 http://www.unicode.org/reports/uax-proposed-updates.html
750 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
751 http://www.unicode.org/reports/tr44/tr44-11.html
755 - ticket 10128: update ICU to Unicode 6.3 beta
756 - ticket 10168: update ICU to Unicode 6.3 final
757 - C++ branches/markus/uni63 at r33552 from trunk at r33551
758 - Java branches/markus/uni63 at r33550 from trunk at r33553
760 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
762 *** Unicode version numbers
765 (configure.in & configure have been modified to extract the version from uchar.h)
766 - com.ibm.icu.util.VersionInfo
767 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
769 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
770 so that the makefiles see the new version number.
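A quick way to check which version the header currently declares, before re-running configure. This is a sketch only: it assumes uchar.h defines the version as a quoted string macro named U_UNICODE_VERSION, and it does not reproduce how configure itself extracts the value. The function name is made up for this example.

```python
import re

# Assumed form of the definition in uchar.h: #define U_UNICODE_VERSION "6.3"
VERSION_DEFINE = re.compile(
    r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"', re.MULTILINE)


def unicode_version_from_header(header_text):
    """Return the version string found in the header text."""
    match = VERSION_DEFINE.search(header_text)
    if match is None:
        raise ValueError("U_UNICODE_VERSION not found")
    return match.group(1)
```

Typical use would be unicode_version_from_header(open(path_to_uchar_h).read()), with the path supplied by the caller.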
772 *** data files & enums & parser code
776 - download UCD, UCA & IDNA files
777 - make sure that the Unicode data folder passed into preparseucd.py
778 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
779 - modify preparseucd.py:
780 parse new file BidiBrackets.txt
781 with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
782 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
783 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
784 - Check test file diffs for previously commented-out, known-failing data lines;
785 probably need to keep those commented out.
787 * PropertyAliases.txt changes
788 - 1 new Enumerated Property
789 bpt ; Bidi_Paired_Bracket_Type
790 -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
791 -> ubidi_props.h & .c & UBiDiProps.java
792 -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
794 -> change ubidi.icu format version from 2.0 to 2.1
795 - 1 new Miscellaneous Property
796 bpb ; Bidi_Paired_Bracket
797 -> uchar.h & UProperty.java
800 * PropertyValueAliases.txt changes
801 - 3 Bidi_Paired_Bracket_Type (bpt) values:
805 -> uchar.h & UCharacter.BidiPairedBracketType
806 -> ubidi_props.h & .c & UBiDiProps.java
807 -> change ubidi.icu format version from 2.0 to 2.1
808 - 4 new Bidi_Class (bc) values:
809 bc ; FSI ; First_Strong_Isolate
810 bc ; LRI ; Left_To_Right_Isolate
811 bc ; RLI ; Right_To_Left_Isolate
812 bc ; PDI ; Pop_Directional_Isolate
813 -> uchar.h & UCharacterEnums.ECharacterDirection
814 -> until the bidi code gets updated,
815 Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
816 - 3 new Word_Break (WB) values:
817 WB ; HL ; Hebrew_Letter
818 WB ; SQ ; Single_Quote
819 WB ; DQ ; Double_Quote
820 -> uchar.h & UCharacter.WordBreak
821 -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
822 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
824 Aghb 239 Caucasian Albanian
827 -> com.ibm.icu.lang.UScript
828 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
829 replace public static final int \1 = \2;\3
830 -> preparseucd.py _scripts_only_in_iso15924
831 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
832 and in com.ibm.icu.dev.test.lang.TestUScript.java
833 -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
834 (not strictly necessary for NOT_ENCODED scripts)
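The find/replace pair for UScript constants above also works outside an editor. A minimal sketch with Python's re module, using the same pattern and replacement; the sample enum line in the usage note is hypothetical, not a real script code.

```python
import re

# Same pattern and replacement as in the log entry for com.ibm.icu.lang.UScript.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2;\3"


def c_enum_to_java(line):
    """Turn one C enum line into a Java constant; other lines pass through."""
    return FIND.sub(REPLACE, line)
```

For example, c_enum_to_java("USCRIPT_EXAMPLE_SCRIPT = 999,/* Xmpl */") gives "public static final int EXAMPLE_SCRIPT = 999;/* Xmpl */". Padding spaces before the "=" are consumed by the pattern; leading indentation is kept.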
836 * generate normalization data files
837 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
838 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
839 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
840 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
841 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
842 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
843 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
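The four gennorm2 calls above differ only in the output file and the list of input files. A sketch that builds the same argument lists from a table; the function name is an assumption for this example, and the commands are only constructed, not run.

```python
# Output .nrm file -> input .txt files, in the order used in this log.
NORM_FILES = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Return one argv list per .nrm file, relative to the build directory."""
    return [
        ["bin/gennorm2", "-o", f"{src_data_in}/{out}",
         "-s", f"{unidata}/norm2", *inputs]
        for out, inputs in NORM_FILES
    ]
```

Each list could be passed to subprocess.run(cmd, check=True) from the build directory, with LD_LIBRARY_PATH set as in the first step above.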
845 * build ICU (make install)
846 so that the tools build can pick up the new definitions from the installed header files.
848 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
850 * build Unicode tools using CMake+make
852 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
854 # Location (--prefix) of where ICU was installed.
855 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
856 # Location of the ICU source tree.
857 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
859 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
860 ~/svn.icutools/trunk/dbg/unicode/c$ make
862 * generate core properties data files
863 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
864 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
865 - rebuild ICU (make install) & tools
866 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
867 - rebuild ICU (make install) & tools
869 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
870 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
871 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
872 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
873 - nothing new in 6.3, no test file to update
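The grep above can be narrowed to non-ASCII code points with a few lines of Python. This sketch assumes the usual UCD-style layout (semicolon-separated fields, "#" comments, optional start..end ranges); the SAMPLE lines are written for this example in that style and are not copied from a real IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Return sorted code points above U+007F with status disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return sorted(found)


SAMPLE = """\
# Sample lines in the style of IdnaMappingTable.txt (illustration only).
003D          ; disallowed_STD3_valid   # EQUALS SIGN
00C0          ; mapped ; 00E0           # LATIN CAPITAL LETTER A WITH GRAVE
2260          ; disallowed_STD3_valid   # NOT EQUAL TO
226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN
""".splitlines()
```

On SAMPLE this returns the three code points listed above (U+2260, U+226E, U+226F); anything else in the result for a new Unicode version would call for a test update.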
875 * update Java data files
876 - refresh just the UCD-related files, just to be safe
877 - see (ICU4C)/source/data/icu4j-readme.txt
879 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
882 Unicode .icu files built to ./out/build/icudt52l
883 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
884 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
885 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
886 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
887 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
888 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
889 mkdir -p /tmp/icu4j/main/shared/data
890 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
891 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
892 mkdir -p /tmp/icu4j/main/shared/data
893 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
894 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
895 - copy the big-endian Unicode data files to another location,
896 separate from the other data files
897 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
898 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
899 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
900 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
901 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
902 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
903 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
905 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
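The selective copy above (all top-level .icu files except cnvalias.icu, all .nrm files, coll/*.icu and brkitr/*) can be sketched as one function. Assumptions: the function name is made up, both arguments are pathlib.Path objects, and cnvalias.icu is skipped during the copy rather than copied and then removed as in the log.

```python
import shutil
from pathlib import PurePosixPath

# (subfolder, glob pattern) pairs mirroring the cp commands in the log.
PATTERNS = [("", "*.icu"), ("", "*.nrm"), ("coll", "*.icu"), ("brkitr", "*")]


def copy_unicode_data(src, dst):
    """Copy the Unicode-related data files from src to dst.

    Returns the relative names of the copied files, in copy order.
    """
    copied = []
    for subdir, pattern in PATTERNS:
        (dst / subdir).mkdir(parents=True, exist_ok=True)
        for path in sorted((src / subdir).glob(pattern)):
            if not path.is_file():
                continue
            if subdir == "" and path.name == "cnvalias.icu":
                continue  # not Unicode data; the log removes it after copying
            shutil.copy2(path, dst / subdir / path.name)
            copied.append((PurePosixPath(subdir) / path.name).as_posix())
    return copied
```

Here src would be the icudt52b folder under data/out/icu4j and dst the matching folder under /tmp/icu4j. The final "jar uf" step is not part of this sketch.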
907 * refresh Java test .txt files
908 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
910 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
912 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
913 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
914 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
915 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
916 (note removing the underscore before "Rules")
917 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
918 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
919 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
920 - check test file diffs for previously commented-out, known-failing data lines;
921 probably need to keep those commented out
922 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
923 - run genuca, see command line above
925 - refresh ICU4J collation data:
926 (subset of instructions above for properties data refresh, except copies all coll/*)
927 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
928 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
929 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
930 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
931 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
932 - note on intltest: if collate/UCAConformanceTest fails, then
933 utility/MultithreadTest/TestCollators will fail as well;
934 fix the conformance test before looking into the multi-thread test
936 * test ICU, fix test code where necessary
938 * When refreshing all of ICU4J data from ICU4C
939 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
940 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
942 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
944 *** LayoutEngine script information
945 - skipped for Unicode 6.3: no new scripts
947 *** merge the Unicode update branches back onto the trunk
948 - do not merge the icudata.jar and testdata.jar,
949 instead rebuild them from merged & tested ICU4C
951 ---------------------------------------------------------------------------- ***
955 http://www.unicode.org/review/pri230/
956 http://www.unicode.org/versions/beta-6.2.0.html
957 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
958 http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
959 http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
960 http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
961 http://www.unicode.org/reports/tr46/tr46-8.html IDNA
962 http://unicode.org/Public/idna/6.2.0/
966 - ticket 9515: Unicode 6.2: final ICU update
968 - ticket 9514: UCA 6.2: fix UCARules.txt
970 - ticket 9437: update ICU to Unicode 6.2
971 - C++ branches/markus/uni62 at r32050 from trunk at r32041
972 - Java branches/markus/uni62 at r32068 from trunk at r32066
974 *** Unicode version numbers
977 (configure.in & configure have been modified to extract the version from uchar.h)
978 - com.ibm.icu.util.VersionInfo
979 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
981 *** data files & enums & parser code
985 - download UCD, UCA & IDNA files
986 - make sure that the Unicode data folder passed into preparseucd.py
987 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
988 - modify preparseucd.py: NamesList.txt is now in UTF-8
989 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
990 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
991 - Check test file diffs for previously commented-out, known-failing data lines;
992 probably need to keep those commented out.
994 * PropertyValueAliases.txt changes
995 - 1 new Line_Break (lb) value:
996 lb ; RI ; Regional_Indicator
997 -> uchar.h & UCharacter.LineBreak
998 - 1 new Word_Break (WB) value:
999 WB ; RI ; Regional_Indicator
1000 -> uchar.h & UCharacter.WordBreak
1001 - 1 new Grapheme_Cluster_Break (GCB) value:
1002 GCB; RI ; Regional_Indicator
1003 -> uchar.h & UCharacter.GraphemeClusterBreak
1005 * 3 new numeric values
1006 The new value -1 was really supposed to be NaN, but that would have required
1007 new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
1008 but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
1009 cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
1010 cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
1011 The two new values 216000 and 432000 require an addition to the encoding of numeric values.
1012 cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
1013 cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
1014 -> uprops.h, uchar.c & UCharacterProperty.java
1015 -> cucdtst.c & UCharacterTest.java
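Some arithmetic behind the three values above. -1 is simply the fraction -1/1, and both large values are small multiples of a power of 60. The sketch below is an illustration of that decomposition only; the helper name is an assumption, and the actual bit layout used in uprops.h is not described in this log and not reproduced here.

```python
from fractions import Fraction

# -1 written as a numerator/denominator pair.
NEG_ONE = Fraction(-1, 1)


def as_base60(value):
    """Return (mantissa, exponent) with value == mantissa * 60**exponent.

    The exponent is the largest one that divides the value evenly.
    """
    if value <= 0:
        raise ValueError("expected a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent
```

as_base60(216000) gives (1, 3) and as_base60(432000) gives (2, 3), that is 60^3 and 2 * 60^3.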
1017 * generate normalization data files
1018 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
1019 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
1020 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
1021 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1022 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1023 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1024 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1026 * build ICU (make install)
1027 so that the tools build can pick up the new definitions from the installed header files.
1028 * build Unicode tools using CMake+make
1030 * generate core properties data files
1031 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
1032 - in initial bootstrapping, change the UCA version
1033 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1034 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
1035 - rebuild ICU (make install) & tools
1036 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1037 check if the UCA version in FractionalUCA.txt matches the new Unicode version
1039 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1040 - rebuild ICU (make install) & tools
1042 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1043 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1044 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1045 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
1046 - nothing new in 6.2, no test file to update
1048 * update Java data files
1049 - refresh just the UCD-related files, just to be safe
1050 - see (ICU4C)/source/data/icu4j-readme.txt
1052 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1055 Unicode .icu files built to ./out/build/icudt50l
1056 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1057 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
1058 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1059 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1060 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
1061 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
1062 mkdir -p /tmp/icu4j/main/shared/data
1063 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1064 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
1065 mkdir -p /tmp/icu4j/main/shared/data
1066 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1067 make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
1068 - copy the big-endian Unicode data files to another location,
1069 separate from the other data files
1070 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1071 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1072 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1073 ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
1074 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1075 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1076 ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1078 ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1080 * refresh Java test .txt files
1081 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1085 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1086 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1087 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1088 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1089 (note removing the underscore before "Rules")
1090 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1091 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1092 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1093 - check test file diffs for previously commented-out, known-failing data lines;
1094 probably need to keep those commented out
1095 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1096 - run genuca, see command line above
1098 - refresh ICU4J collation data:
1099 (subset of instructions above for properties data refresh, except copies all coll/*)
1100 ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1101 ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1102 ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1103 ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1104 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1105 - note on intltest: if collate/UCAConformanceTest fails, then
1106 utility/MultithreadTest/TestCollators will fail as well;
1107 fix the conformance test before looking into the multi-thread test
1109 * test ICU, fix test code where necessary
1111 * When refreshing all of ICU4J data from ICU4C
1112 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1113 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1115 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1117 *** LayoutEngine script information
1118 - skipped for Unicode 6.2: no new scripts
1120 *** merge the Unicode update branches back onto the trunk
1121 - do not merge the icudata.jar and testdata.jar,
1122 instead rebuild them from merged & tested ICU4C
1124 ---------------------------------------------------------------------------- ***
1126 Future Unicode update
1128 Tools simplified since the Unicode 6.1 update. See
1129 - http://site.icu-project.org/design/props/ppucd
1130 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
1132 * Unicode version numbers
1133 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
1136 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
1137 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
1138 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1139 - Check test file diffs for previously commented-out, known-failing data lines;
1140 probably need to keep those commented out.
1142 * PropertyValueAliases.txt changes
1143 - Script codes that are in ISO 15924 but not in Unicode are now listed in
1144 preparseucd.py, in the _scripts_only_in_iso15924 variable.
1145 If there are new ISO codes, then add them.
1146 If Unicode adds some of them, then remove them from the .py variable.
1148 * UnicodeData.txt changes
1149 - No more manual changes for CJK ranges for algorithmic names;
1150 those are now written to ppucd.txt and genprops reads them from there.
1152 * generate core properties data files (makeprops.sh was deleted)
1153 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
1155 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
1156 - it is now generated by preparseucd.py
1158 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
1159 - it is now generated by preparseucd.py
1160 - make sure that the Unicode data folder passed into preparseucd.py
1161 includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1162 (can be in some subfolder)
1164 * generate normalization data files
1165 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
1166 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
1167 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
1168 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
1169 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
1170 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1171 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
1173 * build ICU (make install)
1174 * build Unicode tools using CMake+make
1176 * new way to call genuca (makeuca.sh was deleted)
1177 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
1179 ---------------------------------------------------------------------------- ***
1185 - ticket 8995 final update to Unicode 6.1
1186 - ticket 8994 regenerate source/layout/CanonData.cpp
1188 - ticket 8961 support Unicode "Age" value *names*
1189 - ticket 8963 support multiple character name aliases & types
1191 - ticket 8827 "update ICU to Unicode 6.1"
1192 - C++ branches/markus/uni61 at r30864 from trunk at r30843
1193 - Java branches/markus/uni61 at r30865 from trunk at r30863
1195 *** Unicode version numbers
1198 (configure.in & configure have been modified to extract the version from uchar.h)
1199 - com.ibm.icu.util.VersionInfo
1200 - icutools/unicode/makedefs.sh
1201 + also review & update other definitions in that file,
1202 e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
1204 *** data files & enums & parser code
1208 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
1209 - This prepares both unidata and testdata files in respective output subfolders.
1210 - Check test file diffs for previously commented-out, known-failing data lines;
1211 probably need to keep those commented out.
1213 * PropertyValueAliases.txt changes
1214 - 11 new block names:
1216 Arabic_Mathematical_Alphabetic_Symbols
1218 Meetei_Mayek_Extensions
1220 Meroitic_Hieroglyphs
1224 Sundanese_Supplement
1227 -> add to UCharacter.UnicodeBlock IDs
1228 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1229 replace public static final int \1_ID = \2; \3
1230 -> add to UCharacter.UnicodeBlock objects
1231 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1232 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1233 - 1 new Joining_Group (jg) value:
1235 -> uchar.h & UCharacter.JoiningGroup
1236 - 2 new Line_Break (lb) values:
1237 CJ=Conditional_Japanese_Starter
1239 -> uchar.h & UCharacter.LineBreak
1242 sc ; Merc ; Meroitic_Cursive
1243 sc ; Mero ; Meroitic_Hieroglyphs
1246 sc ; Sora ; Sora_Sompeng
1248 -> remove these from SyntheticPropertyValueAliases.txt
1249 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1250 and in com.ibm.icu.dev.test.lang.TestUScript.java
1251 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1255 and another one added 2011-12-09
1256 Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
1258 -> com.ibm.icu.lang.UScript
1259 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1260 replace public static final int \1 = \2;\3
1261 -> SyntheticPropertyValueAliases.txt
1262 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1263 and in com.ibm.icu.dev.test.lang.TestUScript.java
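The two Eclipse find/replace pairs for the new block names above can be applied in one pass per line with Python's re module. A sketch using the same patterns; the function name is an assumption, and the sample line in the usage note is hypothetical rather than taken from uchar.h.

```python
import re

# First pair: integer IDs. Second pair: UnicodeBlock objects.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = (r'public static final UnicodeBlock \1 = '
               r'new UnicodeBlock("\1", \1_ID); \2')


def block_constants(c_line):
    """Return (ID declaration, object declaration) for one C enum line."""
    return ID_FIND.sub(ID_REPLACE, c_line), OBJ_FIND.sub(OBJ_REPLACE, c_line)
```

For the line "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/" this yields "public static final int EXAMPLE_BLOCK_ID = 999; /*[12345]*/" and 'public static final UnicodeBlock EXAMPLE_BLOCK = new UnicodeBlock("EXAMPLE_BLOCK", EXAMPLE_BLOCK_ID); /*[12345]*/'.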
1265 * UnicodeData.txt changes
1266 - the last Unihan code point changes from U+9FCB to U+9FCC
1267 search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
1268 + do change gennames.c
1269 + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
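The case-insensitive search for 9FC[BC] mentioned above, as a small sketch. The old limit is the old end plus one, which is why both values need to be found. The function name and the sample lines in the usage note are hypothetical.

```python
import re

OLD_END = 0x9FCB           # last Unihan code point before this update
OLD_LIMIT = OLD_END + 1    # 0x9FCC, exclusive upper bound in older code

UNIHAN_END = re.compile(r"9FC[BC]", re.IGNORECASE)


def lines_mentioning_unihan_end(lines):
    """Return (line number, text) for each line containing 9FCB or 9FCC."""
    return [(number, line)
            for number, line in enumerate(lines, start=1)
            if UNIHAN_END.search(line)]
```

Given the lines "#define CJK_END 0x9fcb", "if (cp < 0x9FCC) {" and "int x = 0x9FCA;", only the first two are reported.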
1271 * DerivedBidiClass.txt changes
1272 - 2 new default-AL blocks:
1273 # Arabic Extended-A: U+08A0 - U+08FF (was default-R)
1274 # Arabic Mathematical Alphabetic Symbols:
1275 # U+1EE00 - U+1EEFF (was default-R)
1276 - 2 new default-R blocks:
1277 # Meroitic Hieroglyphs:
1279 # Meroitic Cursive: U+109A0 - U+109FF
1280 -> should be picked up by the explicit data in the file
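A sketch of a default Bidi_Class lookup limited to the ranges spelled out above. It is an illustration only: the table and function names are assumptions, the Meroitic Hieroglyphs range is left out because it is not given in this entry, and all other defaults from DerivedBidiClass.txt are not modeled (the function returns None for them).

```python
# (first, last, default Bidi_Class) for the ranges listed in this entry.
DEFAULT_BIDI_RANGES = [
    (0x08A0, 0x08FF, "AL"),    # Arabic Extended-A
    (0x1EE00, 0x1EEFF, "AL"),  # Arabic Mathematical Alphabetic Symbols
    (0x109A0, 0x109FF, "R"),   # Meroitic Cursive
]


def default_bidi_class(code_point):
    """Return the default class for the listed ranges, else None."""
    for first, last, bidi_class in DEFAULT_BIDI_RANGES:
        if first <= code_point <= last:
            return bidi_class
    return None
```

As noted above, ICU does not need such a table as long as the data file lists these ranges explicitly; the sketch only shows what the header comments mean.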
1282 * NameAliases.txt changes
1284 # Each line has two fields
1285 # First field: Code point
1286 # Second field: Alias
1288 # Each line has three fields, as described here:
1290 # First field: Code point
1291 # Second field: Alias
1293 - Also, the file format previously allowed multiple aliases per code point, but only now
1294 does the file actually provide them, even several of the same type. For example,
1295 FEFF;BYTE ORDER MARK;alternate
1296 FEFF;BOM;abbreviation
1297 FEFF;ZWNBSP;abbreviation
1298 - This breaks our gennames parser, unames.icu data structure, and API.
1299 Fix gennames to only pick up "correction" aliases.
1300 New ticket #8963 for further changes.
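Picking up only "correction" aliases from the three-field format could look like the following. This is a Python sketch of the filtering idea, not the gennames code; the function name is made up, and the 01A2 line in SAMPLE is a sample added for illustration next to the FEFF lines quoted above.

```python
def correction_aliases(lines):
    """Map code point -> alias for lines whose third field is 'correction'.

    If a code point had several correction aliases, the last one would win.
    """
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 3:
            continue  # old two-field lines carry no type
        code_point, alias, alias_type = fields
        if alias_type == "correction":
            result[int(code_point, 16)] = alias
    return result


SAMPLE = [
    "# sample lines for illustration",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
```

On SAMPLE the result has a single entry for U+01A2; the FEFF aliases are ignored, which keeps one alias per code point as the old data structure expects.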
1302 * run genpname/preparse.pl (on Linux)
1303 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1304 + make sure that data.h is writable
1305 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1306 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1308 * build ICU (make install)
1309 so that the tools build can pick up the new definitions from the installed header files.
1310 * build Unicode tools (at least genpname) using CMake+make
1313 (builds both pnames.icu and propname_data.h)
1314 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1315 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1317 * build ICU (make install)
1318 * build Unicode tools using CMake+make
1320 * update source/data/unidata/norm2/nfkc_cf.txt
1321 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1323 * update source/data/unidata/norm2/uts46.txt
1324 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1325 to ~/svn.icu/tools/trunk/src/unicode/py
1326 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1327 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1328 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1330 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1331 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1332 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1333 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
1334 - nothing new in 6.1, no test file to update
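The grep for "disallowed_STD3_valid" on non-ASCII characters can also be done with a small script. A sketch, assuming the usual semicolon-delimited layout of IdnaMappingTable.txt (code point or range; status; optional mapping; "#" comment); the function name and the sample lines are illustrative.

```python
def std3_valid_non_ascii(lines):
    """Return (first, last) ranges with status disallowed_STD3_valid above U+007F."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]  # ignore trailing comments
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:  # keep only entries that reach beyond ASCII
            found.append((first, last))
    return found

sample = [
    "0000..002C    ; disallowed_STD3_valid                  # <control>..COMMA",
    "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
hits = std3_valid_non_ascii(sample)
```

For the sample, the result covers U+2260 and U+226E..U+226F, matching the characters listed above.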
1336 * generate core properties data files
1337 - in initial bootstrapping, change the UCA version
1338 in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1339 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1340 - rebuild ICU & tools
1341 + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1342 check if the UCA version in FractionalUCA.txt matches the new Unicode version
1344 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1345 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1346 - rebuild ICU & tools
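The UCA version check mentioned for the genrb failure can be scripted. A sketch, assuming that FractionalUCA.txt carries a header line of the form "[UCA version = x.y.z]"; the function names are hypothetical, and only major.minor are compared.

```python
import re

def uca_version(fractional_uca_text):
    """Return the version from a "[UCA version = x.y.z]" line, or None."""
    match = re.search(r"^\[UCA version\s*=\s*([0-9.]+)\]",
                      fractional_uca_text, re.MULTILINE)
    return match.group(1) if match else None

def versions_match(fractional_uca_text, unicode_version):
    """Compare major.minor of the UCA version with the Unicode version."""
    found = uca_version(fractional_uca_text)
    return (found is not None and
            found.split(".")[:2] == unicode_version.split(".")[:2])

# Illustrative file head; the real file has many more lines.
sample = "# Fractional UCA Table\n[UCA version = 6.0.0]\n"
stale = not versions_match(sample, "6.1.0")
```

A stale header (6.0.0 while building for Unicode 6.1) is the case that the bootstrapping step above fixes by hand.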
1348 * update Java data files
1349 - refresh just the UCD-related files, just to be safe
1350 - see (ICU4C)/source/data/icu4j-readme.txt
1352 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1355 Unicode .icu files built to ./out/build/icudt49l
1356 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1357 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1358 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1359 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1360 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1361 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1362 mkdir -p /tmp/icu4j/main/shared/data
1363 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1364 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1365 mkdir -p /tmp/icu4j/main/shared/data
1366 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1367 make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1368 - copy the big-endian Unicode data files to another location,
1369 separate from the other data files
1370 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1371 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1372 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1373 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1374 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1375 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1376 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1378 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1380 * refresh Java test .txt files
1381 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1383 * test ICU so far, fix test code where necessary
1384 - temporarily ignore collation issues that look like UCA/UCD mismatches,
1385 until UCA data is updated
1389 - get output from Mark's tools; look in
1390 http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1391 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1392 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1393 (note removing the underscore before "Rules")
1394 - update (ICU)/source/test/testdata/CollationTest_*.txt
1395 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1396 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1397 - check test file diffs for previously commented-out, known-failing data lines;
1398 probably need to keep those commented out
1399 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1401 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1403 - refresh ICU4J collation data:
1404 (subset of instructions above for properties data refresh, except copies all coll/*)
1405 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1406 ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1407 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1408 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1409 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1410 - note on intltest: if collate/UCAConformanceTest fails, then
1411 utility/MultithreadTest/TestCollators will fail as well;
1412 fix the conformance test before looking into the multi-thread test
1414 * When refreshing all of ICU4J data from ICU4C
1415 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1416 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1418 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1420 *** LayoutEngine script information
1422 (For details see the Unicode 5.2 change log below.)
1424 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1425 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1426 in the working directory.
1427 (It also generates ScriptRunData.cpp, which is no longer needed.)
1429 The generated files have a current copyright date and "@draft" statement.
1431 - diff current <icu>/source/layout files vs. generated ones
1432 ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1433 review and manually merge desired changes;
1434 fix gratuitous changes, incorrect @draft and missing aliases;
1435 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1436 - if you just copy the above files, then
1437 fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1438 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1440 *** merge the Unicode update branches back onto the trunk
1441 - do not merge the icudata.jar and testdata.jar,
1442 instead rebuild them from merged & tested ICU4C
1444 ---------------------------------------------------------------------------- ***
1446 ICU 4.8 (no Unicode update, just new script codes)
1448 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1454 Shrd 319 Sharada, Śāradā
1455 Sora 398 Sora Sompeng
1456 Takr 321 Takri, Ṭākrī, Ṭāṅkrī
1460 -> com.ibm.icu.lang.UScript
1461 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1462 replace public static final int \1 = \2;\3
1463 -> genpname/SyntheticPropertyValueAliases.txt
1464 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1465 and in com.ibm.icu.dev.test.lang.TestUScript.java
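The find/replace pair for com.ibm.icu.lang.UScript translates directly to Python's re module. A sketch; the constant name and value in the sample line are made up for illustration and are not real uscript.h entries.

```python
import re

# Same pattern and replacement as in the note above.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_TEMPLATE = r"public static final int \1 = \2;\3"

def to_java(c_line):
    """Turn one uscript.h enum line into a UScript.java constant line."""
    return USCRIPT_LINE.sub(JAVA_TEMPLATE, c_line)

# Hypothetical enum line in the style of uscript.h.
c_line = "    USCRIPT_EXAMPLE          = 999,/* Xmpl */"
java_line = to_java(c_line)
```

Because `(.+)` needs at least one character after the comma, lines without a trailing comment are left unchanged.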
1467 * run genpname/preparse.pl (on Linux)
1468 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1469 + make sure that data.h is writable
1470 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1471 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1473 * rebuild Unicode tools (at least genpname) using make
1474 - You might first need to "make install" ICU so that the tools build can pick
1475 up the new definitions from the installed header files.
1478 (builds both pnames.icu and propname_data.h)
1479 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1480 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1481 - rebuild ICU & tools
1484 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1485 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1486 - rebuild ICU & tools
1488 * update Java data files
1489 - refresh just the UCD-related files, just to be safe
1490 - see (ICU4C)/source/data/icu4j-readme.txt
1492 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1493 - copy the big-endian Unicode data files to another location,
1494 separate from the other data files
1495 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1496 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1497 ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1499 ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1501 * should have updated the layout engine script codes but forgot
1503 ---------------------------------------------------------------------------- ***
1507 *** related ICU Trac tickets
1509 7264 Unicode 6.0 Update
1511 *** Unicode version numbers
1514 (configure.in & configure: have been modified to extract the version from uchar.h)
1515 - com.ibm.icu.util.VersionInfo
1517 *** data files & enums & parser code
1521 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1522 - This now prepares both unidata and testdata files in respective output subfolders.
1524 * PropertyAliases.txt changes
1525 - new Script_Extensions property defined in the new ScriptExtensions.txt file
1526 but not listed in PropertyAliases.txt; reported to unicode.org;
1527 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1528 scx; Script_Extensions
1529 -> uchar.h with new UProperty section
1530 -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1532 * PropertyValueAliases.txt changes
1533 - 12 new block names:
1538 CJK_Unified_Ideographs_Extension_D
1543 Miscellaneous_Symbols_And_Pictographs
1545 Transport_And_Map_Symbols
1547 -> add to UCharacter.UnicodeBlock
1548 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1549 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1550 - Joining_Group (jg) values:
1551 Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1552 -> uchar.h & UCharacter.JoiningGroup
1557 -> remove these from SyntheticPropertyValueAliases.txt
1558 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1559 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1560 and in com.ibm.icu.dev.test.lang.TestUScript.java
1561 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1562 (added 2009-11-11..2010-07-18)
1564 Dupl 755 Duployan shorthand
1570 Merc 101 Meroitic Cursive
1571 Narb 106 Old North Arabian
1575 Wara 262 Warang Citi
1577 -> com.ibm.icu.lang.UScript
1578 find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1579 replace public static final int \1 = \2;\3
1580 -> SyntheticPropertyValueAliases.txt
1581 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1582 and in com.ibm.icu.dev.test.lang.TestUScript.java
1583 - ISO 15924 name change
1584 Mero 100 Meroitic Hieroglyphs (was Meroitic)
1585 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
1586 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
1588 * UnicodeData.txt changes
1590 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
1591 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
1592 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
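For context, a <..., First>/<..., Last> pair in UnicodeData.txt marks a range whose character names are generated rather than stored. A Python sketch of what that means for the Extension D range above; this is not how gennames.c represents the range, and the function name is hypothetical.

```python
# Range taken from the two UnicodeData.txt lines above.
EXT_D_FIRST, EXT_D_LAST = 0x2B740, 0x2B81D

def ext_d_name(code_point):
    """Return the algorithmic name for a CJK Extension D code point, else None."""
    if EXT_D_FIRST <= code_point <= EXT_D_LAST:
        return "CJK UNIFIED IDEOGRAPH-%X" % code_point
    return None

first_name = ext_d_name(0x2B740)
outside = ext_d_name(0x2B81E)  # one past the end of the range
```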
1594 * build Unicode tools using CMake+make
1596 * run genpname/preparse.pl (on Linux)
1597 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1598 + make sure that data.h is writable
1599 + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1600 + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1602 * rebuild Unicode tools (at least genpname) using make
1603 - You might first need to "make install" ICU so that the tools build can pick
1604 up the new definitions from the installed header files.
1607 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1608 - rebuild ICU & tools
1610 * update source/data/unidata/norm2/nfkc_cf.txt
1611 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1613 * update source/data/unidata/norm2/uts46.txt
1614 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
1615 to ~/svn.icu/tools/trunk/src/unicode/py
1616 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
1617 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1618 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1620 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1621 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1622 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1623 - Unicode 6.0: U+2260, U+226E, U+226F
1625 * generate core properties data files
1626 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1627 - rebuild ICU & tools
1628 - run makeuca.sh so that genuca picks up the new nfc.nrm:
1629 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1630 - rebuild ICU & tools
1632 * implement new Script_Extensions property (provisional)
1633 - parser & generator: genprops & uprops.icu
1634 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
1635 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
1637 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
1639 - genbidi/gencase/genprops tools changes
1640 - re-run makeprops.sh (see above)
1641 - UCharacterProperty.java, UCharacterTypeIterator.java,
1642 UBiDiProps.java, UCaseProps.java, and several others with minor changes;
1643 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
1645 * update Java data files
1646 - refresh just the UCD-related files, just to be safe
1647 - see (ICU4C)/source/data/icu4j-readme.txt
1649 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1652 Unicode .icu files built to ./out/build/icudt45l
1653 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1654 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1655 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1656 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
1657 mkdir -p /tmp/icu4j/main/shared/data
1658 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1659 - copy the big-endian Unicode data files to another location,
1660 separate from the other data files
1661 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1662 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1663 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1664 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
1665 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1666 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1667 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1669 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1671 * refresh Java test .txt files
1672 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1674 * un-hardcode normalization skippable (NF*_Inert) test data
1675 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
1677 * copy updated break iterator test files
1678 - now handled by early ucdcopy.py and
1679 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
1681 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
1682 to ~/svn.icu/trunk/src/source/test/testdata)
1683 - they are not used in ICU4J
1687 - get output from Mark's tools; look in
1688 http://www.unicode.org/~book/incoming/mark/uca6.0.0/
1689 http://www.macchiato.com/unicode/utc/additional-uca-files
1690 http://www.unicode.org/Public/UCA/6.0.0/
1691 http://www.unicode.org/~mdavis/uca/
1692 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1693 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1694 - update Han-implicit ranges for new CJK extensions:
1695 swapCJK() in ucol.cpp & ImplicitCEGenerator.java
1696 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
1697 do not add it into invuca so that tailoring primary-after an ignorable works
1698 - genuca: permit space between [variable top] bytes
1699 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
1701 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1703 - refresh ICU4J collation data:
1704 (subset of instructions above for properties data refresh, except copies all coll/*)
1705 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1706 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1707 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1708 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1709 - update (ICU)/source/test/testdata/CollationTest_*.txt
1710 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1711 with output from Mark's Unicode tools
1712 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1713 - note on intltest: if collate/UCAConformanceTest fails, then
1714 utility/MultithreadTest/TestCollators will fail as well;
1715 fix the conformance test before looking into the multi-thread test
1717 * When refreshing all of ICU4J data from ICU4C
1718 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1719 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1721 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1723 *** LayoutEngine script information
1725 (For details see the Unicode 5.2 change log below.)
1727 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1728 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1729 ScriptRunData.cpp, which is no longer needed.)
1731 The generated files have a current copyright date and "@draft" statement.
1733 * copy the above files into <icu>/source/layout, replacing the old files.
1734 * fix mixed line endings
1735 * review the diffs and fix incorrect @draft and missing aliases;
1736 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1737 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1739 ---------------------------------------------------------------------------- ***
1743 *** related ICU Trac tickets
1747 7167 verify collation bytes
1748 7235 Java test NAME_ALIAS
1749 7236 Java DerivedCoreProperties.txt test
1750 7237 Java BidiTest.txt
1751 7238 UTrie2 in core unidata
1752 7239 test for tailoring gaps
1753 7240 Java fix CollationMiscTest
1754 7243 update layout engine for Unicode 5.2
1756 *** Unicode version numbers
1759 - configure.in & configure
1760 - update ucdVersion in gennames.c if an algorithmic range changes
1762 *** data files & enums & parser code
1766 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
1767 - includes finding files regardless of version numbers,
1768 copying them, and performing the equivalent processing of the
1769 ucdstrip and ucdmerge tools on the desired set of files
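"Finding files regardless of version numbers" can be pictured as stripping a version suffix from each file name. A sketch, not the ucdcopy.py code; it assumes draft file names of the form Name-x.y.zdN.txt, and the pattern and function name are illustrative.

```python
import re

# "-5.2.0" or "-5.2.0d7" directly before the file extension.
_VERSION_SUFFIX = re.compile(r"-[0-9]+(\.[0-9]+)*(d[0-9]+)?(?=\.[A-Za-z]+$)")

def unversioned_name(filename):
    """Map e.g. UnicodeData-5.2.0d7.txt to UnicodeData.txt."""
    return _VERSION_SUFFIX.sub("", filename)

names = [unversioned_name(n) for n in
         ["UnicodeData-5.2.0d7.txt", "BidiTest-5.2.0.txt", "Blocks.txt"]]
```

Names without a version suffix pass through unchanged, so the same lookup works for final and draft UCD folders.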
1772 - PropertyAliases.txt
1773 moved from numeric to enumerated:
1774 ccc ; Canonical_Combining_Class
1775 new string properties:
1776 NFKC_CF ; NFKC_Casefold
1777 Name_Alias; Name_Alias
1778 new binary properties:
1781 CWCF ; Changes_When_Casefolded
1782 CWCM ; Changes_When_Casemapped
1783 CWKCF ; Changes_When_NFKC_Casefolded
1784 CWL ; Changes_When_Lowercased
1785 CWT ; Changes_When_Titlecased
1786 CWU ; Changes_When_Uppercased
1787 new CJK Unihan properties (not supported by ICU)
1788 - PropertyValueAliases.txt
1791 one script code change:
1792 sc ; Qaai ; Inherited
1794 sc ; Zinh ; Inherited ; Qaai
1795 new Line_Break (lb) value:
1796 lb ; CP ; Close_Parenthesis
1797 new Joining_Group (jg) values: Farsi_Yeh, Nya
1799 ccc; 214; ATA ; Attached_Above
1800 - DerivedBidiClass.txt
1801 new default-R range: U+1E800 - U+1EFFF
1803 all of the ISO comments are gone
1805 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
1807 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
1808 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
1812 + cd \svn\icuproj\icu\trunk\source\tools\genpname
1813 + make sure that data.h is writable
1814 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
1815 + preparse.pl complains with errors like the following:
1816 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
1817 This is because ICU 4.0 had scripts from ISO 15924 which are now
1818 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
1819 and PropertyValueAliases.txt.
1820 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1821 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
1822 + preparse.pl complains with errors about block names missing from uchar.h; add them
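The duplicate script entries that preparse.pl complains about can be listed up front. A sketch, assuming that SyntheticPropertyValueAliases.txt uses the same "sc ; Code ; Long_Name" line format as PropertyValueAliases.txt; the function names and sample lines are illustrative.

```python
def script_codes(lines):
    """Collect short script codes from "sc ; Code ; Long_Name" lines."""
    codes = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes

def duplicate_scripts(synthetic_lines, official_lines):
    """Script codes defined in both files; these must leave the synthetic file."""
    return sorted(script_codes(synthetic_lines) & script_codes(official_lines))

synthetic = ["sc ; Egyp ; Egyptian_Hieroglyphs",
             "sc ; Java ; Javanese",
             "sc ; Blis ; Blissymbols"]
official = ["sc ; Egyp ; Egyptian_Hieroglyphs",
            "sc ; Java ; Javanese",
            "sc ; Zinh ; Inherited ; Qaai"]
duplicates = duplicate_scripts(synthetic, official)
```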
1824 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
1825 - new block & script values
1827 copy new blocks from Blocks.txt
1828 MS VC++ 2008 regular expression:
1829 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
1830 replace with " UBLOCK_\3 = 172, /*[\1]*/"
1831 + several new script values already added in ICU 4.0 for ISO 15924 coverage
1832 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
1833 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
1834 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
1835 (added to SyntheticPropertyValueAliases.txt)
1836 - new Joining_Group (jg) values: Farsi_Yeh, Nya
1837 - new Line_Break (lb) value:
1838 lb ; CP ; Close_Parenthesis
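The MS VC++ find/replace for Blocks.txt lines has a close Python equivalent. A sketch that additionally uppercases the block name and numbers the constants sequentially; both additions are conveniences assumed here, and the actual values must be checked against uchar.h.

```python
import re

# Python form of the VC++ pattern: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")

def ublock_constants(blocks_lines, first_value):
    """Build UBLOCK_ enum lines from Blocks.txt data lines."""
    out = []
    value = first_value
    for line in blocks_lines:
        m = BLOCK_LINE.match(line)
        if not m:
            continue  # comments, blank lines
        # "Hangul Jamo Extended-A" -> "HANGUL_JAMO_EXTENDED_A"
        name = re.sub(r"[^A-Za-z0-9]+", "_", m.group(3)).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (name, value, m.group(1)))
        value += 1
    return out

sample = ["# illustrative excerpt",
          "A960..A97F; Hangul Jamo Extended-A",
          "D7B0..D7FF; Hangul Jamo Extended-B"]
enum_lines = ublock_constants(sample, 172)  # 172 is the placeholder from the note
```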
1840 * hardcoded Unihan range end/limit
1841 - Unihan range end moves from 9FC3 to 9FCB
1842 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
1843 + do change gennames.c
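The case-insensitive search for 9FC[34] can be run over source lines with a few lines of Python. A sketch; the function name and the sample lines are hypothetical, not taken from ICU sources.

```python
import re

UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)

def find_hardcoded_ends(lines):
    """Return (line_number, text) for lines mentioning the old end or limit."""
    return [(n, line) for n, line in enumerate(lines, 1)
            if UNIHAN_END.search(line)]

sample = ["if (c <= 0x9fc3) {",   # old range end
          "limit = 0x9FC4;",      # old range limit
          "end = 0x9FCB;"]        # already updated
hits = find_hardcoded_ends(sample)
```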
1845 * Compare definitions of new binary properties with what we used to use
1846 in algorithms, to see if the definitions changed.
1847 - Verified that definitions for Cased and Case_Ignorable are unchanged.
1848 The gencase tool now parses the newly public Case_Ignorable values
1849 in case the definition changes in the future.
1851 * uchar.c & uprops.h & uprops.c & genprops
1852 - new numeric values that didn't exist in Unicode data before:
1853 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
1854 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
1855 therefore redesign the encoding of numeric types and values for formatVersion 6;
1856 design for simple numbers up to at least 144 ("one gross"),
1857 large values up to at least 10^20,
1858 and fractions with numerators -1..17 and denominators 1..16
1859 to cover current and expected future values
1860 (e.g., more Han numeric values, Meroitic twelfths)
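The constraint described above can be checked with Python's fractions module. A sketch that models only what the note states (the denominator limit for formatVersion 5 and the fraction ranges targeted for formatVersion 6); it says nothing about the actual bit layout in uprops.icu, and the function names are hypothetical.

```python
from fractions import Fraction

# New numeric values listed in the note above.
NEW_VALUES = [Fraction(1, 7), Fraction(1, 9), Fraction(1, 10),
              Fraction(3, 10), Fraction(1, 16), Fraction(3, 16)]

def fits_old_format(value):
    """formatVersion 5, per the note: denominators above 9 do not fit."""
    return value.denominator <= 9

def fits_new_design(value):
    """formatVersion 6 design goal: numerators -1..17, denominators 1..16."""
    return -1 <= value.numerator <= 17 and 1 <= value.denominator <= 16

unsupported_in_v5 = [str(v) for v in NEW_VALUES if not fits_old_format(v)]
all_fit_v6 = all(fits_new_design(v) for v in NEW_VALUES)
```

Of the six values, the four with denominators 10 and 16 are the ones that force the redesign; all six fall inside the stated formatVersion 6 ranges.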
1862 * reimplement Hangul_Syllable_Type for new Jamo characters
1863 - the old code assumed that all Jamo characters are in the 11xx block
1864 - Unicode 5.2 fills holes there and adds new Jamo characters in
1865 A960..A97F; Hangul Jamo Extended-A
1867 D7B0..D7FF; Hangul Jamo Extended-B
1868 - Hangul_Syllable_Type can be trivially derived from a subset of
1869 Grapheme_Cluster_Break values
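The derivation mentioned above is a plain lookup: the Grapheme_Cluster_Break values L, V, T, LV and LVT carry over to Hangul_Syllable_Type, and everything else is NA. A Python sketch using the short property value names; the function name is hypothetical and this is not the ICU implementation.

```python
# Grapheme_Cluster_Break values that correspond to Hangul_Syllable_Type values.
_GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}

def hangul_syllable_type(gcb_value):
    """Derive hst from a Grapheme_Cluster_Break short value name."""
    return _GCB_TO_HST.get(gcb_value, "NA")

jamo = hangul_syllable_type("L")       # e.g. a leading consonant
control = hangul_syllable_type("CN")   # any non-Hangul GCB value
```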
1871 * build Unicode data source code for hardcoding core data
1872 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
1874 ICU data make path is \svn\icuproj\icu\trunk\source\data\
1875 ICU root path is \svn\icuproj\icu\trunk
1876 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1877 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
1878 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
1879 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
1880 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
1881 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
1882 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
1883 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
1884 Creating data file for Unicode Property Names
1885 Creating data file for Unicode Character Properties
1886 Creating data file for Unicode Case Mapping Properties
1887 Creating data file for Unicode BiDi/Shaping Properties
1888 Creating data file for Unicode Normalization
1889 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
1890 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
1892 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
1893 and rebuild the common library
1897 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
1898 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
1899 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
1900 [ Begin obsolete instructions:
1901 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
1902 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
1904 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
1905 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
1906 End obsolete instructions]
1907 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1908 not just the *_STUB.txt files
1909 - note on intltest: if collate/UCAConformanceTest fails, then
1910 utility/MultithreadTest/TestCollators will fail as well;
1911 fix the conformance test before looking into the multi-thread test
1913 *** Implement Cased & Case_Ignorable properties
1914 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
1915 - Problem: These properties should be disjoint, but aren't
1916 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
1917 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
1919 *** Implement Changes_When_Xyz properties
1920 - without stored data
1922 *** Implement Name_Alias property
1923 - add it as another name field in unames.icu
1924 - make it available via u_charName() and UCharNameChoice
1925 - consider it in u_charFromName()
1929 * Update break iterator rules to new UAX versions and new property values
1930 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
1932 *** new BidiTest file
1933 - review format and data
1934 - copy BidiTest.txt to source/test/testdata
1935 - write test code using this data
1936 - fix ICU code where it fails the conformance test
1939 - generally, find and update code corresponding to C/C++
1940 - UCharacter.UnicodeBlock constants:
1941 a) add an _ID integer per new block, update COUNT
1942 b) add a class instance per new block
1943 Visual Studio regex:
1944 find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
1945 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1946 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
1948 - port test changes to Java
1950 *** LayoutEngine script information
1952 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
1954 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1955 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1956 ScriptRunData.cpp, which is no longer needed.)
1958 The generated files have a current copyright date and "@draft" statement.
1960 -> Eric Mader wrote in email on 20090930:
1961 "I think the tool has been modified to update @draft to @stable for
1962 older scripts and to add @draft for new scripts.
1963 (I worked with an intern on this last year.)
1964 You should check the output after you run it."
1966 * copy the above files into <icu>/source/layout, replacing the old files.
1967 * fix mixed line endings
1968 * review the diffs and fix incorrect @draft and missing aliases
1969 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1971 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1972 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1974 -> Eric Mader wrote in email on 20090930:
1975 "This is just a matter of making sure that all the per-script tables have
1976 entries for any new scripts that were added.
1977 If any new Indic characters were added, then the class tables in
1978 IndicClassTables.cpp should be updated to reflect this.
1979 John Emmons should know how to do this if it's required."
1981 * rebuild the layout and layoutex libraries.
1985 + Jamo_Short_Name, sfc->scf, binary property value aliases
1987 ---------------------------------------------------------------------------- ***
1991 *** related ICU Trac tickets
1993 5696 Update to Unicode 5.1
1995 *** Unicode version numbers
1998 - configure.in & configure
1999 - update ucdVersion in gennames.c if an algorithmic range changes
2001 *** data files & enums & parser code
2005 DerivedCoreProperties.txt
2006 DerivedNormalizationProps.txt
2007 NormalizationTest.txt
2010 GraphemeBreakProperty.txt
2011 SentenceBreakProperty.txt
2012 WordBreakProperty.txt
2013 - ucdstrip and ucdmerge:
2017 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2018 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
2019 copy 5.1.0\ucd\Blocks.txt ..\unidata\
2020 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
2021 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
2022 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2023 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2024 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2025 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2026 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
2027 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
2028 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
2029 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
2030 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
2032 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2033 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2034 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2035 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
2036 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2037 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2038 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2039 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2040 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2041 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
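For orientation, here is a minimal Python sketch of what the two filters in the batch file above are assumed to do: ucdstrip drops comments and empty lines, and ucdmerge joins adjacent code point ranges that carry the same value. The function names strip_comments, parse_range and merge_ranges are hypothetical, and the real tools may differ in detail.

```python
def strip_comments(lines):
    """Drop '#' comments, trailing whitespace and empty lines (assumed ucdstrip behavior)."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            out.append(data)
    return out


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.strip().partition("..")
    return int(start, 16), int(end or start, 16)


def merge_ranges(lines):
    """Merge adjacent ranges with identical values (assumed ucdmerge behavior)."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        rng, _, value = line.partition(";")
        start, end = parse_range(rng)
        value = value.strip()
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    result = []
    for start, end, value in merged:
        rng = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
        result.append(f"{rng};{value}")
    return result
```

Piping a file through strip_comments and then merge_ranges mirrors the "ucdstrip | ucdmerge" lines used for EastAsianWidth.txt and LineBreak.txt.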
2045 + cd \svn\icuproj\icu\uni51\source\tools\genpname
2046 + make sure that data.h is writable
2047 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
2048 + preparse.pl complains with errors like the following:
2049 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
2050 This is because ICU 3.8 already had scripts from ISO 15924 which are now
2051 added to Unicode 5.1, so preparse.pl reports a conflict between SyntheticPropertyValueAliases.txt
2052 and PropertyValueAliases.txt.
2053 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2054 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
2055 + PropertyValueAliases.txt now explicitly contains values for boolean properties:
2056 N/Y, No/Yes, F/T, False/True
2057 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
2058 It will use further values from the file if present.
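The duplicate check described above (script aliases present in both files) can be sketched in Python as follows. This assumes that SyntheticPropertyValueAliases.txt uses the same "property ; short ; long" line format as PropertyValueAliases.txt. The function names are hypothetical, and this is not how preparse.pl itself is implemented.

```python
def read_value_aliases(lines, prop="sc"):
    """Collect short -> long value aliases for one property.

    Expects lines of the form 'sc ; Cari ; Carian', with optional '#' comments.
    """
    aliases = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == prop:
            aliases[fields[1]] = fields[2]
    return aliases


def duplicated_short_names(official_lines, synthetic_lines, prop="sc"):
    """Return short names defined in both files; these should leave the synthetic file."""
    official = read_value_aliases(official_lines, prop)
    synthetic = read_value_aliases(synthetic_lines, prop)
    return sorted(set(official) & set(synthetic))
```

Running duplicated_short_names() on the two files before preparse.pl would list the entries to remove ahead of time, instead of discovering them one error at a time.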
2060 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2061 - new block & script values
2063 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
2064 (removed from SyntheticPropertyValueAliases.txt)
2065 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
2066 (added to SyntheticPropertyValueAliases.txt)
2067 - uprops.icu (uprops.h) only provides 7 bits for script codes.
2068 ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
2069 No script code above 127 is assigned to any Unicode character yet,
2070 so ICU 4.0 uprops.icu does not store any
2071 script code values greater than 127.
2072 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
2073 in a parallel bit field, and that now overflows.
2074 Also, future values >=128 would be incompatible with the 7-bit field anyway.
2075 uprops.h is modified to move around several of the bit fields
2076 in the properties vector words, and now uses 8 bits for the script code.
2077 Two other bit fields are also widened to accommodate future growth:
2078 Block (current count: 172) grows from 8 to 9 bits,
2079 and Word_Break grows from 4 to 5 bits.
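The arithmetic behind the widening can be checked with a short Python sketch. The helper names bits_needed and fits are hypothetical and are not part of uprops.h; the counts are taken from the note above.

```python
def bits_needed(max_value):
    """Number of bits required to store every integer in 0..max_value."""
    return max(1, max_value.bit_length())


def fits(max_value, width):
    """True if max_value can be stored in an unsigned bit field of the given width."""
    return max_value < (1 << width)


USCRIPT_CODE_LIMIT = 130                 # script code count in ICU 4.0 (from the note above)
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1      # 129, stored in the parallel "maximum value" field
BLOCK_COUNT = 172                        # current block count (from the note above)

script_overflows_7_bits = not fits(MAX_SCRIPT, 7)   # 129 > 127
script_fits_8_bits = fits(MAX_SCRIPT, 8)            # 129 <= 255
block_fits_8_bits = fits(BLOCK_COUNT - 1, 8)        # 171 <= 255, so 9 bits is headroom
```

The maximum script value 129 is what forces the 8-bit field. The Block field still fits in 8 bits today, so its 9th bit only reserves room for later Unicode versions.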
2080 - renamed short alias of property Simple_Case_Folding (sfc->scf)
2081 + nothing to be done: handled as normal alias
2082 - new property JSN Jamo_Short_Name
2083 + no new API: only contributes to the Name property
2084 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
2085 - new Joining Group (JG) value: Burushashki_Yeh_Barree
2086 - new Sentence_Break (SB) values:
2091 - new Word_Break (WB) values:
2093 WB ; Extend ; Extend
2097 * Further changes in the 2008-02-29 update:
2098 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
2099 because they should not normally be invisible.
2100 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
2101 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
2102 - new Word_Break (WB) value: NL=Newline
2104 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
2105 - Unihan range end moves from 9FBB to 9FC3
2106 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
2107 + do change gennames.c
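The search for hardcoded range ends can be scripted. Below is a minimal Python sketch that walks a source tree and reports lines matching the regex above, case-insensitively. The function name find_hardcoded_ranges and the list of file suffixes are assumptions for illustration.

```python
import re
from pathlib import Path

# Old Unihan range end (9FBB) and limit (9FBC), matched case-insensitively.
UNIHAN_END_OR_LIMIT = re.compile(r"9FB[BC]", re.IGNORECASE)


def find_hardcoded_ranges(root, suffixes=(".c", ".cpp", ".h", ".txt")):
    """Return (path, line number, line text) for every line that mentions the old range."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if UNIHAN_END_OR_LIMIT.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Each hit still needs manual review, since some occurrences must not be changed (see the Unicode 4.1 notes on BOCU and GB 18030).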
2109 * build Unicode data source code for hardcoding core data
2110 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
2112 ICU data make path is \svn\icuproj\icu\uni51\source\data\
2113 ICU root path is \svn\icuproj\icu\uni51
2114 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2115 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2116 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2117 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2118 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2119 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2120 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2121 Creating data file for Unicode Character Properties
2122 Creating data file for Unicode Case Mapping Properties
2123 Creating data file for Unicode BiDi/Shaping Properties
2124 Creating data file for Unicode Normalization
2125 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
2126 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
2128 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
2129 and rebuild the common library
2133 * Update break iterator rules to new UAX versions and new property values
2137 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2140 - Test that APIs using Unicode property value aliases (like UnicodeSet)
2141 support all of the boolean values N/Y, No/Yes, F/T, False/True
2142 -> TestBinaryValues() tests in both cintltst and intltest
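As an illustration of what such a test checks, the following Python sketch maps all eight boolean value aliases to a truth value. The table and the function name parse_binary_value are hypothetical; the sketch applies loose matching (ignoring case, spaces, hyphens and underscores), which is an assumption about how the aliases are compared. It is not the code of TestBinaryValues().

```python
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a boolean property value alias to True/False, using loose matching."""
    key = alias.lower()
    for ch in " -_":
        key = key.replace(ch, "")
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError(f"not a binary property value alias: {alias!r}")
    return BINARY_VALUE_ALIASES[key]
```

A test in this style would loop over all eight aliases and confirm that each yields the expected set, for example that a pattern with Alphabetic=Yes selects the same code points as one with Alphabetic=T.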
2144 *** LayoutEngine script information
2145 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
2146 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2147 ScriptRunData.cpp, which is no longer needed.)
2149 The generated files have a current copyright date and "@draft" statement.
2151 * copy the above files into <icu>/source/layout, replacing the old files.
2153 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2154 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2156 * rebuild the layout and layoutex libraries.
2160 + Jamo_Short_Name, sfc->scf, binary property value aliases
2162 ---------------------------------------------------------------------------- ***
2166 *** related Jitterbugs
2168 5084 RFE: Update to Unicode 5.0
2170 *** data files & enums & parser code
2174 DerivedCoreProperties.txt
2175 DerivedNormalizationProps.txt
2176 NormalizationTest.txt
2179 GraphemeBreakProperty.txt
2180 SentenceBreakProperty.txt
2181 WordBreakProperty.txt
2182 - ucdstrip and ucdmerge:
2186 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2187 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
2188 copy 5.0.0\ucd\Blocks.txt ..\unidata\
2189 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
2190 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
2191 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2192 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2193 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2194 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2195 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
2196 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
2197 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
2198 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
2199 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
2201 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2202 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2203 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2204 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
2205 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2206 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2207 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2208 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2209 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2210 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
2212 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2216 + make sure that data.h is writable
2217 + perl preparse.pl \cvs\oss\icu > out.txt
2219 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2220 - new block & script values
2221 + script values already added in ICU 3.6 because all of ISO 15924 is now covered
2223 * build Unicode data source code for hardcoding core data
2224 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
2226 ICU data make path is \cvs\oss\icu\source\data\
2227 ICU root path is \cvs\oss\icu
2228 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2230 Creating data file for Unicode Character Properties
2231 Creating data file for Unicode Case Mapping Properties
2232 Creating data file for Unicode BiDi/Shaping Properties
2233 Creating data file for Unicode Normalization
2234 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
2235 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
2237 - copy the .c source files to C:\cvs\oss\icu\source\common
2238 and rebuild the common library
2240 *** Unicode version numbers
2245 *** LayoutEngine script information
2246 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
2247 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2248 ScriptRunData.cpp, which is no longer needed.)
2250 The generated files have a current copyright date and "@draft" statement.
2252 * copy the above files into <icu>/source/layout, replacing the old files.
2254 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2255 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2257 * rebuild the layout and layoutex libraries.
2259 ---------------------------------------------------------------------------- ***
2263 *** related Jitterbugs
2265 4332 RFE: Update to Unicode 4.1
2266 4157 RBBI, TR29 4.1 updates
2268 *** data files & enums & parser code
2272 DerivedCoreProperties.txt
2273 DerivedNormalizationProps.txt
2274 NormalizationTest.txt
2275 GraphemeBreakProperty.txt
2276 SentenceBreakProperty.txt
2277 WordBreakProperty.txt
2278 - ucdstrip and ucdmerge:
2282 * add new files to the repository
2283 GraphemeBreakProperty.txt
2284 SentenceBreakProperty.txt
2285 WordBreakProperty.txt
2287 * update FractionalUCA.txt and UCARules.txt with new canonical closure
2290 - handle new enumerated properties in sub read_uchar
2293 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2294 - new binary properties
2296 + Pattern_White_Space
2297 - new enumerated properties
2298 + Grapheme_Cluster_Break
2301 - new block & script & line break values
2304 - case-ignorable changes
2305 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2306 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
2308 *** Unicode version numbers
2314 - verify that u_charMirror() round-trips
2315 - test all new properties and some new values of old properties
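The round-trip check can also be run on the data side before the library is rebuilt. The Python sketch below assumes BidiMirroring.txt lines of the form "0028; 0029 # LEFT PARENTHESIS" and reports code points whose mirror does not map back to them. It works on the data file only and does not call u_charMirror(); the function names are hypothetical.

```python
def parse_mirroring(lines):
    """Read 'code; mirrored code # comment' lines into a dict of int -> int."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = data.split(";")
        pairs[int(fields[0].strip(), 16)] = int(fields[1].strip(), 16)
    return pairs


def non_round_tripping(pairs):
    """Return code points c where the mirror of the mirror of c is not c."""
    return sorted(c for c, m in pairs.items() if pairs.get(m) != c)
```

An empty result from non_round_tripping() means every mapping in the file has a matching reverse entry.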
2319 * hardcoded Unihan range end/limit
2320 - Unihan range end moves from 9FA5 to 9FBB
2321 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
2322 + do not modify BOCU/BOCSU code because that would change the encoding
2323 and break binary compatibility!
2324 + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
2326 + ignore trietest.c: test data is arbitrary
2327 + ignore tstnorm.cpp: test optimization, not important
2328 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
2329 + do change line_th.txt and word_th.txt
2330 by replacing hardcoded ranges with the new property values
2331 + do change gennames.c
2333 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2334 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2335 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
2338 - compare new special casing context conditions with previous ones
2339 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2342 - consider storing only the short name if it is the same as the long name
2345 - UAX #29 changes (grapheme/word/sentence breaks)
2346 - UAX #14 changes (line breaks)
2347 - Pattern_Syntax & Pattern_White_Space
2349 ---------------------------------------------------------------------------- ***
2351 Unicode 4.0.1 update
2353 *** related Jitterbugs
2355 3170 RFE: Update to Unicode 4.0.1
2356 3171 Add new Unicode 4.0.1 properties
2357 3520 use Unicode 4.0.1 updates for break iteration
2359 *** data files & enums & parser code
2362 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
2363 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
2366 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
2367 according to PRI #26
2368 http://www.unicode.org/review/resolved-pri.html#pri26
2369 - reverted again because no corrigendum is in sight;
2370 instead, modified the tests to skip this consistency check for Unicode 4.0.1
2373 - update from http://www.unicode.org/copyright.html
2374 formatted for plain text
2376 * uchar.h & uprops.h & uprops.c & genprops
2377 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
2378 - add U_LB_INSEPARABLE due to a spelling fix
2379 + put short name comment only on line with new constant
2380 for genpname perl script parser
2381 - new binary properties
2383 + Variation_Selector
2386 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
2387 - perl script: correctly calculate the maximum number of fields per row
2390 - new script code Hrkt=Katakana_Or_Hiragana
2392 * gennorm.c track changes in DerivedNormalizationProps.txt
2393 - "FNC" -> "FC_NFKC"
2394 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
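A parser that accepts both field layouts could look like the following Python sketch. Only the "NFD_NO" to "NFD_QC; N" mapping is stated above; the other entries in OLD_QC_FIELDS are assumed by analogy (the "etc."), and parse_quick_check is a hypothetical name. This is not the code in gennorm.c.

```python
# Old single-field names mapped to (property, value) pairs.
# Only NFD_NO comes from the note above; the rest are assumed by analogy.
OLD_QC_FIELDS = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}


def parse_quick_check(line):
    """Return (range, property, value) for a quick-check line, else None."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) == 2 and fields[1] in OLD_QC_FIELDS:
        prop, value = OLD_QC_FIELDS[fields[1]]      # old layout: one combined field
    elif len(fields) == 3 and fields[1].endswith("_QC"):
        prop, value = fields[1], fields[2]          # new layout: property; value
    else:
        return None                                 # some other property line
    return fields[0], prop, value
```

Both layouts normalize to the same tuple, so the code that consumes the result does not need to know which file version was read.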
2396 * genprops/props2.c track changes in DerivedNumericValues.txt
2397 - changed from 3 columns to 2, dropping the numeric type
2398 + assume that the type is always numeric for Han characters,
2399 and that only those are added in addition to what UnicodeData.txt lists
2401 *** Unicode version numbers
2407 - update test of default bidi classes according to PRI #28
2408 /tsutil/cucdtst/TestUnicodeData
2409 http://www.unicode.org/review/resolved-pri.html#pri28
2410 - bidi tests: change exemplar character for ES depending on Unicode version
2411 - change hardcoded expected property values where they change
2419 - use new Hrkt=Katakana_Or_Hiragana
2422 - are now part of combining character sequences
2423 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ