-* Copyright (C) 2004-2016, International Business Machines
+* Copyright (C) 2016 and later: Unicode, Inc. and others.
+* License & terms of use: http://www.unicode.org/copyright.html
+* Copyright (C) 2004-2016, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* created by: Markus W. Scherer
*
* change log for Unicode updates
+*
+* For each new Unicode version, during the beta period,
+* I copy the change log for the previous version to the top of this file.
+* I adjust the versions, tickets, URLs, and paths.
+* I work my way through the steps listed in the log, top to bottom,
+* adjusting the log as necessary.
+* I report problems to the UTC and/or CLDR and/or ICU.
+* Before the data is final, I "turn the crank" several more times,
+* using appropriate subsets of the steps.
---------------------------------------------------------------------------- ***
* New ISO 15924 script codes
-Starting with ICU 55, we do not add UScriptCode constants any more until their scripts
-are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
+Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
+until they are encoded in Unicode,
+or can be assumed to be encoded in the next Unicode version.
Script enum constant names should follow the Unicode script property value aliases,
which are assigned only when the scripts are encoded.
When we encode scripts early and guess wrong, then we have confusing enum constants
and have sometimes added aliases.
-Exception: Script codes like Latf and Aran that are not subject to separate encoding
+Variant script codes like Latf and Aran that are not subject to separate encoding
can be added at any time.
+(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
+
+We add script codes used in CLDR or in the spoof checker.
+This includes combination/alias codes like Hanb and Jamo.
+See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
+and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
+
+We add special Z* script codes like Zsye.
+
+For new script codes see http://www.unicode.org/iso15924/codechanges.html
+
+---------------------------------------------------------------------------- ***
+
+Unicode 11.0 update for ICU 62
+
+http://www.unicode.org/versions/Unicode11.0.0/
+http://unicode.org/versions/beta-11.0.0.html
+https://www.unicode.org/review/pri372/
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/reports/tr44/tr44-21.html
+
+* Command-line environment setup
+
+UNICODE_DATA=~/unidata/uni11/20180521
+CLDR_SRC=~/svn.cldr/uni
+ICU_ROOT=~/svn.icu/uni
+ICU_SRC=$ICU_ROOT/src
+ICUDT=icudt61b
+ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
+ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
+
+*** ICU Trac
+
+- ticket:13630: Unicode 11
+- ^/branches/markus/uni11
+
+*** CLDR Trac
+
+- cldrbug 10978: Unicode 11
+- ^/branches/markus/uni11
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+ so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* download files
+- mkdir -p $UNICODE_DATA
+- download Unicode files into $UNICODE_DATA
+ + subfolders: emoji, idna, security, ucd, uca
+ + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+
+* for manual diffs and for Unicode Tools input data updates:
+ remove version suffixes from the file names
+ ~$ unidata/desuffixucd.py $UNICODE_DATA
+ (see https://sites.google.com/site/unicodetools/inputdata)
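
The suffix removal above can be sketched in a few lines of Python. This is not the code of desuffixucd.py; the suffix shape ("-11.0.0" or "-11.0.0d12" right before the file extension) and the name strip_version_suffix are assumptions made for illustration.

```python
import re

# Assumed suffix shape: "-<version>" with an optional draft number,
# for example "-11.0.0" or "-11.0.0d12", directly before the extension.
# The real desuffixucd.py may use a different pattern.
_SUFFIX_RE = re.compile(r"-\d+(?:\.\d+)*(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name):
    """Returns file_name without a version suffix before its extension."""
    return _SUFFIX_RE.sub("", file_name, count=1)


if __name__ == "__main__":
    for name in ("UnicodeData-11.0.0d12.txt", "Blocks-11.0.0.txt", "emoji-data.txt"):
        print(name, "->", strip_version_suffix(name))
```

Names without such a suffix (for example emoji-data.txt) are returned unchanged, so the function can be applied to every file in $UNICODE_DATA.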
+
+* process and/or copy files
+- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
+ + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+ + For debugging, and tweaking how ppucd.txt is written,
+ the tool has an --only_ppucd option:
+ py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
+
+- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* preparseucd.py changes
+- fix errors
+ NameError: unknown property Extended_Pictographic
+ -> add Extended_Pictographic binary property
+ -> add new short names for all Emoji properties
+
+* new constants for new property values
+- preparseucd.py error:
+ ValueError: missing uchar.h enum constants for some property values:
+ [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
+ u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
+ u'Indic_Siyaq_Numbers'])),
+ (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
+ (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
+ (u'GCB', set([u'LinkC', u'Virama'])),
+ (u'WB', set([u'WSegSpace']))]
+ = PropertyValueAliases.txt new property values (diff old & new .txt files)
+ blk; Chess_Symbols ; Chess_Symbols
+ blk; Dogra ; Dogra
+ blk; Georgian_Ext ; Georgian_Extended
+ blk; Gunjala_Gondi ; Gunjala_Gondi
+ blk; Hanifi_Rohingya ; Hanifi_Rohingya
+ blk; Indic_Siyaq_Numbers ; Indic_Siyaq_Numbers
+ blk; Makasar ; Makasar
+ blk; Mayan_Numerals ; Mayan_Numerals
+ blk; Medefaidrin ; Medefaidrin
+ blk; Old_Sogdian ; Old_Sogdian
+ blk; Sogdian ; Sogdian
+ -> add to uchar.h
+ use long property names for enum constants,
+ for the trailing comment get the block start code point: diff old & new Blocks.txt
+ -> add to UCharacter.UnicodeBlock IDs
+ Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+ replace public static final int \1_ID = \2; \3
+ -> add to UCharacter.UnicodeBlock objects
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+
+ GCB; LinkC ; LinkingConsonant
+ GCB; Virama ; Virama
+ -> uchar.h & UCharacter.GraphemeClusterBreak
+ -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
+
+ InSC; Consonant_Initial_Postfixed ; Consonant_Initial_Postfixed
+ -> ignore: ICU does not yet support this property
+
+ jg ; Hanifi_Rohingya_Kinna_Ya ; Hanifi_Rohingya_Kinna_Ya
+ jg ; Hanifi_Rohingya_Pa ; Hanifi_Rohingya_Pa
+ -> uchar.h & UCharacter.JoiningGroup
+
+ sc ; Dogr ; Dogra
+ sc ; Gong ; Gunjala_Gondi
+ sc ; Maka ; Makasar
+ sc ; Medf ; Medefaidrin
+ sc ; Rohg ; Hanifi_Rohingya
+ sc ; Sogd ; Sogdian
+ sc ; Sogo ; Old_Sogdian
+ -> uscript.h & com.ibm.icu.lang.UScript
+ -> Nushu had been added already
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+ and in com.ibm.icu.dev.test.lang.TestUScript.java
+
+ WB ; WSegSpace ; WSegSpace
+ -> uchar.h & UCharacter.WordBreak
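
For those who do not use Eclipse, the two find/replace patterns for UCharacter.UnicodeBlock listed above can be applied with Python's re module. The patterns and replacement strings are the ones from the log; the function names and the sample uchar.h line are only illustrative.

```python
import re

# Same patterns as the Eclipse find/replace steps in the log.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(uchar_h_line):
    """Turns one uchar.h UBLOCK_ enum line into a Java block ID constant."""
    return ID_FIND.sub(ID_REPLACE, uchar_h_line.strip())


def to_java_object(uchar_h_line):
    """Turns one uchar.h UBLOCK_ enum line into a Java UnicodeBlock object."""
    return OBJ_FIND.sub(OBJ_REPLACE, uchar_h_line.strip())


if __name__ == "__main__":
    # Illustrative input line; check the value against the real uchar.h.
    sample = "    UBLOCK_CHESS_SYMBOLS = 281, /*[1FA00]*/"
    print(to_java_id(sample))
    print(to_java_object(sample))
```

The trailing comment with the block start code point is carried over unchanged by the last capture group in both patterns.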
+
+* New short names for emoji properties
+- see UTS #51
+- short names set in preparseucd.py
+
+* New properties
+- boolean emoji property Extended_Pictographic
+ -> added in preparseucd.py
+ -> uchar.h & UProperty.java
+- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
+ as shown in PropertyValueAliases.txt
+ -> ignore for now
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+ (not strictly necessary for NOT_ENCODED scripts)
+ $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
+
+* update spoof checker UnicodeSet initializers:
+ inclusionPat & recommendedPat in uspoof.cpp
+ INCLUSION & RECOMMENDED in SpoofChecker.java
+- make sure that the Unicode Tools tree contains the latest security data files
+- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
+- update the hardcoded version number there in the DIRECTORY path
+- run the tool (no special environment variables needed)
+- copy & paste from the Console output into the .cpp & .java files
+
+* generate normalization data files
+ cd $ICU_ROOT/dbg/icu4c
+ bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+$ICU_SRC/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
+
+ $ICU_ROOT/dbg$
+ mkdir -p tools/unicode/c
+ cd tools/unicode/c
+
+ $ICU_ROOT/dbg/tools/unicode/c$
+ cmake ../../../../src/tools/unicode/c
+ make
+
+* generate core properties data files
+ $ICU_ROOT/dbg/tools/unicode/c$
+ genprops/genprops $ICU_SRC/icu4c
+ genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
+ genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
+- rebuild ICU (make install) & tools
+
+* Fix case props
+ genprops error: casepropsbuilder: too many exceptions words
+ genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
+- With the addition of Georgian Mtavruli capital letters,
+ there are now too many simple case mappings with big mapping deltas
+ that yield uncompressible exceptions.
+- Changed the data structure (now formatVersion 4):
+ added one bit for no-simple-case-folding (for Cherokee), and
+ one optional slot for a big delta (for most faraway mappings),
+ together with another bit for whether that delta is negative.
+ This makes most case mappings for Cherokee, Georgian, etc. compressible,
+ reducing the number of exceptions words.
+- Made further changes to gain one more bit for the exceptions index,
+ for future growth. For details, see casepropsbuilder.cpp.
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..11.0: U+2260, U+226E, U+226F
+- nothing new in this Unicode version, no test file to update
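
The grep step above can also be done with a small filter. The sketch below assumes the usual IdnaMappingTable.txt line layout ("code point or range ; status ; ... # comment"); it does not handle uts46.txt, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Returns (start, end) ranges above U+007F with status disallowed_STD3_valid.

    Assumes IdnaMappingTable.txt-style lines:
    "2260 ; disallowed_STD3_valid # comment" or "226E..226F ; ...".
    """
    result = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start_hex, _, end_hex = fields[0].partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        if end > 0x7F:
            # Clip ranges that begin in ASCII to their non-ASCII part.
            result.append((max(start, 0x80), end))
    return result


if __name__ == "__main__":
    sample = [
        "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control>..COMMA",
        "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for start, end in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (start, end))
```

Any range in the output beyond U+2260, U+226E, and U+226F would indicate new characters that need test updates.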
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+- Errors in char.txt, word.txt, word_POSIX.txt like
+ createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET" at line 46, column 16
+ because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
+ -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
+ not empty, just to get ICU building.
+ -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
+ and properties together with the rules that used them (GB 10, WB 14).
+ -> Andy adjusts the rule sets further to sync with
+ Unicode 11 grapheme, word, and line break spec changes.
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+ https://sites.google.com/site/unicodetools/home#TOC-UCA
+ diff the main mapping file, look for bad changes
+ (for example, more bytes per weight for common characters)
+ ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
+ ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
+
+- CLDR root data files are checked into $CLDR_SRC/common/uca/
+ cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
+
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+ cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+ cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
+ (note: the copy below drops the underscore before "Rules" in the file name)
+ cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
+- restore TODO diffs in UCARules.txt
+ meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+ from the CLDR root files (..._CLDR_..._SHORT.txt)
+ cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+ cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+ cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+ CLDR common/collation/root.xml <collation type="private-unihan">
+ and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
+
+- run genuca, see command line above;
+ deal with
+ Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
+ FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)
+ (add the character to genuca.cpp sampleCharsToScripts[])
+ + look up the USCRIPT_ code for the new sample characters
+ (should be obvious from the comment in the error output)
+ + *add* mappings to sampleCharsToScripts[], do not replace them
+ (in case the script sample characters flip-flop)
+ + insert new scripts in DUCET script order, see the top_byte table
+ at the beginning of FractionalUCA.txt
+- rebuild ICU4C
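
When genuca reports several unknown sample characters, the data line from each error can be picked apart mechanically. The sketch below parses lines of the shape shown in the error output above; the derived USCRIPT_ name is only a candidate (an assumption that the script name maps directly to the constant) and still has to be checked against uscript.h.

```python
import re

# Matches data lines like the one in the genuca error output:
# FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)
_FIRST_PRIMARY_RE = re.compile(
    r"^FDD1 ([0-9A-F]+); \[[^\]]*\] # (\S+) first primary")


def parse_first_primary(line):
    """Returns (code point, script name) or None if the line does not match."""
    match = _FIRST_PRIMARY_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


def candidate_uscript_name(script_name):
    """Guesses a USCRIPT_ constant name; verify it in uscript.h."""
    return "USCRIPT_" + script_name.upper()


if __name__ == "__main__":
    line = "FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)"
    code_point, script = parse_first_primary(line)
    print("U+%04X" % code_point, candidate_uscript_name(script))
```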
+
+* Unihan collators
+ https://sites.google.com/site/unicodetools/unihan
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollators
+ with VM arguments
+ -ea
+ -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
+ -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
+ -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
+ -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
+ -DUVERSION=11.0.0
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollatorFiles
+ with the same arguments
+- check CLDR diffs
+ cd $CLDR_SRC
+ meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+ meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+ cd $CLDR_SRC
+ cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+ cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- run CLDR unit tests, commit to CLDR
+- generate ICU zh collation data: run CLDR
+ org.unicode.cldr.icu.NewLdml2IcuConverter
+ with program arguments
+ -t collation
+ -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
+ -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
+ -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
+ -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
+ zh
+ and VM arguments
+ -ea
+ -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+ (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+ utility/MultithreadTest/TestCollators will fail as well;
+ fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ output:
+ ...
+ Unicode .icu files built to ./out/build/icudt61l
+ echo timestamp > uni-core-data
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
+ echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+ make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+ separate from the other data files,
+ and then refresh ICU4J
+ cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+ cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
+or
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
+
+* update CollationFCD.java
+ + copy & paste the initializers of lcccIndex[] etc. from
+ ICU4C/source/i18n/collationfcd.cpp to
+ ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd $ICU_SRC/icu4c/source/data/unidata
+ cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd ../../test/testdata
+ cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** CLDR numbering systems
+- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
+ Unicode 11: using Unicode 11 CLDR ticket #10978
+ rohg 10D30..10D39 Hanifi_Rohingya
+ gong 11DA0..11DA9 Gunjala_Gondi
+ Earlier: CLDR tickets specific to adding new numbering systems.
+ Unicode 10: http://unicode.org/cldr/trac/ticket/10219
+ Unicode 9: http://unicode.org/cldr/trac/ticket/9692
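
The gc=Nd & nv=4 query above can be approximated with Python's unicodedata module. Two caveats: the result reflects the Unicode version built into the Python interpreter (unicodedata.unidata_version), not the new data files, and the range computation assumes that each set of decimal digits is encoded contiguously in ascending order. The function names are hypothetical.

```python
import unicodedata


def digit_fours():
    """Returns all code points with gc=Nd and decimal value 4."""
    return [cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Nd"
            and unicodedata.decimal(chr(cp), None) == 4]


def digit_ranges():
    """Returns (zero, nine) code point pairs, one per set of decimal digits.

    Assumes that the digits 0..9 of each set are contiguous.
    """
    return [(cp - 4, cp + 5) for cp in digit_fours()]


if __name__ == "__main__":
    print("Unicode version in this Python:", unicodedata.unidata_version)
    for zero, nine in digit_ranges():
        print("%04X..%04X" % (zero, nine))
```

Comparing the output of two Python versions with different unidata_version values, or comparing against the numbering systems already in CLDR, shows which digit sets are new.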
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+ instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools are checked in:
+ http://www.unicode.org/utility/trac/log/trunk/unicodetools
+
+---------------------------------------------------------------------------- ***
+
+Unicode 10.0 update for ICU 60
+
+http://www.unicode.org/versions/Unicode10.0.0/
+http://www.unicode.org/versions/beta-10.0.0.html
+http://blog.unicode.org/2017/03/unicode-100-beta-review.html
+http://www.unicode.org/review/pri350/
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/reports/tr44/tr44-19.html
+
+* Command-line environment setup
+
+UNICODE_DATA=~/unidata/uni10/20170605
+CLDR_SRC=~/svn.cldr/uni10
+ICU_ROOT=~/svn.icu/uni10
+ICU_SRC=$ICU_ROOT/src
+ICUDT=icudt60b
+ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
+ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
+
+*** ICU Trac
+
+- ticket:12985: Unicode 10
+- ticket:13061: undo hacks from emoji 5.0 update
+- ticket:13062: add Emoji_Component property
+- ^/branches/markus/uni10
+
+*** CLDR Trac
+
+- cldrbug 10055: Unicode 10
+- cldrbug 9882: Unicode 10 script metadata
+- cldrbug 10219: numbering systems for Unicode 10
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+ so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* download files
+- mkdir -p $UNICODE_DATA
+- download Unicode 10.0 files into $UNICODE_DATA
+ + subfolders: ucd, uca, idna, security
+ + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- download emoji 5.0 files into $UNICODE_DATA/emoji
-Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html
+* for manual diffs: remove version suffixes from the file names
+ ~$ unidata/desuffixucd.py $UNICODE_DATA
+ (see https://sites.google.com/site/unicodetools/inputdata)
+
+* process and/or copy files
+- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
+ + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+ + For debugging, and tweaking how ppucd.txt is written,
+ the tool has an --only_ppucd option:
+ py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
+
+- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* preparseucd.py changes
+- remove or add new Unicode scripts from/to the
+ only-in-ISO-15924 list according to the error messages:
+ ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
+ -> adjust _scripts_only_in_iso15924 as indicated
+- fix other errors
+ Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
+ -> add vo=Vertical_Orientation to _ignored_properties
+ -> later removed again: we now parse the file, even though we do not yet store the data for runtime use
+
+* new constants for new property values
+- preparseucd.py error:
+ ValueError: missing uchar.h enum constants for some property values:
+ [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
+ u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
+ (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
+ u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
+ u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
+ (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
+ = PropertyValueAliases.txt new property values (diff old & new .txt files)
+ blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
+ blk; Kana_Ext_A ; Kana_Extended_A
+ blk; Masaram_Gondi ; Masaram_Gondi
+ blk; Nushu ; Nushu
+ blk; Soyombo ; Soyombo
+ blk; Syriac_Sup ; Syriac_Supplement
+ blk; Zanabazar_Square ; Zanabazar_Square
+ -> add to uchar.h
+ use long property names for enum constants,
+ for the trailing comment get the block start code point: diff old & new Blocks.txt
+ -> add to UCharacter.UnicodeBlock IDs
+ Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+ replace public static final int \1_ID = \2; \3
+ -> add to UCharacter.UnicodeBlock objects
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+
+ jg ; Malayalam_Bha ; Malayalam_Bha
+ jg ; Malayalam_Ja ; Malayalam_Ja
+ jg ; Malayalam_Lla ; Malayalam_Lla
+ jg ; Malayalam_Llla ; Malayalam_Llla
+ jg ; Malayalam_Nga ; Malayalam_Nga
+ jg ; Malayalam_Nna ; Malayalam_Nna
+ jg ; Malayalam_Nnna ; Malayalam_Nnna
+ jg ; Malayalam_Nya ; Malayalam_Nya
+ jg ; Malayalam_Ra ; Malayalam_Ra
+ jg ; Malayalam_Ssa ; Malayalam_Ssa
+ jg ; Malayalam_Tta ; Malayalam_Tta
+ -> uchar.h & UCharacter.JoiningGroup
+
+ sc ; Gonm ; Masaram_Gondi
+ sc ; Nshu ; Nushu
+ sc ; Soyo ; Soyombo
+ sc ; Zanb ; Zanabazar_Square
+ -> uscript.h & com.ibm.icu.lang.UScript
+ -> Nushu had been added already
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+ and in com.ibm.icu.dev.test.lang.TestUScript.java
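
The "diff old & new .txt files" step for PropertyValueAliases.txt can be sketched as a set difference. This is a minimal illustration, assuming the semicolon-separated layout shown in the entries above; it is not part of preparseucd.py, and the function names are hypothetical.

```python
def parse_aliases(text):
    """Returns a set of (property, short value name, long value name) tuples."""
    values = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3:
            values.add((fields[0], fields[1], fields[2]))
    return values


def new_property_values(old_text, new_text):
    """Returns the sorted entries that occur only in the new file."""
    return sorted(parse_aliases(new_text) - parse_aliases(old_text))


if __name__ == "__main__":
    old = "sc ; Nshu ; Nushu\n"
    new = ("# Script (sc)\n"
           "sc ; Gonm ; Masaram_Gondi\n"
           "sc ; Nshu ; Nushu\n"
           "sc ; Soyo ; Soyombo\n")
    for prop, short_name, long_name in new_property_values(old, new):
        print("%s ; %s ; %s" % (prop, short_name, long_name))
```

Only the first three fields are compared, so additional aliases on a line are ignored in this sketch.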
+
+* New properties as shown in PropertyValueAliases.txt changes
+- boolean Emoji_Component from emoji 5
+ -> uchar.h & UProperty.java
+- boolean
+ # Regional_Indicator (RI)
+
+ RI ; N ; No ; F ; False
+ RI ; Y ; Yes ; T ; True
+ -> uchar.h & UProperty.java
+ -> single immutable range, to be hardcoded
+- boolean
+ # Prepended_Concatenation_Mark (PCM)
+
+ PCM; N ; No ; F ; False
+ PCM; Y ; Yes ; T ; True
+ -> was new in Unicode 9
+ -> uchar.h & UProperty.java
+- enumerated
+ # Vertical_Orientation (vo)
+
+ vo ; R ; Rotated
+ vo ; Tr ; Transformed_Rotated
+ vo ; Tu ; Transformed_Upright
+ vo ; U ; Upright
+ -> only pre-parsed for now, but not yet stored for runtime use
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+ (not strictly necessary for NOT_ENCODED scripts)
+ $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+ cd $ICU_ROOT/dbg/icu4c
+ bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+$ICU_SRC/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
+
+ $ICU_ROOT/dbg/tools/unicode/c$
+ cmake ../../../../src/tools/unicode/c
+ make
+
+* generate core properties data files
+ $ICU_ROOT/dbg/tools/unicode/c$
+ genprops/genprops $ICU_SRC/icu4c
+ genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
+ genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..10.0: U+2260, U+226E, U+226F
+- nothing new in this Unicode version, no test file to update
-Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
-- Adlm 166 Adlam
-- Aran 161 Arabic (Nastaliq variant)
-- Kitl 505 Khitan large script
-- Kits 288 Khitan small script
-- Marc 332 Marchen
-- Osge 219 Osage
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+* collation: CLDR collation root, UCA DUCET
-Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.
+- UCA DUCET goes into Mark's Unicode tools, see
+ https://sites.google.com/site/unicodetools/home#TOC-UCA
+- CLDR root data files are checked into $CLDR_SRC/common/uca/
+ cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
-Adlam, Marchen, and Osage are expected to go into Unicode 9;
-we should assign Unicode script property value aliases for them
-soon after Unicode 8 is released, and add them in ICU 56.
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+ cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+ cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
+ (note: the copy below drops the underscore before "Rules" in the file name)
+ cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
+- restore TODO diffs in UCARules.txt
+ meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+ from the CLDR root files (..._CLDR_..._SHORT.txt)
+ cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+ cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+ cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+ CLDR common/collation/root.xml <collation type="private-unihan">
+ and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
-Khitan scripts will be encoded later.
+- run genuca, see command line above;
+ deal with
+ Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
+ FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)
+ (add the character to genuca.cpp sampleCharsToScripts[])
+ + look up the USCRIPT_ code for the new sample characters
+ (should be obvious from the comment in the error output)
+ + *add* mappings to sampleCharsToScripts[], do not replace them
+ (in case the script sample characters flip-flop)
+ + insert new scripts in DUCET script order, see the top_byte table
+ at the beginning of FractionalUCA.txt
+- rebuild ICU4C
+
+* Unihan collators
+ https://sites.google.com/site/unicodetools/unihan
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollators
+ with VM arguments
+ -ea
+ -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
+ -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
+ -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
+ -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
+ -DUVERSION=10.0.0
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollatorFiles
+ with the same arguments
+- check CLDR diffs
+ cd $CLDR_SRC
+ meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+ meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+ cd $CLDR_SRC
+ cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+ cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- run CLDR unit tests, commit to CLDR
+- generate ICU zh collation data: run CLDR
+ org.unicode.cldr.icu.NewLdml2IcuConverter
+ with program arguments
+ -t collation
+ -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
+ -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
+ -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
+ -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
+ zh
+ and VM arguments
+ -ea
+ -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+ (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+ utility/MultithreadTest/TestCollators will fail as well;
+ fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ output:
+ ...
+ Unicode .icu files built to ./out/build/icudt60l
+ echo timestamp > uni-core-data
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
+ echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+ make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+ separate from the other data files,
+ and then refresh ICU4J
+ cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+ cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
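
The cp/rm commands above pick only the Unicode-derived files out of the freshly built big-endian data. As a rough sketch of that selection rule in Python (the function name is hypothetical; the patterns are taken from the commands above, with paths relative to the $ICUDT folder):

```python
import posixpath
from fnmatch import fnmatchcase

# Top-level files copied by the cp commands above.
_TOP_LEVEL = ("confusables.cfu", "*.icu", "*.nrm")
# Subfolders whose direct contents are copied (coll/* and brkitr/*).
_SUBFOLDERS = ("coll", "brkitr")
# cnvalias.icu is copied by *.icu and then removed again.
_EXCLUDE = ("cnvalias.icu",)


def select_unicode_data_files(relative_paths):
    """Return the relative paths that the copy step above would keep."""
    selected = []
    for path in relative_paths:
        folder, name = posixpath.split(path)
        if folder == "":
            if name in _EXCLUDE:
                continue
            if any(fnmatchcase(name, pat) for pat in _TOP_LEVEL):
                selected.append(path)
        elif folder in _SUBFOLDERS:
            selected.append(path)
    return selected
```

Everything else in the data folder, such as time zone or locale resources, is left out, which matches the intent of refreshing just the UCD/UCA-related files.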
+
+* When refreshing all of ICU4J data from ICU4C
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
+or
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
+
+* update CollationFCD.java
+ + copy & paste the initializers of lcccIndex[] etc. from
+ ICU4C/source/i18n/collationfcd.cpp to
+ ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd $ICU_SRC/icu4c/source/data/unidata
+ cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd ../../test/testdata
+ cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** CLDR numbering systems
+- look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
+ Unicode 10: http://unicode.org/cldr/trac/ticket/10219
+ Unicode 9: http://unicode.org/cldr/trac/ticket/9692
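
One way to list candidate digit sets is to scan for characters with gc=Nd and nv=4, as described above. The following Python sketch does this with the standard unicodedata module. Note the assumption: it reflects the Unicode version built into the Python interpreter (unicodedata.unidata_version), which is probably not the version being integrated, so it only illustrates the query and does not replace a check against the new UCD files.

```python
import unicodedata


def decimal_digit_fours():
    """Return code points with General_Category=Nd and a decimal value of 4."""
    result = []
    for cp in range(0x110000):
        ch = chr(cp)
        if (unicodedata.category(ch) == "Nd"
                and unicodedata.decimal(ch, None) == 4):
            result.append(cp)
    return result


if __name__ == "__main__":
    print("Unicode", unicodedata.unidata_version)
    for cp in decimal_digit_fours():
        # Each hit stands for one set of ten decimal digits.
        print("U+%04X %s" % (cp, unicodedata.name(chr(cp), "?")))
```

Comparing the output of two interpreter versions (or of a tool built against the new data) shows which digit sets are new.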
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+ instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools are checked in:
+ http://www.unicode.org/utility/trac/log/trunk/unicodetools
+
+---------------------------------------------------------------------------- ***
+
+Emoji 5.0 update for ICU 59
+- ICU 59 mostly remains on Unicode 9.0
+- except that it updates bidi and segmentation data to Unicode 10 beta
+
+First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
+
+* Command-line environment setup
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
+ICUDT=icudt59b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
+UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
+
+*** ICU Trac
+
+- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
+- changes directly on trunk
+
+*** data files & enums & parser code
+
+* download files
+
+- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
+- download emoji 5.0 beta files into the same uni90e50 folder
+- download Unicode 10.0 beta files: ucd
+ + copy Unicode 10 bidi files to the uni90e50/ucd folder:
+ BidiBrackets.txt
+ BidiCharacterTest.txt
+ BidiMirroring.txt
+ BidiTest.txt
+ extracted/DerivedBidiClass.txt
+ + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
+ LineBreak.txt
+ auxiliary/*
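
The selective copy of Unicode 10 bidi and segmentation files into the combined folder could be scripted roughly as follows. This is only a sketch: the function name and the two folder arguments are hypothetical, and the file list is the one given above.

```python
import glob
import os
import shutil

# Unicode 10 files that replace their Unicode 9 counterparts (list above).
BIDI_FILES = [
    "BidiBrackets.txt",
    "BidiCharacterTest.txt",
    "BidiMirroring.txt",
    "BidiTest.txt",
    "extracted/DerivedBidiClass.txt",
]
SEGMENTATION_PATTERNS = ["LineBreak.txt", "auxiliary/*"]


def copy_newer_files(src_ucd, dst_ucd):
    """Copy the listed files from src_ucd to dst_ucd; return the relative paths."""
    copied = []
    for pattern in BIDI_FILES + SEGMENTATION_PATTERNS:
        for src in sorted(glob.glob(os.path.join(src_ucd, pattern))):
            if not os.path.isfile(src):
                continue
            rel = os.path.relpath(src, src_ucd)
            dst = os.path.join(dst_ucd, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copyfile(src, dst)
            copied.append(rel.replace(os.sep, "/"))
    return copied
```

Files that are not on the list stay at their Unicode 9.0 versions in the destination folder.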
+
+* preparseucd.py changes
+- adjust for combined trunks
+- write new copyright lines
+- ignore new Emoji_Component property for now
+
+* process and/or copy files
+- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
+ + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+
+- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
+
+ ~/svn.icu/trunk/dbg/tools/unicode/c$
+ cmake ../../../../src/tools/unicode/c
+ make
+
+* generate core properties data files
+ ~/svn.icu/trunk/dbg/tools/unicode/c$
+ genprops/genprops $ICU4C_SRC_DIR
+- rebuild ICU (make install) & tools
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ output:
+ ...
+ Unicode .icu files built to ./out/build/icudt59l
+ echo timestamp > uni-core-data
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
+ echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+ make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+ separate from the other data files,
+ and then refresh ICU4J
+ cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+ cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
+or
+- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd $ICU4C_SRC_DIR/source/data/unidata
+ cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd ../../test/testdata
+ cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+---------------------------------------------------------------------------- ***
+
+Unicode 9.0 update for ICU 58
+
+* Command-line environment setup
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICUDT=icudt58b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+UNIDATA=$ICU_SRC_DIR/source/data/unidata
+
+http://www.unicode.org/review/pri323/ -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/versions/beta-9.0.0.html
+http://www.unicode.org/versions/Unicode9.0.0/
+http://www.unicode.org/reports/tr44/tr44-17.html
+
+*** ICU Trac
+
+- ticket:12526: integrate Unicode 9
+- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
+- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
+
+*** CLDR Trac
+
+- cldrbug 9414: UCA 9
+- ^/branches/markus/uni90 at r11518 from trunk at r11517
+
+- cldrbug 8745: Unicode 9.0 script metadata
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+ so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+ includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- only for manual diffs: remove version suffixes from the file names
+ ~/unidata/uni70/20140403$ ../../desuffixucd.py .
+ (see https://sites.google.com/site/unicodetools/inputdata)
+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+
+- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
+ and copy to $UNIDATA
+ cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
+
+* preparseucd.py changes
+- remove or add new Unicode scripts from/to the
+ only-in-ISO-15924 list according to the error messages:
+ ValueError: remove ['Tang'] from _scripts_only_in_iso15924
+ ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
+ ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
+ ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+ and in com.ibm.icu.dev.test.lang.TestUScript.java
+- DerivedNumericValues.txt new numeric values
+ 0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
+ 0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
+ 0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
+ 0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
+ 0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
+ -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
+ uchar.c, UCharacterProperty.java
+ to support a new series of values
+- adjust preparseucd.py for Tangut algorithmic names
+ in ppucd.txt:
+ algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
+ ->
+ algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
+- avoid block-compressing most String/Miscellaneous property values,
+ triggered by genprops not coping with a multi-code point Case_Folding on
+ block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
+ keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
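
The new Malayalam fractions listed above carry both a decimal and a rational value. A small Python sketch (a hypothetical helper, not part of preparseucd.py) that parses such DerivedNumericValues.txt lines and checks that the two fields agree:

```python
from fractions import Fraction


def parse_numeric_value(line):
    """Parse 'cp ; decimal ; ; rational # comment' into (code point, Fraction).

    Handles single code points only, not 'start..end' ranges.
    """
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    cp = int(fields[0], 16)
    rational = Fraction(fields[3])
    # The decimal field is redundant; use it as a consistency check.
    if abs(float(fields[1]) - float(rational)) > 1e-9:
        raise ValueError("inconsistent numeric value for U+%04X" % cp)
    return cp, rational
```

Denominators such as 160, 40 and 80 are what required the new series of encoded values mentioned above; the sketch does not model the ICU encoding itself.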
+
+* PropertyAliases.txt changes
+- 1 new property PCM=Prepended_Concatenation_Mark
+ Ignore: Only useful for layout engines.
+ Ok to list in ppucd.txt.
+
+* PropertyValueAliases.txt new property values
+ blk; Adlam ; Adlam
+ blk; Bhaiksuki ; Bhaiksuki
+ blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
+ blk; Glagolitic_Sup ; Glagolitic_Supplement
+ blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
+ blk; Marchen ; Marchen
+ blk; Mongolian_Sup ; Mongolian_Supplement
+ blk; Newa ; Newa
+ blk; Osage ; Osage
+ blk; Tangut ; Tangut
+ blk; Tangut_Components ; Tangut_Components
+ -> add to uchar.h
+ use long property names for enum constants
+ -> add to UCharacter.UnicodeBlock IDs
+ Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+ replace public static final int \1_ID = \2; \3
+ -> add to UCharacter.UnicodeBlock objects
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
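
The two Eclipse find/replace steps for UnicodeBlock above can also be reproduced outside the IDE. A sketch using Python's re module with the same patterns; the function name is hypothetical, and the input is assumed to be one uchar.h-style enum line followed by a comment:

```python
import re

# Same patterns as the Eclipse find/replace above; \g<1> is Python's
# unambiguous spelling of the \1 back-reference.
_ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
_ID_REPLACE = r"public static final int \g<1>_ID = \g<2>; \g<3>"
_OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
_OBJ_REPLACE = (r'public static final UnicodeBlock \g<1> = '
                r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>')


def to_java(c_enum_line):
    """Return (ID constant line, UnicodeBlock object line) for one C enum line."""
    return (_ID_FIND.sub(_ID_REPLACE, c_enum_line),
            _OBJ_FIND.sub(_OBJ_REPLACE, c_enum_line))
```

For an illustrative input such as `UBLOCK_ADLAM = 263, /*[1E900]*/`, the first result is the `ADLAM_ID` int constant and the second is the matching `UnicodeBlock` object declaration, each keeping the trailing comment.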
+
+ GCB; EB ; E_Base
+ GCB; EBG ; E_Base_GAZ
+ GCB; EM ; E_Modifier
+ GCB; GAZ ; Glue_After_Zwj
+ GCB; ZWJ ; ZWJ
+ -> uchar.h & UCharacter.GraphemeClusterBreak
+
+ jg ; African_Feh ; African_Feh
+ jg ; African_Noon ; African_Noon
+ jg ; African_Qaf ; African_Qaf
+ -> uchar.h & UCharacter.JoiningGroup
+
+ lb ; EB ; E_Base
+ lb ; EM ; E_Modifier
+ lb ; ZWJ ; ZWJ
+ -> uchar.h & UCharacter.LineBreak
+
+ sc ; Adlm ; Adlam
+ sc ; Bhks ; Bhaiksuki
+ sc ; Marc ; Marchen
+ sc ; Newa ; Newa
+ sc ; Osge ; Osage
+ sc ; Tang ; Tangut
+ -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
+
+ WB ; EB ; E_Base
+ WB ; EBG ; E_Base_GAZ
+ WB ; EM ; E_Modifier
+ WB ; GAZ ; Glue_After_Zwj
+ WB ; ZWJ ; ZWJ
+ -> uchar.h & UCharacter.WordBreak
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+ (not strictly necessary for NOT_ENCODED scripts)
+ ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+ cd $ICU_ROOT/dbg
+ bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
+ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
+ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
+ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
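
Each binary .nrm file above is built from a cumulative list of input files. The following Python sketch only assembles the same gennorm2 argument lists from a table, using the options that appear in the commands above; the function name and the two directory parameters are hypothetical, the --csource variant is left out, and nothing is executed.

```python
# Output file -> input files, as in the commands above.
NORM_DATA = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Return one argv list per binary .nrm file."""
    return [
        ["bin/gennorm2", "-o", src_data_in + "/" + out,
         "-s", unidata + "/norm2"] + inputs
        for out, inputs in NORM_DATA
    ]
```

Keeping the table in one place makes the layering visible: nfkc.nrm adds nfkc.txt on top of nfc.txt, and nfkc_cf.nrm adds nfkc_cf.txt on top of both.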
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+ # Location (--prefix) of where ICU was installed.
+ set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
+ # Location of the ICU source tree.
+ set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
+
+ ~/svn.icutools/trunk/dbg/unicode/c$
+ cmake ../../../src/unicode/c
+ make
+
+* generate core properties data files
+ ~/svn.icutools/trunk/dbg/unicode/c$
+ genprops/genprops $ICU_SRC_DIR
+ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
+ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..9.0: U+2260, U+226E, U+226F
+- nothing new in 9.0, no test file to update
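
The grep step above can be sketched in Python as follows. The field layout assumed here (code point or range, then status, separated by semicolons, with # comments) follows the usual UCD file conventions; the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges above ASCII with status disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start = int(start, 16)
        end = int(end, 16) if end else start
        if end > 0x7F:
            yield start, end
```

Any range reported beyond U+2260, U+226E and U+226F would be a candidate for new test cases in uts46test.cpp and UTS46Test.java.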
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+ https://sites.google.com/site/unicodetools/home#TOC-UCA
+- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
+ cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
+
+- cd (CLDR UCA branch)/common/uca/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+ cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+ cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
+ cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+ (note that the target file name drops the underscore before "Rules")
+- restore TODO diffs in UCARules.txt
+ meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+ from the CLDR root files (..._CLDR_..._SHORT.txt)
+ cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+ cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+ cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+ CLDR common/collation/root.xml <collation type="private-unihan">
+ and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
+
+- run genuca, see command line above;
+ deal with
+ Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
+ FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
+ (add the character to genuca.cpp sampleCharsToScripts[])
+ + look up the USCRIPT_ code for the new sample characters
+ (should be obvious from the comment in the error output)
+ + *add* mappings to sampleCharsToScripts[], do not replace them
+ (in case the script sample characters flip-flop)
+ + insert new scripts in DUCET script order, see the top_byte table
+ at the beginning of FractionalUCA.txt
+- rebuild ICU4C
+
+* Unihan collators
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollators
+ with VM arguments
+ -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
+ -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
+ -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
+ -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
+ -DUVERSION=9.0.0
+ -ea
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollatorFiles
+ with the same arguments
+- check CLDR diffs
+ cd ~/svn.cldr/trunk
+ meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+ meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+ cd ~/svn.cldr/trunk
+ cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+ cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- commit to CLDR
+- generate ICU zh collation data: run CLDR
+ org.unicode.cldr.icu.NewLdml2IcuConverter
+ with program arguments
+ -t collation
+ -s /home/mscherer/svn.cldr/trunk/common/collation
+ -m /home/mscherer/svn.cldr/trunk/common/supplemental
+ -d /home/mscherer/svn.icu/trunk/src/source/data/coll
+ -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
+ zh
+ and VM arguments
+ -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+ (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+ utility/MultithreadTest/TestCollators will fail as well;
+ fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ output:
+ ...
+ Unicode .icu files built to ./out/build/icudt58l
+ echo timestamp > uni-core-data
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
+ echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+ make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
+- copy the big-endian Unicode data files to another location,
+ separate from the other data files,
+ and then refresh ICU4J
+ cd ~/svn.icu/trunk/dbg/data/out/icu4j
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+ cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+* update CollationFCD.java
+ + copy & paste the initializers of lcccIndex[] etc. from
+ ICU4C/source/i18n/collationfcd.cpp to
+ ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd $ICU_SRC_DIR/source/data/unidata
+ cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd ../../test/testdata
+ cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** LayoutEngine script information
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+ This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+ in the working directory.
+
+ (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+ It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
+ (a plain text file)
+ which maps ICU versions to the numbers of script/language constants
+ that were added then.
+ (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
+
+ The generated files have a current copyright date and "@deprecated" statement.
+
+* Review changes, fix Java tool if necessary, and copy to ICU4C
+ cd ~/svn.icu4j/trunk/src
+ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+ instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools & ICU tools are checked in
+ http://www.unicode.org/utility/trac/log/trunk/unicodetools
+ http://bugs.icu-project.org/trac/log/tools/trunk
+
+---------------------------------------------------------------------------- ***
+
+New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
+
+Adding
+- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
+- new combination/alias codes: Hanb, Jamo
+ - used in CLDR 29 and in spoof checker
+- new Z* code: Zsye
+
+Add new codes to uscript.h & UScript.java, see Unicode update logs.
+ -> com.ibm.icu.lang.UScript
+ find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+ replace public static final int \1 = \2; \3
+
+Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
+add new script codes.
+"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
+
+Note: If we have to run preparseucd.py again before the Unicode 9 update,
+then we need to manually keep/restore the new script codes.
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICUDT=icudt57b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+UNIDATA=$ICU_SRC_DIR/source/data/unidata
+
+Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
+see http://bugs.icu-project.org/trac/ticket/12141
+
+make install, then icutools cmake & make, then
+~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
+
+Generate Java data as usual, only update pnames.icu & uprops.icu.
+
+*** LayoutEngine script information
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+ This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+ in the working directory.
+
+ (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+ It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
+ (a plain text file)
+ which maps ICU versions to the numbers of script/language constants
+ that were added then.
+ (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
+
+ The generated files have a current copyright date and "@deprecated" statement.
+
+* Review changes, fix Java tool if necessary, and copy to ICU4C
+ cd ~/svn.icu4j/trunk/src
+ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
+ cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
---------------------------------------------------------------------------- ***