---------------------------------------------------------------------------- ***
+Unicode 13.0 update for ICU 66
+
+https://www.unicode.org/versions/Unicode13.0.0/
+https://www.unicode.org/versions/beta-13.0.0.html
+https://www.unicode.org/Public/13.0.0/ucd/
+https://www.unicode.org/reports/uax-proposed-updates.html
+https://www.unicode.org/reports/tr44/tr44-25.html
+
+https://unicode-org.atlassian.net/browse/CLDR-13387
+https://unicode-org.atlassian.net/browse/ICU-20893
+
+* Command-line environment setup
+
+UNICODE_DATA=~/unidata/uni13/20200212
+CLDR_SRC=~/cldr/uni/src
+ICU_ROOT=~/icu/uni
+ICU_SRC=$ICU_ROOT/src
+ICUDT=icudt66b
+ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
+ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+ so that the makefiles see the new version number.
+ cd $ICU_ROOT/dbg/icu4c
+ ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
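+- for reference, the uchar.h part of the version bump is a single line (a sketch,
+ shown for orientation; ICU4J's VersionInfo & UCharacterTest and makedata.mak
+ get matching version updates):
+ // icu4c/source/common/unicode/uchar.h
+ #define U_UNICODE_VERSION "13.0"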
+
+*** data files & enums & parser code
+
+* download files
+- mkdir -p $UNICODE_DATA
+- download Unicode files into $UNICODE_DATA
+ + subfolders: emoji, idna, security, ucd, uca
+ + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+ + split Unihan into single-property files
+ ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
+ + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
+ or from the ucd/cldr/ output folder of the Unicode Tools
+ (since Unicode 12/CLDR 35/ICU 64, CLDR uses modified grapheme break rules):
+ cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
+
+* for manual diffs and for Unicode Tools input data updates:
+ remove version suffixes from the file names
+ ~$ unidata/desuffixucd.py $UNICODE_DATA
+ (see https://sites.google.com/site/unicodetools/inputdata)
+
+* process and/or copy files
+- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
+ + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+ + For debugging, and tweaking how ppucd.txt is written,
+ the tool has an --only_ppucd option:
+ py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
+
+- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
+
+* new constants for new property values
+- preparseucd.py error:
+ ValueError: missing uchar.h enum constants for some property values:
+ [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
+ u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
+ (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
+ (u'InPC', set([u'Top_And_Bottom_And_Left']))]
+ = PropertyValueAliases.txt new property values (diff old & new .txt files)
+ blk; Chorasmian ; Chorasmian
+ blk; CJK_Ext_G ; CJK_Unified_Ideographs_Extension_G
+ blk; Dives_Akuru ; Dives_Akuru
+ blk; Khitan_Small_Script ; Khitan_Small_Script
+ blk; Lisu_Sup ; Lisu_Supplement
+ blk; Symbols_For_Legacy_Computing ; Symbols_For_Legacy_Computing
+ blk; Tangut_Sup ; Tangut_Supplement
+ blk; Yezidi ; Yezidi
+ -> add to uchar.h before UBLOCK_COUNT (see the header sketch at the end of this step);
+ use the long property names for the enum constants;
+ for the trailing comment, get the block start code point: diff old & new Blocks.txt
+ -> add to UCharacter.UnicodeBlock IDs
+ Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+ replace public static final int \1_ID = \2; \3
+ -> add to UCharacter.UnicodeBlock objects
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+
+ sc ; Chrs ; Chorasmian
+ sc ; Diak ; Dives_Akuru
+ sc ; Kits ; Khitan_Small_Script
+ sc ; Yezi ; Yezidi
+ -> uscript.h & com.ibm.icu.lang.UScript
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+ and in com.ibm.icu.dev.test.lang.TestUScript.java
+
+ InPC; Top_And_Bottom_And_Left ; Top_And_Bottom_And_Left
+ -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
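+
+ As an illustration, the header additions end up looking roughly like the sketch
+ below (not copied from the final headers: the numeric values simply continue the
+ existing enums and must be verified against the current UBLOCK_COUNT and
+ USCRIPT_CODE_LIMIT, and every new constant gets an "@stable ICU 66" API tag
+ like its neighbors):
+
+ // uchar.h, enum UBlockCode -- new blocks, inserted before UBLOCK_COUNT
+ UBLOCK_CHORASMIAN = 301, /*[10FB0]*/
+ UBLOCK_CJK_UNIFIED_IDEOGRAPHS_EXTENSION_G = 302, /*[30000]*/
+ UBLOCK_DIVES_AKURU = 303, /*[11900]*/
+ UBLOCK_KHITAN_SMALL_SCRIPT = 304, /*[18B00]*/
+ UBLOCK_LISU_SUPPLEMENT = 305, /*[11FB0]*/
+ UBLOCK_SYMBOLS_FOR_LEGACY_COMPUTING = 306, /*[1FB00]*/
+ UBLOCK_TANGUT_SUPPLEMENT = 307, /*[18D00]*/
+ UBLOCK_YEZIDI = 308, /*[10E80]*/
+
+ // uscript.h, enum UScriptCode -- new script codes
+ USCRIPT_CHORASMIAN = 189, /* Chrs */
+ USCRIPT_DIVES_AKURU = 190, /* Diak */
+ USCRIPT_KHITAN_SMALL_SCRIPT = 191, /* Kits */
+ USCRIPT_YEZIDI = 192, /* Yezi */
+
+ // uchar.h, enum UIndicPositionalCategory -- appended at the end
+ U_INPC_TOP_AND_BOTTOM_AND_LEFT,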
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+ (not strictly necessary for NOT_ENCODED scripts)
+ $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
+
+* build ICU (make install)
+ to make sure that there are no syntax errors, and
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
+
+* update spoof checker UnicodeSet initializers:
+ inclusionPat & recommendedPat in i18n/uspoof.cpp
+ INCLUSION & RECOMMENDED in SpoofChecker.java
+- make sure that the Unicode Tools tree contains the latest security data files
+- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
+- update the hardcoded version number there in the DIRECTORY path
+- run the tool (no special environment variables needed)
+- copy & paste from the Console output into the .cpp & .java files
+
+* generate normalization data files
+ cd $ICU_ROOT/dbg/icu4c
+ bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+ bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+ so that the tools build can pick up the new definitions from the installed header files.
+
+ $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+$ICU_SRC/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
+
+ $ICU_ROOT/dbg$
+ mkdir -p tools/unicode/c
+ cd tools/unicode/c
+
+ $ICU_ROOT/dbg/tools/unicode/c$
+ cmake ../../../../src/tools/unicode/c
+ make
+
+* generate core properties data files
+ $ICU_ROOT/dbg/tools/unicode/c$
+ genprops/genprops $ICU_SRC/icu4c
+- tool failure:
+ genprops: Script_Extensions indexes overflow bit field
+ genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
+ -> uprops.icu data file format:
+ add two more bits to store a script code or Script_Extensions index
+ -> generator code, C++ & Java runtime, uprops.icu format version 7.7
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..13.0: U+2260, U+226E, U+226F
+- nothing new in this Unicode version, no test file to update
+
+* run & fix ICU4C tests
+- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
+- Andy helps with RBBI & spoof check test failures
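+- before digging into individual failures, a quick hand-rolled sanity check against
+ the rebuilt library can confirm that headers and data agree; a minimal sketch using
+ public ICU4C API (the file name and the idea of a standalone check are assumptions,
+ not part of the regular procedure):
+
+ // quickcheck.cpp -- spot-check the rebuilt ICU4C for Unicode 13 data.
+ #include <cstdio>
+ #include "unicode/uchar.h"
+ #include "unicode/uscript.h"
+
+ int main() {
+     UVersionInfo v;
+     u_getUnicodeVersion(v);  // expect 13.0.0.0
+     std::printf("Unicode data version: %d.%d.%d\n", v[0], v[1], v[2]);
+
+     UErrorCode errorCode = U_ZERO_ERROR;
+     bool ok = ublock_getCode(0x10FB0) == UBLOCK_CHORASMIAN &&
+               uscript_getScript(0x10FB0, &errorCode) == USCRIPT_CHORASMIAN &&
+               U_SUCCESS(errorCode);
+     std::printf("Chorasmian block/script check: %s\n", ok ? "OK" : "FAIL");
+     return ok ? 0 : 1;
+ }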
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+ https://sites.google.com/site/unicodetools/home#TOC-UCA
+ diff the main mapping file, look for bad changes
+ (for example, more bytes per weight for common characters)
+ ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
+ ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt
+
+- CLDR root data files are checked into $CLDR_SRC/common/uca/
+ cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
+
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+ cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+ cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
+ (note removing the underscore before "Rules")
+ cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
+- restore TODO diffs in UCARules.txt
+ meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+ from the CLDR root files (..._CLDR_..._SHORT.txt)
+ cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+ cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+ cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+ CLDR common/collation/root.xml <collation type="private-unihan">
+ and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
+
+- run genuca
+ $ICU_ROOT/dbg/tools/unicode/c$
+ genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
+ genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
+- rebuild ICU4C
+
+* Unihan collators
+ https://sites.google.com/site/unicodetools/unihan
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollators
+ with VM arguments
+ -ea
+ -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
+ -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
+ -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
+ -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
+ -DUVERSION=13.0.0
+- run Unicode Tools
+ org.unicode.draft.GenerateUnihanCollatorFiles
+ with the same arguments
+- check CLDR diffs
+ cd $CLDR_SRC
+ meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+ meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+ cd $CLDR_SRC
+ cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+ cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- run CLDR unit tests, commit to CLDR
+- generate ICU zh collation data: run CLDR
+ org.unicode.cldr.icu.NewLdml2IcuConverter
+ with program arguments
+ -t collation
+ -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
+ -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
+ -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
+ -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
+ zh
+ and VM arguments
+ -ea
+ -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+ (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+ utility/MultithreadTest/TestCollators will fail as well;
+ fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh only the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ output:
+ ...
+ make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
+ mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
+ mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+ jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+ make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+ separate from the other data files,
+ and then refresh ICU4J
+ cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+ cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+ cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+ cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+ jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
+or
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
+
+* update CollationFCD.java
+ + copy & paste the initializers of lcccIndex[] etc. from
+ ICU4C/source/i18n/collationfcd.cpp to
+ ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd $ICU_SRC/icu4c/source/data/unidata
+ cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cd ../../test/testdata
+ cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+ cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** CLDR numbering systems
+- look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR;
+ for example, grep for the digit "4" of each set
+ ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
+ and check which matches fall into new blocks (Blocks.txt)
+ Unicode 13:
+ diak 11950..11959 Dives_Akuru
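+- to double-check a candidate range before editing CLDR, the digit values can also
+ be queried from the rebuilt ICU4C; a small illustrative sketch (not part of the
+ usual procedure):
+
+ // digits.cpp -- confirm U+11950..U+11959 (Dives Akuru) are decimal digits 0..9.
+ #include <cstdio>
+ #include "unicode/uchar.h"
+
+ int main() {
+     for (UChar32 c = 0x11950; c <= 0x11959; ++c) {
+         std::printf("U+%04X gc=Nd? %d value=%d\n",
+                     static_cast<unsigned>(c),
+                     u_charType(c) == U_DECIMAL_DIGIT_NUMBER,
+                     u_charDigitValue(c));
+     }
+     return 0;
+ }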
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+ instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools are checked in:
+ http://www.unicode.org/utility/trac/log/trunk/unicodetools
+
+---------------------------------------------------------------------------- ***
+
Unicode 12.1 update for ICU 64.2
** This is an abbreviated update with one new character for the new