diff --git a/icuSources/data/unidata/changes.txt b/icuSources/data/unidata/changes.txt

index 5e88414ddcb7554700b22a0318bd0925d2f4e96a..8ace8ca2510ab574948bf76bc520c810ae9ada23 100644
--- a/icuSources/data/unidata/changes.txt
+++ b/icuSources/data/unidata/changes.txt
@@ -1,4 +1,6 @@
-* Copyright (C) 2004-2012, International Business Machines
+* Copyright (C) 2016 and later: Unicode, Inc. and others.
+* License & terms of use: http://www.unicode.org/copyright.html
+* Copyright (C) 2004-2016, International Business Machines
  * Corporation and others.  All Rights Reserved.
  *
  *   file name:  changes.txt
@@ -10,6 +12,2232 @@
  *   created by: Markus W. Scherer
  *
  * change log for Unicode updates
+*
+* For each new Unicode version, during the beta period,
+* I copy the change log for the previous version to the top of this file.
+* I adjust the versions, tickets, URLs, and paths.
+* I work my way through the steps listed in the log, top to bottom,
+* adjusting the log as necessary.
+* I report problems to the UTC and/or CLDR and/or ICU.
+* Before the data is final, I "turn the crank" several more times,
+* using appropriate subsets of the steps.
+
+---------------------------------------------------------------------------- ***
+
+* New ISO 15924 script codes
+
+Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
+until they are encoded in Unicode,
+or can be assumed to be encoded in the next Unicode version.
+Script enum constant names should follow the Unicode script property value aliases,
+which are assigned only when the scripts are encoded.
+When we add constants early and guess wrong, we end up with confusing enum constants
+and have sometimes had to add aliases.
+
+Variant script codes like Latf and Aran that are not subject to separate encoding
+can be added at any time.
+(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
+
+We add script codes used in CLDR or in the spoof checker.
+This includes combination/alias codes like Hanb and Jamo.
+See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
+and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
+
+We add special Z* script codes like Zsye.
+
+For new script codes see http://www.unicode.org/iso15924/codechanges.html
+
+---------------------------------------------------------------------------- ***
+
+Unicode 11.0 update for ICU 62
+
+http://www.unicode.org/versions/Unicode11.0.0/
+http://unicode.org/versions/beta-11.0.0.html
+https://www.unicode.org/review/pri372/
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/reports/tr44/tr44-21.html
+
+* Command-line environment setup
+
+UNICODE_DATA=~/unidata/uni11/20180521
+CLDR_SRC=~/svn.cldr/uni
+ICU_ROOT=~/svn.icu/uni
+ICU_SRC=$ICU_ROOT/src
+ICUDT=icudt61b
+ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
+ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
+
+*** ICU Trac
+
+- ticket:13630: Unicode 11
+- ^/branches/markus/uni11
+
+*** CLDR Trac
+
+- cldrbug 10978: Unicode 11
+- ^/branches/markus/uni11
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* download files
+- mkdir -p $UNICODE_DATA
+- download Unicode files into $UNICODE_DATA
+  + subfolders: emoji, idna, security, ucd, uca
+  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+
+* for manual diffs and for Unicode Tools input data updates:
+  remove version suffixes from the file names
+    ~$ unidata/desuffixucd.py $UNICODE_DATA
+  (see https://sites.google.com/site/unicodetools/inputdata)
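The renaming step above can be sketched as follows. This is not the actual desuffixucd.py; it assumes that version suffixes look like "-11.0.0" or "-11.0.0d5" directly before the file extension, and the helper name strip_version_suffix is hypothetical.

```python
import re

# Assumed suffix shape: "-<major>.<minor>[.<update>]" plus an optional draft
# marker such as "d5", located right before the file extension.
_VERSION_SUFFIX = re.compile(r"-\d+(?:\.\d+)+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name):
    """Return file_name without a trailing Unicode version suffix."""
    return _VERSION_SUFFIX.sub("", file_name)
```

Names without such a suffix pass through unchanged, so the helper could be applied to every file in a folder before diffing two data drops.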
+
+* process and/or copy files
+- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
+  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+  + For debugging, and tweaking how ppucd.txt is written,
+    the tool has an --only_ppucd option:
+    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
+
+- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* preparseucd.py changes
+- fix errors
+    NameError: unknown property Extended_Pictographic
+  -> add Extended_Pictographic binary property
+  -> add new short names for all Emoji properties
+
+* new constants for new property values
+- preparseucd.py error:
+    ValueError: missing uchar.h enum constants for some property values:
+    [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
+                   u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
+                   u'Indic_Siyaq_Numbers'])),
+     (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
+     (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
+     (u'GCB', set([u'LinkC', u'Virama'])),
+     (u'WB', set([u'WSegSpace']))]
+  = PropertyValueAliases.txt new property values (diff old & new .txt files)
+    blk; Chess_Symbols                    ; Chess_Symbols
+    blk; Dogra                            ; Dogra
+    blk; Georgian_Ext                     ; Georgian_Extended
+    blk; Gunjala_Gondi                    ; Gunjala_Gondi
+    blk; Hanifi_Rohingya                  ; Hanifi_Rohingya
+    blk; Indic_Siyaq_Numbers              ; Indic_Siyaq_Numbers
+    blk; Makasar                          ; Makasar
+    blk; Mayan_Numerals                   ; Mayan_Numerals
+    blk; Medefaidrin                      ; Medefaidrin
+    blk; Old_Sogdian                      ; Old_Sogdian
+    blk; Sogdian                          ; Sogdian
+  -> add to uchar.h
+    use long property names for enum constants,
+    for the trailing comment get the block start code point: diff old & new Blocks.txt
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+
+    GCB; LinkC                            ; LinkingConsonant
+    GCB; Virama                           ; Virama
+  -> uchar.h & UCharacter.GraphemeClusterBreak
+  -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
+
+    InSC; Consonant_Initial_Postfixed     ; Consonant_Initial_Postfixed
+  -> ignore: ICU does not yet support this property
+
+    jg ; Hanifi_Rohingya_Kinna_Ya         ; Hanifi_Rohingya_Kinna_Ya
+    jg ; Hanifi_Rohingya_Pa               ; Hanifi_Rohingya_Pa
+  -> uchar.h & UCharacter.JoiningGroup
+
+    sc ; Dogr                             ; Dogra
+    sc ; Gong                             ; Gunjala_Gondi
+    sc ; Maka                             ; Makasar
+    sc ; Medf                             ; Medefaidrin
+    sc ; Rohg                             ; Hanifi_Rohingya
+    sc ; Sogd                             ; Sogdian
+    sc ; Sogo                             ; Old_Sogdian
+  -> uscript.h & com.ibm.icu.lang.UScript
+  -> Nushu had been added already
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+
+    WB ; WSegSpace                        ; WSegSpace
+  -> uchar.h & UCharacter.WordBreak
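The two Eclipse find/replace steps for UCharacter.UnicodeBlock in the block list above can also be scripted. The sketch below applies the same regular expressions with Python's re module; the function names are hypothetical, and the enum value in the usage note is illustrative only.

```python
import re

# Same "find" pattern as in the Eclipse step for the block IDs.
# Group 1: block name, group 2: numeric value, group 3: trailing comment.
_UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_block_id(c_line):
    """Turn a uchar.h UBLOCK_ enum line into a Java *_ID constant line."""
    return _UBLOCK_LINE.sub(r"public static final int \1_ID = \2; \3", c_line)


def to_java_block_object(c_line):
    """Turn a uchar.h UBLOCK_ enum line into a Java UnicodeBlock object line."""
    # The Eclipse pattern for this step leaves the number uncaptured, so its
    # comment is \2 there; here the shared pattern makes the comment \3.
    return _UBLOCK_LINE.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        c_line)
```

For example, to_java_block_id("UBLOCK_DOGRA = 282, /*[11800]*/") yields a line declaring DOGRA_ID with the same value and comment.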
+
+* New short names for emoji properties
+- see UTS #51
+- short names set in preparseucd.py
+
+* New properties
+- boolean emoji property Extended_Pictographic
+  -> added in preparseucd.py
+  -> uchar.h & UProperty.java
+- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
+  as shown in PropertyValueAliases.txt
+  -> ignore for now
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+    (not strictly necessary for NOT_ENCODED scripts)
+  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
+
+* update spoof checker UnicodeSet initializers:
+    inclusionPat & recommendedPat in uspoof.cpp
+    INCLUSION & RECOMMENDED in SpoofChecker.java
+- make sure that the Unicode Tools tree contains the latest security data files
+- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
+- update the hardcoded version number there in the DIRECTORY path
+- run the tool (no special environment variables needed)
+- copy & paste from the Console output into the .cpp & .java files
+
+* generate normalization data files
+  cd $ICU_ROOT/dbg/icu4c
+  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
+  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
+  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
+  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
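The four .nrm invocations above differ only in their output name and in how the input files are layered. A minimal sketch that builds the same argument lists from a table; the function and table names are hypothetical, and the --csource variant is left out.

```python
# Output file -> input .txt files, in the order used in the commands above.
NORM2_INPUTS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(data_in, unidata):
    """Return one argv list per .nrm file, mirroring the log's commands."""
    commands = []
    for output, inputs in NORM2_INPUTS:
        commands.append(
            ["bin/gennorm2", "-o", f"{data_in}/{output}",
             "-s", f"{unidata}/norm2", *inputs])
    return commands
```

Each list could be handed to subprocess.run(cmd, check=True) from the ICU4C build directory; the table makes it easy to see that every file builds on nfc.txt.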
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+$ICU_SRC/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
+
+  $ICU_ROOT/dbg$
+    mkdir -p tools/unicode/c
+    cd tools/unicode/c
+
+  $ICU_ROOT/dbg/tools/unicode/c$
+    cmake ../../../../src/tools/unicode/c
+    make
+
+* generate core properties data files
+  $ICU_ROOT/dbg/tools/unicode/c$
+    genprops/genprops $ICU_SRC/icu4c
+    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
+    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
+- rebuild ICU (make install) & tools
+
+* Fix case props
+    genprops error: casepropsbuilder: too many exceptions words
+    genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
+- With the addition of Georgian Mtavruli capital letters,
+  there are now too many simple case mappings with big mapping deltas
+  that yield uncompressible exceptions.
+- Changed the data structure (now formatVersion 4):
+  added one bit for no-simple-case-folding (for Cherokee), and
+  one optional slot for a big delta (for most faraway mappings),
+  together with another bit for whether that delta is negative.
+  This makes most Cherokee & Georgian etc. case mappings compressible,
+  reducing the number of exceptions words.
+- Further changes to gain one more bit for the exceptions index,
+  for future growth. For details, see casepropsbuilder.cpp.
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..11.0: U+2260, U+226E, U+226F
+- nothing new in this Unicode version, no test file to update
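The grep step above can be approximated with a small parser. This sketch assumes the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status, then optional fields, with "#" comments); the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Collect non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]            # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        found.extend(cp for cp in range(start, end + 1) if cp > 0x7F)
    return found
```

Any code point in the result that is not yet covered by uts46test.cpp and UTS46Test.java would be a candidate for a new test case.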
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+- Errors in char.txt, word.txt, word_POSIX.txt like
+    createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET"  at line 46, column 16
+  because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
+  -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
+     not empty, just to get ICU building.
+  -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
+     and properties together with the rules that used them (GB 10, WB 14).
+  -> Andy adjusts the rule sets further to sync with
+     Unicode 11 grapheme, word, and line break spec changes.
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+    https://sites.google.com/site/unicodetools/home#TOC-UCA
+  diff the main mapping file, look for bad changes
+  (for example, more bytes per weight for common characters)
+    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
+    ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
+
+- CLDR root data files are checked into $CLDR_SRC/common/uca/
+    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
+
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
+    (note removing the underscore before "Rules")
+    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
+- restore TODO diffs in UCARules.txt
+    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  from the CLDR root files (..._CLDR_..._SHORT.txt)
+    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+  CLDR common/collation/root.xml <collation type="private-unihan">
+  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
+
+- run genuca, see command line above;
+  deal with
+    Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
+    FDD1 1180B;        [71 CC 02, 05, 05]      # Dogra first primary (compressible)
+        (add the character to genuca.cpp sampleCharsToScripts[])
+  + look up the USCRIPT_ code for the new sample characters
+    (should be obvious from the comment in the error output)
+  + *add* mappings to sampleCharsToScripts[], do not replace them
+    (in case the script sample characters flip-flop)
+  + insert new scripts in DUCET script order, see the top_byte table
+    at the beginning of FractionalUCA.txt
+- rebuild ICU4C
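When genuca reports an unknown first-primary sample character, the offending FractionalUCA.txt line already carries both the code point and the script name. A minimal sketch that pulls them out, assuming the "FDD1 <code point>; [...] # <Script> first primary" layout shown in the error above; the function name is hypothetical and this is not part of genuca.

```python
import re

# Matches lines like:
#   FDD1 1180B;  [71 CC 02, 05, 05]  # Dogra first primary (compressible)
_FIRST_PRIMARY = re.compile(
    r"^FDD1 ([0-9A-F]{4,6});\s*\[[^\]]*\]\s*#\s*(\S+) first primary")


def parse_first_primary(line):
    """Return (sample code point, script name), or None if the line differs."""
    match = _FIRST_PRIMARY.match(line.strip())
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)
```

The script name from the comment is what points to the USCRIPT_ constant to look up before adding the pair to sampleCharsToScripts[].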
+
+* Unihan collators
+    https://sites.google.com/site/unicodetools/unihan
+- run Unicode Tools
+    org.unicode.draft.GenerateUnihanCollators
+  with VM arguments
+    -ea
+    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
+    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
+    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
+    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
+    -DUVERSION=11.0.0
+- run Unicode Tools
+    org.unicode.draft.GenerateUnihanCollatorFiles
+  with the same arguments
+- check CLDR diffs
+    cd $CLDR_SRC
+    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+    cd $CLDR_SRC
+    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- run CLDR unit tests, commit to CLDR
+- generate ICU zh collation data: run CLDR
+    org.unicode.cldr.icu.NewLdml2IcuConverter
+  with program arguments
+    -t collation
+    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
+    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
+    -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
+    -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
+    zh
+  and VM arguments
+    -ea
+    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+  (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt61l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
+    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files,
+  and then refresh ICU4J
+    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
+or
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
+
+* update CollationFCD.java
+  + copy & paste the initializers of lcccIndex[] etc. from
+    ICU4C/source/i18n/collationfcd.cpp to
+    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU_SRC/icu4c/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** CLDR numbering systems
+- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
+  Unicode 11: using Unicode 11 CLDR ticket #10978
+    rohg 10D30..10D39 Hanifi_Rohingya
+    gong 11DA0..11DA9 Gunjala_Gondi
+  Earlier: CLDR tickets specific to adding new numbering systems.
+  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
+  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
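The search for decimal digit sets can be sketched with Python's unicodedata module. Filtering on nv=4 presumably serves to get exactly one hit per set of ten digits. The sketch assumes that each set is a contiguous run starting at its zero digit, and its results depend on the Unicode version bundled with the Python interpreter; the function name is hypothetical.

```python
import sys
import unicodedata


def decimal_digit_set_starts():
    """Return the code point of digit zero for every decimal digit set."""
    starts = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # gc=Nd & nv=4: one match per set of ten decimal digits.
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            starts.append(cp - 4)  # assumes contiguous digits 0..9
    return starts
```

Comparing the output of two interpreter versions with different Unicode data is one way to spot new ranges such as the rohg and gong entries listed above.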
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools are checked in:
+  http://www.unicode.org/utility/trac/log/trunk/unicodetools
+
+---------------------------------------------------------------------------- ***
+
+Unicode 10.0 update for ICU 60
+
+http://www.unicode.org/versions/Unicode10.0.0/
+http://www.unicode.org/versions/beta-10.0.0.html
+http://blog.unicode.org/2017/03/unicode-100-beta-review.html
+http://www.unicode.org/review/pri350/
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/reports/tr44/tr44-19.html
+
+* Command-line environment setup
+
+UNICODE_DATA=~/unidata/uni10/20170605
+CLDR_SRC=~/svn.cldr/uni10
+ICU_ROOT=~/svn.icu/uni10
+ICU_SRC=$ICU_ROOT/src
+ICUDT=icudt60b
+ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
+ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
+
+*** ICU Trac
+
+- ticket:12985: Unicode 10
+- ticket:13061: undo hacks from emoji 5.0 update
+- ticket:13062: add Emoji_Component property
+- ^/branches/markus/uni10
+
+*** CLDR Trac
+
+- cldrbug 10055: Unicode 10
+- cldrbug 9882: Unicode 10 script metadata
+- cldrbug 10219: numbering systems for Unicode 10
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* download files
+- mkdir -p $UNICODE_DATA
+- download Unicode 10.0 files into $UNICODE_DATA
+  + subfolders: ucd, uca, idna, security
+  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- download emoji 5.0 files into $UNICODE_DATA/emoji
+
+* for manual diffs: remove version suffixes from the file names
+  ~$ unidata/desuffixucd.py $UNICODE_DATA
+  (see https://sites.google.com/site/unicodetools/inputdata)
+
+* process and/or copy files
+- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
+  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+  + For debugging, and tweaking how ppucd.txt is written,
+    the tool has an --only_ppucd option:
+    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
+
+- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* preparseucd.py changes
+- remove or add new Unicode scripts from/to the
+  only-in-ISO-15924 list according to the error messages:
+    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
+  -> adjust _scripts_only_in_iso15924 as indicated
+- fix other errors
+    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo'] 
+  -> add vo=Vertical_Orientation to _ignored_properties
+  -> later removed again, parsing the file, even though we do not yet store data for runtime use
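The kind of consistency check behind the "remove ['Nshu']" message can be sketched as a set comparison. This is an illustration only, not the code in preparseucd.py; the function name and both parameter names are hypothetical.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise if a script listed as ISO-only has meanwhile been encoded.

    only_in_iso: set of script codes assumed to exist only in ISO 15924.
    unicode_scripts: set of script codes found in the Unicode data files.
    """
    stale = sorted(only_in_iso & unicode_scripts)
    if stale:
        raise ValueError(
            "remove %s from _scripts_only_in_iso15924" % stale)
```

With Nshu present in both sets, the call raises a ValueError whose text matches the shape of the message quoted above.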
+
+* new constants for new property values
+- preparseucd.py error:
+    ValueError: missing uchar.h enum constants for some property values:
+    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
+                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
+     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
+                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
+                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
+     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
+  = PropertyValueAliases.txt new property values (diff old & new .txt files)
+    blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
+    blk; Kana_Ext_A                       ; Kana_Extended_A
+    blk; Masaram_Gondi                    ; Masaram_Gondi
+    blk; Nushu                            ; Nushu
+    blk; Soyombo                          ; Soyombo
+    blk; Syriac_Sup                       ; Syriac_Supplement
+    blk; Zanabazar_Square                 ; Zanabazar_Square
+  -> add to uchar.h
+    use long property names for enum constants,
+    for the trailing comment get the block start code point: diff old & new Blocks.txt
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+
+    jg ; Malayalam_Bha                    ; Malayalam_Bha
+    jg ; Malayalam_Ja                     ; Malayalam_Ja
+    jg ; Malayalam_Lla                    ; Malayalam_Lla
+    jg ; Malayalam_Llla                   ; Malayalam_Llla
+    jg ; Malayalam_Nga                    ; Malayalam_Nga
+    jg ; Malayalam_Nna                    ; Malayalam_Nna
+    jg ; Malayalam_Nnna                   ; Malayalam_Nnna
+    jg ; Malayalam_Nya                    ; Malayalam_Nya
+    jg ; Malayalam_Ra                     ; Malayalam_Ra
+    jg ; Malayalam_Ssa                    ; Malayalam_Ssa
+    jg ; Malayalam_Tta                    ; Malayalam_Tta
+  -> uchar.h & UCharacter.JoiningGroup
+
+    sc ; Gonm                             ; Masaram_Gondi
+    sc ; Nshu                             ; Nushu
+    sc ; Soyo                             ; Soyombo
+    sc ; Zanb                             ; Zanabazar_Square
+  -> uscript.h & com.ibm.icu.lang.UScript
+  -> Nushu had been added already
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+
+* New properties as shown in PropertyValueAliases.txt changes
+- boolean Emoji_Component from emoji 5
+  -> uchar.h & UProperty.java
+- boolean
+    # Regional_Indicator (RI)
+
+    RI ; N                                ; No                               ; F                                ; False
+    RI ; Y                                ; Yes                              ; T                                ; True
+  -> uchar.h & UProperty.java
+  -> single immutable range, to be hardcoded
+- boolean
+    # Prepended_Concatenation_Mark (PCM)
+
+    PCM; N                                ; No                               ; F                                ; False
+    PCM; Y                                ; Yes                              ; T                                ; True
+  -> was new in Unicode 9
+  -> uchar.h & UProperty.java
+- enumerated
+    # Vertical_Orientation (vo)
+
+    vo ; R                                ; Rotated
+    vo ; Tr                               ; Transformed_Rotated
+    vo ; Tu                               ; Transformed_Upright
+    vo ; U                                ; Upright
+  -> only pre-parsed for now, but not yet stored for runtime use
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+    (not strictly necessary for NOT_ENCODED scripts)
+  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+  cd $ICU_ROOT/dbg/icu4c
+  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
+  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
+  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
+  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+$ICU_SRC/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
+
+  $ICU_ROOT/dbg/tools/unicode/c$
+    cmake ../../../../src/tools/unicode/c
+    make
+
+* generate core properties data files
+  $ICU_ROOT/dbg/tools/unicode/c$
+    genprops/genprops $ICU_SRC/icu4c
+    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
+    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..10.0: U+2260, U+226E, U+226F
+- nothing new in this Unicode version, no test file to update
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+  https://sites.google.com/site/unicodetools/home#TOC-UCA
+- CLDR root data files are checked into $CLDR_SRC/common/uca/
+    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
+
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
+    (note removing the underscore before "Rules")
+    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
+- restore TODO diffs in UCARules.txt
+    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  from the CLDR root files (..._CLDR_..._SHORT.txt)
+    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+  CLDR common/collation/root.xml <collation type="private-unihan">
+  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
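+
+  Sketch (not part of the ICU tools): the renames in the cp commands above,
+  written as one table. This assumes that $ICU4C_UNIDATA is
+  $ICU_SRC/icu4c/source/data/unidata; copy_plan() and the /path/to/...
+  arguments are hypothetical.

```python
from pathlib import PurePosixPath

# CLDR common/uca file name -> destination relative to icu4c/source
UCA_COPIES = {
    "FractionalUCA_SHORT.txt": "data/unidata/FractionalUCA.txt",
    "UCA_Rules_SHORT.txt": "data/unidata/UCARules.txt",
    "CollationTest_CLDR_NON_IGNORABLE_SHORT.txt":
        "test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt",
    "CollationTest_CLDR_SHIFTED_SHORT.txt":
        "test/testdata/CollationTest_SHIFTED_SHORT.txt",
}


def copy_plan(cldr_src, icu_src):
    """Return (source, destination) path pairs; nothing is copied here."""
    uca = PurePosixPath(cldr_src) / "common" / "uca"
    icu4c = PurePosixPath(icu_src) / "icu4c" / "source"
    return [(str(uca / name), str(icu4c / dest))
            for name, dest in UCA_COPIES.items()]


for src, dst in copy_plan("/path/to/cldr", "/path/to/icu"):
    print("cp", src, dst)
```

+  The plan only prints the copies; backing up the old UCARules.txt and
+  restoring its TODO diffs remain manual steps.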
+
+- run genuca, see command line above;
+  deal with
+    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
+    FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
+        (add the character to genuca.cpp sampleCharsToScripts[])
+  + look up the USCRIPT_ code for the new sample characters
+    (should be obvious from the comment in the error output)
+  + *add* mappings to sampleCharsToScripts[], do not replace them
+    (in case the script sample characters flip-flop)
+  + insert new scripts in DUCET script order, see the top_byte table
+    at the beginning of FractionalUCA.txt
+- rebuild ICU4C
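+
+  Sketch (not part of genuca): pulling the sample character and the script
+  name out of a "first primary" line like the one in the error output above.
+  The regular expression assumes that such lines always look like the quoted
+  example, and the USCRIPT_ name is only a guess made by upper-casing the
+  comment; check it against uscript.h.

```python
import re

# "FDD1 <sample char>; [weights] # <Script> first primary ..."
FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(\w+) first primary")


def first_primary_sample(line):
    """Return (code point, script name from the comment), or None."""
    m = FIRST_PRIMARY.match(line.strip())
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2)


line = "FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)"
cp, script = first_primary_sample(line)
# Guessed constant name; not verified against uscript.h.
print("U+%04X -> USCRIPT_%s" % (cp, script.upper()))
```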
+
+* Unihan collators
+    https://sites.google.com/site/unicodetools/unihan
+- run Unicode Tools
+    org.unicode.draft.GenerateUnihanCollators
+  with VM arguments
+    -ea
+    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
+    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
+    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
+    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
+    -DUVERSION=10.0.0
+- run Unicode Tools
+    org.unicode.draft.GenerateUnihanCollatorFiles
+  with the same arguments
+- check CLDR diffs
+    cd $CLDR_SRC
+    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+    cd $CLDR_SRC
+    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- run CLDR unit tests, commit to CLDR
+- generate ICU zh collation data: run CLDR
+    org.unicode.cldr.icu.NewLdml2IcuConverter
+  with program arguments
+    -t collation
+    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
+    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
+    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
+    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
+    zh
+  and VM arguments
+    -ea
+    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+  (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt60l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
+    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files,
+  and then refresh ICU4J
+    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
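+
+  Sketch (not part of the ICU build): the same file selection as the
+  mkdir/cp/rm commands above, as a Python function. copy_unicode_data() is
+  a hypothetical helper; the demo builds a throwaway tree with empty files
+  instead of touching real build output, and the final jar step is not covered.

```python
import shutil
import tempfile
from pathlib import Path


def copy_unicode_data(src, dst):
    """Copy the selected files from src to dst (both are .../$ICUDT folders)."""
    src, dst = Path(src), Path(dst)
    copied = []
    top = ([src / "confusables.cfu"]
           + sorted(src.glob("*.icu")) + sorted(src.glob("*.nrm")))
    for f in top:
        # The log copies *.icu and then removes cnvalias.icu again.
        if f.name == "cnvalias.icu" or not f.is_file():
            continue
        dst.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, dst / f.name)
        copied.append(f.name)
    for sub in ("coll", "brkitr"):
        for f in sorted((src / sub).glob("*")):
            if f.is_file():
                (dst / sub).mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dst / sub / f.name)
                copied.append(sub + "/" + f.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "out" / "icudt60b"
    (src / "coll").mkdir(parents=True)
    (src / "brkitr").mkdir()
    for name in ("confusables.cfu", "uprops.icu", "cnvalias.icu", "nfc.nrm",
                 "root.res", "coll/root.res", "brkitr/char.brk"):
        (src / name).write_bytes(b"")
    result = copy_unicode_data(src, Path(tmp) / "icu4j" / "icudt60b")
    print(result)
```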
+
+* When refreshing all of ICU4J data from ICU4C
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
+or
+- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
+
+* update CollationFCD.java
+  + copy & paste the initializers of lcccIndex[] etc. from
+    ICU4C/source/i18n/collationfcd.cpp to
+    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU_SRC/icu4c/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** CLDR numbering systems
+- look for new sets of decimal digits (gc=ND & nv=4) and submit a CLDR ticket
+  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
+  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
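+
+  Sketch (not part of the ICU tools): listing one code point per set of
+  decimal digits, that is, characters with gc=Nd and numeric value 4.
+  Note that Python's unicodedata module uses the Unicode version bundled
+  with the interpreter (see unicodedata.unidata_version), so for a beta
+  update the new UCD files would have to be parsed instead.

```python
import sys
import unicodedata


def decimal_digit_fours():
    """Return the code points with General_Category=Nd and decimal value 4."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Nd"
            and unicodedata.decimal(chr(cp), None) == 4]


fours = decimal_digit_fours()
print("Unicode", unicodedata.unidata_version, "-", len(fours), "sets of decimal digits")
for cp in fours[:3]:
    print("U+%04X %s" % (cp, unicodedata.name(chr(cp))))
```

+  Comparing this list between the old and the new Unicode version shows
+  which digit sets are new.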
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools are checked in:
+  http://www.unicode.org/utility/trac/log/trunk/unicodetools
+
+---------------------------------------------------------------------------- ***
+
+Emoji 5.0 update for ICU 59
+- ICU 59 mostly remains on Unicode 9.0,
+  except that the bidi and segmentation data are updated to Unicode 10 beta
+
+First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
+
+* Command-line environment setup
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
+ICUDT=icudt59b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
+UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
+
+*** ICU Trac
+
+- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
+- changes directly on trunk
+
+*** data files & enums & parser code
+
+* download files
+
+- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
+- download emoji 5.0 beta files into the same uni90e50 folder
+- download Unicode 10.0 beta files: ucd
+  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
+    BidiBrackets.txt
+    BidiCharacterTest.txt
+    BidiMirroring.txt
+    BidiTest.txt
+    extracted/DerivedBidiClass.txt
+  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
+    LineBreak.txt
+    auxiliary/*
+
+* preparseucd.py changes
+- adjust for combined trunks
+- write new copyright lines
+- ignore new Emoji_Component property for now
+
+* process and/or copy files
+- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
+  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+
+- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
+
+* build Unicode tools using CMake+make
+
+~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
+# Location of the ICU4C source tree.
+set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
+
+  ~/svn.icu/trunk/dbg/tools/unicode/c$
+    cmake ../../../../src/tools/unicode/c
+    make
+
+* generate core properties data files
+  ~/svn.icu/trunk/dbg/tools/unicode/c$
+    genprops/genprops $ICU4C_SRC_DIR
+- rebuild ICU (make install) & tools
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt59l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
+    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files,
+  and then refresh ICU4J
+    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
+or
+- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU4C_SRC_DIR/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+---------------------------------------------------------------------------- ***
+
+Unicode 9.0 update for ICU 58
+
+* Command-line environment setup
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICUDT=icudt58b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+UNIDATA=$ICU_SRC_DIR/source/data/unidata
+
+http://www.unicode.org/review/pri323/  -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/versions/beta-9.0.0.html
+http://www.unicode.org/versions/Unicode9.0.0/
+http://www.unicode.org/reports/tr44/tr44-17.html
+
+*** ICU Trac
+
+- ticket:12526: integrate Unicode 9
+- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
+- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
+
+*** CLDR Trac
+
+- cldrbug 9414: UCA 9
+- ^/branches/markus/uni90 at r11518 from trunk at r11517
+
+- cldrbug 8745: Unicode 9.0 script metadata
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- only for manual diffs: remove version suffixes from the file names
+  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
+  (see https://sites.google.com/site/unicodetools/inputdata)
+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+
+- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
+  and copy to $UNIDATA
+    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
+
+* preparseucd.py changes
+- remove or add new Unicode scripts from/to the
+  only-in-ISO-15924 list according to the error messages:
+    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
+    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
+    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
+    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- DerivedNumericValues.txt new numeric values
+    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
+    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
+    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
+    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
+    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
+  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
+     uchar.c, UCharacterProperty.java
+     to support a new series of values
+- adjust preparseucd.py for Tangut algorithmic names
+  in ppucd.txt:
+    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
+  ->
+    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
+- avoid block-compressing most String/Miscellaneous property values,
+  triggered by genprops not coping with a multi-code point Case_Folding on
+    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
+  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
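+
+  Sketch (not part of the ICU tools): reading the new Malayalam fractions
+  quoted above from the DerivedNumericValues.txt lines as exact rationals.
+  That every denominator is 20 times a power of two is only an observation
+  about these five values; the encoding that ICU actually uses is defined in
+  uprops.h and is not reproduced here.

```python
from fractions import Fraction

# Data lines copied from the list above.
SAMPLE = [
    "0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH",
    "0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH",
    "0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS",
    "0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH",
    "0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS",
]


def parse_numeric_value(line):
    """Return (code point, value) using the rational in the fourth field."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    return int(fields[0], 16), Fraction(fields[3])


values = dict(parse_numeric_value(line) for line in SAMPLE)
for cp, value in sorted(values.items()):
    # Write the denominator as 20 << k to see whether one series covers it.
    k = (value.denominator // 20).bit_length() - 1
    print("U+%04X %s = %d/(20<<%d)" % (cp, value, value.numerator, k))
```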
+
+* PropertyAliases.txt changes
+- 1 new property PCM=Prepended_Concatenation_Mark
+  Ignore: Only useful for layout engines.
+  Ok to list in ppucd.txt.
+
+* PropertyValueAliases.txt new property values
+    blk; Adlam                            ; Adlam
+    blk; Bhaiksuki                        ; Bhaiksuki
+    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
+    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
+    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
+    blk; Marchen                          ; Marchen
+    blk; Mongolian_Sup                    ; Mongolian_Supplement
+    blk; Newa                             ; Newa
+    blk; Osage                            ; Osage
+    blk; Tangut                           ; Tangut
+    blk; Tangut_Components                ; Tangut_Components
+  -> add to uchar.h
+    use long property value names for enum constants
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+
+    GCB; EB                               ; E_Base
+    GCB; EBG                              ; E_Base_GAZ
+    GCB; EM                               ; E_Modifier
+    GCB; GAZ                              ; Glue_After_Zwj
+    GCB; ZWJ                              ; ZWJ
+  -> uchar.h & UCharacter.GraphemeClusterBreak
+
+    jg ; African_Feh                      ; African_Feh
+    jg ; African_Noon                     ; African_Noon
+    jg ; African_Qaf                      ; African_Qaf
+  -> uchar.h & UCharacter.JoiningGroup
+
+    lb ; EB                               ; E_Base
+    lb ; EM                               ; E_Modifier
+    lb ; ZWJ                              ; ZWJ
+  -> uchar.h & UCharacter.LineBreak
+
+    sc ; Adlm                             ; Adlam
+    sc ; Bhks                             ; Bhaiksuki
+    sc ; Marc                             ; Marchen
+    sc ; Newa                             ; Newa
+    sc ; Osge                             ; Osage
+    sc ; Tang                             ; Tangut
+  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
+
+    WB ; EB                               ; E_Base
+    WB ; EBG                              ; E_Base_GAZ
+    WB ; EM                               ; E_Modifier
+    WB ; GAZ                              ; Glue_After_Zwj
+    WB ; ZWJ                              ; ZWJ
+  -> uchar.h & UCharacter.WordBreak
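+
+  Sketch (not part of the ICU tools): the two Eclipse find/replace patterns
+  for the UBLOCK_ constants above, expressed with Python's re module.
+  The sample input line is illustrative; its numeric value is an assumption
+  and should be taken from uchar.h.

```python
import re

# Same patterns as in the Eclipse find fields above.
ID_PATTERN = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
OBJ_PATTERN = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def to_block_id(line):
    """uchar.h enum line -> Java UnicodeBlock ID constant."""
    return ID_PATTERN.sub(
        r"public static final int \g<1>_ID = \g<2>; \g<3>", line)


def to_block_object(line):
    """uchar.h enum line -> Java UnicodeBlock object declaration."""
    return OBJ_PATTERN.sub(
        r'public static final UnicodeBlock \g<1> = '
        r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>', line)


sample = "    UBLOCK_ADLAM = 263, /*[1E900]*/"  # illustrative input line
print(to_block_id(sample))
print(to_block_object(sample))
```

+  The \g<1> form is used instead of \1 so that the group reference is
+  unambiguous when it is followed directly by "_ID".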
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+    (not strictly necessary for NOT_ENCODED scripts)
+  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+  cd $ICU_ROOT/dbg
+  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
+  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+  # Location (--prefix) of where ICU was installed.
+  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
+  # Location of the ICU source tree.
+  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
+
+  ~/svn.icutools/trunk/dbg/unicode/c$
+    cmake ../../../src/unicode/c
+    make
+
+* generate core properties data files
+  ~/svn.icutools/trunk/dbg/unicode/c$
+    genprops/genprops $ICU_SRC_DIR
+    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
+    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..9.0: U+2260, U+226E, U+226F
+- nothing new in 9.0, no test file to update
+
+* run & fix ICU4C tests
+- Andy handles RBBI & spoof check test failures
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+  https://sites.google.com/site/unicodetools/home#TOC-UCA
+- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
+    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
+
+- cd (CLDR UCA branch)/common/uca/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
+    (note that the next copy drops the underscore before "Rules" from the file name)
+    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- restore TODO diffs in UCARules.txt
+    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  from the CLDR root files (..._CLDR_..._SHORT.txt)
+    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+  CLDR common/collation/root.xml <collation type="private-unihan">
+  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
+
+- run genuca, see command line above;
+  deal with
+    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
+    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
+        (add the character to genuca.cpp sampleCharsToScripts[])
+  + look up the USCRIPT_ code for the new sample characters
+    (should be obvious from the comment in the error output)
+  + *add* mappings to sampleCharsToScripts[], do not replace them
+    (in case the script sample characters flip-flop)
+  + insert new scripts in DUCET script order, see the top_byte table
+    at the beginning of FractionalUCA.txt
+- rebuild ICU4C
+
+* Unihan collators
+- run Unicode Tools
+    org.unicode.draft.GenerateUnihanCollators
+  with VM arguments
+    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
+    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
+    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
+    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
+    -DUVERSION=9.0.0
+    -ea
+- run Unicode Tools
+    org.unicode.draft.GenerateUnihanCollatorFiles
+  with the same arguments
+- check CLDR diffs
+    cd ~/svn.cldr/trunk
+    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
+    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
+- copy to CLDR
+    cd ~/svn.cldr/trunk
+    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
+    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
+- commit to CLDR
+- generate ICU zh collation data: run CLDR
+    org.unicode.cldr.icu.NewLdml2IcuConverter
+  with program arguments
+    -t collation
+    -s /home/mscherer/svn.cldr/trunk/common/collation
+    -m /home/mscherer/svn.cldr/trunk/common/supplemental
+    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
+    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
+    zh
+  and VM arguments
+    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+  (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt58l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
+    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files,
+  and then refresh ICU4J
+    cd ~/svn.icu/trunk/dbg/data/out/icu4j
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+* update CollationFCD.java
+  + copy & paste the initializers of lcccIndex[] etc. from
+    ICU4C/source/i18n/collationfcd.cpp to
+    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU_SRC_DIR/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** LayoutEngine script information
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+  in the working directory.
+
+  (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
+  (a plain text file)
+  which maps ICU versions to the numbers of script/language constants
+  that were added in each version.
+  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
+
+  The generated files have a current copyright date and "@deprecated" statement.
+
+* Review changes, fix Java tool if necessary, and copy to ICU4C
+  cd ~/svn.icu4j/trunk/src
+  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
+  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools & ICU tools are checked in
+  http://www.unicode.org/utility/trac/log/trunk/unicodetools
+  http://bugs.icu-project.org/trac/log/tools/trunk
+
+---------------------------------------------------------------------------- ***
+
+New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
+
+Adding
+- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
+- new combination/alias codes: Hanb, Jamo
+  - used in CLDR 29 and in spoof checker
+- new Z* code: Zsye
+
+Add new codes to uscript.h & UScript.java, see Unicode update logs.
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2; \3
+
+Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
+add new script codes.
+"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
+
+Note: If we have to run preparseucd.py again before the Unicode 9 update,
+then we need to manually keep/restore the new script codes.
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICUDT=icudt57b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+UNIDATA=$ICU_SRC_DIR/source/data/unidata
+
+Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
+see http://bugs.icu-project.org/trac/ticket/12141
+
+make install, then icutools cmake & make, then
+~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
+
+Generate Java data as usual, only update pnames.icu & uprops.icu.
+
+*** LayoutEngine script information
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+  in the working directory.
+
+  (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
+  (a plain text file)
+  which maps ICU versions to the numbers of script/language constants
+  that were added then.
+  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
+
+  The generated files have a current copyright date and "@deprecated" statement.
+
+* Review changes, fix Java tool if necessary, and copy to ICU4C
+  cd ~/svn.icu4j/trunk/src
+  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
+  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
+
+---------------------------------------------------------------------------- ***
+
+Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
+
+Edit preparseucd.py to add & parse new properties.
+They share the UCD property namespace but are not listed in PropertyAliases.txt.
+
+Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
+Initial data from emoji/2.0/
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICUDT=icudt56b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+UNIDATA=$ICU_SRC_DIR/source/data/unidata
+
+Add binary-property constants to uchar.h enum UProperty & UProperty.java.
+
+~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
+(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
+
+Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
+
+make install, then icutools cmake & make, then
+~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
+
+Generate Java data as usual, only update pnames.icu & uprops.icu.
+
+---------------------------------------------------------------------------- ***
+
+Unicode 8.0 update for ICU 56
+
+* Command-line environment setup
+
+ICU_ROOT=~/svn.icu/trunk
+ICU_SRC_DIR=$ICU_ROOT/src
+ICUDT=icudt56b
+export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+UNIDATA=$ICU_SRC_DIR/source/data/unidata
+
+http://www.unicode.org/review/pri297/  -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://unicode.org/versions/beta-8.0.0.html
+http://www.unicode.org/versions/Unicode8.0.0/
+http://www.unicode.org/reports/tr44/tr44-15.html
+
+*** ICU Trac
+
+- ticket:11574: Unicode 8
+- C++ branches/markus/uni80 at r37351 from trunk at r37343
+- Java branches/markus/uni80 at r37352 from trunk at r37338
+
+*** CLDR Trac
+
+- cldrbug 8311: UCA 8
+- branches/markus/uni80 at r11518 from trunk at r11517
+
+- cldrbug 8109: Unicode 8.0 script metadata
+- cldrbug 8418: Updated segmentation for Unicode 8.0
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- only for manual diffs: remove version suffixes from the file names
+  ~/unidata/uni80/20150415$ ../../desuffixucd.py .
+  (see https://sites.google.com/site/unicodetools/inputdata)
+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+
+- also: from http://unicode.org/Public/security/8.0.0/ download new
+  confusables.txt & confusablesWholeScript.txt
+  and copy to $UNIDATA
+    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
+    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
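+
+A rough sketch of the "remove version suffixes" step for manual diffs.
+This is not desuffixucd.py itself; the suffix shape ("-<version>" with an
+optional "d<number>" draft marker before the extension) and the function
+name are assumptions.
```python
import re

# Assumed suffix shape: "-8.0.0" or "-8.0.0d5", directly before the extension.
_SUFFIX = re.compile(r"-\d+(?:\.\d+)*(?:d\d+)?(?=\.[^.]+$)")


def strip_version_suffix(name: str) -> str:
    """Return the file name without a trailing version suffix, if it has one."""
    return _SUFFIX.sub("", name)


for name in ["UnicodeData-8.0.0d5.txt", "emoji-data.txt", "IdnaMappingTable.txt"]:
    print(name, "->", strip_version_suffix(name))
```
+Names without a numeric suffix, such as emoji-data.txt, are left alone.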
+
+* initial preparseucd.py changes
+- remove new Unicode scripts from the
+  only-in-ISO-15924 list according to the error message:
+    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
+    from _scripts_only_in_iso15924
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- property and file name change:
+    IndicMatraCategory -> IndicPositionalCategory
+- UnicodeData.txt unusual numeric values (fractions not in lowest terms)
+    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
+    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
+    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
+    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
+    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
+    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
+    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
+    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
+    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
+    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
+  -> change preparseucd.py to map them to fractions in lowest terms (e.g., 2/12 -> 1/6)
+     which are listed in DerivedNumericValues.txt;
+     this keeps storage in the data file simple
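+
+A minimal sketch of mapping the twelfths above to lowest terms.
+This is not the preparseucd.py code; the function name is an assumption,
+and it only uses the standard-library fractions module.
```python
from fractions import Fraction


def reduce_numeric_value(nv: str) -> str:
    """Reduce a UnicodeData.txt numeric value like '2/12' to lowest terms."""
    if "/" not in nv:
        return nv  # plain integers are left as they are
    f = Fraction(nv)
    if f.denominator == 1:
        return str(f.numerator)
    return f"{f.numerator}/{f.denominator}"


for nv in ["1/12", "2/12", "6/12", "10/12"]:
    print(nv, "->", reduce_numeric_value(nv))
```
+Values that are already in lowest terms, like 1/12, come back unchanged.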
+
+* PropertyValueAliases.txt changes
+- 10 new Block (blk) values:
+    blk; Ahom                             ; Ahom
+    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
+    blk; Cherokee_Sup                     ; Cherokee_Supplement
+    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
+    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
+    blk; Hatran                           ; Hatran
+    blk; Multani                          ; Multani
+    blk; Old_Hungarian                    ; Old_Hungarian
+    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
+    blk; Sutton_SignWriting               ; Sutton_SignWriting
+  -> add to uchar.h
+    use long property names for enum constants
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- 6 new Script (sc) values:
+    sc ; Ahom                             ; Ahom
+    sc ; Hatr                             ; Hatran
+    sc ; Hluw                             ; Anatolian_Hieroglyphs
+    sc ; Hung                             ; Old_Hungarian
+    sc ; Mult                             ; Multani
+    sc ; Sgnw                             ; SignWriting
+  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
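+
+A sketch of the two Eclipse find/replace passes for new blocks, as a script.
+Both patterns and replacements are copied from the log; the helper names and
+the sample uchar.h line are illustrative assumptions.
```python
import re

# Pass 1: UnicodeBlock IDs.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

# Pass 2: UnicodeBlock objects.
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def block_id_line(line: str) -> str:
    return ID_FIND.sub(ID_REPLACE, line)


def block_object_line(line: str) -> str:
    return OBJ_FIND.sub(OBJ_REPLACE, line)


# Illustrative input line (layout and value assumed, not copied from uchar.h).
sample = "UBLOCK_AHOM = 253, /*[11700]*/"
print(block_id_line(sample))
print(block_object_line(sample))
```
+Both passes start from the same uchar.h lines, so run each one on a fresh copy of the C enum text.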
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+    (not strictly necessary for NOT_ENCODED scripts)
+  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+  cd $ICU_ROOT/dbg
+  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
+  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
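+
+The five gennorm2 calls above follow one shape: each output is built from
+nfc.txt plus the files layered on top of it. A small sketch that prints the
+same command lines from a table; the table and function names are
+assumptions, the options are only those shown above.
```python
# (output file, input files under $UNIDATA/norm2, extra options), as listed above.
JOBS = [
    ("$ICU_SRC_DIR/source/common/norm2_nfc_data.h", ["nfc.txt"], ["--csource"]),
    ("$SRC_DATA_IN/nfc.nrm", ["nfc.txt"], []),
    ("$SRC_DATA_IN/nfkc.nrm", ["nfc.txt", "nfkc.txt"], []),
    ("$SRC_DATA_IN/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
    ("$SRC_DATA_IN/uts46.nrm", ["nfc.txt", "uts46.txt"], []),
]


def gennorm2_command(output, inputs, extra):
    """Build one command line; shell variables are left for the shell to expand."""
    return ["bin/gennorm2", "-o", output, "-s", "$UNIDATA/norm2", *inputs, *extra]


for job in JOBS:
    print(" ".join(gennorm2_command(*job)))
```
+The printed lines are meant to be pasted into the shell set up earlier, from $ICU_ROOT/dbg.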
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+  # Location (--prefix) of where ICU was installed.
+  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
+  # Location of the ICU source tree.
+  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
+
+  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
+  ~/svn.icutools/trunk/dbg/unicode/c$ make
+
+* generate core properties data files
+- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
+- rebuild ICU (make install) & tools
+- run genuca again (see step above) so that it picks up the new nfc.nrm
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..8.0: U+2260, U+226E, U+226F
+- nothing new in 8.0, no test file to update
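+
+A sketch of the grep step as a small filter, in case plain grep gives too much
+ASCII noise. The sample lines are hand-written in the general layout of
+IdnaMappingTable.txt (code points ; status ; ... # comment) and are an
+assumption, not a copy of the file; the function name is hypothetical.
```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            yield max(start, 0x80), end


SAMPLE = """\
0000..002C    ; disallowed_STD3_valid                  # <control-0000>..COMMA
0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A
2260          ; disallowed_STD3_valid                  # NOT EQUAL TO
226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN
""".splitlines()

for start, end in non_ascii_std3_valid(SAMPLE):
    print(f"U+{start:04X}..U+{end:04X}")
```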
+
+* run & fix ICU4C tests
+- bad Cherokee case folding due to difference in fallbacks:
+  UCD case folding falls back to no mapping,
+  ICU runtime case folding falls back to lowercasing;
+  fixed casepropsbuilder.cpp to generate scf mappings to self
+  when there is an slc mapping but no scf
+- Andy handles RBBI & spoof check test failures
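+
+A sketch of the fallback difference behind the Cherokee failure and of the
+fix, using plain dicts. This is not the casepropsbuilder.cpp code; the
+function names and the dict-based data shape are assumptions, and the single
+Cherokee pair (U+13A0 with lowercase U+AB70) is used for illustration.
```python
def ucd_fold(cp, scf):
    """UCD rule: no scf mapping means the character folds to itself."""
    return scf.get(cp, cp)


def runtime_fold(cp, scf, slc):
    """Runtime rule as described above: no scf mapping falls back to lowercasing."""
    if cp in scf:
        return scf[cp]
    return slc.get(cp, cp)


def add_self_mappings(scf, slc):
    """The fix: give every character with an slc mapping but no scf an scf to itself."""
    fixed = dict(scf)
    for cp in slc:
        fixed.setdefault(cp, cp)
    return fixed


slc = {0x13A0: 0xAB70}  # uppercase Cherokee letter -> lowercase
scf = {0xAB70: 0x13A0}  # lowercase folds to uppercase; uppercase has no scf

print(hex(runtime_fold(0x13A0, scf, slc)))                          # differs from ucd_fold
print(hex(runtime_fold(0x13A0, add_self_mappings(scf, slc), slc)))  # agrees with ucd_fold
```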
+
+* collation: CLDR collation root, UCA DUCET
+
+- UCA DUCET goes into Mark's Unicode tools, see
+  https://sites.google.com/site/unicodetools/home#TOC-UCA
+- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
+- cd (CLDR UCA branch)/common/uca/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+  cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
+    (note removing the underscore before "Rules")
+    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- restore TODO diffs in UCARules.txt
+    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  from the CLDR root files (..._CLDR_..._SHORT.txt)
+    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
+- if CLDR common/uca/unihan-index.txt changes, then update
+  CLDR common/collation/root.xml <collation type="private-unihan">
+  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
+- run genuca, see command line above;
+  deal with
+    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
+        (add the character to genuca.cpp sampleCharsToScripts[])
+  + look up the script for the new sample characters
+    (e.g., in FractionalUCA.txt)
+  + *add* mappings to sampleCharsToScripts[], do not replace them
+    (in case the script sample characters flip-flop)
+  + insert new scripts in DUCET script order, see the top_byte table
+    at the beginning of FractionalUCA.txt
+- rebuild ICU4C
+
+* run & fix ICU4C tests, now with new CLDR collation root data
+- run all tests with the collation test data *_SHORT.txt or the full files
+  (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+- fixed bug in CollationWeights::getWeightRanges()
+  exposed by new data and CollationTest::TestRootElements
+
+* update Java data files
+- refresh just the UCD/UCA-related/derived files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt56l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
+    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files,
+  and then refresh ICU4J
+    cd ~/svn.icu/trunk/dbg/data/out/icu4j
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+* update CollationFCD.java
+  + copy & paste the initializers of lcccIndex[] etc. from
+    ICU4C/source/i18n/collationfcd.cpp to
+    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU_SRC_DIR/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* run & fix ICU4J tests
+
+*** LayoutEngine script information
+
+* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
+  because the layout engine was deprecated in ICU 54.
+  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
+  to write lines that we used to add manually.
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+  in the working directory.
+
+  (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
+  (a plain text file)
+  which maps ICU versions to the numbers of script/language constants
+  that were added then.
+  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
+
+  The generated files have a current copyright date and "@deprecated" statement.
+
+* Review changes, fix Java tool if necessary, and copy to ICU4C
+  cd ~/svn.icu4j/trunk/src
+  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
+  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+- make sure that changes to Unicode tools & ICU tools are checked in
+  http://www.unicode.org/utility/trac/log/trunk/unicodetools
+  http://bugs.icu-project.org/trac/log/tools/trunk
+
+---------------------------------------------------------------------------- ***
+
+Unicode 7.0 update for ICU 54
+
+http://www.unicode.org/review/pri271/  -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
+http://www.unicode.org/reports/tr44/tr44-13.html
+
+*** ICU Trac
+
+- ticket 10821: Unicode 7.0, UCA 7.0
+- C++ branches/markus/uni70 at r35584 from trunk at r35580
+- Java branches/markus/uni70 at r35587 from trunk at r35545
+
+*** CLDR Trac
+
+- ticket 7195: UCA 7.0 CLDR root collation
+- branches/markus/uni70 at r10062 from trunk at r10061
+
+- ticket 6762: script metadata for Unicode 7.0 new scripts
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- only for manual diffs: remove version suffixes from the file names
+  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
+  (see https://sites.google.com/site/unicodetools/inputdata)
+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+- Restore TODO diffs in source/data/unidata/UCARules.txt
+    cd $ICU_SRC_DIR
+    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
+- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
+
+- also: from http://unicode.org/Public/security/7.0.0/ download new
+  confusables.txt & confusablesWholeScript.txt
+  and copy to $ICU_ROOT/src/source/data/unidata/
+
+* initial preparseucd.py changes
+- remove new Unicode scripts from the
+  only-in-ISO-15924 list according to the error message:
+    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
+                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
+                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
+    from _scripts_only_in_iso15924
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- NamesList.txt now has a heading with a non-ASCII character
+  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
+  + escape non-ASCII characters in heading comments
+- preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
+  + instead, get the copyright from the first file whose copyright line contains the current year
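+
+A sketch of the kind of consistency check that produces the ValueError
+quoted above: a script listed as only-in-ISO-15924 must not also be a
+Unicode script. This is not the preparseucd.py code; the function name and
+argument shapes are assumptions, only the list name comes from the log.
```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if a script listed as ISO-only is now encoded in Unicode."""
    now_encoded = sorted(set(only_in_iso) & set(unicode_scripts))
    if now_encoded:
        raise ValueError(f"remove {now_encoded} from _scripts_only_in_iso15924")


# Modi is a Unicode 7.0 script, so it must leave the ISO-only list.
try:
    check_scripts_only_in_iso15924(["Ahom", "Hatr", "Mult", "Modi"], ["Latn", "Modi"])
except ValueError as e:
    print("ValueError:", e)
```
+Ahom, Hatr and Mult stay on the list here because they are not yet in the Unicode script set passed in.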
+
+* PropertyValueAliases.txt changes
+- 32 new Block (blk) values:
+    blk; Bassa_Vah                        ; Bassa_Vah
+    blk; Caucasian_Albanian               ; Caucasian_Albanian
+    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
+    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
+    blk; Duployan                         ; Duployan
+    blk; Elbasan                          ; Elbasan
+    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
+    blk; Grantha                          ; Grantha
+    blk; Khojki                           ; Khojki
+    blk; Khudawadi                        ; Khudawadi
+    blk; Latin_Ext_E                      ; Latin_Extended_E
+    blk; Linear_A                         ; Linear_A
+    blk; Mahajani                         ; Mahajani
+    blk; Manichaean                       ; Manichaean
+    blk; Mende_Kikakui                    ; Mende_Kikakui
+    blk; Modi                             ; Modi
+    blk; Mro                              ; Mro
+    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
+    blk; Nabataean                        ; Nabataean
+    blk; Old_North_Arabian                ; Old_North_Arabian
+    blk; Old_Permic                       ; Old_Permic
+    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
+    blk; Pahawh_Hmong                     ; Pahawh_Hmong
+    blk; Palmyrene                        ; Palmyrene
+    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
+    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
+    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
+    blk; Siddham                          ; Siddham
+    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
+    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
+    blk; Tirhuta                          ; Tirhuta
+    blk; Warang_Citi                      ; Warang_Citi
+  -> add to uchar.h
+    use long property names for enum constants
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- 28 new Joining_Group (jg) values:
+    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
+    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
+    jg ; Manichaean_Beth                  ; Manichaean_Beth
+    jg ; Manichaean_Daleth                ; Manichaean_Daleth
+    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
+    jg ; Manichaean_Five                  ; Manichaean_Five
+    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
+    jg ; Manichaean_Heth                  ; Manichaean_Heth
+    jg ; Manichaean_Hundred               ; Manichaean_Hundred
+    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
+    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
+    jg ; Manichaean_Mem                   ; Manichaean_Mem
+    jg ; Manichaean_Nun                   ; Manichaean_Nun
+    jg ; Manichaean_One                   ; Manichaean_One
+    jg ; Manichaean_Pe                    ; Manichaean_Pe
+    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
+    jg ; Manichaean_Resh                  ; Manichaean_Resh
+    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
+    jg ; Manichaean_Samekh                ; Manichaean_Samekh
+    jg ; Manichaean_Taw                   ; Manichaean_Taw
+    jg ; Manichaean_Ten                   ; Manichaean_Ten
+    jg ; Manichaean_Teth                  ; Manichaean_Teth
+    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
+    jg ; Manichaean_Twenty                ; Manichaean_Twenty
+    jg ; Manichaean_Waw                   ; Manichaean_Waw
+    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
+    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
+    jg ; Straight_Waw                     ; Straight_Waw
+  -> uchar.h & UCharacter.JoiningGroup
+- 23 new Script (sc) values:
+    sc ; Aghb                             ; Caucasian_Albanian
+    sc ; Bass                             ; Bassa_Vah
+    sc ; Dupl                             ; Duployan
+    sc ; Elba                             ; Elbasan
+    sc ; Gran                             ; Grantha
+    sc ; Hmng                             ; Pahawh_Hmong
+    sc ; Khoj                             ; Khojki
+    sc ; Lina                             ; Linear_A
+    sc ; Mahj                             ; Mahajani
+    sc ; Mani                             ; Manichaean
+    sc ; Mend                             ; Mende_Kikakui
+    sc ; Modi                             ; Modi
+    sc ; Mroo                             ; Mro
+    sc ; Narb                             ; Old_North_Arabian
+    sc ; Nbat                             ; Nabataean
+    sc ; Palm                             ; Palmyrene
+    sc ; Pauc                             ; Pau_Cin_Hau
+    sc ; Perm                             ; Old_Permic
+    sc ; Phlp                             ; Psalter_Pahlavi
+    sc ; Sidd                             ; Siddham
+    sc ; Sind                             ; Khudawadi
+    sc ; Tirh                             ; Tirhuta
+    sc ; Wara                             ; Warang_Citi
+  -> uscript.h (many were added before)
+    comment "Mende Kikakui" for USCRIPT_MENDE
+    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2; \3
+- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2012-11-01)
+    Ahom        338     Ahom
+    Hatr        127     Hatran
+    Mult        323     Multani
+  (added 2013-10-12)
+    Modi        324     Modi
+    Pauc        263     Pau Cin Hau
+    Sidd        302     Siddham
+  -> uscript.h (some overlap with additions from Unicode)
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2; \3
+  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+    (not strictly necessary for NOT_ENCODED scripts)
+  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+- cd $ICU_ROOT/dbg
+- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+- UNIDATA=$ICU_SRC_DIR/source/data/unidata
+- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
+- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
+# Location of the ICU source tree.
+set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
+
+~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
+~/svn.icutools/trunk/dbg/unicode/c$ make
+
+* genprops work
+- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
+  + add second array of Joining_Group values for at most 10800..10FFF
+    icutools: unicode/c/genprops/bidipropsbuilder.cpp
+    icu: source/common/ubidi_props.h/.c/_data.h
+    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
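+
+A sketch of the two-array lookup: one array for the existing range and a
+second one for the supplementary range, so that the gap between them costs
+no storage. All names and the bounds of the first range are hypothetical;
+only the Manichaean range 10AC0..10AFF comes from the note above.
```python
JG_START, JG_LIMIT = 0x0620, 0x08A0      # hypothetical bounds of the first array
JG_START2, JG_LIMIT2 = 0x10AC0, 0x10B00  # second array: Manichaean 10AC0..10AFF

NO_JOINING_GROUP = 0


def joining_group(cp, jg_array, jg_array2):
    """Look up a Joining_Group value in whichever array covers the code point."""
    if JG_START <= cp < JG_LIMIT:
        return jg_array[cp - JG_START]
    if JG_START2 <= cp < JG_LIMIT2:
        return jg_array2[cp - JG_START2]
    return NO_JOINING_GROUP


# Dummy data: every entry is No_Joining_Group except one made-up value.
jg_array = [NO_JOINING_GROUP] * (JG_LIMIT - JG_START)
jg_array2 = [NO_JOINING_GROUP] * (JG_LIMIT2 - JG_START2)
jg_array2[0] = 5  # made-up value for U+10AC0

print(joining_group(0x10AC0, jg_array, jg_array2))
```
+A single array spanning both ranges would hold tens of thousands of unused entries, which is why a second short array is cheaper.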
+
+* generate core properties data files
+- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
+- rebuild ICU (make install) & tools
+- run genuca again (see step above) so that it picks up the new nfc.nrm
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..7.0: U+2260, U+226E, U+226F
+- nothing new in 7.0, no test file to update
+
+* run & fix ICU4C tests
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt53l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
+    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    ICUDT=icudt54b
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cd ~/svn.icu/uni70/dbg/data/out/icu4j
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+- refresh ICU4J
+    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* update CollationFCD.java
+  + copy & paste the initializers of lcccIndex[] etc. from
+    ICU4C/source/i18n/collationfcd.cpp to
+    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU_SRC_DIR/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* UCA
+
+- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
+- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
+- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
+- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
+- output files are in ~/svn.unitools/Generated/uca/7.0.0/
+- review data; compare files, use blankweights.sed or similar
+  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
+- cd ~/svn.unitools/Generated/uca/7.0.0/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+    (note that the target file name drops the underscore before "Rules" and the "_SHORT" suffix)
+    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
+    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
+- run genuca, see command line above
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ICUDT=icudt54b
+    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
+- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
+  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+* run & fix ICU4J tests
+
+*** LayoutEngine script information
+
+(For details see the Unicode 5.2 change log below.)
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+  in the working directory.
+  (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+  The generated files have a current copyright date and "@stable" statement.
+  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
+  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
+  which may not contain dots any more.
+
+- diff current <icu>/source/layout files vs. generated ones
+    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+  review and manually merge desired changes;
+  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
+  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
+- if you just copy the above files, then
+  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
+  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+
+---------------------------------------------------------------------------- ***
+
+Unicode 6.3 update
+
+http://www.unicode.org/review/pri249/  -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
+http://www.unicode.org/reports/tr44/tr44-11.html
+
+*** ICU Trac
+
+- ticket 10128: update ICU to Unicode 6.3 beta
+- ticket 10168: update ICU to Unicode 6.3 final
+- C++ branches/markus/uni63 at r33552 from trunk at r33551
+- Java branches/markus/uni63 at r33550 from trunk at r33553
+
+- ticket 10142: implement Unicode 6.3 bidi algorithm additions
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+  (configure.in & configure have been modified to extract the version from uchar.h)
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD, UCA & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- modify preparseucd.py:
+  parse new file BidiBrackets.txt
+  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+- Check test file diffs for previously commented-out, known-failing data lines;
+  those probably need to stay commented out.
+
+* PropertyAliases.txt changes
+- 1 new Enumerated Property
+  bpt                      ; Bidi_Paired_Bracket_Type
+  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
+  -> ubidi_props.h & .c & UBiDiProps.java
+  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
+  -> uprops.cpp
+  -> change ubidi.icu format version from 2.0 to 2.1
+- 1 new Miscellaneous Property
+  bpb                      ; Bidi_Paired_Bracket
+  -> uchar.h & UProperty.java
+  -> ppucd.h & .cpp
+
+* PropertyValueAliases.txt changes
+- 3 Bidi_Paired_Bracket_Type (bpt) values:
+  bpt; c                                ; Close
+  bpt; n                                ; None
+  bpt; o                                ; Open
+  -> uchar.h & UCharacter.BidiPairedBracketType
+  -> ubidi_props.h & .c & UBiDiProps.java
+  -> change ubidi.icu format version from 2.0 to 2.1
+- 4 new Bidi_Class (bc) values:
+  bc ; FSI                              ; First_Strong_Isolate
+  bc ; LRI                              ; Left_To_Right_Isolate
+  bc ; RLI                              ; Right_To_Left_Isolate
+  bc ; PDI                              ; Pop_Directional_Isolate
+  -> uchar.h & UCharacterEnums.ECharacterDirection
+  -> until the bidi code gets updated,
+     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
+- 3 new Word_Break (WB) values:
+  WB ; HL                               ; Hebrew_Letter
+  WB ; SQ                               ; Single_Quote
+  WB ; DQ                               ; Double_Quote
+  -> uchar.h & UCharacter.WordBreak
+  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
+- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2012-10-16)
+  Aghb  239     Caucasian Albanian
+  Mahj  314     Mahajani
+  -> uscript.h
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2;\3
+  -> preparseucd.py _scripts_only_in_iso15924
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+     (not strictly necessary for NOT_ENCODED scripts)
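The find/replace pair listed above for com.ibm.icu.lang.UScript can be tried outside an editor. The following is a minimal sketch in Python, not part of the ICU tooling: the function name c_enum_to_java and the sample input line are hypothetical, and the numeric value in the sample is illustrative only. The pattern and replacement strings are the ones quoted in the log.

```python
import re

# Find/replace pair quoted in the log, expressed with Python's re module.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2;\3"


def c_enum_to_java(line: str) -> str:
    """Rewrite one uscript.h enum line as a Java int constant (hypothetical helper)."""
    return FIND.sub(REPLACE, line)


# Hypothetical input line; the value 160 and the comment are illustrative only.
sample = "      USCRIPT_MAHAJANI                      = 160,/* Mahj */"
converted = c_enum_to_java(sample)
print(converted)
```

Because the pattern ends in `,(.+)`, a line with nothing after the comma, or a final enum constant without a trailing comma, is left unchanged and has to be converted by hand.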
+
+* generate normalization data files
+- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
+- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
+- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
+# Location of the ICU source tree.
+set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
+
+~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
+~/svn.icutools/trunk/dbg/unicode/c$ make
+
+* generate core properties data files
+- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
+- rebuild ICU (make install) & tools
+- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..6.3: U+2260, U+226E, U+226F
+- nothing new in 6.3, no test file to update
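The grep step above can also be done with a small script that filters out the ASCII range automatically. This is a minimal sketch under stated assumptions, not the method used for this update: it assumes data lines of the form "code point or range ; status ; ... # comment", and the function name and the sample lines are hypothetical stand-ins modeled on that layout. Only the code points U+2260, U+226E and U+226F come from the log.

```python
def non_ascii_std3_valid(lines):
    """Yield (first, last) code point ranges above U+007F whose status is
    disallowed_STD3_valid. Assumes 'range ; status ; ... # comment' lines."""
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            # Clip a range that starts in ASCII to its non-ASCII part.
            yield max(first, 0x80), last


# Illustrative sample lines (assumed layout, not copied from the real file).
sample = [
    "0000..002C    ; disallowed_STD3_valid   # <control-0000>..COMMA",
    "0061..007A    ; valid                   # LATIN SMALL LETTER A..Z",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]

for first, last in non_ascii_std3_valid(sample):
    print(f"U+{first:04X}..U+{last:04X}")
```

To run it against real data, pass an open file object for IdnaMappingTable.txt instead of the sample list and compare the result with the code points already covered by uts46test.cpp and UTS46Test.java.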
+
+* update Java data files
+- refresh only the UCD-related files, to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt52l
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
+    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
+- refresh ICU4J
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
+
+- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
+- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+  (note that the target file name drops the underscore before "Rules" and the "_SHORT" suffix)
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
+- check test file diffs for previously commented-out, known-failing data lines;
+  those probably need to stay commented out
+- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
+- run genuca, see command line above
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* test ICU, fix test code where necessary
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+*** LayoutEngine script information
+- skipped for Unicode 6.3: no new scripts
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
  
  ---------------------------------------------------------------------------- ***