X-Git-Url: https://git.saurik.com/apple/icu.git/blobdiff_plain/374ca955a76ecab1204ca8bfa63ff9238d998416..a0b4f637ba1a6c3c5651b61a69303b029bacf7d3:/icuSources/data/unidata/changes.txt

diff --git a/icuSources/data/unidata/changes.txt b/icuSources/data/unidata/changes.txt
index a406c3f6..37feb9a1 100644
--- a/icuSources/data/unidata/changes.txt
+++ b/icuSources/data/unidata/changes.txt
@@ -1,3 +1,2054 @@
+* Copyright (C) 2004-2015, International Business Machines
+* Corporation and others.  All Rights Reserved.
+*
+*   file name:  changes.txt
+*   encoding:   US-ASCII
+*   tab size:   8 (not used)
+*   indentation:4
+*
+*   created on: 2004may06
+*   created by: Markus W. Scherer
+*
+* change log for Unicode updates
+
+---------------------------------------------------------------------------- ***
+
+* New ISO 15924 script codes
+
+Starting with ICU 55, we no longer add UScriptCode constants until their scripts
+are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
+Script enum constant names should follow the Unicode script property value aliases,
+which are assigned only when the scripts are encoded.
+When we added script constants early and guessed wrong, we ended up with confusing
+enum constants and sometimes had to add aliases.
+
+Exception: Script codes like Latf and Aran that are not subject to separate encoding
+can be added at any time.
+
+Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html
+
+Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
+- Adlm  166     Adlam
+- Aran  161     Arabic (Nastaliq variant)
+- Kitl  505     Khitan large script
+- Kits  288     Khitan small script
+- Marc  332     Marchen
+- Osge  219     Osage
+
+Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.
+
+Adlam, Marchen, and Osage are expected to go into Unicode 9;
+we should assign Unicode script property value aliases for them
+soon after Unicode 8 is released, and add them in ICU 56.
+
+Khitan scripts will be encoded later.
+
+---------------------------------------------------------------------------- ***
+
+Unicode 8.0 update for ICU ??
+
+* UCA issue from 7.0
+
+- U+1DE9 COMBINING LATIN SMALL LETTER BETA
+  sorts with Greek Beta, should sort with Latin B?
+  + Ken says:
+    No, it was deliberate:
+
+    03B2;GREEK SMALL LETTER BETA;Ll;;;;0392;;0392
+    1D5D;MODIFIER LETTER SMALL BETA;Lm;<super> 03B2;;;;;
+    1DE9;COMBINING LATIN SMALL LETTER BETA;Mn;<sort> 03B2;;;;;
+    1D66;GREEK SUBSCRIPT SMALL LETTER BETA;Ll;<sub> 03B2;;;;;
+
+    Note the relationship to U+1D5D.
+
+    When the disunified *Latin* beta base letter shows up in Unicode 8.0:
+
+    U+A7B4 LATIN CAPITAL LETTER BETA
+    U+A7B5 LATIN SMALL LETTER BETA
+
+    we could re-evaluate what U+1DE9 equates to, for collation,
+    but currently there isn't any Latin beta to serve that function
+    in Unicode 7.0.
+
+- ICU_ROOT=~/svn.icu/trunk
+- ICU_SRC_DIR=$ICU_ROOT/src
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
+
+
+---------------------------------------------------------------------------- ***
+
+Unicode 7.0 update for ICU 54
+
+http://www.unicode.org/review/pri271/  -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
+http://www.unicode.org/reports/tr44/tr44-13.html
+
+*** ICU Trac
+
+- ticket 10821: Unicode 7.0, UCA 7.0
+- C++ branches/markus/uni70 at r35584 from trunk at r35580
+- Java branches/markus/uni70 at r35587 from trunk at r35545
+
+*** CLDR Trac
+
+- ticket 7195: UCA 7.0 CLDR root collation
+- branches/markus/uni70 at r10062 from trunk at r10061
+
+- ticket 6762: script metadata for Unicode 7.0 new scripts
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- only for manual diffs: remove version suffixes from the file names
+  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
+  (see https://sites.google.com/site/unicodetools/inputdata)
+- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+- Restore TODO diffs in source/data/unidata/UCARules.txt
+    cd $ICU_SRC_DIR
+    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
+- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
+
+- also: from http://unicode.org/Public/security/7.0.0/ download new
+  confusables.txt & confusablesWholeScript.txt
+  and copy to $ICU_ROOT/src/source/data/unidata/
+
+* initial preparseucd.py changes
+- remove new Unicode scripts from the
+  only-in-ISO-15924 list according to the error message:
+    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
+                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
+                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
+    from _scripts_only_in_iso15924
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- NamesList.txt now has a heading with a non-ASCII character
+  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
+  + escape non-ASCII characters in heading comments
+- preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
+  + get the copyright from the first file whose copyright line contains the current year
+
+* PropertyValueAliases.txt changes
+- 32 new Block (blk) values:
+    blk; Bassa_Vah                        ; Bassa_Vah
+    blk; Caucasian_Albanian               ; Caucasian_Albanian
+    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
+    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
+    blk; Duployan                         ; Duployan
+    blk; Elbasan                          ; Elbasan
+    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
+    blk; Grantha                          ; Grantha
+    blk; Khojki                           ; Khojki
+    blk; Khudawadi                        ; Khudawadi
+    blk; Latin_Ext_E                      ; Latin_Extended_E
+    blk; Linear_A                         ; Linear_A
+    blk; Mahajani                         ; Mahajani
+    blk; Manichaean                       ; Manichaean
+    blk; Mende_Kikakui                    ; Mende_Kikakui
+    blk; Modi                             ; Modi
+    blk; Mro                              ; Mro
+    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
+    blk; Nabataean                        ; Nabataean
+    blk; Old_North_Arabian                ; Old_North_Arabian
+    blk; Old_Permic                       ; Old_Permic
+    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
+    blk; Pahawh_Hmong                     ; Pahawh_Hmong
+    blk; Palmyrene                        ; Palmyrene
+    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
+    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
+    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
+    blk; Siddham                          ; Siddham
+    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
+    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
+    blk; Tirhuta                          ; Tirhuta
+    blk; Warang_Citi                      ; Warang_Citi
+  -> add to uchar.h
+    use long property names for enum constants
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- 28 new Joining_Group (jg) values:
+    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
+    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
+    jg ; Manichaean_Beth                  ; Manichaean_Beth
+    jg ; Manichaean_Daleth                ; Manichaean_Daleth
+    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
+    jg ; Manichaean_Five                  ; Manichaean_Five
+    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
+    jg ; Manichaean_Heth                  ; Manichaean_Heth
+    jg ; Manichaean_Hundred               ; Manichaean_Hundred
+    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
+    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
+    jg ; Manichaean_Mem                   ; Manichaean_Mem
+    jg ; Manichaean_Nun                   ; Manichaean_Nun
+    jg ; Manichaean_One                   ; Manichaean_One
+    jg ; Manichaean_Pe                    ; Manichaean_Pe
+    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
+    jg ; Manichaean_Resh                  ; Manichaean_Resh
+    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
+    jg ; Manichaean_Samekh                ; Manichaean_Samekh
+    jg ; Manichaean_Taw                   ; Manichaean_Taw
+    jg ; Manichaean_Ten                   ; Manichaean_Ten
+    jg ; Manichaean_Teth                  ; Manichaean_Teth
+    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
+    jg ; Manichaean_Twenty                ; Manichaean_Twenty
+    jg ; Manichaean_Waw                   ; Manichaean_Waw
+    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
+    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
+    jg ; Straight_Waw                     ; Straight_Waw
+  -> uchar.h & UCharacter.JoiningGroup
+- 23 new Script (sc) values:
+    sc ; Aghb                             ; Caucasian_Albanian
+    sc ; Bass                             ; Bassa_Vah
+    sc ; Dupl                             ; Duployan
+    sc ; Elba                             ; Elbasan
+    sc ; Gran                             ; Grantha
+    sc ; Hmng                             ; Pahawh_Hmong
+    sc ; Khoj                             ; Khojki
+    sc ; Lina                             ; Linear_A
+    sc ; Mahj                             ; Mahajani
+    sc ; Mani                             ; Manichaean
+    sc ; Mend                             ; Mende_Kikakui
+    sc ; Modi                             ; Modi
+    sc ; Mroo                             ; Mro
+    sc ; Narb                             ; Old_North_Arabian
+    sc ; Nbat                             ; Nabataean
+    sc ; Palm                             ; Palmyrene
+    sc ; Pauc                             ; Pau_Cin_Hau
+    sc ; Perm                             ; Old_Permic
+    sc ; Phlp                             ; Psalter_Pahlavi
+    sc ; Sidd                             ; Siddham
+    sc ; Sind                             ; Khudawadi
+    sc ; Tirh                             ; Tirhuta
+    sc ; Wara                             ; Warang_Citi
+  -> uscript.h (many were added before)
+    comment "Mende Kikakui" for USCRIPT_MENDE
+    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2; \3
+- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2012-11-01)
+    Ahom        338     Ahom
+    Hatr        127     Hatran
+    Mult        323     Multani
+  (added 2013-10-12)
+    Modi        324     Modi
+    Pauc        263     Pau Cin Hau
+    Sidd        302     Siddham
+  -> uscript.h (some overlap with additions from Unicode)
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2; \3
+  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
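
The Eclipse find/replace pairs listed above can also be applied outside the IDE. The following is a minimal sketch, assuming Python's `re` module stands in for the Eclipse regex dialect; the two input lines and their numeric values are hypothetical samples shaped like the C enum declarations, not lines copied from uscript.h or uchar.h, and `c_enum_to_java` is a helper name introduced only for this illustration.

```python
import re

# Find/replace pairs taken from the notes above.
USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2; \3"

UBLOCK_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
UBLOCK_REPLACE = r"public static final int \1_ID = \2; \3"


def c_enum_to_java(line, find, replace):
    """Apply one find/replace pair to a single line (hypothetical helper)."""
    return find.sub(replace, line)


# Hypothetical sample lines; the numeric values are made up for illustration.
script_line = "USCRIPT_KHUDAWADI = 145,/* Sind */"
block_line = "UBLOCK_BASSA_VAH = 221, /*[16AD0]*/"

java_script = c_enum_to_java(script_line, USCRIPT_FIND, USCRIPT_REPLACE)
java_block = c_enum_to_java(block_line, UBLOCK_FIND, UBLOCK_REPLACE)
print(java_script)
print(java_block)
```

Lines that do not match a pattern pass through unchanged, so the helper can be run over a whole header excerpt before pasting the result into the Java class.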
+
+* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+    (not strictly necessary for NOT_ENCODED scripts)
+  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
+
+* generate normalization data files
+- cd $ICU_ROOT/dbg
+- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
+- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
+- UNIDATA=$ICU_SRC_DIR/source/data/unidata
+- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
+- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
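
Each .nrm file above is built from nfc.txt plus zero or more additional input files. A small sketch of that layering, assuming only the `-o` and `-s` options shown in the commands above; `gennorm2_commands` is a hypothetical helper that builds the argument lists without running anything, and the `--csource` header variant is left out.

```python
# Output file -> input .txt files, as listed in the steps above.
NRM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata):
    """Return one argv list per .nrm file (hypothetical helper, runs nothing)."""
    return [
        ["bin/gennorm2", "-o", f"{src_data_in}/{out}", "-s", f"{unidata}/norm2", *inputs]
        for out, inputs in NRM_INPUTS.items()
    ]


commands = gennorm2_commands("$SRC_DATA_IN", "$UNIDATA")
for argv in commands:
    print(" ".join(argv))
```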
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
+# Location of the ICU source tree.
+set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
+
+~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
+~/svn.icutools/trunk/dbg/unicode/c$ make
+
+* genprops work
+- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
+  + add second array of Joining_Group values for at most 10800..10FFF
+    icutools: unicode/c/genprops/bidipropsbuilder.cpp
+    icu: source/common/ubidi_props.h/.c/_data.h
+    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
+
+* generate core properties data files
+- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
+- rebuild ICU (make install) & tools
+- run genuca again (see step above) so that it picks up the new nfc.nrm
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..7.0: U+2260, U+226E, U+226F
+- nothing new in 7.0, no test file to update
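
The grep step above can be approximated with a short script. This is a sketch under stated assumptions: it expects the semicolon-separated, `#`-commented layout of IdnaMappingTable.txt, `non_ascii_std3_valid` is a hypothetical helper name, and the sample lines are hand-written in the style of that file rather than copied from it.

```python
def non_ascii_std3_valid(lines):
    """Yield (first, last) code point ranges above U+007F whose status is
    disallowed_STD3_valid (hypothetical helper)."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            # Clamp ranges that begin in ASCII to their non-ASCII part.
            yield (max(start, 0x80), end)


# Hand-written sample lines, formatted like IdnaMappingTable.txt.
sample = [
    "# comment-only line",
    "0000..002C    ; disallowed_STD3_valid                  # <control>..COMMA",
    "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]

hits = list(non_ascii_std3_valid(sample))
print(["U+%04X..U+%04X" % pair for pair in hits])
```

Comparing the hits from the old and new versions of the file shows whether any new characters need to be added to the UTS #46 tests.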
+
+* run & fix ICU4C tests
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt53l
+    echo timestamp > uni-core-data
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
+    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    ICUDT=icudt54b
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+    cd ~/svn.icu/uni70/dbg/data/out/icu4j
+    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
+    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
+    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
+- refresh ICU4J
+    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+
+* update CollationFCD.java
+  + copy & paste the initializers of lcccIndex[] etc. from
+    ICU4C/source/i18n/collationfcd.cpp to
+    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd $ICU_SRC_DIR/source/data/unidata
+    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cd ../../test/testdata
+    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* UCA
+
+- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
+- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
+- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
+- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
+- output files are in ~/svn.unitools/Generated/uca/7.0.0/
+- review data; compare files, use blankweights.sed or similar
+  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
+- cd ~/svn.unitools/Generated/uca/7.0.0/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+    (note removing the underscore before "Rules")
+    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
+    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
+    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
+    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
+- run genuca, see command line above
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ICUDT=icudt54b
+    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
+    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
+- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
+  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+* run & fix ICU4J tests
+
+*** LayoutEngine script information
+
+(For details see the Unicode 5.2 change log below.)
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+  in the working directory.
+  (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+  The generated files have a current copyright date and "@stable" statement.
+  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
+  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
+  which may not contain dots any more.
+
+- diff current <icu>/source/layout files vs. generated ones
+    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+  review and manually merge desired changes;
+  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
+  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
+- if you just copy the above files, then
+  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
+  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+*** API additions
+- send notice to icu-design about new born-@stable API (enum constants etc.)
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+
+---------------------------------------------------------------------------- ***
+
+Unicode 6.3 update
+
+http://www.unicode.org/review/pri249/  -- beta review
+http://www.unicode.org/reports/uax-proposed-updates.html
+http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
+http://www.unicode.org/reports/tr44/tr44-11.html
+
+*** ICU Trac
+
+- ticket 10128: update ICU to Unicode 6.3 beta
+- ticket 10168: update ICU to Unicode 6.3 final
+- C++ branches/markus/uni63 at r33552 from trunk at r33551
+- Java branches/markus/uni63 at r33550 from trunk at r33553
+
+- ticket 10142: implement Unicode 6.3 bidi algorithm additions
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+  (configure.in & configure have been modified to extract the version from uchar.h)
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
+  so that the makefiles see the new version number.
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD, UCA & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- modify preparseucd.py:
+  parse new file BidiBrackets.txt
+  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
+- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+- Check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out.
+
+* PropertyAliases.txt changes
+- 1 new Enumerated Property
+  bpt                      ; Bidi_Paired_Bracket_Type
+  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
+  -> ubidi_props.h & .c & UBiDiProps.java
+  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
+  -> uprops.cpp
+  -> change ubidi.icu format version from 2.0 to 2.1
+- 1 new Miscellaneous Property
+  bpb                      ; Bidi_Paired_Bracket
+  -> uchar.h & UProperty.java
+  -> ppucd.h & .cpp
+
+* PropertyValueAliases.txt changes
+- 3 Bidi_Paired_Bracket_Type (bpt) values:
+  bpt; c                                ; Close
+  bpt; n                                ; None
+  bpt; o                                ; Open
+  -> uchar.h & UCharacter.BidiPairedBracketType
+  -> ubidi_props.h & .c & UBiDiProps.java
+  -> change ubidi.icu format version from 2.0 to 2.1
+- 4 new Bidi_Class (bc) values:
+  bc ; FSI                              ; First_Strong_Isolate
+  bc ; LRI                              ; Left_To_Right_Isolate
+  bc ; RLI                              ; Right_To_Left_Isolate
+  bc ; PDI                              ; Pop_Directional_Isolate
+  -> uchar.h & UCharacterEnums.ECharacterDirection
+  -> until the bidi code gets updated,
+     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
+- 3 new Word_Break (WB) values:
+  WB ; HL                               ; Hebrew_Letter
+  WB ; SQ                               ; Single_Quote
+  WB ; DQ                               ; Double_Quote
+  -> uchar.h & UCharacter.WordBreak
+  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
+- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2012-10-16)
+  Aghb  239     Caucasian Albanian
+  Mahj  314     Mahajani
+  -> uscript.h
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2;\3
+  -> preparseucd.py _scripts_only_in_iso15924
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
+     (not strictly necessary for NOT_ENCODED scripts)
+
+* generate normalization data files
+- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
+- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
+- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+
+~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
+
+* build Unicode tools using CMake+make
+
+~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
+
+# Location (--prefix) of where ICU was installed.
+set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
+# Location of the ICU source tree.
+set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
+
+~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
+~/svn.icutools/trunk/dbg/unicode/c$ make
+
+* generate core properties data files
+- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
+- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
+- rebuild ICU (make install) & tools
+- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..6.3: U+2260, U+226E, U+226F
+- nothing new in 6.3, no test file to update
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt52l
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
+    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
+- refresh ICU4J
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
+
+- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
+- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+  (note removing the underscore before "Rules")
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
+- check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out
+- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
+- run genuca, see command line above
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
+    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* test ICU, fix test code where necessary
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+*** LayoutEngine script information
+- skipped for Unicode 6.3: no new scripts
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+
+---------------------------------------------------------------------------- ***
+
+Unicode 6.2 update
+
+http://www.unicode.org/review/pri230/
+http://www.unicode.org/versions/beta-6.2.0.html
+http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
+http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
+http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
+http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
+http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
+http://unicode.org/Public/idna/6.2.0/
+
+*** ICU Trac
+
+- ticket 9515: Unicode 6.2: final ICU update
+
+- ticket 9514: UCA 6.2: fix UCARules.txt
+
+- ticket 9437: update ICU to Unicode 6.2
+- C++ branches/markus/uni62 at r32050 from trunk at r32041
+- Java branches/markus/uni62 at r32068 from trunk at r32066
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+  (configure.in & configure: have been modified to extract the version from uchar.h)
+- com.ibm.icu.util.VersionInfo
+- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
+
+*** data files & enums & parser code
+
+* file preparation
+
+- download UCD, UCA & IDNA files
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
+- modify preparseucd.py: NamesList.txt is now in UTF-8
+- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+- Check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out.
+
+* PropertyValueAliases.txt changes
+- 1 new Line_Break (lb) value:
+  lb ; RI                               ; Regional_Indicator
+  -> uchar.h & UCharacter.LineBreak
+- 1 new Word_Break (WB) value:
+  WB ; RI                               ; Regional_Indicator
+  -> uchar.h & UCharacter.WordBreak
+- 1 new Grapheme_Cluster_Break (GCB) value:
+  GCB; RI                               ; Regional_Indicator
+  -> uchar.h & UCharacter.GraphemeClusterBreak
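+
+When checking which enum constants need to be added, it can help to split such
+alias lines into their fields. The following is only a minimal sketch, assuming
+the "property ; short name ; long name" layout shown above; the function name
+parse_property_value_alias is hypothetical and not part of any ICU tool.
+
```python
def parse_property_value_alias(line):
    """Split one PropertyValueAliases.txt-style line into (property, short, long)."""
    # Drop a trailing "#" comment, then split on ';' and trim the column padding.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank or comment-only line
    fields = [field.strip() for field in data.split(";")]
    return fields[0], fields[1], fields[2]


# The three new lines quoted in this change log entry.
new_lines = [
    "lb ; RI                               ; Regional_Indicator",
    "WB ; RI                               ; Regional_Indicator",
    "GCB; RI                               ; Regional_Indicator",
]

for line in new_lines:
    print(parse_property_value_alias(line))
```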
+
+* 3 new numeric values
+  The new value -1 was really supposed to be NaN, but that would have required
+  new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
+  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
+    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
+    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
+  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
+    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
+    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
+  -> uprops.h, uchar.c & UCharacterProperty.java
+  -> cucdtst.c & UCharacterTest.java
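+
+The arithmetic behind these values can be checked with a short sketch. It is
+an illustration only: as_fraction and split_base60 are hypothetical helpers,
+and the limits used (multiplier 1..9, power of 60 from 1..4) are assumptions
+for this example, not a description of the bit layout in uprops.h.
+
```python
from fractions import Fraction


def as_fraction(nv_text):
    """Parse an nv= value such as '-1', '1/2' or '216000' into a Fraction."""
    return Fraction(nv_text)


def split_base60(value):
    """Return (multiplier, exponent) with value == multiplier * 60**exponent.

    Only multipliers 1..9 and exponents 1..4 are considered (an assumption
    made for this sketch); returns None for any other value.
    """
    for exponent in range(4, 0, -1):
        multiplier, remainder = divmod(value, 60 ** exponent)
        if remainder == 0 and 1 <= multiplier <= 9:
            return multiplier, exponent
    return None


minus_one = as_fraction("-1")
print(minus_one.numerator, minus_one.denominator)  # the "fraction" -1/1
print(split_base60(216000))  # 216000 as multiplier * 60**exponent
print(split_base60(432000))  # 432000 as multiplier * 60**exponent
```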
+
+* generate normalization data files
+- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
+- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
+- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
+- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+* build Unicode tools using CMake+make
+
+* generate core properties data files
+- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
+- in initial bootstrapping, change the UCA version
+  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
+- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
+- rebuild ICU (make install) & tools
+  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
+    check if the UCA version in FractionalUCA.txt matches the new Unicode version
+    (see step above)
+- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
+- rebuild ICU (make install) & tools
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..6.2: U+2260, U+226E, U+226F
+- nothing new in 6.2, no test file to update
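+
+The grep step can also be scripted. This is a rough sketch, assuming lines of
+the form "code point(s) ; status ; ... # comment"; the sample lines are made up
+in the style of IdnaMappingTable.txt and the function name non_ascii_std3_valid
+is hypothetical.
+
```python
def non_ascii_std3_valid(lines):
    """Yield code points above U+007F whose status is disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        # The first field is either a single code point or a "start..end" range.
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        for cp in range(first, last + 1):
            if cp > 0x7F:
                yield cp


# Illustrative input only, not copied from the real data file.
sample = [
    "0021..002C    ; disallowed_STD3_valid  # illustrative ASCII range",
    "2260          ; disallowed_STD3_valid  # illustrative line",
    "226E..226F    ; disallowed_STD3_valid  # illustrative line",
    "00C0          ; mapped ; 00E0          # illustrative line",
]

print([f"U+{cp:04X}" for cp in non_ascii_std3_valid(sample)])
```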
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt50l
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
+    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
+    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
+    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
+    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
+    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
+    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
+- refresh ICU4J
+    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* UCA
+
+- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
+- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+  (note removing the underscore before "Rules")
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
+- check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out
+- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
+- run genuca, see command line above
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
+    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
+    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* test ICU, fix test code where necessary
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+*** LayoutEngine script information
+- skipped for Unicode 6.2: no new scripts
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+
+---------------------------------------------------------------------------- ***
+
+Future Unicode update
+
+Tools simplified since the Unicode 6.1 update. See
+- http://site.icu-project.org/design/props/ppucd
+- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
+
+* Unicode version numbers
+- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
+
+* file preparation
+- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
+- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
+- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
+- Check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out.
+
+* PropertyValueAliases.txt changes
+- Script codes that are in ISO 15924 but not in Unicode are now listed in
+  preparseucd.py, in the _scripts_only_in_iso15924 variable.
+  If there are new ISO codes, then add them.
+  If Unicode adds some of them, then remove them from the .py variable.
+
+* UnicodeData.txt changes
+- No more manual changes for CJK ranges for algorithmic names;
+  those are now written to ppucd.txt and genprops reads them from there.
+
+* generate core properties data files (makeprops.sh was deleted)
+- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
+
+* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
+- it is now generated by preparseucd.py
+
+* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
+- it is now generated by preparseucd.py
+- make sure that the Unicode data folder passed into preparseucd.py
+  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
+  (can be in some subfolder)
+
+* generate normalization data files
+- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
+- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
+- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
+- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
+- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
+- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
+- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
+
+* build ICU (make install)
+* build Unicode tools using CMake+make
+
+* new way to call genuca (makeuca.sh was deleted)
+- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
+
+---------------------------------------------------------------------------- ***
+
+Unicode 6.1 update
+
+*** ICU Trac
+
+- ticket 8995 final update to Unicode 6.1
+- ticket 8994 regenerate source/layout/CanonData.cpp
+
+- ticket 8961 support Unicode "Age" value *names*
+- ticket 8963 support multiple character name aliases & types
+
+- ticket 8827 "update ICU to Unicode 6.1"
+- C++ branches/markus/uni61 at r30864 from trunk at r30843
+- Java branches/markus/uni61 at r30865 from trunk at r30863
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+  (configure.in & configure: have been modified to extract the version from uchar.h)
+- com.ibm.icu.util.VersionInfo
+- icutools/unicode/makedefs.sh
+  + also review & update other definitions in that file,
+    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
+
+*** data files & enums & parser code
+
+* file preparation
+
+~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
+- This prepares both unidata and testdata files in respective output subfolders.
+- Check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out.
+
+* PropertyValueAliases.txt changes
+- 11 new block names:
+  Arabic_Extended_A
+  Arabic_Mathematical_Alphabetic_Symbols
+  Chakma
+  Meetei_Mayek_Extensions
+  Meroitic_Cursive
+  Meroitic_Hieroglyphs
+  Miao
+  Sharada
+  Sora_Sompeng
+  Sundanese_Supplement
+  Takri
+  -> add to uchar.h
+  -> add to UCharacter.UnicodeBlock IDs
+    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
+            replace  public static final int \1_ID = \2; \3
+  -> add to UCharacter.UnicodeBlock objects
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- 1 new Joining_Group (jg) value:
+  Rohingya_Yeh
+  -> uchar.h & UCharacter.JoiningGroup
+- 2 new Line_Break (lb) values:
+  CJ=Conditional_Japanese_Starter
+  HL=Hebrew_Letter
+  -> uchar.h & UCharacter.LineBreak
+- 7 new scripts:
+  sc ; Cakm      ; Chakma
+  sc ; Merc      ; Meroitic_Cursive
+  sc ; Mero      ; Meroitic_Hieroglyphs
+  sc ; Plrd      ; Miao
+  sc ; Shrd      ; Sharada
+  sc ; Sora      ; Sora_Sompeng
+  sc ; Takr      ; Takri
+  -> remove these from SyntheticPropertyValueAliases.txt
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2011-06-21)
+  Khoj        322     Khojki
+  Tirh        326     Tirhuta
+    and another one added 2011-12-09
+  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
+  -> uscript.h
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2;\3
+  -> SyntheticPropertyValueAliases.txt
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
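+
+The find/replace patterns listed above for the block and script constants can
+also be applied outside of Eclipse. A minimal sketch with Python's re.sub, which
+accepts the same \1-style group references; the input lines are hypothetical
+and only shaped like enum entries in uchar.h and uscript.h.
+
```python
import re

# Hypothetical header lines (names and numbers are placeholders).
c_block_line = "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/"
c_script_line = "USCRIPT_EXAMPLE_SCRIPT           = 999,/* Xmpl */"

# UCharacter.UnicodeBlock IDs
block_id = re.sub(
    r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
    r"public static final int \1_ID = \2; \3",
    c_block_line,
)

# UCharacter.UnicodeBlock objects
block_object = re.sub(
    r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
    c_block_line,
)

# com.ibm.icu.lang.UScript constants
script_id = re.sub(
    r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
    r"public static final int \1 = \2;\3",
    c_script_line,
)

print(block_id)
print(block_object)
print(script_id)
```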
+
+* UnicodeData.txt changes
+- the last Unihan code point changes from U+9FCB to U+9FCC
+  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
+  + do change gennames.c
+  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
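+
+The search for both values can be done with any regex tool. A small sketch in
+Python, assuming the text to scan is already in a string; the sample text and
+the helper name find_unihan_limits are hypothetical.
+
```python
import re

# Matches 9FCB (old end) and 9FCC (new end / old limit), in either letter case.
UNIHAN_END = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_unihan_limits(text):
    """Return (line number, line) pairs for lines that mention 9FCB or 9FCC."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if UNIHAN_END.search(line)
    ]


# Made-up source fragment, only to exercise the pattern.
sample = "start = 0x4E00;\nend = 0x9fcb;\nlimit = 0x9FCC;\n"

for number, line in find_unihan_limits(sample):
    print(number, line)
```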
+
+* DerivedBidiClass.txt changes
+- 2 new default-AL blocks:
+#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
+#     Arabic Mathematical Alphabetic Symbols:
+#                       U+1EE00  - U+1EEFF  (was default-R)
+- 2 new default-R blocks:
+#     Meroitic Hieroglyphs:
+#                        U+10980 - U+1099F
+#     Meroitic Cursive:  U+109A0 - U+109FF
+  -> should be picked up by the explicit data in the file
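+
+For a quick sanity check of test expectations, the four ranges listed above can
+be put into a small lookup table. This sketch covers only those new ranges, not
+the full set of defaults in DerivedBidiClass.txt; the function name is
+hypothetical.
+
```python
# (start, end, default Bidi_Class) for the blocks named in this entry.
NEW_DEFAULT_BIDI_RANGES = [
    (0x08A0, 0x08FF, "AL"),    # Arabic Extended-A
    (0x1EE00, 0x1EEFF, "AL"),  # Arabic Mathematical Alphabetic Symbols
    (0x10980, 0x1099F, "R"),   # Meroitic Hieroglyphs
    (0x109A0, 0x109FF, "R"),   # Meroitic Cursive
]


def new_default_bidi_class(cp):
    """Return the default class for cp if it is in one of the new ranges, else None."""
    for start, end, bidi_class in NEW_DEFAULT_BIDI_RANGES:
        if start <= cp <= end:
            return bidi_class
    return None


print(new_default_bidi_class(0x08A0))
print(new_default_bidi_class(0x109A0))
print(new_default_bidi_class(0x0041))
```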
+
+* NameAliases.txt changes
+- from
+    # Each line has two fields
+    # First field: Code point
+    # Second field: Alias
+- to
+    # Each line has three fields, as described here:
+    #
+    # First field:  Code point
+    # Second field: Alias
+    # Third field:  Type
+- Also, the file format previously allowed multiple aliases per code point, but only
+  now does the file actually provide them, even several of the same type. For example,
+    FEFF;BYTE ORDER MARK;alternate
+    FEFF;BOM;abbreviation
+    FEFF;ZWNBSP;abbreviation
+- This breaks our gennames parser, unames.icu data structure, and API.
+  Fix gennames to only pick up "correction" aliases.
+  New ticket #8963 for further changes.
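+
+The three-field format and the "correction"-only filter can be sketched as
+follows. This is an illustration in Python, not the gennames implementation;
+the function names are hypothetical, and the 01A2 line is an assumed sample of
+a "correction" alias added here so that the filter has something to keep.
+
```python
from collections import defaultdict


def parse_name_aliases(lines):
    """Map code point -> list of (alias, type) from three-field lines."""
    aliases = defaultdict(list)
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        cp_text, alias, alias_type = [field.strip() for field in data.split(";")]
        aliases[int(cp_text, 16)].append((alias, alias_type))
    return aliases


def corrections_only(aliases):
    """Keep only aliases of type "correction", dropping code points without one."""
    result = {}
    for cp, entries in aliases.items():
        kept = [alias for alias, alias_type in entries if alias_type == "correction"]
        if kept:
            result[cp] = kept
    return result


sample = [
    "# illustrative lines in the three-field format",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]

all_aliases = parse_name_aliases(sample)
print(len(all_aliases[0xFEFF]))       # number of aliases listed for U+FEFF
print(corrections_only(all_aliases))  # only the "correction" aliases remain
```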
+
+* run genpname/preparse.pl (on Linux)
+  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
+  + make sure that data.h is writable
+  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
+  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
+
+* build ICU (make install)
+  so that the tools build can pick up the new definitions from the installed header files.
+* build Unicode tools (at least genpname) using CMake+make
+
+* run genpname
+  (builds both pnames.icu and propname_data.h)
+- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
+- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
+
+* build ICU (make install)
+* build Unicode tools using CMake+make
+
+* update source/data/unidata/norm2/nfkc_cf.txt
+- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
+
+* update source/data/unidata/norm2/uts46.txt
+- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
+  to ~/svn.icu/tools/trunk/src/unicode/py
+- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
+- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
+- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0..6.1: U+2260, U+226E, U+226F
+- nothing new in 6.1, no test file to update
+
+* generate core properties data files
+- in initial bootstrapping, change the UCA version
+  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
+- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU & tools
+  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
+    check if the UCA version in FractionalUCA.txt matches the new Unicode version
+    (see step above)
+- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
+  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU & tools
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt49l
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
+    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
+    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
+    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
+    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
+    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
+- refresh ICU4J
+    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* test ICU so far, fix test code where necessary
+- temporarily ignore collation issues that look like UCA/UCD mismatches,
+  until UCA data is updated
+
+* UCA
+
+- get output from Mark's tools; look in
+    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+  (note removing the underscore before "Rules")
+- update (ICU4C)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
+- check test file diffs for previously commented-out, known-failing data lines;
+  probably need to keep those commented out
+- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
+- run makeuca.sh:
+  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
+    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+*** LayoutEngine script information
+
+(For details see the Unicode 5.2 change log below.)
+
+* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
+  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
+  in the working directory.
+  (It also generates ScriptRunData.cpp, which is no longer needed.)
+
+  The generated files have a current copyright date and "@draft" statement.
+
+- diff current <icu>/source/layout files vs. generated ones
+    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
+  review and manually merge desired changes;
+  fix gratuitous changes, incorrect @draft and missing aliases;
+  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
+- if you just copy the above files, then
+  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
+  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+*** merge the Unicode update branches back onto the trunk
+- do not merge the icudata.jar and testdata.jar,
+  instead rebuild them from merged & tested ICU4C
+
+---------------------------------------------------------------------------- ***
+
+ICU 4.8 (no Unicode update, just new script codes)
+
+* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2010-12-21)
+    Afak    439     Afaka
+    Jurc    510     Jurchen
+    Mroo    199     Mro, Mru
+    Nshu    499     Nüshu
+    Shrd    319     Sharada, Śāradā
+    Sora    398     Sora Sompeng
+    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
+    Tang    520     Tangut
+    Wole    480     Woleai
+  -> uscript.h
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2;\3
+  -> genpname/SyntheticPropertyValueAliases.txt
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+
+* run genpname/preparse.pl (on Linux)
+  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
+  + make sure that data.h is writable
+  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
+  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
+
+* rebuild Unicode tools (at least genpname) using make
+- You might first need to "make install" ICU so that the tools build can pick
+  up the new definitions from the installed header files.
+
+* run genpname
+  (builds both pnames.icu and propname_data.h)
+- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
+- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
+- rebuild ICU & tools
+
+* run genprops
+- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
+- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
+- rebuild ICU & tools
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
+    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
+    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
+- refresh ICU4J
+    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
+
+* should have updated the layout engine script codes but forgot
+
+---------------------------------------------------------------------------- ***
+
+Unicode 6.0 update
+
+*** related ICU Trac tickets
+
+7264 Unicode 6.0 Update
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+  (configure.in & configure: have been modified to extract the version from uchar.h)
+- com.ibm.icu.util.VersionInfo
+
+*** data files & enums & parser code
+
+* file preparation
+
+~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
+- This now prepares both unidata and testdata files in respective output subfolders.
+
+* PropertyAliases.txt changes
+- new Script_Extensions property defined in the new ScriptExtensions.txt file
+  but not listed in PropertyAliases.txt; reported to unicode.org;
+  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
+    scx; Script_Extensions
+  -> uchar.h with new UProperty section
+  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
+
+* PropertyValueAliases.txt changes
+- 12 new block names:
+  Alchemical_Symbols
+  Bamum_Supplement
+  Batak
+  Brahmi
+  CJK_Unified_Ideographs_Extension_D
+  Emoticons
+  Ethiopic_Extended_A
+  Kana_Supplement
+  Mandaic
+  Miscellaneous_Symbols_And_Pictographs
+  Playing_Cards
+  Transport_And_Map_Symbols
+  -> add to uchar.h
+  -> add to UCharacter.UnicodeBlock
+    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
+            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- Joining_Group (jg) values:
+  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
+  -> uchar.h & UCharacter.JoiningGroup
+- 3 new scripts:
+  sc ; Batk      ; Batak
+  sc ; Brah      ; Brahmi
+  sc ; Mand      ; Mandaic
+  -> remove these from SyntheticPropertyValueAliases.txt
+  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
+  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+  (added 2009-11-11..2010-07-18)
+  Bass        259     Bassa Vah
+  Dupl        755     Duployan shorthand
+  Elba        226     Elbasan
+  Gran        343     Grantha
+  Kpel        436     Kpelle
+  Loma        437     Loma
+  Mend        438     Mende
+  Merc        101     Meroitic Cursive
+  Narb        106     Old North Arabian
+  Nbat        159     Nabataean
+  Palm        126     Palmyrene
+  Sind        318     Sindhi
+  Wara        262     Warang Citi
+  -> uscript.h
+  -> com.ibm.icu.lang.UScript
+    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+    replace  public static final int \1 = \2;\3
+  -> SyntheticPropertyValueAliases.txt
+  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+      and in com.ibm.icu.dev.test.lang.TestUScript.java
+- ISO 15924 name change
+  Mero        100     Meroitic Hieroglyphs (was Meroitic)
+  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
+- property value alias added for Cham; it had already been moved out of SyntheticPropertyValueAliases.txt
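+
+  A minimal sketch of the uscript.h -> UScript.java find/replace noted above,
+  done in Python instead of an editor. The pattern and replacement are the ones
+  from the note; the helper name and the sample line are hypothetical.
```python
import re

# Find/replace pair transcribed from the note above.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_TEMPLATE = r"public static final int \1 = \2;\3"


def uscript_to_java(line):
    """Rewrite one uscript.h enum line as a UScript.java constant.

    Lines that do not match the pattern (for example, enum lines with
    nothing after the comma) are returned unchanged.
    """
    return USCRIPT_LINE.sub(JAVA_TEMPLATE, line)


if __name__ == "__main__":
    # Illustrative input only; the value is not taken from the real header.
    print(uscript_to_java("    USCRIPT_EXAMPLE = 999,/* Exmp */"))
```
+  Indentation before the constant is preserved because the pattern only
+  rewrites the part of the line that starts at USCRIPT_.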
+
+* UnicodeData.txt changes
+- new CJK block:
+  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
+  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
+  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
+
+* build Unicode tools using CMake+make
+
+* run genpname/preparse.pl (on Linux)
+  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
+  + make sure that data.h is writable
+  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
+  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
+
+* rebuild Unicode tools (at least genpname) using make
+- You might first need to "make install" ICU so that the tools build can pick
+  up the new definitions from the installed header files.
+
+* run genpname
+- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
+- rebuild ICU & tools
+
+* update source/data/unidata/norm2/nfkc_cf.txt
+- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
+
+* update source/data/unidata/norm2/uts46.txt
+- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
+  to ~/svn.icu/tools/trunk/src/unicode/py
+- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
+- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
+- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0: U+2260, U+226E, U+226F
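+
+  A minimal sketch of the grep step above, assuming the data lines look like
+  "code point or range ; status [; mapping] # comment". The function name is
+  hypothetical and this is not part of idna2nrm.py.
```python
def non_ascii_std3_valid(lines):
    """Return (first, last) code point pairs with status disallowed_STD3_valid,
    restricted to the part of each range above U+007F."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            hits.append((max(first, 0x80), last))
    return hits


if __name__ == "__main__":
    sample = [
        "0000..002C    ; disallowed_STD3_valid  # ASCII range, ignored",
        "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
        "00C0          ; mapped ; 00E0          # other status, ignored",
    ]
    for first, last in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (first, last))
```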
+
+* generate core properties data files
+- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU & tools
+- run makeuca.sh so that genuca picks up the new nfc.nrm:
+  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU & tools
+
+* implement new Script_Extensions property (provisional)
+- parser & generator: genprops & uprops.icu
+- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
+- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
+
+* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
+- (one-time change)
+- genbidi/gencase/genprops tools changes
+- re-run makeprops.sh (see above)
+- UCharacterProperty.java, UCharacterTypeIterator.java,
+  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
+  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+  output:
+    ...
+    Unicode .icu files built to ./out/build/icudt45l
+    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
+    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
+    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
+    mkdir -p /tmp/icu4j/main/shared/data
+    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+- copy the big-endian Unicode data files to another location,
+  separate from the other data files
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
+    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
+- refresh ICU4J
+    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* un-hardcode normalization skippable (NF*_Inert) test data
+- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
+
+* copy updated break iterator test files
+- now handled by early ucdcopy.py and
+  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
+  (old instructions:
+   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
+   to ~/svn.icu/trunk/src/source/test/testdata)
+- they are not used in ICU4J
+
+* UCA
+
+- get output from Mark's tools; look in
+    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
+    http://www.macchiato.com/unicode/utc/additional-uca-files
+    http://www.unicode.org/Public/UCA/6.0.0/
+    http://www.unicode.org/~mdavis/uca/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+- update Han-implicit ranges for new CJK extensions:
+  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
+- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
+  do not add it into invuca so that tailoring primary-after an ignorable works
+- genuca: permit space between [variable top] bytes
+- ucol.cpp: treat noncharacters like unassigned rather than ignorable
+- run makeuca.sh:
+  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU4C
+- refresh ICU4J collation data:
+  (subset of instructions above for properties data refresh, except copies all coll/*)
+    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
+- update (ICU)/source/test/testdata/CollationTest_*.txt
+  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+  with output from Mark's Unicode tools
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+*** LayoutEngine script information
+
+(For details see the Unicode 5.2 change log below.)
+
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+* fix mixed line endings
+* review the diffs and fix incorrect @draft and missing aliases;
+  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
+* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+---------------------------------------------------------------------------- ***
+
+Unicode 5.2 update
+
+*** related ICU Trac tickets
+
+7084 Unicode 5.2
+
+7167 verify collation bytes
+7235 Java test NAME_ALIAS
+7236 Java DerivedCoreProperties.txt test
+7237 Java BidiTest.txt
+7238 UTrie2 in core unidata
+7239 test for tailoring gaps
+7240 Java fix CollationMiscTest
+7243 update layout engine for Unicode 5.2
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in & configure
+- update ucdVersion in gennames.c if an algorithmic range changes
+
+*** data files & enums & parser code
+
+* file preparation
+
+python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
+- includes finding files regardless of version numbers,
+  copying them, and performing the equivalent processing of the
+  ucdstrip and ucdmerge tools on the desired set of files
+
+* notes on changes
+- PropertyAliases.txt
+  moved from numeric to enumerated:
+    ccc       ; Canonical_Combining_Class
+  new string properties:
+    NFKC_CF   ; NFKC_Casefold
+    Name_Alias; Name_Alias
+  new binary properties:
+    Cased     ; Cased
+    CI        ; Case_Ignorable
+    CWCF      ; Changes_When_Casefolded
+    CWCM      ; Changes_When_Casemapped
+    CWKCF     ; Changes_When_NFKC_Casefolded
+    CWL       ; Changes_When_Lowercased
+    CWT       ; Changes_When_Titlecased
+    CWU       ; Changes_When_Uppercased
+  new CJK Unihan properties (not supported by ICU)
+- PropertyValueAliases.txt
+  new block names
+  new scripts
+  one script code change:
+    sc ; Qaai      ; Inherited
+    ->
+    sc ; Zinh      ; Inherited                        ; Qaai
+  new Line_Break (lb) value:
+    lb ; CP        ; Close_Parenthesis
+  new Joining_Group (jg) values: Farsi_Yeh, Nya
+  other new values:
+    ccc; 214; ATA  ; Attached_Above
+- DerivedBidiClass.txt
+  new default-R range: U+1E800 - U+1EFFF
+- UnicodeData.txt
+  all of the ISO comments are gone
+  new CJK block end:
+    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
+  new CJK block:
+    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
+    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
+
+* genpname
+- run preparse.pl
+  + cd \svn\icuproj\icu\trunk\source\tools\genpname
+  + make sure that data.h is writable
+  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
+  + preparse.pl complains with errors like the following:
+      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
+    This is because ICU 4.0 had scripts from ISO 15924 which are now
+    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
+    and PropertyValueAliases.txt.
+    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
+       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
+  + preparse.pl complains with errors about block names missing from uchar.h; add them
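+
+  A minimal sketch for finding such duplicate script entries before running
+  preparse.pl. It assumes both alias files use "sc ; short ; long" lines; the
+  function names are hypothetical and this is not what preparse.pl does
+  internally.
```python
def script_codes(lines):
    """Collect the short script codes from 'sc ; Xxxx ; Long_Name' lines."""
    codes = set()
    for line in lines:
        data = line.split("#", 1)[0]  # drop trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicate_scripts(synthetic_lines, official_lines):
    """Script codes defined in both files; these entries are the candidates
    for removal from the synthetic file."""
    return sorted(script_codes(synthetic_lines) & script_codes(official_lines))


if __name__ == "__main__":
    synthetic = ["sc ; Egyp ; Egyp", "sc ; Blis ; Blis"]
    official = ["sc ; Egyp      ; Egyptian_Hieroglyphs",
                "sc ; Zinh      ; Inherited ; Qaai"]
    print(duplicate_scripts(synthetic, official))
```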
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new block & script values
+  + 26 new blocks
+    copy new blocks from Blocks.txt
+    MS VC++ 2008 regular expression:
+      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
+      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
+  + several new script values already added in ICU 4.0 for ISO 15924 coverage
+    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
+  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
+  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
+    (added to SyntheticPropertyValueAliases.txt)
+- new Joining Group (JG) values: Farsi_Yeh, Nya
+- new Line_Break (lb) value:
+    lb ; CP        ; Close_Parenthesis
+
+* hardcoded Unihan range end/limit
+- Unihan range end moves from 9FC3 to 9FCB
+  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
+  + do change gennames.c
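+
+  A minimal sketch of the end/limit search above. It builds the same kind of
+  pattern as 9FC[34] from the old range end, and additionally rejects matches
+  inside longer hex numbers (an extra guard that is not part of the note).
+  The function names are hypothetical.
```python
import re


def range_pattern(end):
    """Case-insensitive pattern for the old range end and its limit (end+1)."""
    alternatives = "%04X|%04X" % (end, end + 1)
    return re.compile(
        r"(?<![0-9A-F])(?:" + alternatives + r")(?![0-9A-F])",
        re.IGNORECASE,
    )


def find_hardcoded(lines, end):
    """Return (line number, text) for lines that mention the end or limit."""
    pattern = range_pattern(end)
    return [(n, line.rstrip())
            for n, line in enumerate(lines, 1)
            if pattern.search(line)]


if __name__ == "__main__":
    source = ["if(c<=0x9fc3) {", "#define LIMIT 0x9FC4", "x = 0x19FC30;"]
    for number, text in find_hardcoded(source, 0x9FC3):
        print(number, text)
```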
+
+* Compare definitions of new binary properties with what we used to use
+  in algorithms, to see if the definitions changed.
+- Verified that definitions for Cased and Case_Ignorable are unchanged.
+  The gencase tool now parses the newly public Case_Ignorable values
+  in case the definition changes in the future.
+
+* uchar.c & uprops.h & uprops.c & genprops
+- new numeric values that didn't exist in Unicode data before:
+    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
+  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
+  therefore redesign the encoding of numeric types and values for formatVersion 6;
+  design for simple numbers up to at least 144 ("one gross"),
+  large values up to at least 10^20,
+  and fractions with numerators -1..17 and denominators 1..16
+  to cover current and expected future values
+  (e.g., more Han numeric values, Meroitic twelfths)
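+
+  A minimal sketch that checks the new fractions against the limits stated
+  above. It only tests the ranges named in this note; it does not reproduce
+  the actual uprops.icu encoding, and the function names are hypothetical.
```python
from fractions import Fraction

# New numeric values listed in the note above.
NEW_VALUES = [Fraction(1, 7), Fraction(1, 9), Fraction(1, 10),
              Fraction(3, 10), Fraction(1, 16), Fraction(3, 16)]


def exceeds_old_format(value):
    """True if the denominator is >9, which the note says the old
    format cannot support."""
    return value.denominator > 9


def fits_new_design(value):
    """True if the fraction lies within the stated design ranges:
    numerators -1..17 and denominators 1..16."""
    return -1 <= value.numerator <= 17 and 1 <= value.denominator <= 16


if __name__ == "__main__":
    for value in NEW_VALUES:
        print(value, exceeds_old_format(value), fits_new_design(value))
```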
+
+* reimplement Hangul_Syllable_Type for new Jamo characters
+- the old code assumed that all Jamo characters are in the 11xx block
+- Unicode 5.2 fills holes there and adds new Jamo characters in
+    A960..A97F; Hangul Jamo Extended-A
+  and in
+    D7B0..D7FF; Hangul Jamo Extended-B
+- Hangul_Syllable_Type can be trivially derived from a subset of
+  Grapheme_Cluster_Break values
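+
+  A minimal sketch of that derivation, using the short property value names
+  (L, V, T, LV, LVT; NA for Not_Applicable). This illustrates the idea only
+  and is not the ICU implementation; the names are hypothetical.
```python
# Grapheme_Cluster_Break values that carry over to Hangul_Syllable_Type.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb):
    """Derive Hangul_Syllable_Type from a Grapheme_Cluster_Break value.

    Every other GCB value (CR, LF, Extend, ...) maps to NA.
    """
    return GCB_TO_HST.get(gcb, "NA")


if __name__ == "__main__":
    for gcb in ("L", "LVT", "EX"):
        print(gcb, "->", hangul_syllable_type(gcb))
```
+  Because the lookup depends only on the property value, it does not care
+  which block a Jamo character is in.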
+
+* build Unicode data source code for hardcoding core data
+C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
+
+ICU data make path is \svn\icuproj\icu\trunk\source\data\
+ICU root path is \svn\icuproj\icu\trunk
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
+Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
+Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
+Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
+Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
+Creating data file for Unicode Property Names
+Creating data file for Unicode Character Properties
+Creating data file for Unicode Case Mapping Properties
+Creating data file for Unicode BiDi/Shaping Properties
+Creating data file for Unicode Normalization
+Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
+Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
+
+- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
+  and rebuild the common library
+
+*** UCA
+
+- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
+- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
+[ Begin obsolete instructions:
+  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
+    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
+      on Windows:
+        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
+        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
+  End obsolete instructions]
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
+  not just the *_STUB.txt files
+- note on intltest: if collate/UCAConformanceTest fails, then
+  utility/MultithreadTest/TestCollators will fail as well;
+  fix the conformance test before looking into the multi-thread test
+
+*** Implement Cased & Case_Ignorable properties
+- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
+- Problem: These properties should be disjoint, but aren't
+- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
+- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
+
+*** Implement Changes_When_Xyz properties
+- without stored data
+
+*** Implement Name_Alias property
+- add it as another name field in unames.icu
+- make it available via u_charName() and UCharNameChoice
+- consider it in u_charFromName()
+
+*** Break iterators
+
+* Update break iterator rules to new UAX versions and new property values
+* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
+
+*** new BidiTest file
+- review format and data
+- copy BidiTest.txt to source/test/testdata
+- write test code using this data
+- fix ICU code where it fails the conformance test
+
+*** Java
+- generally, find and update code corresponding to C/C++
+- UCharacter.UnicodeBlock constants:
+  a) add an _ID integer per new block, update COUNT
+  b) add a class instance per new block
+     Visual Studio regex:
+        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
+        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
+
+- port test changes to Java
+
+*** LayoutEngine script information
+
+(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
+
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+-> Eric Mader wrote in email on 20090930:
+    "I think the tool has been modified to update @draft to @stable for
+     older scripts and to add @draft for new scripts.
+     (I worked with an intern on this last year.)
+     You should check the output after you run it."
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+* fix mixed line endings
+* review the diffs and fix incorrect @draft and missing aliases
+* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
+
+-> Eric Mader wrote in email on 20090930:
+    "This is just a matter of making sure that all the per-script tables have
+     entries for any new scripts that were added.
+     If any new Indic characters were added, then the class tables in
+     IndicClassTables.cpp should be updated to reflect this.
+     John Emmons should know how to do this if it's required."
+
+* rebuild the layout and layoutex libraries.
+
+*** Documentation
+- Update User Guide
+  + Jamo_Short_Name, sfc->scf, binary property value aliases
+
+---------------------------------------------------------------------------- ***
+
+Unicode 5.1 update
+
+*** related ICU Trac tickets
+
+5696 Update to Unicode 5.1
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in & configure
+- update ucdVersion in gennames.c if an algorithmic range changes
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip:
+    DerivedCoreProperties.txt
+    DerivedNormalizationProps.txt
+    NormalizationTest.txt
+    PropList.txt
+    Scripts.txt
+    GraphemeBreakProperty.txt
+    SentenceBreakProperty.txt
+    WordBreakProperty.txt
+- ucdstrip and ucdmerge:
+    EastAsianWidth.txt
+    LineBreak.txt
+
+* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
+copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
+copy 5.1.0\ucd\Blocks.txt ..\unidata\
+copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
+copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
+copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
+copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
+copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
+copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
+copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
+
+ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
+ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
+ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
+ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
+ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
+ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
+ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
+ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
+ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
+ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
+
+* genpname
+- run preparse.pl
+  + cd \svn\icuproj\icu\uni51\source\tools\genpname
+  + make sure that data.h is writable
+  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
+  + preparse.pl complains with errors like the following:
+      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
+    This is because ICU 3.8 had scripts from ISO 15924 which are now
+    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
+    and PropertyValueAliases.txt.
+    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
+       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
+  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
+      N/Y, No/Yes, F/T, False/True
+    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
+       It will use further values from the file if present.
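+
+  A minimal sketch of accepting all of the boolean value aliases listed above.
+  Matching here is case-insensitive only, which is a simplification; the
+  function name is hypothetical and this is not the preparse.pl code.
```python
# The eight boolean value names listed above, lowercased.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(name):
    """Map a binary property value alias to a Python bool."""
    key = name.strip().lower()
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError("not a binary property value alias: %r" % name)
    return BINARY_VALUE_ALIASES[key]


if __name__ == "__main__":
    for alias in ("N", "Yes", "F", "True"):
        print(alias, parse_binary_value(alias))
```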
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new block & script values
+  + 17 new blocks
+  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
+    (removed from SyntheticPropertyValueAliases.txt)
+  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
+    (added to SyntheticPropertyValueAliases.txt)
+- uprops.icu (uprops.h) only provides 7 bits for script codes.
+  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
+  No code above 127 is yet the script code of an
+  assigned Unicode character, so the ICU 4.0 uprops.icu does not store any
+  script code values greater than 127.
+  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
+  in a parallel bit field, and that overflows now.
+  Also, future values >=128 would be incompatible anyway.
+  uprops.h is modified to move around several of the bit fields
+  in the properties vector words, and now uses 8 bits for the script code.
+  Two other bit fields also grow to accommodate future growth:
+  Block (current count: 172) grows from 8 to 9 bits,
+  and Word_Break grows from 4 to 5 bits.
+- renamed property Simple_Case_Folding (sfc->scf)
+  + nothing to be done: handled as normal alias
+- new property JSN Jamo_Short_Name
+  + no new API: only contributes to the Name property
+- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
+- new Joining Group (JG) value: Burushashki_Yeh_Barree
+- new Sentence_Break (SB) values:
+    SB ; CR        ; CR
+    SB ; EX        ; Extend
+    SB ; LF        ; LF
+    SB ; SC        ; SContinue
+- new Word_Break (WB) values:
+    WB ; CR        ; CR
+    WB ; Extend    ; Extend
+    WB ; LF        ; LF
+    WB ; MB        ; MidNumLet
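+
+  A minimal sketch of the bit-width arithmetic behind the uprops.h change
+  noted above (script code field growing from 7 to 8 bits). The helper name
+  is hypothetical; the numbers are the ones given in the note.
```python
def bits_needed(max_value):
    """Smallest number of bits that can store values 0..max_value."""
    return max(1, max_value.bit_length())


if __name__ == "__main__":
    # Largest value that fits in the old 7-bit script field.
    print("127 ->", bits_needed(127), "bits")
    # USCRIPT_CODE_LIMIT-1 = 129 no longer fits in 7 bits.
    print("129 ->", bits_needed(129), "bits")
    # Block count 172, so the largest block value is 171.
    print("171 ->", bits_needed(171), "bits")
```
+  The Block field would still fit in 8 bits by this arithmetic; per the note,
+  it grows to 9 bits for future growth.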
+
+* Further changes in the 2008-02-29 update:
+- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
+  because they should not normally be invisible.
+- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
+- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
+- new Word_Break (WB) value: NL=Newline
+
+* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
+- Unihan range end moves from 9FBB to 9FC3
+  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
+  + do change gennames.c
+
+* build Unicode data source code for hardcoding core data
+C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
+
+ICU data make path is \svn\icuproj\icu\uni51\source\data\
+ICU root path is \svn\icuproj\icu\uni51
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
+Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
+Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
+Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
+Creating data file for Unicode Character Properties
+Creating data file for Unicode Case Mapping Properties
+Creating data file for Unicode BiDi/Shaping Properties
+Creating data file for Unicode Normalization
+Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
+Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
+
+- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
+  and rebuild the common library
+
+*** Break iterators
+
+* Update break iterator rules to new UAX versions and new property values
+
+*** UCA
+
+* update FractionalUCA.txt and UCARules.txt with new canonical closure
+
+*** Test suites
+- Test that APIs using Unicode property value aliases (like UnicodeSet)
+  support all of the boolean values N/Y, No/Yes, F/T, False/True
+  -> TestBinaryValues() tests in both cintltst and intltest
+
+*** LayoutEngine script information
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
+
+* rebuild the layout and layoutex libraries.
+
+*** Documentation
+- Update User Guide
+  + Jamo_Short_Name, sfc->scf, binary property value aliases
+
+---------------------------------------------------------------------------- ***
+
+Unicode 5.0 update
+
+*** related Jitterbugs
+
+5084 RFE: Update to Unicode 5.0
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip:
+    DerivedCoreProperties.txt
+    DerivedNormalizationProps.txt
+    NormalizationTest.txt
+    PropList.txt
+    Scripts.txt
+    GraphemeBreakProperty.txt
+    SentenceBreakProperty.txt
+    WordBreakProperty.txt
+- ucdstrip and ucdmerge:
+    EastAsianWidth.txt
+    LineBreak.txt
+
+* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
+copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
+copy 5.0.0\ucd\Blocks.txt ..\unidata\
+copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
+copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
+copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
+copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
+copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
+copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
+copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
+
+ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
+ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
+ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
+ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
+ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
+ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
+ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
+ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
+ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
+ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
+
+* update FractionalUCA.txt and UCARules.txt with new canonical closure
+
+* genpname
+- run preparse.pl
+  + make sure that data.h is writable
+  + perl preparse.pl \cvs\oss\icu > out.txt
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new block & script values
+  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
+
+* build Unicode data source code for hardcoding core data
+C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
+
+ICU data make path is \cvs\oss\icu\source\data\
+ICU root path is \cvs\oss\icu
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
+[etc.]
+Creating data file for Unicode Character Properties
+Creating data file for Unicode Case Mapping Properties
+Creating data file for Unicode BiDi/Shaping Properties
+Creating data file for Unicode Normalization
+Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
+Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
+
+- copy the .c source files to C:\cvs\oss\icu\source\common
+  and rebuild the common library
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in
+
+*** LayoutEngine script information
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
+
+* rebuild the layout and layoutex libraries.
+
+---------------------------------------------------------------------------- ***
+
+Unicode 4.1 update
+
+*** related Jitterbugs
+
+4332 RFE: Update to Unicode 4.1
+4157 RBBI, TR29 4.1 updates
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip:
+    DerivedCoreProperties.txt
+    DerivedNormalizationProps.txt
+    NormalizationTest.txt
+    GraphemeBreakProperty.txt
+    SentenceBreakProperty.txt
+    WordBreakProperty.txt
+- ucdstrip and ucdmerge:
+    EastAsianWidth.txt
+    LineBreak.txt
+
+* add new files to the repository
+    GraphemeBreakProperty.txt
+    SentenceBreakProperty.txt
+    WordBreakProperty.txt
+
+* update FractionalUCA.txt and UCARules.txt with new canonical closure
+
+* genpname
+- handle new enumerated properties in sub read_uchar
+- run preparse.pl
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new binary properties
+  + Pattern_Syntax
+  + Pattern_White_Space
+- new enumerated properties
+  + Grapheme_Cluster_Break
+  + Sentence_Break
+  + Word_Break
+- new block & script & line break values
+
+* gencase
+- case-ignorable changes
+  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
+  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
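+
+A minimal sketch of the D47a test above, in Python rather than in gencase's C code.
+Assumptions: MID_LETTER_SAMPLE is a hypothetical, incomplete stand-in for the
+Word_Break=MidLetter set (the real list is in WordBreakProperty.txt), and
+unicodedata reports the Unicode version bundled with the interpreter, not 4.1.

```python
import unicodedata

# Assumption: illustrative, incomplete subset of Word_Break=MidLetter code points.
# The authoritative list comes from WordBreakProperty.txt.
MID_LETTER_SAMPLE = {0x0027, 0x00B7, 0x2019, 0x2027}

# General categories named in the note: Mn, Me, Cf, Lm, Sk.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    """D47a-style check: Word_Break=MidLetter, or gc in Mn, Me, Cf, Lm, Sk."""
    return (ord(ch) in MID_LETTER_SAMPLE
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)


# Print the verdict for an apostrophe, a combining accent, a soft hyphen and a letter.
for ch in ("'", "\u0301", "\u00ad", "a"):
    print(f"U+{ord(ch):04X}", is_case_ignorable(ch))
```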
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in
+
+*** tests
+- verify that u_charMirror() round-trips
+- test all new properties and some new values of old properties
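+
+The round-trip check can be sketched as follows. This is an illustration only:
+the real test calls u_charMirror() from C, while SAMPLE_MIRROR and char_mirror()
+here are hypothetical stand-ins holding a few bracket pairs.

```python
# Assumption: a tiny hand-written mirror table standing in for u_charMirror().
SAMPLE_MIRROR = {
    0x0028: 0x0029, 0x0029: 0x0028,  # ( )
    0x003C: 0x003E, 0x003E: 0x003C,  # < >
    0x005B: 0x005D, 0x005D: 0x005B,  # [ ]
}


def char_mirror(c):
    """Return the mirrored code point, or c itself when there is no mapping."""
    return SAMPLE_MIRROR.get(c, c)


def round_trip_failures(mirror, code_points):
    """Return the code points for which mirror(mirror(c)) != c."""
    return [c for c in code_points if mirror(mirror(c)) != c]


failures = round_trip_failures(char_mirror, range(0x80))
print("round-trip failures:", [f"U+{c:04X}" for c in failures])
```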
+
+*** other code
+
+* hardcoded Unihan range end/limit
+- Unihan range end moves from 9FA5 to 9FBB
+  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
+  + do not modify BOCU/BOCSU code because that would change the encoding
+    and break binary compatibility!
+  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
+    NamePrepProfile.txt
+  + ignore trietest.c: test data is arbitrary
+  + ignore tstnorm.cpp: test optimization, not important
+  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
+  + do change line_th.txt and word_th.txt
+    by replacing hardcoded ranges with the new property values
+  + do change gennames.c
+
+source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
+source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
+source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
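+
+A search like the one above can be sketched in Python. The helper name
+find_hits() and the sample text are made up for illustration; only the
+pattern 9FA[56] and the case-insensitive match come from the notes.
+Each hit still needs manual review against the do/do-not list above.

```python
import re

# Pattern from the notes: 9FA5 (range end) or 9FA6 (range limit), in any case.
UNIHAN_END_RE = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hits(name, text):
    """Return 'name(line): text' strings for each line mentioning 9FA5 or 9FA6."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if UNIHAN_END_RE.search(line):
            hits.append(f"{name}({number}): {line.strip()}")
    return hits


# Hypothetical sample input; a real run would read each source file's contents.
sample = "static const uint32_t range[] = {\n    0x4e00, 0x9fa5,\n};\n"
for hit in find_hits("sample.c", sample):
    print(hit)
```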
+
+* case mappings
+- compare new special casing context conditions with previous ones
+  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
+
+* genpname
+- consider storing only the short name if it is the same as the long name
+
+*** other reviews
+- UAX #29 changes (grapheme/word/sentence breaks)
+- UAX #14 changes (line breaks)
+- Pattern_Syntax & Pattern_White_Space
+
+---------------------------------------------------------------------------- ***
+
 Unicode 4.0.1 update
 
 *** related Jitterbugs