icuSources/data/unidata/changes.txt

   1 * Copyright (C) 2016 and later: Unicode, Inc. and others.
   2 * License & terms of use: http://www.unicode.org/copyright.html
   3 * Copyright (C) 2004-2016, International Business Machines
   4 * Corporation and others.  All Rights Reserved.
   5 *
   6 *   file name:  changes.txt
   7 *   encoding:   US-ASCII
   8 *   tab size:   8 (not used)
   9 *   indentation:4
  10 *
  11 *   created on: 2004may06
  12 *   created by: Markus W. Scherer
  13 *
  14 * change log for Unicode updates
  15 *
  16 * For each new Unicode version, during the beta period,
  17 * I copy the change log for the previous version to the top of this file.
  18 * I adjust the versions, tickets, URLs, and paths.
  19 * I work my way through the steps listed in the log, top to bottom,
  20 * adjusting the log as necessary.
  21 * I report problems to the UTC and/or CLDR and/or ICU.
  22 * Before the data is final, I "turn the crank" several more times,
  23 * using appropriate subsets of the steps.
  24
  25 ---------------------------------------------------------------------------- ***
  26
  27 * New ISO 15924 script codes
  28
  29 Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
  30 until they are encoded in Unicode,
  31 or can be assumed to be encoded in the next Unicode version.
  32 Script enum constant names should follow the Unicode script property value aliases,
  33 which are assigned only when the scripts are encoded.
  34 When we add script constants early and guess the name wrong, we end up with confusing
  35 enum constants and have sometimes had to add aliases.
  36
  37 Variant script codes like Latf and Aran that are not subject to separate encoding
  38 can be added at any time.
  39 (For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
  40
  41 We add script codes used in CLDR or in the spoof checker.
  42 This includes combination/alias codes like Hanb and Jamo.
  43 See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
  44 and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
  45
  46 We add special Z* script codes like Zsye.
  47
  48 For new script codes see http://www.unicode.org/iso15924/codechanges.html
  49
  50 ---------------------------------------------------------------------------- ***
  51
  52 Unicode 10.0 update for ICU 60
  53
  54 http://www.unicode.org/versions/Unicode10.0.0/
  55 http://www.unicode.org/versions/beta-10.0.0.html
  56 http://blog.unicode.org/2017/03/unicode-100-beta-review.html
  57 http://www.unicode.org/review/pri350/
  58 http://www.unicode.org/reports/uax-proposed-updates.html
  59 http://www.unicode.org/reports/tr44/tr44-19.html
  60
  61 * Command-line environment setup
  62
  63 UNICODE_DATA=~/unidata/uni10/20170605
  64 CLDR_SRC=~/svn.cldr/uni10
  65 ICU_ROOT=~/svn.icu/uni10
  66 ICU_SRC=$ICU_ROOT/src
  67 ICUDT=icudt60b
  68 ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
  69 ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
  70 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
  71
  72 *** ICU Trac
  73
  74 - ticket:12985: Unicode 10
  75 - ticket:13061: undo hacks from emoji 5.0 update
  76 - ticket:13062: add Emoji_Component property
  77 - ^/branches/markus/uni10
  78
  79 *** CLDR Trac
  80
  81 - cldrbug 10055: Unicode 10
  82 - cldrbug 9882: Unicode 10 script metadata
  83 - cldrbug 10219: numbering systems for Unicode 10
  84
  85 *** Unicode version numbers
  86 - makedata.mak
  87 - uchar.h
  88 - com.ibm.icu.util.VersionInfo
  89 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
  90
  91 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  92   so that the makefiles see the new version number.
  93
  94 *** data files & enums & parser code
  95
  96 * download files
  97 - mkdir -p $UNICODE_DATA
  98 - download Unicode 10.0 files into $UNICODE_DATA
  99   + subfolders: ucd, uca, idna, security
 100   + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 101 - download emoji 5.0 files into $UNICODE_DATA/emoji
 102
 103 * for manual diffs: remove version suffixes from the file names
 104   ~$ unidata/desuffixucd.py $UNICODE_DATA
 105   (see https://sites.google.com/site/unicodetools/inputdata)
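
A minimal sketch of what "removing version suffixes" means for a single file name. This is not the actual desuffixucd.py: the suffix shape (for example "-10.0.0d1" before the extension), the regular expression, and the function name strip_version_suffix are assumptions made for illustration.

```python
import re

# Assumed suffix shape: "-<major>.<minor>.<update>", optionally followed by a
# draft marker such as "d1", directly before the file extension.
_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def strip_version_suffix(name):
    """Return the file name without its version suffix, if it has one."""
    return _SUFFIX.sub("", name)


if __name__ == "__main__":
    for name in ("UnicodeData-10.0.0d1.txt", "Blocks-10.0.0.txt", "ReadMe.txt"):
        print(name, "->", strip_version_suffix(name))
```

With suffixes removed, the same file from two Unicode versions has the same name, so a directory-level diff pairs the files up directly.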
 106
 107 * process and/or copy files
 108 - $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
 109   + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 110   + For debugging, and tweaking how ppucd.txt is written,
 111     the tool has an --only_ppucd option:
 112     py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
 113
 114 - cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
 115
 116 * build ICU (make install)
 117   so that the tools build can pick up the new definitions from the installed header files.
 118
 119   $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
 120
 121 * preparseucd.py changes
 122 - remove or add new Unicode scripts from/to the
 123   only-in-ISO-15924 list according to the error messages:
 124     ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
 125   -> adjust _scripts_only_in_iso15924 as indicated
 126 - fix other errors
 127     Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
 128   -> add vo=Vertical_Orientation to _ignored_properties
 129   -> later removed again: the file is now parsed, even though we do not yet store its data for runtime use
 130
 131 * new constants for new property values
 132 - preparseucd.py error:
 133     ValueError: missing uchar.h enum constants for some property values:
 134     [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
 135                    u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
 136      (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
 137                   u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
 138                   u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
 139      (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
 140   = PropertyValueAliases.txt new property values (diff old & new .txt files)
 141     blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
 142     blk; Kana_Ext_A                       ; Kana_Extended_A
 143     blk; Masaram_Gondi                    ; Masaram_Gondi
 144     blk; Nushu                            ; Nushu
 145     blk; Soyombo                          ; Soyombo
 146     blk; Syriac_Sup                       ; Syriac_Supplement
 147     blk; Zanabazar_Square                 ; Zanabazar_Square
 148   -> add to uchar.h
 149     use long property names for enum constants,
 150     for the trailing comment get the block start code point: diff old & new Blocks.txt
 151   -> add to UCharacter.UnicodeBlock IDs
 152     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 153             replace  public static final int \1_ID = \2; \3
 154   -> add to UCharacter.UnicodeBlock objects
 155     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 156             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 157
 158     jg ; Malayalam_Bha                    ; Malayalam_Bha
 159     jg ; Malayalam_Ja                     ; Malayalam_Ja
 160     jg ; Malayalam_Lla                    ; Malayalam_Lla
 161     jg ; Malayalam_Llla                   ; Malayalam_Llla
 162     jg ; Malayalam_Nga                    ; Malayalam_Nga
 163     jg ; Malayalam_Nna                    ; Malayalam_Nna
 164     jg ; Malayalam_Nnna                   ; Malayalam_Nnna
 165     jg ; Malayalam_Nya                    ; Malayalam_Nya
 166     jg ; Malayalam_Ra                     ; Malayalam_Ra
 167     jg ; Malayalam_Ssa                    ; Malayalam_Ssa
 168     jg ; Malayalam_Tta                    ; Malayalam_Tta
 169   -> uchar.h & UCharacter.JoiningGroup
 170
 171     sc ; Gonm                             ; Masaram_Gondi
 172     sc ; Nshu                             ; Nushu
 173     sc ; Soyo                             ; Soyombo
 174     sc ; Zanb                             ; Zanabazar_Square
 175   -> uscript.h & com.ibm.icu.lang.UScript
 176   -> Nushu had been added already
 177   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 178       and in com.ibm.icu.dev.test.lang.TestUScript.java
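
The two Eclipse find/replace pairs listed above for UCharacter.UnicodeBlock can also be tried outside the IDE. The sketch below applies the same patterns with Python's re module. The sample enum line and its numeric value are illustrative rather than copied from uchar.h, and the helper names are hypothetical.

```python
import re

# The find/replace pairs from the log, unchanged; Python's re.sub accepts the
# same \1, \2, \3 group references in the replacement string.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(c_line):
    """Turn a uchar.h block enum line into a Java *_ID constant."""
    return re.sub(ID_FIND, ID_REPLACE, c_line)


def to_java_object(c_line):
    """Turn a uchar.h block enum line into a Java UnicodeBlock object."""
    return re.sub(OBJ_FIND, OBJ_REPLACE, c_line)


if __name__ == "__main__":
    # Illustrative input line; the enum value is an assumption.
    sample = "UBLOCK_SOYOMBO = 278, /*[11A50]*/"
    print(to_java_id(sample))
    print(to_java_object(sample))
```

Both patterns rely on the trailing comment starting with "/", which is why the block start code point comment has to be in place before converting.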
 179
 180 * New properties as shown in PropertyValueAliases.txt changes
 181 - boolean Emoji_Component from emoji 5
 182   -> uchar.h & UProperty.java
 183 - boolean
 184     # Regional_Indicator (RI)
 185
 186     RI ; N                                ; No                               ; F                                ; False
 187     RI ; Y                                ; Yes                              ; T                                ; True
 188   -> uchar.h & UProperty.java
 189   -> single immutable range, to be hardcoded
 190 - boolean
 191     # Prepended_Concatenation_Mark (PCM)
 192
 193     PCM; N                                ; No                               ; F                                ; False
 194     PCM; Y                                ; Yes                              ; T                                ; True
 195   -> was new in Unicode 9
 196   -> uchar.h & UProperty.java
 197 - enumerated
 198     # Vertical_Orientation (vo)
 199
 200     vo ; R                                ; Rotated
 201     vo ; Tr                               ; Transformed_Rotated
 202     vo ; Tu                               ; Transformed_Upright
 203     vo ; U                                ; Upright
 204   -> only pre-parsed for now, but not yet stored for runtime use
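
The new values and properties above were found by diffing the old and new PropertyValueAliases.txt. As a rough alternative to a textual diff, the sketch below compares the (property, short value name) pairs of two versions. The function names are hypothetical, and the parsing assumes only the semicolon-separated layout visible in the excerpts above.

```python
def parse_pva(lines):
    """Collect (property, first value field) pairs from PropertyValueAliases-style lines."""
    pairs = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 2:
            # For most properties the second field is the short value name.
            pairs.add((fields[0], fields[1]))
    return pairs


def new_property_values(old_lines, new_lines):
    """Return the pairs present only in the new file, sorted."""
    return sorted(parse_pva(new_lines) - parse_pva(old_lines))


if __name__ == "__main__":
    old = ["sc ; Nshu ; Nushu"]
    new = old + [
        "sc ; Gonm ; Masaram_Gondi",
        "# Vertical_Orientation (vo)",
        "vo ; Tr ; Transformed_Rotated",
    ]
    for prop, value in new_property_values(old, new):
        print(prop, value)
```

A property that appears only in the new file, such as vo here, shows up with all of its values, which flags new properties as well as new values.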
 205
 206 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 207     (not strictly necessary for NOT_ENCODED scripts)
 208   $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
 209
 210 * generate normalization data files
 211   cd $ICU_ROOT/dbg/icu4c
 212   bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
 213   bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
 214   bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
 215   bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 216   bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
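
The five gennorm2 invocations above differ only in the output file, the list of input files, and the --csource flag. The sketch below generates the same command lines from a small table, which makes it visible that every data file builds on nfc.txt. The table and the function name are illustrative; only flags that appear in the log are used.

```python
# (output file, input files in order, extra flags), as listed in the log.
JOBS = [
    ("$ICU_SRC/icu4c/source/common/norm2_nfc_data.h", ["nfc.txt"], ["--csource"]),
    ("$ICU4C_DATA_IN/nfc.nrm", ["nfc.txt"], []),
    ("$ICU4C_DATA_IN/nfkc.nrm", ["nfc.txt", "nfkc.txt"], []),
    ("$ICU4C_DATA_IN/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
    ("$ICU4C_DATA_IN/uts46.nrm", ["nfc.txt", "uts46.txt"], []),
]


def gennorm2_commands(jobs=JOBS, srcdir="$ICU4C_UNIDATA/norm2"):
    """Build one argument list per job; shell variables are left unexpanded."""
    return [
        ["bin/gennorm2", "-o", out, "-s", srcdir] + inputs + extra
        for out, inputs, extra in jobs
    ]


if __name__ == "__main__":
    for cmd in gennorm2_commands():
        print(" ".join(cmd))
```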
 217
 218 * build ICU (make install)
 219   so that the tools build can pick up the new definitions from the installed header files.
 220
 221   $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
 222
 223 * build Unicode tools using CMake+make
 224
 225 $ICU_SRC/tools/unicode/c/icudefs.txt:
 226
 227 # Location (--prefix) of where ICU was installed.
 228 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
 229 # Location of the ICU4C source tree.
 230 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
 231
 232   $ICU_ROOT/dbg/tools/unicode/c$
 233     cmake ../../../../src/tools/unicode/c
 234     make
 235
 236 * generate core properties data files
 237   $ICU_ROOT/dbg/tools/unicode/c$
 238     genprops/genprops $ICU_SRC/icu4c
 239     genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
 240     genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
 241 - rebuild ICU (make install) & tools
 242
 243 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 244   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 245 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 246 - Unicode 6.0..10.0: U+2260, U+226E, U+226F
 247 - nothing new in this Unicode version, no test file to update
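
The grep for "disallowed_STD3_valid" on non-ASCII characters can be sketched as a small filter over IdnaMappingTable.txt-style lines. The sample lines below are illustrative and only imitate the file's "code point or range ; status" layout; the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")  # single code point or range
        first = int(start, 16)
        last = int(end, 16) if end else first
        result.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return result


if __name__ == "__main__":
    # Illustrative lines in the assumed layout, not copied from the data file.
    sample = [
        "0000..002C    ; disallowed_STD3_valid                  # ASCII range",
        "00C0          ; mapped                 ; 00E0          # mapped character",
        "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```

Comparing the result for the old and new data files shows whether the test files need new characters.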
 248
 249 * run & fix ICU4C tests
 250 - Andy handles RBBI & spoof check test failures
 251
 252 * collation: CLDR collation root, UCA DUCET
 253
 254 - UCA DUCET goes into Mark's Unicode tools, see
 255   https://sites.google.com/site/unicodetools/home#TOC-UCA
 256 - CLDR root data files are checked into $CLDR_SRC/common/uca/
 257     cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
 258
 259 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 260     cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
 261 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 262     cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
 263     (note: the next copy drops the underscore before "Rules" in the file name)
 264     cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
 265 - restore TODO diffs in UCARules.txt
 266     meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
 267 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 268   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 269   from the CLDR root files (..._CLDR_..._SHORT.txt)
 270     cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
 271     cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
 272     cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
 273 - if CLDR common/uca/unihan-index.txt changes, then update
 274   CLDR common/collation/root.xml <collation type="private-unihan">
 275   and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
 276
 277 - run genuca, see command line above;
 278   deal with
 279     Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
 280     FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
 281         (add the character to genuca.cpp sampleCharsToScripts[])
 282   + look up the USCRIPT_ code for the new sample characters
 283     (should be obvious from the comment in the error output)
 284   + *add* mappings to sampleCharsToScripts[], do not replace them
 285     (in case the script sample characters flip-flop)
 286   + insert new scripts in DUCET script order, see the top_byte table
 287     at the beginning of FractionalUCA.txt
 288 - rebuild ICU4C
 289
 290 * Unihan collators
 291     https://sites.google.com/site/unicodetools/unihan
 292 - run Unicode Tools
 293     org.unicode.draft.GenerateUnihanCollators
 294   with VM arguments
 295     -ea
 296     -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
 297     -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
 298     -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
 299     -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
 300     -DUVERSION=10.0.0
 301 - run Unicode Tools
 302     org.unicode.draft.GenerateUnihanCollatorFiles
 303   with the same arguments
 304 - check CLDR diffs
 305     cd $CLDR_SRC
 306     meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
 307     meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
 308 - copy to CLDR
 309     cd $CLDR_SRC
 310     cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
 311     cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
 312 - run CLDR unit tests, commit to CLDR
 313 - generate ICU zh collation data: run CLDR
 314     org.unicode.cldr.icu.NewLdml2IcuConverter
 315   with program arguments
 316     -t collation
 317     -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
 318     -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
 319     -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
 320     -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
 321     zh
 322   and VM arguments
 323     -ea
 324     -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
 325 - rebuild ICU4C
 326
 327 * run & fix ICU4C tests, now with new CLDR collation root data
 328 - run all tests with the collation test data *_SHORT.txt or the full files
 329   (the full ones have comments, useful for debugging)
 330 - note on intltest: if collate/UCAConformanceTest fails, then
 331   utility/MultithreadTest/TestCollators will fail as well;
 332   fix the conformance test before looking into the multi-thread test
 333
 334 * update Java data files
 335 - refresh just the UCD/UCA-related/derived files, just to be safe
 336 - see (ICU4C)/source/data/icu4j-readme.txt
 337 - mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 338 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 339   output:
 340     ...
 341     Unicode .icu files built to ./out/build/icudt60l
 342     echo timestamp > uni-core-data
 343     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
 344     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
 345     echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
 346     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
 347     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
 348     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
 349     mkdir -p /tmp/icu4j/main/shared/data
 350     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 351     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
 352     mkdir -p /tmp/icu4j/main/shared/data
 353     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 354     make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
 355 - copy the big-endian Unicode data files to another location,
 356   separate from the other data files,
 357   and then refresh ICU4J
 358     cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
 359     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 360     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 361     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 362     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 363     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
 364     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 365     cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 366     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 367     jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
 368
 369 * When refreshing all of ICU4J data from ICU4C
 370 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 371 - cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
 372 or
 373 - $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
 374
 375 * update CollationFCD.java
 376   + copy & paste the initializers of lcccIndex[] etc. from
 377     ICU4C/source/i18n/collationfcd.cpp to
 378     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
 379
 380 * refresh Java test .txt files
 381 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 382     cd $ICU_SRC/icu4c/source/data/unidata
 383     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
 384     cd ../../test/testdata
 385     cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
 386     cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
 387
 388 * run & fix ICU4J tests
 389
 390 *** API additions
 391 - send notice to icu-design about new born-@stable API (enum constants etc.)
 392
 393 *** CLDR numbering systems
 394 - look for new sets of decimal digits (gc=Nd & nv=4, one match per digit set) and submit a CLDR ticket
 395   Unicode 10: http://unicode.org/cldr/trac/ticket/10219
 396   Unicode 9: http://unicode.org/cldr/trac/ticket/9692
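
A sketch of the gc=Nd & nv=4 search, run over UnicodeData.txt-style lines from an old and a new version. It assumes the standard semicolon-separated layout (general category in field 2, numeric value in field 8) and that each set of decimal digits is a contiguous run, so that the zero digit is the "four" minus 4. The function names are hypothetical.

```python
def digit_fours(unicode_data_lines):
    """Code points with gc=Nd and numeric value 4."""
    found = set()
    for line in unicode_data_lines:
        fields = line.strip().split(";")
        if len(fields) > 8 and fields[2] == "Nd" and fields[8] == "4":
            found.add(int(fields[0], 16))
    return found


def new_digit_zeros(old_lines, new_lines):
    """Zero digits of decimal digit sets that exist only in the new data."""
    added = digit_fours(new_lines) - digit_fours(old_lines)
    # Assumes contiguous digit runs: zero = four - 4.
    return sorted(cp - 4 for cp in added)


if __name__ == "__main__":
    old = ["0034;DIGIT FOUR;Nd;0;EN;;4;4;4;N;;;;;"]
    new = old + ["11D54;MASARAM GONDI DIGIT FOUR;Nd;0;L;;4;4;4;N;;;;;"]
    print(["U+%04X" % cp for cp in new_digit_zeros(old, new)])
```

Searching for a single digit value finds each digit set exactly once, which is why the log looks for nv=4 rather than listing all Nd characters.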
 397
 398 *** merge the Unicode update branches back onto the trunk
 399 - do not merge the icudata.jar and testdata.jar,
 400   instead rebuild them from merged & tested ICU4C
 401 - make sure that changes to Unicode tools are checked in:
 402   http://www.unicode.org/utility/trac/log/trunk/unicodetools
 403
 404 ---------------------------------------------------------------------------- ***
 405
 406 Emoji 5.0 update for ICU 59
 407 - ICU 59 mostly remains on Unicode 9.0
 408 - except that it updates the bidi and segmentation data to Unicode 10 beta
 409
 410 First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
 411
 412 * Command-line environment setup
 413
 414 ICU_ROOT=~/svn.icu/trunk
 415 ICU_SRC_DIR=$ICU_ROOT/src
 416 ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
 417 ICUDT=icudt59b
 418 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 419 SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
 420 UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
 421
 422 *** ICU Trac
 423
 424 - ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
 425 - changes directly on trunk
 426
 427 *** data files & enums & parser code
 428
 429 * download files
 430
 431 - download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
 432 - download emoji 5.0 beta files into the same uni90e50 folder
 433 - download Unicode 10.0 beta files: ucd
 434   + copy Unicode 10 bidi files to the uni90e50/ucd folder:
 435     BidiBrackets.txt
 436     BidiCharacterTest.txt
 437     BidiMirroring.txt
 438     BidiTest.txt
 439     extracted/DerivedBidiClass.txt
 440   + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
 441     LineBreak.txt
 442     auxiliary/*
 443
 444 * preparseucd.py changes
 445 - adjust for combined trunks
 446 - write new copyright lines
 447 - ignore new Emoji_Component property for now
 448
 449 * process and/or copy files
 450 - ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
 451   + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 452
 453 - cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
 454
 455 * build ICU (make install)
 456   so that the tools build can pick up the new definitions from the installed header files.
 457
 458   $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
 459
 460 * build Unicode tools using CMake+make
 461
 462 ~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
 463
 464 # Location (--prefix) of where ICU was installed.
 465 set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
 466 # Location of the ICU4C source tree.
 467 set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
 468
 469   ~/svn.icu/trunk/dbg/tools/unicode/c$
 470     cmake ../../../../src/tools/unicode/c
 471     make
 472
 473 * generate core properties data files
 474   ~/svn.icu/trunk/dbg/tools/unicode/c$
 475     genprops/genprops $ICU4C_SRC_DIR
 476 - rebuild ICU (make install) & tools
 477
 478 * run & fix ICU4C tests
 479 - Andy handles RBBI & spoof check test failures
 480
 481 * update Java data files
 482 - refresh just the UCD/UCA-related/derived files, just to be safe
 483 - see (ICU4C)/source/data/icu4j-readme.txt
 484 - mkdir /tmp/icu4j
 485 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 486   output:
 487     ...
 488     Unicode .icu files built to ./out/build/icudt59l
 489     echo timestamp > uni-core-data
 490     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
 491     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
 492     echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
 493     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
 494     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
 495     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
 496     mkdir -p /tmp/icu4j/main/shared/data
 497     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 498     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
 499     mkdir -p /tmp/icu4j/main/shared/data
 500     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 501     make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
 502 - copy the big-endian Unicode data files to another location,
 503   separate from the other data files,
 504   and then refresh ICU4J
 505     cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
 506     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 507     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 508     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 509     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
 510     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 511     jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
 512
 513 * When refreshing all of ICU4J data from ICU4C
 514 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 515 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
 516 or
 517 - ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
 518
 519 * refresh Java test .txt files
 520 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 521     cd $ICU4C_SRC_DIR/source/data/unidata
 522     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
 523     cd ../../test/testdata
 524     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
 525     cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
 526
 527 * run & fix ICU4J tests
 528
 529 ---------------------------------------------------------------------------- ***
 530
 531 Unicode 9.0 update for ICU 58
 532
 533 * Command-line environment setup
 534
 535 ICU_ROOT=~/svn.icu/trunk
 536 ICU_SRC_DIR=$ICU_ROOT/src
 537 ICUDT=icudt58b
 538 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 539 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 540 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 541
 542 http://www.unicode.org/review/pri323/  -- beta review
 543 http://www.unicode.org/reports/uax-proposed-updates.html
 544 http://www.unicode.org/versions/beta-9.0.0.html
 545 http://www.unicode.org/versions/Unicode9.0.0/
 546 http://www.unicode.org/reports/tr44/tr44-17.html
 547
 548 *** ICU Trac
 549
 550 - ticket:12526: integrate Unicode 9
 551 - C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
 552 - Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
 553
 554 *** CLDR Trac
 555
 556 - cldrbug 9414: UCA 9
 557 - ^/branches/markus/uni90 at r11518 from trunk at r11517
 558
 559 - cldrbug 8745: Unicode 9.0 script metadata
 560
 561 *** Unicode version numbers
 562 - makedata.mak
 563 - uchar.h
 564 - com.ibm.icu.util.VersionInfo
 565 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
 566
 567 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
 568   so that the makefiles see the new version number.
 569
 570 *** data files & enums & parser code
 571
 572 * file preparation
 573
 574 - download UCD & IDNA files
 575 - make sure that the Unicode data folder passed into preparseucd.py
 576   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
 577 - only for manual diffs: remove version suffixes from the file names
 578   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
 579   (see https://sites.google.com/site/unicodetools/inputdata)
 580 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 581 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
 582   + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 583
 584 - also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
 585   and copy to $UNIDATA
 586     cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
 587
 588 * preparseucd.py changes
 589 - remove or add new Unicode scripts from/to the
 590   only-in-ISO-15924 list according to the error messages:
 591     ValueError: remove ['Tang'] from _scripts_only_in_iso15924
 592     ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
 593     ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
 594     ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
 595   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 596       and in com.ibm.icu.dev.test.lang.TestUScript.java
 597 - DerivedNumericValues.txt new numeric values
 598     0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
 599     0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
 600     0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
 601     0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
 602     0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
 603   -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
 604      uchar.c, UCharacterProperty.java
 605      to support a new series of values
 606 - adjust preparseucd.py for Tangut algorithmic names
 607   in ppucd.txt:
 608     algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
 609   ->
 610     algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
 611 - avoid block-compressing most String/Miscellaneous property values,
 612   triggered by genprops not coping with a multi-code point Case_Folding on
 613     block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
 614   keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
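
A small check on the DerivedNumericValues.txt lines listed above: each decimal value agrees with its rational form, and every denominator is 20 times a power of two (20, 40, 80, 160). This is only an observation about the data. The sketch does not reproduce the actual encodeNumericValue() logic, and the function names are hypothetical.

```python
from fractions import Fraction

# The five new lines as quoted in the log.
LINES = [
    "0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH",
    "0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH",
    "0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS",
    "0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH",
    "0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS",
]


def parse_numeric_line(line):
    """Return (code point, decimal value, rational value) as exact fractions."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    return int(fields[0], 16), Fraction(fields[1]), Fraction(fields[3])


def is_twenty_times_power_of_two(denominator):
    """True for 20, 40, 80, 160, ..."""
    quotient, remainder = divmod(denominator, 20)
    return remainder == 0 and quotient > 0 and (quotient & (quotient - 1)) == 0


if __name__ == "__main__":
    for line in LINES:
        cp, decimal, rational = parse_numeric_line(line)
        print("U+%04X" % cp, rational, decimal == rational,
              is_twenty_times_power_of_two(rational.denominator))
```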
 615
 616 * PropertyAliases.txt changes
 617 - 1 new property PCM=Prepended_Concatenation_Mark
 618   Ignore: Only useful for layout engines.
 619   Ok to list in ppucd.txt.
 620
 621 * PropertyValueAliases.txt new property values
 622     blk; Adlam                            ; Adlam
 623     blk; Bhaiksuki                        ; Bhaiksuki
 624     blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
 625     blk; Glagolitic_Sup                   ; Glagolitic_Supplement
 626     blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
 627     blk; Marchen                          ; Marchen
 628     blk; Mongolian_Sup                    ; Mongolian_Supplement
 629     blk; Newa                             ; Newa
 630     blk; Osage                            ; Osage
 631     blk; Tangut                           ; Tangut
 632     blk; Tangut_Components                ; Tangut_Components
 633   -> add to uchar.h
 634     use long property names for enum constants
 635   -> add to UCharacter.UnicodeBlock IDs
 636     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 637             replace  public static final int \1_ID = \2; \3
 638   -> add to UCharacter.UnicodeBlock objects
 639     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 640             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
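
The two Eclipse find/replace pairs above can also be tried outside Eclipse. A minimal Python sketch, assuming uchar.h lines of the form "UBLOCK_NAME = number, /*...*/". The sample constant UBLOCK_EXAMPLE_BLOCK and its value are made up, and to_java_id() / to_java_object() are hypothetical helper names.

```python
import re

# Same patterns as in the Eclipse find fields above.
ID_PATTERN = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
OBJECT_PATTERN = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def to_java_id(line):
    """Rewrite a uchar.h enum line into a UnicodeBlock *_ID constant."""
    return ID_PATTERN.sub(r"public static final int \1_ID = \2; \3", line)


def to_java_object(line):
    """Rewrite a uchar.h enum line into a UnicodeBlock object declaration."""
    return OBJECT_PATTERN.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        line)


# Made-up input line, only for illustration.
sample = "    UBLOCK_EXAMPLE_BLOCK = 999, /*[F0000]*/"
print(to_java_id(sample))
print(to_java_object(sample))
```

Lines that do not match the pattern (comments, blank lines) pass through unchanged, so the functions can be applied to a whole pasted block of enum constants line by line.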
 641
 642     GCB; EB                               ; E_Base
 643     GCB; EBG                              ; E_Base_GAZ
 644     GCB; EM                               ; E_Modifier
 645     GCB; GAZ                              ; Glue_After_Zwj
 646     GCB; ZWJ                              ; ZWJ
 647   -> uchar.h & UCharacter.GraphemeClusterBreak
 648
 649     jg ; African_Feh                      ; African_Feh
 650     jg ; African_Noon                     ; African_Noon
 651     jg ; African_Qaf                      ; African_Qaf
 652   -> uchar.h & UCharacter.JoiningGroup
 653
 654     lb ; EB                               ; E_Base
 655     lb ; EM                               ; E_Modifier
 656     lb ; ZWJ                              ; ZWJ
 657   -> uchar.h & UCharacter.LineBreak
 658
 659     sc ; Adlm                             ; Adlam
 660     sc ; Bhks                             ; Bhaiksuki
 661     sc ; Marc                             ; Marchen
 662     sc ; Newa                             ; Newa
 663     sc ; Osge                             ; Osage
 664     sc ; Tang                             ; Tangut
 665   -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
 666
 667     WB ; EB                               ; E_Base
 668     WB ; EBG                              ; E_Base_GAZ
 669     WB ; EM                               ; E_Modifier
 670     WB ; GAZ                              ; Glue_After_Zwj
 671     WB ; ZWJ                              ; ZWJ
 672   -> uchar.h & UCharacter.WordBreak
 673
 674 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 675     (not strictly necessary for NOT_ENCODED scripts)
 676   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
 677
 678 * generate normalization data files
 679   cd $ICU_ROOT/dbg
 680   bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
 681   bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 682   bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 683   bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 684   bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
 685
 686 * build ICU (make install)
 687   so that the tools build can pick up the new definitions from the installed header files.
 688
 689   $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
 690
 691 * build Unicode tools using CMake+make
 692
 693 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
 694
 695   # Location (--prefix) of where ICU was installed.
 696   set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
 697   # Location of the ICU source tree.
 698   set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
 699
 700   ~/svn.icutools/trunk/dbg/unicode/c$
 701     cmake ../../../src/unicode/c
 702     make
 703
 704 * generate core properties data files
 705   ~/svn.icutools/trunk/dbg/unicode/c$
 706     genprops/genprops $ICU_SRC_DIR
 707     genuca/genuca --hanOrder implicit $ICU_SRC_DIR
 708     genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
 709 - rebuild ICU (make install) & tools
 710
 711 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 712   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 713 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 714 - Unicode 6.0..9.0: U+2260, U+226E, U+226F
 715 - nothing new in 9.0, no test file to update
 716
 717 * run & fix ICU4C tests
 718 - Andy handles RBBI & spoof check test failures
 719
 720 * collation: CLDR collation root, UCA DUCET
 721
 722 - UCA DUCET goes into Mark's Unicode tools, see
 723   https://sites.google.com/site/unicodetools/home#TOC-UCA
 724 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
 725     cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
 726
 727 - cd (CLDR UCA branch)/common/uca/
 728 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 729     cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
 730 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 731     cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
 732     (note removing the underscore before "Rules")
 733     cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 734 - restore TODO diffs in UCARules.txt
 735     meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 736 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 737   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 738   from the CLDR root files (..._CLDR_..._SHORT.txt)
 739     cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
 740     cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
 741     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
 742 - if CLDR common/uca/unihan-index.txt changes, then update
 743   CLDR common/collation/root.xml <collation type="private-unihan">
 744   and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
 745
 746 - run genuca, see command line above;
 747   deal with
 748     Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
 749     FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
 750         (add the character to genuca.cpp sampleCharsToScripts[])
 751   + look up the USCRIPT_ code for the new sample characters
 752     (should be obvious from the comment in the error output)
 753   + *add* mappings to sampleCharsToScripts[], do not replace them
 754     (in case the script sample characters flip-flop)
 755   + insert new scripts in DUCET script order, see the top_byte table
 756     at the beginning of FractionalUCA.txt
 757 - rebuild ICU4C
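
For the genuca error above, the sample character and the script name can be pulled out of the quoted FractionalUCA.txt line mechanically. A minimal Python sketch; the regular expression only assumes the shape of the one line quoted in this log, and sample_char_and_script() is a hypothetical helper, not part of genuca.

```python
import re

# Line copied from the genuca error output quoted in this log.
LINE = "FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)"

# FDD1, then the sample character, then a comment "<script> first primary".
FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(.+?) first primary")


def sample_char_and_script(line):
    """Return (code point, script name from the comment), or None if no match."""
    m = FIRST_PRIMARY.match(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2)


cp, script = sample_char_and_script(LINE)
print("U+%04X -> %s" % (cp, script))
```

The matching USCRIPT_ constant still has to be looked up in uscript.h and added to sampleCharsToScripts[] by hand.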
 758
 759 * Unihan collators
 760 - run Unicode Tools
 761     org.unicode.draft.GenerateUnihanCollators
 762   with VM arguments
 763     -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
 764     -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
 765     -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
 766     -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
 767     -DUVERSION=9.0.0
 768     -ea
 769 - run Unicode Tools
 770     org.unicode.draft.GenerateUnihanCollatorFiles
 771   with the same arguments
 772 - check CLDR diffs
 773     cd ~/svn.cldr/trunk
 774     meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
 775     meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
 776 - copy to CLDR
 777     cd ~/svn.cldr/trunk
 778     cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
 779     cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
 780 - commit to CLDR
 781 - generate ICU zh collation data: run CLDR
 782     org.unicode.cldr.icu.NewLdml2IcuConverter
 783   with program arguments
 784     -t collation
 785     -s /home/mscherer/svn.cldr/trunk/common/collation
 786     -m /home/mscherer/svn.cldr/trunk/common/supplemental
 787     -d /home/mscherer/svn.icu/trunk/src/source/data/coll
 788     -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
 789     zh
 790   and VM arguments
 791     -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
 792 - rebuild ICU4C
 793
 794 * run & fix ICU4C tests, now with new CLDR collation root data
 795 - run all tests with the collation test data *_SHORT.txt or the full files
 796   (the full ones have comments, useful for debugging)
 797 - note on intltest: if collate/UCAConformanceTest fails, then
 798   utility/MultithreadTest/TestCollators will fail as well;
 799   fix the conformance test before looking into the multi-thread test
 800
 801 * update Java data files
 802 - refresh just the UCD/UCA-related/derived files, just to be safe
 803 - see (ICU4C)/source/data/icu4j-readme.txt
 804 - mkdir /tmp/icu4j
 805 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 806   output:
 807     ...
 808     Unicode .icu files built to ./out/build/icudt58l
 809     echo timestamp > uni-core-data
 810     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
 811     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
 812     echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
 813     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
 814     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
 815     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
 816     mkdir -p /tmp/icu4j/main/shared/data
 817     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 818     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
 819     mkdir -p /tmp/icu4j/main/shared/data
 820     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 821     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
 822 - copy the big-endian Unicode data files to another location,
 823   separate from the other data files,
 824   and then refresh ICU4J
 825     cd ~/svn.icu/trunk/dbg/data/out/icu4j
 826     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 827     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 828     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 829     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 830     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
 831     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 832     cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 833     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 834     jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
 835
 836 * When refreshing all of ICU4J data from ICU4C
 837 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 838 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 839 or
 840 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 841
 842 * update CollationFCD.java
 843   + copy & paste the initializers of lcccIndex[] etc. from
 844     ICU4C/source/i18n/collationfcd.cpp to
 845     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
 846
 847 * refresh Java test .txt files
 848 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 849     cd $ICU_SRC_DIR/source/data/unidata
 850     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 851     cd ../../test/testdata
 852     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 853     cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 854
 855 * run & fix ICU4J tests
 856
 857 *** LayoutEngine script information
 858
 859 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
 860   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
 861   in the working directory.
 862
 863   (It also generates ScriptRunData.cpp, which is no longer needed.)
 864
 865   It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
 866   (a plain text file)
 867   which maps ICU versions to the numbers of script/language constants
 868   that were added then.
 869   (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
 870
 871   The generated files have a current copyright date and "@deprecated" statement.
 872
 873 * Review changes, fix Java tool if necessary, and copy to ICU4C
 874   cd ~/svn.icu4j/trunk/src
 875   meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
 876   cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
 877   cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
 878
 879 *** API additions
 880 - send notice to icu-design about new born-@stable API (enum constants etc.)
 881
 882 *** merge the Unicode update branches back onto the trunk
 883 - do not merge the icudata.jar and testdata.jar,
 884   instead rebuild them from merged & tested ICU4C
 885 - make sure that changes to Unicode tools & ICU tools are checked in
 886   http://www.unicode.org/utility/trac/log/trunk/unicodetools
 887   http://bugs.icu-project.org/trac/log/tools/trunk
 888
 889 ---------------------------------------------------------------------------- ***
 890
 891 New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
 892
 893 Adding
 894 - new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
 895 - new combination/alias codes: Hanb, Jamo
 896   - used in CLDR 29 and in spoof checker
 897 - new Z* code: Zsye
 898
 899 Add new codes to uscript.h & UScript.java, see Unicode update logs.
 900   -> com.ibm.icu.lang.UScript
 901     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 902     replace  public static final int \1 = \2; \3
 903
 904 Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
 905 add new script codes.
 906 "Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
 907
 908 Note: If we have to run preparseucd.py again before the Unicode 9 update,
 909 then we need to manually keep/restore the new script codes.
 910
 911 ICU_ROOT=~/svn.icu/trunk
 912 ICU_SRC_DIR=$ICU_ROOT/src
 913 ICUDT=icudt57b
 914 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 915 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 916 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 917
 918 Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
 919 see http://bugs.icu-project.org/trac/ticket/12141
 920
 921 make install, then icutools cmake & make, then
 922 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
 923
 924 Generate Java data as usual, only update pnames.icu & uprops.icu.
 925
 926 *** LayoutEngine script information
 927
 928 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
 929   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
 930   in the working directory.
 931
 932   (It also generates ScriptRunData.cpp, which is no longer needed.)
 933
 934   It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
 935   (a plain text file)
 936   which maps ICU versions to the numbers of script/language constants
 937   that were added then.
 938   (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
 939
 940   The generated files have a current copyright date and "@deprecated" statement.
 941
 942 * Review changes, fix Java tool if necessary, and copy to ICU4C
 943   cd ~/svn.icu4j/trunk/src
 944   meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
 945   cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
 946   cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
 947
 948 ---------------------------------------------------------------------------- ***
 949
 950 Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
 951
 952 Edit preparseucd.py to add & parse new properties.
 953 They share the UCD property namespace but are not listed in PropertyAliases.txt.
 954
 955 Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
 956 Initial data from emoji/2.0/
 957
 958 ICU_ROOT=~/svn.icu/trunk
 959 ICU_SRC_DIR=$ICU_ROOT/src
 960 ICUDT=icudt56b
 961 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 962 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 963 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 964
 965 Add binary-property constants to uchar.h enum UProperty & UProperty.java.
 966
 967 ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
 968 (Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
 969
 970 Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
 971
 972 make install, then icutools cmake & make, then
 973 ~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
 974
 975 Generate Java data as usual, only update pnames.icu & uprops.icu.
 976
 977 ---------------------------------------------------------------------------- ***
 978
 979 Unicode 8.0 update for ICU 56
 980
 981 * Command-line environment setup
 982
 983 ICU_ROOT=~/svn.icu/trunk
 984 ICU_SRC_DIR=$ICU_ROOT/src
 985 ICUDT=icudt56b
 986 export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 987 SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 988 UNIDATA=$ICU_SRC_DIR/source/data/unidata
 989
 990 http://www.unicode.org/review/pri297/  -- beta review
 991 http://www.unicode.org/reports/uax-proposed-updates.html
 992 http://unicode.org/versions/beta-8.0.0.html
 993 http://www.unicode.org/versions/Unicode8.0.0/
 994 http://www.unicode.org/reports/tr44/tr44-15.html
 995
 996 *** ICU Trac
 997
 998 - ticket:11574: Unicode 8
 999 - C++ branches/markus/uni80 at r37351 from trunk at r37343
1000 - Java branches/markus/uni80 at r37352 from trunk at r37338
1001
1002 *** CLDR Trac
1003
1004 - cldrbug 8311: UCA 8
1005 - branches/markus/uni80 at r11518 from trunk at r11517
1006
1007 - cldrbug 8109: Unicode 8.0 script metadata
1008 - cldrbug 8418: Updated segmentation for Unicode 8.0
1009
1010 *** Unicode version numbers
1011 - makedata.mak
1012 - uchar.h
1013 - com.ibm.icu.util.VersionInfo
1014 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1015
1016 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1017   so that the makefiles see the new version number.
1018
1019 *** data files & enums & parser code
1020
1021 * file preparation
1022
1023 - download UCD & IDNA files
1024 - make sure that the Unicode data folder passed into preparseucd.py
1025   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1026 - only for manual diffs: remove version suffixes from the file names
1027   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
1028   (see https://sites.google.com/site/unicodetools/inputdata)
1029 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1030 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
1031 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1032
1033 - also: from http://unicode.org/Public/security/8.0.0/ download new
1034   confusables.txt & confusablesWholeScript.txt
1035   and copy to $UNIDATA
1036     ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
1037     ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
1038
1039 * initial preparseucd.py changes
1040 - remove new Unicode scripts from the
1041   only-in-ISO-15924 list according to the error message:
1042     ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
1043     from _scripts_only_in_iso15924
1044   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1045       and in com.ibm.icu.dev.test.lang.TestUScript.java
1046 - property and file name change:
1047     IndicMatraCategory -> IndicPositionalCategory
1048 - UnicodeData.txt unusual numeric values (fractions not in lowest terms)
1049     109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
1050     109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
1051     109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
1052     109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
1053     109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
1054     109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
1055     109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
1056     109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
1057     109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
1058     109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
1059   -> change preparseucd.py to map them to fractions in lowest terms (e.g., 2/12 -> 1/6)
1060      which are listed in DerivedNumericValues.txt;
1061      keeps storage in data file simple
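
The mapping of the Meroitic fractions can be sketched with Python's fractions module. This is only an illustration of reducing the UnicodeData.txt values to the form listed in DerivedNumericValues.txt; reduce_numeric_value() is a hypothetical helper name and not the actual preparseucd.py code.

```python
from fractions import Fraction


def reduce_numeric_value(nv):
    """Return a numeric value string with any fraction in lowest terms."""
    if "/" not in nv:
        return nv  # plain integers are left alone
    f = Fraction(nv)
    if f.denominator == 1:
        return str(f.numerator)
    return "%d/%d" % (f.numerator, f.denominator)


# The ten values 1/12 .. 10/12 from U+109F6..U+109FF listed above.
for n in range(1, 11):
    raw = "%d/12" % n
    print(raw, "->", reduce_numeric_value(raw))
```

Values such as 1/12, 5/12 and 7/12 are already in lowest terms and come out unchanged.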
1062
1063 * PropertyValueAliases.txt changes
1064 - 10 new Block (blk) values:
1065     blk; Ahom                             ; Ahom
1066     blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
1067     blk; Cherokee_Sup                     ; Cherokee_Supplement
1068     blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
1069     blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
1070     blk; Hatran                           ; Hatran
1071     blk; Multani                          ; Multani
1072     blk; Old_Hungarian                    ; Old_Hungarian
1073     blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
1074     blk; Sutton_SignWriting               ; Sutton_SignWriting
1075   -> add to uchar.h
1076     use long property names for enum constants
1077   -> add to UCharacter.UnicodeBlock IDs
1078     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1079             replace  public static final int \1_ID = \2; \3
1080   -> add to UCharacter.UnicodeBlock objects
1081     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1082             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1083 - 6 new Script (sc) values:
1084     sc ; Ahom                             ; Ahom
1085     sc ; Hatr                             ; Hatran
1086     sc ; Hluw                             ; Anatolian_Hieroglyphs
1087     sc ; Hung                             ; Old_Hungarian
1088     sc ; Mult                             ; Multani
1089     sc ; Sgnw                             ; SignWriting
1090   -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
1091
1092 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1093     (not strictly necessary for NOT_ENCODED scripts)
1094   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
1095
1096 * generate normalization data files
1097   cd $ICU_ROOT/dbg
1098   bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1099   bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1100   bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1101   bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1102   bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
1103
1104 * build ICU (make install)
1105   so that the tools build can pick up the new definitions from the installed header files.
1106
1107   $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1108
1109 * build Unicode tools using CMake+make
1110
1111 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1112
1113   # Location (--prefix) of where ICU was installed.
1114   set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
1115   # Location of the ICU source tree.
1116   set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
1117
1118   ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1119   ~/svn.icutools/trunk/dbg/unicode/c$ make
1120
1121 * generate core properties data files
1122 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1123 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
1124 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
1125 - rebuild ICU (make install) & tools
1126 - run genuca again (see step above) so that it picks up the new nfc.nrm
1127 - rebuild ICU (make install) & tools
1128
1129 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1130   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1131 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1132 - Unicode 6.0..8.0: U+2260, U+226E, U+226F
1133 - nothing new in 8.0, no test file to update
1134
1135 * run & fix ICU4C tests
1136 - bad Cherokee case folding due to difference in fallbacks:
1137   UCD case folding falls back to no mapping,
1138   ICU runtime case folding falls back to lowercasing;
1139   fixed casepropsbuilder.cpp to generate scf mappings to self
1140   when there is an slc mapping but no scf
1141 - Andy handles RBBI & spoof check test failures
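
The Cherokee case-folding mismatch noted above can be illustrated with a toy model of the two fallbacks. A minimal Python sketch: the dicts SLC and SCF and all function names are hypothetical, and the single mapping pair (U+13A0 / U+AB70) is included only for illustration. This is not the casepropsbuilder.cpp code.

```python
# Toy data for one Cherokee pair.
SLC = {0x13A0: 0xAB70}  # simple lowercase mapping of the uppercase letter
SCF = {0xAB70: 0x13A0}  # simple case folding: the small letter folds to uppercase


def ucd_fold(c, scf):
    """UCD semantics: a character without an scf entry folds to itself."""
    return scf.get(c, c)


def runtime_fold(c, scf, slc):
    """Runtime-style fallback as described above: no scf entry means lowercase."""
    if c in scf:
        return scf[c]
    return slc.get(c, c)


def add_self_foldings(scf, slc):
    """Builder-side fix: add scf-to-self where there is an slc but no scf."""
    fixed = dict(scf)
    for c in slc:
        fixed.setdefault(c, c)
    return fixed


print(hex(ucd_fold(0x13A0, SCF)), hex(runtime_fold(0x13A0, SCF, SLC)))
fixed = add_self_foldings(SCF, SLC)
print(hex(ucd_fold(0x13A0, fixed)), hex(runtime_fold(0x13A0, fixed, SLC)))
```

Before the fix the two functions disagree for the uppercase letter; with the explicit self-mappings they agree, which is the effect the casepropsbuilder.cpp change is after.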
1142
1143 * collation: CLDR collation root, UCA DUCET
1144
1145 - UCA DUCET goes into Mark's Unicode tools, see
1146   https://sites.google.com/site/unicodetools/home#TOC-UCA
1147 - CLDR root data files are checked into (CLDR UCA branch)/common/uca/
1148 - cd (CLDR UCA branch)/common/uca/
1149 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1150     cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1151 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1152     cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
1153     (note removing the underscore before "Rules")
1154     cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1155 - restore TODO diffs in UCARules.txt
1156     meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1157 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1158   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1159   from the CLDR root files (..._CLDR_..._SHORT.txt)
1160     cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1161     cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1162     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1163 - if CLDR common/uca/unihan-index.txt changes, then update
1164   CLDR common/collation/root.xml <collation type="private-unihan">
1165   and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
1166 - run genuca, see command line above;
1167   deal with
1168     Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
1169         (add the character to genuca.cpp sampleCharsToScripts[])
1170   + look up the script for the new sample characters
1171     (e.g., in FractionalUCA.txt)
1172   + *add* mappings to sampleCharsToScripts[], do not replace them
1173     (in case the script sample characters flip-flop)
1174   + insert new scripts in DUCET script order, see the top_byte table
1175     at the beginning of FractionalUCA.txt
1176 - rebuild ICU4C
1177
1178 * run & fix ICU4C tests, now with new CLDR collation root data
1179 - run all tests with the collation test data *_SHORT.txt or the full files
1180   (the full ones have comments, useful for debugging)
1181 - note on intltest: if collate/UCAConformanceTest fails, then
1182   utility/MultithreadTest/TestCollators will fail as well;
1183   fix the conformance test before looking into the multi-thread test
1184 - fixed bug in CollationWeights::getWeightRanges()
1185   exposed by new data and CollationTest::TestRootElements
1186
1187 * update Java data files
1188 - refresh just the UCD/UCA-related/derived files, just to be safe
1189 - see (ICU4C)/source/data/icu4j-readme.txt
1190 - mkdir /tmp/icu4j
1191 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1192   output:
1193     ...
1194     Unicode .icu files built to ./out/build/icudt56l
1195     echo timestamp > uni-core-data
1196     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
1197     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
1198     echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1199     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
1200     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
1201     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
1202     mkdir -p /tmp/icu4j/main/shared/data
1203     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1204     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
1205     mkdir -p /tmp/icu4j/main/shared/data
1206     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1207     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
1208 - copy the big-endian Unicode data files to another location,
1209   separate from the other data files,
1210   and then refresh ICU4J
1211     cd ~/svn.icu/trunk/dbg/data/out/icu4j
1212     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1213     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1214     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1215     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1216     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1217     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1218     cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1219     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1220     jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
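
  The copy step above can be sketched in Python. This is only an illustration,
  not part of the ICU build: the function name copy_unicode_data and its
  parameters are made up here, and the file patterns are simply read off the
  cp/rm commands above (copy confusables.cfu, *.icu and *.nrm, skip
  cnvalias.icu, then copy everything in coll/ and brkitr/).

```python
import fnmatch
import shutil
from pathlib import Path

# Patterns read off the cp commands above; cnvalias.icu is removed again there.
TOP_LEVEL_PATTERNS = ("confusables.cfu", "*.icu", "*.nrm")
SUBFOLDERS = ("coll", "brkitr")
EXCLUDED = {"cnvalias.icu"}


def copy_unicode_data(src_root, dst_root, icudt):
    """Copy the Unicode-related files of data folder `icudt` (hypothetical helper).

    Returns the copied paths, relative to the data folder.
    """
    rel = Path("com/ibm/icu/impl/data") / icudt
    src, dst = Path(src_root) / rel, Path(dst_root) / rel
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    # Top-level files: only the Unicode data patterns, minus the exclusions.
    for path in sorted(src.iterdir()):
        if not path.is_file() or path.name in EXCLUDED:
            continue
        if any(fnmatch.fnmatch(path.name, p) for p in TOP_LEVEL_PATTERNS):
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    # Subfolders: copy every file, as "cp coll/*" and "cp brkitr/*" do.
    for sub in SUBFOLDERS:
        (dst / sub).mkdir(parents=True, exist_ok=True)
        if not (src / sub).is_dir():
            continue
        for path in sorted((src / sub).iterdir()):
            if path.is_file():
                shutil.copy2(path, dst / sub / path.name)
                copied.append(sub + "/" + path.name)
    return copied
```

  For example, copy_unicode_data("out/icu4j", "/tmp/icu4j", "icudt56b") would
  mirror the commands above with ICUDT=icudt56b; the "jar uf" step still has to
  be run separately.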
1221
1222 * When refreshing all of ICU4J data from ICU4C
1223 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1224 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1225 or
1226 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1227
1228 * update CollationFCD.java
1229   + copy & paste the initializers of lcccIndex[] etc. from
1230     ICU4C/source/i18n/collationfcd.cpp to
1231     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1232
1233 * refresh Java test .txt files
1234 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1235     cd $ICU_SRC_DIR/source/data/unidata
1236     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1237     cd ../../test/testdata
1238     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1239     cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1240
1241 * run & fix ICU4J tests
1242
1243 *** LayoutEngine script information
1244
1245 * ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
1246   because the layout engine was deprecated in ICU 54.
1247   Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
1248   to write lines that we used to add manually.
1249
1250 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1251   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1252   in the working directory.
1253
1254   (It also generates ScriptRunData.cpp, which is no longer needed.)
1255
1256   It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
1257   (a plain text file)
1258   which maps ICU versions to the numbers of script/language constants
1259   that were added then.
1260   (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
1261
1262   The generated files have a current copyright date and "@deprecated" statement.
1263
1264 * Review changes, fix Java tool if necessary, and copy to ICU4C
1265   cd ~/svn.icu4j/trunk/src
1266   meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1267   cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
1268   cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
1269
1270 *** API additions
1271 - send notice to icu-design about new born-@stable API (enum constants etc.)
1272
1273 *** merge the Unicode update branches back onto the trunk
1274 - do not merge the icudata.jar and testdata.jar,
1275   instead rebuild them from merged & tested ICU4C
1276 - make sure that changes to Unicode tools & ICU tools are checked in
1277   http://www.unicode.org/utility/trac/log/trunk/unicodetools
1278   http://bugs.icu-project.org/trac/log/tools/trunk
1279
1280 ---------------------------------------------------------------------------- ***
1281
1282 Unicode 7.0 update for ICU 54
1283
1284 http://www.unicode.org/review/pri271/  -- beta review
1285 http://www.unicode.org/reports/uax-proposed-updates.html
1286 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
1287 http://www.unicode.org/reports/tr44/tr44-13.html
1288
1289 *** ICU Trac
1290
1291 - ticket 10821: Unicode 7.0, UCA 7.0
1292 - C++ branches/markus/uni70 at r35584 from trunk at r35580
1293 - Java branches/markus/uni70 at r35587 from trunk at r35545
1294
1295 *** CLDR Trac
1296
1297 - ticket 7195: UCA 7.0 CLDR root collation
1298 - branches/markus/uni70 at r10062 from trunk at r10061
1299
1300 - ticket 6762: script metadata for Unicode 7.0 new scripts
1301
1302 *** Unicode version numbers
1303 - makedata.mak
1304 - uchar.h
1305 - com.ibm.icu.util.VersionInfo
1306 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1307
1308 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1309   so that the makefiles see the new version number.
1310
1311 *** data files & enums & parser code
1312
1313 * file preparation
1314
1315 - download UCD & IDNA files
1316 - make sure that the Unicode data folder passed into preparseucd.py
1317   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1318 - only for manual diffs: remove version suffixes from the file names
1319   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
1320   (see https://sites.google.com/site/unicodetools/inputdata)
1321 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1322 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
1323 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1324 - Restore TODO diffs in source/data/unidata/UCARules.txt
1325     cd $ICU_SRC_DIR
1326     meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
1327 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
1328
1329 - also: from http://unicode.org/Public/security/7.0.0/ download new
1330   confusables.txt & confusablesWholeScript.txt
1331   and copy to $ICU_ROOT/src/source/data/unidata/
1332
1333 * initial preparseucd.py changes
1334 - remove new Unicode scripts from the
1335   only-in-ISO-15924 list according to the error message:
1336     ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
1337                         'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
1338                         'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
1339     from _scripts_only_in_iso15924
1340   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1341       and in com.ibm.icu.dev.test.lang.TestUScript.java
1342 - NamesList.txt now has a heading with a non-ASCII character
1343   + keep ppucd.txt in platform charset, rather than changing tool/test parsers
1344   + escape non-ASCII characters in heading comments
1345 - gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
1346   + get the copyright from the first file whose copyright line contains the current year
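
  The ValueError quoted above comes from a consistency check between the
  only-in-ISO-15924 list and the scripts that Unicode now encodes. A minimal
  sketch of such a check follows. The function name and the sample sets are
  made up for illustration and this is not the code in preparseucd.py; only
  the name _scripts_only_in_iso15924 and the message wording come from the log.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if a script listed as ISO-only is now encoded in Unicode.

    Hypothetical helper: the overlap is sorted here so the message is stable.
    """
    overlap = sorted(set(only_in_iso) & set(unicode_scripts))
    if overlap:
        raise ValueError(
            "remove %s from _scripts_only_in_iso15924" % overlap)


# Illustrative data only: Mani and Bass became Unicode scripts in 7.0,
# while Ahom, Hatr and Mult were still only in ISO 15924.
only_in_iso = {"Ahom", "Hatr", "Mult", "Mani", "Bass"}
unicode_scripts = {"Latn", "Mani", "Bass"}

try:
    check_scripts_only_in_iso15924(only_in_iso, unicode_scripts)
except ValueError as e:
    print(e)
```

  After removing the reported codes from the list, the same call returns
  without raising.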
1347
1348 * PropertyValueAliases.txt changes
1349 - 32 new Block (blk) values:
1350     blk; Bassa_Vah                        ; Bassa_Vah
1351     blk; Caucasian_Albanian               ; Caucasian_Albanian
1352     blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
1353     blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
1354     blk; Duployan                         ; Duployan
1355     blk; Elbasan                          ; Elbasan
1356     blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
1357     blk; Grantha                          ; Grantha
1358     blk; Khojki                           ; Khojki
1359     blk; Khudawadi                        ; Khudawadi
1360     blk; Latin_Ext_E                      ; Latin_Extended_E
1361     blk; Linear_A                         ; Linear_A
1362     blk; Mahajani                         ; Mahajani
1363     blk; Manichaean                       ; Manichaean
1364     blk; Mende_Kikakui                    ; Mende_Kikakui
1365     blk; Modi                             ; Modi
1366     blk; Mro                              ; Mro
1367     blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
1368     blk; Nabataean                        ; Nabataean
1369     blk; Old_North_Arabian                ; Old_North_Arabian
1370     blk; Old_Permic                       ; Old_Permic
1371     blk; Ornamental_Dingbats              ; Ornamental_Dingbats
1372     blk; Pahawh_Hmong                     ; Pahawh_Hmong
1373     blk; Palmyrene                        ; Palmyrene
1374     blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
1375     blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
1376     blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
1377     blk; Siddham                          ; Siddham
1378     blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
1379     blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
1380     blk; Tirhuta                          ; Tirhuta
1381     blk; Warang_Citi                      ; Warang_Citi
1382   -> add to uchar.h
1383     use long property names for enum constants
1384   -> add to UCharacter.UnicodeBlock IDs
1385     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1386             replace  public static final int \1_ID = \2; \3
1387   -> add to UCharacter.UnicodeBlock objects
1388     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1389             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1390 - 28 new Joining_Group (jg) values:
1391     jg ; Manichaean_Aleph                 ; Manichaean_Aleph
1392     jg ; Manichaean_Ayin                  ; Manichaean_Ayin
1393     jg ; Manichaean_Beth                  ; Manichaean_Beth
1394     jg ; Manichaean_Daleth                ; Manichaean_Daleth
1395     jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
1396     jg ; Manichaean_Five                  ; Manichaean_Five
1397     jg ; Manichaean_Gimel                 ; Manichaean_Gimel
1398     jg ; Manichaean_Heth                  ; Manichaean_Heth
1399     jg ; Manichaean_Hundred               ; Manichaean_Hundred
1400     jg ; Manichaean_Kaph                  ; Manichaean_Kaph
1401     jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
1402     jg ; Manichaean_Mem                   ; Manichaean_Mem
1403     jg ; Manichaean_Nun                   ; Manichaean_Nun
1404     jg ; Manichaean_One                   ; Manichaean_One
1405     jg ; Manichaean_Pe                    ; Manichaean_Pe
1406     jg ; Manichaean_Qoph                  ; Manichaean_Qoph
1407     jg ; Manichaean_Resh                  ; Manichaean_Resh
1408     jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
1409     jg ; Manichaean_Samekh                ; Manichaean_Samekh
1410     jg ; Manichaean_Taw                   ; Manichaean_Taw
1411     jg ; Manichaean_Ten                   ; Manichaean_Ten
1412     jg ; Manichaean_Teth                  ; Manichaean_Teth
1413     jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
1414     jg ; Manichaean_Twenty                ; Manichaean_Twenty
1415     jg ; Manichaean_Waw                   ; Manichaean_Waw
1416     jg ; Manichaean_Yodh                  ; Manichaean_Yodh
1417     jg ; Manichaean_Zayin                 ; Manichaean_Zayin
1418     jg ; Straight_Waw                     ; Straight_Waw
1419   -> uchar.h & UCharacter.JoiningGroup
1420 - 23 new Script (sc) values:
1421     sc ; Aghb                             ; Caucasian_Albanian
1422     sc ; Bass                             ; Bassa_Vah
1423     sc ; Dupl                             ; Duployan
1424     sc ; Elba                             ; Elbasan
1425     sc ; Gran                             ; Grantha
1426     sc ; Hmng                             ; Pahawh_Hmong
1427     sc ; Khoj                             ; Khojki
1428     sc ; Lina                             ; Linear_A
1429     sc ; Mahj                             ; Mahajani
1430     sc ; Mani                             ; Manichaean
1431     sc ; Mend                             ; Mende_Kikakui
1432     sc ; Modi                             ; Modi
1433     sc ; Mroo                             ; Mro
1434     sc ; Narb                             ; Old_North_Arabian
1435     sc ; Nbat                             ; Nabataean
1436     sc ; Palm                             ; Palmyrene
1437     sc ; Pauc                             ; Pau_Cin_Hau
1438     sc ; Perm                             ; Old_Permic
1439     sc ; Phlp                             ; Psalter_Pahlavi
1440     sc ; Sidd                             ; Siddham
1441     sc ; Sind                             ; Khudawadi
1442     sc ; Tirh                             ; Tirhuta
1443     sc ; Wara                             ; Warang_Citi
1444   -> uscript.h (many were added before)
1445     comment "Mende Kikakui" for USCRIPT_MENDE
1446     add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
1447   -> com.ibm.icu.lang.UScript
1448     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1449     replace  public static final int \1 = \2; \3
1450 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1451   (added 2012-11-01)
1452     Ahom        338     Ahom
1453     Hatr        127     Hatran
1454     Mult        323     Multani
1455   (added 2013-10-12)
1456     Modi        324     Modi
1457     Pauc        263     Pau Cin Hau
1458     Sidd        302     Siddham
1459   -> uscript.h (some overlap with additions from Unicode)
1460   -> com.ibm.icu.lang.UScript
1461     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1462     replace  public static final int \1 = \2; \3
1463   -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
1464   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1465       and in com.ibm.icu.dev.test.lang.TestUScript.java
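
  The find/replace patterns listed above can also be applied outside Eclipse.
  The sketch below runs the same three patterns with Python's re module.
  The sample input lines only imitate the layout of uchar.h and uscript.h;
  their numeric values and comments are assumptions for illustration, and the
  names RULES and convert are made up here.

```python
import re

# (pattern, replacement) pairs copied from the find/replace notes above.
RULES = {
    "block_id": (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                 r"public static final int \1_ID = \2; \3"),
    "block_object": (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                     r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'),
    "script": (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
               r"public static final int \1 = \2; \3"),
}


def convert(rule, line):
    """Apply one of the named rules to a single C enum line."""
    pattern, replacement = RULES[rule]
    return re.sub(pattern, replacement, line)


# Illustrative input lines (values are assumptions, not checked against the headers).
block_line = "    UBLOCK_BASSA_VAH = 221, /*[16AD0]*/"
script_line = "    USCRIPT_KHUDAWADI                     = 145,/* Sind */"

print(convert("block_id", block_line))
print(convert("block_object", block_line))
print(convert("script", script_line))
```

  The text before each match (here the indentation) is left untouched, so the
  converted lines can be pasted into the Java class as they are.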
1466
1467 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1468     (not strictly necessary for NOT_ENCODED scripts)
1469   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
1470
1471 * generate normalization data files
1472 - cd $ICU_ROOT/dbg
1473 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1474 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1475 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
1476 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
1477 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1478 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1479 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1480 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
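
  The four .nrm command lines above follow one pattern: each output layers its
  own input file(s) on top of nfc.txt. The sketch below only builds and prints
  those command lines; it does not run gennorm2. The names NORM2_OUTPUTS and
  gennorm2_commands are made up for illustration, and the --csource call for
  norm2_nfc_data.h is left out.

```python
# Output file -> input files, as listed in the gennorm2 steps above.
NORM2_OUTPUTS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Return one argument list per .nrm file (hypothetical helper)."""
    commands = []
    for output, inputs in NORM2_OUTPUTS:
        commands.append(
            ["bin/gennorm2", "-o", src_data_in + "/" + output,
             "-s", unidata + "/norm2"] + inputs)
    return commands


for command in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(command))
```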
1481
1482 * build ICU (make install)
1483   so that the tools build can pick up the new definitions from the installed header files.
1484
1485 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1486
1487 * build Unicode tools using CMake+make
1488
1489 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1490
1491 # Location (--prefix) of where ICU was installed.
1492 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
1493 # Location of the ICU source tree.
1494 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
1495
1496 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1497 ~/svn.icutools/trunk/dbg/unicode/c$ make
1498
1499 * genprops work
1500 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
1501   + add second array of Joining_Group values for at most 10800..10FFF
1502     icutools: unicode/c/genprops/bidipropsbuilder.cpp
1503     icu: source/common/ubidi_props.h/.c/_data.h
1504     icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
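
  The idea of a second Joining_Group array can be sketched as a lookup over
  two code point ranges. This is an illustration only and not the code in
  bidipropsbuilder.cpp, ubidi_props.c or UBiDiProps.java: the class name, the
  first range's start and all stored values are assumptions; only the second
  range (10AC0..10AFF, within 10800..10FFF) comes from the notes above.

```python
JG_NONE = 0  # stand-in for "no joining group"; the real constant is not shown here


class JoiningGroupTable:
    """Two dense arrays, each covering one contiguous code point range."""

    def __init__(self, start, values, start2, values2):
        self.start, self.values = start, list(values)
        self.start2, self.values2 = start2, list(values2)

    def get(self, c):
        # First array: the original (BMP) range.
        if self.start <= c < self.start + len(self.values):
            return self.values[c - self.start]
        # Second array: the supplementary range added for Manichaean.
        if self.start2 <= c < self.start2 + len(self.values2):
            return self.values2[c - self.start2]
        return JG_NONE


# Hypothetical data: small made-up values, one per code point.
table = JoiningGroupTable(
    start=0x0620, values=[1, 2, 3],
    start2=0x10AC0, values2=[7] * 0x40)  # 10AC0..10AFF

print(table.get(0x0621), table.get(0x10AC5), table.get(0x0041))
```

  Keeping two short arrays avoids one long array that would have to span
  everything between the two ranges.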
1505
1506 * generate core properties data files
1507 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
1508 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
1509 - rebuild ICU (make install) & tools
1510 - run genuca again (see step above) so that it picks up the new nfc.nrm
1511 - rebuild ICU (make install) & tools
1512
1513 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1514   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1515 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1516 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
1517 - nothing new in 7.0, no test file to update
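
  The grep step above can be sketched as a small parser that lists the
  non-ASCII code points with status disallowed_STD3_valid. The function name
  is made up, and the sample lines below only imitate the layout of
  IdnaMappingTable.txt (code point or range; status; optional mapping;
  # comment); run it over the real file for an actual check.

```python
def non_ascii_std3_valid(lines):
    """Return the non-ASCII code points whose status is disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        # The first field is either one code point or a "start..end" range.
        start_text, _, end_text = fields[0].partition("..")
        start = int(start_text, 16)
        end = int(end_text, 16) if end_text else start
        result.extend(c for c in range(start, end + 1) if c > 0x7F)
    return result


# Illustrative lines in the file's layout.
sample = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control>..COMMA",
    "0041          ; mapped                 ; 0061          # 1.1  LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]

print(["U+%04X" % c for c in non_ascii_std3_valid(sample)])
```

  Any code point reported here that is not yet in the list above would call
  for an update to uts46test.cpp and UTS46Test.java.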
1518
1519 * run & fix ICU4C tests
1520
1521 * update Java data files
1522 - refresh just the UCD-related files, just to be safe
1523 - see (ICU4C)/source/data/icu4j-readme.txt
1524 - mkdir /tmp/icu4j
1525 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1526   output:
1527     ...
1528     Unicode .icu files built to ./out/build/icudt53l
1529     echo timestamp > uni-core-data
1530     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1531     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
1532     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1533     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
1534     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
1535     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
1536     mkdir -p /tmp/icu4j/main/shared/data
1537     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1538     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
1539     mkdir -p /tmp/icu4j/main/shared/data
1540     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1541     make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
1542 - copy the big-endian Unicode data files to another location,
1543   separate from the other data files
1544     ICUDT=icudt54b
1545     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1546     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1547     cd ~/svn.icu/uni70/dbg/data/out/icu4j
1548     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1549     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1550     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1551     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1552     cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1553     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1554 - refresh ICU4J
1555     ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1556
1557 * update CollationFCD.java
1558   + copy & paste the initializers of lcccIndex[] etc. from
1559     ICU4C/source/i18n/collationfcd.cpp to
1560     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1561
1562 * refresh Java test .txt files
1563 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1564     cd $ICU_SRC_DIR/source/data/unidata
1565     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1566     cd ../../test/testdata
1567     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1568     cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
1569
1570 * UCA
1571
1572 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
1573 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
1574 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
1575 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
1576 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
1577 - review data; compare files, use blankweights.sed or similar
1578   ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
1579 - cd ~/svn.unitools/Generated/uca/7.0.0/
1580 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1581   cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
1582 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1583     (note removing the underscore before "Rules")
1584     cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
1585 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1586   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1587   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1588     cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1589     cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1590     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
1591 - run genuca, see command line above
1592 - rebuild ICU4C
1593 - refresh ICU4J collation data:
1594   (a subset of the instructions above for the properties data refresh, except that this copies all of coll/*)
1595     ICUDT=icudt54b
1596     ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1597     ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1598     ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1599     ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1600 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1601 - note on intltest: if collate/UCAConformanceTest fails, then
1602   utility/MultithreadTest/TestCollators will fail as well;
1603   fix the conformance test before looking into the multi-thread test
1604 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
1605 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
1606   ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
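
  Several of the UCA steps above copy a generated file under a different name
  (dropping "_SHORT", "_CLDR" or the underscore before "Rules"). The sketch
  below collects those renames in one table. It is an illustration, not an
  existing script: the names UCA_COPIES and copy_uca_outputs are made up, and
  the paths are the ones from the cp commands above.

```python
import shutil
from pathlib import Path

# Generated file (relative to .../Generated/uca/7.0.0) -> ICU4C file (relative to $ICU_SRC_DIR).
UCA_COPIES = {
    "CollationAuxiliary/FractionalUCA_SHORT.txt":
        "source/data/unidata/FractionalUCA.txt",
    "CollationAuxiliary/UCA_Rules_SHORT.txt":
        "source/data/unidata/UCARules.txt",
    "CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt":
        "source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt",
    "CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt":
        "source/test/testdata/CollationTest_SHIFTED_SHORT.txt",
}


def copy_uca_outputs(generated_dir, icu_src_dir):
    """Copy each generated file to its ICU4C name; return the target paths."""
    targets = []
    for source, target in UCA_COPIES.items():
        target_path = Path(icu_src_dir) / target
        target_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(Path(generated_dir) / source, target_path)
        targets.append(target_path)
    return targets
```

  The ICU4J copies of CollationTest_*.txt and the CLDR copy of
  CollationAuxiliary/* keep their names, so plain cp commands are enough there.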
1607
1608 * When refreshing all of ICU4J data from ICU4C
1609 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1610 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1611 or
1612 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1613
1614 * run & fix ICU4J tests
1615
1616 *** LayoutEngine script information
1617
1618 (For details see the Unicode 5.2 change log below.)
1619
1620 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1621   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1622   in the working directory.
1623   (It also generates ScriptRunData.cpp, which is no longer needed.)
1624
1625   The generated files have a current copyright date and "@stable" statement.
1626   ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
1627   for "born stable" Unicode API constants, and to stop parsing ICU version numbers
1628   which may not contain dots any more.
1629
1630 - diff current <icu>/source/layout files vs. generated ones
1631     ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1632   review and manually merge desired changes;
1633   fix gratuitous changes, incorrect @draft/@stable and missing aliases;
1634   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1635 - if you just copy the above files, then
1636   fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1637   manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1638
1639 *** API additions
1640 - send notice to icu-design about new born-@stable API (enum constants etc.)
1641
1642 *** merge the Unicode update branches back onto the trunk
1643 - do not merge the icudata.jar and testdata.jar,
1644   instead rebuild them from merged & tested ICU4C
1645
1646 ---------------------------------------------------------------------------- ***
1647
1648 Unicode 6.3 update
1649
1650 http://www.unicode.org/review/pri249/  -- beta review
1651 http://www.unicode.org/reports/uax-proposed-updates.html
1652 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
1653 http://www.unicode.org/reports/tr44/tr44-11.html
1654
1655 *** ICU Trac
1656
1657 - ticket 10128: update ICU to Unicode 6.3 beta
1658 - ticket 10168: update ICU to Unicode 6.3 final
1659 - C++ branches/markus/uni63 at r33552 from trunk at r33551
1660 - Java branches/markus/uni63 at r33550 from trunk at r33553
1661
1662 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
1663
1664 *** Unicode version numbers
1665 - makedata.mak
1666 - uchar.h
1667   (configure.in & configure: have been modified to extract the version from uchar.h)
1668 - com.ibm.icu.util.VersionInfo
1669 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1670
1671 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1672   so that the makefiles see the new version number.
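
  Since configure takes the Unicode version from uchar.h, a quick check of
  what the header currently says can help before re-running it. The sketch
  below is not configure's actual logic; it assumes the header defines
  U_UNICODE_VERSION as a quoted string, and the function name is made up.

```python
import re

# Matches a line such as:  #define U_UNICODE_VERSION "6.3"
_VERSION_RE = re.compile(
    r'^[ \t]*#[ \t]*define[ \t]+U_UNICODE_VERSION[ \t]+"([^"]+)"',
    re.MULTILINE)


def unicode_version_from_header(header_text):
    """Return the Unicode version string defined in the given header text."""
    match = _VERSION_RE.search(header_text)
    if match is None:
        raise ValueError("U_UNICODE_VERSION not found")
    return match.group(1)


# Illustrative header excerpt, not the real file contents.
excerpt = '/* excerpt */\n#define U_UNICODE_VERSION "6.3"\n'
print(unicode_version_from_header(excerpt))
```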
1673
1674 *** data files & enums & parser code
1675
1676 * file preparation
1677
1678 - download UCD, UCA & IDNA files
1679 - make sure that the Unicode data folder passed into preparseucd.py
1680   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1681 - modify preparseucd.py:
1682   parse new file BidiBrackets.txt
1683   with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
1684 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
1685 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1686 - Check test file diffs for previously commented-out, known-failing data lines;
1687   those lines probably need to stay commented out.
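
  Parsing BidiBrackets.txt for the two new properties can be sketched as
  follows. This is not the code added to preparseucd.py; the function name is
  made up, and the sketch assumes the file's layout of
  "code point; Bidi_Paired_Bracket; Bidi_Paired_Bracket_Type # comment"
  with bpt values o (Open) and c (Close).

```python
def parse_bidi_brackets(lines):
    """Return {code point: (bpb code point, bpt value)} for the data lines."""
    brackets = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 3:
            raise ValueError("unexpected line: %r" % line)
        code_point, paired, bracket_type = fields
        if bracket_type not in ("o", "c"):
            raise ValueError("unexpected bpt value: %r" % bracket_type)
        brackets[int(code_point, 16)] = (int(paired, 16), bracket_type)
    return brackets


# Illustrative lines in the file's layout.
sample = [
    "# BidiBrackets excerpt",
    "0028; 0029; o # LEFT PARENTHESIS",
    "0029; 0028; c # RIGHT PARENTHESIS",
    "005B; 005D; o # LEFT SQUARE BRACKET",
]

for cp, (bpb, bpt) in sorted(parse_bidi_brackets(sample).items()):
    print("U+%04X bpb=U+%04X bpt=%s" % (cp, bpb, bpt))
```

  Code points that are not listed get the default bpt=n (None).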
1688
1689 * PropertyAliases.txt changes
1690 - 1 new Enumerated Property
1691   bpt                      ; Bidi_Paired_Bracket_Type
1692   -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
1693   -> ubidi_props.h & .c & UBiDiProps.java
1694   -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
1695   -> uprops.cpp
1696   -> change ubidi.icu format version from 2.0 to 2.1
1697 - 1 new Miscellaneous Property
1698   bpb                      ; Bidi_Paired_Bracket
1699   -> uchar.h & UProperty.java
1700   -> ppucd.h & .cpp
1701
1702 * PropertyValueAliases.txt changes
1703 - 3 Bidi_Paired_Bracket_Type (bpt) values:
1704   bpt; c                                ; Close
1705   bpt; n                                ; None
1706   bpt; o                                ; Open
1707   -> uchar.h & UCharacter.BidiPairedBracketType
1708   -> ubidi_props.h & .c & UBiDiProps.java
1709   -> change ubidi.icu format version from 2.0 to 2.1
1710 - 4 new Bidi_Class (bc) values:
1711   bc ; FSI                              ; First_Strong_Isolate
1712   bc ; LRI                              ; Left_To_Right_Isolate
1713   bc ; RLI                              ; Right_To_Left_Isolate
1714   bc ; PDI                              ; Pop_Directional_Isolate
1715   -> uchar.h & UCharacterEnums.ECharacterDirection
1716   -> until the bidi code gets updated,
1717      Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
1718 - 3 new Word_Break (WB) values:
1719   WB ; HL                               ; Hebrew_Letter
1720   WB ; SQ                               ; Single_Quote
1721   WB ; DQ                               ; Double_Quote
1722   -> uchar.h & UCharacter.WordBreak
1723   -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
1724 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1725   (added 2012-10-16)
1726   Aghb  239     Caucasian Albanian
1727   Mahj  314     Mahajani
1728   -> uscript.h
1729   -> com.ibm.icu.lang.UScript
1730     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1731     replace  public static final int \1 = \2;\3
1732   -> preparseucd.py _scripts_only_in_iso15924
1733   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1734       and in com.ibm.icu.dev.test.lang.TestUScript.java
1735   -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1736      (not strictly necessary for NOT_ENCODED scripts)
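
  The note above about Word_Break exceeding 4 bits is simple arithmetic:
  n values numbered 0..n-1 need as many bits as the largest value n-1 has.
  A small sketch (the function name is made up):

```python
def bits_needed(num_values):
    """Number of bits needed to store values 0..num_values-1."""
    if num_values < 1:
        raise ValueError("need at least one value")
    return max(1, (num_values - 1).bit_length())


# 16 values still fit into 4 bits; the 17th value needs a 5th bit.
for n in (16, 17):
    print(n, "values ->", bits_needed(n), "bits")
```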
1737
1738 * generate normalization data files
1739 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
1740 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
1741 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
1742 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1743 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1744 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1745 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
1746
1747 * build ICU (make install)
1748   so that the tools build can pick up the new definitions from the installed header files.
1749
1750 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1751
1752 * build Unicode tools using CMake+make
1753
1754 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1755
1756 # Location (--prefix) of where ICU was installed.
1757 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
1758 # Location of the ICU source tree.
1759 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
1760
1761 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1762 ~/svn.icutools/trunk/dbg/unicode/c$ make
1763
1764 * generate core properties data files
1765 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
1766 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
1767 - rebuild ICU (make install) & tools
1768 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1769 - rebuild ICU (make install) & tools
1770
1771 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1772   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1773 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1774 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
1775 - nothing new in 6.3, no test file to update
1776
1777 * update Java data files
1778 - refresh just the UCD-related files, just to be safe
1779 - see (ICU4C)/source/data/icu4j-readme.txt
1780 - mkdir /tmp/icu4j
1781 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1782   output:
1783     ...
1784     Unicode .icu files built to ./out/build/icudt52l
1785     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1786     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
1787     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1788     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1789     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
1790     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
1791     mkdir -p /tmp/icu4j/main/shared/data
1792     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1793     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
1794     mkdir -p /tmp/icu4j/main/shared/data
1795     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1796     make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
1797 - copy the big-endian Unicode data files to another location,
1798   separate from the other data files
1799     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1800     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1801     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1802     ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
1803     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1804     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1805     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1806 - refresh ICU4J
1807     ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
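
The copy steps above (everything between the make output and the jar uf call) could be scripted roughly as follows. This is a hedged sketch only: copy_unicode_data is a hypothetical helper, and src/dst stand for the .../com/ibm/icu/impl/data/icudt52b folders under out/icu4j and /tmp/icu4j.

```python
import glob
import os
import shutil

def copy_unicode_data(src, dst):
    """Copy *.icu, *.nrm, coll/*.icu and brkitr/* from src to dst, minus cnvalias.icu."""
    for sub in ("coll", "brkitr"):
        os.makedirs(os.path.join(dst, sub), exist_ok=True)
    patterns = ["*.icu", "*.nrm", os.path.join("coll", "*.icu"), os.path.join("brkitr", "*")]
    copied = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(src, pattern))):
            rel = os.path.relpath(path, src)
            # The log copies cnvalias.icu and then removes it; for a fresh
            # destination folder, skipping it gives the same result.
            if rel == "cnvalias.icu" or not os.path.isfile(path):
                continue
            shutil.copy(path, os.path.join(dst, rel))
            copied.append(rel)
    return copied
```

The returned list of relative paths is convenient for checking that only the Unicode data files were picked up before running jar uf.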
1808
1809 * refresh Java test .txt files
1810 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1811
1812 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
1813
1814 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1815 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1816 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1817 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1818   (note removing the underscore before "Rules")
1819 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1820   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1821   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1822 - check test file diffs for previously commented-out, known-failing data lines;
1823   probably need to keep those commented out
1824 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1825 - run genuca, see command line above
1826 - rebuild ICU4C
1827 - refresh ICU4J collation data:
1828   (subset of instructions above for properties data refresh, except copies all coll/*)
1829     ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1830     ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1831     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1832     ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1833 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1834 - note on intltest: if collate/UCAConformanceTest fails, then
1835   utility/MultithreadTest/TestCollators will fail as well;
1836   fix the conformance test before looking into the multi-thread test
1837
1838 * test ICU, fix test code where necessary
1839
1840 * When refreshing all of ICU4J data from ICU4C
1841 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1842 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1843 or
1844 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1845
1846 *** LayoutEngine script information
1847 - skipped for Unicode 6.3: no new scripts
1848
1849 *** merge the Unicode update branches back onto the trunk
1850 - do not merge the icudata.jar and testdata.jar,
1851   instead rebuild them from merged & tested ICU4C
1852
1853 ---------------------------------------------------------------------------- ***
1854
1855 Unicode 6.2 update
1856
1857 http://www.unicode.org/review/pri230/
1858 http://www.unicode.org/versions/beta-6.2.0.html
1859 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
1860 http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
1861 http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
1862 http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
1863 http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
1864 http://unicode.org/Public/idna/6.2.0/
1865
1866 *** ICU Trac
1867
1868 - ticket 9515: Unicode 6.2: final ICU update
1869
1870 - ticket 9514: UCA 6.2: fix UCARules.txt
1871
1872 - ticket 9437: update ICU to Unicode 6.2
1873 - C++ branches/markus/uni62 at r32050 from trunk at r32041
1874 - Java branches/markus/uni62 at r32068 from trunk at r32066
1875
1876 *** Unicode version numbers
1877 - makedata.mak
1878 - uchar.h
1879   (configure.in & configure: have been modified to extract the version from uchar.h)
1880 - com.ibm.icu.util.VersionInfo
1881 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1882
1883 *** data files & enums & parser code
1884
1885 * file preparation
1886
1887 - download UCD, UCA & IDNA files
1888 - make sure that the Unicode data folder passed into preparseucd.py
1889   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1890 - modify preparseucd.py: NamesList.txt is now in UTF-8
1891 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
1892 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1893 - Check test file diffs for previously commented-out, known-failing data lines;
1894   probably need to keep those commented out.
1895
1896 * PropertyValueAliases.txt changes
1897 - 1 new Line_Break (lb) value:
1898   lb ; RI                               ; Regional_Indicator
1899   -> uchar.h & UCharacter.LineBreak
1900 - 1 new Word_Break (WB) value:
1901   WB ; RI                               ; Regional_Indicator
1902   -> uchar.h & UCharacter.WordBreak
1903 - 1 new Grapheme_Cluster_Break (GCB) value:
1904   GCB; RI                               ; Regional_Indicator
1905   -> uchar.h & UCharacter.GraphemeClusterBreak
1906
1907 * 3 new numeric values
1908   The new value -1 was really supposed to be NaN, but that would have required
1909   new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
1910   but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
1911     cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
1912     cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
1913   The two new values 216000 and 432000 require an addition to the encoding of numeric values.
1914     cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
1915     cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
1916   -> uprops.h, uchar.c & UCharacterProperty.java
1917   -> cucdtst.c & UCharacterTest.java
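
A small arithmetic illustration of the three new values, in Python. It only shows that -1 is an ordinary fraction and that the two large values are small multiples of a power of 60. Whether the actual encoding added to uprops.h uses such a form is an assumption here, and split_base60 is a hypothetical helper, not ICU code.

```python
from fractions import Fraction

def split_base60(value):
    """Return (mantissa, exponent) with value == mantissa * 60**exponent, exponent maximal."""
    if value <= 0:
        raise ValueError("expects a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent

# -1 expressed as numerator/denominator, as in the "fraction" -1/1.
minus_one = Fraction(-1, 1)
print(minus_one.numerator, minus_one.denominator)

# The two new cuneiform values from ppucd.txt.
for nv in (216000, 432000):
    print(nv, split_base60(nv))
```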
1918
1919 * generate normalization data files
1920 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
1921 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
1922 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
1923 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1924 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1925 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1926 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
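
The four gennorm2 calls above differ only in the output file and the list of input files. A minimal Python sketch that builds the same argument lists; gennorm2_commands and NORM2_INPUTS are hypothetical names, and only the -o and -s options shown in the log are used.

```python
import os

# Output file -> list of norm2 input files, as in the four commands above.
NORM2_INPUTS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]

def gennorm2_commands(src_data_in, unidata):
    """Return one argv list per .nrm file; nothing is executed here."""
    norm2_dir = os.path.join(unidata, "norm2")
    return [
        ["bin/gennorm2", "-o", os.path.join(src_data_in, out), "-s", norm2_dir] + inputs
        for out, inputs in NORM2_INPUTS
    ]

for argv in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(argv))
```

The argv lists could be passed to subprocess.run(..., check=True) from the dbg folder, with LD_LIBRARY_PATH set as in the first step.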
1927
1928 * build ICU (make install)
1929   so that the tools build can pick up the new definitions from the installed header files.
1930 * build Unicode tools using CMake+make
1931
1932 * generate core properties data files
1933 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
1934 - in initial bootstrapping, change the UCA version
1935   in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1936 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
1937 - rebuild ICU (make install) & tools
1938   + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1939     check if the UCA version in FractionalUCA.txt matches the new Unicode version
1940     (see step above)
1941 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1942 - rebuild ICU (make install) & tools
1943
1944 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1945   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1946 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1947 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
1948 - nothing new in 6.2, no test file to update
1949
1950 * update Java data files
1951 - refresh just the UCD-related files, just to be safe
1952 - see (ICU4C)/source/data/icu4j-readme.txt
1953 - mkdir /tmp/icu4j
1954 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1955   output:
1956     ...
1957     Unicode .icu files built to ./out/build/icudt50l
1958     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1959     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
1960     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1961     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1962     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
1963     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
1964     mkdir -p /tmp/icu4j/main/shared/data
1965     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1966     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
1967     mkdir -p /tmp/icu4j/main/shared/data
1968     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1969     make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
1970 - copy the big-endian Unicode data files to another location,
1971   separate from the other data files
1972     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1973     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1974     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1975     ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
1976     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1977     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1978     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1979 - refresh ICU4J
1980     ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1981
1982 * refresh Java test .txt files
1983 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1984
1985 * UCA
1986
1987 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1988 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1989 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1990 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1991   (note removing the underscore before "Rules")
1992 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
1993   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1994   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1995 - check test file diffs for previously commented-out, known-failing data lines;
1996   probably need to keep those commented out
1997 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1998 - run genuca, see command line above
1999 - rebuild ICU4C
2000 - refresh ICU4J collation data:
2001   (subset of instructions above for properties data refresh, except copies all coll/*)
2002     ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2003     ~/svn.icu/uni62/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
2004     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
2005     ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
2006 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
2007 - note on intltest: if collate/UCAConformanceTest fails, then
2008   utility/MultithreadTest/TestCollators will fail as well;
2009   fix the conformance test before looking into the multi-thread test
2010
2011 * test ICU, fix test code where necessary
2012
2013 * When refreshing all of ICU4J data from ICU4C
2014 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2015 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2016 or
2017 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2018
2019 *** LayoutEngine script information
2020 - skipped for Unicode 6.2: no new scripts
2021
2022 *** merge the Unicode update branches back onto the trunk
2023 - do not merge the icudata.jar and testdata.jar,
2024   instead rebuild them from merged & tested ICU4C
2025
2026 ---------------------------------------------------------------------------- ***
2027
2028 Future Unicode update
2029
2030 Tools simplified since the Unicode 6.1 update. See
2031 - http://site.icu-project.org/design/props/ppucd
2032 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
2033
2034 * Unicode version numbers
2035 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
2036
2037 * file preparation
2038 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
2039 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
2040 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2041 - Check test file diffs for previously commented-out, known-failing data lines;
2042   probably need to keep those commented out.
2043
2044 * PropertyValueAliases.txt changes
2045 - Script codes that are in ISO 15924 but not in Unicode are now listed in
2046   preparseucd.py, in the _scripts_only_in_iso15924 variable.
2047   If there are new ISO codes, then add them.
2048   If Unicode adds some of them, then remove them from the .py variable.
2049
2050 * UnicodeData.txt changes
2051 - No more manual changes for CJK ranges for algorithmic names;
2052   those are now written to ppucd.txt and genprops reads them from there.
2053
2054 * generate core properties data files (makeprops.sh was deleted)
2055 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
2056
2057 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
2058 - it is now generated by preparseucd.py
2059
2060 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
2061 - it is now generated by preparseucd.py
2062 - make sure that the Unicode data folder passed into preparseucd.py
2063   includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
2064   (can be in some subfolder)
2065
2066 * generate normalization data files
2067 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
2068 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
2069 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
2070 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2071 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2072 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2073 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
2074
2075 * build ICU (make install)
2076 * build Unicode tools using CMake+make
2077
2078 * new way to call genuca (makeuca.sh was deleted)
2079 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
2080
2081 ---------------------------------------------------------------------------- ***
2082
2083 Unicode 6.1 update
2084
2085 *** ICU Trac
2086
2087 - ticket 8995 final update to Unicode 6.1
2088 - ticket 8994 regenerate source/layout/CanonData.cpp
2089
2090 - ticket 8961 support Unicode "Age" value *names*
2091 - ticket 8963 support multiple character name aliases & types
2092
2093 - ticket 8827 "update ICU to Unicode 6.1"
2094 - C++ branches/markus/uni61 at r30864 from trunk at r30843
2095 - Java branches/markus/uni61 at r30865 from trunk at r30863
2096
2097 *** Unicode version numbers
2098 - makedata.mak
2099 - uchar.h
2100   (configure.in & configure: have been modified to extract the version from uchar.h)
2101 - com.ibm.icu.util.VersionInfo
2102 - icutools/unicode/makedefs.sh
2103   + also review & update other definitions in that file,
2104     e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
2105
2106 *** data files & enums & parser code
2107
2108 * file preparation
2109
2110 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
2111 - This prepares both unidata and testdata files in respective output subfolders.
2112 - Check test file diffs for previously commented-out, known-failing data lines;
2113   probably need to keep those commented out.
2114
2115 * PropertyValueAliases.txt changes
2116 - 11 new block names:
2117   Arabic_Extended_A
2118   Arabic_Mathematical_Alphabetic_Symbols
2119   Chakma
2120   Meetei_Mayek_Extensions
2121   Meroitic_Cursive
2122   Meroitic_Hieroglyphs
2123   Miao
2124   Sharada
2125   Sora_Sompeng
2126   Sundanese_Supplement
2127   Takri
2128   -> add to uchar.h
2129   -> add to UCharacter.UnicodeBlock IDs
2130     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2131             replace  public static final int \1_ID = \2; \3
2132   -> add to UCharacter.UnicodeBlock objects
2133     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2134             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2135 - 1 new Joining_Group (jg) value:
2136   Rohingya_Yeh
2137   -> uchar.h & UCharacter.JoiningGroup
2138 - 2 new Line_Break (lb) values:
2139   CJ=Conditional_Japanese_Starter
2140   HL=Hebrew_Letter
2141   -> uchar.h & UCharacter.LineBreak
2142 - 7 new scripts:
2143   sc ; Cakm      ; Chakma
2144   sc ; Merc      ; Meroitic_Cursive
2145   sc ; Mero      ; Meroitic_Hieroglyphs
2146   sc ; Plrd      ; Miao
2147   sc ; Shrd      ; Sharada
2148   sc ; Sora      ; Sora_Sompeng
2149   sc ; Takr      ; Takri
2150   -> remove these from SyntheticPropertyValueAliases.txt
2151   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2152       and in com.ibm.icu.dev.test.lang.TestUScript.java
2153 - 3 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2154   (2 added 2011-06-21)
2155   Khoj        322     Khojki
2156   Tirh        326     Tirhuta
2157     and 1 more added 2011-12-09
2158   Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
2159   -> uscript.h
2160   -> com.ibm.icu.lang.UScript
2161     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2162     replace  public static final int \1 = \2;\3
2163   -> SyntheticPropertyValueAliases.txt
2164   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2165       and in com.ibm.icu.dev.test.lang.TestUScript.java
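
The find/replace patterns noted above for UCharacter.UnicodeBlock and UScript also work with Python's re module. A minimal sketch: the patterns and replacements are copied from the notes, convert is a hypothetical helper, and the sample header lines (including the numeric values) are illustrative rather than quoted from uchar.h or uscript.h.

```python
import re

# (pattern, replacement) pairs copied from the find/replace notes above.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJ = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
             r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT = (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
          r"public static final int \1 = \2;\3")

def convert(rule, line):
    """Apply one find/replace rule to a C header line."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, line)

# Illustrative input lines (assumed shape of the C enum constants).
block_line = "UBLOCK_CHAKMA = 212, /*[11100]*/"
script_line = "USCRIPT_KHOJKI = 157,/* Khoj */"

print(convert(BLOCK_ID, block_line))
print(convert(BLOCK_OBJ, block_line))
print(convert(SCRIPT, script_line))
```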
2166
2167 * UnicodeData.txt changes
2168 - the last Unihan code point changes from U+9FCB to U+9FCC
2169   search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
2170   + do change gennames.c
2171   + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
2172
2173 * DerivedBidiClass.txt changes
2174 - 2 new default-AL blocks:
2175 #     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
2176 #     Arabic Mathematical Alphabetic Symbols:
2177 #                       U+1EE00  - U+1EEFF  (was default-R)
2178 - 2 new default-R blocks:
2179 #     Meroitic Hieroglyphs:
2180 #                        U+10980 - U+1099F
2181 #     Meroitic Cursive:  U+109A0 - U+109FF
2182   -> should be picked up by the explicit data in the file
2183
2184 * NameAliases.txt changes
2185 - from
2186     # Each line has two fields
2187     # First field: Code point
2188     # Second field: Alias
2189 - to
2190     # Each line has three fields, as described here:
2191     #
2192     # First field:  Code point
2193     # Second field: Alias
2194     # Third field:  Type
2195 - Also, the file format previously allowed multiple aliases per code point, but only now
2196   does it actually contain them, including several of the same type. For example,
2197     FEFF;BYTE ORDER MARK;alternate
2198     FEFF;BOM;abbreviation
2199     FEFF;ZWNBSP;abbreviation
2200 - This breaks our gennames parser, unames.icu data structure, and API.
2201   Fix gennames to only pick up "correction" aliases.
2202   New ticket #8963 for further changes.
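
The "only pick up correction aliases" rule can be illustrated in Python. This is a sketch of the filtering idea, not the gennames code: parse_name_aliases and correction_aliases are hypothetical names, the FEFF lines are the ones quoted above, and the 01A2 line is included as an illustrative "correction" entry.

```python
def parse_name_aliases(lines):
    """Yield (code_point, alias, alias_type) from three-field NameAliases.txt lines."""
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # skip comments and blank lines
        code, alias, alias_type = (field.strip() for field in line.split(";"))
        yield int(code, 16), alias, alias_type

def correction_aliases(lines):
    """Keep a single alias per code point by accepting only type "correction"."""
    return {cp: alias for cp, alias, kind in parse_name_aliases(lines)
            if kind == "correction"}

sample = [
    "# Each line has three fields",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
print(correction_aliases(sample))
```

With this filter U+FEFF contributes no alias at all, which keeps a one-alias-per-code-point data structure valid until the format is extended.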
2203
2204 * run genpname/preparse.pl (on Linux)
2205   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2206   + make sure that data.h is writable
2207   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2208   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2209
2210 * build ICU (make install)
2211   so that the tools build can pick up the new definitions from the installed header files.
2212 * build Unicode tools (at least genpname) using CMake+make
2213
2214 * run genpname
2215   (builds both pnames.icu and propname_data.h)
2216 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2217 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
2218
2219 * build ICU (make install)
2220 * build Unicode tools using CMake+make
2221
2222 * update source/data/unidata/norm2/nfkc_cf.txt
2223 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2224
2225 * update source/data/unidata/norm2/uts46.txt
2226 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
2227   to ~/svn.icu/tools/trunk/src/unicode/py
2228 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
2229 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2230 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2231
2232 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2233   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2234 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2235 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
2236 - nothing new in 6.1, no test file to update
2237
2238 * generate core properties data files
2239 - in initial bootstrapping, change the UCA version
2240   in source/data/unidata/FractionalUCA.txt to match the new Unicode version
2241 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2242 - rebuild ICU & tools
2243   + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
2244     check if the UCA version in FractionalUCA.txt matches the new Unicode version
2245     (see step above)
2246 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
2247   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2248 - rebuild ICU & tools
2249
2250 * update Java data files
2251 - refresh just the UCD-related files, just to be safe
2252 - see (ICU4C)/source/data/icu4j-readme.txt
2253 - mkdir /tmp/icu4j
2254 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2255   output:
2256     ...
2257     Unicode .icu files built to ./out/build/icudt49l
2258     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
2259     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
2260     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2261     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
2262     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
2263     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
2264     mkdir -p /tmp/icu4j/main/shared/data
2265     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2266     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
2267     mkdir -p /tmp/icu4j/main/shared/data
2268     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2269     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
2270 - copy the big-endian Unicode data files to another location,
2271   separate from the other data files
2272     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2273     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
2274     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
2275     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
2276     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
2277     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2278     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
2279 - refresh ICU4J
2280     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
2281
2282 * refresh Java test .txt files
2283 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2284
2285 * test ICU so far, fix test code where necessary
2286 - temporarily ignore collation issues that look like UCA/UCD mismatches,
2287   until UCA data is updated
2288
2289 * UCA
2290
2291 - get output from Mark's tools; look in
2292     http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
2293 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2294 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2295   (note removing the underscore before "Rules")
2296 - update (ICU)/source/test/testdata/CollationTest_*.txt
2297   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2298   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
2299 - check test file diffs for previously commented-out, known-failing data lines;
2300   probably need to keep those commented out
2301 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
2302 - run makeuca.sh:
2303   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2304 - rebuild ICU4C
2305 - refresh ICU4J collation data:
2306   (subset of instructions above for properties data refresh, except copies all coll/*)
2307     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2308     ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2309     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
2310     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
2311 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
2312 - note on intltest: if collate/UCAConformanceTest fails, then
2313   utility/MultithreadTest/TestCollators will fail as well;
2314   fix the conformance test before looking into the multi-thread test
2315
2316 * When refreshing all of ICU4J data from ICU4C
2317 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2318 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2319 or
2320 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2321
2322 *** LayoutEngine script information
2323
2324 (For details see the Unicode 5.2 change log below.)
2325
2326 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2327   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2328   in the working directory.
2329   (It also generates ScriptRunData.cpp, which is no longer needed.)
2330
2331   The generated files have a current copyright date and "@draft" statement.
2332
2333 - diff current <icu>/source/layout files vs. generated ones
2334     ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2335   review and manually merge desired changes;
2336   fix gratuitous changes, incorrect @draft and missing aliases;
2337   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2338 - if you just copy the above files, then
2339   fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
2340   manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2341
2342 *** merge the Unicode update branches back onto the trunk
2343 - do not merge the icudata.jar and testdata.jar,
2344   instead rebuild them from merged & tested ICU4C
2345
2346 ---------------------------------------------------------------------------- ***
2347
2348 ICU 4.8 (no Unicode update, just new script codes)
2349
2350 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2351   (added 2010-12-21)
2352     Afak    439     Afaka
2353     Jurc    510     Jurchen
2354     Mroo    199     Mro, Mru
2355     Nshu    499     Nüshu
2356     Shrd    319     Sharada, Śāradā
2357     Sora    398     Sora Sompeng
2358     Takr    321     Takri, Ṭākrī, Ṭāṅkrī
2359     Tang    520     Tangut
2360     Wole    480     Woleai
2361   -> uscript.h
2362   -> com.ibm.icu.lang.UScript
2363     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2364     replace  public static final int \1 = \2;\3
2365   -> genpname/SyntheticPropertyValueAliases.txt
2366   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2367       and in com.ibm.icu.dev.test.lang.TestUScript.java
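
The find/replace pair above can also be applied outside an editor. The following is a minimal sketch in Python that uses the same pattern and replacement; the sample header line and its numeric value 147 are illustrative assumptions, not copied from uscript.h.

```python
import re

# Pattern and replacement are the find/replace pair from the notes above.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \1 = \2;\3"


def c_enum_to_java(line: str) -> str:
    """Rewrite one uscript.h enum line as a UScript.java constant.

    Lines that do not match the pattern are returned unchanged.
    """
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, line)


# Hypothetical header line; the value 147 is for illustration only.
sample = "      USCRIPT_AFAKA                         = 147,/* Afak */"
print(c_enum_to_java(sample))
```

Note that the pattern requires at least one character after the comma, so enum lines without a trailing comment are left untouched and need manual conversion.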
2368
2369 * run genpname/preparse.pl (on Linux)
2370   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2371   + make sure that data.h is writable
2372   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2373   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2374
2375 * rebuild Unicode tools (at least genpname) using make
2376 - You might first need to "make install" ICU so that the tools build can pick
2377   up the new definitions from the installed header files.
2378
2379 * run genpname
2380   (builds both pnames.icu and propname_data.h)
2381 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2382 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
2383 - rebuild ICU & tools
2384
2385 * run genprops
2386 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
2387 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
2388 - rebuild ICU & tools
2389
2390 * update Java data files
2391 - refresh just the UCD-related files, just to be safe
2392 - see (ICU4C)/source/data/icu4j-readme.txt
2393 - mkdir /tmp/icu4j
2394 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2395 - copy the big-endian Unicode data files to another location,
2396   separate from the other data files
2397     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2398     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2399     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
2400 - refresh ICU4J
2401     ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
2402
2403 * should have updated the layout engine script codes but forgot
2404
2405 ---------------------------------------------------------------------------- ***
2406
2407 Unicode 6.0 update
2408
2409 *** related ICU Trac tickets
2410
2411 7264 Unicode 6.0 Update
2412
2413 *** Unicode version numbers
2414 - makedata.mak
2415 - uchar.h
2416   (configure.in & configure: have been modified to extract the version from uchar.h)
2417 - com.ibm.icu.util.VersionInfo
2418
2419 *** data files & enums & parser code
2420
2421 * file preparation
2422
2423 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
2424 - This now prepares both unidata and testdata files in respective output subfolders.
2425
2426 * PropertyAliases.txt changes
2427 - new Script_Extensions property defined in the new ScriptExtensions.txt file
2428   but not listed in PropertyAliases.txt; reported to unicode.org;
2429   -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
2430     scx; Script_Extensions
2431   -> uchar.h with new UProperty section
2432   -> com.ibm.icu.lang.UProperty, parallel with uchar.h
2433
2434 * PropertyValueAliases.txt changes
2435 - 12 new block names:
2436   Alchemical_Symbols
2437   Bamum_Supplement
2438   Batak
2439   Brahmi
2440   CJK_Unified_Ideographs_Extension_D
2441   Emoticons
2442   Ethiopic_Extended_A
2443   Kana_Supplement
2444   Mandaic
2445   Miscellaneous_Symbols_And_Pictographs
2446   Playing_Cards
2447   Transport_And_Map_Symbols
2448   -> add to uchar.h
2449   -> add to UCharacter.UnicodeBlock
2450     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2451             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2452 - Joining_Group (jg) values:
2453   Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
2454   -> uchar.h & UCharacter.JoiningGroup
2455 - 3 new scripts:
2456   sc ; Batk      ; Batak
2457   sc ; Brah      ; Brahmi
2458   sc ; Mand      ; Mandaic
2459   -> remove these from SyntheticPropertyValueAliases.txt
2460   -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
2461   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2462       and in com.ibm.icu.dev.test.lang.TestUScript.java
2463 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2464   (added 2009-11-11..2010-07-18)
2465   Bass        259     Bassa Vah
2466   Dupl        755     Duployan shorthand
2467   Elba        226     Elbasan
2468   Gran        343     Grantha
2469   Kpel        436     Kpelle
2470   Loma        437     Loma
2471   Mend        438     Mende
2472   Merc        101     Meroitic Cursive
2473   Narb        106     Old North Arabian
2474   Nbat        159     Nabataean
2475   Palm        126     Palmyrene
2476   Sind        318     Sindhi
2477   Wara        262     Warang Citi
2478   -> uscript.h
2479   -> com.ibm.icu.lang.UScript
2480     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2481     replace  public static final int \1 = \2;\3
2482   -> SyntheticPropertyValueAliases.txt
2483   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2484       and in com.ibm.icu.dev.test.lang.TestUScript.java
2485 - ISO 15924 name change
2486   Mero        100     Meroitic Hieroglyphs (was Meroitic)
2487   -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
2488 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
2489
2490 * UnicodeData.txt changes
2491 - new CJK block:
2492   2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2493   2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
2494   -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
2495
2496 * build Unicode tools using CMake+make
2497
2498 * run genpname/preparse.pl (on Linux)
2499   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2500   + make sure that data.h is writable
2501   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2502   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2503
2504 * rebuild Unicode tools (at least genpname) using make
2505 - You might first need to "make install" ICU so that the tools build can pick
2506   up the new definitions from the installed header files.
2507
2508 * run genpname
2509 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2510 - rebuild ICU & tools
2511
2512 * update source/data/unidata/norm2/nfkc_cf.txt
2513 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2514
2515 * update source/data/unidata/norm2/uts46.txt
2516 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
2517   to ~/svn.icu/tools/trunk/src/unicode/py
2518 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
2519 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2520 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2521
2522 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2523   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2524 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2525 - Unicode 6.0: U+2260, U+226E, U+226F
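
The grep step above could be scripted roughly as follows. This is a sketch that assumes the usual "code points ; status ; mapping # comment" layout of IdnaMappingTable.txt; the function name and the sample lines are hypothetical and hand-written for illustration.

```python
def non_ascii_with_status(lines, status="disallowed_STD3_valid"):
    """Yield (first, last) code point ranges beyond ASCII that have the given status."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != status:
            continue
        first, _, last = fields[0].partition("..")
        start, end = int(first, 16), int(last or first, 16)
        if end > 0x7F:  # keep only ranges that reach beyond ASCII
            yield start, end


# Hand-written sample lines in the assumed file format.
SAMPLE = [
    "0021..002C    ; disallowed_STD3_valid                  # EXCLAMATION MARK..COMMA",
    "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
]

for start, end in non_ascii_with_status(SAMPLE):
    print(f"U+{start:04X}..U+{end:04X}")
```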
2526
2527 * generate core properties data files
2528 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2529 - rebuild ICU & tools
2530 - run makeuca.sh so that genuca picks up the new nfc.nrm:
2531   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2532 - rebuild ICU & tools
2533
2534 * implement new Script_Extensions property (provisional)
2535 - parser & generator: genprops & uprops.icu
2536 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
2537 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
2538
2539 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
2540 - (one-time change)
2541 - genbidi/gencase/genprops tools changes
2542 - re-run makeprops.sh (see above)
2543 - UCharacterProperty.java, UCharacterTypeIterator.java,
2544   UBiDiProps.java, UCaseProps.java, and several others with minor changes;
2545   UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
2546
2547 * update Java data files
2548 - refresh just the UCD-related files, just to be safe
2549 - see (ICU4C)/source/data/icu4j-readme.txt
2550 - mkdir /tmp/icu4j
2551 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2552   output:
2553     ...
2554     Unicode .icu files built to ./out/build/icudt45l
2555     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2556     echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2557     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2558     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
2559     mkdir -p /tmp/icu4j/main/shared/data
2560     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2561 - copy the big-endian Unicode data files to another location,
2562   separate from the other data files
2563     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2564     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2565     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2566     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
2567     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2568     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2569     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2570 - refresh ICU4J
2571     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2572
2573 * refresh Java test .txt files
2574 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2575
2576 * un-hardcode normalization skippable (NF*_Inert) test data
2577 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
2578
2579 * copy updated break iterator test files
2580 - now handled by early ucdcopy.py and
2581   copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
2582   (old instructions:
2583    copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
2584    to ~/svn.icu/trunk/src/source/test/testdata)
2585 - they are not used in ICU4J
2586
2587 * UCA
2588
2589 - get output from Mark's tools; look in
2590     http://www.unicode.org/~book/incoming/mark/uca6.0.0/
2591     http://www.macchiato.com/unicode/utc/additional-uca-files
2592     http://www.unicode.org/Public/UCA/6.0.0/
2593     http://www.unicode.org/~mdavis/uca/
2594 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2595 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2596 - update Han-implicit ranges for new CJK extensions:
2597   swapCJK() in ucol.cpp & ImplicitCEGenerator.java
2598 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
2599   do not add it into invuca so that tailoring primary-after an ignorable works
2600 - genuca: permit space between [variable top] bytes
2601 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
2602 - run makeuca.sh:
2603   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2604 - rebuild ICU4C
2605 - refresh ICU4J collation data:
2606   (subset of instructions above for properties data refresh, except copies all coll/*)
2607     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2608     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2609     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2610     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2611 - update (ICU)/source/test/testdata/CollationTest_*.txt
2612   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2613   with output from Mark's Unicode tools
2614 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2615 - note on intltest: if collate/UCAConformanceTest fails, then
2616   utility/MultithreadTest/TestCollators will fail as well;
2617   fix the conformance test before looking into the multi-thread test
2618
2619 * When refreshing all of ICU4J data from ICU4C
2620 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2621 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2622 or
2623 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2624
2625 *** LayoutEngine script information
2626
2627 (For details see the Unicode 5.2 change log below.)
2628
2629 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2630 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2631 ScriptRunData.cpp, which is no longer needed.)
2632
2633 The generated files have a current copyright date and "@draft" statement.
2634
2635 * copy the above files into <icu>/source/layout, replacing the old files.
2636 * fix mixed line endings
2637 * review the diffs and fix incorrect @draft and missing aliases;
2638   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2639 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2640
2641 ---------------------------------------------------------------------------- ***
2642
2643 Unicode 5.2 update
2644
2645 *** related ICU Trac tickets
2646
2647 7084 Unicode 5.2
2648
2649 7167 verify collation bytes
2650 7235 Java test NAME_ALIAS
2651 7236 Java DerivedCoreProperties.txt test
2652 7237 Java BidiTest.txt
2653 7238 UTrie2 in core unidata
2654 7239 test for tailoring gaps
2655 7240 Java fix CollationMiscTest
2656 7243 update layout engine for Unicode 5.2
2657
2658 *** Unicode version numbers
2659 - makedata.mak
2660 - uchar.h
2661 - configure.in & configure
2662 - update ucdVersion in gennames.c if an algorithmic range changes
2663
2664 *** data files & enums & parser code
2665
2666 * file preparation
2667
2668 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
2669 - includes finding files regardless of version numbers,
2670   copying them, and performing the equivalent processing of the
2671   ucdstrip and ucdmerge tools on the desired set of files
2672
2673 * notes on changes
2674 - PropertyAliases.txt
2675   moved from numeric to enumerated:
2676     ccc       ; Canonical_Combining_Class
2677   new string properties:
2678     NFKC_CF   ; NFKC_Casefold
2679     Name_Alias; Name_Alias
2680   new binary properties:
2681     Cased     ; Cased
2682     CI        ; Case_Ignorable
2683     CWCF      ; Changes_When_Casefolded
2684     CWCM      ; Changes_When_Casemapped
2685     CWKCF     ; Changes_When_NFKC_Casefolded
2686     CWL       ; Changes_When_Lowercased
2687     CWT       ; Changes_When_Titlecased
2688     CWU       ; Changes_When_Uppercased
2689   new CJK Unihan properties (not supported by ICU)
2690 - PropertyValueAliases.txt
2691   new block names
2692   new scripts
2693   one script code change:
2694     sc ; Qaai      ; Inherited
2695     ->
2696     sc ; Zinh      ; Inherited                        ; Qaai
2697   new Line_Break (lb) value:
2698     lb ; CP        ; Close_Parenthesis
2699   new Joining_Group (jg) values: Farsi_Yeh, Nya
2700   other new values:
2701     ccc; 214; ATA  ; Attached_Above
2702 - DerivedBidiClass.txt
2703   new default-R range: U+1E800 - U+1EFFF
2704 - UnicodeData.txt
2705   all of the ISO comments are gone
2706   new CJK block end:
2707     9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
2708   new CJK block:
2709     2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2710     2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2711
2712 * genpname
2713 - run preparse.pl
2714   + cd \svn\icuproj\icu\trunk\source\tools\genpname
2715   + make sure that data.h is writable
2716   + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
2717   + preparse.pl complains with errors like the following:
2718       Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
2719     This is because ICU 4.0 had scripts from ISO 15924 which are now
2720     added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
2721     and PropertyValueAliases.txt.
2722     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2723        Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
2724   + preparse.pl complains with errors about block names missing from uchar.h; add them
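
The duplicate script entries could be found before running preparse.pl. The sketch below assumes that both alias files use "property ; short ; long" lines, which is an assumption for SyntheticPropertyValueAliases.txt; the function names and sample data are hypothetical.

```python
def short_names(lines, prop="sc"):
    """Collect the short value aliases of one property from alias-file lines."""
    names = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == prop:
            names.add(fields[1])
    return names


def duplicated_scripts(official_lines, synthetic_lines):
    """Script codes defined in both files, which would trigger the conflict error."""
    return sorted(short_names(official_lines) & short_names(synthetic_lines))


# Illustrative sample lines, not copied from the real files.
official = ["sc ; Egyp      ; Egyptian_Hieroglyphs", "sc ; Java      ; Javanese"]
synthetic = ["sc ; Egyp ; Egyp", "sc ; Blis ; Blis", "# a comment line"]

print(duplicated_scripts(official, synthetic))
```

Entries reported this way are the ones to remove from the synthetic file.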
2725
2726 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2727 - new block & script values
2728   + 26 new blocks
2729     copy new blocks from Blocks.txt
2730     MS VC++ 2008 regular expression:
2731       find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
2732       replace with "    UBLOCK_\3 = 172, /*[\1]*/"
2733   + several new script values already added in ICU 4.0 for ISO 15924 coverage
2734     (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
2735   + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
2736   + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
2737     (added to SyntheticPropertyValueAliases.txt)
2738 - new Joining Group (JG) values: Farsi_Yeh, Nya
2739 - new Line_Break (lb) value:
2740     lb ; CP        ; Close_Parenthesis
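
The Blocks.txt regular expression above leaves the block name as written in the file and repeats the same constant value, so the result still needs manual cleanup. A possible scripted variant is sketched here; uppercasing the name and counting up from a starting ID go beyond the original regex and are assumptions, as is the function name.

```python
import re

# Python form of the "find" expression above (VC++ braces become parentheses).
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_to_enum(lines, first_id):
    """Turn Blocks.txt lines into UBLOCK_ enum lines, numbering from first_id."""
    out = []
    next_id = first_id
    for line in lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue  # skip comments and blank lines
        # Assumption: the constant name is the block name in upper case,
        # with runs of spaces and punctuation replaced by one underscore.
        name = re.sub(r"[^A-Za-z0-9]+", "_", m.group(3)).upper()
        out.append(f"    UBLOCK_{name} = {next_id}, /*[{m.group(1)}]*/")
        next_id += 1
    return out


sample = [
    "# comment",
    "A960..A97F; Hangul Jamo Extended-A",
    "D7B0..D7FF; Hangul Jamo Extended-B",
]
for enum_line in blocks_to_enum(sample, first_id=172):
    print(enum_line)
```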
2741
2742 * hardcoded Unihan range end/limit
2743 - Unihan range end moves from 9FC3 to 9FCB
2744   search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
2745   + do change gennames.c
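
The search for the old range end and limit can be sketched as follows, using the regex named above. The function name and the sample source text are hypothetical.

```python
import re

# Old Unihan range end (9FC3) and limit (9FC4), matched case-insensitively.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded(text):
    """Return (line number, line) pairs that mention the old end or limit."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


# Hypothetical source snippet for illustration.
source = "if (c <= 0x9fc3) {\n  limit = 0x9FC4;\n  other = 0x9FCB;\n}"
for number, line in find_hardcoded(source):
    print(number, line)
```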
2746
2747 * Compare definitions of new binary properties with what we used to use
2748   in algorithms, to see if the definitions changed.
2749 - Verified that definitions for Cased and Case_Ignorable are unchanged.
2750   The gencase tool now parses the newly public Case_Ignorable values
2751   in case the definition changes in the future.
2752
2753 * uchar.c & uprops.h & uprops.c & genprops
2754 - new numeric values that didn't exist in Unicode data before:
2755     1/7, 1/9, 1/10, 3/10, 1/16, 3/16
2756   the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
2757   therefore redesign the encoding of numeric types and values for formatVersion 6;
2758   design for simple numbers up to at least 144 ("one gross"),
2759   large values up to at least 10^20,
2760   and fractions with numerators -1..17 and denominators 1..16
2761   to cover current and expected future values
2762   (e.g., more Han numeric values, Meroitic twelfths)
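
To make the fraction ranges above concrete, here is a minimal sketch of one way to pack numerators -1..17 and denominators 1..16 into a small integer. The bit layout is a hypothetical illustration and is not claimed to be the uprops.icu formatVersion 6 encoding.

```python
from fractions import Fraction

# The new numeric values listed above.
NEW_VALUES = [Fraction(1, 7), Fraction(1, 9), Fraction(1, 10),
              Fraction(3, 10), Fraction(1, 16), Fraction(3, 16)]


def pack(numerator, denominator):
    """Pack a fraction into one integer (hypothetical layout, 4 bits for the denominator)."""
    if not (-1 <= numerator <= 17 and 1 <= denominator <= 16):
        raise ValueError("fraction outside the supported ranges")
    return ((numerator + 1) << 4) | (denominator - 1)


def unpack(code):
    """Inverse of pack(): return (numerator, denominator)."""
    return (code >> 4) - 1, (code & 0xF) + 1


# Values whose denominator exceeds 9, the limit mentioned for formatVersion 5.
too_big_for_v5 = [f for f in NEW_VALUES if f.denominator > 9]
print(too_big_for_v5)

for f in NEW_VALUES:
    assert unpack(pack(f.numerator, f.denominator)) == (f.numerator, f.denominator)
```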
2763
2764 * reimplement Hangul_Syllable_Type for new Jamo characters
2765 - the old code assumed that all Jamo characters are in the 11xx block
2766 - Unicode 5.2 fills holes there and adds new Jamo characters in
2767     A960..A97F; Hangul Jamo Extended-A
2768   and in
2769     D7B0..D7FF; Hangul Jamo Extended-B
2770 - Hangul_Syllable_Type can be trivially derived from a subset of
2771   Grapheme_Cluster_Break values
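
The derivation mentioned above can be sketched as a simple lookup. It assumes the short property value aliases (L, V, T, LV, LVT for both properties, NA for Not_Applicable); the function name is hypothetical.

```python
# The Grapheme_Cluster_Break values that correspond to Hangul syllable types.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb):
    """Map a Grapheme_Cluster_Break short value to Hangul_Syllable_Type."""
    return GCB_TO_HST.get(gcb, "NA")  # every other GCB value is Not_Applicable


print(hangul_syllable_type("LV"), hangul_syllable_type("EX"))
```

With this approach no separate Hangul_Syllable_Type data is needed, and new Jamo blocks are covered as soon as their Grapheme_Cluster_Break values are in the data.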
2772
2773 * build Unicode data source code for hardcoding core data
2774 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
2775
2776 ICU data make path is \svn\icuproj\icu\trunk\source\data\
2777 ICU root path is \svn\icuproj\icu\trunk
2778 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2779 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2780 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2781 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2782 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2783 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2784 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2785 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
2786 Creating data file for Unicode Property Names
2787 Creating data file for Unicode Character Properties
2788 Creating data file for Unicode Case Mapping Properties
2789 Creating data file for Unicode BiDi/Shaping Properties
2790 Creating data file for Unicode Normalization
2791 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
2792 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
2793
2794 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
2795   and rebuild the common library
2796
2797 *** UCA
2798
2799 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
2800 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
2801 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
2802 [ Begin obsolete instructions:
2803   Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
2804     - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
2805       on Windows:
2806         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
2807         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
2808   End obsolete instructions]
2809 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2810   not just the *_STUB.txt files
2811 - note on intltest: if collate/UCAConformanceTest fails, then
2812   utility/MultithreadTest/TestCollators will fail as well;
2813   fix the conformance test before looking into the multi-thread test
2814
2815 *** Implement Cased & Case_Ignorable properties
2816 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
2817 - Problem: These properties should be disjoint, but aren't
2818 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
2819 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
2820
2821 *** Implement Changes_When_Xyz properties
2822 - without stored data
2823
2824 *** Implement Name_Alias property
2825 - add it as another name field in unames.icu
2826 - make it available via u_charName() and UCharNameChoice
2827 - consider it in u_charFromName()
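
The intended behavior (lookup by alias succeeds, while the regular name stays unchanged) can be illustrated with Python's standard unicodedata module as a stand-in. This is not the ICU API; it only shows the same idea with a name alias from NameAliases.txt.

```python
import unicodedata

# "LATIN CAPITAL LETTER GHA" is a name alias (a correction) for U+01A2.
ch = unicodedata.lookup("LATIN CAPITAL LETTER GHA")

# The lookup by alias finds the character; name() still returns the original name.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```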
2828
2829 *** Break iterators
2830
2831 * Update break iterator rules to new UAX versions and new property values
2832 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
2833
2834 *** new BidiTest file
2835 - review format and data
2836 - copy BidiTest.txt to source/test/testdata
2837 - write test code using this data
2838 - fix ICU code where it fails the conformance test
2839
2840 *** Java
2841 - generally, find and update code corresponding to C/C++
2842 - UCharacter.UnicodeBlock constants:
2843   a) add an _ID integer per new block, update COUNT
2844   b) add a class instance per new block
2845      Visual Studio regex:
2846         find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
2847         replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2848 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
2849
2850 - port test changes to Java
2851
2852 *** LayoutEngine script information
2853
2854 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
2855
2856 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2857 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2858 ScriptRunData.cpp, which is no longer needed.)
2859
2860 The generated files have a current copyright date and "@draft" statement.
2861
2862 -> Eric Mader wrote in email on 20090930:
2863     "I think the tool has been modified to update @draft to @stable for
2864      older scripts and to add @draft for new scripts.
2865      (I worked with an intern on this last year.)
2866      You should check the output after you run it."
2867
2868 * copy the above files into <icu>/source/layout, replacing the old files.
2869 * fix mixed line endings
2870 * review the diffs and fix incorrect @draft and missing aliases
2871 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2872
2873 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2874 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2875
2876 -> Eric Mader wrote in email on 20090930:
2877     "This is just a matter of making sure that all the per-script tables have
2878      entries for any new scripts that were added.
2879      If any new Indic characters were added, then the class tables in
2880      IndicClassTables.cpp should be updated to reflect this.
2881      John Emmons should know how to do this if it's required."
2882
2883 * rebuild the layout and layoutex libraries.
2884
2885 *** Documentation
2886 - Update User Guide
2887   + Jamo_Short_Name, sfc->scf, binary property value aliases
2888
2889 ---------------------------------------------------------------------------- ***
2890
2891 Unicode 5.1 update
2892
2893 *** related ICU Trac tickets
2894
2895 5696 Update to Unicode 5.1
2896
2897 *** Unicode version numbers
2898 - makedata.mak
2899 - uchar.h
2900 - configure.in & configure
2901 - update ucdVersion in gennames.c if an algorithmic range changes
2902
2903 *** data files & enums & parser code
2904
2905 * file preparation
2906 - ucdstrip:
2907     DerivedCoreProperties.txt
2908     DerivedNormalizationProps.txt
2909     NormalizationTest.txt
2910     PropList.txt
2911     Scripts.txt
2912     GraphemeBreakProperty.txt
2913     SentenceBreakProperty.txt
2914     WordBreakProperty.txt
2915 - ucdstrip and ucdmerge:
2916     EastAsianWidth.txt
2917     LineBreak.txt
2918
2919 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
2920 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
2921 copy 5.1.0\ucd\Blocks.txt ..\unidata\
2922 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
2923 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
2924 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
2925 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
2926 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
2927 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
2928 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
2929 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
2930 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
2931 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
2932 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
2933
2934 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
2935 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
2936 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
2937 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
2938 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
2939 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
2940 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
2941 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
2942 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
2943 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
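
As a rough sketch of the pipeline in the last two commands, assuming that ucdstrip removes comments and blank lines and that ucdmerge merges adjacent code point ranges with identical values (the real tools may differ in details), the processing looks like this. The function names and sample lines are hypothetical.

```python
def strip_comments(lines):
    """Drop '#' comments and blank lines (the assumed role of ucdstrip)."""
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            yield data


def merge_ranges(lines):
    """Merge adjacent ranges that carry the same value (the assumed role of ucdmerge)."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        cps, value = [f.strip() for f in line.split(";", 1)]
        first, _, last = cps.partition("..")
        start, end = int(first, 16), int(last or first, 16)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    for start, end, value in merged:
        cps = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
        yield f"{cps};{value}"


# Illustrative lines in the style of LineBreak.txt.
SAMPLE = [
    "# sample header comment",
    "0030..0039;NU # DIGIT ZERO..DIGIT NINE",
    "003A;IS",
    "003B;IS # SEMICOLON",
    "",
    "0041..005A;AL",
]
for out_line in merge_ranges(strip_comments(SAMPLE)):
    print(out_line)
```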
2944
2945 * genpname
2946 - run preparse.pl
2947   + cd \svn\icuproj\icu\uni51\source\tools\genpname
2948   + make sure that data.h is writable
2949   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
2950   + preparse.pl complains with errors like the following:
2951       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
2952     This is because ICU 3.8 had scripts from ISO 15924 which are now
2953     added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
2954     and PropertyValueAliases.txt.
2955     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2956        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
2957   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
2958       N/Y, No/Yes, F/T, False/True
2959     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
2960        It will use further values from the file if present.
2961
2962 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
2963 - new block & script values
2964   + 17 new blocks
2965   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
2966     (removed from SyntheticPropertyValueAliases.txt)
2967   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
2968     (added to SyntheticPropertyValueAliases.txt)
2969 - uprops.icu (uprops.h) only provides 7 bits for script codes.
2970   ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
2971   No code above 127 is yet the script code of an
2972   assigned Unicode character, so ICU 4.0 uprops.icu does not store any
2973   script code values greater than 127.
2974   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
2975   in a parallel bit field, and that overflows now.
2976   Also, future values >=128 would be incompatible anyway.
2977   uprops.h is modified to move around several of the bit fields
2978   in the properties vector words, and now uses 8 bits for the script code.
2979   Two other bit fields also grow to accommodate future growth:
2980   Block (current count: 172) grows from 8 to 9 bits,
2981   and Word_Break grows from 4 to 5 bits.
2982 - renamed property Simple_Case_Folding (sfc->scf)
2983   + nothing to be done: handled as normal alias
2984 - new property JSN Jamo_Short_Name
2985   + no new API: only contributes to the Name property
2986 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
2987 - new Joining Group (JG) value: Burushashki_Yeh_Barree
2988 - new Sentence_Break (SB) values:
2989     SB ; CR        ; CR
2990     SB ; EX        ; Extend
2991     SB ; LF        ; LF
2992     SB ; SC        ; SContinue
2993 - new Word_Break (WB) values:
2994     WB ; CR        ; CR
2995     WB ; Extend    ; Extend
2996     WB ; LF        ; LF
2997     WB ; MB        ; MidNumLet
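
The bit-field arithmetic behind the uprops.h change above can be checked with a short
Python sketch. The helper names are made up for illustration; only the numbers
(USCRIPT_CODE_LIMIT=130, 7 and 8 bits) come from this log:

```python
def bits_needed(max_value):
    """Smallest number of bits that can hold the unsigned integer max_value."""
    return max(1, max_value.bit_length())


def fits(value, width):
    """True if value can be stored in an unsigned bit field of the given width."""
    return 0 <= value < (1 << width)


USCRIPT_CODE_LIMIT = 130                 # value quoted in the log
max_script = USCRIPT_CODE_LIMIT - 1      # 129, stored in the parallel bit field

print(fits(127, 7))             # largest value a 7-bit field can hold
print(fits(max_script, 7))      # the maximum script value overflows 7 bits
print(bits_needed(max_script))  # width required for the maximum script value
```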
2998
2999 * Further changes in the 2008-02-29 update:
3000 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
3001   because they should not normally be invisible.
3002 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
3003 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
3004 - new Word_Break (WB) value: NL=Newline
3005
3006 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
3007 - Unihan range end moves from 9FBB to 9FC3
3008   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
3009   + do change gennames.c
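
The search described above might be scripted roughly as follows. This is a sketch in
Python, not the tool that was actually used; the sample lines are hypothetical and only
imitate the style of the real sources:

```python
import re

# Matches the old range end (9FBB) and the old limit (9FBC), in either case.
OLD_UNIHAN_END = re.compile(r"9FB[BC]", re.IGNORECASE)


def find_hardcoded_ranges(lines):
    """Return (line_number, text) pairs for lines that mention the old end or limit."""
    return [(number, text)
            for number, text in enumerate(lines, start=1)
            if OLD_UNIHAN_END.search(text)]


sample = [
    "        0x4e00, 0x9fbb,",    # hypothetical range table entry
    "    if (c < 0x9FBC) {",      # hypothetical limit check
    "    unrelated = 0x9fa5;",    # older constant, not matched by this regex
]

for number, text in find_hardcoded_ranges(sample):
    print(number, text.strip())
```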
3010
3011 * build Unicode data source code for hardcoding core data
3012 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
3013
3014 ICU data make path is \svn\icuproj\icu\uni51\source\data\
3015 ICU root path is \svn\icuproj\icu\uni51
3016 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3017 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
3018 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
3019 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
3020 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
3021 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
3022 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
3023 Creating data file for Unicode Character Properties
3024 Creating data file for Unicode Case Mapping Properties
3025 Creating data file for Unicode BiDi/Shaping Properties
3026 Creating data file for Unicode Normalization
3027 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
3028 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
3029
3030 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
3031   and rebuild the common library
3032
3033 *** Break iterators
3034
3035 * Update break iterator rules to new UAX versions and new property values
3036
3037 *** UCA
3038
3039 * update FractionalUCA.txt and UCARules.txt with new canonical closure
3040
3041 *** Test suites
3042 - Test that APIs using Unicode property value aliases (like UnicodeSet)
3043   support all of the boolean values N/Y, No/Yes, F/T, False/True
3044   -> TestBinaryValues() tests in both cintltst and intltest
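
What such a test needs to cover can be sketched in Python as below. This is not the ICU
implementation; the table and the function name are illustrative, and the loose matching
(ignoring case, spaces, hyphens and underscores) is an assumption about how aliases are
compared:

```python
# All eight boolean value aliases listed in PropertyValueAliases.txt.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a boolean property value alias to bool; raise KeyError if unknown."""
    key = alias.lower().replace("_", "").replace("-", "").replace(" ", "")
    return BINARY_VALUE_ALIASES[key]


for name in ("N", "Y", "No", "Yes", "F", "T", "False", "True"):
    print(name, parse_binary_value(name))
```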
3045
3046 *** LayoutEngine script information
3047 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
3048 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
3049 ScriptRunData.cpp, which is no longer needed.)
3050
3051 The generated files have a current copyright date and "@draft" statement.
3052
3053 * copy the above files into <icu>/source/layout, replacing the old files.
3054
3055 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
3056 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
3057
3058 * rebuild the layout and layoutex libraries.
3059
3060 *** Documentation
3061 - Update User Guide
3062   + Jamo_Short_Name, sfc->scf, binary property value aliases
3063
3064 ---------------------------------------------------------------------------- ***
3065
3066 Unicode 5.0 update
3067
3068 *** related Jitterbugs
3069
3070 5084 RFE: Update to Unicode 5.0
3071
3072 *** data files & enums & parser code
3073
3074 * file preparation
3075 - ucdstrip:
3076     DerivedCoreProperties.txt
3077     DerivedNormalizationProps.txt
3078     NormalizationTest.txt
3079     PropList.txt
3080     Scripts.txt
3081     GraphemeBreakProperty.txt
3082     SentenceBreakProperty.txt
3083     WordBreakProperty.txt
3084 - ucdstrip and ucdmerge:
3085     EastAsianWidth.txt
3086     LineBreak.txt
3087
3088 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
3089 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
3090 copy 5.0.0\ucd\Blocks.txt ..\unidata\
3091 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
3092 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
3093 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
3094 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
3095 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
3096 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
3097 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
3098 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
3099 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
3100 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
3101 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
3102
3103 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
3104 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
3105 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
3106 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
3107 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
3108 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
3109 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
3110 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
3111 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
3112 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
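
For readers without the tools at hand, the two filters used in the batch file can be
approximated in Python. This sketch assumes that ucdstrip removes "#" comments and that
ucdmerge merges adjacent code point ranges that carry the same value; the real tools may
differ in details such as whitespace handling or which comment lines they keep:

```python
def strip_comments(lines):
    """Drop '#' comments and blank lines; keep only the data fields."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            out.append(data)
    return out


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.partition("..")
    first = int(start, 16)
    return first, (int(end, 16) if end else first)


def merge_ranges(lines):
    """Merge consecutive lines whose ranges are adjacent and whose value is equal."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        rng, value = (f.strip() for f in line.split(";", 1))
        start, end = parse_range(rng)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end
        else:
            merged.append([start, end, value])
    return [f"{s:04X};{v}" if s == e else f"{s:04X}..{e:04X};{v}"
            for s, e, v in merged]


sample = [
    "# illustrative data in the style of EastAsianWidth.txt",
    "0020;Na # SPACE",
    "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
    "0024;Na # DOLLAR SIGN",
    "00A1;A # INVERTED EXCLAMATION MARK",
]

print("\n".join(merge_ranges(strip_comments(sample))))
```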
3113
3114 * update FractionalUCA.txt and UCARules.txt with new canonical closure
3115
3116 * genpname
3117 - run preparse.pl
3118   + make sure that data.h is writable
3119   + perl preparse.pl \cvs\oss\icu > out.txt
3120
3121 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
3122 - new block & script values
3123   + script values already added in ICU 3.6 because all of ISO 15924 is now covered
3124
3125 * build Unicode data source code for hardcoding core data
3126 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
3127
3128 ICU data make path is \cvs\oss\icu\source\data\
3129 ICU root path is \cvs\oss\icu
3130 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
3131 [etc.]
3132 Creating data file for Unicode Character Properties
3133 Creating data file for Unicode Case Mapping Properties
3134 Creating data file for Unicode BiDi/Shaping Properties
3135 Creating data file for Unicode Normalization
3136 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
3137 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
3138
3139 - copy the .c source files to C:\cvs\oss\icu\source\common
3140   and rebuild the common library
3141
3142 *** Unicode version numbers
3143 - makedata.mak
3144 - uchar.h
3145 - configure.in
3146
3147 *** LayoutEngine script information
3148 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
3149 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
3150 ScriptRunData.cpp, which is no longer needed.)
3151
3152 The generated files have a current copyright date and "@draft" statement.
3153
3154 * copy the above files into <icu>/source/layout, replacing the old files.
3155
3156 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
3157 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
3158
3159 * rebuild the layout and layoutex libraries.
3160
3161 ---------------------------------------------------------------------------- ***
3162
3163 Unicode 4.1 update
3164
3165 *** related Jitterbugs
3166
3167 4332 RFE: Update to Unicode 4.1
3168 4157 RBBI, TR29 4.1 updates
3169
3170 *** data files & enums & parser code
3171
3172 * file preparation
3173 - ucdstrip:
3174     DerivedCoreProperties.txt
3175     DerivedNormalizationProps.txt
3176     NormalizationTest.txt
3177     GraphemeBreakProperty.txt
3178     SentenceBreakProperty.txt
3179     WordBreakProperty.txt
3180 - ucdstrip and ucdmerge:
3181     EastAsianWidth.txt
3182     LineBreak.txt
3183
3184 * add new files to the repository
3185     GraphemeBreakProperty.txt
3186     SentenceBreakProperty.txt
3187     WordBreakProperty.txt
3188
3189 * update FractionalUCA.txt and UCARules.txt with new canonical closure
3190
3191 * genpname
3192 - handle new enumerated properties in sub read_uchar
3193 - run preparse.pl
3194
3195 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
3196 - new binary properties
3197   + Pattern_Syntax
3198   + Pattern_White_Space
3199 - new enumerated properties
3200   + Grapheme_Cluster_Break
3201   + Sentence_Break
3202   + Word_Break
3203 - new block & script & line break values
3204
3205 * gencase
3206 - case-ignorable changes
3207   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
3208   now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
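
The case-ignorable condition quoted above can be written as a small predicate. The Python
sketch below is not gencase code. MID_LETTER_SAMPLE is an assumed, incomplete sample of
Word_Break=MidLetter (the real set comes from WordBreakProperty.txt), and Python's
unicodedata module reflects its own bundled Unicode version rather than 4.1:

```python
import unicodedata

# Assumed, incomplete sample of Word_Break=MidLetter code points.
MID_LETTER_SAMPLE = {"\u0027", "\u00B7", "\u2019"}

# General categories named in the definition: Mn, Me, Cf, Lm, Sk.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    """True if ch is in the MidLetter sample or has one of the listed categories."""
    return (ch in MID_LETTER_SAMPLE
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)


for ch in ("'", "\u0301", "\u00AD", "a"):
    print(f"U+{ord(ch):04X}", is_case_ignorable(ch))
```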
3209
3210 *** Unicode version numbers
3211 - makedata.mak
3212 - uchar.h
3213 - configure.in
3214
3215 *** tests
3216 - verify that u_charMirror() round-trips
3217 - test all new properties and some new values of old properties
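
The round-trip check can be illustrated without ICU. In the Python sketch below,
MIRROR is a small hand-picked excerpt standing in for the BidiMirroring.txt data, and
char_mirror() only imitates the behavior expected of u_charMirror() (return the input
when there is no mapping); both are assumptions made for this example:

```python
# Small illustrative excerpt of mirror pairs (stand-in for BidiMirroring.txt data).
MIRROR = {
    0x0028: 0x0029, 0x0029: 0x0028,   # parentheses
    0x003C: 0x003E, 0x003E: 0x003C,   # less-than / greater-than
    0x005B: 0x005D, 0x005D: 0x005B,   # square brackets
}


def char_mirror(c):
    """Return the mirrored code point, or c itself when there is no mapping."""
    return MIRROR.get(c, c)


def round_trip_failures(code_points):
    """Return the code points for which mirroring twice does not give the input back."""
    return [c for c in code_points if char_mirror(char_mirror(c)) != c]


print(round_trip_failures(range(0x80)))
```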
3218
3219 *** other code
3220
3221 * hardcoded Unihan range end/limit
3222 - Unihan range end moves from 9FA5 to 9FBB
3223   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
3224   + do not modify BOCU/BOCSU code because that would change the encoding
3225     and break binary compatibility!
3226   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
3227     NamePrepProfile.txt
3228   + ignore trietest.c: test data is arbitrary
3229   + ignore tstnorm.cpp: test optimization, not important
3230   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
3231   + do change line_th.txt and word_th.txt
3232     by replacing hardcoded ranges with the new property values
3233   + do change gennames.c
3234
3235 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
3236 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
3237 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
3238
3239 * case mappings
3240 - compare new special casing context conditions with previous ones
3241   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
3242
3243 * genpname
3244 - consider storing only the short name if it is the same as the long name
3245
3246 *** other reviews
3247 - UAX #29 changes (grapheme/word/sentence breaks)
3248 - UAX #14 changes (line breaks)
3249 - Pattern_Syntax & Pattern_White_Space
3250
3251 ---------------------------------------------------------------------------- ***
3252
3253 Unicode 4.0.1 update
3254
3255 *** related Jitterbugs
3256
3257 3170 RFE: Update to Unicode 4.0.1
3258 3171 Add new Unicode 4.0.1 properties
3259 3520 use Unicode 4.0.1 updates for break iteration
3260
3261 *** data files & enums & parser code
3262
3263 * file preparation
3264 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
3265 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
3266
3267 * file fixes
3268 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
3269   according to PRI #26
3270   http://www.unicode.org/review/resolved-pri.html#pri26
3271 - undone again because no corrigendum in sight;
3272   instead modified tests to not check consistency on this for Unicode 4.0.1
3273
3274 * ucdterms.txt
3275 - update from http://www.unicode.org/copyright.html
3276   formatted for plain text
3277
3278 * uchar.h & uprops.h & uprops.c & genprops
3279 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
3280 - add U_LB_INSEPARABLE due to a spelling fix
3281   + put short name comment only on line with new constant
3282     for genpname perl script parser
3283 - new binary properties
3284   + STerm
3285   + Variation_Selector
3286
3287 * genpname
3288 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
3289 - perl script: correctly calculate the maximum number of fields per row
3290
3291 * uscript.h
3292 - new script code Hrkt=Katakana_Or_Hiragana
3293
3294 * gennorm.c: track changes in DerivedNormalizationProps.txt
3295 - "FNC" -> "FC_NFKC"
3296 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
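
The two format changes listed for DerivedNormalizationProps.txt amount to a field
rewrite. The Python sketch below is not gennorm.c code; it shows one way to map
old-style lines onto the new field layout. Only "FNC" and "NFD_NO" are named in this
log; the other old names in OLD_QC are assumed to follow the same pattern, and the
sample lines are illustrative:

```python
# Old single-field names mapped to (new property name, value).
# Only NFD_NO is taken from the log; the remaining entries are assumed by analogy.
OLD_QC = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}


def normalize_fields(line):
    """Split a data line into fields and rewrite old-style names to the new style."""
    fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
    rng, name, rest = fields[0], fields[1], fields[2:]
    if name == "FNC":
        name = "FC_NFKC"
    elif name in OLD_QC:
        name, value = OLD_QC[name]
        rest = [value] + rest
    return [rng, name] + rest


print(normalize_fields("00C0..00C5 ; NFD_NO"))
print(normalize_fields("037A ; FNC; 0020 03B9 # illustrative mapping line"))
```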
3297
3298 * genprops/props2.c: track changes in DerivedNumericValues.txt
3299 - changed from 3 columns to 2, dropping the numeric type
3300   + assume that the type is always numeric for Han characters,
3301     and that only those are added in addition to what UnicodeData.txt lists
3302
3303 *** Unicode version numbers
3304 - makedata.mak
3305 - uchar.h
3306 - configure.in
3307
3308 *** tests
3309 - update test of default bidi classes according to PRI #28
3310   /tsutil/cucdtst/TestUnicodeData
3311   http://www.unicode.org/review/resolved-pri.html#pri28
3312 - bidi tests: change exemplar character for ES depending on Unicode version
3313 - change hardcoded expected property values where they change
3314
3315 *** other code
3316
3317 * name matching
3318 - read UCD.html
3319
3320 * scripts
3321 - use new Hrkt=Katakana_Or_Hiragana
3322
3323 * ZWJ & ZWNJ
3324 - are now part of combining character sequences
3325 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ