icuSources/data/unidata/changes.txt

   1 * Copyright (C) 2004-2015, International Business Machines
   2 * Corporation and others.  All Rights Reserved.
   3 *
   4 *   file name:  changes.txt
   5 *   encoding:   US-ASCII
   6 *   tab size:   8 (not used)
   7 *   indentation:4
   8 *
   9 *   created on: 2004may06
  10 *   created by: Markus W. Scherer
  11 *
  12 * change log for Unicode updates
  13
  14 ---------------------------------------------------------------------------- ***
  15
  16 * New ISO 15924 script codes
  17
  18 Starting with ICU 55, we no longer add UScriptCode constants until their scripts
  19 are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
  20 Script enum constant names should follow the Unicode script property value aliases,
  21 which are assigned only when the scripts are encoded.
  22 When we add constants early and guess the names wrong, we end up with confusing enum constants
  23 and sometimes have to add aliases.
  24
  25 Exception: Script codes like Latf and Aran that are not subject to separate encoding
  26 can be added at any time.
  27
  28 Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html
  29
  30 Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
  31 - Adlm  166     Adlam
  32 - Aran  161     Arabic (Nastaliq variant)
  33 - Kitl  505     Khitan large script
  34 - Kits  288     Khitan small script
  35 - Marc  332     Marchen
  36 - Osge  219     Osage
  37
  38 Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.
  39
  40 Adlam, Marchen, and Osage are expected to go into Unicode 9;
  41 we should assign Unicode script property value aliases for them
  42 soon after Unicode 8 is released, and add them in ICU 56.
  43
  44 Khitan scripts will be encoded later.
  45
  46 ---------------------------------------------------------------------------- ***
  47
  48 Unicode 8.0 update for ICU ??
  49
  50 * UCA issue from 7.0
  51
  52 - U+1DE9 COMBINING LATIN SMALL LETTER BETA
  53   sorts with Greek Beta, should sort with Latin B?
  54   + Ken says:
  55     No, it was deliberate:
  56
  57     03B2;GREEK SMALL LETTER BETA;Ll;;;;0392;;0392
  58     1D5D;MODIFIER LETTER SMALL BETA;Lm;<super> 03B2;;;;;
  59     1DE9;COMBINING LATIN SMALL LETTER BETA;Mn;<sort> 03B2;;;;;
  60     1D66;GREEK SUBSCRIPT SMALL LETTER BETA;Ll;<sub> 03B2;;;;;
  61
  62     Note the relationship to U+1D5D.
  63
  64     When the disunified *Latin* beta base letter shows up in Unicode 8.0:
  65
  66     U+A7B4 LATIN CAPITAL LETTER BETA
  67     U+A7B5 LATIN SMALL LETTER BETA
  68
  69     we could re-evaluate what U+1DE9 equates to, for collation,
  70     but currently there isn't any Latin beta to serve that function
  71     in Unicode 7.0.
  72
  73 - ICU_ROOT=~/svn.icu/trunk
  74 - ICU_SRC_DIR=$ICU_ROOT/src
  75 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
  76 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
  77
  78
  79 ---------------------------------------------------------------------------- ***
  80
  81 Unicode 7.0 update for ICU 54
  82
  83 http://www.unicode.org/review/pri271/  -- beta review
  84 http://www.unicode.org/reports/uax-proposed-updates.html
  85 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
  86 http://www.unicode.org/reports/tr44/tr44-13.html
  87
  88 *** ICU Trac
  89
  90 - ticket 10821: Unicode 7.0, UCA 7.0
  91 - C++ branches/markus/uni70 at r35584 from trunk at r35580
  92 - Java branches/markus/uni70 at r35587 from trunk at r35545
  93
  94 *** CLDR Trac
  95
  96 - ticket 7195: UCA 7.0 CLDR root collation
  97 - branches/markus/uni70 at r10062 from trunk at r10061
  98
  99 - ticket 6762: script metadata for Unicode 7.0 new scripts
 100
 101 *** Unicode version numbers
 102 - makedata.mak
 103 - uchar.h
 104 - com.ibm.icu.util.VersionInfo
 105 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
 106
 107 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
 108   so that the makefiles see the new version number.
 109
 110 *** data files & enums & parser code
 111
 112 * file preparation
 113
 114 - download UCD & IDNA files
 115 - make sure that the Unicode data folder passed into preparseucd.py
 116   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
 117 - only for manual diffs: remove version suffixes from the file names
 118   ~/unidata/uni70/20140403$ ../../desuffixucd.py .
 119   (see https://sites.google.com/site/unicodetools/inputdata)
 120 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
 121 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
 122 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 123 - Restore TODO diffs in source/data/unidata/UCARules.txt
 124     cd $ICU_SRC_DIR
 125     meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
 126 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
 127
 128 - also: from http://unicode.org/Public/security/7.0.0/ download new
 129   confusables.txt & confusablesWholeScript.txt
 130   and copy to $ICU_ROOT/src/source/data/unidata/
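The suffix removal for manual diffs can be sketched as follows. This is not the code of desuffixucd.py; it assumes that versioned file names look like UnicodeData-7.0.0d17.txt (name, hyphen, dotted version, optional d<digits> draft number, extension). strip_version_suffix is a hypothetical helper name.

```python
import re

# Assumed shape of a versioned UCD file name: Name-7.0.0d17.txt
_SUFFIX_RE = re.compile(
    r"^(?P<stem>.+?)-\d+(?:\.\d+)*(?:d\d+)?(?P<ext>\.[A-Za-z0-9]+)$")


def strip_version_suffix(filename):
    """Return the file name without its version suffix, or unchanged if it has none."""
    m = _SUFFIX_RE.match(filename)
    if m is None:
        return filename
    return m.group("stem") + m.group("ext")


if __name__ == "__main__":
    for name in ("UnicodeData-7.0.0d17.txt", "IdnaMappingTable-7.0.0.txt", "ReadMe.txt"):
        print(name, "->", strip_version_suffix(name))
```

File names without a recognizable suffix pass through unchanged, so the sketch is safe to run over a folder that has already been desuffixed.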
 131
 132 * initial preparseucd.py changes
 133 - remove new Unicode scripts from the
 134   only-in-ISO-15924 list according to the error message:
 135     ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
 136                         'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
 137                         'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
 138     from _scripts_only_in_iso15924
 139   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 140       and in com.ibm.icu.dev.test.lang.TestUScript.java
 141 - NamesList.txt now has a heading with a non-ASCII character
 142   + keep ppucd.txt in platform charset, rather than changing tool/test parsers
 143   + escape non-ASCII characters in heading comments
  144 - preparseucd.py takes the Unicode copyright line from PropertyAliases.txt, which still says 2013
  145   + instead, take the copyright from the first file whose copyright line contains the current year
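A minimal sketch of the two fixes noted above (escaping non-ASCII characters in heading comments, and choosing the copyright line by year). This is not the actual preparseucd.py code. The \uXXXX escape form and both function names are assumptions made for illustration.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape.

    The escape form is an assumption; the real tool may use a different one.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


def pick_copyright_line(candidate_lines, year):
    """Return the first copyright line that mentions the given year, else None.

    candidate_lines holds one copyright line per input file, in reading order.
    """
    for line in candidate_lines:
        if "Copyright" in line and str(year) in line:
            return line
    return None
```

With this approach the output stays pure ASCII, so the tool and test parsers that read ppucd.txt in the platform charset need no changes.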
 146
 147 * PropertyValueAliases.txt changes
 148 - 32 new Block (blk) values:
 149     blk; Bassa_Vah                        ; Bassa_Vah
 150     blk; Caucasian_Albanian               ; Caucasian_Albanian
 151     blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
 152     blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
 153     blk; Duployan                         ; Duployan
 154     blk; Elbasan                          ; Elbasan
 155     blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
 156     blk; Grantha                          ; Grantha
 157     blk; Khojki                           ; Khojki
 158     blk; Khudawadi                        ; Khudawadi
 159     blk; Latin_Ext_E                      ; Latin_Extended_E
 160     blk; Linear_A                         ; Linear_A
 161     blk; Mahajani                         ; Mahajani
 162     blk; Manichaean                       ; Manichaean
 163     blk; Mende_Kikakui                    ; Mende_Kikakui
 164     blk; Modi                             ; Modi
 165     blk; Mro                              ; Mro
 166     blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
 167     blk; Nabataean                        ; Nabataean
 168     blk; Old_North_Arabian                ; Old_North_Arabian
 169     blk; Old_Permic                       ; Old_Permic
 170     blk; Ornamental_Dingbats              ; Ornamental_Dingbats
 171     blk; Pahawh_Hmong                     ; Pahawh_Hmong
 172     blk; Palmyrene                        ; Palmyrene
 173     blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
 174     blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
 175     blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
 176     blk; Siddham                          ; Siddham
 177     blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
 178     blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
 179     blk; Tirhuta                          ; Tirhuta
 180     blk; Warang_Citi                      ; Warang_Citi
 181   -> add to uchar.h
 182     use long property names for enum constants
 183   -> add to UCharacter.UnicodeBlock IDs
 184     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 185             replace  public static final int \1_ID = \2; \3
 186   -> add to UCharacter.UnicodeBlock objects
 187     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 188             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 189 - 28 new Joining_Group (jg) values:
 190     jg ; Manichaean_Aleph                 ; Manichaean_Aleph
 191     jg ; Manichaean_Ayin                  ; Manichaean_Ayin
 192     jg ; Manichaean_Beth                  ; Manichaean_Beth
 193     jg ; Manichaean_Daleth                ; Manichaean_Daleth
 194     jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
 195     jg ; Manichaean_Five                  ; Manichaean_Five
 196     jg ; Manichaean_Gimel                 ; Manichaean_Gimel
 197     jg ; Manichaean_Heth                  ; Manichaean_Heth
 198     jg ; Manichaean_Hundred               ; Manichaean_Hundred
 199     jg ; Manichaean_Kaph                  ; Manichaean_Kaph
 200     jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
 201     jg ; Manichaean_Mem                   ; Manichaean_Mem
 202     jg ; Manichaean_Nun                   ; Manichaean_Nun
 203     jg ; Manichaean_One                   ; Manichaean_One
 204     jg ; Manichaean_Pe                    ; Manichaean_Pe
 205     jg ; Manichaean_Qoph                  ; Manichaean_Qoph
 206     jg ; Manichaean_Resh                  ; Manichaean_Resh
 207     jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
 208     jg ; Manichaean_Samekh                ; Manichaean_Samekh
 209     jg ; Manichaean_Taw                   ; Manichaean_Taw
 210     jg ; Manichaean_Ten                   ; Manichaean_Ten
 211     jg ; Manichaean_Teth                  ; Manichaean_Teth
 212     jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
 213     jg ; Manichaean_Twenty                ; Manichaean_Twenty
 214     jg ; Manichaean_Waw                   ; Manichaean_Waw
 215     jg ; Manichaean_Yodh                  ; Manichaean_Yodh
 216     jg ; Manichaean_Zayin                 ; Manichaean_Zayin
 217     jg ; Straight_Waw                     ; Straight_Waw
 218   -> uchar.h & UCharacter.JoiningGroup
 219 - 23 new Script (sc) values:
 220     sc ; Aghb                             ; Caucasian_Albanian
 221     sc ; Bass                             ; Bassa_Vah
 222     sc ; Dupl                             ; Duployan
 223     sc ; Elba                             ; Elbasan
 224     sc ; Gran                             ; Grantha
 225     sc ; Hmng                             ; Pahawh_Hmong
 226     sc ; Khoj                             ; Khojki
 227     sc ; Lina                             ; Linear_A
 228     sc ; Mahj                             ; Mahajani
 229     sc ; Mani                             ; Manichaean
 230     sc ; Mend                             ; Mende_Kikakui
 231     sc ; Modi                             ; Modi
 232     sc ; Mroo                             ; Mro
 233     sc ; Narb                             ; Old_North_Arabian
 234     sc ; Nbat                             ; Nabataean
 235     sc ; Palm                             ; Palmyrene
 236     sc ; Pauc                             ; Pau_Cin_Hau
 237     sc ; Perm                             ; Old_Permic
 238     sc ; Phlp                             ; Psalter_Pahlavi
 239     sc ; Sidd                             ; Siddham
 240     sc ; Sind                             ; Khudawadi
 241     sc ; Tirh                             ; Tirhuta
 242     sc ; Wara                             ; Warang_Citi
 243   -> uscript.h (many were added before)
 244     comment "Mende Kikakui" for USCRIPT_MENDE
 245     add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
 246   -> com.ibm.icu.lang.UScript
 247     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 248     replace  public static final int \1 = \2; \3
 249 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
 250   (added 2012-11-01)
 251     Ahom        338     Ahom
 252     Hatr        127     Hatran
 253     Mult        323     Multani
 254   (added 2013-10-12)
 255     Modi        324     Modi
 256     Pauc        263     Pau Cin Hau
 257     Sidd        302     Siddham
 258   -> uscript.h (some overlap with additions from Unicode)
 259   -> com.ibm.icu.lang.UScript
 260     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 261     replace  public static final int \1 = \2; \3
 262   -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
 263   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
 264       and in com.ibm.icu.dev.test.lang.TestUScript.java
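The find/replace patterns above were written for the Eclipse editor. The same transformations can be tried outside Eclipse with Python's re.sub, which accepts the patterns and the \1-style replacements unchanged. The sample input lines in the usage part are illustrative, not copied from uchar.h or uscript.h; convert and the rule names are hypothetical.

```python
import re

# (find, replace) pairs taken from the notes above.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT = (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
          r"public static final int \1 = \2; \3")


def convert(line, rule):
    """Apply one (find, replace) rule to a line of C enum source."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, line)


if __name__ == "__main__":
    # Illustrative input lines in the style of the C headers.
    print(convert("UBLOCK_BASSA_VAH = 221, /*[16AD0]*/", BLOCK_ID))
    print(convert("UBLOCK_BASSA_VAH = 221, /*[16AD0]*/", BLOCK_OBJECT))
    print(convert("USCRIPT_KHUDAWADI   = 145,/* Sind */", SCRIPT))
```

Lines that do not match a pattern are returned unchanged, so the converted output still needs a manual review.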
 265
 266 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 267     (not strictly necessary for NOT_ENCODED scripts)
 268   ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
 269
 270 * generate normalization data files
 271 - cd $ICU_ROOT/dbg
 272 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
 273 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
 274 - UNIDATA=$ICU_SRC_DIR/source/data/unidata
 275 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
 276 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 277 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 278 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 279 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
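The four .nrm commands above differ only in the output name and in the list of input files, where each derived form layers its own file on top of nfc.txt. A small sketch that makes this layering explicit by building the argument lists (it only prints them and does not run gennorm2; the --csource variant is left out). NORM2_TARGETS and gennorm2_commands are hypothetical names.

```python
# Output file -> input files, in the order used on the command lines above.
NORM2_TARGETS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Build one argv list per .nrm file, mirroring the commands in the notes."""
    commands = []
    for output, inputs in NORM2_TARGETS:
        argv = ["bin/gennorm2", "-o", "%s/%s" % (src_data_in, output),
                "-s", "%s/norm2" % unidata] + inputs
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
        print(" ".join(argv))
```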
 280
 281 * build ICU (make install)
 282   so that the tools build can pick up the new definitions from the installed header files.
 283
 284 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
 285
 286 * build Unicode tools using CMake+make
 287
 288 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
 289
 290 # Location (--prefix) of where ICU was installed.
 291 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
 292 # Location of the ICU source tree.
 293 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
 294
 295 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
 296 ~/svn.icutools/trunk/dbg/unicode/c$ make
 297
 298 * genprops work
 299 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean
 300   + add second array of Joining_Group values for at most 10800..10FFF
 301     icutools: unicode/c/genprops/bidipropsbuilder.cpp
 302     icu: source/common/ubidi_props.h/.c/_data.h
 303     icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
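The idea of a second Joining_Group array can be sketched as a two-range lookup. This is an illustration of the technique, not the code in ubidi_props.c or UBiDiProps.java. The range bounds of the first array, the constant names, and the array contents are assumptions; only the Manichaean range 10AC0..10AFF comes from the note above.

```python
# Hypothetical bounds of the first (existing) array; limits are exclusive.
JG_START, JG_LIMIT = 0x0620, 0x08C0
# Second array: the Manichaean range named above, 10AC0..10AFF.
JG_START2, JG_LIMIT2 = 0x10AC0, 0x10B00
NO_JOINING_GROUP = 0


def make_lookup(jg_array, jg_array2):
    """Return a function mapping a code point to its Joining_Group value."""
    assert len(jg_array) == JG_LIMIT - JG_START
    assert len(jg_array2) == JG_LIMIT2 - JG_START2

    def get_joining_group(c):
        if JG_START <= c < JG_LIMIT:
            return jg_array[c - JG_START]
        if JG_START2 <= c < JG_LIMIT2:
            return jg_array2[c - JG_START2]
        # Code points outside both ranges have no joining group.
        return NO_JOINING_GROUP

    return get_joining_group
```

Two small dense arrays keep the data compact: a single array spanning both ranges would be mostly filled with the default value.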
 304
 305 * generate core properties data files
 306 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
 307 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
 308 - rebuild ICU (make install) & tools
 309 - run genuca again (see step above) so that it picks up the new nfc.nrm
 310 - rebuild ICU (make install) & tools
 311
 312 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 313   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 314 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 315 - Unicode 6.0..7.0: U+2260, U+226E, U+226F
 316 - nothing new in 7.0, no test file to update
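The grep step above can be sketched as a small scan. It assumes the usual IdnaMappingTable.txt line shape (code point or range, semicolon, status, optional further fields, "#" comment); the sample lines in the usage part are illustrative, and non_ascii_std3_valid is a hypothetical name.

```python
def non_ascii_std3_valid(lines):
    """Return the non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        # Skip the ASCII part of a range; only U+0080 and above are of interest.
        found.extend(range(max(start, 0x80), end + 1))
    return found


if __name__ == "__main__":
    sample = [
        "0000..002C    ; disallowed_STD3_valid   # illustrative ASCII range",
        "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```

Comparing the result for the new and the previous Unicode version shows whether the test files need new cases.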
 317
 318 * run & fix ICU4C tests
 319
 320 * update Java data files
  321 - to be safe, refresh only the UCD-related files
 322 - see (ICU4C)/source/data/icu4j-readme.txt
 323 - mkdir /tmp/icu4j
 324 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 325   output:
 326     ...
 327     Unicode .icu files built to ./out/build/icudt53l
 328     echo timestamp > uni-core-data
 329     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
 330     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
 331     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
 332     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
 333     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
 334     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
 335     mkdir -p /tmp/icu4j/main/shared/data
 336     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 337     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
 338     mkdir -p /tmp/icu4j/main/shared/data
 339     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 340     make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
 341 - copy the big-endian Unicode data files to another location,
 342   separate from the other data files
 343     ICUDT=icudt54b
 344     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 345     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 346     cd ~/svn.icu/uni70/dbg/data/out/icu4j
 347     cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 348     cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 349     rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
 350     cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
 351     cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 352     cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
 353 - refresh ICU4J
 354     ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
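The cp/rm sequence above amounts to a selection rule over the files below com/ibm/icu/impl/data/$ICUDT. A sketch of that rule as a predicate, useful for checking what would end up in the jar; it does not copy anything. COPY_RULES, EXCLUDED and is_unicode_data_file are hypothetical names.

```python
import posixpath
from fnmatch import fnmatchcase

# Directory (relative to the $ICUDT folder) -> file name patterns that get copied.
COPY_RULES = {
    "": ["confusables.cfu", "*.icu", "*.nrm"],
    "coll": ["*.icu"],
    "brkitr": ["*"],
}
# Removed again after the top-level *.icu copy.
EXCLUDED = {"cnvalias.icu"}


def is_unicode_data_file(relpath):
    """Return True if the steps above would copy this file (path relative to $ICUDT)."""
    directory, name = posixpath.split(relpath)
    if directory == "" and name in EXCLUDED:
        return False
    patterns = COPY_RULES.get(directory, [])
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

Matching on directory and file name separately mirrors the shell behavior, where a pattern such as *.icu does not reach into subfolders.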
 355
 356 * update CollationFCD.java
 357   + copy & paste the initializers of lcccIndex[] etc. from
 358     ICU4C/source/i18n/collationfcd.cpp to
 359     ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
 360
 361 * refresh Java test .txt files
 362 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 363     cd $ICU_SRC_DIR/source/data/unidata
 364     cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 365     cd ../../test/testdata
 366     cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 367     cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
 368
 369 * UCA
 370
 371 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
 372 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
 373 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
 374 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
 375 - output files are in ~/svn.unitools/Generated/uca/7.0.0/
 376 - review data; compare files, use blankweights.sed or similar
 377   ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
 378 - cd ~/svn.unitools/Generated/uca/7.0.0/
 379 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 380   cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
 381 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  382     (note: the target file name drops the underscore before "Rules")
 383     cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
 384 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 385   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 386   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
 387     cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
 388     cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
 389     cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
 390 - run genuca, see command line above
 391 - rebuild ICU4C
 392 - refresh ICU4J collation data:
  393   (a subset of the properties data refresh instructions above, except that this copies all of coll/*)
 394     ICUDT=icudt54b
 395     ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 396     ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 397     ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
 398     ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
 399 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
 400 - note on intltest: if collate/UCAConformanceTest fails, then
 401   utility/MultithreadTest/TestCollators will fail as well;
 402   fix the conformance test before looking into the multi-thread test
 403 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
 404 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
 405   ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
 406
 407 * When refreshing all of ICU4J data from ICU4C
 408 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 409 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 410 or
 411 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 412
 413 * run & fix ICU4J tests
 414
 415 *** LayoutEngine script information
 416
 417 (For details see the Unicode 5.2 change log below.)
 418
 419 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
 420   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
 421   in the working directory.
 422   (It also generates ScriptRunData.cpp, which is no longer needed.)
 423
 424   The generated files have a current copyright date and "@stable" statement.
 425   ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
 426   for "born stable" Unicode API constants, and to stop parsing ICU version numbers
 427   which may not contain dots any more.
 428
 429 - diff current <icu>/source/layout files vs. generated ones
 430     ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
 431   review and manually merge desired changes;
 432   fix gratuitous changes, incorrect @draft/@stable and missing aliases;
 433   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
 434 - if you just copy the above files, then
 435   fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
 436   manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
 437
 438 *** API additions
 439 - send notice to icu-design about new born-@stable API (enum constants etc.)
 440
 441 *** merge the Unicode update branches back onto the trunk
 442 - do not merge the icudata.jar and testdata.jar,
 443   instead rebuild them from merged & tested ICU4C
 444
 445 ---------------------------------------------------------------------------- ***
 446
 447 Unicode 6.3 update
 448
 449 http://www.unicode.org/review/pri249/  -- beta review
 450 http://www.unicode.org/reports/uax-proposed-updates.html
 451 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
 452 http://www.unicode.org/reports/tr44/tr44-11.html
 453
 454 *** ICU Trac
 455
 456 - ticket 10128: update ICU to Unicode 6.3 beta
 457 - ticket 10168: update ICU to Unicode 6.3 final
 458 - C++ branches/markus/uni63 at r33552 from trunk at r33551
 459 - Java branches/markus/uni63 at r33550 from trunk at r33553
 460
 461 - ticket 10142: implement Unicode 6.3 bidi algorithm additions
 462
 463 *** Unicode version numbers
 464 - makedata.mak
 465 - uchar.h
  466   (configure.in & configure have been modified to extract the version from uchar.h)
 467 - com.ibm.icu.util.VersionInfo
 468 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
 469
 470 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
 471   so that the makefiles see the new version number.
 472
 473 *** data files & enums & parser code
 474
 475 * file preparation
 476
 477 - download UCD, UCA & IDNA files
 478 - make sure that the Unicode data folder passed into preparseucd.py
 479   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
 480 - modify preparseucd.py:
 481   parse new file BidiBrackets.txt
 482   with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
 483 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
 484 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 485 - Check test file diffs for previously commented-out, known-failing data lines;
 486   probably need to keep those commented out.
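A minimal sketch of parsing BidiBrackets.txt into the two new properties. It is not the preparseucd.py implementation. It assumes data lines of the form "code point; paired bracket; type # comment", with type o or c, and the sample lines in the usage part are illustrative. parse_bidi_brackets is a hypothetical name.

```python
def parse_bidi_brackets(lines):
    """Map each code point to (Bidi_Paired_Bracket, Bidi_Paired_Bracket_Type)."""
    props = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cp, bpb, bpt = [field.strip() for field in data.split(";")]
        props[int(cp, 16)] = (int(bpb, 16), bpt)
    return props


if __name__ == "__main__":
    sample = [
        "# illustrative excerpt",
        "0028; 0029; o # LEFT PARENTHESIS",
        "0029; 0028; c # RIGHT PARENTHESIS",
    ]
    for cp, (bpb, bpt) in sorted(parse_bidi_brackets(sample).items()):
        print("U+%04X bpb=U+%04X bpt=%s" % (cp, bpb, bpt))
```

Code points that are not listed keep the default type n (None), as in the bpt value list of this update.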
 487
 488 * PropertyAliases.txt changes
 489 - 1 new Enumerated Property
 490   bpt                      ; Bidi_Paired_Bracket_Type
 491   -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
 492   -> ubidi_props.h & .c & UBiDiProps.java
 493   -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
 494   -> uprops.cpp
 495   -> change ubidi.icu format version from 2.0 to 2.1
 496 - 1 new Miscellaneous Property
 497   bpb                      ; Bidi_Paired_Bracket
 498   -> uchar.h & UProperty.java
 499   -> ppucd.h & .cpp
 500
 501 * PropertyValueAliases.txt changes
 502 - 3 Bidi_Paired_Bracket_Type (bpt) values:
 503   bpt; c                                ; Close
 504   bpt; n                                ; None
 505   bpt; o                                ; Open
 506   -> uchar.h & UCharacter.BidiPairedBracketType
 507   -> ubidi_props.h & .c & UBiDiProps.java
 508   -> change ubidi.icu format version from 2.0 to 2.1
 509 - 4 new Bidi_Class (bc) values:
 510   bc ; FSI                              ; First_Strong_Isolate
 511   bc ; LRI                              ; Left_To_Right_Isolate
 512   bc ; RLI                              ; Right_To_Left_Isolate
 513   bc ; PDI                              ; Pop_Directional_Isolate
 514   -> uchar.h & UCharacterEnums.ECharacterDirection
 515   -> until the bidi code gets updated,
 516      Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
 517 - 3 new Word_Break (WB) values:
 518   WB ; HL                               ; Hebrew_Letter
 519   WB ; SQ                               ; Single_Quote
 520   WB ; DQ                               ; Double_Quote
 521   -> uchar.h & UCharacter.WordBreak
 522   -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
 523 - 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
 524   (added 2012-10-16)
 525   Aghb  239     Caucasian Albanian
 526   Mahj  314     Mahajani
 527   -> uscript.h
 528   -> com.ibm.icu.lang.UScript
 529     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 530     replace  public static final int \1 = \2;\3
 531   -> preparseucd.py _scripts_only_in_iso15924
 532   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
 533       and in com.ibm.icu.dev.test.lang.TestUScript.java
 534   -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
 535      (not strictly necessary for NOT_ENCODED scripts)
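Script codes added to _scripts_only_in_iso15924 have to be removed from that list again once Unicode encodes the script. A sketch of such a consistency check, modeled on the ValueError message recorded in the Unicode 7.0 notes; it is not the actual preparseucd.py code, and the function name and parameters are hypothetical.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if a code listed as ISO-only is also a Unicode script."""
    stale = sorted(set(only_in_iso) & set(unicode_scripts))
    if stale:
        raise ValueError(
            "remove %s from _scripts_only_in_iso15924" % stale)


if __name__ == "__main__":
    # Passes: neither code is encoded yet in this illustrative data.
    check_scripts_only_in_iso15924(["Aghb", "Mahj"], ["Latn", "Grek"])
    # Fails once the scripts are encoded.
    try:
        check_scripts_only_in_iso15924(["Aghb", "Mahj"], ["Latn", "Aghb", "Mahj"])
    except ValueError as e:
        print(e)
```

Failing loudly here keeps the ISO-only list from drifting out of sync with PropertyValueAliases.txt.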
 536
 537 * generate normalization data files
 538 - ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
 539 - ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
 540 - ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
 541 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 542 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 543 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 544 - ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
 545
 546 * build ICU (make install)
 547   so that the tools build can pick up the new definitions from the installed header files.
 548
 549 ~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
 550
 551 * build Unicode tools using CMake+make
 552
 553 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
 554
 555 # Location (--prefix) of where ICU was installed.
 556 set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
 557 # Location of the ICU source tree.
 558 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
 559
 560 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
 561 ~/svn.icutools/trunk/dbg/unicode/c$ make
 562
 563 * generate core properties data files
 564 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
 565 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
 566 - rebuild ICU (make install) & tools
 567 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
 568 - rebuild ICU (make install) & tools
 569
 570 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 571   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 572 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 573 - Unicode 6.0..6.3: U+2260, U+226E, U+226F
 574 - nothing new in 6.3, no test file to update
 575
 576 * update Java data files
  577 - to be safe, refresh only the UCD-related files
 578 - see (ICU4C)/source/data/icu4j-readme.txt
 579 - mkdir /tmp/icu4j
 580 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 581   output:
 582     ...
 583     Unicode .icu files built to ./out/build/icudt52l
 584     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
 585     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
 586     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
 587     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
 588     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
 589     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
 590     mkdir -p /tmp/icu4j/main/shared/data
 591     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 592     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
 593     mkdir -p /tmp/icu4j/main/shared/data
 594     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 595     make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
 596 - copy the big-endian Unicode data files to another location,
 597   separate from the other data files
 598     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
 599     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
 600     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
 601     ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
 602     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
 603     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
 604     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
 605 - refresh ICU4J
 606     ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
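
The cp/rm sequence above selects a fixed subset of the big-endian data files. A minimal sketch of that selection follows, assuming a directory laid out like the icudt52b folder; select_unicode_data is a hypothetical helper, and the file names in the demo are only stand-ins.

```python
from pathlib import Path
import tempfile


def select_unicode_data(root):
    """Return the relative paths that the manual steps would copy:
    top-level *.icu (minus cnvalias.icu) and *.nrm, coll/*.icu, brkitr/*."""
    root = Path(root)
    picked = set(root.glob("*.icu")) | set(root.glob("*.nrm"))
    picked.discard(root / "cnvalias.icu")  # mirrors the explicit rm step
    picked |= set((root / "coll").glob("*.icu"))
    picked |= {p for p in (root / "brkitr").glob("*") if p.is_file()}
    return sorted(p.relative_to(root).as_posix() for p in picked)


# Demo on a throwaway directory with stand-in file names.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp)
    for rel in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "root.res",
                "coll/ucadata.icu", "coll/root.res", "brkitr/char.brk"]:
        path = data / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    selected = select_unicode_data(data)

print(selected)
```

Resource bundles (*.res) are deliberately left out, matching the note that the Unicode data files are kept separate from the other data files.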
 607
 608 * refresh Java test .txt files
 609 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 610
 611 * UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
 612
 613 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
 614 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
 615 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 616 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 617   (note removing the underscore before "Rules")
 618 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 619   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 620   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
 621 - check test file diffs for previously commented-out, known-failing data lines;
 622   probably need to keep those commented out
 623 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
 624 - run genuca, see command line above
 625 - rebuild ICU4C
 626 - refresh ICU4J collation data:
 627   (subset of instructions above for properties data refresh, except copies all coll/*)
 628     ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 629     ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
 630     ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
 631     ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
 632 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
 633 - note on intltest: if collate/UCAConformanceTest fails, then
 634   utility/MultithreadTest/TestCollators will fail as well;
 635   fix the conformance test before looking into the multi-thread test
 636
 637 * test ICU, fix test code where necessary
 638
 639 * When refreshing all of ICU4J data from ICU4C
 640 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 641 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 642 or
 643 - ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 644
 645 *** LayoutEngine script information
 646 - skipped for Unicode 6.3: no new scripts
 647
 648 *** merge the Unicode update branches back onto the trunk
 649 - do not merge the icudata.jar and testdata.jar,
 650   instead rebuild them from merged & tested ICU4C
 651
 652 ---------------------------------------------------------------------------- ***
 653
 654 Unicode 6.2 update
 655
 656 http://www.unicode.org/review/pri230/
 657 http://www.unicode.org/versions/beta-6.2.0.html
 658 http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
 659 http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
 660 http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
 661 http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
 662 http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
 663 http://unicode.org/Public/idna/6.2.0/
 664
 665 *** ICU Trac
 666
 667 - ticket 9515: Unicode 6.2: final ICU update
 668
 669 - ticket 9514: UCA 6.2: fix UCARules.txt
 670
 671 - ticket 9437: update ICU to Unicode 6.2
 672 - C++ branches/markus/uni62 at r32050 from trunk at r32041
 673 - Java branches/markus/uni62 at r32068 from trunk at r32066
 674
 675 *** Unicode version numbers
 676 - makedata.mak
 677 - uchar.h
 678   (configure.in & configure: have been modified to extract the version from uchar.h)
 679 - com.ibm.icu.util.VersionInfo
 680 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
 681
 682 *** data files & enums & parser code
 683
 684 * file preparation
 685
 686 - download UCD, UCA & IDNA files
 687 - make sure that the Unicode data folder passed into preparseucd.py
 688   includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
 689 - modify preparseucd.py: NamesList.txt is now in UTF-8
 690 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
 691 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 692 - Check test file diffs for previously commented-out, known-failing data lines;
 693   probably need to keep those commented out.
 694
 695 * PropertyValueAliases.txt changes
 696 - 1 new Line_Break (lb) value:
 697   lb ; RI                               ; Regional_Indicator
 698   -> uchar.h & UCharacter.LineBreak
 699 - 1 new Word_Break (WB) value:
 700   WB ; RI                               ; Regional_Indicator
 701   -> uchar.h & UCharacter.WordBreak
 702 - 1 new Grapheme_Cluster_Break (GCB) value:
 703   GCB; RI                               ; Regional_Indicator
 704   -> uchar.h & UCharacter.GraphemeClusterBreak
 705
 706 * 3 new numeric values
 707   The new value -1 was really meant to be NaN, but that would have required
 708   new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
 709   but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
 710     cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
 711     cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
 712   The two new values 216000 and 432000 require an addition to the encoding of numeric values.
 713     cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
 714     cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
 715   -> uprops.h, uchar.c & UCharacterProperty.java
 716   -> cucdtst.c & UCharacterTest.java
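
Both new large values are small multiples of 60^3 (216000 = 1 * 60^3, 432000 = 2 * 60^3), which suggests a mantissa/exponent form in base 60. The following is only a sketch of that arithmetic; as_base60 and its limits (mantissa 1..9, exponent 1..4) are assumptions made for illustration and are not claimed to be the bit layout used in uprops.h.

```python
from fractions import Fraction


def as_base60(value, max_mantissa=9, max_exponent=4):
    """Return (mantissa, exponent) with value == mantissa * 60**exponent,
    preferring the largest exponent, or None if no such pair exists."""
    for exponent in range(max_exponent, 0, -1):
        mantissa, remainder = divmod(value, 60 ** exponent)
        if remainder == 0 and 1 <= mantissa <= max_mantissa:
            return mantissa, exponent
    return None


# nv=-1 written as the "fraction" -1/1
minus_one = Fraction(-1, 1)

# nv=216000 and nv=432000
shar2_gal_dish = as_base60(216000)
shar2_gal_min = as_base60(432000)

print(minus_one, shar2_gal_dish, shar2_gal_min)
```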
 717
 718 * generate normalization data files
 719 - ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
 720 - ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
 721 - ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
 722 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 723 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 724 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 725 - ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
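
The four gennorm2 calls above differ only in the output name and the list of input files, and every output builds on nfc.txt. A small sketch that derives the same command lines from a table; NORM2_INPUTS and gennorm2_commands are hypothetical names, and the script only prints the commands, it does not run gennorm2.

```python
# Output file -> input files, exactly as listed in the commands above.
NORM2_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata):
    """Build one argument list per .nrm file, using the -o and -s options
    shown in the manual steps."""
    commands = []
    for output, inputs in NORM2_INPUTS.items():
        commands.append(["bin/gennorm2",
                         "-o", "%s/%s" % (src_data_in, output),
                         "-s", "%s/norm2" % unidata] + inputs)
    return commands


commands = gennorm2_commands("$SRC_DATA_IN", "$UNIDATA")
for command in commands:
    print(" ".join(command))
```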
 726
 727 * build ICU (make install)
 728   so that the tools build can pick up the new definitions from the installed header files.
 729 * build Unicode tools using CMake+make
 730
 731 * generate core properties data files
 732 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
 733 - in initial bootstrapping, change the UCA version
 734   in source/data/unidata/FractionalUCA.txt to match the new Unicode version
 735 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
 736 - rebuild ICU (make install) & tools
 737   + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
 738     check if the UCA version in FractionalUCA.txt matches the new Unicode version
 739     (see step above)
 740 - run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
 741 - rebuild ICU (make install) & tools
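
The version mismatch behind the U_INVALID_FORMAT_ERROR can be checked before rebuilding. This sketch assumes that FractionalUCA.txt carries a line of the form "[UCA version = x.y.z]"; that format, and the helper names uca_version and matches_unicode_version, are assumptions to be verified against the actual file.

```python
import re

# Assumed shape of the version line; adjust if the real file differs.
UCA_VERSION_RE = re.compile(r"\[UCA version\s*=\s*([0-9.]+)\]")


def uca_version(lines):
    """Return the first UCA version string found, or None."""
    for line in lines:
        match = UCA_VERSION_RE.search(line)
        if match:
            return match.group(1)
    return None


def matches_unicode_version(lines, unicode_version):
    """Compare major.minor of the UCA version with the Unicode version."""
    found = uca_version(lines)
    if found is None:
        return False
    return found.split(".")[:2] == unicode_version.split(".")[:2]


sample = ["# Fractional UCA Table (illustrative header)",
          "[UCA version = 6.1.0]"]
print(uca_version(sample), matches_unicode_version(sample, "6.2"))
```

With the sample above the check reports a mismatch for 6.2, which is the situation where the version line needs the manual bootstrapping edit.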
 742
 743 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 744   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 745 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 746 - Unicode 6.0..6.2: U+2260, U+226E, U+226F
 747 - nothing new in 6.2, no test file to update
 748
 749 * update Java data files
 750 - refresh just the UCD-related files, just to be safe
 751 - see (ICU4C)/source/data/icu4j-readme.txt
 752 - mkdir /tmp/icu4j
 753 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 754   output:
 755     ...
 756     Unicode .icu files built to ./out/build/icudt50l
 757     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
 758     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
 759     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
 760     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
 761     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
 762     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
 763     mkdir -p /tmp/icu4j/main/shared/data
 764     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 765     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
 766     mkdir -p /tmp/icu4j/main/shared/data
 767     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
 768     make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
 769 - copy the big-endian Unicode data files to another location,
 770   separate from the other data files
 771     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
 772     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
 773     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
 774     ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
 775     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
 776     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
 777     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
 778 - refresh ICU4J
 779     ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
 780
 781 * refresh Java test .txt files
 782 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 783
 784 * UCA
 785
 786 - get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
 787 - CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
 788 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 789 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 790   (note removing the underscore before "Rules")
 791 - update (ICU4C)/source/test/testdata/CollationTest_*.txt
 792   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 793   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
 794 - check test file diffs for previously commented-out, known-failing data lines;
 795   probably need to keep those commented out
 796 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
 797 - run genuca, see command line above
 798 - rebuild ICU4C
 799 - refresh ICU4J collation data:
 800   (subset of instructions above for properties data refresh, except copies all coll/*)
 801     ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 802     ~/svn.icu/uni62/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
 803     ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
 804     ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
 805 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
 806 - note on intltest: if collate/UCAConformanceTest fails, then
 807   utility/MultithreadTest/TestCollators will fail as well;
 808   fix the conformance test before looking into the multi-thread test
 809
 810 * test ICU, fix test code where necessary
 811
 812 * When refreshing all of ICU4J data from ICU4C
 813 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 814 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 815 or
 816 - ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 817
 818 *** LayoutEngine script information
 819 - skipped for Unicode 6.2: no new scripts
 820
 821 *** merge the Unicode update branches back onto the trunk
 822 - do not merge the icudata.jar and testdata.jar,
 823   instead rebuild them from merged & tested ICU4C
 824
 825 ---------------------------------------------------------------------------- ***
 826
 827 Future Unicode update
 828
 829 Tools simplified since the Unicode 6.1 update. See
 830 - http://site.icu-project.org/design/props/ppucd
 831 - http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
 832
 833 * Unicode version numbers
 834 - icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
 835
 836 * file preparation
 837 - ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
 838 - ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
 839 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
 840 - Check test file diffs for previously commented-out, known-failing data lines;
 841   probably need to keep those commented out.
 842
 843 * PropertyValueAliases.txt changes
 844 - Script codes that are in ISO 15924 but not in Unicode are now listed in
 845   preparseucd.py, in the _scripts_only_in_iso15924 variable.
 846   If there are new ISO codes, then add them.
 847   If Unicode adds some of them, then remove them from the .py variable.
 848
 849 * UnicodeData.txt changes
 850 - No more manual changes for CJK ranges for algorithmic names;
 851   those are now written to ppucd.txt and genprops reads them from there.
 852
 853 * generate core properties data files (makeprops.sh was deleted)
 854 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
 855
 856 * no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
 857 - it is now generated by preparseucd.py
 858
 859 * no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
 860 - it is now generated by preparseucd.py
 861 - make sure that the Unicode data folder passed into preparseucd.py
 862   includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
 863   (can be in some subfolder)
 864
 865 * generate normalization data files
 866 - ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
 867 - ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
 868 - ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
 869 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
 870 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
 871 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
 872 - ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
 873
 874 * build ICU (make install)
 875 * build Unicode tools using CMake+make
 876
 877 * new way to call genuca (makeuca.sh was deleted)
 878 - ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
 879
 880 ---------------------------------------------------------------------------- ***
 881
 882 Unicode 6.1 update
 883
 884 *** ICU Trac
 885
 886 - ticket 8995 final update to Unicode 6.1
 887 - ticket 8994 regenerate source/layout/CanonData.cpp
 888
 889 - ticket 8961 support Unicode "Age" value *names*
 890 - ticket 8963 support multiple character name aliases & types
 891
 892 - ticket 8827 "update ICU to Unicode 6.1"
 893 - C++ branches/markus/uni61 at r30864 from trunk at r30843
 894 - Java branches/markus/uni61 at r30865 from trunk at r30863
 895
 896 *** Unicode version numbers
 897 - makedata.mak
 898 - uchar.h
 899   (configure.in & configure: have been modified to extract the version from uchar.h)
 900 - com.ibm.icu.util.VersionInfo
 901 - icutools/unicode/makedefs.sh
 902   + also review & update other definitions in that file,
 903     e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
 904
 905 *** data files & enums & parser code
 906
 907 * file preparation
 908
 909 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
 910 - This prepares both unidata and testdata files in respective output subfolders.
 911 - Check test file diffs for previously commented-out, known-failing data lines;
 912   probably need to keep those commented out.
 913
 914 * PropertyValueAliases.txt changes
 915 - 11 new block names:
 916   Arabic_Extended_A
 917   Arabic_Mathematical_Alphabetic_Symbols
 918   Chakma
 919   Meetei_Mayek_Extensions
 920   Meroitic_Cursive
 921   Meroitic_Hieroglyphs
 922   Miao
 923   Sharada
 924   Sora_Sompeng
 925   Sundanese_Supplement
 926   Takri
 927   -> add to uchar.h
 928   -> add to UCharacter.UnicodeBlock IDs
 929     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
 930             replace  public static final int \1_ID = \2; \3
 931   -> add to UCharacter.UnicodeBlock objects
 932     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
 933             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 934 - 1 new Joining_Group (jg) value:
 935   Rohingya_Yeh
 936   -> uchar.h & UCharacter.JoiningGroup
 937 - 2 new Line_Break (lb) values:
 938   CJ=Conditional_Japanese_Starter
 939   HL=Hebrew_Letter
 940   -> uchar.h & UCharacter.LineBreak
 941 - 7 new scripts:
 942   sc ; Cakm      ; Chakma
 943   sc ; Merc      ; Meroitic_Cursive
 944   sc ; Mero      ; Meroitic_Hieroglyphs
 945   sc ; Plrd      ; Miao
 946   sc ; Shrd      ; Sharada
 947   sc ; Sora      ; Sora_Sompeng
 948   sc ; Takr      ; Takri
 949   -> remove these from SyntheticPropertyValueAliases.txt
 950   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
 951       and in com.ibm.icu.dev.test.lang.TestUScript.java
 952 - 3 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
 953   (2 added 2011-06-21)
 954   Khoj        322     Khojki
 955   Tirh        326     Tirhuta
 956   (1 more added 2011-12-09)
 957   Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
 958   -> uscript.h
 959   -> com.ibm.icu.lang.UScript
 960     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
 961     replace  public static final int \1 = \2;\3
 962   -> SyntheticPropertyValueAliases.txt
 963   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
 964       and in com.ibm.icu.dev.test.lang.TestUScript.java
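
The Eclipse find/replace patterns above turn C enum lines into Java constants. The same substitutions can be tried outside Eclipse with Python's re.sub, as a rough sketch: the patterns are the ones listed above, while the function names and the sample enum lines (including their numeric values) are only illustrative.

```python
import re


def uscript_to_java(line):
    """uscript.h enum line -> com.ibm.icu.lang.UScript constant."""
    return re.sub(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
                  r"public static final int \g<1> = \g<2>;\g<3>", line)


def ublock_to_java_id(line):
    """uchar.h block enum line -> UCharacter.UnicodeBlock *_ID constant."""
    return re.sub(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                  r"public static final int \g<1>_ID = \g<2>; \g<3>", line)


def ublock_to_java_object(line):
    """uchar.h block enum line -> UCharacter.UnicodeBlock object."""
    return re.sub(
        r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
        r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        line)


# Illustrative input lines; the numbers are placeholders for this sketch.
script_line = "USCRIPT_KHOJKI = 157,/* Khoj */"
block_line = "UBLOCK_TAKRI = 220, /*[11680]*/"

print(uscript_to_java(script_line))
print(ublock_to_java_id(block_line))
print(ublock_to_java_object(block_line))
```

The \g<1> form is used instead of \1 so that the group reference stays unambiguous when it is followed directly by _ID.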
 965
 966 * UnicodeData.txt changes
 967 - the last Unihan code point changes from U+9FCB to U+9FCC
 968   search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
 969   + do change gennames.c
 970   + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
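
The search for the old end and limit values can be scripted. A minimal sketch using the regex given above; find_hits is a hypothetical helper and the sample source lines are made up for the demonstration.

```python
import re

# Same pattern as in the note above: 9FCB (end) or 9FCC (limit), any case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_hits(text):
    """Return (line_number, line) pairs that mention either value."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), 1)
            if UNIHAN_END_OR_LIMIT.search(line)]


# Made-up source lines, only to exercise the pattern.
SAMPLE = """\
#define CJK_END 0x9fcb
if (c < 0x9FCC) {
static const int CJK_START = 0x4E00;
"""

hits = find_hits(SAMPLE)
for number, line in hits:
    print("%d: %s" % (number, line))
```

Each hit still needs a manual look, since the same hex digits can occur in unrelated constants.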
 971
 972 * DerivedBidiClass.txt changes
 973 - 2 new default-AL blocks:
 974 #     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
 975 #     Arabic Mathematical Alphabetic Symbols:
 976 #                       U+1EE00  - U+1EEFF  (was default-R)
 977 - 2 new default-R blocks:
 978 #     Meroitic Hieroglyphs:
 979 #                        U+10980 - U+1099F
 980 #     Meroitic Cursive:  U+109A0 - U+109FF
 981   -> should be picked up by the explicit data in the file
 982
 983 * NameAliases.txt changes
 984 - from
 985     # Each line has two fields
 986     # First field: Code point
 987     # Second field: Alias
 988 - to
 989     # Each line has three fields, as described here:
 990     #
 991     # First field:  Code point
 992     # Second field: Alias
 993     # Third field:  Type
 994 - Also, the file format previously allowed multiple aliases per code point, but only now
 995   does it actually contain several, even several of the same type. For example,
 996     FEFF;BYTE ORDER MARK;alternate
 997     FEFF;BOM;abbreviation
 998     FEFF;ZWNBSP;abbreviation
 999 - This breaks our gennames parser, unames.icu data structure, and API.
1000   Fix gennames to only pick up "correction" aliases.
1001   New ticket #8963 for further changes.
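
The "correction"-only filter can be illustrated on the three-field format. This is a sketch in Python, not the gennames C code; parse_name_aliases and corrections_only are hypothetical names, the FEFF lines are the ones quoted above, and the 01A2 line is added as a sample correction entry.

```python
SAMPLE = """\
# Each line has three fields, as described here:
01A2;LATIN CAPITAL LETTER GHA;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def parse_name_aliases(lines):
    """Yield (code_point, alias, alias_type) for each data line."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # skip comments and blank lines
        if not data:
            continue
        code_point, alias, alias_type = [f.strip() for f in data.split(";")]
        yield int(code_point, 16), alias, alias_type


def corrections_only(lines):
    """Keep only aliases of type "correction", keyed by code point."""
    return {code_point: alias
            for code_point, alias, alias_type in parse_name_aliases(lines)
            if alias_type == "correction"}


all_aliases = list(parse_name_aliases(SAMPLE.splitlines()))
corrections = corrections_only(SAMPLE.splitlines())
print(corrections)
```

Filtering on the type keeps one alias per code point for these samples, which is what a single-alias data structure can hold; U+FEFF alone contributes three aliases once other types are admitted.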
1002
1003 * run genpname/preparse.pl (on Linux)
1004   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1005   + make sure that data.h is writable
1006   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1007   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1008
1009 * build ICU (make install)
1010   so that the tools build can pick up the new definitions from the installed header files.
1011 * build Unicode tools (at least genpname) using CMake+make
1012
1013 * run genpname
1014   (builds both pnames.icu and propname_data.h)
1015 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1016 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1017
1018 * build ICU (make install)
1019 * build Unicode tools using CMake+make
1020
1021 * update source/data/unidata/norm2/nfkc_cf.txt
1022 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1023
1024 * update source/data/unidata/norm2/uts46.txt
1025 - download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1026   to ~/svn.icu/tools/trunk/src/unicode/py
1027 - adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1028 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1029 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1030
1031 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1032   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1033 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1034 - Unicode 6.0..6.1: U+2260, U+226E, U+226F
1035 - nothing new in 6.1, no test file to update
1036
1037 * generate core properties data files
1038 - in initial bootstrapping, change the UCA version
1039   in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1040 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1041 - rebuild ICU & tools
1042   + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1043     check if the UCA version in FractionalUCA.txt matches the new Unicode version
1044     (see step above)
1045 - run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1046   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1047 - rebuild ICU & tools
1048
1049 * update Java data files
1050 - refresh just the UCD-related files, just to be safe
1051 - see (ICU4C)/source/data/icu4j-readme.txt
1052 - mkdir /tmp/icu4j
1053 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1054   output:
1055     ...
1056     Unicode .icu files built to ./out/build/icudt49l
1057     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1058     mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1059     echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1060     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1061     mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1062     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1063     mkdir -p /tmp/icu4j/main/shared/data
1064     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1065     jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1066     mkdir -p /tmp/icu4j/main/shared/data
1067     cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1068     make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1069 - copy the big-endian Unicode data files to another location,
1070   separate from the other data files
1071     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1072     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1073     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1074     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1075     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1076     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1077     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1078 - refresh ICU4J
1079     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1080
1081 * refresh Java test .txt files
1082 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1083
1084 * test ICU so far, fix test code where necessary
1085 - temporarily ignore collation issues that look like UCA/UCD mismatches,
1086   until UCA data is updated
1087
1088 * UCA
1089
1090 - get output from Mark's tools; look in
1091     http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1092 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1093 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1094   (note removing the underscore before "Rules")
1095 - update (ICU)/source/test/testdata/CollationTest_*.txt
1096   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1097   with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1098 - check test file diffs for previously commented-out, known-failing data lines;
1099   probably need to keep those commented out
1100 - check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1101 - run makeuca.sh:
1102   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1103 - rebuild ICU4C
1104 - refresh ICU4J collation data:
1105   (subset of instructions above for properties data refresh, except copies all coll/*)
1106     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1107     ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1108     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1109     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1110 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1111 - note on intltest: if collate/UCAConformanceTest fails, then
1112   utility/MultithreadTest/TestCollators will fail as well;
1113   fix the conformance test before looking into the multi-thread test
1114
1115 * When refreshing all of ICU4J data from ICU4C
1116 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1117 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1118 or
1119 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1120
1121 *** LayoutEngine script information
1122
1123 (For details see the Unicode 5.2 change log below.)
1124
1125 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1126   This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1127   in the working directory.
1128   (It also generates ScriptRunData.cpp, which is no longer needed.)
1129
1130   The generated files have a current copyright date and "@draft" statement.
1131
1132 - diff current <icu>/source/layout files vs. generated ones
1133     ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1134   review and manually merge desired changes;
1135   fix gratuitous changes, incorrect @draft and missing aliases;
1136   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1137 - if you just copy the above files, then
1138   fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1139   manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1140
1141 *** merge the Unicode update branches back onto the trunk
1142 - do not merge the icudata.jar and testdata.jar,
1143   instead rebuild them from merged & tested ICU4C
1144
1145 ---------------------------------------------------------------------------- ***
1146
1147 ICU 4.8 (no Unicode update, just new script codes)
1148
1149 * 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1150   (added 2010-12-21)
1151     Afak    439     Afaka
1152     Jurc    510     Jurchen
1153     Mroo    199     Mro, Mru
1154     Nshu    499     Nüshu
1155     Shrd    319     Sharada, Śāradā
1156     Sora    398     Sora Sompeng
1157     Takr    321     Takri, Ṭākrī, Ṭāṅkrī
1158     Tang    520     Tangut
1159     Wole    480     Woleai
1160   -> uscript.h
1161   -> com.ibm.icu.lang.UScript
1162     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1163     replace  public static final int \1 = \2;\3
1164   -> genpname/SyntheticPropertyValueAliases.txt
1165   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1166       and in com.ibm.icu.dev.test.lang.TestUScript.java
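
A minimal Python sketch of the uscript.h -> UScript.java find/replace above, for
anyone not doing it in an editor. The pattern and replacement are the ones noted
above; the function name and the sample line are hypothetical and not taken from
uscript.h.

```python
import re

# Pattern and replacement copied from the find/replace notes above.
USCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_TEMPLATE = r"public static final int \1 = \2;\3"


def uscript_to_java(line):
    """Convert one uscript.h enum line to a UScript.java constant line."""
    return USCRIPT_RE.sub(JAVA_TEMPLATE, line)


# Hypothetical sample line (name and value are made up for illustration).
sample = "    USCRIPT_EXAMPLE = 999,/* Xmpl */"
print(uscript_to_java(sample))
```

Lines that do not match the pattern pass through unchanged, so the function can
be applied to a whole copied enum block line by line.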
1167
1168 * run genpname/preparse.pl (on Linux)
1169   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1170   + make sure that data.h is writable
1171   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1172   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1173
1174 * rebuild Unicode tools (at least genpname) using make
1175 - You might first need to "make install" ICU so that the tools build can pick
1176   up the new definitions from the installed header files.
1177
1178 * run genpname
1179   (builds both pnames.icu and propname_data.h)
1180 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1181 - ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1182 - rebuild ICU & tools
1183
1184 * run genprops
1185 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1186 - ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1187 - rebuild ICU & tools
1188
1189 * update Java data files
1190 - refresh just the UCD-related files, just to be safe
1191 - see (ICU4C)/source/data/icu4j-readme.txt
1192 - mkdir /tmp/icu4j
1193 - ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1194 - copy the big-endian Unicode data files to another location,
1195   separate from the other data files
1196     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1197     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1198     ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1199 - refresh ICU4J
1200     ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1201
1202 * should have updated the layout engine script codes but forgot
1203
1204 ---------------------------------------------------------------------------- ***
1205
1206 Unicode 6.0 update
1207
1208 *** related ICU Trac tickets
1209
1210 7264 Unicode 6.0 Update
1211
1212 *** Unicode version numbers
1213 - makedata.mak
1214 - uchar.h
1215   (configure.in & configure: have been modified to extract the version from uchar.h)
1216 - com.ibm.icu.util.VersionInfo
1217
1218 *** data files & enums & parser code
1219
1220 * file preparation
1221
1222 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1223 - This now prepares both unidata and testdata files in respective output subfolders.
1224
1225 * PropertyAliases.txt changes
1226 - new Script_Extensions property defined in the new ScriptExtensions.txt file
1227   but not listed in PropertyAliases.txt; reported to unicode.org;
1228   -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1229     scx; Script_Extensions
1230   -> uchar.h with new UProperty section
1231   -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1232
1233 * PropertyValueAliases.txt changes
1234 - 12 new block names:
1235   Alchemical_Symbols
1236   Bamum_Supplement
1237   Batak
1238   Brahmi
1239   CJK_Unified_Ideographs_Extension_D
1240   Emoticons
1241   Ethiopic_Extended_A
1242   Kana_Supplement
1243   Mandaic
1244   Miscellaneous_Symbols_And_Pictographs
1245   Playing_Cards
1246   Transport_And_Map_Symbols
1247   -> add to uchar.h
1248   -> add to UCharacter.UnicodeBlock
1249     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1250             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1251 - Joining_Group (jg) values:
1252   Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1253   -> uchar.h & UCharacter.JoiningGroup
1254 - 3 new scripts:
1255   sc ; Batk      ; Batak
1256   sc ; Brah      ; Brahmi
1257   sc ; Mand      ; Mandaic
1258   -> remove these from SyntheticPropertyValueAliases.txt
1259   -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1260   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1261       and in com.ibm.icu.dev.test.lang.TestUScript.java
1262 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1263   (added 2009-11-11..2010-07-18)
1264   Bass        259     Bassa Vah
1265   Dupl        755     Duployan shorthand
1266   Elba        226     Elbasan
1267   Gran        343     Grantha
1268   Kpel        436     Kpelle
1269   Loma        437     Loma
1270   Mend        438     Mende
1271   Merc        101     Meroitic Cursive
1272   Narb        106     Old North Arabian
1273   Nbat        159     Nabataean
1274   Palm        126     Palmyrene
1275   Sind        318     Sindhi
1276   Wara        262     Warang Citi
1277   -> uscript.h
1278   -> com.ibm.icu.lang.UScript
1279     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1280     replace  public static final int \1 = \2;\3
1281   -> SyntheticPropertyValueAliases.txt
1282   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1283       and in com.ibm.icu.dev.test.lang.TestUScript.java
1284 - ISO 15924 name change
1285   Mero        100     Meroitic Hieroglyphs (was Meroitic)
1286   -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
1287 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
1288
1289 * UnicodeData.txt changes
1290 - new CJK block:
1291   2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
1292   2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
1293   -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
1294
1295 * build Unicode tools using CMake+make
1296
1297 * run genpname/preparse.pl (on Linux)
1298   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1299   + make sure that data.h is writable
1300   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1301   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1302
1303 * rebuild Unicode tools (at least genpname) using make
1304 - You might first need to "make install" ICU so that the tools build can pick
1305   up the new definitions from the installed header files.
1306
1307 * run genpname
1308 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1309 - rebuild ICU & tools
1310
1311 * update source/data/unidata/norm2/nfkc_cf.txt
1312 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1313
1314 * update source/data/unidata/norm2/uts46.txt
1315 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
1316   to ~/svn.icu/tools/trunk/src/unicode/py
1317 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
1318 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1319 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1320
1321 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1322   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1323 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1324 - Unicode 6.0: U+2260, U+226E, U+226F
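
A rough Python alternative to the grep step above, assuming the usual
"code point or range ; status ; optional mapping # comment" layout of
IdnaMappingTable.txt. The function name and the sample lines are hypothetical;
the sample lines only imitate the file format.

```python
def std3_valid_non_ascii(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        bounds = fields[0].split("..")  # single code point or start..end
        start, end = int(bounds[0], 16), int(bounds[-1], 16)
        result.extend(cp for cp in range(start, end + 1) if cp > 0x7F)
    return result


# Sample lines modeled on the file format (assumption, not copied from the file).
SAMPLE = [
    "0021..002C    ; disallowed_STD3_valid                  # 1.1",
    "00A0          ; disallowed_STD3_mapped ; 0020          # 1.1",
    "2260          ; disallowed_STD3_valid                  # 1.1",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1",
]
print(["U+%04X" % cp for cp in std3_valid_non_ascii(SAMPLE)])
```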
1325
1326 * generate core properties data files
1327 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1328 - rebuild ICU & tools
1329 - run makeuca.sh so that genuca picks up the new nfc.nrm:
1330   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1331 - rebuild ICU & tools
1332
1333 * implement new Script_Extensions property (provisional)
1334 - parser & generator: genprops & uprops.icu
1335 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
1336 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
1337
1338 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
1339 - (one-time change)
1340 - genbidi/gencase/genprops tools changes
1341 - re-run makeprops.sh (see above)
1342 - UCharacterProperty.java, UCharacterTypeIterator.java,
1343   UBiDiProps.java, UCaseProps.java, and several others with minor changes;
1344   UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
1345
1346 * update Java data files
1347 - refresh just the UCD-related files, just to be safe
1348 - see (ICU4C)/source/data/icu4j-readme.txt
1349 - mkdir /tmp/icu4j
1350 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1351   output:
1352     ...
1353     Unicode .icu files built to ./out/build/icudt45l
1354     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1355     echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1356     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1357     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
1358     mkdir -p /tmp/icu4j/main/shared/data
1359     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1360 - copy the big-endian Unicode data files to another location,
1361   separate from the other data files
1362     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1363     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1364     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1365     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
1366     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1367     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1368     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1369 - refresh ICU4J
1370     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1371
1372 * refresh Java test .txt files
1373 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1374
1375 * un-hardcode normalization skippable (NF*_Inert) test data
1376 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
1377
1378 * copy updated break iterator test files
1379 - now handled by running ucdcopy.py early (see file preparation) and
1380   copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
1381   (old instructions:
1382    copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
1383    to ~/svn.icu/trunk/src/source/test/testdata)
1384 - they are not used in ICU4J
1385
1386 * UCA
1387
1388 - get output from Mark's tools; look in
1389     http://www.unicode.org/~book/incoming/mark/uca6.0.0/
1390     http://www.macchiato.com/unicode/utc/additional-uca-files
1391     http://www.unicode.org/Public/UCA/6.0.0/
1392     http://www.unicode.org/~mdavis/uca/
1393 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1394 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1395 - update Han-implicit ranges for new CJK extensions:
1396   swapCJK() in ucol.cpp & ImplicitCEGenerator.java
1397 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
1398   do not add it into invuca so that tailoring primary-after an ignorable works
1399 - genuca: permit space between [variable top] bytes
1400 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
1401 - run makeuca.sh:
1402   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1403 - rebuild ICU4C
1404 - refresh ICU4J collation data:
1405   (subset of instructions above for properties data refresh, except copies all coll/*)
1406     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1407     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1408     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1409     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1410 - update (ICU)/source/test/testdata/CollationTest_*.txt
1411   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1412   with output from Mark's Unicode tools
1413 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1414 - note on intltest: if collate/UCAConformanceTest fails, then
1415   utility/MultithreadTest/TestCollators will fail as well;
1416   fix the conformance test before looking into the multi-thread test
1417
1418 * When refreshing all of ICU4J data from ICU4C
1419 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1420 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1421 or
1422 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1423
1424 *** LayoutEngine script information
1425
1426 (For details see the Unicode 5.2 change log below.)
1427
1428 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1429 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1430 ScriptRunData.cpp, which is no longer needed.)
1431
1432 The generated files have a current copyright date and "@draft" statement.
1433
1434 * copy the above files into <icu>/source/layout, replacing the old files.
1435 * fix mixed line endings
1436 * review the diffs and fix incorrect @draft and missing aliases;
1437   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1438 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1439
1440 ---------------------------------------------------------------------------- ***
1441
1442 Unicode 5.2 update
1443
1444 *** related ICU Trac tickets
1445
1446 7084 Unicode 5.2
1447
1448 7167 verify collation bytes
1449 7235 Java test NAME_ALIAS
1450 7236 Java DerivedCoreProperties.txt test
1451 7237 Java BidiTest.txt
1452 7238 UTrie2 in core unidata
1453 7239 test for tailoring gaps
1454 7240 Java fix CollationMiscTest
1455 7243 update layout engine for Unicode 5.2
1456
1457 *** Unicode version numbers
1458 - makedata.mak
1459 - uchar.h
1460 - configure.in & configure
1461 - update ucdVersion in gennames.c if an algorithmic range changes
1462
1463 *** data files & enums & parser code
1464
1465 * file preparation
1466
1467 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
1468 - includes finding files regardless of version numbers,
1469   copying them, and performing the equivalent processing of the
1470   ucdstrip and ucdmerge tools on the desired set of files
1471
1472 * notes on changes
1473 - PropertyAliases.txt
1474   moved from numeric to enumerated:
1475     ccc       ; Canonical_Combining_Class
1476   new string properties:
1477     NFKC_CF   ; NFKC_Casefold
1478     Name_Alias; Name_Alias
1479   new binary properties:
1480     Cased     ; Cased
1481     CI        ; Case_Ignorable
1482     CWCF      ; Changes_When_Casefolded
1483     CWCM      ; Changes_When_Casemapped
1484     CWKCF     ; Changes_When_NFKC_Casefolded
1485     CWL       ; Changes_When_Lowercased
1486     CWT       ; Changes_When_Titlecased
1487     CWU       ; Changes_When_Uppercased
1488   new CJK Unihan properties (not supported by ICU)
1489 - PropertyValueAliases.txt
1490   new block names
1491   new scripts
1492   one script code change:
1493     sc ; Qaai      ; Inherited
1494     ->
1495     sc ; Zinh      ; Inherited                        ; Qaai
1496   new Line_Break (lb) value:
1497     lb ; CP        ; Close_Parenthesis
1498   new Joining_Group (jg) values: Farsi_Yeh, Nya
1499   other new values:
1500     ccc; 214; ATA  ; Attached_Above
1501 - DerivedBidiClass.txt
1502   new default-R range: U+1E800 - U+1EFFF
1503 - UnicodeData.txt
1504   all of the ISO comments are gone
1505   new CJK block end:
1506     9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
1507   new CJK block:
1508     2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
1509     2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
1510
1511 * genpname
1512 - run preparse.pl
1513   + cd \svn\icuproj\icu\trunk\source\tools\genpname
1514   + make sure that data.h is writable
1515   + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
1516   + preparse.pl complains with errors like the following:
1517       Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
1518     This is because ICU 4.0 had scripts from ISO 15924 which are now
1519     added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
1520     and PropertyValueAliases.txt.
1521     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1522        Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
1523   + preparse.pl complains with errors about block names missing from uchar.h; add them
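
The duplicate script entries behind the preparse.pl errors above could be found
ahead of time with a small check like the following Python sketch. It assumes
that SyntheticPropertyValueAliases.txt uses the same "sc ; code ; name" line
layout as PropertyValueAliases.txt; the function names and sample lines are
hypothetical.

```python
def script_codes(lines):
    """Map script short code -> long name for all 'sc' lines."""
    codes = {}
    for line in lines:
        data = line.split("#", 1)[0]  # drop comments
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes[fields[1]] = fields[2]
    return codes


def duplicate_scripts(synthetic_lines, official_lines):
    """Return script codes defined in both files, sorted."""
    synthetic = script_codes(synthetic_lines)
    official = script_codes(official_lines)
    return sorted(code for code in synthetic if code in official)


# Hypothetical excerpts of the two files.
synthetic = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Afak ; Afaka"]
official = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Zinh ; Inherited ; Qaai"]
print(duplicate_scripts(synthetic, official))
```

Any code reported here is a candidate for removal from
SyntheticPropertyValueAliases.txt.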
1524
1525 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
1526 - new block & script values
1527   + 26 new blocks
1528     copy new blocks from Blocks.txt
1529     MS VC++ 2008 regular expression:
1530       find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
1531       replace with "    UBLOCK_\3 = 172, /*[\1]*/"
1532   + several new script values already added in ICU 4.0 for ISO 15924 coverage
1533     (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
1534   + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
1535   + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
1536     (added to SyntheticPropertyValueAliases.txt)
1537 - new Joining Group (JG) values: Farsi_Yeh, Nya
1538 - new Line_Break (lb) value:
1539     lb ; CP        ; Close_Parenthesis
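
The Blocks.txt find/replace for new UBLOCK_ constants (noted above as an
MS VC++ regular expression) can also be sketched in Python. Unlike the editor
replace, this version upper-cases the block name and numbers the constants
consecutively; both are conveniences added here, and the starting ID of 900 is
a placeholder, not a real uchar.h value.

```python
import re

# Same shape as the Blocks.txt pattern above, in Python regex syntax.
BLOCK_RE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def ublock_lines(block_lines, first_id):
    """Turn Blocks.txt lines into uchar.h-style UBLOCK_ enum lines."""
    out = []
    next_id = first_id
    for line in block_lines:
        m = BLOCK_RE.match(line.rstrip())
        if not m:
            continue  # skip comments and blank lines
        name = re.sub(r"[ -]", "_", m.group(3)).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (name, next_id, m.group(1)))
        next_id += 1
    return out


blocks = [
    "# comment line",
    "A960..A97F; Hangul Jamo Extended-A",
    "D7B0..D7FF; Hangul Jamo Extended-B",
]
for text in ublock_lines(blocks, 900):  # 900 is a placeholder ID
    print(text)
```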
1540
1541 * hardcoded Unihan range end/limit
1542 - Unihan range end moves from 9FC3 to 9FCB
1543   search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
1544   + do change gennames.c
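
A small Python sketch of the case-insensitive search for the old Unihan end and
limit values described above. The regex is the one given in the notes; the
function name and sample source lines are hypothetical.

```python
import re

# Regex from the notes above: old range end 9FC3 and old limit 9FC4.
UNIHAN_OLD_RE = re.compile(r"9FC[34]", re.IGNORECASE)


def find_old_unihan_values(lines):
    """Return (line number, text) for every line mentioning 9FC3 or 9FC4."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, 1)
        if UNIHAN_OLD_RE.search(line)
    ]


# Hypothetical source lines, not taken from ICU.
source = [
    "if (c <= 0x9fc3) {",
    "#define EXAMPLE_LIMIT 0x9FC4",
    "x = 0x9FBB;",
]
for number, text in find_old_unihan_values(source):
    print(number, text)
```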
1545
1546 * Compare definitions of new binary properties with what we used to use
1547   in algorithms, to see if the definitions changed.
1548 - Verified that definitions for Cased and Case_Ignorable are unchanged.
1549   The gencase tool now parses the newly public Case_Ignorable values
1550   in case the definition changes in the future.
1551
1552 * uchar.c & uprops.h & uprops.c & genprops
1553 - new numeric values that didn't exist in Unicode data before:
1554     1/7, 1/9, 1/10, 3/10, 1/16, 3/16
1555   the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
1556   therefore redesign the encoding of numeric types and values for formatVersion 6;
1557   design for simple numbers up to at least 144 ("one gross"),
1558   large values up to at least 10^20,
1559   and fractions with numerators -1..17 and denominators 1..16
1560   to cover current and expected future values
1561   (e.g., more Han numeric values, Meroitic twelfths)
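
To illustrate the fraction range in the design above (numerators -1..17,
denominators 1..16), here is a Python sketch of one possible packing. The layout
is an assumption made for illustration only; the real formatVersion 6 encoding
is defined in uprops.h and may differ.

```python
NUM_MIN, NUM_MAX = -1, 17   # numerator range from the design notes
DEN_MIN, DEN_MAX = 1, 16    # denominator range from the design notes


def pack_fraction(numerator, denominator):
    """Pack a fraction into one small integer (hypothetical layout)."""
    if not (NUM_MIN <= numerator <= NUM_MAX):
        raise ValueError("numerator out of range")
    if not (DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError("denominator out of range")
    return (numerator - NUM_MIN) * 16 + (denominator - DEN_MIN)


def unpack_fraction(code):
    """Inverse of pack_fraction; returns (numerator, denominator)."""
    return (code // 16 + NUM_MIN, code % 16 + DEN_MIN)


# The new Unicode 5.2 numeric values listed above.
NEW_VALUES = [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]
for num, den in NEW_VALUES:
    code = pack_fraction(num, den)
    print("%d/%d -> %d -> %s" % (num, den, code, unpack_fraction(code)))
```

With these ranges there are 19 * 16 = 304 distinct codes, so all six new values
fit and round-trip.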
1562
1563 * reimplement Hangul_Syllable_Type for new Jamo characters
1564 - the old code assumed that all Jamo characters are in the 11xx block
1565 - Unicode 5.2 fills holes there and adds new Jamo characters in
1566     A960..A97F; Hangul Jamo Extended-A
1567   and in
1568     D7B0..D7FF; Hangul Jamo Extended-B
1569 - Hangul_Syllable_Type can be trivially derived from a subset of
1570   Grapheme_Cluster_Break values
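
The derivation mentioned above can be sketched in Python as a lookup on the
short Grapheme_Cluster_Break value: the five Hangul-related GCB values map to
the Hangul_Syllable_Type of the same name, and everything else is NA. This only
illustrates the idea; the names are hypothetical and it is not the ICU
implementation.

```python
# GCB values that correspond one-to-one to Hangul_Syllable_Type values.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Derive the short hst value from a short GCB value."""
    return GCB_TO_HST.get(gcb_value, "NA")  # NA = Not_Applicable


for gcb in ("L", "LVT", "EX", "XX"):
    print(gcb, "->", hangul_syllable_type(gcb))
```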
1571
1572 * build Unicode data source code for hardcoding core data
1573 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
1574
1575 ICU data make path is \svn\icuproj\icu\trunk\source\data\
1576 ICU root path is \svn\icuproj\icu\trunk
1577 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1578 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
1579 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
1580 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
1581 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
1582 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
1583 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
1584 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
1585 Creating data file for Unicode Property Names
1586 Creating data file for Unicode Character Properties
1587 Creating data file for Unicode Case Mapping Properties
1588 Creating data file for Unicode BiDi/Shaping Properties
1589 Creating data file for Unicode Normalization
1590 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
1591 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
1592
1593 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
1594   and rebuild the common library
1595
1596 *** UCA
1597
1598 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
1599 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
1600 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
1601 [ Begin obsolete instructions:
1602   Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
1603     - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
1604       on Windows:
1605         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
1606         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
1607   End obsolete instructions]
1608 - run all tests with the *_SHORT.txt or the full files (the full ones have comments),
1609   not just the *_STUB.txt files
1610 - note on intltest: if collate/UCAConformanceTest fails, then
1611   utility/MultithreadTest/TestCollators will fail as well;
1612   fix the conformance test before looking into the multi-thread test
1613
1614 *** Implement Cased & Case_Ignorable properties
1615 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
1616 - Problem: These properties should be disjoint, but aren't
1617 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
1618 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
1619
1620 *** Implement Changes_When_Xyz properties
1621 - without stored data
1622
1623 *** Implement Name_Alias property
1624 - add it as another name field in unames.icu
1625 - make it available via u_charName() and UCharNameChoice
1626 - consider it in u_charFromName()
1627
1628 *** Break iterators
1629
1630 * Update break iterator rules to new UAX versions and new property values
1631 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
1632
1633 *** new BidiTest file
1634 - review format and data
1635 - copy BidiTest.txt to source/test/testdata
1636 - write test code using this data
1637 - fix ICU code where it fails the conformance test
1638
1639 *** Java
1640 - generally, find and update code corresponding to C/C++
1641 - UCharacter.UnicodeBlock constants:
1642   a) add an _ID integer per new block, update COUNT
1643   b) add a class instance per new block
1644      Visual Studio regex:
1645         find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
1646         replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1647 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
1648
1649 - port test changes to Java
1650
1651 *** LayoutEngine script information
1652
1653 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
1654
1655 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1656 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1657 ScriptRunData.cpp, which is no longer needed.)
1658
1659 The generated files have a current copyright date and "@draft" statement.
1660
1661 -> Eric Mader wrote in email on 20090930:
1662     "I think the tool has been modified to update @draft to @stable for
1663      older scripts and to add @draft for new scripts.
1664      (I worked with an intern on this last year.)
1665      You should check the output after you run it."
1666
1667 * copy the above files into <icu>/source/layout, replacing the old files.
1668 * fix mixed line endings
1669 * review the diffs and fix incorrect @draft and missing aliases
1670 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1671
1672 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1673 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1674
1675 -> Eric Mader wrote in email on 20090930:
1676     "This is just a matter of making sure that all the per-script tables have
1677      entries for any new scripts that were added.
1678      If any new Indic characters were added, then the class tables in
1679      IndicClassTables.cpp should be updated to reflect this.
1680      John Emmons should know how to do this if it's required."
1681
1682 * rebuild the layout and layoutex libraries.
1683
1684 *** Documentation
1685 - Update User Guide
1686   + Jamo_Short_Name, sfc->scf, binary property value aliases
1687
1688 ---------------------------------------------------------------------------- ***
1689
1690 Unicode 5.1 update
1691
1692 *** related ICU Trac tickets
1693
1694 5696 Update to Unicode 5.1
1695
1696 *** Unicode version numbers
1697 - makedata.mak
1698 - uchar.h
1699 - configure.in & configure
1700 - update ucdVersion in gennames.c if an algorithmic range changes
1701
1702 *** data files & enums & parser code
1703
1704 * file preparation
1705 - ucdstrip:
1706     DerivedCoreProperties.txt
1707     DerivedNormalizationProps.txt
1708     NormalizationTest.txt
1709     PropList.txt
1710     Scripts.txt
1711     GraphemeBreakProperty.txt
1712     SentenceBreakProperty.txt
1713     WordBreakProperty.txt
1714 - ucdstrip and ucdmerge:
1715     EastAsianWidth.txt
1716     LineBreak.txt
1717
1718 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
1719 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
1720 copy 5.1.0\ucd\Blocks.txt ..\unidata\
1721 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
1722 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
1723 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
1724 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
1725 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
1726 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
1727 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
1728 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
1729 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
1730 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
1731 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
1732
1733 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
1734 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
1735 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
1736 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
1737 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
1738 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
1739 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
1740 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
1741 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
1742 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
1743
1744 * genpname
1745 - run preparse.pl
1746   + cd \svn\icuproj\icu\uni51\source\tools\genpname
1747   + make sure that data.h is writable
1748   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
1749   + preparse.pl complains with errors like the following:
1750       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
1751     This is because ICU 3.8 had scripts from ISO 15924 which are now
1752     added to Unicode 5.1, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
1753     and PropertyValueAliases.txt.
1754     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1755        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
1756   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
1757       N/Y, No/Yes, F/T, False/True
1758     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
1759        It will use further values from the file if present.
1760
1761 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
1762 - new block & script values
1763   + 17 new blocks
1764   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
1765     (removed from SyntheticPropertyValueAliases.txt)
1766   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
1767     (added to SyntheticPropertyValueAliases.txt)
1768 - uprops.icu (uprops.h) only provides 7 bits for script codes.
1769   ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
1770   No script code above 127 is yet the script of an
1771   assigned Unicode character, so ICU 4.0 uprops.icu does not store any
1772   script code values greater than 127.
1773   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
1774   in a parallel bit field, and that overflows now.
1775   Also, future values >=128 would be incompatible anyway.
1776   uprops.h is modified to move around several of the bit fields
1777   in the properties vector words, and now uses 8 bits for the script code.
1778   Two other bit fields also grow to accommodate future growth:
1779   Block (current count: 172) grows from 8 to 9 bits,
1780   and Word_Break grows from 4 to 5 bits.
1781 - renamed property Simple_Case_Folding (sfc->scf)
1782   + nothing to be done: handled as normal alias
1783 - new property JSN Jamo_Short_Name
1784   + no new API: only contributes to the Name property
1785 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
1786 - new Joining Group (JG) value: Burushashki_Yeh_Barree
1787 - new Sentence_Break (SB) values:
1788     SB ; CR        ; CR
1789     SB ; EX        ; Extend
1790     SB ; LF        ; LF
1791     SB ; SC        ; SContinue
1792 - new Word_Break (WB) values:
1793     WB ; CR        ; CR
1794     WB ; Extend    ; Extend
1795     WB ; LF        ; LF
1796     WB ; MB        ; MidNumLet
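
The bit-field arithmetic in the uprops.h note above can be checked with a small sketch.
This is only an illustration, not ICU code: bits_needed() is a hypothetical helper, and
it assumes that block values run from 0 to count-1.

```python
def bits_needed(max_value: int) -> int:
    """Smallest bit-field width that can hold every value in 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130  # from the note above (ICU 4.0)
BLOCK_COUNT = 172         # "current count" from the note above

# The maximum script value is stored too, so it decides the field width.
max_script = USCRIPT_CODE_LIMIT - 1
script_bits = bits_needed(max_script)

# Assumption: block values are 0..BLOCK_COUNT-1.
block_bits = bits_needed(BLOCK_COUNT - 1)

print("script code field:", script_bits, "bits for max value", max_script)
print("block field:", block_bits, "bits for max value", BLOCK_COUNT - 1)
```

Under these assumptions the maximum script value 129 no longer fits into 7 bits, while
the block count still fits into 8 bits, so the move to 9 bits is headroom only.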
1797
1798 * Further changes in the 2008-02-29 update:
1799 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
1800   because they should not normally be invisible.
1801 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
1802 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
1803 - new Word_Break (WB) value: NL=Newline
1804
1805 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
1806 - Unihan range end moves from 9FBB to 9FC3
1807   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
1808   + do change gennames.c
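
The search for the hardcoded range end and limit could be scripted roughly as follows.
This is a minimal sketch, not an ICU tool: find_hardcoded_unihan() and the list of file
suffixes are assumptions made for illustration.

```python
import re
from pathlib import Path

# The regex from the note above: 9FBB (end) or 9FBC (limit), case-insensitive.
UNIHAN_END_OR_LIMIT = re.compile(r"9FB[BC]", re.IGNORECASE)


def find_hardcoded_unihan(root, suffixes=(".c", ".cpp", ".h", ".txt")):
    """Yield (path, line_number, line) for every line mentioning 9FBB or 9FBC."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if UNIHAN_END_OR_LIMIT.search(line):
                yield path, number, line.strip()
```

Each hit still needs a manual decision, as in the Unicode 4.1 notes: some occurrences
must change (gennames.c) and others must stay as they are.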
1809
1810 * build Unicode data source code for hardcoding core data
1811 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
1812
1813 ICU data make path is \svn\icuproj\icu\uni51\source\data\
1814 ICU root path is \svn\icuproj\icu\uni51
1815 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1816 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
1817 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
1818 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
1819 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
1820 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
1821 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
1822 Creating data file for Unicode Character Properties
1823 Creating data file for Unicode Case Mapping Properties
1824 Creating data file for Unicode BiDi/Shaping Properties
1825 Creating data file for Unicode Normalization
1826 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
1827 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
1828
1829 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
1830   and rebuild the common library
1831
1832 *** Break iterators
1833
1834 * Update break iterator rules to new UAX versions and new property values
1835
1836 *** UCA
1837
1838 * update FractionalUCA.txt and UCARules.txt with new canonical closure
1839
1840 *** Test suites
1841 - Test that APIs using Unicode property value aliases (like UnicodeSet)
1842   support all of the boolean values N/Y, No/Yes, F/T, False/True
1843   -> TestBinaryValues() tests in both cintltst and intltest
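
What such a test has to cover can be sketched as a lookup over all eight spellings.
This is an illustration only, not the ICU implementation or its test code;
parse_binary_value() is a hypothetical name, and matching here is merely
case-insensitive.

```python
# All boolean value aliases listed in PropertyValueAliases.txt per the note above.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias: str) -> bool:
    """Map a boolean property value alias to True/False, ignoring case."""
    key = alias.strip().lower()
    try:
        return BINARY_VALUE_ALIASES[key]
    except KeyError:
        raise ValueError(f"not a binary property value alias: {alias!r}") from None
```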
1844
1845 *** LayoutEngine script information
1846 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
1847 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1848 ScriptRunData.cpp, which is no longer needed.)
1849
1850 The generated files have a current copyright date and "@draft" statement.
1851
1852 * copy the above files into <icu>/source/layout, replacing the old files.
1853
1854 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1855 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1856
1857 * rebuild the layout and layoutex libraries.
1858
1859 *** Documentation
1860 - Update User Guide
1861   + Jamo_Short_Name, sfc->scf, binary property value aliases
1862
1863 ---------------------------------------------------------------------------- ***
1864
1865 Unicode 5.0 update
1866
1867 *** related Jitterbugs
1868
1869 5084 RFE: Update to Unicode 5.0
1870
1871 *** data files & enums & parser code
1872
1873 * file preparation
1874 - ucdstrip:
1875     DerivedCoreProperties.txt
1876     DerivedNormalizationProps.txt
1877     NormalizationTest.txt
1878     PropList.txt
1879     Scripts.txt
1880     GraphemeBreakProperty.txt
1881     SentenceBreakProperty.txt
1882     WordBreakProperty.txt
1883 - ucdstrip and ucdmerge:
1884     EastAsianWidth.txt
1885     LineBreak.txt
1886
1887 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
1888 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
1889 copy 5.0.0\ucd\Blocks.txt ..\unidata\
1890 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
1891 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
1892 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
1893 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
1894 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
1895 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
1896 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
1897 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
1898 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
1899 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
1900 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
1901
1902 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
1903 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
1904 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
1905 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
1906 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
1907 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
1908 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
1909 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
1910 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
1911 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
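
For readers without the tools at hand, the two filters used above can be approximated
as follows. This is a sketch under stated assumptions, not the real ucdstrip and
ucdmerge: it assumes that ucdstrip removes '#' comments that trail data lines and that
ucdmerge joins adjacent code point lines with identical values into one range line.
The function names and the exact output formatting are hypothetical.

```python
def strip_comments(lines):
    """Drop '#' comments that trail data; leave all other lines untouched."""
    for line in lines:
        line = line.rstrip("\n")
        data, sep, _comment = line.partition("#")
        if sep and data.strip():
            yield data.rstrip()
        else:
            yield line


def _parse(line):
    """Return (start, end, value) for a data line, else None."""
    if not line.strip() or line.lstrip().startswith("#") or ";" not in line:
        return None
    code, _, value = line.partition(";")
    first, _, last = code.strip().partition("..")
    try:
        start = int(first, 16)
        end = int(last, 16) if last else start
    except ValueError:
        return None
    return start, end, value.strip()


def _format(start, end, value):
    code = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
    return f"{code};{value}"


def merge_ranges(lines):
    """Merge adjacent data lines whose ranges touch and whose values match."""
    pending = None
    for line in lines:
        item = _parse(line)
        if item and pending and item[2] == pending[2] and item[0] == pending[1] + 1:
            pending = (pending[0], item[1], pending[2])
            continue
        if pending:
            yield _format(*pending)
            pending = None
        if item:
            pending = item
        else:
            yield line
    if pending:
        yield _format(*pending)
```

Chaining the two, as in merge_ranges(strip_comments(f)), mirrors the
"ucdstrip < in | ucdmerge > out" pipelines used for EastAsianWidth.txt and LineBreak.txt.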
1912
1913 * update FractionalUCA.txt and UCARules.txt with new canonical closure
1914
1915 * genpname
1916 - run preparse.pl
1917   + make sure that data.h is writable
1918   + perl preparse.pl \cvs\oss\icu > out.txt
1919
1920 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
1921 - new block & script values
1922   + script values already added in ICU 3.6 because all of ISO 15924 is now covered
1923
1924 * build Unicode data source code for hardcoding core data
1925 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
1926
1927 ICU data make path is \cvs\oss\icu\source\data\
1928 ICU root path is \cvs\oss\icu
1929 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1930 [etc.]
1931 Creating data file for Unicode Character Properties
1932 Creating data file for Unicode Case Mapping Properties
1933 Creating data file for Unicode BiDi/Shaping Properties
1934 Creating data file for Unicode Normalization
1935 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
1936 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
1937
1938 - copy the .c source files to C:\cvs\oss\icu\source\common
1939   and rebuild the common library
1940
1941 *** Unicode version numbers
1942 - makedata.mak
1943 - uchar.h
1944 - configure.in
1945
1946 *** LayoutEngine script information
1947 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
1948 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1949 ScriptRunData.cpp, which is no longer needed.)
1950
1951 The generated files have a current copyright date and "@draft" statement.
1952
1953 * copy the above files into <icu>/source/layout, replacing the old files.
1954
1955 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1956 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1957
1958 * rebuild the layout and layoutex libraries.
1959
1960 ---------------------------------------------------------------------------- ***
1961
1962 Unicode 4.1 update
1963
1964 *** related Jitterbugs
1965
1966 4332 RFE: Update to Unicode 4.1
1967 4157 RBBI, TR29 4.1 updates
1968
1969 *** data files & enums & parser code
1970
1971 * file preparation
1972 - ucdstrip:
1973     DerivedCoreProperties.txt
1974     DerivedNormalizationProps.txt
1975     NormalizationTest.txt
1976     GraphemeBreakProperty.txt
1977     SentenceBreakProperty.txt
1978     WordBreakProperty.txt
1979 - ucdstrip and ucdmerge:
1980     EastAsianWidth.txt
1981     LineBreak.txt
1982
1983 * add new files to the repository
1984     GraphemeBreakProperty.txt
1985     SentenceBreakProperty.txt
1986     WordBreakProperty.txt
1987
1988 * update FractionalUCA.txt and UCARules.txt with new canonical closure
1989
1990 * genpname
1991 - handle new enumerated properties in sub read_uchar
1992 - run preparse.pl
1993
1994 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
1995 - new binary properties
1996   + Pattern_Syntax
1997   + Pattern_White_Space
1998 - new enumerated properties
1999   + Grapheme_Cluster_Break
2000   + Sentence_Break
2001   + Word_Break
2002 - new block & script & line break values
2003
2004 * gencase
2005 - case-ignorable changes
2006   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2007   now: (D47a) Word_Break=MidLetter or General_Category in {Mn, Me, Cf, Lm, Sk}
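
The D47a condition can be expressed as a simple predicate. This sketch is not gencase
code: Python's unicodedata module has no Word_Break property, so ASSUMED_MIDLETTER is a
tiny hand-picked stand-in for the real data in WordBreakProperty.txt, and the general
categories come from whatever Unicode version the Python runtime ships.

```python
import unicodedata

# General categories named in D47a.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Assumption: illustrative subset only (MIDDLE DOT, HYPHENATION POINT).
# A real implementation would read Word_Break=MidLetter from WordBreakProperty.txt.
ASSUMED_MIDLETTER = {0x00B7, 0x2027}


def is_case_ignorable(ch: str) -> bool:
    """True if ch is Word_Break=MidLetter (per the stand-in set) or gc is Mn/Me/Cf/Lm/Sk."""
    return (ord(ch) in ASSUMED_MIDLETTER
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)
```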
2008
2009 *** Unicode version numbers
2010 - makedata.mak
2011 - uchar.h
2012 - configure.in
2013
2014 *** tests
2015 - verify that u_charMirror() round-trips
2016 - test all new properties and some new values of old properties
2017
2018 *** other code
2019
2020 * hardcoded Unihan range end/limit
2021 - Unihan range end moves from 9FA5 to 9FBB
2022   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
2023   + do not modify BOCU/BOCSU code because that would change the encoding
2024     and break binary compatibility!
2025   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
2026     NamePrepProfile.txt
2027   + ignore trietest.c: test data is arbitrary
2028   + ignore tstnorm.cpp: test optimization, not important
2029   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
2030   + do change line_th.txt and word_th.txt
2031     by replacing hardcoded ranges with the new property values
2032   + do change gennames.c
2033
2034 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2035 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
2036 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
2037
2038 * case mappings
2039 - compare new special casing context conditions with previous ones
2040   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
2041
2042 * genpname
2043 - consider storing only the short name if it is the same as the long name
2044
2045 *** other reviews
2046 - UAX #29 changes (grapheme/word/sentence breaks)
2047 - UAX #14 changes (line breaks)
2048 - Pattern_Syntax & Pattern_White_Space
2049
2050 ---------------------------------------------------------------------------- ***
2051
2052 Unicode 4.0.1 update
2053
2054 *** related Jitterbugs
2055
2056 3170 RFE: Update to Unicode 4.0.1
2057 3171 Add new Unicode 4.0.1 properties
2058 3520 use Unicode 4.0.1 updates for break iteration
2059
2060 *** data files & enums & parser code
2061
2062 * file preparation
2063 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
2064 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
2065
2066 * file fixes
2067 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
2068   according to PRI #26
2069   http://www.unicode.org/review/resolved-pri.html#pri26
2070 - this fix was undone again because no corrigendum was in sight;
2071   instead, the tests were modified to not check consistency on this for Unicode 4.0.1
2072
2073 * ucdterms.txt
2074 - update from http://www.unicode.org/copyright.html
2075   formatted for plain text
2076
2077 * uchar.h & uprops.h & uprops.c & genprops
2078 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
2079 - add U_LB_INSEPARABLE due to a spelling fix
2080   + put short name comment only on line with new constant
2081     for genpname perl script parser
2082 - new binary properties
2083   + STerm
2084   + Variation_Selector
2085
2086 * genpname
2087 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
2088 - perl script: correctly calculate the maximum number of fields per row
2089
2090 * uscript.h
2091 - new script code Hrkt=Katakana_Or_Hiragana
2092
2093 * gennorm.c: track changes in DerivedNormalizationProps.txt
2094 - "FNC" -> "FC_NFKC"
2095 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
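
A parser that accepts both spellings might look like the following sketch. It is not
the gennorm.c code: parse_norm_prop() is a hypothetical helper, and the handling of
"_MAYBE" tokens (mapped to the value M) is an assumption covering the "etc." above.

```python
import re

# Old single-field tokens such as NFD_NO; the _MAYBE form is an assumption.
_OLD_QC = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")
_QC_VALUE = {"NO": "N", "MAYBE": "M"}


def parse_norm_prop(line):
    """Return (code_range, property, value) in the new naming, or None for non-data lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 2:
        return None
    code, name, rest = fields[0], fields[1], fields[2:]
    old = _OLD_QC.match(name)
    if old:
        # Old style: one field "NFD_NO" becomes two fields "NFD_QC; N".
        return code, old.group(1) + "_QC", _QC_VALUE[old.group(2)]
    if name == "FNC":
        name = "FC_NFKC"  # renamed property
    return code, name, rest[0] if rest else None
```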
2096
2097 * genprops/props2.c: track changes in DerivedNumericValues.txt
2098 - changed from 3 columns to 2, dropping the numeric type
2099   + assume that the type is always numeric for Han characters,
2100     and that only those are added in addition to what UnicodeData.txt lists
2101
2102 *** Unicode version numbers
2103 - makedata.mak
2104 - uchar.h
2105 - configure.in
2106
2107 *** tests
2108 - update test of default bidi classes according to PRI #28
2109   /tsutil/cucdtst/TestUnicodeData
2110   http://www.unicode.org/review/resolved-pri.html#pri28
2111 - bidi tests: change exemplar character for ES depending on Unicode version
2112 - change hardcoded expected property values where they change
2113
2114 *** other code
2115
2116 * name matching
2117 - read UCD.html
2118
2119 * scripts
2120 - use new Hrkt=Katakana_Or_Hiragana
2121
2122 * ZWJ & ZWNJ
2123 - are now part of combining character sequences
2124 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ