icuSources/data/unidata/changes.txt

   1 * Copyright (C) 2004-2010, International Business Machines
   2 * Corporation and others.  All Rights Reserved.
   3 *
   4 *   file name:  changes.txt
   5 *   encoding:   US-ASCII
   6 *   tab size:   8 (not used)
   7 *   indentation:4
   8 *
   9 *   created on: 2004may06
  10 *   created by: Markus W. Scherer
  11 *
  12 * change log for Unicode updates
  13
  14 ---------------------------------------------------------------------------- ***
  15
  16 Unicode 6.0 update
  17
  18 *** related ICU Trac tickets
  19
  20 7264 Unicode 6.0 Update
  21
  22 *** Unicode version numbers
  23 - makedata.mak
  24 - uchar.h
  25   (configure.in & configure: have been modified to extract the version from uchar.h)
  26 - com.ibm.icu.util.VersionInfo
  27
  28 *** data files & enums & parser code
  29
  30 * file preparation
  31
  32 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
  33 - This now prepares both unidata and testdata files in respective output subfolders.
  34
  35 * PropertyAliases.txt changes
  36 - new Script_Extensions property defined in the new ScriptExtensions.txt file
  37   but not listed in PropertyAliases.txt; reported to unicode.org;
  38   -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
  39     scx; Script_Extensions
  40   -> uchar.h with new UProperty section
  41   -> com.ibm.icu.lang.UProperty, parallel with uchar.h
  42
  43 * PropertyValueAliases.txt changes
  44 - 12 new block names:
  45   Alchemical_Symbols
  46   Bamum_Supplement
  47   Batak
  48   Brahmi
  49   CJK_Unified_Ideographs_Extension_D
  50   Emoticons
  51   Ethiopic_Extended_A
  52   Kana_Supplement
  53   Mandaic
  54   Miscellaneous_Symbols_And_Pictographs
  55   Playing_Cards
  56   Transport_And_Map_Symbols
  57   -> add to uchar.h
  58   -> add to UCharacter.UnicodeBlock
  59     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
  60             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
  61 - Joining_Group (jg) values:
  62   Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  63   -> uchar.h & UCharacter.JoiningGroup
  64 - 3 new scripts:
  65   sc ; Batk      ; Batak
  66   sc ; Brah      ; Brahmi
  67   sc ; Mand      ; Mandaic
  68   -> remove these from SyntheticPropertyValueAliases.txt
  69   -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  70   -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
  71       and in com.ibm.icu.dev.test.lang.TestUScript.java
  72 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  73   (added 2009-11-11..2010-07-18)
  74   Bass        259     Bassa Vah
  75   Dupl        755     Duployan shorthand
  76   Elba        226     Elbasan
  77   Gran        343     Grantha
  78   Kpel        436     Kpelle
  79   Loma        437     Loma
  80   Mend        438     Mende
  81   Merc        101     Meroitic Cursive
  82   Narb        106     Old North Arabian
  83   Nbat        159     Nabataean
  84   Palm        126     Palmyrene
  85   Sind        318     Sindhi
  86   Wara        262     Warang Citi
  87   -> uscript.h
  88   -> com.ibm.icu.lang.UScript
  89     find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
  90     replace  public static final int \1 = \2;\3
  91   -> SyntheticPropertyValueAliases.txt
  92   -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
  93       and in com.ibm.icu.dev.test.lang.TestUScript.java
  94 - ISO 15924 name change
  95   Mero        100     Meroitic Hieroglyphs (was Meroitic)
  96   -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
  97 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
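
The two find/replace patterns above (UBLOCK_ to UnicodeBlock instances, USCRIPT_ to
int constants) can also be applied outside an IDE. A minimal Python sketch, assuming
enum lines shaped the way those patterns expect; the function names and the sample
numeric values are made up for illustration and are not taken from uchar.h or uscript.h:

```python
import re

# Patterns transcribed from the Eclipse find/replace notes above.
BLOCK_RE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
SCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def block_to_java(line):
    """Turn one C UBLOCK_ enum line into a Java UnicodeBlock constant."""
    return BLOCK_RE.sub(
        r'public static final UnicodeBlock \g<1> = '
        r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        line,
    )


def script_to_java(line):
    """Turn one C USCRIPT_ enum line into a Java int constant."""
    return SCRIPT_RE.sub(r"public static final int \g<1> = \g<2>;\g<3>", line)


if __name__ == "__main__":
    # Hypothetical input lines; the numeric values are placeholders.
    print(block_to_java("    UBLOCK_BATAK = 199, /*[1BC0]*/"))
    print(script_to_java("      USCRIPT_BASSA_VAH = 134,/* Bass */"))
```

Lines that do not match either pattern pass through unchanged, so the output still
needs the same manual review as the IDE replacement.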
  98
  99 * UnicodeData.txt changes
 100 - new CJK block:
 101   2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
 102   2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
 103   -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
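
Characters in these First/Last ranges get algorithmic names rather than per-character
entries. A minimal Python sketch of that naming rule, for checking a range by hand;
it is not how gennames.c implements it, and CJK_RANGES and cjk_name are illustrative
names that list only the two ranges mentioned in this log:

```python
# Ideograph ranges taken from the UnicodeData.txt notes in this log.
CJK_RANGES = [
    (0x2A700, 0x2B734),  # CJK Ideograph Extension C (Unicode 5.2)
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D (Unicode 6.0)
]


def cjk_name(cp):
    """Return the algorithmic name for cp, or None if cp is outside the listed ranges."""
    for first, last in CJK_RANGES:
        if first <= cp <= last:
            return "CJK UNIFIED IDEOGRAPH-%04X" % cp
    return None


if __name__ == "__main__":
    print(cjk_name(0x2B740))
    print(cjk_name(0x2B81E))
```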
 104
 105 * build Unicode tools using CMake+make
 106
 107 * run genpname/preparse.pl (on Linux)
 108   + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
 109   + make sure that data.h is writable
 110   + perl preparse.pl ~/svn.icu/trunk/src > out.txt
 111   + preparse.pl shows no errors, out.txt Info and Warning lines look ok
 112
 113 * rebuild Unicode tools (at least genpname) using make
 114 - You might first need to "make install" ICU so that the tools build can pick
 115   up the new definitions from the installed header files.
 116
 117 * run genpname
 118 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
 119 - rebuild ICU & tools
 120
 121 * update source/data/unidata/norm2/nfkc_cf.txt
 122 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
 123
 124 * update source/data/unidata/norm2/uts46.txt
 125 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
 126   to ~/svn.icu/tools/trunk/src/unicode/py
 127 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
 128 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
 129 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
 130
 131 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
 132   sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
 133 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
 134 - Unicode 6.0: U+2260, U+226E, U+226F
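
The grep step above can be sketched in Python. This assumes the usual
IdnaMappingTable.txt layout (code point or range; status; optional mapping; # comment);
the function name and the sample lines are illustrative:

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


if __name__ == "__main__":
    sample = [
        "0000..002C    ; disallowed_STD3_valid  # 1.1  <control>..COMMA",
        "00A0          ; disallowed_STD3_mapped ; 0020  # 1.1  NO-BREAK SPACE",
        "2260          ; disallowed_STD3_valid  # 1.1  NOT EQUAL TO",
    ]
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```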
 135
 136 * generate core properties data files
 137 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
 138 - rebuild ICU & tools
 139 - run makeuca.sh so that genuca picks up the new nfc.nrm:
 140   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
 141 - rebuild ICU & tools
 142
 143 * implement new Script_Extensions property (provisional)
 144 - parser & generator: genprops & uprops.icu
 145 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
 146 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
 147
 148 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
 149 - (one-time change)
 150 - genbidi/gencase/genprops tools changes
 151 - re-run makeprops.sh (see above)
 152 - UCharacterProperty.java, UCharacterTypeIterator.java,
 153   UBiDiProps.java, UCaseProps.java, and several others with minor changes;
 154   UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
 155
 156 * update Java data files
 157 - refresh just the UCD-related files, just to be safe
 158 - see (ICU4C)/source/data/icu4j-readme.txt
 159 - mkdir /tmp/icu4j
 160 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 161   output:
 162     ...
 163     Unicode .icu files built to ./out/build/icudt45l
 164     mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
 165     echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
 166     LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
 167     jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
 168     mkdir -p /tmp/icu4j/main/shared/data
 169     cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
 170 - copy the big-endian Unicode data files to another location,
 171   separate from the other data files
 172     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
 173     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
 174     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
 175     ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
 176     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
 177     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
 178     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
 179 - refresh ICU4J
 180     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
 181
 182 * refresh Java test .txt files
 183 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
 184
 185 * un-hardcode normalization skippable (NF*_Inert) test data
 186 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
 187
 188 * copy updated break iterator test files
 189 - now handled by the earlier ucdcopy.py step and by
 190   copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
 191   (old instructions:
 192    copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
 193    to ~/svn.icu/trunk/src/source/test/testdata)
 194 - they are not used in ICU4J
 195
 196 * UCA
 197
 198 - get output from Mark's tools; look in
 199     http://www.unicode.org/~book/incoming/mark/uca6.0.0/
 200     http://www.macchiato.com/unicode/utc/additional-uca-files
 201     http://www.unicode.org/Public/UCA/6.0.0/
 202     http://www.unicode.org/~mdavis/uca/
 203 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
 204 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
 205 - update Han-implicit ranges for new CJK extensions:
 206   swapCJK() in ucol.cpp & ImplicitCEGenerator.java
 207 - genuca: allow bytes 02 for U+FFFE, new merge-sort character;
 208   do not add it into invuca so that tailoring primary-after an ignorable works
 209 - genuca: permit space between [variable top] bytes
 210 - ucol.cpp: treat noncharacters like unassigned rather than ignorable
 211 - run makeuca.sh:
 212   ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
 213 - rebuild ICU4C
 214 - refresh ICU4J collation data:
 215   (subset of instructions above for properties data refresh, except copies all coll/*)
 216     ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 217     mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
 218     ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
 219     ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
 220 - update (ICU)/source/test/testdata/CollationTest_*.txt
 221   and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
 222   with output from Mark's Unicode tools
 223 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
 224 - note on intltest: if collate/UCAConformanceTest fails, then
 225   utility/MultithreadTest/TestCollators will fail as well;
 226   fix the conformance test before looking into the multi-thread test
 227
 228 * When refreshing all of ICU4J data from ICU4C
 229 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
 230 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
 231 or
 232 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
 233
 234 *** LayoutEngine script information
 235
 236 (For details see the Unicode 5.2 change log below.)
 237
 238 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
 239 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
 240 ScriptRunData.cpp, which is no longer needed.)
 241
 242 The generated files have a current copyright date and "@draft" statement.
 243
 244 * copy the above files into <icu>/source/layout, replacing the old files.
 245 * fix mixed line endings
 246 * review the diffs and fix incorrect @draft and missing aliases;
 247   Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
 248 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
 249
 250 ---------------------------------------------------------------------------- ***
 251
 252 Unicode 5.2 update
 253
 254 *** related ICU Trac tickets
 255
 256 7084 Unicode 5.2
 257
 258 7167 verify collation bytes
 259 7235 Java test NAME_ALIAS
 260 7236 Java DerivedCoreProperties.txt test
 261 7237 Java BidiTest.txt
 262 7238 UTrie2 in core unidata
 263 7239 test for tailoring gaps
 264 7240 Java fix CollationMiscTest
 265 7243 update layout engine for Unicode 5.2
 266
 267 *** Unicode version numbers
 268 - makedata.mak
 269 - uchar.h
 270 - configure.in & configure
 271 - update ucdVersion in gennames.c if an algorithmic range changes
 272
 273 *** data files & enums & parser code
 274
 275 * file preparation
 276
 277 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
 278 - includes finding files regardless of version numbers,
 279   copying them, and performing the equivalent processing of the
 280   ucdstrip and ucdmerge tools on the desired set of files
 281
 282 * notes on changes
 283 - PropertyAliases.txt
 284   moved from numeric to enumerated:
 285     ccc       ; Canonical_Combining_Class
 286   new string properties:
 287     NFKC_CF   ; NFKC_Casefold
 288     Name_Alias; Name_Alias
 289   new binary properties:
 290     Cased     ; Cased
 291     CI        ; Case_Ignorable
 292     CWCF      ; Changes_When_Casefolded
 293     CWCM      ; Changes_When_Casemapped
 294     CWKCF     ; Changes_When_NFKC_Casefolded
 295     CWL       ; Changes_When_Lowercased
 296     CWT       ; Changes_When_Titlecased
 297     CWU       ; Changes_When_Uppercased
 298   new CJK Unihan properties (not supported by ICU)
 299 - PropertyValueAliases.txt
 300   new block names
 301   new scripts
 302   one script code change:
 303     sc ; Qaai      ; Inherited
 304     ->
 305     sc ; Zinh      ; Inherited                        ; Qaai
 306   new Line_Break (lb) value:
 307     lb ; CP        ; Close_Parenthesis
 308   new Joining_Group (jg) values: Farsi_Yeh, Nya
 309   other new values:
 310     ccc; 214; ATA  ; Attached_Above
 311 - DerivedBidiClass.txt
 312   new default-R range: U+1E800 - U+1EFFF
 313 - UnicodeData.txt
 314   all of the ISO comments are gone
 315   new CJK block end:
 316     9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
 317   new CJK block:
 318     2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
 319     2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
 320
 321 * genpname
 322 - run preparse.pl
 323   + cd \svn\icuproj\icu\trunk\source\tools\genpname
 324   + make sure that data.h is writable
 325   + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
 326   + preparse.pl complains with errors like the following:
 327       Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
 328     This is because ICU 4.0 had scripts from ISO 15924 which are now
 329     added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
 330     and PropertyValueAliases.txt.
 331     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
 332        Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
 333   + preparse.pl complains with errors about block names missing from uchar.h; add them
 334
 335 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 336 - new block & script values
 337   + 26 new blocks
 338     copy new blocks from Blocks.txt
 339     MS VC++ 2008 regular expression:
 340       find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
 341       replace with "    UBLOCK_\3 = 172, /*[\1]*/"
 342   + several new script values already added in ICU 4.0 for ISO 15924 coverage
 343     (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
 344   + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
 345   + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
 346     (added to SyntheticPropertyValueAliases.txt)
 347 - new Joining Group (JG) values: Farsi_Yeh, Nya
 348 - new Line_Break (lb) value:
 349     lb ; CP        ; Close_Parenthesis
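
The Blocks.txt find/replace above can be sketched in Python. Unlike the Visual C++
replacement, this sketch also uppercases the block name and turns spaces and hyphens
into underscores, which is an assumption about the manual cleanup that follows; the
enum value passed in is a placeholder, just as 172 is in the pattern above:

```python
import re

# Same shape as the Visual C++ pattern: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def ublock_line(blocks_txt_line, value):
    """Build a UBLOCK_ enum line from one Blocks.txt line, or None if it is not a block line."""
    m = BLOCK_LINE.match(blocks_txt_line)
    if m is None:
        return None
    start, _end, name = m.groups()
    constant = re.sub(r"[ -]", "_", name).upper()  # assumed naming cleanup
    return "    UBLOCK_%s = %d, /*[%s]*/" % (constant, value, start)


if __name__ == "__main__":
    # 180 is a hypothetical enum value.
    print(ublock_line("A960..A97F; Hangul Jamo Extended-A", 180))
```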
 350
 351 * hardcoded Unihan range end/limit
 352 - Unihan range end moves from 9FC3 to 9FCB
 353   search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
 354   + do change gennames.c
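
The case-insensitive search for the old end and limit values can be sketched in
Python. The function name is illustrative, and which files to feed it is left to the
caller:

```python
import re

# Old Unihan range end (9FC3) and limit (9FC4), matched case-insensitively.
UNIHAN_OLD = re.compile(r"9FC[34]", re.IGNORECASE)


def find_unihan_hits(lines):
    """Return (line_number, text) pairs for lines that mention 9FC3 or 9FC4."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if UNIHAN_OLD.search(line)
    ]


if __name__ == "__main__":
    sample = ["if(c<=0x9fc3) {\n", "limit=0x9FC4;\n", "unrelated\n"]
    for number, text in find_unihan_hits(sample):
        print(number, text)
```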
 355
 356 * Compare definitions of new binary properties with what we used to use
 357   in algorithms, to see if the definitions changed.
 358 - Verified that definitions for Cased and Case_Ignorable are unchanged.
 359   The gencase tool now parses the newly public Case_Ignorable values
 360   in case the definition changes in the future.
 361
 362 * uchar.c & uprops.h & uprops.c & genprops
 363 - new numeric values that didn't exist in Unicode data before:
 364     1/7, 1/9, 1/10, 3/10, 1/16, 3/16
 365   the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
 366   therefore redesign the encoding of numeric types and values for formatVersion 6;
 367   design for simple numbers up to at least 144 ("one gross"),
 368   large values up to at least 10^20,
 369   and fractions with numerators -1..17 and denominators 1..16
 370   to cover current and expected future values
 371   (e.g., more Han numeric values, Meroitic twelfths)
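
The ranges stated above can be checked against the new values with a short Python
sketch. It only tests the numerator and denominator limits given in this note; it
does not model the actual bit encoding of formatVersion 5 or 6, and the function
names are illustrative:

```python
from fractions import Fraction

# New numeric values listed in this note.
NEW_VALUES = [
    Fraction(1, 7), Fraction(1, 9), Fraction(1, 10),
    Fraction(3, 10), Fraction(1, 16), Fraction(3, 16),
]


def needs_new_format(value):
    """True if the denominator exceeds 9, the limit this note gives for formatVersion 5."""
    return value.denominator > 9


def fits_v6_fraction(value):
    """True if the value fits numerators -1..17 and denominators 1..16."""
    return -1 <= value.numerator <= 17 and 1 <= value.denominator <= 16


if __name__ == "__main__":
    print([str(v) for v in NEW_VALUES if needs_new_format(v)])
    print(all(fits_v6_fraction(v) for v in NEW_VALUES))
```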
 372
 373 * reimplement Hangul_Syllable_Type for new Jamo characters
 374 - the old code assumed that all Jamo characters are in the 11xx block
 375 - Unicode 5.2 fills holes there and adds new Jamo characters in
 376     A960..A97F; Hangul Jamo Extended-A
 377   and in
 378     D7B0..D7FF; Hangul Jamo Extended-B
 379 - Hangul_Syllable_Type can be trivially derived from a subset of
 380   Grapheme_Cluster_Break values
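
The derivation mentioned above amounts to a small lookup. A Python sketch using the
short value aliases; it illustrates the mapping only and is not the ICU code:

```python
# Grapheme_Cluster_Break values that carry Hangul syllable structure.
HST_FROM_GCB = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb):
    """Map a Grapheme_Cluster_Break short alias to Hangul_Syllable_Type (NA if none)."""
    return HST_FROM_GCB.get(gcb, "NA")


if __name__ == "__main__":
    for gcb in ("L", "LVT", "EX", "SM"):
        print(gcb, "->", hangul_syllable_type(gcb))
```

Because the lookup depends only on Grapheme_Cluster_Break, it covers the new Jamo in
A960..A97F and D7B0..D7FF without any assumption about the 11xx block.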
 381
 382 * build Unicode data source code for hardcoding core data
 383 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
 384
 385 ICU data make path is \svn\icuproj\icu\trunk\source\data\
 386 ICU root path is \svn\icuproj\icu\trunk
 387 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
 388 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
 389 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
 390 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
 391 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
 392 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
 393 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
 394 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
 395 Creating data file for Unicode Property Names
 396 Creating data file for Unicode Character Properties
 397 Creating data file for Unicode Case Mapping Properties
 398 Creating data file for Unicode BiDi/Shaping Properties
 399 Creating data file for Unicode Normalization
 400 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
 401 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
 402
 403 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
 404   and rebuild the common library
 405
 406 *** UCA
 407
 408 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
 409 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
 410 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
 411 [ Begin obsolete instructions:
 412   Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
 413     - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
 414       on Windows:
 415         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
 416         python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
 417   End obsolete instructions]
 418 - run all tests with the *_SHORT.txt or the full files (the full ones have comments)
 419   not just the *_STUB.txt files
 420 - note on intltest: if collate/UCAConformanceTest fails, then
 421   utility/MultithreadTest/TestCollators will fail as well;
 422   fix the conformance test before looking into the multi-thread test
 423
 424 *** Implement Cased & Case_Ignorable properties
 425 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
 426 - Problem: These properties should be disjoint, but aren't
 427 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
 428 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable
 429
 430 *** Implement Changes_When_Xyz properties
 431 - without stored data
 432
 433 *** Implement Name_Alias property
 434 - add it as another name field in unames.icu
 435 - make it available via u_charName() and UCharNameChoice
 436 - consider it in u_charFromName()
 437
 438 *** Break iterators
 439
 440 * Update break iterator rules to new UAX versions and new property values
 441 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
 442
 443 *** new BidiTest file
 444 - review format and data
 445 - copy BidiTest.txt to source/test/testdata
 446 - write test code using this data
 447 - fix ICU code where it fails the conformance test
 448
 449 *** Java
 450 - generally, find and update code corresponding to C/C++
 451 - UCharacter.UnicodeBlock constants:
 452   a) add an _ID integer per new block, update COUNT
 453   b) add a class instance per new block
 454      Visual Studio regex:
 455         find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
 456         replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
 457 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
 458
 459 - port test changes to Java
 460
 461 *** LayoutEngine script information
 462
 463 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
 464
 465 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
 466 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
 467 ScriptRunData.cpp, which is no longer needed.)
 468
 469 The generated files have a current copyright date and "@draft" statement.
 470
 471 -> Eric Mader wrote in email on 20090930:
 472     "I think the tool has been modified to update @draft to @stable for
 473      older scripts and to add @draft for new scripts.
 474      (I worked with an intern on this last year.)
 475      You should check the output after you run it."
 476
 477 * copy the above files into <icu>/source/layout, replacing the old files.
 478 * fix mixed line endings
 479 * review the diffs and fix incorrect @draft and missing aliases
 480 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
 481
 482 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
 483 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
 484
 485 -> Eric Mader wrote in email on 20090930:
 486     "This is just a matter of making sure that all the per-script tables have
 487      entries for any new scripts that were added.
 488      If any new Indic characters were added, then the class tables in
 489      IndicClassTables.cpp should be updated to reflect this.
 490      John Emmons should know how to do this if it's required."
 491
 492 * rebuild the layout and layoutex libraries.
 493
 494 *** Documentation
 495 - Update User Guide
 496   + Jamo_Short_Name, sfc->scf, binary property value aliases
 497
 498 ---------------------------------------------------------------------------- ***
 499
 500 Unicode 5.1 update
 501
 502 *** related ICU Trac tickets
 503
 504 5696 Update to Unicode 5.1
 505
 506 *** Unicode version numbers
 507 - makedata.mak
 508 - uchar.h
 509 - configure.in & configure
 510 - update ucdVersion in gennames.c if an algorithmic range changes
 511
 512 *** data files & enums & parser code
 513
 514 * file preparation
 515 - ucdstrip:
 516     DerivedCoreProperties.txt
 517     DerivedNormalizationProps.txt
 518     NormalizationTest.txt
 519     PropList.txt
 520     Scripts.txt
 521     GraphemeBreakProperty.txt
 522     SentenceBreakProperty.txt
 523     WordBreakProperty.txt
 524 - ucdstrip and ucdmerge:
 525     EastAsianWidth.txt
 526     LineBreak.txt
 527
 528 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
 529 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
 530 copy 5.1.0\ucd\Blocks.txt ..\unidata\
 531 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
 532 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
 533 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
 534 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
 535 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
 536 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
 537 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
 538 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
 539 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
 540 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
 541 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
 542
 543 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
 544 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
 545 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
 546 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
 547 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
 548 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
 549 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
 550 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
 551 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
 552 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
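
For orientation, a rough Python sketch of what the ucdstrip and ucdmerge steps are
assumed to do here: remove # comments and blank lines, then merge adjacent ranges
that carry the same value. The real tools may differ in details (for example, which
header lines they keep), and both function names are illustrative:

```python
def strip_comments(lines):
    """Assumed ucdstrip behavior: drop # comments, trailing whitespace and empty lines."""
    stripped = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            stripped.append(data)
    return stripped


def merge_ranges(entries):
    """Assumed ucdmerge behavior: merge adjacent (first, last, value) ranges with equal values.

    entries must be sorted by first code point.
    """
    merged = []
    for first, last, value in entries:
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1] = (merged[-1][0], last, value)
        else:
            merged.append((first, last, value))
    return merged


if __name__ == "__main__":
    print(strip_comments(["# header", "0020;SP # SPACE", ""]))
    print(merge_ranges([(0x21, 0x21, "EX"), (0x22, 0x22, "EX"), (0x23, 0x23, "AL")]))
```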
 553
 554 * genpname
 555 - run preparse.pl
 556   + cd \svn\icuproj\icu\uni51\source\tools\genpname
 557   + make sure that data.h is writable
 558   + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
 559   + preparse.pl complains with errors like the following:
 560       Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
 561     This is because ICU 3.8 had scripts from ISO 15924 which are now
 562     added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
 563     and PropertyValueAliases.txt.
 564     -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
 565        Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
 566   + PropertyValueAliases.txt now explicitly contains values for boolean properties:
 567       N/Y, No/Yes, F/T, False/True
 568     -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
 569        It will use further values from the file if present.
 570
 571 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 572 - new block & script values
 573   + 17 new blocks
 574   + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
 575     (removed from SyntheticPropertyValueAliases.txt)
 576   + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
 577     (added to SyntheticPropertyValueAliases.txt)
 578 - uprops.icu (uprops.h) only provides 7 bits for script codes.
 579   In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
 580   None of the codes above 127 is yet the script code of an
 581   assigned Unicode character, so ICU 4.0 uprops.icu does not store any
 582   script code values greater than 127.
 583   However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
 584   in a parallel bit field, and that overflows now.
 585   Also, future values >=128 would be incompatible anyway.
 586   uprops.h is modified to move around several of the bit fields
 587   in the properties vector words, and now uses 8 bits for the script code.
 588   Two other bit fields also grow to accommodate future growth:
 589   Block (current count: 172) grows from 8 to 9 bits,
 590   and Word_Break grows from 4 to 5 bits.
 591 - renamed property Simple_Case_Folding (sfc->scf)
 592   + nothing to be done: handled as normal alias
 593 - new property JSN Jamo_Short_Name
 594   + no new API: only contributes to the Name property
 595 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
 596 - new Joining Group (JG) value: Burushashki_Yeh_Barree
 597 - new Sentence_Break (SB) values:
 598     SB ; CR        ; CR
 599     SB ; EX        ; Extend
 600     SB ; LF        ; LF
 601     SB ; SC        ; SContinue
 602 - new Word_Break (WB) values:
 603     WB ; CR        ; CR
 604     WB ; Extend    ; Extend
 605     WB ; LF        ; LF
 606     WB ; MB        ; MidNumLet
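
The bit-field arithmetic in the uprops.icu note above (a 7-bit script field versus a
maximum script value of 129) can be checked with a short Python sketch. It only
counts bits and does not reproduce the uprops.h layout; bits_needed is an
illustrative name:

```python
def bits_needed(max_value):
    """Number of bits needed to store values 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130  # from the note above
BLOCK_COUNT = 172         # from the note above

if __name__ == "__main__":
    # The maximum script value no longer fits the old 7-bit field.
    print(bits_needed(USCRIPT_CODE_LIMIT - 1))
    # The block count still fits 8 bits; the move to 9 bits is headroom.
    print(bits_needed(BLOCK_COUNT - 1))
```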
 607
 608 * Further changes in the 2008-02-29 update:
 609 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
 610   because they should not normally be invisible.
 611 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
 612 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend
 613 - new Word_Break (WB) value: NL=Newline
 614
 615 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
 616 - Unihan range end moves from 9FBB to 9FC3
 617   search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
 618   + do change gennames.c
 619
 620 * build Unicode data source code for hardcoding core data
 621 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
 622
 623 ICU data make path is \svn\icuproj\icu\uni51\source\data\
 624 ICU root path is \svn\icuproj\icu\uni51
 625 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
 626 Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
 627 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
 628 Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
 629 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
 630 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
 631 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
 632 Creating data file for Unicode Character Properties
 633 Creating data file for Unicode Case Mapping Properties
 634 Creating data file for Unicode BiDi/Shaping Properties
 635 Creating data file for Unicode Normalization
 636 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
 637 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
 638
 639 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
 640   and rebuild the common library
 641
 642 *** Break iterators
 643
 644 * Update break iterator rules to new UAX versions and new property values
 645
 646 *** UCA
 647
 648 * update FractionalUCA.txt and UCARules.txt with new canonical closure
 649
 650 *** Test suites
 651 - Test that APIs using Unicode property value aliases (like UnicodeSet)
 652   support all of the boolean values N/Y, No/Yes, F/T, False/True
 653   -> TestBinaryValues() tests in both cintltst and intltest
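
A minimal sketch of the alias handling that such a test exercises, written in Python for brevity. The names BINARY_VALUE_ALIASES and parse_binary_value are hypothetical, and the case-insensitive lookup is a simplifying assumption; the real tests are TestBinaryValues() in cintltst and intltest.

```python
# Binary property value aliases as listed above, lowercased for lookup.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}

def parse_binary_value(alias):
    """Map a binary property value alias to a bool; raise ValueError if unknown."""
    key = alias.strip().lower()
    try:
        return BINARY_VALUE_ALIASES[key]
    except KeyError:
        raise ValueError("unknown binary property value alias: %r" % alias)
```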
 654
 655 *** LayoutEngine script information
 656 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
 657 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
 658 ScriptRunData.cpp, which is no longer needed.)
 659
 660 The generated files have a current copyright date and "@draft" statement.
 661
 662 * copy the above files into <icu>/source/layout, replacing the old files.
 663
 664 * add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
 665 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
 666
 667 * rebuild the layout and layoutex libraries.
 668
 669 *** Documentation
 670 - Update User Guide
 671   + Jamo_Short_Name, sfc->scf, binary property value aliases
 672
 673 ---------------------------------------------------------------------------- ***
 674
 675 Unicode 5.0 update
 676
 677 *** related Jitterbugs
 678
 679 5084 RFE: Update to Unicode 5.0
 680
 681 *** data files & enums & parser code
 682
 683 * file preparation
 684 - ucdstrip:
 685     DerivedCoreProperties.txt
 686     DerivedNormalizationProps.txt
 687     NormalizationTest.txt
 688     PropList.txt
 689     Scripts.txt
 690     GraphemeBreakProperty.txt
 691     SentenceBreakProperty.txt
 692     WordBreakProperty.txt
 693 - ucdstrip and ucdmerge:
 694     EastAsianWidth.txt
 695     LineBreak.txt
 696
 697 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
 698 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
 699 copy 5.0.0\ucd\Blocks.txt ..\unidata\
 700 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
 701 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
 702 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
 703 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
 704 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
 705 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
 706 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
 707 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
 708 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
 709 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
 710 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
 711
 712 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
 713 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
 714 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
 715 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
 716 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
 717 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
 718 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
 719 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
 720 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
 721 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
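
For readers without the tools at hand, here is a rough Python sketch of the two steps. It assumes that ucdstrip drops trailing '#' comments and that ucdmerge merges adjacent code point ranges carrying the same value; the actual tools may differ in detail (for example in how they treat comment-only lines). The function names are hypothetical.

```python
def strip_comment(line):
    """Drop a trailing '#' comment and trailing whitespace from one line."""
    return line.split("#", 1)[0].rstrip()

def merge_ranges(lines):
    """Merge adjacent code point ranges that carry the same value.

    Each data line looks like 'START[..END];VALUE' with hex code points.
    Comment-only and empty lines are skipped in this sketch.
    """
    merged = []  # list of [start, end, value]
    for line in lines:
        line = strip_comment(line)
        if not line:
            continue
        cps, value = [f.strip() for f in line.split(";", 1)]
        start, _, end = cps.partition("..")
        start = int(start, 16)
        end = int(end, 16) if end else start
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    return ["%04X%s;%s" % (s, "" if s == e else "..%04X" % e, v)
            for s, e, v in merged]
```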
 722
 723 * update FractionalUCA.txt and UCARules.txt with new canonical closure
 724
 725 * genpname
 726 - run preparse.pl
 727   + make sure that data.h is writable
 728   + perl preparse.pl \cvs\oss\icu > out.txt
 729
 730 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 731 - new block & script values
 732   + script values already added in ICU 3.6 because all of ISO 15924 is now covered
 733
 734 * build Unicode data source code for hardcoding core data
 735 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
 736
 737 ICU data make path is \cvs\oss\icu\source\data\
 738 ICU root path is \cvs\oss\icu
 739 Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
 740 [etc.]
 741 Creating data file for Unicode Character Properties
 742 Creating data file for Unicode Case Mapping Properties
 743 Creating data file for Unicode BiDi/Shaping Properties
 744 Creating data file for Unicode Normalization
 745 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
 746 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
 747
 748 - copy the .c source files to C:\cvs\oss\icu\source\common
 749   and rebuild the common library
 750
 751 *** Unicode version numbers
 752 - makedata.mak
 753 - uchar.h
 754 - configure.in
 755
 756 *** LayoutEngine script information
 757 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
 758 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
 759 ScriptRunData.cpp, which is no longer needed.)
 760
 761 The generated files have a current copyright date and "@draft" statement.
 762
 763 * copy the above files into <icu>/source/layout, replacing the old files.
 764
 765 * add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
 766 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
 767
 768 * rebuild the layout and layoutex libraries.
 769
 770 ---------------------------------------------------------------------------- ***
 771
 772 Unicode 4.1 update
 773
 774 *** related Jitterbugs
 775
 776 4332 RFE: Update to Unicode 4.1
 777 4157 RBBI, TR29 4.1 updates
 778
 779 *** data files & enums & parser code
 780
 781 * file preparation
 782 - ucdstrip:
 783     DerivedCoreProperties.txt
 784     DerivedNormalizationProps.txt
 785     NormalizationTest.txt
 786     GraphemeBreakProperty.txt
 787     SentenceBreakProperty.txt
 788     WordBreakProperty.txt
 789 - ucdstrip and ucdmerge:
 790     EastAsianWidth.txt
 791     LineBreak.txt
 792
 793 * add new files to the repository
 794     GraphemeBreakProperty.txt
 795     SentenceBreakProperty.txt
 796     WordBreakProperty.txt
 797
 798 * update FractionalUCA.txt and UCARules.txt with new canonical closure
 799
 800 * genpname
 801 - handle new enumerated properties in sub read_uchar
 802 - run preparse.pl
 803
 804 * uchar.h & uscript.h & uprops.h & uprops.c & genprops
 805 - new binary properties
 806   + Pattern_Syntax
 807   + Pattern_White_Space
 808 - new enumerated properties
 809   + Grapheme_Cluster_Break
 810   + Sentence_Break
 811   + Word_Break
 812 - new block & script & line break values
 813
 814 * gencase
 815 - case-ignorable changes
 816   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
 817   now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
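
The definition above can be sketched as a predicate. This is an illustration only, not the gencase implementation: the name is_case_ignorable is hypothetical, the Word_Break=MidLetter set must be supplied by the caller because Python's unicodedata module does not expose Word_Break, and unicodedata reflects whatever Unicode version the Python build ships, not 4.1.

```python
import unicodedata

# General categories named in the definition above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

def is_case_ignorable(ch, mid_letter):
    """Return True if ch is in mid_letter (Word_Break=MidLetter) or has one
    of the listed general categories.

    mid_letter is a caller-supplied set of characters (an assumption of
    this sketch).
    """
    return ch in mid_letter or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES
```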
 818
 819 *** Unicode version numbers
 820 - makedata.mak
 821 - uchar.h
 822 - configure.in
 823
 824 *** tests
 825 - verify that u_charMirror() round-trips
 826 - test all new properties and some new values of old properties
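
The round-trip check can be sketched as follows. The real test calls u_charMirror() from C; this Python version works on a plain mapping, and both the function name and the sample data in the usage note are hypothetical.

```python
def check_mirror_round_trip(mirror):
    """Return the code points c for which mirroring twice does not give c.

    mirror maps a code point to its mirrored code point; code points that
    are absent from the mapping are treated as mapping to themselves.
    """
    failures = []
    for c in mirror:
        m = mirror[c]
        if mirror.get(m, m) != c:
            failures.append(c)
    return failures
```

For example, a mapping with 0x28 -> 0x29 and 0x29 -> 0x28 passes, while a one-way entry such as 0x3C -> 0x3E without the reverse entry is reported.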
 827
 828 *** other code
 829
 830 * hardcoded Unihan range end/limit
 831 - Unihan range end moves from 9FA5 to 9FBB
 832   search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
 833   + do not modify BOCU/BOCSU code because that would change the encoding
 834     and break binary compatibility!
 835   + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
 836     NamePrepProfile.txt
 837   + ignore trietest.c: test data is arbitrary
 838   + ignore tstnorm.cpp: test optimization, not important
 839   + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
 840   + do change line_th.txt and word_th.txt
 841     by replacing hardcoded ranges with the new property values
 842   + do change gennames.c
 843
 844 source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
 845 source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
 846 source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
 847
 848 * case mappings
 849 - compare new special casing context conditions with previous ones
 850   see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
 851
 852 * genpname
 853 - consider storing only the short name if it is the same as the long name
 854
 855 *** other reviews
 856 - UAX #29 changes (grapheme/word/sentence breaks)
 857 - UAX #14 changes (line breaks)
 858 - Pattern_Syntax & Pattern_White_Space
 859
 860 ---------------------------------------------------------------------------- ***
 861
 862 Unicode 4.0.1 update
 863
 864 *** related Jitterbugs
 865
 866 3170 RFE: Update to Unicode 4.0.1
 867 3171 Add new Unicode 4.0.1 properties
 868 3520 use Unicode 4.0.1 updates for break iteration
 869
 870 *** data files & enums & parser code
 871
 872 * file preparation
 873 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
 874 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
 875
 876 * file fixes
 877 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
 878   according to PRI #26
 879   http://www.unicode.org/review/resolved-pri.html#pri26
 880 - undone again because no corrigendum in sight;
 881   instead modified tests to not check consistency on this for Unicode 4.0.1
 882
 883 * ucdterms.txt
 884 - update from http://www.unicode.org/copyright.html
 885   formatted for plain text
 886
 887 * uchar.h & uprops.h & uprops.c & genprops
 888 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
 889 - add U_LB_INSEPARABLE due to a spelling fix
 890   + put short name comment only on line with new constant
 891     for genpname perl script parser
 892 - new binary properties
 893   + STerm
 894   + Variation_Selector
 895
 896 * genpname
 897 - fix genpname perl script so that it doesn't choke on more than 2 names per property value
 898 - perl script: correctly calculate the maximum number of fields per row
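
The field count can be illustrated with a short Python sketch (the actual fix is in the genpname Perl script). The function name is hypothetical and the sample lines in the usage note are illustrative, in the style of PropertyValueAliases.txt.

```python
def max_fields_per_row(lines):
    """Return the largest number of semicolon-delimited fields on any data line.

    Trailing '#' comments, comment-only lines, and empty lines are ignored.
    """
    widest = 0
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            widest = max(widest, len(data.split(";")))
    return widest
```

A row such as "ccc; 0; NR; Not_Reordered" has four fields, one more than "sc ; Hrkt ; Katakana_Or_Hiragana", so a parser that assumes a fixed number of names per value would miscount it.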
 899
 900 * uscript.h
 901 - new script code Hrkt=Katakana_Or_Hiragana
 902
 903 * gennorm.c: track changes in DerivedNormalizationProps.txt
 904 - "FNC" -> "FC_NFKC"
 905 - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
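
A small Python sketch of a parser that accepts both layouts of the quick-check fields. It is not the gennorm.c code: the function name is hypothetical, and mapping "MAYBE" to "M" is an assumption that extends the "etc." above.

```python
def parse_quick_check(fields):
    """Return (property, value) from the fields after the code point range.

    Accepts the old single field, e.g. ["NFD_NO"], and the new pair,
    e.g. ["NFD_QC", "N"].
    """
    fields = [f.strip() for f in fields]
    if len(fields) == 2:
        return fields[0], fields[1]
    # Old style: split "NFD_NO" into the form and the answer, then keep
    # only the first letter of the answer (assumed: NO -> N, MAYBE -> M).
    form, _, answer = fields[0].partition("_")
    return form + "_QC", answer[:1]
```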
 906
 907 * genprops/props2.c: track changes in DerivedNumericValues.txt
 908 - changed from 3 columns to 2, dropping the numeric type
 909   + assume that the type is always numeric for Han characters,
 910     and that only those are added in addition to what UnicodeData.txt lists
 911
 912 *** Unicode version numbers
 913 - makedata.mak
 914 - uchar.h
 915 - configure.in
 916
 917 *** tests
 918 - update test of default bidi classes according to PRI #28
 919   /tsutil/cucdtst/TestUnicodeData
 920   http://www.unicode.org/review/resolved-pri.html#pri28
 921 - bidi tests: change exemplar character for ES depending on Unicode version
 922 - change hardcoded expected property values where they change
 923
 924 *** other code
 925
 926 * name matching
 927 - read UCD.html
 928
 929 * scripts
 930 - use new Hrkt=Katakana_Or_Hiragana
 931
 932 * ZWJ & ZWNJ
 933 - are now part of combining character sequences
 934 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ