* Copyright (C) 2004-2006, International Business Machines
* Corporation and others. All Rights Reserved.
* file name: changes.txt
* tab size: 8 (not used)
* created on: 2004may06
* created by: Markus W. Scherer
* change log for Unicode updates
---------------------------------------------------------------------------- ***
*** related Jitterbugs
5084 RFE: Update to Unicode 5.0
*** data files & enums & parser code
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
  - ucdstrip and ucdmerge:
* my ucd2unidata.txt (needs to be updated each time with the UCD and file version numbers)
    copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
    copy 5.0.0\ucd\Blocks.txt ..\unidata\
    copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
    copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
    copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
    copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
    copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
    copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
    copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
    ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
    ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
    ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
    ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
    ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
    ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
    ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
    ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
    ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
    ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
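Sketch (Python, not the ICU tools themselves): the two filters used above can be approximated as follows, assuming ucdstrip drops "#" comments and lines left empty, and ucdmerge coalesces adjacent code point ranges that carry the same value. The names ucd_strip/ucd_merge are hypothetical, and the "XXXX..YYYY;value" output form is an assumption; the real tools may format their output differently.

```python
def ucd_strip(lines):
    """Hypothetical ucdstrip: drop '#' comments, trailing blanks, empty lines."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            out.append(data)
    return out


def _parse_range(line):
    """'0000..001F;N' or '0020;Na' -> (start, end, value)."""
    cps, value = (f.strip() for f in line.split(";", 1))
    start, _, end = cps.partition("..")
    return int(start, 16), int(end or start, 16), value


def ucd_merge(lines):
    """Hypothetical ucdmerge: merge adjacent ranges that have equal values."""
    merged = []  # list of [start, end, value]
    for line in lines:
        start, end, value = _parse_range(line)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    return [f"{s:04X}" + (f"..{e:04X}" if e != s else "") + f";{v}"
            for s, e, v in merged]
```

Chained as ucd_merge(ucd_strip(lines)), this mirrors the "ucdstrip < ... | ucdmerge > ..." pipeline for EastAsianWidth.txt and LineBreak.txt.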
* update FractionalUCA.txt and UCARules.txt with new canonical closure
    + make sure that data.h is writable
    + perl preparse.pl \cvs\oss\icu > out.txt
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
  - new block & script values
    + script values already added in ICU 3.6 because all of ISO 15924 is now covered
* build Unicode data source code for hardcoding core data
    C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
    ICU data make path is \cvs\oss\icu\source\data\
    ICU root path is \cvs\oss\icu
    Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    Creating data file for Unicode Character Properties
    Creating data file for Unicode Case Mapping Properties
    Creating data file for Unicode BiDi/Shaping Properties
    Creating data file for Unicode Normalization
    Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
    Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
  - copy the .c source files to C:\cvs\oss\icu\source\common
    and rebuild the common library
*** Unicode version numbers
*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
  ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
  ScriptRunData.cpp, which is no longer needed.)
  The generated files have a current copyright date and an "@draft" statement.
* copy the above files into <icu>/source/layout, replacing the old files.
  Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
  and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
* rebuild the layout and layoutex libraries.
---------------------------------------------------------------------------- ***
*** related Jitterbugs
4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates
*** data files & enums & parser code
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
  - ucdstrip and ucdmerge:
* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
* update FractionalUCA.txt and UCARules.txt with new canonical closure
  - handle new enumerated properties in sub read_uchar
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
  - new binary properties
    + Pattern_White_Space
  - new enumerated properties
    + Grapheme_Cluster_Break
  - new block & script & line break values
  - case-ignorable changes
    see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
    now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
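Sketch (Python, hypothetical helper): the D47a definition above written as a predicate. Python's unicodedata module has no Word_Break lookup, so the caller passes the Word_Break value (for example, parsed from WordBreakProperty.txt). Note that unicodedata implements a newer Unicode version than 4.1, so General_Category values can differ for some characters.

```python
import unicodedata

# General_Category values named in D47a (Unicode 4.1).
CASE_IGNORABLE_GC = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})


def is_case_ignorable(ch, word_break):
    """D47a: Word_Break=MidLetter, or General_Category in Mn/Me/Cf/Lm/Sk."""
    return word_break == "MidLetter" or unicodedata.category(ch) in CASE_IGNORABLE_GC
```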
*** Unicode version numbers
  - verify that u_charMirror() round-trips
  - test all new properties and some new values of old properties
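Sketch (Python, not the actual cucdtst test code): a round-trip check over BidiMirroring.txt-style "XXXX; YYYY # name" lines. As with u_charMirror(), a code point without a mapping is treated as mirroring to itself; the function names are hypothetical, and any reported code points still need to be judged by hand.

```python
def parse_bidi_mirroring(lines):
    """Map code point -> mirror code point from 'XXXX; YYYY # name' lines."""
    mirror = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            src, dst = (int(f.strip(), 16) for f in data.split(";"))
            mirror[src] = dst
    return mirror


def non_round_tripping(mirror):
    """Code points c with mirror(mirror(c)) != c; unmapped c maps to itself."""
    def m(c):
        return mirror.get(c, c)
    return sorted(c for c in mirror if m(m(c)) != c)
```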
* hardcoded Unihan range end/limit
  - Unihan range end moves from 9FA5 to 9FBB
    search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    + do not modify BOCU/BOCSU code because that would change the encoding
      and break binary compatibility!
    + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    + ignore trietest.c: test data is arbitrary
    + ignore tstnorm.cpp: test optimization, not important
    + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    + do change line_th.txt and word_th.txt
      by replacing hardcoded ranges with the new property values
    + do change gennames.c
    source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
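Sketch (Python, hypothetical helper): the 9FA[56] search above, case-insensitive so that it also catches lowercase hex such as 0x9fa5 in gennames.c, reporting hits in the same path(line): text form as the listing. Each hit still needs the manual triage described above; with the new end 9FBB, the new limit would be 9FBC. The file suffixes scanned are an assumption.

```python
import os
import re

# Old Unihan end (9FA5) or limit (9FA6), in either case.
UNIHAN_BOUND = re.compile(r"9FA[56]", re.IGNORECASE)


def find_unihan_bounds(lines):
    """Yield (line_number, text) for each line that mentions 9FA5 or 9FA6."""
    for number, text in enumerate(lines, start=1):
        if UNIHAN_BOUND.search(text):
            yield number, text.rstrip("\n")


def scan_tree(root, suffixes=(".c", ".cpp", ".h", ".txt")):
    """Print 'path(line): text' for every hit under root."""
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="latin-1") as f:
                    for number, text in find_unihan_bounds(f):
                        print(f"{path}({number}): {text.strip()}")
```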
  - compare new special casing context conditions with previous ones
    see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  - consider storing only the short name if it is the same as the long name
  - UAX #29 changes (grapheme/word/sentence breaks)
  - UAX #14 changes (line breaks)
  - Pattern_Syntax & Pattern_White_Space
---------------------------------------------------------------------------- ***
*** related Jitterbugs
3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration
*** data files & enums & parser code
  - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
  - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
  - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
    http://www.unicode.org/review/resolved-pri.html#pri26
  - undone again because no corrigendum is in sight;
    instead, modified the tests so that they do not check consistency on this for Unicode 4.0.1
  - update from http://www.unicode.org/copyright.html
    formatted for plain text
* uchar.h & uprops.h & uprops.c & genprops
  - add UBLOCK_CYRILLIC_SUPPLEMENT because the block was renamed
  - add U_LB_INSEPARABLE due to a spelling fix
    + put the short name comment only on the line with the new constant,
      for the genpname perl script parser
  - new binary properties
  - fix genpname perl script so that it doesn't choke on more than 2 names per property value
  - perl script: correctly calculate the maximum number of fields per row
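Sketch (Python rather than the genpname perl script; the function name is hypothetical): derive the maximum number of fields per row from the data instead of assuming two names per property value. Treating "#" as a comment start and skipping blank lines is an assumption about the input format.

```python
def max_fields_per_row(lines):
    """Largest number of ';'-separated fields on any data line."""
    widest = 0
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if data:
            widest = max(widest, len(data.split(";")))
    return widest
```

Sizing per-row storage from this value avoids breaking again when a later UCD version adds another alias column.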
  - new script code Hrkt=Katakana_Or_Hiragana
* gennorm.c: track changes in DerivedNormalizationProps.txt
  - single field "NFD_NO" -> two fields "NFD_QC; N" etc.
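Sketch (Python, hypothetical; gennorm.c itself is C): a line parser that accepts both layouts, assuming the old single-field names had the form <form>_NO or <form>_MAYBE and that single-field lines with any other name are binary properties.

```python
def parse_norm_prop(line):
    """Return (code_points, property, value), or None for non-data lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    cps = fields[0]
    if len(fields) >= 3:  # new layout, e.g. 'NFD_QC; N'
        return cps, fields[1], fields[2]
    prop = fields[1]
    # Old layout, e.g. 'NFD_NO' -> ('NFD_QC', 'N'), 'NFC_MAYBE' -> ('NFC_QC', 'M')
    for suffix, value in (("_NO", "N"), ("_MAYBE", "M")):
        if prop.endswith(suffix):
            return cps, prop[: -len(suffix)] + "_QC", value
    return cps, prop, None  # assumed binary property, no value field
```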
* genprops/props2.c: track changes in DerivedNumericValues.txt
  - changed from 3 columns to 2, dropping the numeric type
    + assume that the type is always numeric for Han characters,
      and that only those are added in addition to what UnicodeData.txt lists
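Sketch (Python, hypothetical; not props2.c): reading both layouts, assuming the old rows were "code points ; value ; type" and the new rows are "code points ; value", with the type defaulting to numeric as described above.

```python
def parse_numeric_row(line):
    """Return (code_points, value, numeric_type), or None for non-data lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) >= 3:  # old layout: type given explicitly
        return fields[0], fields[1], fields[2]
    return fields[0], fields[1], "numeric"  # new layout: assume numeric
```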
*** Unicode version numbers
  - update test of default bidi classes according to PRI #28
    /tsutil/cucdtst/TestUnicodeData
    http://www.unicode.org/review/resolved-pri.html#pri28
  - bidi tests: change exemplar character for ES depending on Unicode version
  - change hardcoded expected property values where they change
  - use new Hrkt=Katakana_Or_Hiragana
  - are now part of combining character sequences
  - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ