Commit | Line | Data |
---|---|---|
73c04bcf A |
1 | * Copyright (C) 2004-2006, International Business Machines |
2 | * Corporation and others. All Rights Reserved. | |
3 | * | |
4 | * file name: changes.txt | |
5 | * encoding: US-ASCII | |
6 | * tab size: 8 (not used) | |
7 | * indentation:4 | |
8 | * | |
9 | * created on: 2004may06 | |
10 | * created by: Markus W. Scherer | |
11 | * | |
12 | * change log for Unicode updates | |
13 | ||
14 | ---------------------------------------------------------------------------- *** | |
15 | ||
16 | Unicode 5.0 update | |
17 | ||
18 | *** related Jitterbugs | |
19 | ||
20 | 5084 RFE: Update to Unicode 5.0 | |
21 | ||
22 | *** data files & enums & parser code | |
23 | ||
24 | * file preparation | |
25 | - ucdstrip: | |
26 | DerivedCoreProperties.txt | |
27 | DerivedNormalizationProps.txt | |
28 | NormalizationTest.txt | |
29 | PropList.txt | |
30 | Scripts.txt | |
31 | GraphemeBreakProperty.txt | |
32 | SentenceBreakProperty.txt | |
33 | WordBreakProperty.txt | |
34 | - ucdstrip and ucdmerge: | |
35 | EastAsianWidth.txt | |
36 | LineBreak.txt | |
37 | ||
38 | * my ucd2unidata.txt (needs to be updated each time with UCD and file version numbers) | |
39 | copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\ | |
40 | copy 5.0.0\ucd\Blocks.txt ..\unidata\ | |
41 | copy 5.0.0\ucd\CaseFolding.txt ..\unidata\ | |
42 | copy 5.0.0\ucd\DerivedAge.txt ..\unidata\ | |
43 | copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ | |
44 | copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ | |
45 | copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ | |
46 | copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ | |
47 | copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\ | |
48 | copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\ | |
49 | copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\ | |
50 | copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\ | |
51 | copy 5.0.0\ucd\UnicodeData.txt ..\unidata\ | |
52 | ||
53 | ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt | |
54 | ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt | |
55 | ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt | |
56 | ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt | |
57 | ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt | |
58 | ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt | |
59 | ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt | |
60 | ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt | |
61 | ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt | |
62 | ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt | |
63 | ||
64 | * update FractionalUCA.txt and UCARules.txt with new canonical closure | |
65 | ||
66 | * genpname | |
67 | - run preparse.pl | |
68 | + make sure that data.h is writable | |
69 | + perl preparse.pl \cvs\oss\icu > out.txt | |
70 | ||
71 | * uchar.h & uscript.h & uprops.h & uprops.c & genprops | |
72 | - new block & script values | |
73 | + script values already added in ICU 3.6 because all of ISO 15924 is now covered | |
74 | ||
75 | * build Unicode data source code for hardcoding core data | |
76 | C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data | |
77 | ||
78 | ICU data make path is \cvs\oss\icu\source\data\ | |
79 | ICU root path is \cvs\oss\icu | |
80 | Information: cannot find "ucmlocal.mk". Not building user-additional converter files. | |
81 | [etc.] | |
82 | Creating data file for Unicode Character Properties | |
83 | Creating data file for Unicode Case Mapping Properties | |
84 | Creating data file for Unicode BiDi/Shaping Properties | |
85 | Creating data file for Unicode Normalization | |
86 | Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l" | |
87 | Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp" | |
88 | ||
89 | - copy the .c source files to C:\cvs\oss\icu\source\common | |
90 | and rebuild the common library | |
91 | ||
92 | *** Unicode version numbers | |
93 | - makedata.mak | |
94 | - uchar.h | |
95 | - configure.in | |
96 | ||
97 | *** LayoutEngine script information | |
98 | * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h, | |
99 | ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates | |
100 | ScriptRunData.cpp, which is no longer needed.) | |
101 | ||
102 | The generated files have a current copyright date and "@draft" statement. | |
103 | ||
104 | * copy the above files into <icu>/source/layout, replacing the old files. | |
105 | ||
106 | Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp | |
107 | and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) | |
108 | ||
109 | * rebuild the layout and layoutex libraries. | |
110 | ||
111 | ---------------------------------------------------------------------------- *** | |
112 | ||
113 | Unicode 4.1 update | |
114 | ||
115 | *** related Jitterbugs | |
116 | ||
117 | 4332 RFE: Update to Unicode 4.1 | |
118 | 4157 RBBI, TR29 4.1 updates | |
119 | ||
120 | *** data files & enums & parser code | |
121 | ||
122 | * file preparation | |
123 | - ucdstrip: | |
124 | DerivedCoreProperties.txt | |
125 | DerivedNormalizationProps.txt | |
126 | NormalizationTest.txt | |
127 | GraphemeBreakProperty.txt | |
128 | SentenceBreakProperty.txt | |
129 | WordBreakProperty.txt | |
130 | - ucdstrip and ucdmerge: | |
131 | EastAsianWidth.txt | |
132 | LineBreak.txt | |
133 | ||
134 | * add new files to the repository | |
135 | GraphemeBreakProperty.txt | |
136 | SentenceBreakProperty.txt | |
137 | WordBreakProperty.txt | |
138 | ||
139 | * update FractionalUCA.txt and UCARules.txt with new canonical closure | |
140 | ||
141 | * genpname | |
142 | - handle new enumerated properties in sub read_uchar | |
143 | - run preparse.pl | |
144 | ||
145 | * uchar.h & uscript.h & uprops.h & uprops.c & genprops | |
146 | - new binary properties | |
147 | + Pattern_Syntax | |
148 | + Pattern_White_Space | |
149 | - new enumerated properties | |
150 | + Grapheme_Cluster_Break | |
151 | + Sentence_Break | |
152 | + Word_Break | |
153 | - new block & script & line break values | |
154 | ||
155 | * gencase | |
156 | - case-ignorable changes | |
157 | see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods | |
158 | now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk | |
159 | ||
160 | *** Unicode version numbers | |
161 | - makedata.mak | |
162 | - uchar.h | |
163 | - configure.in | |
164 | ||
165 | *** tests | |
166 | - verify that u_charMirror() round-trips | |
167 | - test all new properties and some new values of old properties | |
168 | ||
169 | *** other code | |
170 | ||
171 | * hardcoded Unihan range end/limit | |
172 | - Unihan range end moves from 9FA5 to 9FBB | |
173 | search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive) | |
174 | + do not modify BOCU/BOCSU code because that would change the encoding | |
175 | and break binary compatibility! | |
176 | + similarly, do not change the GB 18030 range data (ucnvmbcs.c), | |
177 | NamePrepProfile.txt | |
178 | + ignore trietest.c: test data is arbitrary | |
179 | + ignore tstnorm.cpp: test optimization, not important | |
180 | + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF | |
181 | + do change line_th.txt and word_th.txt | |
182 | by replacing hardcoded ranges with the new property values | |
183 | + do change gennames.c | |
184 | ||
185 | source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 | |
186 | source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 | |
187 | source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5, | |
188 | ||
189 | * case mappings | |
190 | - compare new special casing context conditions with previous ones | |
191 | see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods | |
192 | ||
193 | * genpname | |
194 | - consider storing only the short name if it is the same as the long name | |
195 | ||
196 | *** other reviews | |
197 | - UAX #29 changes (grapheme/word/sentence breaks) | |
198 | - UAX #14 changes (line breaks) | |
199 | - Pattern_Syntax & Pattern_White_Space | |
200 | ||
201 | ---------------------------------------------------------------------------- *** | |
202 | ||
374ca955 A |
203 | Unicode 4.0.1 update |
204 | ||
205 | *** related Jitterbugs | |
206 | ||
207 | 3170 RFE: Update to Unicode 4.0.1 | |
208 | 3171 Add new Unicode 4.0.1 properties | |
209 | 3520 use Unicode 4.0.1 updates for break iteration | |
210 | ||
211 | *** data files & enums & parser code | |
212 | ||
213 | * file preparation | |
214 | - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt | |
215 | - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt | |
216 | ||
217 | * file fixes | |
218 | - fix UnicodeData.txt general categories of Ethiopic digits Nd->No | |
219 | according to PRI #26 | |
220 | http://www.unicode.org/review/resolved-pri.html#pri26 | |
221 | - undone again because no corrigendum in sight; | |
222 | instead modified tests to not check consistency on this for Unicode 4.0.1 | |
223 | ||
224 | * ucdterms.txt | |
225 | - update from http://www.unicode.org/copyright.html | |
226 | formatted for plain text | |
227 | ||
228 | * uchar.h & uprops.h & uprops.c & genprops | |
229 | - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed | |
230 | - add U_LB_INSEPARABLE due to a spelling fix | |
231 | + put short name comment only on line with new constant | |
232 | for genpname perl script parser | |
233 | - new binary properties | |
234 | + STerm | |
235 | + Variation_Selector | |
236 | ||
237 | * genpname | |
238 | - fix genpname perl script so that it doesn't choke on more than 2 names per property value | |
239 | - perl script: correctly calculate the maximum number of fields per row | |
240 | ||
241 | * uscript.h | |
242 | - new script code Hrkt=Katakana_Or_Hiragana | |
243 | ||
244 | * gennorm.c track changes in DerivedNormalizationProps.txt | |
245 | - "FNC" -> "FC_NFKC" | |
246 | - single field "NFD_NO" -> two fields "NFD_QC; N" etc. | |
247 | ||
248 | * genprops/props2.c track changes in DerivedNumericValues.txt | |
249 | - changed from 3 columns to 2, dropping the numeric type | |
250 | + assume that the type is always numeric for Han characters, | |
251 | and that only those are added in addition to what UnicodeData.txt lists | |
252 | ||
253 | *** Unicode version numbers | |
254 | - makedata.mak | |
255 | - uchar.h | |
256 | - configure.in | |
257 | ||
258 | *** tests | |
259 | - update test of default bidi classes according to PRI #28 | |
260 | /tsutil/cucdtst/TestUnicodeData | |
261 | http://www.unicode.org/review/resolved-pri.html#pri28 | |
262 | - bidi tests: change exemplar character for ES depending on Unicode version | |
263 | - change hardcoded expected property values where they change | |
264 | ||
265 | *** other code | |
266 | ||
267 | * name matching | |
268 | - read UCD.html | |
269 | ||
270 | * scripts | |
271 | - use new Hrkt=Katakana_Or_Hiragana | |
272 | ||
273 | * ZWJ & ZWNJ | |
274 | - are now part of combining character sequences | |
275 | - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ |
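
The Unicode 4.1 notes above say to search for both 9FA5 (Unihan range end) and 9FA6 (limit) with the case-insensitive regex 9FA[56]. The following is a minimal sketch of such a search in Python, not the tool that was actually used for the update. The helper names `find_in_lines` and `find_in_tree` and the list of file suffixes are assumptions made for illustration. The pattern is unanchored, as in the notes, so every hit still needs the manual review described there (for example, leaving BOCU/BOCSU and GB 18030 data unchanged).

```python
import re
from pathlib import Path

# Pattern taken from the notes: 9FA5 (range end) or 9FA6 (range limit), any letter case.
UNIHAN_BOUNDARY = re.compile(r"9FA[56]", re.IGNORECASE)


def find_in_lines(lines):
    """Return (line_number, text) pairs for lines that mention 9FA5 or 9FA6.

    Hypothetical helper; line numbers are 1-based.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if UNIHAN_BOUNDARY.search(line)
    ]


def find_in_tree(root, suffixes=(".c", ".cpp", ".h", ".txt")):
    """Yield (path, line_number, text) for every hit below root.

    Hypothetical helper; the suffix list is an assumption, adjust it as needed.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            with open(path, encoding="utf-8", errors="replace") as source:
                for number, text in find_in_lines(source):
                    yield path, number, text
```

A call such as `find_in_tree("source")` would list candidates like the line_th.txt, word_th.txt and gennames.c lines quoted in the notes. Whether a given hit should change remains a per-file decision.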