* Copyright (C) 2004-2006, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation: 4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.txt (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
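
For orientation, the strip-then-merge pipeline above can be approximated in a few
lines of Python. This is only a sketch under assumptions, not the actual tools:
it assumes that ucdstrip removes "#" comments and that ucdmerge joins adjacent
ranges carrying the same value. The real tools may differ in detail (for example
in how they treat comment-only lines), and the sample data is illustrative.

```python
def ucdstrip(lines):
    """Drop '#' comments and blank lines from UCD-style text (assumed behavior)."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            yield data


def _parse(line):
    """Split 'start[..end];value' into (start, end, value) with integer bounds."""
    rng, value = (field.strip() for field in line.split(";", 1))
    start, _, end = rng.partition("..")
    return int(start, 16), int(end or start, 16), value


def ucdmerge(lines):
    """Merge adjacent code point ranges that carry the same value (assumed behavior)."""
    merged = []
    for start, end, value in map(_parse, lines):
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end          # extend the previous range
        else:
            merged.append([start, end, value])
    for start, end, value in merged:
        rng = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
        yield f"{rng};{value}"


# Illustrative input in the style of LineBreak.txt (not copied from the UCD).
SAMPLE = [
    "# sample data",
    "0030..0039;NU # digits",
    "0041..0046;AL # first part of the uppercase letters",
    "0047..005A;AL",
    "005B;OP",
]

for line in ucdmerge(ucdstrip(SAMPLE)):
    print(line)
```

The two AL ranges are contiguous and share a value, so they collapse into one
line; the NU range is not adjacent to them and stays separate.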

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
    + make sure that data.h is writable
    + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
    + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
    + Pattern_Syntax
    + Pattern_White_Space
- new enumerated properties
    + Grapheme_Cluster_Break
    + Sentence_Break
    + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
    see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
    now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
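
As a reading aid, the D47a condition quoted above can be written as a small
predicate. This is a sketch, not the gencase implementation: the function name is
hypothetical, the general category comes from Python's unicodedata module (whose
Unicode version differs from 4.1), and the Word_Break value has to be supplied by
the caller because the Python standard library does not expose that property.

```python
import unicodedata

# General categories named in the D47a definition quoted above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch, word_break):
    """Hypothetical helper: True if ch is case-ignorable per the quoted D47a.

    word_break is the character's Word_Break property value, passed in by the
    caller (an assumption of this sketch; no stdlib lookup exists for it).
    """
    if word_break == "MidLetter":
        return True
    return unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES


print(is_case_ignorable("\u0301", "Other"))     # combining acute accent, gc=Mn
print(is_case_ignorable(":", "MidLetter"))      # Word_Break value given by caller
print(is_case_ignorable("a", "ALetter"))        # ordinary letter
```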

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
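
The round-trip check in the first test item can be illustrated on the data file
level. The sketch below does not call ICU; char_mirror() is a hypothetical
stand-in for u_charMirror() that reads pairs in the "code; mirrored code" layout
of BidiMirroring.txt and assumes that unmapped code points map to themselves.
The sample lines are illustrative.

```python
def parse_mirroring(lines):
    """Read 'code; mirrored code # comment' lines into a dict of integers."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code, mirror = (int(field.strip(), 16) for field in data.split(";")[:2])
        pairs[code] = mirror
    return pairs


def char_mirror(pairs, c):
    """Hypothetical stand-in for u_charMirror(): identity for unmapped code points."""
    return pairs.get(c, c)


def round_trip_failures(pairs):
    """Return the code points c for which mirror(mirror(c)) != c."""
    return [c for c in pairs if char_mirror(pairs, char_mirror(pairs, c)) != c]


GOOD = [
    "# sample data",
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "003C; 003E # LESS-THAN SIGN",
    "003E; 003C # GREATER-THAN SIGN",
]
BROKEN = ["005B; 005D # LEFT SQUARE BRACKET"]  # reverse mapping is missing

print(round_trip_failures(parse_mirroring(GOOD)))
print([hex(c) for c in round_trip_failures(parse_mirroring(BROKEN))])
```

A one-sided entry such as the BROKEN sample is the kind of inconsistency the
round-trip test is meant to catch.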

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
    search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    + do not modify BOCU/BOCSU code because that would change the encoding
      and break binary compatibility!
    + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
      NamePrepProfile.txt
    + ignore trietest.c: test data is arbitrary
    + ignore tstnorm.cpp: test optimization, not important
    + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    + do change line_th.txt and word_th.txt
      by replacing hardcoded ranges with the new property values
    + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
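
A search like the one described above could be scripted as follows. This is a
minimal sketch: the helper name and the sample text are made up for illustration,
and only the regular expression 9FA[56] with case-insensitive matching comes from
the note. Hits are printed in the same "file(line): text" style as the list above.

```python
import re

# Matches the old inclusive end (9FA5) and the old exclusive limit (9FA6).
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hardcoded_unihan(name, text):
    """Hypothetical helper: return 'name(line): text' for each matching line."""
    return [
        f"{name}({number}): {line.strip()}"
        for number, line in enumerate(text.splitlines(), start=1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


# Illustrative source text, not taken from the ICU tree.
SAMPLE_TEXT = """\
static const int range[] = {
    0x4e00, 0x9fa5,
};
if (c >= 0x4E00 && c < 0x9FA6) { return 1; }
"""

for hit in find_hardcoded_unihan("sample.c", SAMPLE_TEXT):
    print(hit)
```

Each hit still needs a manual decision, since the list above names several places
where the old values must stay unchanged.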

* case mappings
- compare new special casing context conditions with previous ones
    see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
    according to PRI #26
    http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
    instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
    formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
    + put short name comment only on line with new constant
      for genpname perl script parser
- new binary properties
    + STerm
    + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
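
To make the quick-check format change concrete, here is a sketch of a parser that
accepts both layouts and normalizes them to the newer one. It is not the gennorm.c
code. The exact line layouts, the "MAYBE" -> "M" mapping, and the sample lines are
assumptions inferred from the note above ("NFD_NO" becoming "NFD_QC; N" etc.).

```python
# Assumed mapping from the old single-field suffix to the new value field.
OLD_ANSWERS = {"NO": "N", "MAYBE": "M"}


def parse_quick_check(line):
    """Return (range, property, value) for a quick-check line, else None."""
    fields = [field.strip() for field in line.split("#", 1)[0].split(";")]
    if len(fields) == 3 and fields[1].endswith("_QC"):
        # Newer layout: "00C0..00C5 ; NFD_QC; N"
        return fields[0], fields[1], fields[2]
    if len(fields) == 2:
        # Older layout: "00C0..00C5 ; NFD_NO"
        form, _, answer = fields[1].partition("_")
        value = OLD_ANSWERS.get(answer)
        if value is not None:
            return fields[0], form + "_QC", value
    return None  # comments, blank lines, and other properties


print(parse_quick_check("00C0..00C5 ; NFD_NO # old layout"))
print(parse_quick_check("00C0..00C5 ; NFD_QC; N # new layout"))
print(parse_quick_check("037A ; FC_NFKC; 0020 03B9"))
```

Both layouts yield the same tuple, so code downstream of the parser would only
need to handle the two-field form.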

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
    + assume that the type is always numeric for Han characters,
      and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
    /tsutil/cucdtst/TestUnicodeData
    http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
268- read UCD.html
269
270* scripts
271- use new Hrkt=Katakana_Or_Hiragana
272
273* ZWJ & ZWNJ
274- are now part of combining character sequences
275- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ