* Copyright (C) 2004-2008, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation:4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
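
For readers without the ICU tools at hand, the two filters in the batch file above can be approximated as follows. This is a minimal Python sketch, not the ucdstrip/ucdmerge source: it assumes that ucdstrip drops '#' comments and blank lines, and that ucdmerge joins adjacent code points or ranges with identical field values into one range line. The names strip_comments, parse and merge_ranges are hypothetical, and the sample data is made up.

```python
def strip_comments(lines):
    """Drop '#' comments and any line that is empty afterwards."""
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            yield data


def parse(line):
    """Split 'XXXX[..YYYY];value' into (start, end, rest)."""
    rng, rest = line.split(";", 1)
    start, _, end = rng.strip().partition("..")
    return int(start, 16), int(end or start, 16), rest.strip()


def merge_ranges(lines):
    """Merge consecutive lines whose ranges touch and whose values match."""
    merged = []
    for start, end, rest in map(parse, lines):
        if merged and merged[-1][2] == rest and merged[-1][1] + 1 == start:
            merged[-1][1] = end          # extend the previous range
        else:
            merged.append([start, end, rest])
    for start, end, rest in merged:
        rng = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
        yield f"{rng};{rest}"


# Made-up sample in the style of LineBreak.txt.
sample = [
    "# sample data for illustration only",
    "0041..005A;AL # L&  LATIN CAPITAL LETTERS",
    "005B;OP # Ps  LEFT SQUARE BRACKET",
    "0061;AL",
    "0062;AL",
    "",
]
result = list(merge_ranges(strip_comments(sample)))
print("\n".join(result))
```

Chaining the two generators mirrors the "ucdstrip | ucdmerge" pipe used for EastAsianWidth.txt and LineBreak.txt.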

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the Perl script reports a conflict between
    SyntheticPropertyValueAliases.txt and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.
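
The boolean value aliases listed above can be looked up as in the following sketch. It is an illustration in Python, not the preparse.pl logic. The table BINARY_ALIASES and the functions normalize and parse_binary_value are hypothetical names, and the loose matching (ignore case, spaces, '-' and '_') is an assumption based on the usual Unicode property alias matching rules.

```python
# Aliases that PropertyValueAliases.txt lists for binary properties in Unicode 5.1.
BINARY_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def normalize(alias):
    """Loose matching (assumed): ignore case, whitespace, '-' and '_'."""
    return "".join(ch for ch in alias.lower() if ch not in " \t-_")


def parse_binary_value(alias):
    """Return True/False for a binary property value alias, else raise ValueError."""
    key = normalize(alias)
    if key not in BINARY_ALIASES:
        raise ValueError(f"not a binary property value alias: {alias!r}")
    return BINARY_ALIASES[key]


for name in ("N", "Yes", "F", "True"):
    print(name, "->", parse_binary_value(name))
```

A test along the lines of TestBinaryValues() would loop over all eight aliases for one binary property and check that each pair selects the same set.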

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  No script code above 127 is assigned to any Unicode character yet,
  so ICU 4.0 uprops.icu does not store any script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields are also widened to leave room for future values:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed short alias of property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR ; CR
    SB ; EX ; Extend
    SB ; LF ; LF
    SB ; SC ; SContinue
- new Word_Break (WB) values:
    WB ; CR ; CR
    WB ; Extend ; Extend
    WB ; LF ; LF
    WB ; MB ; MidNumLet
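
The bit-width reasoning in the uprops.h item above can be checked with a little arithmetic. The sketch below is Python for illustration only. The helpers bits_needed, pack and unpack are hypothetical, and the shift of 0 is an arbitrary choice that does not reflect the real layout of the properties vector words.

```python
def bits_needed(max_value):
    """Smallest bit-field width that can hold max_value."""
    return max(1, max_value.bit_length())


def pack(value, shift, width):
    """Place value into a bit field, refusing values that do not fit."""
    mask = (1 << width) - 1
    if value > mask:
        raise OverflowError(f"{value} does not fit into {width} bits")
    return value << shift


def unpack(word, shift, width):
    """Read a bit field back out of a word."""
    return (word >> shift) & ((1 << width) - 1)


USCRIPT_CODE_LIMIT = 130                  # from the note above
max_script = USCRIPT_CODE_LIMIT - 1       # 129, the maximum script value

print(bits_needed(127))                   # widest value a 7-bit field can hold
print(bits_needed(max_script))            # width needed for 129
print(max_script & 0x7F)                  # what a 7-bit field would silently keep
print(unpack(pack(max_script, 0, 8), 0, 8))
```

With 7 bits the maximum script value 129 would be truncated, which is the overflow described above. 8 bits hold it, and the same calculation shows that the Block count of 172 still fits into 8 bits, so the move to 9 bits is headroom only.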

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c
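
The case-insensitive search for the old end and limit values can be scripted. The following is a minimal Python sketch. The functions find_hits and scan_tree, the list of file suffixes, and the sample lines are all assumptions for illustration, not part of the ICU tree.

```python
import re
from pathlib import Path

# Old Unihan range end (9FBB) and limit (9FBC), matched in any letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FB[BC]", re.IGNORECASE)


def find_hits(lines):
    """Return (line number, line) pairs that mention the old end or limit."""
    return [(n, line) for n, line in enumerate(lines, 1)
            if UNIHAN_END_OR_LIMIT.search(line)]


def scan_tree(root, suffixes=(".c", ".cpp", ".h", ".txt")):
    """Yield (path, line number, text) for every hit below root."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for n, line in find_hits(text.splitlines()):
                yield path, n, line.strip()


# Made-up sample lines; real hits still need manual review before editing.
sample = [
    "    0x4e00, 0x9fbb,",
    "    if (c < 0x9FBC) {",
    "    0x3400, 0x4db5,",
]
hits = find_hits(sample)
print(hits)
```

Each hit still needs a manual decision, as the Unicode 4.1 notes show: some occurrences must not be changed.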

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
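
The case-ignorable definition quoted above can be written as a predicate. This is a rough Python sketch, not the gencase implementation. Python's unicodedata module does not expose Word_Break, so MID_LETTER_SAMPLE is a small hand-picked subset assumed for illustration. A real tool would read WordBreakProperty.txt instead. unicodedata also reflects the Unicode version bundled with Python, not Unicode 4.1.

```python
import unicodedata

# Assumed, incomplete sample of Word_Break=MidLetter code points.
MID_LETTER_SAMPLE = {0x0027, 0x00B7, 0x2019, 0x2027}

# General categories named in the definition.
CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    """True if ch is MidLetter (per the sample) or has one of the listed categories."""
    return (ord(ch) in MID_LETTER_SAMPLE
            or unicodedata.category(ch) in CASE_IGNORABLE_GC)


for ch in ("'", "\u0301", "^", "a"):
    print(f"U+{ord(ch):04X}", is_case_ignorable(ch))
```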

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
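
The round-trip check for mirroring can be sketched directly on BidiMirroring.txt data, without calling ICU. The Python below is an illustration only. The functions load_mirrors, char_mirror and non_round_tripping are hypothetical stand-ins, and SAMPLE is a four-line excerpt in the file's "code; mirror # name" format.

```python
SAMPLE = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
"""


def load_mirrors(text):
    """Parse 'XXXX; YYYY # comment' lines into a code point mapping."""
    mirrors = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if data:
            src, dst = (int(field.strip(), 16) for field in data.split(";"))
            mirrors[src] = dst
    return mirrors


def char_mirror(mirrors, c):
    """Stand-in for u_charMirror(): unmapped code points mirror to themselves."""
    return mirrors.get(c, c)


def non_round_tripping(mirrors):
    """Code points c for which mirror(mirror(c)) != c."""
    return [c for c in mirrors
            if char_mirror(mirrors, char_mirror(mirrors, c)) != c]


mirrors = load_mirrors(SAMPLE)
print(non_round_tripping(mirrors))
```

An empty list means every mapped code point round-trips. A one-directional entry such as 0028 -> 0029 without the reverse line would be reported.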

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ