Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row
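
  The field-count fix can be illustrated with a short sketch. It is written
  in Python rather than Perl, it is not the genpname code, and the sample
  rows and the name max_fields are hypothetical. It assumes
  semicolon-separated rows where a property value may carry more than
  2 names.

```python
def max_fields(lines):
    """Return the largest number of ';'-separated fields on any data row."""
    widest = 0
    for line in lines:
        # Drop trailing comments and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        widest = max(widest, len(fields))
    return widest


# Hypothetical sample rows; the last one has an extra alias.
SAMPLE = [
    "# property ; short name ; long name",
    "sc ; Hrkt ; Katakana_Or_Hiragana",
    "lb ; IN ; Inseparable ; Inseperable",
]
print(max_fields(SAMPLE))
```

  Scanning every row, instead of assuming a fixed width, is what lets a
  row with a third name pass through without special handling.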

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
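
  A minimal sketch of reading both the old and the new line layout, in
  Python rather than the C of gennorm.c. The function name, the LEGACY
  table and the sample lines are assumptions for illustration. Only the
  two renames spelled out above are mapped; the "etc." cases are left out
  rather than guessed.

```python
# Old names from this changelog mapped to (new property, implied value).
# A value of None means the line already carries its own value field.
LEGACY = {
    "FNC": ("FC_NFKC", None),
    "NFD_NO": ("NFD_QC", "N"),
}


def parse_props_line(line):
    """Parse 'range ; property [; value]' into (start, end, property, value)."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 2:
        raise ValueError("expected at least 2 fields: %r" % line)
    # The first field is a single code point or a 'first..last' range.
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    prop = fields[1]
    value = fields[2] if len(fields) > 2 else None
    # Translate an old single-field name to the new two-field form.
    if prop in LEGACY:
        prop, implied = LEGACY[prop]
        if implied is not None:
            value = implied
    return start, end, prop, value


print(parse_props_line("00C0..00C5 ; NFD_QC; N"))
print(parse_props_line("00C0..00C5 ; NFD_NO"))
```

  Both calls yield the same tuple, so code downstream of the parser only
  has to handle the new property names.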

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists
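
  The assumption above can be sketched as follows, in Python rather than
  the C of props2.c. The names, the type labels "Decimal" and "Numeric",
  and the sample lines are hypothetical. The sketch takes the numeric
  types already known from UnicodeData.txt and defaults every other code
  point in the 2-column file to the numeric type. It does not verify that
  those extra code points are in fact Han characters.

```python
from fractions import Fraction


def parse_numeric_line(line):
    """Parse a 2-column 'range ; value' line into (start, end, value)."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    if len(fields) != 2:
        raise ValueError("expected exactly 2 fields: %r" % line)
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    return start, end, Fraction(fields[1])


def merge_numeric(known_types, derived_lines):
    """Map code point -> (numeric type, value).

    known_types holds the types taken from UnicodeData.txt. Any code
    point missing from it is assumed to have the type "Numeric".
    """
    result = {}
    for line in derived_lines:
        parsed = parse_numeric_line(line)
        if parsed is None:
            continue
        start, end, value = parsed
        for cp in range(start, end + 1):
            result[cp] = (known_types.get(cp, "Numeric"), value)
    return result


known = {0x0031: "Decimal"}          # as if read from UnicodeData.txt
lines = ["0031 ; 1", "4E00 ; 1"]     # hypothetical 2-column lines
merged = merge_numeric(known, lines)
print(merged[0x4E00])
```

  Defaulting only the missing entries leaves the types from
  UnicodeData.txt untouched, so the dropped column affects just the
  additional characters.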

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ