Commit	Line	Data
f3c0d7a5 A	1	* Copyright (C) 2016 and later: Unicode, Inc. and others.
	2	* License & terms of use: http://www.unicode.org/copyright.html
	3	* Copyright (C) 2004-2016, International Business Machines
73c04bcf A	4	* Corporation and others. All Rights Reserved.
	5	*
	6	* file name: changes.txt
	7	* encoding: US-ASCII
	8	* tab size: 8 (not used)
	9	* indentation:4
	10	*
	11	* created on: 2004may06
	12	* created by: Markus W. Scherer
	13	*
	14	* change log for Unicode updates
6be67b06 A	15	*
	16	* For each new Unicode version, during the beta period,
	17	* I copy the change log for the previous version to the top of this file.
	18	* I adjust the versions, tickets, URLs, and paths.
	19	* I work my way through the steps listed in the log, top to bottom,
	20	* adjusting the log as necessary.
	21	* I report problems to the UTC and/or CLDR and/or ICU.
	22	* Before the data is final, I "turn the crank" several more times,
	23	* using appropriate subsets of the steps.
73c04bcf A	24
73c04bcf A	25	---------------------------------------------------------------------------- ***
51004dcb	26
b331163b A	27	* New ISO 15924 script codes
b331163b A	28
f3c0d7a5 A	29	Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
	30	until they are encoded in Unicode,
	31	or can be assumed to be encoded in the next Unicode version.
b331163b A	32	Script enum constant names want to follow the Unicode script property value aliases,
	33	which are assigned only when the scripts are encoded.
	34	When we encode scripts early and guess wrong, then we have confusing enum constants
	35	and have sometimes added aliases.
	36
f3c0d7a5	37	Variant script codes like Latf and Aran that are not subject to separate encoding
b331163b	38	can be added at any time.
f3c0d7a5	39	(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
b331163b	40
f3c0d7a5 A	41	We add script codes used in CLDR or in the spoof checker.
	42	This includes combination/alias codes like Hanb and Jamo.
	43	See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
	44	and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
b331163b	45
f3c0d7a5	46	We add special Z* script codes like Zsye.
b331163b	47
f3c0d7a5	48	For new script codes see http://www.unicode.org/iso15924/codechanges.html
b331163b	49
f3c0d7a5 A	50	---------------------------------------------------------------------------- ***
f3c0d7a5 A	51
340931cb A	52	Unicode 13.0 update for ICU 66
	53
	54	https://www.unicode.org/versions/Unicode13.0.0/
	55	https://www.unicode.org/versions/beta-13.0.0.html
	56	https://www.unicode.org/Public/13.0.0/ucd/
	57	https://www.unicode.org/reports/uax-proposed-updates.html
	58	https://www.unicode.org/reports/tr44/tr44-25.html
	59
	60	https://unicode-org.atlassian.net/browse/CLDR-13387
	61	https://unicode-org.atlassian.net/browse/ICU-20893
	62
	63	* Command-line environment setup
	64
	65	UNICODE_DATA=~/unidata/uni13/20200212
	66	CLDR_SRC=~/cldr/uni/src
	67	ICU_ROOT=~/icu/uni
	68	ICU_SRC=$ICU_ROOT/src
	69	ICUDT=icudt66b
	70	ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
	71	ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
	72	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
	73
	74	*** Unicode version numbers
	75	- makedata.mak
	76	- uchar.h
	77	- com.ibm.icu.util.VersionInfo
	78	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	79
	80	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	81	so that the makefiles see the new version number.
	82	cd $ICU_ROOT/dbg/icu4c
	83	ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
	84
	85	*** data files & enums & parser code
	86
	87	* download files
	88	- mkdir -p $UNICODE_DATA
	89	- download Unicode files into $UNICODE_DATA
	90	+ subfolders: emoji, idna, security, ucd, uca
	91	+ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
	92	+ split Unihan into single-property files
	93	~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
	94	+ get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
	95	or from the ucd/cldr/ output folder of the Unicode Tools:
	96	Since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules.
	97	cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
	98
	99	* for manual diffs and for Unicode Tools input data updates:
	100	remove version suffixes from the file names
	101	~$ unidata/desuffixucd.py $UNICODE_DATA
	102	(see https://sites.google.com/site/unicodetools/inputdata)
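A minimal sketch of what "remove version suffixes from the file names" could look like. This is not the actual desuffixucd.py; the suffix pattern (a "-major.minor.update" version, optionally followed by a draft marker such as "d5") and the helper name desuffix are assumptions made for illustration.

```python
import re

# Assumed suffix shape: "-13.0.0" or "-13.0.0d5" directly before the file extension.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def desuffix(name: str) -> str:
    """Return the file name with a trailing Unicode version suffix removed.

    Names without such a suffix are returned unchanged.
    """
    return _VERSION_SUFFIX.sub("", name)


if __name__ == "__main__":
    # Hypothetical file names, only to show the effect.
    for name in ("UnicodeData-13.0.0d5.txt", "Blocks-13.0.0.txt", "ReadMe.txt"):
        print(name, "->", desuffix(name))
```

With stable names, two download folders can be compared file by file in a diff tool.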
	103
	104	* process and/or copy files
	105	- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
	106	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	107	+ For debugging, and tweaking how ppucd.txt is written,
	108	the tool has an --only_ppucd option:
	109	py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
	110
	111	- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
	112
	113	* new constants for new property values
	114	- preparseucd.py error:
	115	ValueError: missing uchar.h enum constants for some property values:
	116	[(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
	117	u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
	118	(u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
	119	(u'InPC', set([u'Top_And_Bottom_And_Left']))]
120	= PropertyValueAliases.txt new property values (diff old & new .txt files)
121	blk; Chorasmian ; Chorasmian
122	blk; CJK_Ext_G ; CJK_Unified_Ideographs_Extension_G
123	blk; Dives_Akuru ; Dives_Akuru
124	blk; Khitan_Small_Script ; Khitan_Small_Script
125	blk; Lisu_Sup ; Lisu_Supplement
126	blk; Symbols_For_Legacy_Computing ; Symbols_For_Legacy_Computing
127	blk; Tangut_Sup ; Tangut_Supplement
128	blk; Yezidi ; Yezidi
129	-> add to uchar.h before UBLOCK_COUNT
130	use long property names for enum constants,
131	for the trailing comment get the block start code point: diff old & new Blocks.txt
132	-> add to UCharacter.UnicodeBlock IDs
133	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
134	replace public static final int \1_ID = \2; \3
135	-> add to UCharacter.UnicodeBlock objects
136	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
137	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
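The two Eclipse find/replace steps above can also be sketched outside the IDE. The following is an illustration, not part of the ICU tooling: the function names are hypothetical, and the sample line uses a made-up numeric ID. A single pattern with three groups is used for both replacements, so the comment is always group 3.

```python
import re

# Same shape as the Eclipse "find" expression: constant name, numeric ID, trailing comment.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_id(line: str) -> str:
    """Turn a uchar.h UBLOCK_ enum line into a UCharacter.UnicodeBlock ID constant."""
    return UBLOCK_LINE.sub(r"public static final int \g<1>_ID = \g<2>; \g<3>", line)


def to_java_object(line: str) -> str:
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock object declaration."""
    return UBLOCK_LINE.sub(
        r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<3>',
        line,
    )


if __name__ == "__main__":
    sample = "UBLOCK_YEZIDI = 999, /*[10E80]*/"  # 999 is a placeholder, not the real ID
    print(to_java_id(sample))
    print(to_java_object(sample))
```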
138
139	sc ; Chrs ; Chorasmian
140	sc ; Diak ; Dives_Akuru
141	sc ; Kits ; Khitan_Small_Script
142	sc ; Yezi ; Yezidi
143	-> uscript.h & com.ibm.icu.lang.UScript
144	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
145	and in com.ibm.icu.dev.test.lang.TestUScript.java
146
147	InPC; Top_And_Bottom_And_Left ; Top_And_Bottom_And_Left
148	-> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
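The "diff old & new .txt files" step for PropertyValueAliases.txt can be approximated with a small script. This is a sketch under simplifying assumptions: it treats the first field as the property, the second as the short name and the third as the long name, which does not hold for every property in the real file (some have an extra numeric field). The function names are hypothetical.

```python
def parse_aliases(lines):
    """Return {(property, short_name): long_name} from PropertyValueAliases-style lines."""
    result = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 3:
            result[(fields[0], fields[1])] = fields[2]
    return result


def new_values(old_lines, new_lines):
    """Return sorted (property, short, long) tuples present only in the new file."""
    old = parse_aliases(old_lines)
    new = parse_aliases(new_lines)
    return sorted(
        (prop, short, long_name)
        for (prop, short), long_name in new.items()
        if (prop, short) not in old
    )


if __name__ == "__main__":
    old = ["sc ; Zyyy ; Common"]
    new = old + ["sc ; Yezi ; Yezidi", "blk; Lisu_Sup ; Lisu_Supplement"]
    for entry in new_values(old, new):
        print(entry)
```

Each reported tuple corresponds to one enum constant that has to be added by hand, as described in the steps above.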
149
150	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
151	(not strictly necessary for NOT_ENCODED scripts)
152	$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
153
154	* build ICU (make install)
155	to make sure that there are no syntax errors, and
156	so that the tools build can pick up the new definitions from the installed header files.
157
158	$ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
159
160	* update spoof checker UnicodeSet initializers:
161	inclusionPat & recommendedPat in i18n/uspoof.cpp
162	INCLUSION & RECOMMENDED in SpoofChecker.java
163	- make sure that the Unicode Tools tree contains the latest security data files
164	- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
165	- update the hardcoded version number there in the DIRECTORY path
166	- run the tool (no special environment variables needed)
167	- copy & paste from the Console output into the .cpp & .java files
168
169	* generate normalization data files
170	cd $ICU_ROOT/dbg/icu4c
171	bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
172	bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
173	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
174	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
175	bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
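The gennorm2 invocations above differ only in the output file and in the list of input files, where each data file builds on nfc.txt. A sketch that spells out this dependency table and produces the same argument lists; the function name and the idea of generating the commands from a table are assumptions, and only the flags shown in the commands above are used.

```python
# Output file -> input files, copied from the commands listed above.
NORM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(icu_src, data_in, unidata):
    """Return argv lists for the C header build and the four .nrm builds."""
    src_dir = f"{unidata}/norm2"
    commands = [[
        "bin/gennorm2", "-o", f"{icu_src}/icu4c/source/common/norm2_nfc_data.h",
        "-s", src_dir, "nfc.txt", "--csource",
    ]]
    for output, inputs in NORM_INPUTS.items():
        commands.append(["bin/gennorm2", "-o", f"{data_in}/{output}", "-s", src_dir, *inputs])
    return commands


if __name__ == "__main__":
    # The arguments stand in for $ICU_SRC, $ICU4C_DATA_IN and $ICU4C_UNIDATA.
    for argv in gennorm2_commands("$ICU_SRC", "$ICU4C_DATA_IN", "$ICU4C_UNIDATA"):
        print(" ".join(argv))
```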
176
177	* build ICU (make install)
178	so that the tools build can pick up the new definitions from the installed header files.
179
180	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
181
182	* build Unicode tools using CMake+make
183
184	$ICU_SRC/tools/unicode/c/icudefs.txt:
185
186	# Location (--prefix) of where ICU was installed.
187	set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
188	# Location of the ICU4C source tree.
189	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
190
191	$ICU_ROOT/dbg$
192	mkdir -p tools/unicode/c
193	cd tools/unicode/c
194
195	$ICU_ROOT/dbg/tools/unicode/c$
196	cmake ../../../../src/tools/unicode/c
197	make
198
199	* generate core properties data files
200	$ICU_ROOT/dbg/tools/unicode/c$
201	genprops/genprops $ICU_SRC/icu4c
202	- tool failure:
203	genprops: Script_Extensions indexes overflow bit field
204	genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
	205	-> uprops.icu data file format:
206	add two more bits to store a script code or Script_Extensions index
207	-> generator code, C++ & Java runtime, uprops.icu format version 7.7
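Why two more bits help: an unsigned bit field of n bits holds values up to 2^n - 1, so two extra bits quadruple the number of storable indexes. The sketch below only illustrates this arithmetic. The width of 8 bits is a hypothetical starting point, not taken from the uprops.icu format description, and the helper names are made up.

```python
def max_index(bits: int) -> int:
    """Largest value that fits into an unsigned bit field of the given width."""
    return (1 << bits) - 1


def fits(value: int, bits: int) -> bool:
    """True if value can be stored in an unsigned bit field of the given width."""
    return 0 <= value <= max_index(bits)


OLD_BITS = 8             # hypothetical width, chosen only for illustration
NEW_BITS = OLD_BITS + 2  # "add two more bits"

if __name__ == "__main__":
    index = 300  # hypothetical index that overflows the narrower field
    print(max_index(OLD_BITS), fits(index, OLD_BITS))
    print(max_index(NEW_BITS), fits(index, NEW_BITS))
```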
208	- rebuild ICU (make install) & tools
209
210	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
211	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
212	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
213	- Unicode 6.0..13.0: U+2260, U+226E, U+226F
214	- nothing new in this Unicode version, no test file to update
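The grep for "disallowed_STD3_valid" on non-ASCII characters can be sketched as a small filter. It assumes the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status, with "#" comments); the sample lines are abbreviated illustrations, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) code point ranges with status disallowed_STD3_valid above U+007F."""
    for line in lines:
        data = line.split("#", 1)[0]  # strip the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start_hex, _, end_hex = fields[0].partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        if end > 0x7F:
            # Clip ranges that begin in ASCII so that only the non-ASCII part is reported.
            yield (max(start, 0x80), end)


if __name__ == "__main__":
    sample = [
        "0000..002C ; disallowed_STD3_valid # ASCII range, ignored",
        "00C0 ; mapped ; 00E0 # different status, ignored",
        "2260 ; disallowed_STD3_valid # NOT EQUAL TO",
        "226E..226F ; disallowed_STD3_valid # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for start, end in non_ascii_std3_valid(sample):
        print(f"U+{start:04X}..U+{end:04X}")
```

Any range reported beyond the three code points listed above would call for new test cases in uts46test.cpp and UTS46Test.java.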
215
216	* run & fix ICU4C tests
217	- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
218	- Andy helps with RBBI & spoof check test failures
219
220	* collation: CLDR collation root, UCA DUCET
221
222	- UCA DUCET goes into Mark's Unicode tools, see
223	https://sites.google.com/site/unicodetools/home#TOC-UCA
224	diff the main mapping file, look for bad changes
225	(for example, more bytes per weight for common characters)
226	~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
227	~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt
228
229	- CLDR root data files are checked into $CLDR_SRC/common/uca/
230	cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
231
232	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
233	cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
234	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
235	cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
236	(note removing the underscore before "Rules")
237	cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
238	- restore TODO diffs in UCARules.txt
239	meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
240	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
241	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
242	from the CLDR root files (..._CLDR_..._SHORT.txt)
243	cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
244	cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
245	cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
246	- if CLDR common/uca/unihan-index.txt changes, then update
247	CLDR common/collation/root.xml <collation type="private-unihan">
248	and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
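The collation test files keep their names when copied from CLDR into ICU, except that the "_CLDR" part is dropped. A one-function sketch of that renaming rule; the function name is hypothetical and the rule is inferred from the two cp commands above.

```python
def icu_test_name(cldr_name: str) -> str:
    """Map a CLDR root collation test file name to the name used in ICU testdata."""
    return cldr_name.replace("_CLDR_", "_", 1)


if __name__ == "__main__":
    for name in ("CollationTest_CLDR_NON_IGNORABLE_SHORT.txt",
                 "CollationTest_CLDR_SHIFTED_SHORT.txt"):
        print(name, "->", icu_test_name(name))
```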
249
250	- run genuca
251	$ICU_ROOT/dbg/tools/unicode/c$
252	genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
253	genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
254	- rebuild ICU4C
255
256	* Unihan collators
257	https://sites.google.com/site/unicodetools/unihan
258	- run Unicode Tools
259	org.unicode.draft.GenerateUnihanCollators
260	with VM arguments
261	-ea
262	-DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
263	-DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
264	-DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
265	-DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
266	-DUVERSION=13.0.0
267	- run Unicode Tools
268	org.unicode.draft.GenerateUnihanCollatorFiles
269	with the same arguments
270	- check CLDR diffs
271	cd $CLDR_SRC
272	meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
273	meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
274	- copy to CLDR
275	cd $CLDR_SRC
276	cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
277	cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
278	- run CLDR unit tests, commit to CLDR
279	- generate ICU zh collation data: run CLDR
280	org.unicode.cldr.icu.NewLdml2IcuConverter
281	with program arguments
282	-t collation
283	-s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
284	-m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
285	-d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
286	-p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
287	zh
288	and VM arguments
289	-ea
290	-DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
291	- rebuild ICU4C
292
293	* run & fix ICU4C tests, now with new CLDR collation root data
294	- run all tests with the collation test data *_SHORT.txt or the full files
295	(the full ones have comments, useful for debugging)
296	- note on intltest: if collate/UCAConformanceTest fails, then
297	utility/MultithreadTest/TestCollators will fail as well;
298	fix the conformance test before looking into the multi-thread test
299
300	* update Java data files
301	- refresh just the UCD/UCA-related/derived files, just to be safe
302	- see (ICU4C)/source/data/icu4j-readme.txt
303	- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
304	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
305	output:
306	...
307	make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
308	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
309	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
310	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
311	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
312	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
313	mkdir -p /tmp/icu4j/main/shared/data
314	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
315	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
316	mkdir -p /tmp/icu4j/main/shared/data
317	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
318	make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
319	- copy the big-endian Unicode data files to another location,
320	separate from the other data files,
321	and then refresh ICU4J
322	cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
323	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
324	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
325	cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
326	cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
327	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
328	cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
329	cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
330	cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
331	jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
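The cp and rm commands above select a subset of the big-endian data files. The selection rule can be written down as a predicate; this is a sketch that mirrors those commands, with hypothetical names, and it works on paths relative to com/ibm/icu/impl/data/$ICUDT.

```python
from fnmatch import fnmatch

TOP_LEVEL_PATTERNS = ("confusables.cfu", "*.icu", "*.nrm")
EXCLUDED = ("cnvalias.icu",)    # copied by the *.icu glob above, then removed again
SUBFOLDERS = ("coll", "brkitr")  # copied entirely, one level deep


def is_unicode_data_file(rel_path: str) -> bool:
    """True if the file is one of the Unicode-related files refreshed in icudata.jar.

    rel_path is relative to com/ibm/icu/impl/data/$ICUDT and uses '/' separators.
    """
    folder, _, name = rel_path.rpartition("/")
    if folder in SUBFOLDERS:
        return True
    if folder:
        return False  # any other subfolder is left alone
    return name not in EXCLUDED and any(fnmatch(name, p) for p in TOP_LEVEL_PATTERNS)


if __name__ == "__main__":
    # Hypothetical file names, only to exercise the rule.
    for path in ("uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/root.res", "root.res"):
        print(path, is_unicode_data_file(path))
```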
332
333	* When refreshing all of ICU4J data from ICU4C
334	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
335	- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
336	or
337	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
338
339	* update CollationFCD.java
340	+ copy & paste the initializers of lcccIndex[] etc. from
341	ICU4C/source/i18n/collationfcd.cpp to
342	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
343
344	* refresh Java test .txt files
345	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
346	cd $ICU_SRC/icu4c/source/data/unidata
347	cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
348	cd ../../test/testdata
349	cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
350	cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
351
352	* run & fix ICU4J tests
353
354	*** API additions
355	- send notice to icu-design about new born-@stable API (enum constants etc.)
356
357	*** CLDR numbering systems
	358	- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
359	for example, look for
360	~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
361	in new blocks (Blocks.txt)
362	Unicode 13:
363	diak 11950..11959 Dives_Akuru
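The egrep above finds one line per set of decimal digits, because each set has exactly one digit four. The same filter as a sketch in Python; the sample lines are hypothetical and only imitate the semicolon-separated "name=value" style that the pattern implies, and the function name is made up.

```python
import re

# Same expression as in the egrep command above.
DIGIT_FOUR = re.compile(r";gc=Nd.+;nv=4")


def digit_four_lines(lines):
    """Return the lines that describe a decimal digit with numeric value 4."""
    return [line.rstrip("\n") for line in lines if DIGIT_FOUR.search(line)]


if __name__ == "__main__":
    sample = [
        "cp;11954;gc=Nd;nt=De;nv=4",  # hypothetical line for a digit four
        "cp;11955;gc=Nd;nt=De;nv=5",
        "cp;0041;gc=Lu",
    ]
    for line in digit_four_lines(sample):
        print(line)
```

Hits in blocks that are new in this Unicode version point to numbering systems that still need to be added to CLDR.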
364
365	*** merge the Unicode update branches back onto the trunk
366	- do not merge the icudata.jar and testdata.jar,
367	instead rebuild them from merged & tested ICU4C
368	- make sure that changes to Unicode tools are checked in:
369	http://www.unicode.org/utility/trac/log/trunk/unicodetools
370
371	---------------------------------------------------------------------------- ***
372
3d1f044b A	373	Unicode 12.1 update for ICU 64.2
	374
	375	** This is an abbreviated update with one new character for the new
	376	** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
	377	https://en.wikipedia.org/wiki/Reiwa_period
	378
	379	http://www.unicode.org/versions/Unicode12.1.0/
	380
	381	ICU-20497 Unicode 12.1
	382
	383	cldrbug 11978: Unicode 12.1
	384
	385	* Command-line environment setup
	386
	387	UNICODE_DATA=~/unidata/uni121/20190403
	388	CLDR_SRC=~/svn.cldr/uni
	389	ICU_ROOT=~/icu/uni
	390	ICU_SRC=$ICU_ROOT/src
	391	ICUDT=icudt64b
	392	ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
	393	ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
	394	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
	395
	396	*** Unicode version numbers
	397	- makedata.mak
	398	- uchar.h
	399	- com.ibm.icu.util.VersionInfo
	400	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	401
	402	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	403	so that the makefiles see the new version number.
	404	cd $ICU_ROOT/dbg/icu4c
	405	ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
	406
	407	*** data files & enums & parser code
	408
	409	* download files
	410	- mkdir -p $UNICODE_DATA
	411	- download Unicode files into $UNICODE_DATA
	412	+ subfolders: emoji, idna, security, ucd, uca
	413	+ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
	414
	415	* for manual diffs and for Unicode Tools input data updates:
	416	remove version suffixes from the file names
	417	~$ unidata/desuffixucd.py $UNICODE_DATA
	418	(see https://sites.google.com/site/unicodetools/inputdata)
	419
	420	* process and/or copy files
	421	- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
	422	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	423	+ For debugging, and tweaking how ppucd.txt is written,
	424	the tool has an --only_ppucd option:
	425	py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
	426
	427	- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
	428
	429	* build ICU (make install)
	430	so that the tools build can pick up the new definitions from the installed header files.
	431
	432	$ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
	433
	434	* update spoof checker UnicodeSet initializers:
	435	inclusionPat & recommendedPat in uspoof.cpp
	436	INCLUSION & RECOMMENDED in SpoofChecker.java
	437	- make sure that the Unicode Tools tree contains the latest security data files
	438	- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
	439	- update the hardcoded version number there in the DIRECTORY path
	440	- run the tool (no special environment variables needed)
	441	- copy & paste from the Console output into the .cpp & .java files
442
443	* generate normalization data files
444	cd $ICU_ROOT/dbg/icu4c
445	bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
446	bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
447	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
448	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
449	bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
450
451	* build ICU (make install)
452	so that the tools build can pick up the new definitions from the installed header files.
453
454	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
455
456	* build Unicode tools using CMake+make
457
458	$ICU_SRC/tools/unicode/c/icudefs.txt:
459
460	# Location (--prefix) of where ICU was installed.
461	set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
462	# Location of the ICU4C source tree.
463	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
464
465	$ICU_ROOT/dbg$
466	mkdir -p tools/unicode/c
467	cd tools/unicode/c
468
469	$ICU_ROOT/dbg/tools/unicode/c$
470	cmake ../../../../src/tools/unicode/c
471	make
472
473	* generate core properties data files
474	$ICU_ROOT/dbg/tools/unicode/c$
475	genprops/genprops $ICU_SRC/icu4c
476	genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
477	genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
478	- rebuild ICU (make install) & tools
479
480	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
481	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
482	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
483	- Unicode 6.0..12.1: U+2260, U+226E, U+226F
484	- nothing new in this Unicode version, no test file to update
485
486	* run & fix ICU4C tests
487	- Andy handles RBBI & spoof check test failures
488
489	* collation: CLDR collation root, UCA DUCET
490
491	- UCA DUCET goes into Mark's Unicode tools, see
492	https://sites.google.com/site/unicodetools/home#TOC-UCA
493	diff the main mapping file, look for bad changes
494	(for example, more bytes per weight for common characters)
495	~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
496	~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
497
498	- CLDR root data files are checked into $CLDR_SRC/common/uca/
499	cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
500
501	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
502	cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
503	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
504	cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
505	(note removing the underscore before "Rules")
506	cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
507	- restore TODO diffs in UCARules.txt
508	meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
509	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
510	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
511	from the CLDR root files (..._CLDR_..._SHORT.txt)
512	cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
513	cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
514	cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
515	- if CLDR common/uca/unihan-index.txt changes, then update
516	CLDR common/collation/root.xml <collation type="private-unihan">
517	and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
518
519	- run genuca, see command line above
520	- rebuild ICU4C
521
522	* Unihan collators
523	https://sites.google.com/site/unicodetools/unihan
524	- run Unicode Tools
525	org.unicode.draft.GenerateUnihanCollators
526	with VM arguments
527	-ea
528	-DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
529	-DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
530	-DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
531	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
532	-DUVERSION=12.1.0
533	- run Unicode Tools
534	org.unicode.draft.GenerateUnihanCollatorFiles
535	with the same arguments
536	- check CLDR diffs
537	cd $CLDR_SRC
538	meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
539	meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
540	- copy to CLDR
541	cd $CLDR_SRC
542	cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
543	cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
544	- run CLDR unit tests, commit to CLDR
545	- generate ICU zh collation data: run CLDR
546	org.unicode.cldr.icu.NewLdml2IcuConverter
547	with program arguments
548	-t collation
549	-s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
550	-m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
551	-d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
552	-p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
553	zh
554	and VM arguments
555	-ea
556	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
557	- rebuild ICU4C
558
559	* run & fix ICU4C tests, now with new CLDR collation root data
560	- run all tests with the collation test data *_SHORT.txt or the full files
561	(the full ones have comments, useful for debugging)
562	- note on intltest: if collate/UCAConformanceTest fails, then
563	utility/MultithreadTest/TestCollators will fail as well;
564	fix the conformance test before looking into the multi-thread test
565
566	* update Java data files
567	- refresh just the UCD/UCA-related/derived files, just to be safe
568	- see (ICU4C)/source/data/icu4j-readme.txt
569	- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
570	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
571	output:
572	...
573	make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
574	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
575	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
576	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
577	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
578	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
579	mkdir -p /tmp/icu4j/main/shared/data
580	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
581	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
582	mkdir -p /tmp/icu4j/main/shared/data
583	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
584	make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
585	- copy the big-endian Unicode data files to another location,
586	separate from the other data files,
587	and then refresh ICU4J
588	cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
589	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
590	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
591	cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
592	cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
593	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
594	cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
595	cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
596	cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
597	jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
598
599	* When refreshing all of ICU4J data from ICU4C
600	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
601	- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
602	or
603	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
604
605	* update CollationFCD.java
606	+ copy & paste the initializers of lcccIndex[] etc. from
607	ICU4C/source/i18n/collationfcd.cpp to
608	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
609
610	* refresh Java test .txt files
611	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
612	cd $ICU_SRC/icu4c/source/data/unidata
613	cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
614	cd ../../test/testdata
615	cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
616	cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
617
618	* run & fix ICU4J tests
619
620	*** API additions
621	- send notice to icu-design about new born-@stable API (enum constants etc.)
622
623	*** CLDR numbering systems
	624	- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
625	for example, look for
626	~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
627	in new blocks (Blocks.txt)
628	Unicode 12: using Unicode 12 CLDR ticket #11478
629	hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
630	wcho 1E2F0..1E2F9 Wancho
631	Unicode 11: using Unicode 11 CLDR ticket #10978
632	rohg 10D30..10D39 Hanifi_Rohingya
633	gong 11DA0..11DA9 Gunjala_Gondi
634	Earlier: CLDR tickets specific to adding new numbering systems.
635	Unicode 10: http://unicode.org/cldr/trac/ticket/10219
636	Unicode 9: http://unicode.org/cldr/trac/ticket/9692
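
A minimal sketch of the digit-four search in Python, mirroring the egrep command above. It assumes that each matching ppucd.txt line is semicolon-separated with the code point in the second field. The sample line is made up for illustration, and the function name is hypothetical. Since nv=4 marks the digit four, the set of ten digits is assumed to start four code points earlier.

```python
import re

# Same idea as the egrep pattern: general category Nd, numeric value 4.
DIGIT_FOUR = re.compile(r";gc=Nd.+;nv=4(;|$)")


def decimal_digit_ranges(lines):
    """Return (zero, nine) code point pairs, one per digit-four line found."""
    ranges = []
    for line in lines:
        line = line.strip()
        if not DIGIT_FOUR.search(line):
            continue
        # Assumption: field 0 is the line type, field 1 is the code point in hex.
        four = int(line.split(";")[1], 16)
        ranges.append((four - 4, four + 5))
    return ranges


# Hypothetical sample line in the assumed format.
sample = ["cp;1E144;gc=Nd;nt=De;nv=4"]
for zero, nine in decimal_digit_ranges(sample):
    print("%04X..%04X" % (zero, nine))
```

For the sample line this prints 1E140..1E149, which is the hmnp range listed above. New ranges still need to be checked against Blocks.txt by hand.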
637
638	*** merge the Unicode update branches back onto the trunk
639	- do not merge the icudata.jar and testdata.jar,
640	instead rebuild them from merged & tested ICU4C
641	- make sure that changes to Unicode tools are checked in:
642	http://www.unicode.org/utility/trac/log/trunk/unicodetools
643
644	---------------------------------------------------------------------------- ***
645
646	Unicode 12.0 update for ICU 64
647
648	http://www.unicode.org/versions/Unicode12.0.0/
649	http://unicode.org/versions/beta-12.0.0.html
650	https://www.unicode.org/review/pri389/
651	http://www.unicode.org/reports/uax-proposed-updates.html
652	http://www.unicode.org/reports/tr44/tr44-23.html
653
654	ICU-20203 Unicode 12
655
656	ICU-20111 move text layout properties data into a data file
657
658	cldrbug 11478: Unicode 12
659	Accidentally used ^/trunk instead of ^/branches/markus/uni12
660
661	* Command-line environment setup
662
663	UNICODE_DATA=~/unidata/uni12/20190309
664	CLDR_SRC=~/svn.cldr/uni
665	ICU_ROOT=~/icu/uni
666	ICU_SRC=$ICU_ROOT/src
667	ICUDT=icudt63b
668	ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
669	ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
670	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
671
672	*** Unicode version numbers
673	- makedata.mak
674	- uchar.h
675	- com.ibm.icu.util.VersionInfo
676	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
677
678	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
679	so that the makefiles see the new version number.
680
681	*** data files & enums & parser code
682
683	* download files
684	- mkdir -p $UNICODE_DATA
685	- download Unicode files into $UNICODE_DATA
686	+ subfolders: emoji, idna, security, ucd, uca
687	+ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
688
689	* for manual diffs and for Unicode Tools input data updates:
690	remove version suffixes from the file names
691	~$ unidata/desuffixucd.py $UNICODE_DATA
692	(see https://sites.google.com/site/unicodetools/inputdata)
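
A minimal sketch of what removing a version suffix could look like. This is not the code of desuffixucd.py. It assumes a suffix of the form "-<major>.<minor>.<update>" with an optional draft marker such as "d5" right before the file extension. The function name and the sample file names are for illustration only.

```python
import re

# Assumed suffix shape: "-12.0.0" or "-12.0.0d5", followed by the extension.
VERSION_SUFFIX = re.compile(
    r"^(?P<base>.+)-\d+\.\d+\.\d+(?:d\d+)?(?P<ext>\.\w+)$")


def desuffix(file_name):
    """Return file_name without its version suffix, or unchanged if it has none."""
    match = VERSION_SUFFIX.match(file_name)
    if match is None:
        return file_name
    return match.group("base") + match.group("ext")


for name in ["UnicodeData-12.0.0d5.txt", "Blocks.txt"]:
    print(name, "->", desuffix(name))
```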
693
694	* process and/or copy files
695	- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
696	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
697	+ For debugging, and tweaking how ppucd.txt is written,
698	the tool has an --only_ppucd option:
699	py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
700
701	- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
702
703	* build ICU (make install)
704	so that the tools build can pick up the new definitions from the installed header files.
705
706	$ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
707
708	* new constants for new property values
709	- preparseucd.py error:
710	ValueError: missing uchar.h enum constants for some property values:
711	[(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
712	u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
713	u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
714	(u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
715	= PropertyValueAliases.txt new property values (diff old & new .txt files)
716	blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
717	blk; Elymaic ; Elymaic
718	blk; Nandinagari ; Nandinagari
719	blk; Nyiakeng_Puachue_Hmong ; Nyiakeng_Puachue_Hmong
720	blk; Ottoman_Siyaq_Numbers ; Ottoman_Siyaq_Numbers
721	blk; Small_Kana_Ext ; Small_Kana_Extension
722	blk; Symbols_And_Pictographs_Ext_A ; Symbols_And_Pictographs_Extended_A
723	blk; Tamil_Sup ; Tamil_Supplement
724	blk; Wancho ; Wancho
725	-> add to uchar.h
726	use long property names for enum constants,
727	for the trailing comment get the block start code point: diff old & new Blocks.txt
728	-> add to UCharacter.UnicodeBlock IDs
729	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
730	replace public static final int \1_ID = \2; \3
731	-> add to UCharacter.UnicodeBlock objects
732	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
733	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
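
The two Eclipse find/replace steps can also be tried outside of Eclipse. A minimal sketch with Python's re module, using the same patterns. The input line is only an illustration of the shape of a uchar.h enum line, and the function names are hypothetical.

```python
import re

# Pattern for the *_ID constants: name, numeric value, trailing comment.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

# Pattern for the UnicodeBlock objects: the number is not captured,
# so the trailing comment is group 2 here.
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = (r'public static final UnicodeBlock \1 = '
               r'new UnicodeBlock("\1", \1_ID); \2')


def block_id_line(c_line):
    """Turn a C enum line into the Java *_ID constant declaration."""
    return ID_FIND.sub(ID_REPLACE, c_line)


def block_object_line(c_line):
    """Turn a C enum line into the Java UnicodeBlock object declaration."""
    return OBJ_FIND.sub(OBJ_REPLACE, c_line)


# Illustrative input line.
line = "UBLOCK_WANCHO = 280, /*[1E2C0]*/"
print(block_id_line(line))
print(block_object_line(line))
```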
734
735	sc ; Elym ; Elymaic
736	sc ; Hmnp ; Nyiakeng_Puachue_Hmong
737	sc ; Nand ; Nandinagari
738	sc ; Wcho ; Wancho
739	-> uscript.h & com.ibm.icu.lang.UScript
740	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
741	and in com.ibm.icu.dev.test.lang.TestUScript.java
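
A minimal sketch for deriving candidate enum constant names from PropertyValueAliases.txt lines like the ones above. It assumes the rule "prefix + uppercased long value name". The prefixes and function names are assumptions for illustration, so the results should still be checked against uchar.h and uscript.h.

```python
# Assumed prefixes for the two properties handled in this step.
PREFIXES = {"blk": "UBLOCK_", "sc": "USCRIPT_"}


def parse_alias_line(line):
    """Split a 'prop; short; long' line into its first three stripped fields."""
    prop, short_name, long_name = [
        field.strip() for field in line.split(";")[:3]]
    return prop, short_name, long_name


def enum_constant(line):
    """Build a candidate C enum constant name from the long value name."""
    prop, _short_name, long_name = parse_alias_line(line)
    return PREFIXES[prop] + long_name.upper()


for alias_line in ["blk; Small_Kana_Ext ; Small_Kana_Extension",
                   "sc ; Hmnp ; Nyiakeng_Puachue_Hmong"]:
    print(enum_constant(alias_line))
```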
742
743	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
744	(not strictly necessary for NOT_ENCODED scripts)
745	$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
746
747	* update spoof checker UnicodeSet initializers:
748	inclusionPat & recommendedPat in uspoof.cpp
749	INCLUSION & RECOMMENDED in SpoofChecker.java
750	- make sure that the Unicode Tools tree contains the latest security data files
751	- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
752	- update the hardcoded version number there in the DIRECTORY path
753	- run the tool (no special environment variables needed)
754	- copy & paste from the Console output into the .cpp & .java files
755
756	* generate normalization data files
757	cd $ICU_ROOT/dbg/icu4c
758	bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
759	bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
760	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
761	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
762	bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
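
The four .nrm commands above differ only in the output name and in the list of input files. A minimal sketch that builds the argument lists from a table, using only the options shown above (-o, -s). Nothing is executed; the function name is hypothetical, and the --csource variant is left out.

```python
import posixpath

# Output file -> input files, in the order given on the command lines above.
NRM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(data_in, unidata):
    """Return one argv list per .nrm file; the commands are not run here."""
    src_dir = posixpath.join(unidata, "norm2")
    return [
        ["bin/gennorm2", "-o", posixpath.join(data_in, out), "-s", src_dir]
        + inputs
        for out, inputs in NRM_INPUTS.items()
    ]


for argv in gennorm2_commands("$ICU4C_DATA_IN", "$ICU4C_UNIDATA"):
    print(" ".join(argv))
```

Each later file layers its own mappings on top of nfc.txt, which is why nfc.txt comes first in every list.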
763
764	* build ICU (make install)
765	so that the tools build can pick up the new definitions from the installed header files.
766
767	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
768
769	* build Unicode tools using CMake+make
770
771	$ICU_SRC/tools/unicode/c/icudefs.txt:
772
773	# Location (--prefix) of where ICU was installed.
774	set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
775	# Location of the ICU4C source tree.
776	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
777
778	$ICU_ROOT/dbg$
779	mkdir -p tools/unicode/c
780	cd tools/unicode/c
781
782	$ICU_ROOT/dbg/tools/unicode/c$
783	cmake ../../../../src/tools/unicode/c
784	make
785
786	* generate core properties data files
787	$ICU_ROOT/dbg/tools/unicode/c$
788	genprops/genprops $ICU_SRC/icu4c
789	genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
790	genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
791	- rebuild ICU (make install) & tools
792
793	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
794	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
795	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
796	- Unicode 6.0..12.0: U+2260, U+226E, U+226F
797	- nothing new in this Unicode version, no test file to update
798
799	* run & fix ICU4C tests
800	- update test of default bidi classes:
801	Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
802	see diffs in DerivedBidiClass.txt
803	+ /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
804	+ UCharacterTest.java TestIteration() defaultBidi[]
805	- Andy handles RBBI & spoof check test failures
806
807	* collation: CLDR collation root, UCA DUCET
808
809	- UCA DUCET goes into Mark's Unicode tools, see
810	https://sites.google.com/site/unicodetools/home#TOC-UCA
811	diff the main mapping file, look for bad changes
812	(for example, more bytes per weight for common characters)
813	~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
814	~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
815
816	- CLDR root data files are checked into $CLDR_SRC/common/uca/
817	cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
818
819	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
820	cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
821	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
822	cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
823	(note removing the underscore before "Rules")
824	cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
825	- restore TODO diffs in UCARules.txt
826	meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
827	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
828	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
829	from the CLDR root files (..._CLDR_..._SHORT.txt)
830	cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
831	cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
832	cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
833	- if CLDR common/uca/unihan-index.txt changes, then update
834	CLDR common/collation/root.xml <collation type="private-unihan">
835	and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
836
837	- run genuca, see command line above;
838	deal with
839	Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
840	FDD1 119CE; [71 CD 02, 05, 05] # Nandinagari first primary (compressible)
841	(add the character to genuca.cpp sampleCharsToScripts[])
842	+ This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
843	and cache its values.
844	Works as long as the script metadata is updated before the collation data.
845	- rebuild ICU4C
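
The genuca change noted above (look up each script's sample string and cache the result) can be pictured as a reverse index from sample character to script. A minimal sketch: the real code is C++ and calls uscript_getSampleUnicodeString(script), while here a hypothetical dictionary stands in for that call. The single entry is taken from the error message above.

```python
# Hypothetical stand-in for the per-script sample strings.
SAMPLES = {"Nand": "\U000119CE"}


def build_sample_index(samples):
    """Invert {script: sample string} into {first code point: script}."""
    index = {}
    for script, sample in samples.items():
        if sample:
            index.setdefault(ord(sample[0]), script)
    return index


# Built once and reused for every first-primary line.
INDEX = build_sample_index(SAMPLES)


def script_for_sample(code_point):
    """Return the script for a sample character, or None if it is unknown."""
    return INDEX.get(code_point)


print(script_for_sample(0x119CE))
```

A lookup that returns None corresponds to the "Unknown script for first-primary sample character" error, which is why the script metadata has to be updated first.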
846
847	* Unihan collators
848	https://sites.google.com/site/unicodetools/unihan
849	- run Unicode Tools
850	org.unicode.draft.GenerateUnihanCollators
851	with VM arguments
852	-ea
853	-DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
854	-DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
855	-DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
856	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
857	-DUVERSION=12.0.0
858	- run Unicode Tools
859	org.unicode.draft.GenerateUnihanCollatorFiles
860	with the same arguments
861	- check CLDR diffs
862	cd $CLDR_SRC
863	meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
864	meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
865	- copy to CLDR
866	cd $CLDR_SRC
867	cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
868	cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
869	- run CLDR unit tests, commit to CLDR
870	- generate ICU zh collation data: run CLDR
871	org.unicode.cldr.icu.NewLdml2IcuConverter
872	with program arguments
873	-t collation
874	-s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
875	-m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
876	-d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
877	-p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
878	zh
879	and VM arguments
880	-ea
881	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
882	- rebuild ICU4C
883
884	* run & fix ICU4C tests, now with new CLDR collation root data
885	- run all tests with the collation test data *_SHORT.txt or the full files
886	(the full ones have comments, useful for debugging)
887	- note on intltest: if collate/UCAConformanceTest fails, then
888	utility/MultithreadTest/TestCollators will fail as well;
889	fix the conformance test before looking into the multi-thread test
890
891	* update Java data files
892	- refresh just the UCD/UCA-related/derived files, just to be safe
893	- see (ICU4C)/source/data/icu4j-readme.txt
894	- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
895	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
896	output:
897	...
898	Unicode .icu files built to ./out/build/icudt63l
899	echo timestamp > uni-core-data
900	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
901	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
902	echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
903	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
904	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
905	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
906	mkdir -p /tmp/icu4j/main/shared/data
907	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
908	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
909	mkdir -p /tmp/icu4j/main/shared/data
910	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
911	make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
912	- copy the big-endian Unicode data files to another location,
913	separate from the other data files,
914	and then refresh ICU4J
915	cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
916	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
917	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
918	cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
919	cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
920	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
921	cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
922	cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
923	cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
924	jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
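
A minimal sketch of which files the cp and rm commands above select, written as a filter over paths relative to the $ICUDT folder. The function names and the sample file names are for illustration only.

```python
import fnmatch

# Patterns taken from the cp commands above, relative to the $ICUDT folder.
INCLUDE = ["confusables.cfu", "*.icu", "*.nrm", "coll/*", "brkitr/*"]
# cnvalias.icu is removed again after the *.icu copy.
EXCLUDE = ["cnvalias.icu"]


def _matches(path, pattern):
    # Shell globs do not cross "/" boundaries, so compare the folder part exactly.
    path_dir, _, path_name = path.rpartition("/")
    pat_dir, _, pat_name = pattern.rpartition("/")
    return path_dir == pat_dir and fnmatch.fnmatchcase(path_name, pat_name)


def unicode_data_files(relative_paths):
    """Return the paths that the copy steps would keep, in input order."""
    selected = []
    for path in relative_paths:
        if any(_matches(path, pattern) for pattern in EXCLUDE):
            continue
        if any(_matches(path, pattern) for pattern in INCLUDE):
            selected.append(path)
    return selected


print(unicode_data_files(
    ["uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/ucadata.icu",
     "brkitr/char.brk", "zoneinfo64.res", "confusables.cfu"]))
```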
925
926	* When refreshing all of ICU4J data from ICU4C
927	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
928	- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
929	or
930	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
931
932	* update CollationFCD.java
933	+ copy & paste the initializers of lcccIndex[] etc. from
934	ICU4C/source/i18n/collationfcd.cpp to
935	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
936
937	* refresh Java test .txt files
938	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
939	cd $ICU_SRC/icu4c/source/data/unidata
940	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
941	cd ../../test/testdata
942	cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
943	cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
944
945	* run & fix ICU4J tests
946
947	*** API additions
948	- send notice to icu-design about new born-@stable API (enum constants etc.)
949
950	*** CLDR numbering systems
951	- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
952	for example, look for
953	~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
954	in new blocks (Blocks.txt)
955	Unicode 12: using Unicode 12 CLDR ticket #11478
956	hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
957	wcho 1E2F0..1E2F9 Wancho
958	Unicode 11: using Unicode 11 CLDR ticket #10978
959	rohg 10D30..10D39 Hanifi_Rohingya
960	gong 11DA0..11DA9 Gunjala_Gondi
961	Earlier: CLDR tickets specific to adding new numbering systems.
962	Unicode 10: http://unicode.org/cldr/trac/ticket/10219
963	Unicode 9: http://unicode.org/cldr/trac/ticket/9692
964
965	*** merge the Unicode update branches back onto the trunk
966	- do not merge the icudata.jar and testdata.jar,
967	instead rebuild them from merged & tested ICU4C
968	- make sure that changes to Unicode tools are checked in:
969	http://www.unicode.org/utility/trac/log/trunk/unicodetools
970
971	---------------------------------------------------------------------------- ***
972
973	ICU 63: added ICU support for the text layout properties InPC, InSC, vo
974
975	* Command-line environment setup
976
977	UNICODE_DATA=~/unidata/uni11/20180609
978	CLDR_SRC=~/svn.cldr/uni
979	ICU_ROOT=~/icu/mine
980	ICU_SRC=$ICU_ROOT/src
981	ICUDT=icudt62b
982	ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
983	ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
984	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
985
986	*** Links
987
988	https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
989	https://unicode-org.atlassian.net/browse/ICU-12850 vo
990
991	*** data files & enums & parser code
992
993	* API additions
994	- for each of the three new enumerated properties
995	+ uchar.h: add the enum UProperty constant UCHAR_<long prop name>
996	+ uchar.h: update UCHAR_INT_LIMIT
997	+ uchar.h: add the enum U<long prop name>
998	with constants U_<short prop name>_<long value name>
999	+ UProperty.java: add the constant <long prop name>
1000	+ UProperty.java: update INT_LIMIT
1001	+ UCharacter.java: add the interface <long prop name>
1002	with constants <long value name>
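
A minimal sketch of the naming patterns listed above, applied to one property. The casing rules (uppercase constants, underscores dropped for type names) are assumptions read off the patterns, and the function names are hypothetical.

```python
def c_property_constant(long_prop):
    """UCHAR_<long prop name>"""
    return "UCHAR_" + long_prop.upper()


def c_enum_type(long_prop):
    """U<long prop name>, with underscores dropped (assumed)."""
    return "U" + long_prop.replace("_", "")


def c_value_constant(short_prop, long_value):
    """U_<short prop name>_<long value name>"""
    return "U_%s_%s" % (short_prop.upper(), long_value.upper())


def java_property_constant(long_prop):
    """UProperty constant <long prop name>"""
    return long_prop.upper()


def java_interface(long_prop):
    """UCharacter interface <long prop name>, with underscores dropped (assumed)."""
    return long_prop.replace("_", "")


print(c_property_constant("Indic_Positional_Category"))
print(c_enum_type("Indic_Positional_Category"))
print(c_value_constant("InPC", "Bottom"))
print(java_property_constant("Indic_Positional_Category"))
print(java_interface("Indic_Positional_Category"))
```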
1003
1004	* process and/or copy files
1005	- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1006	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1007	+ It also writes tools/unicode/c/genprops/pnames_data.h with property and value
1008	names and aliases.
1009	+ For debugging, and tweaking how ppucd.txt is written,
1010	the tool has an --only_ppucd option:
1011	py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1012
1013	* preparseucd.py changes
1014	- add new property short names (uppercase) to _prop_and_value_re
1015	so that ParseUCharHeader() parses the new enum constants
1016
1017	* build ICU (make install)
1018	so that the tools build can pick up the new definitions from the installed header files.
1019
1020	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1021
1022	* build Unicode tools using CMake+make
1023
1024	$ICU_SRC/tools/unicode/c/icudefs.txt:
1025
1026	# Location (--prefix) of where ICU was installed.
1027	set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1028	# Location of the ICU4C source tree.
1029	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
1030
1031	$ICU_ROOT/dbg$
1032	mkdir -p tools/unicode/c
1033	cd tools/unicode/c
1034
1035	$ICU_ROOT/dbg/tools/unicode/c$
1036	cmake ../../../../src/tools/unicode/c
1037	make
1038
1039	* generate core properties data files
1040	$ICU_ROOT/dbg/tools/unicode/c$
1041	genprops/genprops $ICU_SRC/icu4c
1042	- rebuild ICU (make install) & tools
1043
1044	* write data for runtime, hardcoded for now
1045	- add genprops/layoutpropsbuilder.cpp with pieces from sibling files
1046	- generate new icu4c/source/common/ulayout_props_data.h
1047	- for each of the three new enumerated properties
1048	+ int property max value
1049	+ small, 8-bit UCPTrie
1050	(A small 16-bit trie with bit fields for these three properties
1051	is very nearly the same size as the sum of the three.)
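
A minimal sketch of the 16-bit alternative mentioned in the parenthesis above: three property values packed into bit fields of one trie value. This is not what was implemented (that is three 8-bit tries). The field widths are assumptions, and the real maximum values are the ones printed by genprops.

```python
# Assumed field widths in bits, lowest field first; 5 + 6 + 2 = 13 bits.
FIELDS = (("inpc", 5), ("insc", 6), ("vo", 2))


def pack(values):
    """Pack the three property values into one integer below 1 << 16."""
    packed = 0
    shift = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError("%s value %d does not fit in %d bits"
                             % (name, value, width))
        packed |= value << shift
        shift += width
    return packed


def unpack(packed):
    """Split a packed integer back into the three property values."""
    values = {}
    shift = 0
    for name, width in FIELDS:
        values[name] = (packed >> shift) & ((1 << width) - 1)
        shift += width
    return values


sample = {"inpc": 3, "insc": 17, "vo": 2}
print(unpack(pack(sample)) == sample)
```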
1052
1053	* wire into C++
1054	- uprops.cpp: #include ulayout_props_data.h
1055	- uprops.cpp: add getInPC() etc. functions
1056	- uprops.cpp: add lines to intProps[], include max values
1057	- uprops.h: add UPropertySource constants
1058	- uprops.cpp: add uprops_addPropertyStarts(src)
1059	- uniset_props.cpp: add to UnicodeSet_initInclusion()
1060	- intltest/ucdtest.cpp: write unit tests
1061
1062	* update Java data files
1063	- refresh just the pnames.icu file with the new property [value] names, just to be safe
1064	- see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
1065	- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1066	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1067	- copy the big-endian Unicode data files to another location,
1068	separate from the other data files,
1069	and then refresh ICU4J
1070	cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1071	cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1072	jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1073
1074	* wire into Java
1075	- UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
1076	- UCharacterProperty.java: for each new property
1077	+ create a nested class to hold its CodePointTrie
1078	+ initialize it from a string literal
1079	+ paste in the initializer printed by genprops
1080	+ add a new IntProperty object to the intProps[] array
1081	+ use the correct max int value for each property, also printed by genprops
1082	- UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
1083	- UnicodeSet.java: add to getInclusions()
1084	- UCharacterTest.java: write unit tests
1085
1086	---------------------------------------------------------------------------- ***
1087
0f5d89e8 A	1088	Unicode 11.0 update for ICU 62
	1089
	1090	http://www.unicode.org/versions/Unicode11.0.0/
	1091	http://unicode.org/versions/beta-11.0.0.html
	1092	https://www.unicode.org/review/pri372/
	1093	http://www.unicode.org/reports/uax-proposed-updates.html
	1094	http://www.unicode.org/reports/tr44/tr44-21.html
	1095
	1096	* Command-line environment setup
	1097
	1098	UNICODE_DATA=~/unidata/uni11/20180521
	1099	CLDR_SRC=~/svn.cldr/uni
	1100	ICU_ROOT=~/svn.icu/uni
	1101	ICU_SRC=$ICU_ROOT/src
	1102	ICUDT=icudt61b
	1103	ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
	1104	ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
	1105	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
	1106
	1107	*** ICU Trac
	1108
	1109	- ticket:13630: Unicode 11
	1110	- ^/branches/markus/uni11
	1111
	1112	*** CLDR Trac
	1113
	1114	- cldrbug 10978: Unicode 11
	1115	- ^/branches/markus/uni11
	1116
	1117	*** Unicode version numbers
	1118	- makedata.mak
	1119	- uchar.h
	1120	- com.ibm.icu.util.VersionInfo
	1121	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	1122
	1123	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	1124	so that the makefiles see the new version number.
	1125
	1126	*** data files & enums & parser code
	1127
	1128	* download files
	1129	- mkdir -p $UNICODE_DATA
	1130	- download Unicode files into $UNICODE_DATA
	1131	+ subfolders: emoji, idna, security, ucd, uca
	1132	+ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
	1133
	1134	* for manual diffs and for Unicode Tools input data updates:
	1135	remove version suffixes from the file names
	1136	~$ unidata/desuffixucd.py $UNICODE_DATA
	1137	(see https://sites.google.com/site/unicodetools/inputdata)
	1138
	1139	* process and/or copy files
	1140	- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
	1141	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	1142	+ For debugging, and tweaking how ppucd.txt is written,
	1143	the tool has an --only_ppucd option:
	1144	py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
	1145
	1146	- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
	1147
	1148	* build ICU (make install)
	1149	so that the tools build can pick up the new definitions from the installed header files.
	1150
	1151	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1152
1153	* preparseucd.py changes
1154	- fix other errors
1155	NameError: unknown property Extended_Pictographic
1156	-> add Extended_Pictographic binary property
1157	-> add new short names for all Emoji properties
1158
1159	* new constants for new property values
1160	- preparseucd.py error:
1161	ValueError: missing uchar.h enum constants for some property values:
1162	[(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
1163	u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
1164	u'Indic_Siyaq_Numbers'])),
1165	(u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
1166	(u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
1167	(u'GCB', set([u'LinkC', u'Virama'])),
1168	(u'WB', set([u'WSegSpace']))]
1169	= PropertyValueAliases.txt new property values (diff old & new .txt files)
1170	blk; Chess_Symbols ; Chess_Symbols
1171	blk; Dogra ; Dogra
1172	blk; Georgian_Ext ; Georgian_Extended
1173	blk; Gunjala_Gondi ; Gunjala_Gondi
1174	blk; Hanifi_Rohingya ; Hanifi_Rohingya
1175	blk; Indic_Siyaq_Numbers ; Indic_Siyaq_Numbers
1176	blk; Makasar ; Makasar
1177	blk; Mayan_Numerals ; Mayan_Numerals
1178	blk; Medefaidrin ; Medefaidrin
1179	blk; Old_Sogdian ; Old_Sogdian
1180	blk; Sogdian ; Sogdian
1181	-> add to uchar.h
1182	use long property names for enum constants,
1183	for the trailing comment get the block start code point: diff old & new Blocks.txt
1184	-> add to UCharacter.UnicodeBlock IDs
1185	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1186	replace public static final int \1_ID = \2; \3
1187	-> add to UCharacter.UnicodeBlock objects
1188	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1189	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1190
1191	GCB; LinkC ; LinkingConsonant
1192	GCB; Virama ; Virama
1193	-> uchar.h & UCharacter.GraphemeClusterBreak
1194	-> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
1195
1196	InSC; Consonant_Initial_Postfixed ; Consonant_Initial_Postfixed
1197	-> ignore: ICU does not yet support this property
1198
1199	jg ; Hanifi_Rohingya_Kinna_Ya ; Hanifi_Rohingya_Kinna_Ya
1200	jg ; Hanifi_Rohingya_Pa ; Hanifi_Rohingya_Pa
1201	-> uchar.h & UCharacter.JoiningGroup
1202
1203	sc ; Dogr ; Dogra
1204	sc ; Gong ; Gunjala_Gondi
1205	sc ; Maka ; Makasar
1206	sc ; Medf ; Medefaidrin
1207	sc ; Rohg ; Hanifi_Rohingya
1208	sc ; Sogd ; Sogdian
1209	sc ; Sogo ; Old_Sogdian
1210	-> uscript.h & com.ibm.icu.lang.UScript
1211	-> Nushu had been added already
1212	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1213	and in com.ibm.icu.dev.test.lang.TestUScript.java
1214
1215	WB ; WSegSpace ; WSegSpace
1216	-> uchar.h & UCharacter.WordBreak
1217
1218	* New short names for emoji properties
1219	- see UTS #51
1220	- short names set in preparseucd.py
1221
1222	* New properties
1223	- boolean emoji property Extended_Pictographic
1224	-> added in preparseucd.py
1225	-> uchar.h & UProperty.java
1226	- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
1227	as shown in PropertyValueAliases.txt
1228	-> ignore for now
1229
1230	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1231	(not strictly necessary for NOT_ENCODED scripts)
1232	$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1233
1234	* update spoof checker UnicodeSet initializers:
1235	inclusionPat & recommendedPat in uspoof.cpp
1236	INCLUSION & RECOMMENDED in SpoofChecker.java
1237	- make sure that the Unicode Tools tree contains the latest security data files
1238	- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1239	- update the hardcoded version number there in the DIRECTORY path
1240	- run the tool (no special environment variables needed)
1241	- copy & paste from the Console output into the .cpp & .java files
1242
1243	* generate normalization data files
1244	cd $ICU_ROOT/dbg/icu4c
1245	bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1246	bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
1247	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1248	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1249	bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1250
1251	* build ICU (make install)
1252	so that the tools build can pick up the new definitions from the installed header files.
1253
1254	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1255
1256	* build Unicode tools using CMake+make
1257
1258	$ICU_SRC/tools/unicode/c/icudefs.txt:
1259
1260	# Location (--prefix) of where ICU was installed.
1261	set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1262	# Location of the ICU4C source tree.
1263	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
1264
1265	$ICU_ROOT/dbg$
1266	mkdir -p tools/unicode/c
1267	cd tools/unicode/c
1268
1269	$ICU_ROOT/dbg/tools/unicode/c$
1270	cmake ../../../../src/tools/unicode/c
1271	make
1272
1273	* generate core properties data files
1274	$ICU_ROOT/dbg/tools/unicode/c$
1275	genprops/genprops $ICU_SRC/icu4c
1276	genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1277	genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1278	- rebuild ICU (make install) & tools
1279
1280	* Fix case props
1281	genprops error: casepropsbuilder: too many exceptions words
1282	genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
1283	- With the addition of Georgian Mtavruli capital letters,
1284	there are now too many simple case mappings with big mapping deltas
1285	that yield uncompressible exceptions.
1286	- Changing the data structure (now formatVersion 4),
1287	adding one bit for no-simple-case-folding (for Cherokee), and
1288	one optional slot for a big delta (for most faraway mappings),
1289	together with another bit for whether that is negative.
1290	This makes most Cherokee & Georgian etc. case mappings compressible,
1291	reducing the number of exceptions words.
1292	- Further changes to gain one more bit for the exceptions index,
	1293	for future growth. For details, see casepropsbuilder.cpp.
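
A minimal sketch of the "big delta plus sign bit" idea described above: a faraway simple case mapping stored as an unsigned distance and a flag for whether the distance is negative. This does not reproduce the exceptions word layout of formatVersion 4, and the function names are hypothetical.

```python
def encode_delta(code_point, mapped):
    """Return (magnitude, is_negative) for the distance to the mapped code point."""
    delta = mapped - code_point
    return abs(delta), delta < 0


def apply_delta(code_point, magnitude, is_negative):
    """Recover the mapped code point from the stored magnitude and sign flag."""
    if is_negative:
        return code_point - magnitude
    return code_point + magnitude


# Georgian letter an: U+10D0 (Mkhedruli) maps to U+1C90 (Mtavruli) and back.
up = encode_delta(0x10D0, 0x1C90)
down = encode_delta(0x1C90, 0x10D0)
print(up, down)
print("%04X" % apply_delta(0x10D0, *up))
```

Both directions share the same magnitude (0xBC0), so one unsigned slot plus one sign bit covers the uppercase and the lowercase mapping.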
1294
1295	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1296	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1297	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1298	- Unicode 6.0..11.0: U+2260, U+226E, U+226F
1299	- nothing new in this Unicode version, no test file to update
1300
1301	* run & fix ICU4C tests
1302	- Andy handles RBBI & spoof check test failures
1303
1304	- Errors in char.txt, word.txt, word_POSIX.txt like
1305	createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET" at line 46, column 16
1306	because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
1307	-> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
1308	not empty, just to get ICU building.
1309	-> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
1310	and properties together with the rules that used them (GB 10, WB 14).
1311	-> Andy adjusts the rule sets further to sync with
1312	Unicode 11 grapheme, word, and line break spec changes.
1313
1314	* collation: CLDR collation root, UCA DUCET
1315
1316	- UCA DUCET goes into Mark's Unicode tools, see
1317	https://sites.google.com/site/unicodetools/home#TOC-UCA
1318	diff the main mapping file, look for bad changes
1319	(for example, more bytes per weight for common characters)
1320	~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
1321	~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
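
In addition to the sed and meld diff, the "more bytes per weight" check could be approximated with a small script. A sketch, assuming mapping lines look like the FractionalUCA.txt line quoted in the genuca error further down (code points, a semicolon, then bracketed collation elements with comma-separated weight levels). It only inspects the first collation element, and special lines such as the top_byte table are not handled.

```python
import re

CE_PATTERN = re.compile(r"\[([^\]]*)\]")


def primary_byte_count(line):
    """Return the number of primary weight bytes in the first CE, or None."""
    data = line.split("#", 1)[0]       # ignore the trailing comment
    match = CE_PATTERN.search(data)
    if match is None:
        return None
    primary = match.group(1).split(",")[0]
    return len(primary.split())


def primary_grew(old_line, new_line):
    """True if the new mapping uses more primary bytes than the old one."""
    old, new = primary_byte_count(old_line), primary_byte_count(new_line)
    return old is not None and new is not None and new > old
```

Running primary_grew over matching lines of the old and new files would flag characters whose primary weights got longer.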
1322
1323	- CLDR root data files are checked into $CLDR_SRC/common/uca/
1324	cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1325
1326	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1327	cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1328	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1329	cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1330	(note removing the underscore before "Rules")
1331	cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1332	- restore TODO diffs in UCARules.txt
1333	meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1334	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1335	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1336	from the CLDR root files (..._CLDR_..._SHORT.txt)
1337	cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1338	cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1339	cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1340	- if CLDR common/uca/unihan-index.txt changes, then update
1341	CLDR common/collation/root.xml <collation type="private-unihan">
1342	and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1343
1344	- run genuca, see command line above;
1345	deal with
1346	Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1347	FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)
1348	(add the character to genuca.cpp sampleCharsToScripts[])
1349	+ look up the USCRIPT_ code for the new sample characters
1350	(should be obvious from the comment in the error output)
1351	+ add mappings to sampleCharsToScripts[], do not replace them
1352	(in case the script sample characters flip-flop)
1353	+ insert new scripts in DUCET script order, see the top_byte table
1354	at the beginning of FractionalUCA.txt
1355	- rebuild ICU4C
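
To collect all new first-primary sample characters at once instead of fixing genuca errors one by one, the FDD1 lines can be scanned directly. A minimal sketch, assuming every such line has the shape shown in the error output above; the function name is made up, and the script name still has to be turned into the matching USCRIPT_ constant by hand.

```python
import re

FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(\S+) first primary")


def sample_char_and_script(line):
    """Return (code point, script name from the comment), or None."""
    match = FIRST_PRIMARY.match(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


dogra = sample_char_and_script(
    "FDD1 1180B; [71 CC 02, 05, 05] # Dogra first primary (compressible)")
```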
1356
1357	* Unihan collators
1358	https://sites.google.com/site/unicodetools/unihan
1359	- run Unicode Tools
1360	org.unicode.draft.GenerateUnihanCollators
1361	with VM arguments
1362	-ea
1363	-DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1364	-DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1365	-DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1366	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1367	-DUVERSION=11.0.0
1368	- run Unicode Tools
1369	org.unicode.draft.GenerateUnihanCollatorFiles
1370	with the same arguments
1371	- check CLDR diffs
1372	cd $CLDR_SRC
1373	meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1374	meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1375	- copy to CLDR
1376	cd $CLDR_SRC
1377	cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1378	cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1379	- run CLDR unit tests, commit to CLDR
1380	- generate ICU zh collation data: run CLDR
1381	org.unicode.cldr.icu.NewLdml2IcuConverter
1382	with program arguments
1383	-t collation
1384	-s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1385	-m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1386	-d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
1387	-p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
1388	zh
1389	and VM arguments
1390	-ea
1391	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1392	- rebuild ICU4C
1393
1394	* run & fix ICU4C tests, now with new CLDR collation root data
1395	- run all tests with the collation test data *_SHORT.txt or the full files
1396	(the full ones have comments, useful for debugging)
1397	- note on intltest: if collate/UCAConformanceTest fails, then
1398	utility/MultithreadTest/TestCollators will fail as well;
1399	fix the conformance test before looking into the multi-thread test
1400
1401	* update Java data files
1402	- refresh just the UCD/UCA-related/derived files, just to be safe
1403	- see (ICU4C)/source/data/icu4j-readme.txt
1404	- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1405	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1406	output:
1407	...
1408	Unicode .icu files built to ./out/build/icudt61l
1409	echo timestamp > uni-core-data
1410	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1411	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
1412	echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1413	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1414	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
1415	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
1416	mkdir -p /tmp/icu4j/main/shared/data
1417	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1418	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
1419	mkdir -p /tmp/icu4j/main/shared/data
1420	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1421	make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
1422	- copy the big-endian Unicode data files to another location,
1423	separate from the other data files,
1424	and then refresh ICU4J
1425	cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1426	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1427	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1428	cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1429	cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1430	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1431	cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1432	cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1433	cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1434	jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1435
1436	* When refreshing all of ICU4J data from ICU4C
1437	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1438	- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1439	or
1440	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1441
1442	* update CollationFCD.java
1443	+ copy & paste the initializers of lcccIndex[] etc. from
1444	ICU4C/source/i18n/collationfcd.cpp to
1445	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1446
1447	* refresh Java test .txt files
1448	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1449	cd $ICU_SRC/icu4c/source/data/unidata
1450	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1451	cd ../../test/testdata
1452	cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1453	cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1454
1455	* run & fix ICU4J tests
1456
1457	*** API additions
1458	- send notice to icu-design about new born-@stable API (enum constants etc.)
1459
1460	*** CLDR numbering systems
1461	- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
1462	Unicode 11: using Unicode 11 CLDR ticket #10978
1463	rohg 10D30..10D39 Hanifi_Rohingya
1464	gong 11DA0..11DA9 Gunjala_Gondi
1465	Earlier: CLDR tickets specific to adding new numbering systems.
1466	Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1467	Unicode 9: http://unicode.org/cldr/trac/ticket/9692
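
One way to list candidate digit sets is Python's unicodedata module. A sketch only: it reflects whatever Unicode version the Python build ships (see unicodedata.unidata_version), so it finds the Unicode 11 sets only on a recent enough Python. It assumes that each digit 4 sits in a contiguous run of ten digits, which holds for decimal digits in Unicode.

```python
import sys
import unicodedata


def decimal_digit_ranges():
    """Return (start, end) of each run of ten decimal digits, located via digit 4."""
    ranges = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # gc=Nd & nv=4 identifies exactly one character per digit set.
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            ranges.append((cp - 4, cp + 5))
    return ranges


ranges = decimal_digit_ranges()
```

Comparing the output of two Python versions built on consecutive Unicode versions gives the new sets to propose to CLDR.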
1468
1469	*** merge the Unicode update branches back onto the trunk
1470	- do not merge the icudata.jar and testdata.jar,
1471	instead rebuild them from merged & tested ICU4C
1472	- make sure that changes to Unicode tools are checked in:
1473	http://www.unicode.org/utility/trac/log/trunk/unicodetools
1474
1475	---------------------------------------------------------------------------- ***
1476
6be67b06 A	1477	Unicode 10.0 update for ICU 60
	1478
	1479	http://www.unicode.org/versions/Unicode10.0.0/
	1480	http://www.unicode.org/versions/beta-10.0.0.html
	1481	http://blog.unicode.org/2017/03/unicode-100-beta-review.html
	1482	http://www.unicode.org/review/pri350/
	1483	http://www.unicode.org/reports/uax-proposed-updates.html
	1484	http://www.unicode.org/reports/tr44/tr44-19.html
	1485
	1486	* Command-line environment setup
	1487
	1488	UNICODE_DATA=~/unidata/uni10/20170605
	1489	CLDR_SRC=~/svn.cldr/uni10
	1490	ICU_ROOT=~/svn.icu/uni10
	1491	ICU_SRC=$ICU_ROOT/src
	1492	ICUDT=icudt60b
	1493	ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
	1494	ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
	1495	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
	1496
	1497	*** ICU Trac
	1498
	1499	- ticket:12985: Unicode 10
	1500	- ticket:13061: undo hacks from emoji 5.0 update
	1501	- ticket:13062: add Emoji_Component property
	1502	- ^/branches/markus/uni10
	1503
	1504	*** CLDR Trac
	1505
	1506	- cldrbug 10055: Unicode 10
	1507	- cldrbug 9882: Unicode 10 script metadata
	1508	- cldrbug 10219: numbering systems for Unicode 10
	1509
	1510	*** Unicode version numbers
	1511	- makedata.mak
	1512	- uchar.h
	1513	- com.ibm.icu.util.VersionInfo
	1514	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	1515
	1516	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	1517	so that the makefiles see the new version number.
	1518
	1519	*** data files & enums & parser code
	1520
	1521	* download files
	1522	- mkdir -p $UNICODE_DATA
	1523	- download Unicode 10.0 files into $UNICODE_DATA
	1524	+ subfolders: ucd, uca, idna, security
	1525	+ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
	1526	- download emoji 5.0 files into $UNICODE_DATA/emoji
	1527
	1528	* for manual diffs: remove version suffixes from the file names
	1529	~$ unidata/desuffixucd.py $UNICODE_DATA
	1530	(see https://sites.google.com/site/unicodetools/inputdata)
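
For illustration, the renaming idea looks roughly like this. It is not the code of desuffixucd.py. The suffix shape (a hyphen, a three-part version, and an optional draft marker such as d3) is an assumption based on typical UCD file names.

```python
import re

# Assumed suffix shape: "-10.0.0" or "-10.0.0d3" right before the file extension.
VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def desuffix(name):
    """Strip a version suffix from a file name, if it has one."""
    return VERSION_SUFFIX.sub("", name)
```

Files without a suffix, such as confusables.txt, come back unchanged.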
	1531
	1532	* process and/or copy files
	1533	- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
	1534	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	1535	+ For debugging, and tweaking how ppucd.txt is written,
	1536	the tool has an --only_ppucd option:
	1537	py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
	1538
	1539	- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
	1540
1541	* build ICU (make install)
1542	so that the tools build can pick up the new definitions from the installed header files.
1543
1544	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1545
1546	* preparseucd.py changes
1547	- remove or add new Unicode scripts from/to the
1548	only-in-ISO-15924 list according to the error messages:
1549	ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
1550	-> adjust _scripts_only_in_iso15924 as indicated
1551	- fix other errors
1552	Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
1553	-> add vo=Vertical_Orientation to _ignored_properties
1554	-> later removed again, parsing the file, even though we do not yet store data for runtime use
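
The first error message suggests a consistency check along these lines. This is a hypothetical reconstruction for illustration, not code copied from preparseucd.py; the function name and the sample sets are made up.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Fail if a script listed as ISO-15924-only is now encoded in Unicode."""
    stale = sorted(only_in_iso & unicode_scripts)
    if stale:
        raise ValueError(
            "remove %s from _scripts_only_in_iso15924" % stale)


# Sample data: Nshu is both in the ISO-only list and among the Unicode scripts.
try:
    check_scripts_only_in_iso15924({"Aran", "Latf", "Nshu"}, {"Latn", "Nshu"})
    message = None
except ValueError as e:
    message = str(e)
```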
1555
1556	* new constants for new property values
1557	- preparseucd.py error:
1558	ValueError: missing uchar.h enum constants for some property values:
1559	[(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
1560	u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
1561	(u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
1562	u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
1563	u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
1564	(u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
1565	= PropertyValueAliases.txt new property values (diff old & new .txt files)
1566	blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
1567	blk; Kana_Ext_A ; Kana_Extended_A
1568	blk; Masaram_Gondi ; Masaram_Gondi
1569	blk; Nushu ; Nushu
1570	blk; Soyombo ; Soyombo
1571	blk; Syriac_Sup ; Syriac_Supplement
1572	blk; Zanabazar_Square ; Zanabazar_Square
1573	-> add to uchar.h
1574	use long property names for enum constants,
1575	for the trailing comment get the block start code point: diff old & new Blocks.txt
1576	-> add to UCharacter.UnicodeBlock IDs
1577	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1578	replace public static final int \1_ID = \2; \3
1579	-> add to UCharacter.UnicodeBlock objects
1580	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
1581	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
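
Outside Eclipse, the same find and replace can be done with Python's re module. A sketch using the patterns above, with one shared pattern so that the comment is always group 3. The sample enum line and its numeric value are for illustration and should be checked against uchar.h.

```python
import re

UBLOCK = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_id(line):
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock ID constant."""
    return UBLOCK.sub(r"public static final int \1_ID = \2; \3", line)


def to_java_object(line):
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock object declaration."""
    return UBLOCK.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        line)


sample = "UBLOCK_NUSHU = 277, /*[1B170]*/"  # illustrative line
```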
1582
1583	jg ; Malayalam_Bha ; Malayalam_Bha
1584	jg ; Malayalam_Ja ; Malayalam_Ja
1585	jg ; Malayalam_Lla ; Malayalam_Lla
1586	jg ; Malayalam_Llla ; Malayalam_Llla
1587	jg ; Malayalam_Nga ; Malayalam_Nga
1588	jg ; Malayalam_Nna ; Malayalam_Nna
1589	jg ; Malayalam_Nnna ; Malayalam_Nnna
1590	jg ; Malayalam_Nya ; Malayalam_Nya
1591	jg ; Malayalam_Ra ; Malayalam_Ra
1592	jg ; Malayalam_Ssa ; Malayalam_Ssa
1593	jg ; Malayalam_Tta ; Malayalam_Tta
1594	-> uchar.h & UCharacter.JoiningGroup
1595
1596	sc ; Gonm ; Masaram_Gondi
1597	sc ; Nshu ; Nushu
1598	sc ; Soyo ; Soyombo
1599	sc ; Zanb ; Zanabazar_Square
1600	-> uscript.h & com.ibm.icu.lang.UScript
1601	-> Nushu had been added already
1602	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1603	and in com.ibm.icu.dev.test.lang.TestUScript.java
1604
1605	* New properties as shown in PropertyValueAliases.txt changes
1606	- boolean Emoji_Component from emoji 5
1607	-> uchar.h & UProperty.java
1608	- boolean
1609	# Regional_Indicator (RI)
1610
1611	RI ; N ; No ; F ; False
1612	RI ; Y ; Yes ; T ; True
1613	-> uchar.h & UProperty.java
1614	-> single immutable range, to be hardcoded
1615	- boolean
1616	# Prepended_Concatenation_Mark (PCM)
1617
1618	PCM; N ; No ; F ; False
1619	PCM; Y ; Yes ; T ; True
1620	-> was new in Unicode 9
1621	-> uchar.h & UProperty.java
1622	- enumerated
1623	# Vertical_Orientation (vo)
1624
1625	vo ; R ; Rotated
1626	vo ; Tr ; Transformed_Rotated
1627	vo ; Tu ; Transformed_Upright
1628	vo ; U ; Upright
1629	-> only pre-parsed for now, but not yet stored for runtime use
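
To diff old and new PropertyValueAliases.txt by content rather than by text, the value lines can be parsed into (property, aliases) pairs. A minimal sketch, assuming the semicolon-separated layout shown above; the function names are made up.

```python
def parse_alias_line(line):
    """Return (property alias, [value aliases]) or None for blank/comment lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    return fields[0], fields[1:]


def new_values(old_lines, new_lines):
    """Return the sorted (property, short value alias) pairs only in new_lines."""
    def pairs(lines):
        parsed = (parse_alias_line(line) for line in lines)
        return {(p[0], p[1][0]) for p in parsed if p and p[1]}
    return sorted(pairs(new_lines) - pairs(old_lines))
```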
1630
1631	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1632	(not strictly necessary for NOT_ENCODED scripts)
1633	$ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1634
1635	* generate normalization data files
1636	cd $ICU_ROOT/dbg/icu4c
1637	bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1638	bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
1639	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1640	bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1641	bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1642
1643	* build ICU (make install)
1644	so that the tools build can pick up the new definitions from the installed header files.
1645
1646	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1647
1648	* build Unicode tools using CMake+make
1649
1650	$ICU_SRC/tools/unicode/c/icudefs.txt:
1651
1652	# Location (--prefix) of where ICU was installed.
1653	set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1654	# Location of the ICU4C source tree.
1655	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
1656
1657	$ICU_ROOT/dbg/tools/unicode/c$
1658	cmake ../../../../src/tools/unicode/c
1659	make
1660
1661	* generate core properties data files
1662	$ICU_ROOT/dbg/tools/unicode/c$
1663	genprops/genprops $ICU_SRC/icu4c
1664	genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1665	genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1666	- rebuild ICU (make install) & tools
1667
1668	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1669	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1670	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1671	- Unicode 6.0..10.0: U+2260, U+226E, U+226F
1672	- nothing new in this Unicode version, no test file to update
1673
1674	* run & fix ICU4C tests
1675	- Andy handles RBBI & spoof check test failures
1676
1677	* collation: CLDR collation root, UCA DUCET
1678
1679	- UCA DUCET goes into Mark's Unicode tools, see
1680	https://sites.google.com/site/unicodetools/home#TOC-UCA
1681	- CLDR root data files are checked into $CLDR_SRC/common/uca/
1682	cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1683
1684	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1685	cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1686	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1687	cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1688	(note removing the underscore before "Rules")
1689	cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1690	- restore TODO diffs in UCARules.txt
1691	meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1692	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1693	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1694	from the CLDR root files (..._CLDR_..._SHORT.txt)
1695	cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1696	cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1697	cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1698	- if CLDR common/uca/unihan-index.txt changes, then update
1699	CLDR common/collation/root.xml <collation type="private-unihan">
1700	and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1701
1702	- run genuca, see command line above;
1703	deal with
1704	Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
1705	FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)
1706	(add the character to genuca.cpp sampleCharsToScripts[])
1707	+ look up the USCRIPT_ code for the new sample characters
1708	(should be obvious from the comment in the error output)
1709	+ add mappings to sampleCharsToScripts[], do not replace them
1710	(in case the script sample characters flip-flop)
1711	+ insert new scripts in DUCET script order, see the top_byte table
1712	at the beginning of FractionalUCA.txt
1713	- rebuild ICU4C
1714
1715	* Unihan collators
1716	https://sites.google.com/site/unicodetools/unihan
1717	- run Unicode Tools
1718	org.unicode.draft.GenerateUnihanCollators
1719	with VM arguments
1720	-ea
1721	-DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1722	-DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1723	-DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1724	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1725	-DUVERSION=10.0.0
1726	- run Unicode Tools
1727	org.unicode.draft.GenerateUnihanCollatorFiles
1728	with the same arguments
1729	- check CLDR diffs
1730	cd $CLDR_SRC
1731	meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1732	meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1733	- copy to CLDR
1734	cd $CLDR_SRC
1735	cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1736	cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1737	- run CLDR unit tests, commit to CLDR
1738	- generate ICU zh collation data: run CLDR
1739	org.unicode.cldr.icu.NewLdml2IcuConverter
1740	with program arguments
1741	-t collation
1742	-s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
1743	-m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
1744	-d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
1745	-p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
1746	zh
1747	and VM arguments
1748	-ea
1749	-DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1750	- rebuild ICU4C
1751
1752	* run & fix ICU4C tests, now with new CLDR collation root data
1753	- run all tests with the collation test data *_SHORT.txt or the full files
1754	(the full ones have comments, useful for debugging)
1755	- note on intltest: if collate/UCAConformanceTest fails, then
1756	utility/MultithreadTest/TestCollators will fail as well;
1757	fix the conformance test before looking into the multi-thread test
1758
1759	* update Java data files
1760	- refresh just the UCD/UCA-related/derived files, just to be safe
1761	- see (ICU4C)/source/data/icu4j-readme.txt
1762	- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1763	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1764	output:
1765	...
1766	Unicode .icu files built to ./out/build/icudt60l
1767	echo timestamp > uni-core-data
1768	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1769	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
1770	echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1771	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1772	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
1773	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
1774	mkdir -p /tmp/icu4j/main/shared/data
1775	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1776	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
1777	mkdir -p /tmp/icu4j/main/shared/data
1778	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1779	make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
1780	- copy the big-endian Unicode data files to another location,
1781	separate from the other data files,
1782	and then refresh ICU4J
1783	cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1784	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1785	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1786	cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1787	cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1788	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1789	cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1790	cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1791	cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1792	jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1793
1794	* When refreshing all of ICU4J data from ICU4C
1795	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1796	- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1797	or
1798	- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1799
1800	* update CollationFCD.java
1801	+ copy & paste the initializers of lcccIndex[] etc. from
1802	ICU4C/source/i18n/collationfcd.cpp to
1803	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1804
1805	* refresh Java test .txt files
1806	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1807	cd $ICU_SRC/icu4c/source/data/unidata
1808	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1809	cd ../../test/testdata
1810	cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1811	cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1812
1813	* run & fix ICU4J tests
1814
1815	*** API additions
1816	- send notice to icu-design about new born-@stable API (enum constants etc.)
1817
1818	*** CLDR numbering systems
1819	- look for new sets of decimal digits (gc=ND & nv=4) and submit a CLDR ticket
1820	Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1821	Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1822
1823	*** merge the Unicode update branches back onto the trunk
1824	- do not merge the icudata.jar and testdata.jar,
1825	instead rebuild them from merged & tested ICU4C
1826	- make sure that changes to Unicode tools are checked in:
1827	http://www.unicode.org/utility/trac/log/trunk/unicodetools
f3c0d7a5 A	1828
	1829	---------------------------------------------------------------------------- ***
	1830
	1831	Emoji 5.0 update for ICU 59
	1832	- ICU 59 mostly remains on Unicode 9.0
	1833	- except updates bidi and segmentation data to Unicode 10 beta
	1834
	1835	First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
	1836
	1837	* Command-line environment setup
	1838
	1839	ICU_ROOT=~/svn.icu/trunk
	1840	ICU_SRC_DIR=$ICU_ROOT/src
	1841	ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
	1842	ICUDT=icudt59b
	1843	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
	1844	SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
	1845	UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
	1846
	1847	*** ICU Trac
	1848
	1849	- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
	1850	- changes directly on trunk
	1851
	1852	*** data files & enums & parser code
	1853
	1854	* download files
	1855
	1856	- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
	1857	- download emoji 5.0 beta files into the same uni90e50 folder
	1858	- download Unicode 10.0 beta files: ucd
	1859	+ copy Unicode 10 bidi files to the uni90e50/ucd folder:
	1860	BidiBrackets.txt
	1861	BidiCharacterTest.txt
	1862	BidiMirroring.txt
	1863	BidiTest.txt
	1864	extracted/DerivedBidiClass.txt
	1865	+ copy Unicode 10 segmentation files to the uni90e50/ucd folder:
	1866	LineBreak.txt
	1867	auxiliary/*
	1868
	1869	* preparseucd.py changes
	1870	- adjust for combined trunks
	1871	- write new copyright lines
	1872	- ignore new Emoji_Component property for now
	1873
	1874	* process and/or copy files
	1875	- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
	1876	+ This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	1877
	1878	- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
	1879
	1880	* build ICU (make install)
	1881	so that the tools build can pick up the new definitions from the installed header files.
	1882
	1883	$ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
	1884
	1885	* build Unicode tools using CMake+make
	1886
	1887	~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
	1888
	1889	# Location (--prefix) of where ICU was installed.
	1890	set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
	1891	# Location of the ICU4C source tree.
1892	set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
1893
1894	~/svn.icu/trunk/dbg/tools/unicode/c$
1895	cmake ../../../../src/tools/unicode/c
1896	make
1897
1898	* generate core properties data files
1899	~/svn.icu/trunk/dbg/tools/unicode/c$
1900	genprops/genprops $ICU4C_SRC_DIR
1901	- rebuild ICU (make install) & tools
1902
1903	* run & fix ICU4C tests
1904	- Andy handles RBBI & spoof check test failures
1905
1906	* update Java data files
1907	- refresh just the UCD/UCA-related/derived files, just to be safe
1908	- see (ICU4C)/source/data/icu4j-readme.txt
1909	- mkdir /tmp/icu4j
1910	- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1911	output:
1912	...
1913	Unicode .icu files built to ./out/build/icudt59l
1914	echo timestamp > uni-core-data
1915	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1916	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
1917	echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1918	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1919	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
1920	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
1921	mkdir -p /tmp/icu4j/main/shared/data
1922	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1923	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
1924	mkdir -p /tmp/icu4j/main/shared/data
1925	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1926	make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
1927	- copy the big-endian Unicode data files to another location,
1928	separate from the other data files,
1929	and then refresh ICU4J
1930	cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
1931	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1932	cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1933	cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1934	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1935	cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1936	jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1937
1938	* When refreshing all of ICU4J data from ICU4C
1939	- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1940	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
1941	or
1942	- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
1943
1944	* refresh Java test .txt files
1945	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1946	cd $ICU4C_SRC_DIR/source/data/unidata
1947	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1948	cd ../../test/testdata
1949	cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1950	cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1951
1952	* run & fix ICU4J tests
1953
1954	---------------------------------------------------------------------------- ***
1955
1956	Unicode 9.0 update for ICU 58
1957
1958	* Command-line environment setup
1959
1960	ICU_ROOT=~/svn.icu/trunk
1961	ICU_SRC_DIR=$ICU_ROOT/src
1962	ICUDT=icudt58b
1963	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1964	SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1965	UNIDATA=$ICU_SRC_DIR/source/data/unidata
1966
1967	http://www.unicode.org/review/pri323/ -- beta review
1968	http://www.unicode.org/reports/uax-proposed-updates.html
1969	http://www.unicode.org/versions/beta-9.0.0.html
1970	http://www.unicode.org/versions/Unicode9.0.0/
1971	http://www.unicode.org/reports/tr44/tr44-17.html
1972
1973	*** ICU Trac
1974
1975	- ticket:12526: integrate Unicode 9
1976	- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
1977	- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
1978
1979	*** CLDR Trac
1980
1981	- cldrbug 9414: UCA 9
1982	- ^/branches/markus/uni90 at r11518 from trunk at r11517
1983
1984	- cldrbug 8745: Unicode 9.0 script metadata
1985
1986	*** Unicode version numbers
1987	- makedata.mak
1988	- uchar.h
1989	- com.ibm.icu.util.VersionInfo
1990	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1991
1992	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1993	so that the makefiles see the new version number.
1994
1995	*** data files & enums & parser code
1996
1997	* file preparation
1998
1999	- download UCD & IDNA files
2000	- make sure that the Unicode data folder passed into preparseucd.py
2001	includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2002	- only for manual diffs: remove version suffixes from the file names
2003	~/unidata/uni70/20140403$ ../../desuffixucd.py .
2004	(see https://sites.google.com/site/unicodetools/inputdata)
2005	- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2006	- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2007	- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2008
2009	- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
2010	and copy to $UNIDATA
2011	cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
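The "remove version suffixes" step above can be sketched in Python as follows. This is not the actual desuffixucd.py; the function name and the assumed suffix shape (a hyphen, a dotted version number and an optional draft marker such as "d3" before the extension) are illustrative assumptions.

```python
import re

# Assumed suffix shape: "-<version>" optionally followed by a draft marker
# such as "d3", directly before the file extension.
VERSION_SUFFIX = re.compile(r"-\d+(?:\.\d+)*(?:d\d+)?(?=\.[A-Za-z0-9]+$)")

def strip_version_suffix(filename: str) -> str:
    """Return filename without a trailing version suffix, if it has one."""
    return VERSION_SUFFIX.sub("", filename)

# The sample file names are illustrative.
for name in ["UnicodeData-9.0.0d3.txt", "Unihan-9.0.0.zip", "ReadMe.txt"]:
    print(name, "->", strip_version_suffix(name))
```

Names without a version suffix pass through unchanged, so the helper could be applied to every file in a folder.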
2012
2013	* preparseucd.py changes
2014	- remove or add new Unicode scripts from/to the
2015	only-in-ISO-15924 list according to the error messages:
2016	ValueError: remove ['Tang'] from _scripts_only_in_iso15924
2017	ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
2018	ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
2019	ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
2020	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2021	and in com.ibm.icu.dev.test.lang.TestUScript.java
2022	- DerivedNumericValues.txt new numeric values
2023	0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
2024	0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
2025	0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
2026	0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
2027	0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
2028	-> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
2029	uchar.c, UCharacterProperty.java
2030	to support a new series of values
2031	- adjust preparseucd.py for Tangut algorithmic names
2032	in ppucd.txt:
2033	algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
2034	->
2035	algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
2036	- avoid block-compressing most String/Miscellaneous property values;
2037	this was triggered by genprops not coping with a multi-code point Case_Folding on
2038	block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
2039	but keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
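A minimal Python sketch of what the corrected Tangut algnamesrange line above implies: each code point in 17000..187EC gets a name made of the prefix plus its hexadecimal code point. The helper name tangut_name is hypothetical and is not part of preparseucd.py.

```python
# Hypothetical helper, not taken from preparseucd.py.
TANGUT_FIRST = 0x17000
TANGUT_LAST = 0x187EC  # range from the algnamesrange line above

def tangut_name(cp: int) -> str:
    """Return the algorithmic name for a Tangut ideograph code point."""
    if not TANGUT_FIRST <= cp <= TANGUT_LAST:
        raise ValueError(f"U+{cp:04X} is outside the Tangut ideograph range")
    return f"TANGUT IDEOGRAPH-{cp:04X}"

print(tangut_name(0x17000))
```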
2040
2041	* PropertyAliases.txt changes
2042	- 1 new property PCM=Prepended_Concatenation_Mark
2043	Ignore: Only useful for layout engines.
2044	Ok to list in ppucd.txt.
2045
2046	* PropertyValueAliases.txt new property values
2047	blk; Adlam ; Adlam
2048	blk; Bhaiksuki ; Bhaiksuki
2049	blk; Cyrillic_Ext_C ; Cyrillic_Extended_C
2050	blk; Glagolitic_Sup ; Glagolitic_Supplement
2051	blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation
2052	blk; Marchen ; Marchen
2053	blk; Mongolian_Sup ; Mongolian_Supplement
2054	blk; Newa ; Newa
2055	blk; Osage ; Osage
2056	blk; Tangut ; Tangut
2057	blk; Tangut_Components ; Tangut_Components
2058	-> add to uchar.h
2059	use long property names for enum constants
2060	-> add to UCharacter.UnicodeBlock IDs
2061	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2062	replace public static final int \1_ID = \2; \3
2063	-> add to UCharacter.UnicodeBlock objects
2064	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2065	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
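For anyone not using Eclipse, the two find/replace steps above can be sketched with Python's re module. The patterns and replacement templates are copied from the log; the function names and the sample enum line (including its numeric value) are illustrative assumptions.

```python
import re

# Same patterns as the Eclipse "find" expressions above.
ID_PATTERN = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
OBJ_PATTERN = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")

def to_java_id(line: str) -> str:
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock *_ID constant."""
    return ID_PATTERN.sub(r"public static final int \1_ID = \2; \3", line)

def to_java_object(line: str) -> str:
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock object declaration."""
    return OBJ_PATTERN.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        line)

# Illustrative input line; the value and comment are assumptions.
c_line = "UBLOCK_ADLAM = 263, /*[1E900]*/"
print(to_java_id(c_line))
print(to_java_object(c_line))
```

Lines that do not match the pattern are returned unchanged, so the functions can be run over a whole pasted enum.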
2066
2067	GCB; EB ; E_Base
2068	GCB; EBG ; E_Base_GAZ
2069	GCB; EM ; E_Modifier
2070	GCB; GAZ ; Glue_After_Zwj
2071	GCB; ZWJ ; ZWJ
2072	-> uchar.h & UCharacter.GraphemeClusterBreak
2073
2074	jg ; African_Feh ; African_Feh
2075	jg ; African_Noon ; African_Noon
2076	jg ; African_Qaf ; African_Qaf
2077	-> uchar.h & UCharacter.JoiningGroup
2078
2079	lb ; EB ; E_Base
2080	lb ; EM ; E_Modifier
2081	lb ; ZWJ ; ZWJ
2082	-> uchar.h & UCharacter.LineBreak
2083
2084	sc ; Adlm ; Adlam
2085	sc ; Bhks ; Bhaiksuki
2086	sc ; Marc ; Marchen
2087	sc ; Newa ; Newa
2088	sc ; Osge ; Osage
2089	sc ; Tang ; Tangut
2090	-> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2091
2092	WB ; EB ; E_Base
2093	WB ; EBG ; E_Base_GAZ
2094	WB ; EM ; E_Modifier
2095	WB ; GAZ ; Glue_After_Zwj
2096	WB ; ZWJ ; ZWJ
2097	-> uchar.h & UCharacter.WordBreak
2098
2099	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2100	(not strictly necessary for NOT_ENCODED scripts)
2101	~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2102
2103	* generate normalization data files
2104	cd $ICU_ROOT/dbg
2105	bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2106	bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2107	bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2108	bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2109	bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
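The four .nrm commands above follow one pattern: each output file layers further input files on top of nfc.txt. A small Python sketch makes that layering explicit. The function name and the table are hypothetical; only the -o and -s options shown in the log are used, and the --csource header step is left out.

```python
# Output file -> input .txt files, as listed in the commands above.
NORM_OUTPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}

def gennorm2_commands(src_data_in: str, unidata: str) -> list:
    """Build the gennorm2 argument lists for the .nrm data files."""
    commands = []
    for output, inputs in NORM_OUTPUTS.items():
        commands.append(
            ["bin/gennorm2", "-o", f"{src_data_in}/{output}",
             "-s", f"{unidata}/norm2", *inputs])
    return commands

# Print the commands using the same variable names as the log.
for cmd in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(cmd))
```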
2110
2111	* build ICU (make install)
2112	so that the tools build can pick up the new definitions from the installed header files.
2113
2114	$ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
2115
2116	* build Unicode tools using CMake+make
2117
2118	~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2119
2120	# Location (--prefix) of where ICU was installed.
2121	set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2122	# Location of the ICU source tree.
2123	set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2124
2125	~/svn.icutools/trunk/dbg/unicode/c$
2126	cmake ../../../src/unicode/c
2127	make
2128
2129	* generate core properties data files
2130	~/svn.icutools/trunk/dbg/unicode/c$
2131	genprops/genprops $ICU_SRC_DIR
2132	genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2133	genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2134	- rebuild ICU (make install) & tools
2135
2136	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2137	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2138	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2139	- Unicode 6.0..9.0: U+2260, U+226E, U+226F
2140	- nothing new in 9.0, no test file to update
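The grep step above can also be done with a short Python filter. This is a sketch under the assumption that each data line has the form "code point or range ; status ; ... # comment"; the function name and the sample lines are illustrative, not copied from the real IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Yield (first, last) ranges with status disallowed_STD3_valid
    that reach beyond ASCII (last code point above U+007F)."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            yield (first, last)

# Illustrative lines in the assumed format.
sample = [
    "0000..002C    ; disallowed_STD3_valid   # <control-0000>..COMMA",
    "00A0          ; disallowed_STD3_mapped  ; 0020 # NO-BREAK SPACE",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]
for first, last in non_ascii_std3_valid(sample):
    print(f"U+{first:04X}..U+{last:04X}")
```

Any range reported here that is not already covered by the tests is a candidate for uts46test.cpp and UTS46Test.java.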
2141
2142	* run & fix ICU4C tests
2143	- Andy handles RBBI & spoof check test failures
2144
2145	* collation: CLDR collation root, UCA DUCET
2146
2147	- UCA DUCET goes into Mark's Unicode tools, see
2148	https://sites.google.com/site/unicodetools/home#TOC-UCA
2149	- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2150	cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
2151
2152	- cd (CLDR UCA branch)/common/uca/
2153	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2154	cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2155	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2156	cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2157	(note removing the underscore before "Rules")
2158	cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2159	- restore TODO diffs in UCARules.txt
2160	meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2161	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2162	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2163	from the CLDR root files (..._CLDR_..._SHORT.txt)
2164	cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2165	cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2166	cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2167	- if CLDR common/uca/unihan-index.txt changes, then update
2168	CLDR common/collation/root.xml <collation type="private-unihan">
2169	and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2170
2171	- run genuca, see command line above;
2172	deal with
2173	Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
2174	FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
2175	(add the character to genuca.cpp sampleCharsToScripts[])
2176	+ look up the USCRIPT_ code for the new sample characters
2177	(should be obvious from the comment in the error output)
2178	+ add mappings to sampleCharsToScripts[], do not replace them
2179	(in case the script sample characters flip-flop)
2180	+ insert new scripts in DUCET script order, see the top_byte table
2181	at the beginning of FractionalUCA.txt
2182	- rebuild ICU4C
2183
2184	* Unihan collators
2185	- run Unicode Tools
2186	org.unicode.draft.GenerateUnihanCollators
2187	with VM arguments
2188	-DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
2189	-DOTHER_WORKSPACE=/home/mscherer/svn.unitools
2190	-DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
2191	-DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2192	-DUVERSION=9.0.0
2193	-ea
2194	- run Unicode Tools
2195	org.unicode.draft.GenerateUnihanCollatorFiles
2196	with the same arguments
2197	- check CLDR diffs
2198	cd ~/svn.cldr/trunk
2199	meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2200	meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2201	- copy to CLDR
2202	cd ~/svn.cldr/trunk
2203	cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2204	cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2205	- commit to CLDR
2206	- generate ICU zh collation data: run CLDR
2207	org.unicode.cldr.icu.NewLdml2IcuConverter
2208	with program arguments
2209	-t collation
2210	-s /home/mscherer/svn.cldr/trunk/common/collation
2211	-m /home/mscherer/svn.cldr/trunk/common/supplemental
2212	-d /home/mscherer/svn.icu/trunk/src/source/data/coll
2213	-p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
2214	zh
2215	and VM arguments
2216	-DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2217	- rebuild ICU4C
2218
2219	* run & fix ICU4C tests, now with new CLDR collation root data
2220	- run all tests with the collation test data *_SHORT.txt or the full files
2221	(the full ones have comments, useful for debugging)
2222	- note on intltest: if collate/UCAConformanceTest fails, then
2223	utility/MultithreadTest/TestCollators will fail as well;
2224	fix the conformance test before looking into the multi-thread test
2225
2226	* update Java data files
2227	- refresh just the UCD/UCA-related/derived files, just to be safe
2228	- see (ICU4C)/source/data/icu4j-readme.txt
2229	- mkdir /tmp/icu4j
2230	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2231	output:
2232	...
2233	Unicode .icu files built to ./out/build/icudt58l
2234	echo timestamp > uni-core-data
2235	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2236	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
2237	echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2238	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2239	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
2240	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
2241	mkdir -p /tmp/icu4j/main/shared/data
2242	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2243	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
2244	mkdir -p /tmp/icu4j/main/shared/data
2245	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2246	make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2247	- copy the big-endian Unicode data files to another location,
2248	separate from the other data files,
2249	and then refresh ICU4J
2250	cd ~/svn.icu/trunk/dbg/data/out/icu4j
2251	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2252	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2253	cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2254	cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2255	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2256	cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2257	cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2258	cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2259	jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2260
2261	* When refreshing all of ICU4J data from ICU4C
2262	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2263	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2264	or
2265	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2266
2267	* update CollationFCD.java
2268	+ copy & paste the initializers of lcccIndex[] etc. from
2269	ICU4C/source/i18n/collationfcd.cpp to
2270	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2271
2272	* refresh Java test .txt files
2273	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2274	cd $ICU_SRC_DIR/source/data/unidata
2275	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2276	cd ../../test/testdata
2277	cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2278	cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2279
2280	* run & fix ICU4J tests
2281
2282	*** LayoutEngine script information
2283
2284	* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2285	This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2286	in the working directory.
2287
2288	(It also generates ScriptRunData.cpp, which is no longer needed.)
2289
2290	It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2291	(a plain text file)
2292	which maps ICU versions to the numbers of script/language constants
2293	that were added then.
2294	(This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2295
2296	The generated files have a current copyright date and "@deprecated" statement.
2297
2298	* Review changes, fix Java tool if necessary, and copy to ICU4C
2299	cd ~/svn.icu4j/trunk/src
2300	meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2301	cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2302	cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2303
2304	*** API additions
2305	- send notice to icu-design about new born-@stable API (enum constants etc.)
2306
2307	*** merge the Unicode update branches back onto the trunk
2308	- do not merge the icudata.jar and testdata.jar,
2309	instead rebuild them from merged & tested ICU4C
2310	- make sure that changes to Unicode tools & ICU tools are checked in
2311	http://www.unicode.org/utility/trac/log/trunk/unicodetools
2312	http://bugs.icu-project.org/trac/log/tools/trunk
2313
2314	---------------------------------------------------------------------------- ***
2315
2316	New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
2317
2318	Adding
2319	- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
2320	- new combination/alias codes: Hanb, Jamo
2321	- used in CLDR 29 and in spoof checker
2322	- new Z* code: Zsye
2323
2324	Add new codes to uscript.h & UScript.java, see Unicode update logs.
2325	-> com.ibm.icu.lang.UScript
2326	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2327	replace public static final int \1 = \2; \3
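The same find/replace can be run over a pasted chunk of uscript.h with Python's re module. The pattern and replacement are copied from the two lines above; the function name and the sample enum lines (including their numeric values) are illustrative assumptions.

```python
import re

# Pattern from the "find" line above; " *" absorbs the column alignment.
USCRIPT_PATTERN = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")

def to_uscript_java(header_text: str) -> str:
    """Rewrite every matching USCRIPT_ enum line as a Java int constant."""
    return USCRIPT_PATTERN.sub(r"public static final int \1 = \2; \3", header_text)

# Illustrative input; values and comments are assumptions.
sample = (
    "USCRIPT_ADLAM                         = 167,/* Adlm */\n"
    "USCRIPT_BHAIKSUKI                     = 168,/* Bhks */"
)
print(to_uscript_java(sample))
```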
2328
2329	Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
2330	add new script codes.
2331	"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
2332
2333	Note: If we have to run preparseucd.py again before the Unicode 9 update,
2334	then we need to manually keep/restore the new script codes.
2335
2336	ICU_ROOT=~/svn.icu/trunk
2337	ICU_SRC_DIR=$ICU_ROOT/src
2338	ICUDT=icudt57b
2339	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2340	SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2341	UNIDATA=$ICU_SRC_DIR/source/data/unidata
2342
2343	Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
2344	see http://bugs.icu-project.org/trac/ticket/12141
2345
2346	make install, then icutools cmake & make, then
2347	~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2348
2349	Generate Java data as usual, but update only pnames.icu & uprops.icu.
2350
2351	*** LayoutEngine script information
2352
2353	* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2354	This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2355	in the working directory.
2356
2357	(It also generates ScriptRunData.cpp, which is no longer needed.)
2358
2359	It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2360	(a plain text file)
2361	which maps ICU versions to the numbers of script/language constants
2362	that were added then.
2363	(This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2364
2365	The generated files have a current copyright date and "@deprecated" statement.
b331163b	2366
f3c0d7a5 A	2367	* Review changes, fix Java tool if necessary, and copy to ICU4C
	2368	cd ~/svn.icu4j/trunk/src
	2369	meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
	2370	cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
	2371	cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
b331163b A	2372
	2373	---------------------------------------------------------------------------- ***
	2374
2ca993e8	2375	Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
b331163b	2376
2ca993e8 A	2377	Edit preparseucd.py to add & parse new properties.
2ca993e8 A	2378	They share the UCD property namespace but are not listed in PropertyAliases.txt.
b331163b	2379
2ca993e8 A	2380	Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
2ca993e8 A	2381	Initial data from emoji/2.0/
b331163b	2382
2ca993e8 A	2383	ICU_ROOT=~/svn.icu/trunk
	2384	ICU_SRC_DIR=$ICU_ROOT/src
	2385	ICUDT=icudt56b
	2386	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
	2387	SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
	2388	UNIDATA=$ICU_SRC_DIR/source/data/unidata
b331163b	2389
2ca993e8	2390	Add binary-property constants to uchar.h enum UProperty & UProperty.java.
b331163b	2391
2ca993e8 A	2392	~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2ca993e8 A	2393	(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
b331163b	2394
2ca993e8	2395	Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
b331163b	2396
2ca993e8 A	2397	make install, then icutools cmake & make, then
2ca993e8 A	2398	~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
b331163b	2399
2ca993e8 A	2400	Generate Java data as usual, but update only pnames.icu & uprops.icu.
	2401
	2402	---------------------------------------------------------------------------- ***
	2403
	2404	Unicode 8.0 update for ICU 56
	2405
	2406	* Command-line environment setup
	2407
	2408	ICU_ROOT=~/svn.icu/trunk
	2409	ICU_SRC_DIR=$ICU_ROOT/src
	2410	ICUDT=icudt56b
	2411	export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
	2412	SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
	2413	UNIDATA=$ICU_SRC_DIR/source/data/unidata
	2414
	2415	http://www.unicode.org/review/pri297/ -- beta review
	2416	http://www.unicode.org/reports/uax-proposed-updates.html
	2417	http://unicode.org/versions/beta-8.0.0.html
	2418	http://www.unicode.org/versions/Unicode8.0.0/
	2419	http://www.unicode.org/reports/tr44/tr44-15.html
	2420
	2421	*** ICU Trac
	2422
	2423	- ticket:11574: Unicode 8
	2424	- C++ branches/markus/uni80 at r37351 from trunk at r37343
	2425	- Java branches/markus/uni80 at r37352 from trunk at r37338
	2426
	2427	*** CLDR Trac
	2428
	2429	- cldrbug 8311: UCA 8
	2430	- branches/markus/uni80 at r11518 from trunk at r11517
	2431
	2432	- cldrbug 8109: Unicode 8.0 script metadata
	2433	- cldrbug 8418: Updated segmentation for Unicode 8.0
	2434
	2435	*** Unicode version numbers
	2436	- makedata.mak
	2437	- uchar.h
	2438	- com.ibm.icu.util.VersionInfo
	2439	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	2440
	2441	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	2442	so that the makefiles see the new version number.
	2443
	2444	*** data files & enums & parser code
	2445
	2446	* file preparation
	2447
	2448	- download UCD & IDNA files
	2449	- make sure that the Unicode data folder passed into preparseucd.py
	2450	includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
	2451	- only for manual diffs: remove version suffixes from the file names
	2452	~/unidata/uni70/20140403$ ../../desuffixucd.py .
	2453	(see https://sites.google.com/site/unicodetools/inputdata)
	2454	- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
	2455	- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
	2456	- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	2457
	2458	- also: from http://unicode.org/Public/security/8.0.0/ download new
	2459	confusables.txt & confusablesWholeScript.txt
	2460	and copy to $UNIDATA
	2461	~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
	2462	~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
	2463
2464	* initial preparseucd.py changes
2465	- remove new Unicode scripts from the
2466	only-in-ISO-15924 list according to the error message:
2467	ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
2468	from _scripts_only_in_iso15924
2469	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2470	and in com.ibm.icu.dev.test.lang.TestUScript.java
2471	- property and file name change:
2472	IndicMatraCategory -> IndicPositionalCategory
2473	- UnicodeData.txt unusual numeric values (fractions not in lowest terms, e.g., 2/12)
2474	109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
2475	109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
2476	109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
2477	109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
2478	109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
2479	109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
2480	109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
2481	109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
2482	109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
2483	109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
2484	-> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6)
2485	which are listed in DerivedNumericValues.txt;
2486	this keeps storage in the data file simple
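The mapping of these Meroitic numeric values to fractions in lowest terms can be sketched with Python's fractions module. This is not the code in preparseucd.py; the function name is hypothetical.

```python
from fractions import Fraction

def reduced_numeric_value(nv: str) -> str:
    """Reduce an 'n/d' numeric value to lowest terms; leave other values unchanged."""
    if "/" not in nv:
        return nv
    numerator, denominator = nv.split("/")
    value = Fraction(int(numerator), int(denominator))
    if value.denominator == 1:
        return str(value.numerator)
    return f"{value.numerator}/{value.denominator}"

# Numeric values from the UnicodeData.txt lines above.
for nv in ["1/12", "2/12", "6/12", "10/12"]:
    print(nv, "->", reduced_numeric_value(nv))
```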
2487
2488	* PropertyValueAliases.txt changes
2489	- 10 new Block (blk) values:
2490	blk; Ahom ; Ahom
2491	blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
2492	blk; Cherokee_Sup ; Cherokee_Supplement
2493	blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
2494	blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
2495	blk; Hatran ; Hatran
2496	blk; Multani ; Multani
2497	blk; Old_Hungarian ; Old_Hungarian
2498	blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
2499	blk; Sutton_SignWriting ; Sutton_SignWriting
2500	-> add to uchar.h
2501	use long property names for enum constants
2502	-> add to UCharacter.UnicodeBlock IDs
2503	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2504	replace public static final int \1_ID = \2; \3
2505	-> add to UCharacter.UnicodeBlock objects
2506	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2507	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2508	- 6 new Script (sc) values:
2509	sc ; Ahom ; Ahom
2510	sc ; Hatr ; Hatran
2511	sc ; Hluw ; Anatolian_Hieroglyphs
2512	sc ; Hung ; Old_Hungarian
2513	sc ; Mult ; Multani
2514	sc ; Sgnw ; SignWriting
2515	-> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2516
2517	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2518	(not strictly necessary for NOT_ENCODED scripts)
2519	~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2520
2521	* generate normalization data files
2522	cd $ICU_ROOT/dbg
2523	bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2524	bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2525	bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2526	bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2527	bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
2528
2529	* build ICU (make install)
2530	so that the tools build can pick up the new definitions from the installed header files.
2531
2532	$ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2533
2534	* build Unicode tools using CMake+make
2535
2536	~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2537
2538	# Location (--prefix) of where ICU was installed.
2539	set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2540	# Location of the ICU source tree.
2541	set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2542
2543	~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2544	~/svn.icutools/trunk/dbg/unicode/c$ make
2545
2546	* generate core properties data files
2547	- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
b331163b A	2548	- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
b331163b A	2549	- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2ca993e8 A	2550	- rebuild ICU (make install) & tools
	2551	- run genuca again (see step above) so that it picks up the new nfc.nrm
	2552	- rebuild ICU (make install) & tools
	2553
	2554	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
	2555	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
	2556	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
	2557	- Unicode 6.0..8.0: U+2260, U+226E, U+226F
	2558	- nothing new in 8.0, no test file to update
	2559
	2560	* run & fix ICU4C tests
	2561	- bad Cherokee case folding due to difference in fallbacks:
	2562	UCD case folding falls back to no mapping,
	2563	ICU runtime case folding falls back to lowercasing;
	2564	fixed casepropsbuilder.cpp to generate scf mappings to self
	2565	when there is an slc mapping but no scf
	2566	- Andy handles RBBI & spoof check test failures
	2567
	2568	* collation: CLDR collation root, UCA DUCET
	2569
	2570	- UCA DUCET goes into Mark's Unicode tools, see
	2571	https://sites.google.com/site/unicodetools/home#TOC-UCA
	2572	- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
	2573	- cd (CLDR UCA branch)/common/uca/
	2574	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
	2575	cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
	2576	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
	2577	cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
	2578	(note removing the underscore before "Rules")
	2579	cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
	2580	- restore TODO diffs in UCARules.txt
	2581	meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
	2582	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
	2583	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
	2584	from the CLDR root files (..._CLDR_..._SHORT.txt)
	2585	cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
	2586	cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
	2587	cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
	2588	- if CLDR common/uca/unihan-index.txt changes, then update
	2589	CLDR common/collation/root.xml <collation type="private-unihan">
	2590	and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
	2591	- run genuca, see command line above;
	2592	deal with
	2593	Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
	2594	(add the character to genuca.cpp sampleCharsToScripts[])
	2595	+ look up the script for the new sample characters
	2596	(e.g., in FractionalUCA.txt)
	2597	+ add mappings to sampleCharsToScripts[], do not replace them
	2598	(in case the script sample characters flip-flop)
	2599	+ insert new scripts in DUCET script order, see the top_byte table
	2600	at the beginning of FractionalUCA.txt
	2601	- rebuild ICU4C
	2602
	2603	* run & fix ICU4C tests, now with new CLDR collation root data
	2604	- run all tests with the collation test data *_SHORT.txt or the full files
	2605	(the full ones have comments, useful for debugging)
	2606	- note on intltest: if collate/UCAConformanceTest fails, then
	2607	utility/MultithreadTest/TestCollators will fail as well;
	2608	fix the conformance test before looking into the multi-thread test
	2609	- fixed bug in CollationWeights::getWeightRanges()
	2610	exposed by new data and CollationTest::TestRootElements
	2611
	2612	* update Java data files
	2613	- refresh just the UCD/UCA-related/derived files, just to be safe
2614	- see (ICU4C)/source/data/icu4j-readme.txt
2615	- mkdir /tmp/icu4j
2616	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2617	output:
2618	...
2619	Unicode .icu files built to ./out/build/icudt56l
2620	echo timestamp > uni-core-data
2621	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2622	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
2623	echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2624	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2625	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
2626	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
2627	mkdir -p /tmp/icu4j/main/shared/data
2628	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2629	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
2630	mkdir -p /tmp/icu4j/main/shared/data
2631	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2632	make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2633	- copy the big-endian Unicode data files to another location,
2634	separate from the other data files,
2635	and then refresh ICU4J
2636	cd ~/svn.icu/trunk/dbg/data/out/icu4j
2637	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2638	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2639	cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2640	cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2641	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2642	cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2643	cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2644	cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2645	jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2646
2647	* When refreshing all of ICU4J data from ICU4C
2648	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2649	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2650	or
2651	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2652
2653	* update CollationFCD.java
2654	+ copy & paste the initializers of lcccIndex[] etc. from
2655	ICU4C/source/i18n/collationfcd.cpp to
2656	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2657
2658	* refresh Java test .txt files
2659	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2660	cd $ICU_SRC_DIR/source/data/unidata
2661	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2662	cd ../../test/testdata
2663	cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2664	cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2665
2666	* run & fix ICU4J tests
2667
2668	*** LayoutEngine script information
2669
2670	* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
2671	because the layout engine was deprecated in ICU 54.
2672	Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
2673	to write lines that we used to add manually.
2674
2675	* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2676	This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2677	in the working directory.
2678
2679	(It also generates ScriptRunData.cpp, which is no longer needed.)
2680
2681	It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2682	(a plain text file)
2683	which maps ICU versions to the numbers of script/language constants
2684	that were added then.
2685	(This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2686
2687	The generated files have a current copyright date and "@deprecated" statement.
2688
2689	* Review changes, fix Java tool if necessary, and copy to ICU4C
2690	cd ~/svn.icu4j/trunk/src
2691	meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2692	cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2693	cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2694
2695	*** API additions
2696	- send notice to icu-design about new born-@stable API (enum constants etc.)
b331163b	2697
2ca993e8 A	2698	*** merge the Unicode update branches back onto the trunk
	2699	- do not merge the icudata.jar and testdata.jar,
	2700	instead rebuild them from merged & tested ICU4C
	2701	- make sure that changes to Unicode tools & ICU tools are checked in
	2702	http://www.unicode.org/utility/trac/log/trunk/unicodetools
	2703	http://bugs.icu-project.org/trac/log/tools/trunk
b331163b A	2704
	2705	---------------------------------------------------------------------------- ***
	2706
	2707	Unicode 7.0 update for ICU 54
	2708
	2709	http://www.unicode.org/review/pri271/ -- beta review
	2710	http://www.unicode.org/reports/uax-proposed-updates.html
	2711	http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
	2712	http://www.unicode.org/reports/tr44/tr44-13.html
	2713
	2714	*** ICU Trac
	2715
	2716	- ticket 10821: Unicode 7.0, UCA 7.0
	2717	- C++ branches/markus/uni70 at r35584 from trunk at r35580
	2718	- Java branches/markus/uni70 at r35587 from trunk at r35545
	2719
	2720	*** CLDR Trac
	2721
	2722	- ticket 7195: UCA 7.0 CLDR root collation
	2723	- branches/markus/uni70 at r10062 from trunk at r10061
	2724
	2725	- ticket 6762: script metadata for Unicode 7.0 new scripts
	2726
	2727	*** Unicode version numbers
	2728	- makedata.mak
	2729	- uchar.h
	2730	- com.ibm.icu.util.VersionInfo
	2731	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	2732
	2733	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	2734	so that the makefiles see the new version number.
	2735
	2736	*** data files & enums & parser code
	2737
	2738	* file preparation
	2739
	2740	- download UCD & IDNA files
	2741	- make sure that the Unicode data folder passed into preparseucd.py
	2742	includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
	2743	- only for manual diffs: remove version suffixes from the file names
	2744	~/unidata/uni70/20140403$ ../../desuffixucd.py .
	2745	(see https://sites.google.com/site/unicodetools/inputdata)
	2746	- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
	2747	- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
	2748	- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	2749	- Restore TODO diffs in source/data/unidata/UCARules.txt
	2750	cd $ICU_SRC_DIR
	2751	meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
	2752	- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
	2753
	2754	- also: from http://unicode.org/Public/security/7.0.0/ download new
	2755	confusables.txt & confusablesWholeScript.txt
	2756	and copy to $ICU_ROOT/src/source/data/unidata/
	2757
	2758	* initial preparseucd.py changes
	2759	- remove new Unicode scripts from the
	2760	only-in-ISO-15924 list according to the error message:
	2761	ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
	2762	'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
	2763	'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
	2764	from _scripts_only_in_iso15924
	2765	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
	2766	and in com.ibm.icu.dev.test.lang.TestUScript.java
	2767	- NamesList.txt now has a heading with a non-ASCII character
2768	+ keep ppucd.txt in platform charset, rather than changing tool/test parsers
2769	+ escape non-ASCII characters in heading comments
2770	- preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
2771	+ get the copyright from the first file whose copyright line contains the current year
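
The only-in-ISO-15924 check described above can be sketched roughly as follows. This is a minimal illustration, not the code in preparseucd.py: the function name and the sample inputs are hypothetical, and only the shape of the error message is taken from the log.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if a code listed as ISO-only now has a Unicode Script value."""
    stale = [code for code in only_in_iso if code in unicode_scripts]
    if stale:
        raise ValueError("remove %s from _scripts_only_in_iso15924" % stale)


# Hypothetical inputs: codes still listed as ISO-only, and the
# Script short names found in the new PropertyValueAliases.txt.
only_in_iso = ["Ahom", "Hatr", "Mult", "Hmng", "Lina"]
unicode_scripts = {"Hmng", "Lina", "Latn"}

try:
    check_scripts_only_in_iso15924(only_in_iso, unicode_scripts)
    message = None
except ValueError as e:
    message = str(e)

# After removing the reported codes, the check passes silently.
cleaned = [code for code in only_in_iso if code not in unicode_scripts]
check_scripts_only_in_iso15924(cleaned, unicode_scripts)
```

The point of failing loudly is that the list is edited by hand once per Unicode version, so the tool only needs to say which entries have become stale.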
2772
2773	* PropertyValueAliases.txt changes
2774	- 32 new Block (blk) values:
2775	blk; Bassa_Vah ; Bassa_Vah
2776	blk; Caucasian_Albanian ; Caucasian_Albanian
2777	blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
2778	blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
2779	blk; Duployan ; Duployan
2780	blk; Elbasan ; Elbasan
2781	blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
2782	blk; Grantha ; Grantha
2783	blk; Khojki ; Khojki
2784	blk; Khudawadi ; Khudawadi
2785	blk; Latin_Ext_E ; Latin_Extended_E
2786	blk; Linear_A ; Linear_A
2787	blk; Mahajani ; Mahajani
2788	blk; Manichaean ; Manichaean
2789	blk; Mende_Kikakui ; Mende_Kikakui
2790	blk; Modi ; Modi
2791	blk; Mro ; Mro
2792	blk; Myanmar_Ext_B ; Myanmar_Extended_B
2793	blk; Nabataean ; Nabataean
2794	blk; Old_North_Arabian ; Old_North_Arabian
2795	blk; Old_Permic ; Old_Permic
2796	blk; Ornamental_Dingbats ; Ornamental_Dingbats
2797	blk; Pahawh_Hmong ; Pahawh_Hmong
2798	blk; Palmyrene ; Palmyrene
2799	blk; Pau_Cin_Hau ; Pau_Cin_Hau
2800	blk; Psalter_Pahlavi ; Psalter_Pahlavi
2801	blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
2802	blk; Siddham ; Siddham
2803	blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
2804	blk; Sup_Arrows_C ; Supplemental_Arrows_C
2805	blk; Tirhuta ; Tirhuta
2806	blk; Warang_Citi ; Warang_Citi
2807	-> add to uchar.h
2808	use long property names for enum constants
2809	-> add to UCharacter.UnicodeBlock IDs
2810	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2811	replace public static final int \1_ID = \2; \3
2812	-> add to UCharacter.UnicodeBlock objects
2813	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
2814	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2815	- 28 new Joining_Group (jg) values:
2816	jg ; Manichaean_Aleph ; Manichaean_Aleph
2817	jg ; Manichaean_Ayin ; Manichaean_Ayin
2818	jg ; Manichaean_Beth ; Manichaean_Beth
2819	jg ; Manichaean_Daleth ; Manichaean_Daleth
2820	jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
2821	jg ; Manichaean_Five ; Manichaean_Five
2822	jg ; Manichaean_Gimel ; Manichaean_Gimel
2823	jg ; Manichaean_Heth ; Manichaean_Heth
2824	jg ; Manichaean_Hundred ; Manichaean_Hundred
2825	jg ; Manichaean_Kaph ; Manichaean_Kaph
2826	jg ; Manichaean_Lamedh ; Manichaean_Lamedh
2827	jg ; Manichaean_Mem ; Manichaean_Mem
2828	jg ; Manichaean_Nun ; Manichaean_Nun
2829	jg ; Manichaean_One ; Manichaean_One
2830	jg ; Manichaean_Pe ; Manichaean_Pe
2831	jg ; Manichaean_Qoph ; Manichaean_Qoph
2832	jg ; Manichaean_Resh ; Manichaean_Resh
2833	jg ; Manichaean_Sadhe ; Manichaean_Sadhe
2834	jg ; Manichaean_Samekh ; Manichaean_Samekh
2835	jg ; Manichaean_Taw ; Manichaean_Taw
2836	jg ; Manichaean_Ten ; Manichaean_Ten
2837	jg ; Manichaean_Teth ; Manichaean_Teth
2838	jg ; Manichaean_Thamedh ; Manichaean_Thamedh
2839	jg ; Manichaean_Twenty ; Manichaean_Twenty
2840	jg ; Manichaean_Waw ; Manichaean_Waw
2841	jg ; Manichaean_Yodh ; Manichaean_Yodh
2842	jg ; Manichaean_Zayin ; Manichaean_Zayin
2843	jg ; Straight_Waw ; Straight_Waw
2844	-> uchar.h & UCharacter.JoiningGroup
2845	- 23 new Script (sc) values:
2846	sc ; Aghb ; Caucasian_Albanian
2847	sc ; Bass ; Bassa_Vah
2848	sc ; Dupl ; Duployan
2849	sc ; Elba ; Elbasan
2850	sc ; Gran ; Grantha
2851	sc ; Hmng ; Pahawh_Hmong
2852	sc ; Khoj ; Khojki
2853	sc ; Lina ; Linear_A
2854	sc ; Mahj ; Mahajani
2855	sc ; Mani ; Manichaean
2856	sc ; Mend ; Mende_Kikakui
2857	sc ; Modi ; Modi
2858	sc ; Mroo ; Mro
2859	sc ; Narb ; Old_North_Arabian
2860	sc ; Nbat ; Nabataean
2861	sc ; Palm ; Palmyrene
2862	sc ; Pauc ; Pau_Cin_Hau
2863	sc ; Perm ; Old_Permic
2864	sc ; Phlp ; Psalter_Pahlavi
2865	sc ; Sidd ; Siddham
2866	sc ; Sind ; Khudawadi
2867	sc ; Tirh ; Tirhuta
2868	sc ; Wara ; Warang_Citi
2869	-> uscript.h (many were added before)
2870	comment "Mende Kikakui" for USCRIPT_MENDE
2871	add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
2872	-> com.ibm.icu.lang.UScript
2873	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2874	replace public static final int \1 = \2; \3
2875	- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2876	(added 2012-11-01)
2877	Ahom 338 Ahom
2878	Hatr 127 Hatran
2879	Mult 323 Multani
2880	(added 2013-10-12)
2881	Modi 324 Modi
2882	Pauc 263 Pau Cin Hau
2883	Sidd 302 Siddham
2884	-> uscript.h (some overlap with additions from Unicode)
2885	-> com.ibm.icu.lang.UScript
2886	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2887	replace public static final int \1 = \2; \3
2888	-> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
2889	-> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2890	and in com.ibm.icu.dev.test.lang.TestUScript.java
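
For readers without Eclipse, the find/replace patterns above can be tried with Python's re module. This is a sketch only: the function names are made up here, and the sample enum lines are placeholders rather than real uchar.h or uscript.h content.

```python
import re


def ublock_to_java_id(line):
    # UBLOCK_X = n, /*...*/  ->  public static final int X_ID = n; /*...*/
    return re.sub(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                  r"public static final int \1_ID = \2; \3", line)


def ublock_to_java_object(line):
    # UBLOCK_X = n, /*...*/  ->  UnicodeBlock object declaration for X
    return re.sub(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                  r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
                  line)


def uscript_to_java(line):
    # USCRIPT_X   = n,/*...*/  ->  public static final int X = n; /*...*/
    return re.sub(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
                  r"public static final int \1 = \2; \3", line)


# Placeholder input lines (names and numbers are invented).
block_line = "UBLOCK_EXAMPLE_BLOCK = 999, /*[0000]*/"
script_line = "USCRIPT_EXAMPLE   = 999,/* Xmpl */"

java_id = ublock_to_java_id(block_line)
java_object = ublock_to_java_object(block_line)
java_script = uscript_to_java(script_line)
```

Lines that do not match a pattern pass through unchanged, so the functions can be applied to a whole pasted enum block line by line.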
2891
2892	* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2893	(not strictly necessary for NOT_ENCODED scripts)
2894	~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2895
2896	* generate normalization data files
2897	- cd $ICU_ROOT/dbg
2898	- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2899	- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2900	- UNIDATA=$ICU_SRC_DIR/source/data/unidata
2901	- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2902	- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
2903	- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
2904	- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2905	- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
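
The gennorm2 invocations above follow one pattern: each output is built from nfc.txt plus the additional input files for that form. A minimal sketch that builds the same argument lists from a table (the function name and the directory parameters are assumptions for illustration; only the flags shown in the commands above are used):

```python
def gennorm2_commands(icu_src_dir):
    """Return the gennorm2 argument lists from the steps above."""
    src_data_in = icu_src_dir + "/source/data/in"
    norm2 = icu_src_dir + "/source/data/unidata/norm2"
    # (output file, input files, whether to pass --csource)
    jobs = [
        (icu_src_dir + "/source/common/norm2_nfc_data.h", ["nfc.txt"], True),
        (src_data_in + "/nfc.nrm", ["nfc.txt"], False),
        (src_data_in + "/nfkc.nrm", ["nfc.txt", "nfkc.txt"], False),
        (src_data_in + "/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], False),
        (src_data_in + "/uts46.nrm", ["nfc.txt", "uts46.txt"], False),
    ]
    commands = []
    for output, inputs, csource in jobs:
        cmd = ["bin/gennorm2", "-o", output, "-s", norm2] + inputs
        if csource:
            cmd.append("--csource")
        commands.append(cmd)
    return commands


# "/path/to/icu/src" is a placeholder for $ICU_SRC_DIR.
commands = gennorm2_commands("/path/to/icu/src")
```

Each list could be passed to subprocess.run() from the ICU build directory, with LD_LIBRARY_PATH set as in the steps above.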
2906
2907	* build ICU (make install)
2908	so that the tools build can pick up the new definitions from the installed header files.
2909
2910	~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2911
2912	* build Unicode tools using CMake+make
2913
2914	~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2915
2916	# Location (--prefix) of where ICU was installed.
2917	set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
2918	# Location of the ICU source tree.
2919	set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
2920
2921	~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2922	~/svn.icutools/trunk/dbg/unicode/c$ make
2923
2924	* genprops work
2925	- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
2926	+ add second array of Joining_Group values for at most 10800..10FFF
2927	icutools: unicode/c/genprops/bidipropsbuilder.cpp
2928	icu: source/common/ubidi_props.h/.c/_data.h
2929	icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
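
The idea of a second Joining_Group array can be sketched as a lookup over two separate code point ranges. This is an illustration only, not the ubidi_props code: the class name, the start of the first range, and all stored values are hypothetical; only the Manichaean range 10AC0..10AFF comes from the note above.

```python
NO_JOINING_GROUP = 0


class JoiningGroupTable:
    """Joining_Group lookup backed by two dense arrays for two ranges."""

    def __init__(self, start, values, start2, values2):
        self.start, self.values = start, values
        self.start2, self.values2 = start2, values2

    def get(self, c):
        # First range (original array).
        if self.start <= c < self.start + len(self.values):
            return self.values[c - self.start]
        # Second range (new array for the supplementary-plane characters).
        if self.start2 <= c < self.start2 + len(self.values2):
            return self.values2[c - self.start2]
        # Everything else has no joining group.
        return NO_JOINING_GROUP


# Hypothetical data: three values starting at U+0620,
# and 0x40 placeholder values for 10AC0..10AFF.
table = JoiningGroupTable(0x620, [5, 6, 7], 0x10AC0, [9] * 0x40)
```

Two small arrays avoid one very large, mostly empty array spanning from the first range up to U+10AFF.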
2930
2931	* generate core properties data files
2932	- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2933	- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
2934	- rebuild ICU (make install) & tools
2935	- run genuca again (see step above) so that it picks up the new nfc.nrm
2936	- rebuild ICU (make install) & tools
2937
2938	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2939	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2940	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2941	- Unicode 6.0..7.0: U+2260, U+226E, U+226F
2942	- nothing new in 7.0, no test file to update
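
The grep step above can also be done with a small script that filters for the status field and skips ASCII-only ranges. A minimal sketch, assuming the usual "code point(s) ; status ; ... # comment" line layout of IdnaMappingTable.txt; the function name is made up, and the sample lines are illustrative rather than copied from the file.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges with status disallowed_STD3_valid beyond ASCII."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0]          # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:                         # keep only ranges reaching past ASCII
            result.append((start, end))
    return result


# Illustrative sample lines in the assumed format.
sample = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  ...",
    "00C0          ; mapped                 ; 00E0          # 1.1  ...",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  ...",
]
found = non_ascii_std3_valid(sample)
```

Comparing the result against the known list (U+2260, U+226E, U+226F) shows whether a new Unicode version added anything that needs a test update.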
2943
2944	* run & fix ICU4C tests
2945
2946	* update Java data files
2947	- refresh just the UCD-related files, just to be safe
2948	- see (ICU4C)/source/data/icu4j-readme.txt
2949	- mkdir /tmp/icu4j
2950	- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2951	output:
2952	...
2953	Unicode .icu files built to ./out/build/icudt53l
2954	echo timestamp > uni-core-data
2955	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2956	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
2957	echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2958	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2959	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
2960	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
2961	mkdir -p /tmp/icu4j/main/shared/data
2962	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2963	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
2964	mkdir -p /tmp/icu4j/main/shared/data
2965	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2966	make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
2967	- copy the big-endian Unicode data files to another location,
2968	separate from the other data files
2969	ICUDT=icudt54b
2970	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2971	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2972	cd ~/svn.icu/uni70/dbg/data/out/icu4j
2973	cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2974	cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2975	rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2976	cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2977	cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2978	cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2979	- refresh ICU4J
2980	~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2981
2982	* update CollationFCD.java
2983	+ copy & paste the initializers of lcccIndex[] etc. from
2984	ICU4C/source/i18n/collationfcd.cpp to
2985	ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2986
2987	* refresh Java test .txt files
2988	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2989	cd $ICU_SRC_DIR/source/data/unidata
2990	cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2991	cd ../../test/testdata
2992	cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2993	cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2994
2995	* UCA
2996
2997	- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
2998	- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
2999	- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
3000	- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
3001	- output files are in ~/svn.unitools/Generated/uca/7.0.0/
3002	- review data; compare files, use blankweights.sed or similar
3003	~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
3004	- cd ~/svn.unitools/Generated/uca/7.0.0/
3005	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3006	cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
3007	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
	3008	(note: the target file name drops the underscore before "Rules")
3009	cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3010	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3011	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3012	with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3013	cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3014	cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3015	cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3016	- run genuca, see command line above
3017	- rebuild ICU4C
3018	- refresh ICU4J collation data:
3019	(subset of instructions above for properties data refresh, except copies all coll/*)
3020	ICUDT=icudt54b
3021	~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3022	~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3023	~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3024	~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3025	- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3026	- note on intltest: if collate/UCAConformanceTest fails, then
3027	utility/MultithreadTest/TestCollators will fail as well;
3028	fix the conformance test before looking into the multi-thread test
3029	- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
3030	- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
3031	~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
3032
3033	* When refreshing all of ICU4J data from ICU4C
3034	- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3035	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3036	or
3037	- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3038
3039	* run & fix ICU4J tests
3040
3041	*** LayoutEngine script information
3042
3043	(For details see the Unicode 5.2 change log below.)
3044
3045	* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3046	This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3047	in the working directory.
3048	(It also generates ScriptRunData.cpp, which is no longer needed.)
3049
3050	The generated files have a current copyright date and "@stable" statement.
3051	ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
3052	for "born stable" Unicode API constants, and to stop parsing ICU version numbers
3053	which may not contain dots any more.
3054
3055	- diff current <icu>/source/layout files vs. generated ones
3056	~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3057	review and manually merge desired changes;
3058	fix gratuitous changes, incorrect @draft/@stable and missing aliases;
3059	Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3060	- if you just copy the above files, then
3061	fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3062	manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3063
3064	*** API additions
3065	- send notice to icu-design about new born-@stable API (enum constants etc.)
3066
3067	*** merge the Unicode update branches back onto the trunk
3068	- do not merge the icudata.jar and testdata.jar,
3069	instead rebuild them from merged & tested ICU4C
3070
3071	---------------------------------------------------------------------------- ***
3072
57a6839d A	3073	Unicode 6.3 update
	3074
	3075	http://www.unicode.org/review/pri249/ -- beta review
	3076	http://www.unicode.org/reports/uax-proposed-updates.html
	3077	http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
	3078	http://www.unicode.org/reports/tr44/tr44-11.html
	3079
	3080	*** ICU Trac
	3081
	3082	- ticket 10128: update ICU to Unicode 6.3 beta
	3083	- ticket 10168: update ICU to Unicode 6.3 final
	3084	- C++ branches/markus/uni63 at r33552 from trunk at r33551
	3085	- Java branches/markus/uni63 at r33550 from trunk at r33553
	3086
	3087	- ticket 10142: implement Unicode 6.3 bidi algorithm additions
	3088
	3089	*** Unicode version numbers
	3090	- makedata.mak
	3091	- uchar.h
	3092	(configure.in & configure have been modified to extract the version from uchar.h)
	3093	- com.ibm.icu.util.VersionInfo
	3094	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	3095
	3096	- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
	3097	so that the makefiles see the new version number.
	3098
	3099	*** data files & enums & parser code
	3100
	3101	* file preparation
	3102
	3103	- download UCD, UCA & IDNA files
	3104	- make sure that the Unicode data folder passed into preparseucd.py
	3105	includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
	3106	- modify preparseucd.py:
	3107	parse new file BidiBrackets.txt
	3108	with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
	3109	- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
	3110	- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	3111	- Check test file diffs for previously commented-out, known-failing data lines;
	3112	probably need to keep those commented out.
	3113
	3114	* PropertyAliases.txt changes
	3115	- 1 new Enumerated Property
	3116	bpt ; Bidi_Paired_Bracket_Type
	3117	-> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
	3118	-> ubidi_props.h & .c & UBiDiProps.java
	3119	-> remember to write the max value at UBIDI_MAX_VALUES_INDEX
	3120	-> uprops.cpp
	3121	-> change ubidi.icu format version from 2.0 to 2.1
	3122	- 1 new Miscellaneous Property
	3123	bpb ; Bidi_Paired_Bracket
	3124	-> uchar.h & UProperty.java
	3125	-> ppucd.h & .cpp
	3126
	3127	* PropertyValueAliases.txt changes
	3128	- 3 Bidi_Paired_Bracket_Type (bpt) values:
	3129	bpt; c ; Close
	3130	bpt; n ; None
	3131	bpt; o ; Open
	3132	-> uchar.h & UCharacter.BidiPairedBracketType
	3133	-> ubidi_props.h & .c & UBiDiProps.java
	3134	-> change ubidi.icu format version from 2.0 to 2.1
	3135	- 4 new Bidi_Class (bc) values:
	3136	bc ; FSI ; First_Strong_Isolate
3137	bc ; LRI ; Left_To_Right_Isolate
3138	bc ; RLI ; Right_To_Left_Isolate
3139	bc ; PDI ; Pop_Directional_Isolate
3140	-> uchar.h & UCharacterEnums.ECharacterDirection
3141	-> until the bidi code gets updated,
3142	Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
3143	- 3 new Word_Break (WB) values:
3144	WB ; HL ; Hebrew_Letter
3145	WB ; SQ ; Single_Quote
3146	WB ; DQ ; Double_Quote
3147	-> uchar.h & UCharacter.WordBreak
3148	-> first time Word_Break numeric constants exceed 4 bits (now 17 values)
3149	- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3150	(added 2012-10-16)
3151	Aghb 239 Caucasian Albanian
3152	Mahj 314 Mahajani
3153	-> uscript.h
3154	-> com.ibm.icu.lang.UScript
3155	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3156	replace public static final int \1 = \2;\3
3157	-> preparseucd.py _scripts_only_in_iso15924
3158	-> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3159	and in com.ibm.icu.dev.test.lang.TestUScript.java
3160	-> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3161	(not strictly necessary for NOT_ENCODED scripts)
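
The Word_Break note above ("exceed 4 bits, now 17 values") is simple arithmetic: n distinct values numbered from 0 need enough bits to hold n-1. A quick check (the helper name is made up; how ICU actually packs the value into its data words is not shown in this log):

```python
def bits_needed(num_values):
    """Bits required to store values 0..num_values-1."""
    return max(1, (num_values - 1).bit_length())


bits_before = bits_needed(17 - 3)   # before the 3 new Word_Break values
bits_after = bits_needed(17)        # with the 3 new values
```

With 16 or fewer values the constants fit in 4 bits; the 17th value is the first that needs a fifth bit.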
3162
3163	* generate normalization data files
3164	- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
3165	- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
3166	- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
3167	- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3168	- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3169	- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3170	- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3171
3172	* build ICU (make install)
3173	so that the tools build can pick up the new definitions from the installed header files.
3174
3175	~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3176
3177	* build Unicode tools using CMake+make
3178
3179	~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3180
3181	# Location (--prefix) of where ICU was installed.
3182	set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
3183	# Location of the ICU source tree.
3184	set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
3185
3186	~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3187	~/svn.icutools/trunk/dbg/unicode/c$ make
3188
3189	* generate core properties data files
3190	- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
3191	- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
3192	- rebuild ICU (make install) & tools
3193	- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3194	- rebuild ICU (make install) & tools
3195
3196	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3197	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3198	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3199	- Unicode 6.0..6.3: U+2260, U+226E, U+226F
3200	- nothing new in 6.3, no test file to update
3201
3202	* update Java data files
3203	- refresh just the UCD-related files, just to be safe
3204	- see (ICU4C)/source/data/icu4j-readme.txt
3205	- mkdir /tmp/icu4j
3206	- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3207	output:
3208	...
3209	Unicode .icu files built to ./out/build/icudt52l
3210	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3211	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
3212	echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3213	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3214	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
3215	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
3216	mkdir -p /tmp/icu4j/main/shared/data
3217	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3218	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
3219	mkdir -p /tmp/icu4j/main/shared/data
3220	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3221	make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
3222	- copy the big-endian Unicode data files to another location,
3223	separate from the other data files
3224	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3225	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3226	~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3227	~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
3228	~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3229	~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3230	~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3231	- refresh ICU4J
3232	~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3233
3234	* refresh Java test .txt files
3235	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3236
3237	* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
3238
3239	- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3240	- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3241	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3242	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3243	(note removing the underscore before "Rules")
3244	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3245	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3246	with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3247	- check test file diffs for previously commented-out, known-failing data lines;
3248	probably need to keep those commented out
3249	- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3250	- run genuca, see command line above
3251	- rebuild ICU4C
3252	- refresh ICU4J collation data:
3253	(subset of instructions above for properties data refresh, except copies all coll/*)
3254	~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3255	~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3256	~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3257	~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3258	- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3259	- note on intltest: if collate/UCAConformanceTest fails, then
3260	utility/MultithreadTest/TestCollators will fail as well;
3261	fix the conformance test before looking into the multi-thread test
3262
3263	* test ICU, fix test code where necessary
3264
3265	* When refreshing all of ICU4J data from ICU4C
3266	- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3267	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3268	or
3269	- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3270
3271	*** LayoutEngine script information
3272	- skipped for Unicode 6.3: no new scripts
3273
3274	*** merge the Unicode update branches back onto the trunk
3275	- do not merge the icudata.jar and testdata.jar,
3276	instead rebuild them from merged & tested ICU4C
3277
3278	---------------------------------------------------------------------------- ***
3279
51004dcb A	3280	Unicode 6.2 update
	3281
	3282	http://www.unicode.org/review/pri230/
	3283	http://www.unicode.org/versions/beta-6.2.0.html
	3284	http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
	3285	http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
	3286	http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
	3287	http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
	3288	http://www.unicode.org/reports/tr46/tr46-8.html IDNA
	3289	http://unicode.org/Public/idna/6.2.0/
	3290
	3291	*** ICU Trac
	3292
	3293	- ticket 9515: Unicode 6.2: final ICU update
	3294
	3295	- ticket 9514: UCA 6.2: fix UCARules.txt
	3296
	3297	- ticket 9437: update ICU to Unicode 6.2
	3298	- C++ branches/markus/uni62 at r32050 from trunk at r32041
	3299	- Java branches/markus/uni62 at r32068 from trunk at r32066
	3300
	3301	*** Unicode version numbers
	3302	- makedata.mak
	3303	- uchar.h
	3304	(configure.in & configure: have been modified to extract the version from uchar.h)
	3305	- com.ibm.icu.util.VersionInfo
	3306	- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
	3307
	3308	*** data files & enums & parser code
	3309
	3310	* file preparation
	3311
	3312	- download UCD, UCA & IDNA files
	3313	- make sure that the Unicode data folder passed into preparseucd.py
	3314	includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
	3315	- modify preparseucd.py: NamesList.txt is now in UTF-8
	3316	- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
	3317	- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	3318	- Check test file diffs for previously commented-out, known-failing data lines;
	3319	probably need to keep those commented out.
	3320
	3321	* PropertyValueAliases.txt changes
	3322	- 1 new Line_Break (lb) value:
	3323	lb ; RI ; Regional_Indicator
	3324	-> uchar.h & UCharacter.LineBreak
	3325	- 1 new Word_Break (WB) value:
	3326	WB ; RI ; Regional_Indicator
	3327	-> uchar.h & UCharacter.WordBreak
	3328	- 1 new Grapheme_Cluster_Break (GCB) value:
	3329	GCB; RI ; Regional_Indicator
	3330	-> uchar.h & UCharacter.GraphemeClusterBreak
	3331
	3332	* 3 new numeric values
	3333	The new value -1 was really supposed to be NaN, but that would have required
	3334	new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
	3335	but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
	3336	cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
	3337	cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
	3338	The two new values 216000 and 432000 require an addition to the encoding of numeric values.
	3339	cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
	3340	cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
	3341	-> uprops.h, uchar.c & UCharacterProperty.java
	3342	-> cucdtst.c & UCharacterTest.java
	3343
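Both large values are small multiples of a power of 60 (216000 = 60^3, 432000 = 2 * 60^3), which is why a mantissa/exponent form is attractive for them. The following is a minimal sketch of that arithmetic only. It is not the bit layout used in uprops.h, and the function name is hypothetical.

```python
from fractions import Fraction

def split_base60(value):
    """Split a positive integer into (mantissa, exponent) with value == mantissa * 60**exponent."""
    if value <= 0:
        raise ValueError("expected a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent

for nv in (216000, 432000):
    mantissa, exponent = split_base60(nv)
    print(nv, "=", mantissa, "* 60 **", exponent)

# The value -1 needs no new form: it is the "fraction" -1/1.
print(Fraction(-1, 1) == -1)
```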
3344	* generate normalization data files
3345	- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
3346	- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
3347	- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
3348	- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
3349	- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
3350	- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3351	- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
3352
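The four gennorm2 invocations differ only in the output name and the list of input .txt files. As a sketch, the command lines can be built from a small table. Only the -o and -s options shown above are used, the helper name is hypothetical, and nothing is executed here.

```python
import posixpath

def gennorm2_commands(src_data_in, unidata):
    """Build the four gennorm2 command lines listed above, without running them."""
    norm2 = posixpath.join(unidata, "norm2")
    inputs = {
        "nfc.nrm": ["nfc.txt"],
        "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
        "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
        "uts46.nrm": ["nfc.txt", "uts46.txt"],
    }
    return [
        ["bin/gennorm2", "-o", posixpath.join(src_data_in, out), "-s", norm2, *txts]
        for out, txts in inputs.items()
    ]

root = "~/svn.icu/uni62/src/source/data"
for cmd in gennorm2_commands(root + "/in", root + "/unidata"):
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True, cwd=...) from the build directory, after expanding "~" and with LD_LIBRARY_PATH set as above.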
3353	* build ICU (make install)
3354	so that the tools build can pick up the new definitions from the installed header files.
3355	* build Unicode tools using CMake+make
3356
3357	* generate core properties data files
3358	- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
3359	- in initial bootstrapping, change the UCA version
3360	in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3361	- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
3362	- rebuild ICU (make install) & tools
3363	+ if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
3364	check if the UCA version in FractionalUCA.txt matches the new Unicode version
3365	(see step above)
3366	- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3367	- rebuild ICU (make install) & tools
3368
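The version check mentioned above can be sketched as follows. This assumes that FractionalUCA.txt carries a line of the form `[UCA version = x.y.z]`. That format is an assumption to verify against the actual file, and both function names are hypothetical.

```python
import re

def uca_version(lines):
    """Return the version from the first '[UCA version = x.y.z]' line, or None."""
    for line in lines:
        m = re.search(r"\[UCA version\s*=\s*([0-9.]+)\]", line)
        if m:
            return m.group(1)
    return None

def versions_match(uca, unicode_version):
    """Compare only as many fields as unicode_version has, e.g. '6.2.0' vs '6.2'."""
    want = unicode_version.split(".")
    return uca.split(".")[:len(want)] == want

# Hypothetical file header, in the assumed format.
sample = ["# hypothetical header", "[UCA version = 6.2.0]"]
found = uca_version(sample)
print(found, versions_match(found, "6.2"))
```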
3369	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3370	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3371	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3372	- Unicode 6.0..6.2: U+2260, U+226E, U+226F
3373	- nothing new in 6.2, no test file to update
3374
3375	* update Java data files
3376	- refresh just the UCD-related files, just to be safe
3377	- see (ICU4C)/source/data/icu4j-readme.txt
3378	- mkdir /tmp/icu4j
3379	- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3380	output:
3381	...
3382	Unicode .icu files built to ./out/build/icudt50l
3383	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3384	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
3385	echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3386	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3387	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
3388	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
3389	mkdir -p /tmp/icu4j/main/shared/data
3390	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3391	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
3392	mkdir -p /tmp/icu4j/main/shared/data
3393	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3394	make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
3395	- copy the big-endian Unicode data files to another location,
3396	separate from the other data files
3397	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3398	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3399	~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3400	~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
3401	~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3402	~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3403	~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3404	- refresh ICU4J
3405	~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3406
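The cp/rm sequence above selects a fixed subset of the big-endian data files. A minimal Python sketch of the same selection follows. The function name is hypothetical, and it skips cnvalias.icu during the copy instead of removing it afterwards.

```python
import fnmatch
import os
import shutil

def copy_unicode_data(src, dst):
    """Copy *.icu (minus cnvalias.icu), *.nrm, coll/*.icu and brkitr/* from src to dst."""
    plan = [("", "*.icu"), ("", "*.nrm"), ("coll", "*.icu"), ("brkitr", "*")]
    copied = []
    for subdir, pattern in plan:
        src_dir = os.path.join(src, subdir)
        dst_dir = os.path.join(dst, subdir)
        os.makedirs(dst_dir, exist_ok=True)
        if not os.path.isdir(src_dir):
            continue
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path) or not fnmatch.fnmatch(name, pattern):
                continue
            if subdir == "" and name == "cnvalias.icu":
                continue  # the log removes this file after copying
            shutil.copy2(path, os.path.join(dst_dir, name))
            copied.append(os.path.join(subdir, name))
    return copied
```

Here src would be the com/ibm/icu/impl/data/icudt50b folder under data/out/icu4j, and dst the matching folder under /tmp/icu4j.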
3407	* refresh Java test .txt files
3408	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3409
3410	* UCA
3411
3412	- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3413	- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3414	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3415	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3416	(note removing the underscore before "Rules")
3417	- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3418	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3419	with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3420	- check test file diffs for previously commented-out, known-failing data lines;
3421	probably need to keep those commented out
3422	- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3423	- run genuca, see command line above
3424	- rebuild ICU4C
3425	- refresh ICU4J collation data:
3426	(subset of instructions above for properties data refresh, except copies all coll/*)
3427	~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3428	~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3429	~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3430	~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3431	- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3432	- note on intltest: if collate/UCAConformanceTest fails, then
3433	utility/MultithreadTest/TestCollators will fail as well;
3434	fix the conformance test before looking into the multi-thread test
3435
3436	* test ICU, fix test code where necessary
3437
3438	* When refreshing all of ICU4J data from ICU4C
3439	- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3440	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3441	or
3442	- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3443
3444	*** LayoutEngine script information
3445	- skipped for Unicode 6.2: no new scripts
3446
3447	*** merge the Unicode update branches back onto the trunk
3448	- do not merge the icudata.jar and testdata.jar,
3449	instead rebuild them from merged & tested ICU4C
3450
3451	---------------------------------------------------------------------------- ***
73c04bcf	3452
4388f060 A	3453	Future Unicode update
	3454
	3455	Tools simplified since the Unicode 6.1 update. See
	3456	- http://site.icu-project.org/design/props/ppucd
	3457	- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
	3458
	3459	* Unicode version numbers
	3460	- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
	3461
	3462	* file preparation
	3463	- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
	3464	- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
	3465	- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
	3466	- Check test file diffs for previously commented-out, known-failing data lines;
	3467	probably need to keep those commented out.
	3468
	3469	* PropertyValueAliases.txt changes
	3470	- Script codes that are in ISO 15924 but not in Unicode are now listed in
	3471	preparseucd.py, in the _scripts_only_in_iso15924 variable.
	3472	If there are new ISO codes, then add them.
	3473	If Unicode adds some of them, then remove them from the .py variable.
	3474
	3475	* UnicodeData.txt changes
	3476	- No more manual changes for CJK ranges for algorithmic names;
	3477	those are now written to ppucd.txt and genprops reads them from there.
	3478
	3479	* generate core properties data files (makeprops.sh was deleted)
	3480	- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
	3481
	3482	* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
	3483	- it is now generated by preparseucd.py
	3484
	3485	* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
	3486	- it is now generated by preparseucd.py
	3487	- make sure that the Unicode data folder passed into preparseucd.py
	3488	includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
	3489	(can be in some subfolder)
	3490
	3491	* generate normalization data files
	3492	- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
	3493	- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
	3494	- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
	3495	- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
	3496	- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
	3497	- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
	3498	- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt
	3499
	3500	* build ICU (make install)
	3501	* build Unicode tools using CMake+make
	3502
	3503	* new way to call genuca (makeuca.sh was deleted)
	3504	- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
	3505
	3506	---------------------------------------------------------------------------- ***
	3507
	3508	Unicode 6.1 update
	3509
	3510	*** ICU Trac
	3511
	3512	- ticket 8995 final update to Unicode 6.1
	3513	- ticket 8994 regenerate source/layout/CanonData.cpp
	3514
	3515	- ticket 8961 support Unicode "Age" value names
	3516	- ticket 8963 support multiple character name aliases & types
3517
3518	- ticket 8827 "update ICU to Unicode 6.1"
3519	- C++ branches/markus/uni61 at r30864 from trunk at r30843
3520	- Java branches/markus/uni61 at r30865 from trunk at r30863
3521
3522	*** Unicode version numbers
3523	- makedata.mak
3524	- uchar.h
3525	(configure.in & configure: have been modified to extract the version from uchar.h)
3526	- com.ibm.icu.util.VersionInfo
3527	- icutools/unicode/makedefs.sh
3528	+ also review & update other definitions in that file,
3529	e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
3530
3531	*** data files & enums & parser code
3532
3533	* file preparation
3534
3535	~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
3536	- This prepares both unidata and testdata files in respective output subfolders.
3537	- Check test file diffs for previously commented-out, known-failing data lines;
3538	probably need to keep those commented out.
3539
3540	* PropertyValueAliases.txt changes
3541	- 11 new block names:
3542	Arabic_Extended_A
3543	Arabic_Mathematical_Alphabetic_Symbols
3544	Chakma
3545	Meetei_Mayek_Extensions
3546	Meroitic_Cursive
3547	Meroitic_Hieroglyphs
3548	Miao
3549	Sharada
3550	Sora_Sompeng
3551	Sundanese_Supplement
3552	Takri
3553	-> add to uchar.h
3554	-> add to UCharacter.UnicodeBlock IDs
3555	Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3556	replace public static final int \1_ID = \2; \3
3557	-> add to UCharacter.UnicodeBlock objects
3558	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
3559	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3560	- 1 new Joining_Group (jg) value:
3561	Rohingya_Yeh
3562	-> uchar.h & UCharacter.JoiningGroup
3563	- 2 new Line_Break (lb) values:
3564	CJ=Conditional_Japanese_Starter
3565	HL=Hebrew_Letter
3566	-> uchar.h & UCharacter.LineBreak
3567	- 7 new scripts:
3568	sc ; Cakm ; Chakma
3569	sc ; Merc ; Meroitic_Cursive
3570	sc ; Mero ; Meroitic_Hieroglyphs
3571	sc ; Plrd ; Miao
3572	sc ; Shrd ; Sharada
3573	sc ; Sora ; Sora_Sompeng
3574	sc ; Takr ; Takri
3575	-> remove these from SyntheticPropertyValueAliases.txt
3576	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3577	and in com.ibm.icu.dev.test.lang.TestUScript.java
3578	- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3579	(added 2011-06-21)
3580	Khoj 322 Khojki
3581	Tirh 326 Tirhuta
3582	and another one added 2011-12-09
3583	Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
3584	-> uscript.h
3585	-> com.ibm.icu.lang.UScript
3586	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3587	replace public static final int \1 = \2;\3
3588	-> SyntheticPropertyValueAliases.txt
3589	-> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3590	and in com.ibm.icu.dev.test.lang.TestUScript.java
3591
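The two Eclipse find/replace pairs for UCharacter.UnicodeBlock can be reproduced outside the IDE. The sketch below applies the same patterns with Python's re module. The sample enum line only illustrates the expected shape, and its numeric value and comment should not be taken as authoritative.

```python
import re

# Same patterns as the Eclipse find expressions above.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")

def to_java_id(c_line):
    """Turn a uchar.h enum line into a Java *_ID constant."""
    return ID_FIND.sub(r"public static final int \1_ID = \2; \3", c_line)

def to_java_object(c_line):
    """Turn a uchar.h enum line into a Java UnicodeBlock object declaration."""
    return OBJ_FIND.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        c_line)

# Illustrative input line (value and comment are assumptions).
sample = "UBLOCK_TAKRI = 220, /*[11680]*/"
print(to_java_id(sample))
print(to_java_object(sample))
```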
3592	* UnicodeData.txt changes
3593	- the last Unihan code point changes from U+9FCB to U+9FCC
3594	search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
3595	+ do change gennames.c
3596	+ do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
3597
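The case-insensitive search for 9FC[BC] can be sketched like this. The snippet being searched is hypothetical and exists only to exercise the pattern.

```python
import re

CJK_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)

def find_cjk_constants(text):
    """Return (line number, line) pairs that mention 9FCB or 9FCC in any letter case."""
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if CJK_END_OR_LIMIT.search(line)]

# Hypothetical source snippet, not taken from gennames.c or ucol.cpp.
snippet = "start = 0x4E00\nend = 0x9fcb\nlimit = 0x9FCC\nother = 0x9FCD\n"
for n, line in find_cjk_constants(snippet):
    print(n, line)
```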
3598	* DerivedBidiClass.txt changes
3599	- 2 new default-AL blocks:
3600	# Arabic Extended-A: U+08A0 - U+08FF (was default-R)
3601	# Arabic Mathematical Alphabetic Symbols:
3602	# U+1EE00 - U+1EEFF (was default-R)
3603	- 2 new default-R blocks:
3604	# Meroitic Hieroglyphs:
3605	# U+10980 - U+1099F
3606	# Meroitic Cursive: U+109A0 - U+109FF
3607	-> should be picked up by the explicit data in the file
3608
3609	* NameAliases.txt changes
3610	- from
3611	# Each line has two fields
3612	# First field: Code point
3613	# Second field: Alias
3614	- to
3615	# Each line has three fields, as described here:
3616	#
3617	# First field: Code point
3618	# Second field: Alias
3619	# Third field: Type
3620	- Also, the file previously allowed multiple aliases per code point, but only now does it
3621	actually provide several, even several of the same type. For example,
3622	FEFF;BYTE ORDER MARK;alternate
3623	FEFF;BOM;abbreviation
3624	FEFF;ZWNBSP;abbreviation
3625	- This breaks our gennames parser, unames.icu data structure, and API.
3626	Fix gennames to only pick up "correction" aliases.
3627	New ticket #8963 for further changes.
3628
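Restricting the parser to "correction" aliases amounts to filtering on the new third field. A minimal Python sketch of that filter follows. The function name is hypothetical, and the last sample row is added only to show a "correction" entry.

```python
from collections import defaultdict

def parse_name_aliases(lines, wanted_type="correction"):
    """Map code point -> list of aliases of the wanted type (three-field lines only)."""
    aliases = defaultdict(list)
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) != 3:
            continue  # old two-field layout or malformed line
        code, alias, alias_type = fields
        if alias_type == wanted_type:
            aliases[int(code, 16)].append(alias)
    return dict(aliases)

sample = [
    "# Third field: Type",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
    "01A2;LATIN CAPITAL LETTER GHA;correction",  # illustrative row
]
print(parse_name_aliases(sample))
```

Passing wanted_type="abbreviation" shows the case that broke the old one-alias-per-code-point assumption: U+FEFF then maps to two aliases of the same type.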
3629	* run genpname/preparse.pl (on Linux)
3630	+ cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3631	+ make sure that data.h is writable
3632	+ perl preparse.pl ~/svn.icu/trunk/src > out.txt
3633	+ preparse.pl shows no errors, out.txt Info and Warning lines look ok
3634
3635	* build ICU (make install)
3636	so that the tools build can pick up the new definitions from the installed header files.
3637	* build Unicode tools (at least genpname) using CMake+make
3638
3639	* run genpname
3640	(builds both pnames.icu and propname_data.h)
3641	- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3642	- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3643
3644	* build ICU (make install)
3645	* build Unicode tools using CMake+make
3646
3647	* update source/data/unidata/norm2/nfkc_cf.txt
3648	- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3649
3650	* update source/data/unidata/norm2/uts46.txt
3651	- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3652	to ~/svn.icu/tools/trunk/src/unicode/py
3653	- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
3654	- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3655	- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3656
3657	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3658	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3659	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3660	- Unicode 6.0..6.1: U+2260, U+226E, U+226F
3661	- nothing new in 6.1, no test file to update
3662
3663	* generate core properties data files
3664	- in initial bootstrapping, change the UCA version
3665	in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3666	- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3667	- rebuild ICU & tools
3668	+ if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
3669	check if the UCA version in FractionalUCA.txt matches the new Unicode version
3670	(see step above)
3671	- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
3672	~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3673	- rebuild ICU & tools
3674
3675	* update Java data files
3676	- refresh just the UCD-related files, just to be safe
3677	- see (ICU4C)/source/data/icu4j-readme.txt
3678	- mkdir /tmp/icu4j
3679	- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3680	output:
3681	...
3682	Unicode .icu files built to ./out/build/icudt49l
3683	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3684	mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
3685	echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3686	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3687	mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
3688	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
3689	mkdir -p /tmp/icu4j/main/shared/data
3690	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3691	jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
3692	mkdir -p /tmp/icu4j/main/shared/data
3693	cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3694	make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
3695	- copy the big-endian Unicode data files to another location,
3696	separate from the other data files
3697	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3698	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3699	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3700	~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
3701	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3702	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3703	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3704	- refresh ICU4J
3705	~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3706
3707	* refresh Java test .txt files
3708	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3709
3710	* test ICU so far, fix test code where necessary
3711	- temporarily ignore collation issues that look like UCA/UCD mismatches,
3712	until UCA data is updated
3713
3714	* UCA
3715
3716	- get output from Mark's tools; look in
3717	http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
3718	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3719	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3720	(note removing the underscore before "Rules")
3721	- update (ICU)/source/test/testdata/CollationTest_*.txt
3722	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3723	with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3724	- check test file diffs for previously commented-out, known-failing data lines;
3725	probably need to keep those commented out
3726	- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3727	- run makeuca.sh:
3728	~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3729	- rebuild ICU4C
3730	- refresh ICU4J collation data:
3731	(subset of instructions above for properties data refresh, except copies all coll/*)
3732	~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3733	~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3734	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3735	~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3736	- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3737	- note on intltest: if collate/UCAConformanceTest fails, then
3738	utility/MultithreadTest/TestCollators will fail as well;
3739	fix the conformance test before looking into the multi-thread test
3740
3741	* When refreshing all of ICU4J data from ICU4C
3742	- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3743	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3744	or
3745	- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3746
3747	*** LayoutEngine script information
3748
3749	(For details see the Unicode 5.2 change log below.)
3750
3751	* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3752	This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3753	in the working directory.
3754	(It also generates ScriptRunData.cpp, which is no longer needed.)
3755
3756	The generated files have a current copyright date and "@draft" statement.
3757
3758	- diff current <icu>/source/layout files vs. generated ones
3759	~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3760	review and manually merge desired changes;
3761	fix gratuitous changes, incorrect @draft and missing aliases;
3762	Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3763	- if you just copy the above files, then
3764	fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3765	manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3766
3767	*** merge the Unicode update branches back onto the trunk
3768	- do not merge the icudata.jar and testdata.jar,
3769	instead rebuild them from merged & tested ICU4C
3770
3771	---------------------------------------------------------------------------- ***
3772
3773	ICU 4.8 (no Unicode update, just new script codes)
3774
3775	* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3776	(added 2010-12-21)
3777	Afak 439 Afaka
3778	Jurc 510 Jurchen
3779	Mroo 199 Mro, Mru
3780	Nshu 499 Nüshu
3781	Shrd 319 Sharada, Śāradā
3782	Sora 398 Sora Sompeng
3783	Takr 321 Takri, Ṭākrī, Ṭāṅkrī
3784	Tang 520 Tangut
3785	Wole 480 Woleai
3786	-> uscript.h
3787	-> com.ibm.icu.lang.UScript
3788	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3789	replace public static final int \1 = \2;\3
3790	-> genpname/SyntheticPropertyValueAliases.txt
3791	-> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3792	and in com.ibm.icu.dev.test.lang.TestUScript.java
3793
3794	* run genpname/preparse.pl (on Linux)
3795	+ cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3796	+ make sure that data.h is writable
3797	+ perl preparse.pl ~/svn.icu/trunk/src > out.txt
3798	+ preparse.pl shows no errors, out.txt Info and Warning lines look ok
3799
3800	* rebuild Unicode tools (at least genpname) using make
3801	- You might first need to "make install" ICU so that the tools build can pick
3802	up the new definitions from the installed header files.
3803
3804	* run genpname
3805	(builds both pnames.icu and propname_data.h)
3806	- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3807	- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3808	- rebuild ICU & tools
3809
3810	* run genprops
3811	- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3812	- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3813	- rebuild ICU & tools
3814
3815	* update Java data files
3816	- refresh just the UCD-related files, just to be safe
3817	- see (ICU4C)/source/data/icu4j-readme.txt
3818	- mkdir /tmp/icu4j
3819	- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3820	- copy the big-endian Unicode data files to another location,
3821	separate from the other data files
3822	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3823	~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3824	~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3825	- refresh ICU4J
3826	~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
3827
3828	* should have updated the layout engine script codes but forgot
3829
3830	---------------------------------------------------------------------------- ***
3831
729e4ab9 A	3832	Unicode 6.0 update
	3833
	3834	*** related ICU Trac tickets
	3835
	3836	7264 Unicode 6.0 Update
	3837
	3838	*** Unicode version numbers
	3839	- makedata.mak
	3840	- uchar.h
	3841	(configure.in & configure: have been modified to extract the version from uchar.h)
	3842	- com.ibm.icu.util.VersionInfo
	3843
	3844	*** data files & enums & parser code
	3845
	3846	* file preparation
	3847
	3848	~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
	3849	- This now prepares both unidata and testdata files in respective output subfolders.
	3850
	3851	* PropertyAliases.txt changes
	3852	- new Script_Extensions property defined in the new ScriptExtensions.txt file
	3853	but not listed in PropertyAliases.txt; reported to unicode.org;
	3854	-> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
	3855	scx; Script_Extensions
	3856	-> uchar.h with new UProperty section
	3857	-> com.ibm.icu.lang.UProperty, parallel with uchar.h
	3858
	3859	* PropertyValueAliases.txt changes
	3860	- 12 new block names:
	3861	Alchemical_Symbols
	3862	Bamum_Supplement
	3863	Batak
	3864	Brahmi
	3865	CJK_Unified_Ideographs_Extension_D
	3866	Emoticons
	3867	Ethiopic_Extended_A
	3868	Kana_Supplement
	3869	Mandaic
	3870	Miscellaneous_Symbols_And_Pictographs
	3871	Playing_Cards
	3872	Transport_And_Map_Symbols
	3873	-> add to uchar.h
	3874	-> add to UCharacter.UnicodeBlock
	3875	Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
	3876	replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
	3877	- Joining_Group (jg) values:
	3878	Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
	3879	-> uchar.h & UCharacter.JoiningGroup
	3880	- 3 new scripts:
	3881	sc ; Batk ; Batak
	3882	sc ; Brah ; Brahmi
	3883	sc ; Mand ; Mandaic
	3884	-> remove these from SyntheticPropertyValueAliases.txt
	3885	-> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
	3886	-> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
	3887	and in com.ibm.icu.dev.test.lang.TestUScript.java
	3888	- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
	3889	(added 2009-11-11..2010-07-18)
	3890	Bass 259 Bassa Vah
	3891	Dupl 755 Duployan shorthand
	3892	Elba 226 Elbasan
	3893	Gran 343 Grantha
	3894	Kpel 436 Kpelle
	3895	Loma 437 Loma
3896	Mend 438 Mende
3897	Merc 101 Meroitic Cursive
3898	Narb 106 Old North Arabian
3899	Nbat 159 Nabataean
3900	Palm 126 Palmyrene
3901	Sind 318 Sindhi
3902	Wara 262 Warang Citi
3903	-> uscript.h
3904	-> com.ibm.icu.lang.UScript
3905	find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3906	replace public static final int \1 = \2;\3
3907	-> SyntheticPropertyValueAliases.txt
3908	-> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3909	and in com.ibm.icu.dev.test.lang.TestUScript.java
3910	- ISO 15924 name change
3911	Mero 100 Meroitic Hieroglyphs (was Meroitic)
3912	-> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
3913	- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
3914
3915	* UnicodeData.txt changes
3916	- new CJK block:
3917	2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
3918	2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
3919	-> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
3920
3921	* build Unicode tools using CMake+make
3922
3923	* run genpname/preparse.pl (on Linux)
3924	+ cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3925	+ make sure that data.h is writable
3926	+ perl preparse.pl ~/svn.icu/trunk/src > out.txt
3927	+ preparse.pl shows no errors, out.txt Info and Warning lines look ok
3928
3929	* rebuild Unicode tools (at least genpname) using make
3930	- You might first need to "make install" ICU so that the tools build can pick
3931	up the new definitions from the installed header files.
3932
3933	* run genpname
3934	- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3935	- rebuild ICU & tools
3936
3937	* update source/data/unidata/norm2/nfkc_cf.txt
3938	- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3939
3940	* update source/data/unidata/norm2/uts46.txt
3941	- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
3942	to ~/svn.icu/tools/trunk/src/unicode/py
3943	- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
3944	- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3945	- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3946
3947	* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3948	sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3949	- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3950	- Unicode 6.0: U+2260, U+226E, U+226F
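
The grep step above can be narrowed to non-ASCII characters with a small filter. A sketch only, assuming each data line has the form "code point or range ; status ; ... # comment"; the function name is hypothetical:

```python
def non_ascii_std3_valid(lines):
    """Return the code point fields of non-ASCII disallowed_STD3_valid entries."""
    found = []
    for line in lines:
        data = line.split('#', 1)[0]              # drop the trailing comment
        fields = [f.strip() for f in data.split(';')]
        if len(fields) < 2 or fields[1] != 'disallowed_STD3_valid':
            continue
        # A field is either a single code point or a start..end range;
        # only the start is checked here.
        first = int(fields[0].split('..')[0], 16)
        if first > 0x7F:
            found.append(fields[0])
    return found


if __name__ == '__main__':
    # Assumes the file was downloaded as described above.
    with open('IdnaMappingTable.txt', encoding='utf-8') as table:
        for entry in non_ascii_std3_valid(table):
            print(entry)
```

A range that starts in ASCII and ends outside it would be skipped by this check, so such lines still need a manual look.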
3951
3952	* generate core properties data files
3953	- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3954	- rebuild ICU & tools
3955	- run makeuca.sh so that genuca picks up the new nfc.nrm:
3956	~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3957	- rebuild ICU & tools
3958
3959	* implement new Script_Extensions property (provisional)
3960	- parser & generator: genprops & uprops.icu
3961	- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
3962	- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
3963
3964	* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
3965	- (one-time change)
3966	- genbidi/gencase/genprops tools changes
3967	- re-run makeprops.sh (see above)
3968	- UCharacterProperty.java, UCharacterTypeIterator.java,
3969	UBiDiProps.java, UCaseProps.java, and several others with minor changes;
3970	UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
3971
3972	* update Java data files
3973	- refresh just the UCD-related files, just to be safe
3974	- see (ICU4C)/source/data/icu4j-readme.txt
3975	- mkdir /tmp/icu4j
3976	- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3977	output:
3978	...
3979	Unicode .icu files built to ./out/build/icudt45l
3980	mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3981	echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3982	LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3983	jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
3984	mkdir -p /tmp/icu4j/main/shared/data
3985	cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3986	- copy the big-endian Unicode data files to another location,
3987	separate from the other data files
3988	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3989	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3990	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3991	~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
3992	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3993	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3994	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3995	- refresh ICU4J
3996	~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
3997
3998	* refresh Java test .txt files
3999	- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4000
4001	* un-hardcode normalization skippable (NF*_Inert) test data
4002	- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
4003
4004	* copy updated break iterator test files
4005	- now handled by early ucdcopy.py and
4006	copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
4007	(old instructions:
4008	copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
4009	to ~/svn.icu/trunk/src/source/test/testdata)
4010	- they are not used in ICU4J
4011
4012	* UCA
4013
4014	- get output from Mark's tools; look in
4015	http://www.unicode.org/~book/incoming/mark/uca6.0.0/
4016	http://www.macchiato.com/unicode/utc/additional-uca-files
4017	http://www.unicode.org/Public/UCA/6.0.0/
4018	http://www.unicode.org/~mdavis/uca/
4019	- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4020	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4021	- update Han-implicit ranges for new CJK extensions:
4022	swapCJK() in ucol.cpp & ImplicitCEGenerator.java
4023	- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
4024	do not add it into invuca so that tailoring primary-after an ignorable works
4025	- genuca: permit space between [variable top] bytes
4026	- ucol.cpp: treat noncharacters like unassigned rather than ignorable
4027	- run makeuca.sh:
4028	~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4029	- rebuild ICU4C
4030	- refresh ICU4J collation data:
4031	(subset of instructions above for properties data refresh, except copies all coll/*)
4032	~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4033	mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4034	~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4035	~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4036	- update (ICU)/source/test/testdata/CollationTest_*.txt
4037	and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4038	with output from Mark's Unicode tools
4039	- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4040	- note on intltest: if collate/UCAConformanceTest fails, then
4041	utility/MultithreadTest/TestCollators will fail as well;
4042	fix the conformance test before looking into the multi-thread test
4043
4044	* When refreshing all of ICU4J data from ICU4C
4045	- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4046	- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4047	or
4048	- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4049
4050	*** LayoutEngine script information
4051
4052	(For details see the Unicode 5.2 change log below.)
4053
4054	* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4055	ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4056	ScriptRunData.cpp, which is no longer needed.)
4057
4058	The generated files have a current copyright date and "@draft" statement.
4059
4060	* copy the above files into <icu>/source/layout, replacing the old files.
4061	* fix mixed line endings
4062	* review the diffs and fix incorrect @draft and missing aliases;
4063	Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4064	* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4065
4066	---------------------------------------------------------------------------- ***
4067
4068	Unicode 5.2 update
4069
4070	*** related ICU Trac tickets
4071
4072	7084 Unicode 5.2
4073
4074	7167 verify collation bytes
4075	7235 Java test NAME_ALIAS
4076	7236 Java DerivedCoreProperties.txt test
4077	7237 Java BidiTest.txt
4078	7238 UTrie2 in core unidata
4079	7239 test for tailoring gaps
4080	7240 Java fix CollationMiscTest
4081	7243 update layout engine for Unicode 5.2
4082
4083	*** Unicode version numbers
4084	- makedata.mak
4085	- uchar.h
4086	- configure.in & configure
4087	- update ucdVersion in gennames.c if an algorithmic range changes
4088
4089	*** data files & enums & parser code
4090
4091	* file preparation
4092
4093	python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
4094	- includes finding files regardless of version numbers,
4095	copying them, and performing the equivalent processing of the
4096	ucdstrip and ucdmerge tools on the desired set of files
4097
4098	* notes on changes
4099	- PropertyAliases.txt
4100	moved from numeric to enumerated:
4101	ccc ; Canonical_Combining_Class
4102	new string properties:
4103	NFKC_CF ; NFKC_Casefold
4104	Name_Alias; Name_Alias
4105	new binary properties:
4106	Cased ; Cased
4107	CI ; Case_Ignorable
4108	CWCF ; Changes_When_Casefolded
4109	CWCM ; Changes_When_Casemapped
4110	CWKCF ; Changes_When_NFKC_Casefolded
4111	CWL ; Changes_When_Lowercased
4112	CWT ; Changes_When_Titlecased
4113	CWU ; Changes_When_Uppercased
4114	new CJK Unihan properties (not supported by ICU)
4115	- PropertyValueAliases.txt
4116	new block names
4117	new scripts
4118	one script code change:
4119	sc ; Qaai ; Inherited
4120	->
4121	sc ; Zinh ; Inherited ; Qaai
4122	new Line_Break (lb) value:
4123	lb ; CP ; Close_Parenthesis
4124	new Joining_Group (jg) values: Farsi_Yeh, Nya
4125	other new values:
4126	ccc; 214; ATA ; Attached_Above
4127	- DerivedBidiClass.txt
4128	new default-R range: U+1E800 - U+1EFFF
4129	- UnicodeData.txt
4130	all of the ISO comments are gone
4131	new CJK block end:
4132	9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
4133	new CJK block:
4134	2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
4135	2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
4136
4137	* genpname
4138	- run preparse.pl
4139	+ cd \svn\icuproj\icu\trunk\source\tools\genpname
4140	+ make sure that data.h is writable
4141	+ perl preparse.pl \svn\icuproj\icu\trunk > out.txt
4142	+ preparse.pl complains with errors like the following:
4143	Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
4144	This is because ICU 4.0 had scripts from ISO 15924 which are now
4145	added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
4146	and PropertyValueAliases.txt.
4147	-> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4148	Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
4149	+ preparse.pl complains with errors about block names missing from uchar.h; add them
4150
4151	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4152	- new block & script values
4153	+ 26 new blocks
4154	copy new blocks from Blocks.txt
4155	MS VC++ 2008 regular expression:
4156	find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
4157	replace with " UBLOCK_\3 = 172, /[\1]/"
4158	+ several new script values already added in ICU 4.0 for ISO 15924 coverage
4159	(removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
4160	+ 3 new script values added for ISO 15924 and Unicode 5.2 coverage
4161	+ 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
4162	(added to SyntheticPropertyValueAliases.txt)
4163	- new Joining Group (JG) values: Farsi_Yeh, Nya
4164	- new Line_Break (lb) value:
4165	lb ; CP ; Close_Parenthesis
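
The Visual C++ find/replace for new blocks (above) uses {} for groups. The same mechanical step as a Python sketch, assuming Blocks.txt lines of the form "start..end; Name"; the function name is hypothetical, and 172 is only the placeholder value from the replacement string:

```python
import re

# Python spelling of the Visual C++ pattern: () instead of {} for groups.
BLOCK_LINE = re.compile(r'^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$')


def block_to_enum(line, value=172):
    """Rewrite one Blocks.txt line as a draft UBLOCK_ enum line.

    The block name is inserted verbatim, as in the editor replacement,
    so it still has to be edited by hand into a valid identifier.
    """
    match = BLOCK_LINE.match(line)
    if match is None:
        return line
    start, _end, name = match.groups()
    return ' UBLOCK_%s = %d, /[%s]/' % (name, value, start)


if __name__ == '__main__':
    print(block_to_enum('A960..A97F; Hangul Jamo Extended-A'))
```

As with the editor version, the numeric values and the identifier spelling are fixed up manually afterwards.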
4166
4167	* hardcoded Unihan range end/limit
4168	- Unihan range end moves from 9FC3 to 9FCB
4169	search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
4170	+ do change gennames.c
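
The search for the old end and limit can be scripted. A minimal sketch using the regex above; the function name is hypothetical, and which source files to feed in is left to the caller:

```python
import re

# Old Unihan range end (9FC3) and limit (9FC4), matched case-insensitively.
UNIHAN_END_OR_LIMIT = re.compile(r'9FC[34]', re.IGNORECASE)


def find_hardcoded_range(lines):
    """Return (line number, text) pairs that mention the old end or limit."""
    return [(number, text)
            for number, text in enumerate(lines, 1)
            if UNIHAN_END_OR_LIMIT.search(text)]
```

Each hit is only a candidate; the pattern also matches unrelated hex strings that happen to contain 9FC3 or 9FC4.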
4171
4172	* Compare definitions of new binary properties with what we used to use
4173	in algorithms, to see if the definitions changed.
4174	- Verified that definitions for Cased and Case_Ignorable are unchanged.
4175	The gencase tool now parses the newly public Case_Ignorable values
4176	in case the definition changes in the future.
4177
4178	* uchar.c & uprops.h & uprops.c & genprops
4179	- new numeric values that didn't exist in Unicode data before:
4180	1/7, 1/9, 1/10, 3/10, 1/16, 3/16
4181	the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
4182	therefore redesign the encoding of numeric types and values for formatVersion 6;
4183	design for simple numbers up to at least 144 ("one gross"),
4184	large values up to at least 10^20,
4185	and fractions with numerators -1..17 and denominators 1..16
4186	to cover current and expected future values
4187	(e.g., more Han numeric values, Meroitic twelfths)
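
To illustrate the fraction ranges named above: numerators -1..17 and denominators 1..16 give 19 * 16 = 304 combinations. The packing below is a hypothetical layout chosen for this sketch and is not claimed to be the one used by uprops.icu formatVersion 6:

```python
NUM_MIN, NUM_MAX = -1, 17
DEN_MIN, DEN_MAX = 1, 16


def pack_fraction(numerator, denominator):
    """Pack a fraction into one small integer (hypothetical layout)."""
    if not (NUM_MIN <= numerator <= NUM_MAX and
            DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError('fraction out of range')
    # The denominator offset fits in the low 4 bits (0..15).
    return ((numerator - NUM_MIN) << 4) | (denominator - DEN_MIN)


def unpack_fraction(packed):
    """Inverse of pack_fraction: return (numerator, denominator)."""
    return (packed >> 4) + NUM_MIN, (packed & 0xF) + DEN_MIN
```

All six new Unicode 5.2 fractions (1/7, 1/9, 1/10, 3/10, 1/16, 3/16) fall inside these ranges.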
4188
4189	* reimplement Hangul_Syllable_Type for new Jamo characters
4190	- the old code assumed that all Jamo characters are in the 11xx block
4191	- Unicode 5.2 fills holes there and adds new Jamo characters in
4192	A960..A97F; Hangul Jamo Extended-A
4193	and in
4194	D7B0..D7FF; Hangul Jamo Extended-B
4195	- Hangul_Syllable_Type can be trivially derived from a subset of
4196	Grapheme_Cluster_Break values
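
The derivation mentioned above amounts to a direct mapping. A sketch using property value short names; the function name is hypothetical, and the actual ICU code works on stored property bits rather than strings:

```python
# Grapheme_Cluster_Break values that correspond to Hangul syllable types.
HANGUL_GCB_VALUES = frozenset(['L', 'V', 'T', 'LV', 'LVT'])


def hangul_syllable_type(gcb):
    """Derive the Hangul_Syllable_Type short name from a GCB short name."""
    # Every other GCB value maps to NA (Not_Applicable).
    return gcb if gcb in HANGUL_GCB_VALUES else 'NA'
```

This removes the assumption that all Jamo characters live in the 11xx block, because the result depends only on the GCB value of the character.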
4197
4198	* build Unicode data source code for hardcoding core data
4199	C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
4200
4201	ICU data make path is \svn\icuproj\icu\trunk\source\data\
4202	ICU root path is \svn\icuproj\icu\trunk
4203	Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4204	Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4205	Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4206	Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4207	Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4208	Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4209	Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4210	Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
4211	Creating data file for Unicode Property Names
4212	Creating data file for Unicode Character Properties
4213	Creating data file for Unicode Case Mapping Properties
4214	Creating data file for Unicode BiDi/Shaping Properties
4215	Creating data file for Unicode Normalization
4216	Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
4217	Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
4218
4219	- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
4220	and rebuild the common library
4221
4222	*** UCA
4223
4224	- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
4225	- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
4226	- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
4227	[ Begin obsolete instructions:
4228	Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the _STUB.txt files.
4229	- generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
4230	on Windows:
4231	python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
4232	python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
4233	End obsolete instructions]
4234	- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4235	not just the *_STUB.txt files
4236	- note on intltest: if collate/UCAConformanceTest fails, then
4237	utility/MultithreadTest/TestCollators will fail as well;
4238	fix the conformance test before looking into the multi-thread test
4239
4240	*** Implement Cased & Case_Ignorable properties
4241	- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
4242	- Problem: These properties should be disjoint, but aren't
4243	- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
4244	- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
4245
4246	*** Implement Changes_When_Xyz properties
4247	- without stored data
4248
4249	*** Implement Name_Alias property
4250	- add it as another name field in unames.icu
4251	- make it available via u_charName() and UCharNameChoice
4252	- consider it in u_charFromName()
4253
4254	*** Break iterators
4255
4256	* Update break iterator rules to new UAX versions and new property values
4257	* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
4258
4259	*** new BidiTest file
4260	- review format and data
4261	- copy BidiTest.txt to source/test/testdata
4262	- write test code using this data
4263	- fix ICU code where it fails the conformance test
4264
4265	*** Java
4266	- generally, find and update code corresponding to C/C++
4267	- UCharacter.UnicodeBlock constants:
4268	a) add an _ID integer per new block, update COUNT
4269	b) add a class instance per new block
4270	Visual Studio regex:
4271	find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
4272	replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4273	- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
4274
4275	- port test changes to Java
4276
4277	*** LayoutEngine script information
4278
4279	(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
4280
4281	* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4282	ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4283	ScriptRunData.cpp, which is no longer needed.)
4284
4285	The generated files have a current copyright date and "@draft" statement.
4286
4287	-> Eric Mader wrote in email on 20090930:
4288	"I think the tool has been modified to update @draft to @stable for
4289	older scripts and to add @draft for new scripts.
4290	(I worked with an intern on this last year.)
4291	You should check the output after you run it."
4292
4293	* copy the above files into <icu>/source/layout, replacing the old files.
4294	* fix mixed line endings
4295	* review the diffs and fix incorrect @draft and missing aliases
4296	* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4297
4298	Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4299	and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4300
4301	-> Eric Mader wrote in email on 20090930:
4302	"This is just a matter of making sure that all the per-script tables have
4303	entries for any new scripts that were added.
4304	If any new Indic characters were added, then the class tables in
4305	IndicClassTables.cpp should be updated to reflect this.
4306	John Emmons should know how to do this if it's required."
4307
4308	* rebuild the layout and layoutex libraries.
4309
4310	*** Documentation
4311	- Update User Guide
4312	+ Jamo_Short_Name, sfc->scf, binary property value aliases
4313
4314	---------------------------------------------------------------------------- ***
4315
46f4442e A	4316	Unicode 5.1 update
	4317
	4318	*** related ICU Trac tickets
	4319
	4320	5696 Update to Unicode 5.1
	4321
	4322	*** Unicode version numbers
	4323	- makedata.mak
	4324	- uchar.h
	4325	- configure.in & configure
	4326	- update ucdVersion in gennames.c if an algorithmic range changes
	4327
	4328	*** data files & enums & parser code
	4329
	4330	* file preparation
	4331	- ucdstrip:
	4332	DerivedCoreProperties.txt
	4333	DerivedNormalizationProps.txt
	4334	NormalizationTest.txt
	4335	PropList.txt
	4336	Scripts.txt
	4337	GraphemeBreakProperty.txt
	4338	SentenceBreakProperty.txt
	4339	WordBreakProperty.txt
	4340	- ucdstrip and ucdmerge:
	4341	EastAsianWidth.txt
	4342	LineBreak.txt
	4343
	4344	* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
	4345	copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
	4346	copy 5.1.0\ucd\Blocks.txt ..\unidata\
	4347	copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
	4348	copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
	4349	copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
	4350	copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
	4351	copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
	4352	copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
	4353	copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
	4354	copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
	4355	copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
	4356	copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
	4357	copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
	4358
	4359	ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
	4360	ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
	4361	ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
	4362	ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
	4363	ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
	4364	ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
	4365	ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
	4366	ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
	4367	ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
	4368	ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
	4369
	4370	* genpname
	4371	- run preparse.pl
	4372	+ cd \svn\icuproj\icu\uni51\source\tools\genpname
	4373	+ make sure that data.h is writable
	4374	+ perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
	4375	+ preparse.pl complains with errors like the following:
	4376	Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
	4377	This is because ICU 3.8 had scripts from ISO 15924 which are now
	4378	added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
	4379	and PropertyValueAliases.txt.
4380	-> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4381	Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
4382	+ PropertyValueAliases.txt now explicitly contains values for boolean properties:
4383	N/Y, No/Yes, F/T, False/True
4384	-> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
4385	It will use further values from the file if present.
4386
4387	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4388	- new block & script values
4389	+ 17 new blocks
4390	+ 11 new script values already added in ICU 3.8 for ISO 15924 coverage
4391	(removed from SyntheticPropertyValueAliases.txt)
4392	+ 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
4393	(added to SyntheticPropertyValueAliases.txt)
4394	- uprops.icu (uprops.h) only provides 7 bits for script codes.
4395	In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
4396	None of the codes above 127 is yet the script code of an
4397	assigned Unicode character, so ICU 4.0 uprops.icu does not store any
4398	script code values greater than 127.
4399	However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
4400	in a parallel bit field, and that overflows now.
4401	Also, future values >=128 would be incompatible anyway.
4402	uprops.h is modified to move around several of the bit fields
4403	in the properties vector words, and now uses 8 bits for the script code.
4404	Two other bit fields also grow to accommodate future growth:
4405	Block (current count: 172) grows from 8 to 9 bits,
4406	and Word_Break grows from 4 to 5 bits.
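
The overflow described above is simple bit-width arithmetic. A minimal check; the helper name is hypothetical:

```python
def bits_needed(max_value):
    """Number of bits required to store values 0..max_value."""
    return max(1, max_value.bit_length())


if __name__ == '__main__':
    # Largest script code stored for an assigned character so far.
    print(bits_needed(127))
    # Maximum script value USCRIPT_CODE_LIMIT-1 = 129.
    print(bits_needed(129))
    # Largest block value with 172 blocks (0..171).
    print(bits_needed(171))
```

The maximum script value 129 no longer fits into 7 bits, which is why the script field moves to 8 bits. The block field still fits in 8 bits with 172 blocks, so its growth to 9 bits is headroom for future blocks.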
4407	- renamed property Simple_Case_Folding (sfc->scf)
4408	+ nothing to be done: handled as normal alias
4409	- new property JSN Jamo_Short_Name
4410	+ no new API: only contributes to the Name property
4411	- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
4412	- new Joining Group (JG) value: Burushashki_Yeh_Barree
4413	- new Sentence_Break (SB) values:
4414	SB ; CR ; CR
4415	SB ; EX ; Extend
4416	SB ; LF ; LF
4417	SB ; SC ; SContinue
4418	- new Word_Break (WB) values:
4419	WB ; CR ; CR
4420	WB ; Extend ; Extend
4421	WB ; LF ; LF
4422	WB ; MB ; MidNumLet
4423
4424	* Further changes in the 2008-02-29 update:
4425	- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
4426	because they should not normally be invisible.
4427	- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
4428	- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
4429	- new Word_Break (WB) value: NL=Newline
4430
4431	* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
4432	- Unihan range end moves from 9FBB to 9FC3
4433	search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
4434	+ do change gennames.c
4435
4436	* build Unicode data source code for hardcoding core data
4437	C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
4438
4439	ICU data make path is \svn\icuproj\icu\uni51\source\data\
4440	ICU root path is \svn\icuproj\icu\uni51
4441	Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4442	Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4443	Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4444	Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4445	Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4446	Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4447	Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4448	Creating data file for Unicode Character Properties
4449	Creating data file for Unicode Case Mapping Properties
4450	Creating data file for Unicode BiDi/Shaping Properties
4451	Creating data file for Unicode Normalization
4452	Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
4453	Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
4454
4455	- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
4456	and rebuild the common library
4457
4458	*** Break iterators
4459
4460	* Update break iterator rules to new UAX versions and new property values
4461
4462	*** UCA
4463
4464	* update FractionalUCA.txt and UCARules.txt with new canonical closure
4465
4466	*** Test suites
4467	- Test that APIs using Unicode property value aliases (like UnicodeSet)
4468	support all of the boolean values N/Y, No/Yes, F/T, False/True
4469	-> TestBinaryValues() tests in both cintltst and intltest
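What such a test has to cover can be sketched as follows.
This is not the TestBinaryValues() code and not ICU's alias lookup; the table and the function name
parse_binary_value are hypothetical, and case-insensitive matching is an assumption of this sketch.

```python
# All eight spellings listed above, mapped to the boolean they stand for.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(name: str) -> bool:
    """Map a binary property value alias to a bool (case-insensitive, assumed)."""
    key = name.strip().lower()
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError(f"not a binary property value alias: {name!r}")
    return BINARY_VALUE_ALIASES[key]


# Every alias in a pair must resolve to the same value.
all_false = [parse_binary_value(s) for s in ("N", "No", "F", "False")]
all_true = [parse_binary_value(s) for s in ("Y", "Yes", "T", "True")]
```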
4470
4471	*** LayoutEngine script information
4472	* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4473	ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4474	ScriptRunData.cpp, which is no longer needed.)
4475
4476	The generated files have a current copyright date and "@draft" statement.
4477
4478	* copy the above files into <icu>/source/layout, replacing the old files.
4479
4480	Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4481	and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4482
4483	* rebuild the layout and layoutex libraries.
4484
4485	*** Documentation
4486	- Update User Guide
4487	+ Jamo_Short_Name, sfc->scf, binary property value aliases
4488
4489	---------------------------------------------------------------------------- ***
4490
73c04bcf A	4491	Unicode 5.0 update
	4492
	4493	*** related Jitterbugs
	4494
	4495	5084 RFE: Update to Unicode 5.0
	4496
	4497	*** data files & enums & parser code
	4498
	4499	* file preparation
	4500	- ucdstrip:
	4501	DerivedCoreProperties.txt
	4502	DerivedNormalizationProps.txt
	4503	NormalizationTest.txt
	4504	PropList.txt
	4505	Scripts.txt
	4506	GraphemeBreakProperty.txt
	4507	SentenceBreakProperty.txt
	4508	WordBreakProperty.txt
	4509	- ucdstrip and ucdmerge:
	4510	EastAsianWidth.txt
	4511	LineBreak.txt
	4512
46f4442e	4513	* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
73c04bcf A	4514	copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
	4515	copy 5.0.0\ucd\Blocks.txt ..\unidata\
	4516	copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
	4517	copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
	4518	copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
	4519	copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
	4520	copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
	4521	copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
	4522	copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
	4523	copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
	4524	copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
	4525	copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
	4526	copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
	4527
	4528	ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
	4529	ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
	4530	ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
	4531	ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
	4532	ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
	4533	ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
	4534	ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
	4535	ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
	4536	ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
	4537	ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
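This log does not describe what ucdstrip and ucdmerge do.
Assuming that ucdstrip drops comments and that ucdmerge combines adjacent code point ranges with identical field values
(an assumption, not something stated here), the last two pipeline lines can be sketched as follows.
The function names and the sample data are hypothetical, and the real tools may differ in detail.

```python
def strip_comments(lines):
    """Drop '#' comments and blank lines (assumed ucdstrip behavior)."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            yield data


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.partition("..")
    start = int(start, 16)
    return start, (int(end, 16) if end else start)


def merge_ranges(lines):
    """Merge adjacent ranges with the same value (assumed ucdmerge behavior)."""
    merged = []
    for line in lines:
        rng, value = (part.strip() for part in line.split(";", 1))
        start, end = parse_range(rng)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end          # extend the previous range
        else:
            merged.append([start, end, value])
    for start, end, value in merged:
        rng = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
        yield f"{rng};{value}"


# Hypothetical sample in the style of EastAsianWidth.txt.
sample = [
    "# sample data, not from the UCD",
    "0020;Na # SPACE",
    "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
    "0024;Na",
    "00A1;A",
]
result = list(merge_ranges(strip_comments(sample)))
```

Chaining the two generators mirrors the `ucdstrip < in | ucdmerge > out` form of the batch file.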
	4538
	4539	* update FractionalUCA.txt and UCARules.txt with new canonical closure
	4540
	4541	* genpname
	4542	- run preparse.pl
	4543	+ make sure that data.h is writable
	4544	+ perl preparse.pl \cvs\oss\icu > out.txt
	4545
	4546	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
	4547	- new block & script values
	4548	+ script values already added in ICU 3.6 because all of ISO 15924 is now covered
	4549
	4550	* build Unicode data source code for hardcoding core data
	4551	C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
	4552
	4553	ICU data make path is \cvs\oss\icu\source\data\
	4554	ICU root path is \cvs\oss\icu
	4555	Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
	4556	[etc.]
	4557	Creating data file for Unicode Character Properties
	4558	Creating data file for Unicode Case Mapping Properties
	4559	Creating data file for Unicode BiDi/Shaping Properties
	4560	Creating data file for Unicode Normalization
	4561	Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
	4562	Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
	4563
	4564	- copy the .c source files to C:\cvs\oss\icu\source\common
	4565	and rebuild the common library
	4566
	4567	*** Unicode version numbers
	4568	- makedata.mak
	4569	- uchar.h
	4570	- configure.in
	4571
	4572	*** LayoutEngine script information
	4573	* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
	4574	ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
	4575	ScriptRunData.cpp, which is no longer needed.)
	4576
	4577	The generated files have a current copyright date and "@draft" statement.
4578
4579	* copy the above files into <icu>/source/layout, replacing the old files.
4580
4581	Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4582	and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4583
4584	* rebuild the layout and layoutex libraries.
4585
4586	---------------------------------------------------------------------------- ***
4587
4588	Unicode 4.1 update
4589
4590	*** related Jitterbugs
4591
4592	4332 RFE: Update to Unicode 4.1
4593	4157 RBBI, TR29 4.1 updates
4594
4595	*** data files & enums & parser code
4596
4597	* file preparation
4598	- ucdstrip:
4599	DerivedCoreProperties.txt
4600	DerivedNormalizationProps.txt
4601	NormalizationTest.txt
4602	GraphemeBreakProperty.txt
4603	SentenceBreakProperty.txt
4604	WordBreakProperty.txt
4605	- ucdstrip and ucdmerge:
4606	EastAsianWidth.txt
4607	LineBreak.txt
4608
4609	* add new files to the repository
4610	GraphemeBreakProperty.txt
4611	SentenceBreakProperty.txt
4612	WordBreakProperty.txt
4613
4614	* update FractionalUCA.txt and UCARules.txt with new canonical closure
4615
4616	* genpname
4617	- handle new enumerated properties in sub read_uchar
4618	- run preparse.pl
4619
4620	* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4621	- new binary properties
4622	+ Pattern_Syntax
4623	+ Pattern_White_Space
4624	- new enumerated properties
4625	+ Grapheme_Cluster_Break
4626	+ Sentence_Break
4627	+ Word_Break
4628	- new block & script & line break values
4629
4630	* gencase
4631	- case-ignorable changes
4632	see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4633	now: (D47a) case-ignorable means Word_Break=MidLetter or General_Category in Mn, Me, Cf, Lm, Sk
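The D47a condition above can be sketched as a predicate.
This is Python, not the gencase code: the function name is hypothetical, the Word_Break=MidLetter set must be
supplied by the caller because Python's standard library does not expose Word_Break, and unicodedata reflects
the Unicode version bundled with Python rather than Unicode 4.1.

```python
import unicodedata

# General categories named in the condition above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch, mid_letter):
    """Return True if ch is in the caller-supplied MidLetter set
    or has one of the listed general categories."""
    return ch in mid_letter or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES


# Hypothetical example set; real data would come from WordBreakProperty.txt.
example_mid_letter = {"\u00B7"}
```

For example, `is_case_ignorable("\u0301", example_mid_letter)` is true via the general category Mn, while a plain letter such as "a" is not case-ignorable.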
4634
4635	*** Unicode version numbers
4636	- makedata.mak
4637	- uchar.h
4638	- configure.in
4639
4640	*** tests
4641	- verify that u_charMirror() round-trips
4642	- test all new properties and some new values of old properties
4643
4644	*** other code
4645
4646	* hardcoded Unihan range end/limit
4647	- Unihan range end moves from 9FA5 to 9FBB
4648	search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
4649	+ do not modify BOCU/BOCSU code because that would change the encoding
4650	and break binary compatibility!
4651	+ similarly, do not change the GB 18030 range data (ucnvmbcs.c),
4652	NamePrepProfile.txt
4653	+ ignore trietest.c: test data is arbitrary
4654	+ ignore tstnorm.cpp: test optimization, not important
4655	+ ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
4656	+ do change line_th.txt and word_th.txt
4657	by replacing hardcoded ranges with the new property values
4658	+ do change gennames.c
4659
4660	source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4661	source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4662	source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
4663
4664	* case mappings
4665	- compare new special casing context conditions with previous ones
4666	see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4667
4668	* genpname
4669	- consider storing only the short name if it is the same as the long name
4670
4671	*** other reviews
4672	- UAX #29 changes (grapheme/word/sentence breaks)
4673	- UAX #14 changes (line breaks)
4674	- Pattern_Syntax & Pattern_White_Space
4675
4676	---------------------------------------------------------------------------- ***
4677
374ca955 A	4678	Unicode 4.0.1 update
	4679
	4680	*** related Jitterbugs
	4681
	4682	3170 RFE: Update to Unicode 4.0.1
	4683	3171 Add new Unicode 4.0.1 properties
	4684	3520 use Unicode 4.0.1 updates for break iteration
	4685
	4686	*** data files & enums & parser code
	4687
	4688	* file preparation
	4689	- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
	4690	- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
	4691
	4692	* file fixes
	4693	- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
	4694	according to PRI #26
	4695	http://www.unicode.org/review/resolved-pri.html#pri26
	4696	- undone again because no corrigendum in sight;
	4697	instead modified tests to not check consistency on this for Unicode 4.0.1
	4698
	4699	* ucdterms.txt
	4700	- update from http://www.unicode.org/copyright.html
	4701	formatted for plain text
	4702
	4703	* uchar.h & uprops.h & uprops.c & genprops
	4704	- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
	4705	- add U_LB_INSEPARABLE due to a spelling fix
	4706	+ put short name comment only on line with new constant
	4707	for genpname perl script parser
	4708	- new binary properties
	4709	+ STerm
	4710	+ Variation_Selector
	4711
	4712	* genpname
	4713	- fix genpname perl script so that it doesn't choke on more than 2 names per property value
	4714	- perl script: correctly calculate the maximum number of fields per row
	4715
	4716	* uscript.h
	4717	- new script code Hrkt=Katakana_Or_Hiragana
	4718
	4719	* gennorm.c: track changes in DerivedNormalizationProps.txt
	4720	- "FNC" -> "FC_NFKC"
	4721	- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
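The second format change can be illustrated with a parser that accepts both layouts.
This is a Python sketch, not the gennorm.c parser; the function names are hypothetical, and the old
suffixes _NO and _MAYBE are my assumption for what the "etc." above covers.
The "FNC" -> "FC_NFKC" lines have a different shape and are not handled here.

```python
def parse_quick_check(fields):
    """Return (form, value) from the fields that follow the code point range.

    Accepts the old single field such as "NFD_NO" and the new pair
    such as "NFD_QC", "N".
    """
    fields = [f.strip() for f in fields]
    if len(fields) == 1:
        form, _, answer = fields[0].partition("_")
        value = {"NO": "N", "MAYBE": "M"}[answer]   # assumed old suffixes
        return form, value
    name, value = fields
    if not name.endswith("_QC"):
        raise ValueError(f"unexpected property name: {name!r}")
    return name[:-3], value


def parse_line(line):
    """Split one data line into (range, (form, value)), ignoring the comment."""
    data = line.split("#", 1)[0]
    rng, *rest = data.split(";")
    return rng.strip(), parse_quick_check(rest)


# Hypothetical sample lines in the old and new layouts.
old_style = parse_line("00C0..00C5 ; NFD_NO # sample")
new_style = parse_line("00C0..00C5 ; NFD_QC; N # sample")
```

Both layouts normalize to the same tuple, so downstream code does not need to know which file version it read.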
	4722
	4723	* genprops/props2.c: track changes in DerivedNumericValues.txt
	4724	- changed from 3 columns to 2, dropping the numeric type
	4725	+ assume that the type is always numeric for Han characters,
	4726	and that only those are added in addition to what UnicodeData.txt lists
	4727
	4728	*** Unicode version numbers
	4729	- makedata.mak
	4730	- uchar.h
	4731	- configure.in
	4732
	4733	*** tests
	4734	- update test of default bidi classes according to PRI #28
	4735	/tsutil/cucdtst/TestUnicodeData
	4736	http://www.unicode.org/review/resolved-pri.html#pri28
	4737	- bidi tests: change exemplar character for ES depending on Unicode version
	4738	- change hardcoded expected property values where they change
	4739
	4740	*** other code
	4741
4742	* name matching
4743	- read UCD.html
4744
4745	* scripts
4746	- use new Hrkt=Katakana_Or_Hiragana
4747
4748	* ZWJ & ZWNJ
4749	- are now part of combining character sequences
4750	- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ