Copyright (C) 1999-2001, International Business Machines Corporation
and others. All Rights Reserved.

Readme file for ICU time zone data (source/tools/gentz)

Alan Liu
Last updated 2 Feb 2001


RAW DATA
--------
The time zone data in ICU is taken from the UNIX data files at
ftp://elsie.nci.nih.gov/pub/tzdata<year>. The other input to the
process is an alias table, described below.


BUILD PROCESS
-------------
Two tools are used to process the data into a format suitable for ICU:

   tz.pl    directory of raw data files + tz.alias -> tz.txt
   gentz    tz.txt -> tz.dat (memory mappable binary file)

After gentz is run, standard ICU data tools are used to incorporate
tz.dat into the icudata module. The tz.pl script is run manually;
everything else is automatic.

In order to incorporate the raw data from that source into ICU, take
the following steps.

1. Download the archive of current zone data. This should be a file
   named something like tzdata1999j.tar.gz. Use the URL listed above.

2. Unpack the archive into a directory, retaining the name of the
   archive. For example, unpack tzdata1999j.tar.gz into tzdata1999j/.
   Place this directory anywhere; one option is to place it within
   source/tools/gentz.

3. Run the perl script tz.pl, passing it the directory location as a
   command-line argument. On Windows systems, use the batch file
   tz.bat. Also specify one or more output files: .txt, .htm|.html,
   and .java.

   For ICU4C specify .txt; typically

      <icu>/source/data/misc/timezone.txt

   where <icu> is the ICU4C root directory. Double check that this is
   the correct location and file name; they change periodically.

   It is useful to generate an html file. After it is generated,
   review it for correctness.

   As the third argument, pass in "tz.java". This will generate a
   java source file that will be used to update the ICU4J data.

4. Do a standard build. The build scripts will automatically detect
   that a new .txt file is present and rebuild the binary data (using
   gentz) from that.

The .txt and .htm files are typically checked into CVS, whereas
the raw data files are not, since they are readily available from the
URL listed above.
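
Steps 2 and 3 can be sketched in Python as follows. This is only an
illustration and not part of the ICU tools: the helper names are
hypothetical, and the argument order given to tz.pl (data directory
first, then the output files) is an assumption based on the wording
of step 3. The command is constructed but not run.

```python
import tarfile
from pathlib import Path
from typing import List

# Output file extensions that step 3 says tz.pl accepts.
ALLOWED_OUTPUT_SUFFIXES = (".txt", ".htm", ".html", ".java")


def archive_dir_name(archive: str) -> str:
    """Return the directory name that retains the archive name (step 2)."""
    name = Path(archive).name
    for suffix in (".tar.gz", ".tgz", ".tar"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def unpack_tzdata(archive: str, parent: str = ".") -> Path:
    """Unpack the archive into <parent>/<archive name without extension>/."""
    target = Path(parent) / archive_dir_name(archive)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(target)
    return target


def tz_pl_command(data_dir: str, outputs: List[str]) -> List[str]:
    """Build (but do not run) a tz.pl command line.

    Assumption: the data directory comes first, followed by the
    output files.
    """
    if not outputs:
        raise ValueError("specify at least one output file")
    for out in outputs:
        if not out.endswith(ALLOWED_OUTPUT_SUFFIXES):
            raise ValueError("unexpected output file type: " + out)
    return ["perl", "tz.pl", data_dir] + list(outputs)
```

For example, tz_pl_command("tzdata1999j", ["timezone.txt", "tz.html",
"tz.java"]) yields a list that can be handed to subprocess.run() from
within source/tools/gentz.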

Additional steps are required to update the ICU4J data. First you
must have a current, working installation of icu4j. These instructions
will assume it is in directory "/icu4j".

5. Copy the tz.java file generated in step 3 to /icu4j/tz.java.

6. Change to the /icu4j directory and compile the tz.java file, with
   /icu4j/classes on the classpath.

7. Run the resulting java program (again with /icu4j/classes on the
   classpath) and capture the output in a file named tz.tmp.

8. Open /icu4j/src/com/ibm/util/TimeZoneData.java. Delete the section
   that starts with the line "BEGIN GENERATED SOURCE CODE" and ends
   with the line "END GENERATED SOURCE CODE". Replace it with the
   contents of tz.tmp. If there are extraneous control-M characters
   or other similar problems, fix them.

9. Rebuild icu4j and make sure there are no build errors. Rerun all
   the tests in /icu4j/src/com/ibm/test/timezone and make sure they
   all pass. If all is well, check the new TimeZoneData.java into
   CVS.
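
The manual edit in step 8 can be sketched in Python as follows. This
is a minimal sketch, not an ICU tool: the function name is
hypothetical, and it assumes that the contents of tz.tmp carry their
own BEGIN and END marker lines, since the deleted section includes
both marker lines. Control-M characters are dropped by normalizing
line endings.

```python
BEGIN_MARKER = "BEGIN GENERATED SOURCE CODE"
END_MARKER = "END GENERATED SOURCE CODE"


def _normalized_lines(text: str) -> list:
    """Split text into lines, dropping control-M (carriage return) characters."""
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")


def replace_generated_section(source: str, generated: str) -> str:
    """Replace the marked section of `source` with `generated`.

    The lines containing the BEGIN and END markers are removed along
    with everything between them (assumption: `generated` supplies
    replacement marker lines of its own).
    """
    src_lines = _normalized_lines(source)
    gen_lines = _normalized_lines(generated)
    # A trailing newline leaves one empty final element; drop it so
    # that no extra blank line is spliced into the source.
    if gen_lines and gen_lines[-1] == "":
        gen_lines.pop()

    begin = next(
        (i for i, line in enumerate(src_lines) if BEGIN_MARKER in line), None
    )
    if begin is None:
        raise ValueError("BEGIN marker not found")
    end = next(
        (i for i, line in enumerate(src_lines)
         if i > begin and END_MARKER in line),
        None,
    )
    if end is None:
        raise ValueError("END marker not found after BEGIN marker")

    return "\n".join(src_lines[:begin] + gen_lines + src_lines[end + 1:])
```

The result should still be reviewed by hand before the rebuild and
test run in step 9.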


ALIAS TABLE
-----------
For backward compatibility, we define several three-letter IDs that
have been used since early ICU and correspond to IDs used in old JDKs.
These IDs are listed in tz.alias. The tz.pl script processes this
alias table and issues errors if there are problems.


IDS
---
All *system* zone IDs must consist only of characters in the invariant
set. See utypes.h for an explanation of what this means. If an ID is
encountered that contains a non-invariant character, tz.pl complains.
Non-system zones may use non-invariant characters.
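
The kind of check described above can be sketched in Python as
follows. This is not the code used by tz.pl. The character set below
is an assumption based on the invariant set as described in utypes.h
(letters, digits, space, and a fixed list of punctuation); verify it
against the utypes.h of your ICU version before relying on it. The
function name is hypothetical.

```python
import string

# Assumption: invariant set as described in utypes.h, namely A-Z, a-z,
# 0-9, space, and the punctuation " % & ' ( ) * + , - . / : ; < = > ? _
INVARIANT_CHARS = frozenset(
    string.ascii_letters + string.digits + " \"%&'()*+,-./:;<=>?_"
)


def non_invariant_chars(zone_id: str) -> list:
    """Return the characters of zone_id that fall outside the invariant set."""
    return [c for c in zone_id if c not in INVARIANT_CHARS]
```

An empty result means the ID is acceptable as a system zone ID under
this assumption; for example, "America/Los_Angeles" yields an empty
list, while an ID containing "@" does not.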


Etc/GMT...
----------
Users may be confused by the fact that various zones with names of the
form Etc/GMT+n appear to have an offset of the wrong sign. For
example, Etc/GMT+8 is 8 hours *behind* GMT; that is, it corresponds to
what one typically sees displayed as "GMT-8:00". The reason for this
inversion is explained in the UNIX zone data file "etcetera".
Briefly, this is done intentionally in order to comply with
POSIX-style signedness. In ICU we reproduce the UNIX zone behavior
faithfully, including this confusing aspect.
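
The sign inversion can be illustrated with a short Python sketch. It
does not use ICU or any zone data; the function names are
hypothetical, and only whole-hour IDs of the form Etc/GMT+n and
Etc/GMT-n are handled.

```python
import re

_ETC_GMT = re.compile(r"^Etc/GMT([+-])(\d{1,2})$")


def etc_gmt_utc_offset_hours(zone_id: str) -> int:
    """Return the offset from GMT, in hours, for an Etc/GMT+n or Etc/GMT-n ID."""
    match = _ETC_GMT.match(zone_id)
    if match is None:
        raise ValueError("not an Etc/GMT+n or Etc/GMT-n ID: " + zone_id)
    sign, hours = match.group(1), int(match.group(2))
    # POSIX-style sign: "+" in the ID means west of Greenwich, that is,
    # behind GMT, so the actual offset has the opposite sign.
    return -hours if sign == "+" else hours


def conventional_label(zone_id: str) -> str:
    """Return the label one typically sees displayed, e.g. "GMT-8:00"."""
    return "GMT%+d:00" % etc_gmt_utc_offset_hours(zone_id)
```

With this sketch, conventional_label("Etc/GMT+8") returns "GMT-8:00",
matching the example above.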