Commit | Line | Data |
---|---|---|
b75a7d8f A |
1 | Copyright (c) 2002, International Business Machines Corporation and others. All Rights Reserved. |
2 | This is an exercise for the ICU Workshop (September 2000). | |
3 | http://oss.software.ibm.com/icu/docs/workshop_2000/agenda.html | |
4 | ||
5 | Day 2: September 12th 2000 | |
6 | Prerequisites: | |
7 | 1. All the hardware and software requirements from Day 1. | |
8 | 2. Attendance at Day 1, or a full understanding of its material. | |
9 | 3. Read through the ICU user's guide at | |
10 | http://oss.software.ibm.com/icu/userguide/. | |
11 | ||
12 | #Transformation Support | |
13 | 10:45am - 12:00pm | |
14 | Alan Liu | |
15 | ||
16 | Topics: | |
17 | 1. What is Unicode normalization? | |
18 | 2. What kind of case mapping support is available in ICU? | |
19 | 3. What is transliteration, and how do I use a Transliterator on a document? | |
20 | 4. How do I add my own Transliterator? | |
21 | ||
22 | ||
23 | INSTRUCTIONS | |
24 | ------------ | |
25 | ||
26 | This exercise was developed and tested on ICU release 1.6.0, Win32, | |
27 | Microsoft Visual C++ 6.0. It should work on other ICU releases and | |
28 | other platforms as well. | |
29 | ||
30 | MSVC: | |
31 | Open the file "translit.dsw" in Microsoft Visual C++. | |
32 | ||
33 | Unix: | |
34 | - Build and install ICU with a prefix, for example '--prefix=/home/srl/ICU' | |
35 | - Set the variable ICU_PREFIX=/home/srl/ICU and use GNU make in | |
36 | this directory. | |
37 | - You may use 'make check' to invoke this sample. | |
38 | ||
39 | ||
40 | PROBLEMS | |
41 | -------- | |
42 | ||
43 | Problem 0: | |
44 | ||
45 | To start with, the program prints out a series of dates formatted in | |
46 | Greek. Set up the program, build it, and run it. | |
47 | ||
48 | Problem 1: Basic Transliterator (Easy) | |
49 | ||
50 | The Greek text shows up almost entirely as Unicode escapes. These | |
51 | are unreadable on a US machine. Use an existing system | |
52 | transliterator to transliterate the Greek text to Latin so it can be | |
53 | phonetically read on a US machine. If you don't know the names of | |
54 | the system transliterators, call Transliterator::countAvailableIDs() | |
55 | and Transliterator::getAvailableID() for each index, or look directly | |
56 | in the index table icu/data/translit_index.txt. | |
57 | ||
58 | Problem 2: RuleBasedTransliterator (Medium) | |
59 | ||
60 | Some of the text is still unreadable and shows up as Unicode escape | |
61 | sequences. Create a RuleBasedTransliterator to change the | |
62 | unreadable characters to close ASCII equivalents. For example, the | |
63 | rule "\u00C0 > A;" will change an 'A' with a grave accent to a plain | |
64 | 'A'. | |
65 | ||
66 | To save typing, use UnicodeSets to handle ranges of characters. | |
67 | ||
68 | See the included file "U0080.pdf" for a chart of the Latin-1 | |
69 | Supplement block (U+0080 to U+00FF), which contains the accented letters. | |
70 | ||
71 | Problem 3: Transliterator subclassing; Normalizer (Difficult) | |
72 | ||
73 | The rule-based approach is flexible and, in most cases, the best | |
74 | choice for creating a new transliterator. Sometimes, however, a | |
75 | more elegant algorithmic solution is available. Instead of typing | |
76 | in a list of rules, you can write C++ code to accomplish the desired | |
77 | transliteration. | |
78 | ||
79 | Use a Normalizer to remove accents from characters. You will need | |
80 | to convert each character to a sequence of base and combining | |
81 | characters by applying canonical decomposition (NFD). | |
82 | Then discard the combining characters (the accents etc.) leaving the | |
83 | base character. Wrap this all up in a subclass of the | |
84 | Transliterator class that overrides the pure virtual | |
85 | handleTransliterate() method. | |
86 | ||
87 | ||
88 | ANSWERS | |
89 | ------- | |
90 | ||
91 | The exercise includes answers. These are in the "answers" directory, | |
92 | and are numbered 1, 2, etc. In some cases new files that the user | |
93 | needs to create are included in the answers directory. | |
94 | ||
95 | If you get stuck and you want to move to the next step, copy the | |
96 | answers file into the main directory in order to proceed. E.g., | |
97 | "main_1.cpp" contains the original "main.cpp" file. "main_2.cpp" | |
98 | contains the "main.cpp" file after problem 1. Etc. | |
99 | ||
100 | ||
101 | Have fun! |
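
Note on Problem 2: the idea of mapping ranges of accented characters to ASCII can be sketched in plain C++ without ICU. The block below is only an illustration of what a rule set built from UnicodeSets would do; it is not the workshop answer and does not use RuleBasedTransliterator. The names RangeRule, latin1Rules and foldToAscii are hypothetical, the table is partial, and the replacements chosen for characters such as U+00C6 ("AE") and U+00DF ("ss") are assumptions made for this sketch. Each table entry corresponds roughly to a rule of the form "[\u00C0-\u00C5] > A;".

```cpp
#include <string>
#include <vector>

// One range rule: every code point in [lo, hi] is replaced by `out`.
// Hypothetical type for this sketch; it is not part of ICU.
struct RangeRule {
    char32_t lo;
    char32_t hi;
    std::u32string out;
};

// Partial rule table for the accented letters in U+00C0..U+00FF.
// The replacement strings are illustrative choices, not ICU data.
inline const std::vector<RangeRule>& latin1Rules() {
    static const std::vector<RangeRule> rules = {
        {0x00C0, 0x00C5, U"A"},  {0x00C6, 0x00C6, U"AE"},
        {0x00C7, 0x00C7, U"C"},  {0x00C8, 0x00CB, U"E"},
        {0x00CC, 0x00CF, U"I"},  {0x00D1, 0x00D1, U"N"},
        {0x00D2, 0x00D6, U"O"},  {0x00D8, 0x00D8, U"O"},
        {0x00D9, 0x00DC, U"U"},  {0x00DD, 0x00DD, U"Y"},
        {0x00DF, 0x00DF, U"ss"}, {0x00E0, 0x00E5, U"a"},
        {0x00E6, 0x00E6, U"ae"}, {0x00E7, 0x00E7, U"c"},
        {0x00E8, 0x00EB, U"e"},  {0x00EC, 0x00EF, U"i"},
        {0x00F1, 0x00F1, U"n"},  {0x00F2, 0x00F6, U"o"},
        {0x00F8, 0x00F8, U"o"},  {0x00F9, 0x00FC, U"u"},
        {0x00FD, 0x00FD, U"y"},  {0x00FF, 0x00FF, U"y"},
    };
    return rules;
}

// Replace each code point by the output of the first rule whose range
// contains it. Code points that match no rule are copied unchanged.
inline std::u32string foldToAscii(const std::u32string& text) {
    std::u32string result;
    for (char32_t c : text) {
        const std::u32string* replacement = nullptr;
        for (const RangeRule& rule : latin1Rules()) {
            if (c >= rule.lo && c <= rule.hi) {
                replacement = &rule.out;
                break;
            }
        }
        if (replacement != nullptr) {
            result += *replacement;
        } else {
            result += c;
        }
    }
    return result;
}
```

For example, foldToAscii(U"caf\u00E9") yields U"cafe", while text outside the table (such as Greek letters) passes through untouched. In the exercise itself the same ranges would be written as rules and handed to a RuleBasedTransliterator rather than coded by hand.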