Copyright (c) 2002-2010, International Business Machines Corporation and others. All Rights Reserved.


IMPORTANT:

This sample was originally intended as an exercise for the ICU Workshop (September 2000).
The code currently provided in the solution file is the answer to the exercises; each
intermediate step can still be found in the 'answers' subdirectory.


http://www.icu-project.org/docs/workshop_2000/agenda.html

Day 2: September 12th 2000
Prerequisites:
1. All the hardware and software requirements from Day 1.
2. Attended, or fully understand, the Day 1 material.
3. Read through the ICU User's Guide at
   http://www.icu-project.org/userguide/.

#Transformation Support
10:45am - 12:00pm
Alan Liu

Topics:
1. What is Unicode normalization?
2. What kind of case mapping support is available in ICU?
3. What is transliteration, and how do I use a Transliterator on a document?
4. How do I add my own Transliterator?


INSTRUCTIONS
------------

This exercise was developed and tested on ICU release 1.6.0, Win32,
Microsoft Visual C++ 6.0. It should work on other ICU releases and
other platforms as well.

MSVC:
Open the file "translit.sln" in Microsoft Visual C++.

Unix:
- Build and install ICU with a prefix, for example '--prefix=/home/srl/ICU'.
- Set the variable ICU_PREFIX=/home/srl/ICU and use GNU make in
  this directory.
- You may use 'make check' to invoke this sample.


PROBLEMS
--------

Problem 0:

To start with, the program prints out a series of dates formatted in
Greek. Set up the program, build it, and run it.

Problem 1: Basic Transliterator (Easy)

The Greek text shows up almost entirely as Unicode escapes, which
are unreadable on a US machine. Use an existing system
transliterator to transliterate the Greek text to Latin so that it can
be read phonetically on a US machine. If you don't know the names of
the system transliterators, use Transliterator::getAvailableID() and
Transliterator::countAvailableIDs(), or look directly in the index
table icu/data/translit_index.txt.

Problem 2: RuleBasedTransliterator (Medium)

Some of the text is still unreadable and shows up as Unicode escape
sequences. Create a RuleBasedTransliterator to change the
unreadable characters to close ASCII equivalents. For example, the
rule "\u00C0 > A;" will change an 'A' with a grave accent to a plain
'A'.

To save typing, use UnicodeSets such as [\u00C0-\u00C5] to handle
ranges of characters.

See the included file "U0080.pdf" for a chart of the Unicode block
that contains U+00C0 to U+00FF (Latin-1 Supplement, U+0080 to U+00FF).

Problem 3: Transliterator subclassing; Normalizer (Difficult)

The rule-based approach is flexible and, in most cases, the best
choice for creating a new transliterator. Sometimes, however, a
more elegant algorithmic solution is available: instead of typing
in a list of rules, you can write C++ code to accomplish the desired
transliteration.

Use a Normalizer to remove accents from characters. You will need
to convert each character to a sequence of a base character and
combining characters by applying canonical decomposition (Unicode
Normalization Form D). Then discard the combining characters (the
accents, etc.), leaving the base character. Wrap all of this in a
subclass of the Transliterator class that overrides the pure virtual
handleTransliterate() method.


ANSWERS
-------

The exercise includes answers. These are in the "answers" directory
and are numbered 1, 2, etc. In some cases, new files that you need
to create are also included in the answers directory.

If you get stuck and want to move on to the next step, copy the
corresponding answer file into the main directory. For example,
"main_1.cpp" contains the original "main.cpp" file, "main_2.cpp"
contains "main.cpp" as it stands after problem 1, and so on.


Have fun!