Commit | Line | Data |
---|---|---|
f3c0d7a5 A |
1 | Copyright (C) 2016 and later: Unicode, Inc. and others. |
2 | License & terms of use: http://www.unicode.org/copyright.html#License | |
3 | ||
729e4ab9 A |
4 | Copyright (c) 2002-2010, International Business Machines Corporation and others. All Rights Reserved. |
5 | ||
6 | ||
7 | IMPORTANT: | |
8 | ||
9 | This sample was originally intended as an exercise for the ICU Workshop (September 2000). | |
10 | The code currently provided in the solution file is the answer to the exercises; each step can still be found in the 'answers' subdirectory. | |
11 | ||
12 | ||
13 | ||
46f4442e | 14 | http://www.icu-project.org/docs/workshop_2000/agenda.html |
b75a7d8f A |
15 | |
16 | Day 2: September 12th 2000 | |
17 | Prerequisites: | |
18 | 1. All the hardware and software requirements from Day 1. | |
19 | 2. Attended or fully understand Day 1 material. | |
20 | 3. Read through the ICU user's guide at | |
46f4442e | 21 | http://www.icu-project.org/userguide/. |
b75a7d8f A |
22 | |
23 | #Transformation Support | |
24 | 10:45am - 12:00pm | |
25 | Alan Liu | |
26 | ||
27 | Topics: | |
28 | 1. What is Unicode normalization? | |
29 | 2. What kind of case mapping support is available in ICU? | |
30 | 3. What is Transliteration and how do I use a Transliterator on a document? | |
31 | 4. How do I add my own Transliterator? | |
32 | ||
33 | ||
34 | INSTRUCTIONS | |
35 | ------------ | |
36 | ||
37 | This exercise was developed and tested on ICU release 1.6.0, Win32, | |
38 | Microsoft Visual C++ 6.0. It should work on other ICU releases and | |
39 | other platforms as well. | |
40 | ||
41 | MSVC: | |
73c04bcf | 42 | Open the file "translit.sln" in Microsoft Visual C++. |
b75a7d8f A |
43 | |
44 | Unix: | |
45 | - Build and install ICU with a prefix, for example '--prefix=/home/srl/ICU' | |
46 | - Set the variable ICU_PREFIX=/home/srl/ICU and use GNU make in | |
47 | this directory. | |
48 | - You may use 'make check' to invoke this sample. | |
49 | ||
50 | ||
51 | PROBLEMS | |
52 | -------- | |
53 | ||
54 | Problem 0: | |
55 | ||
56 | To start with, the program prints out a series of dates formatted in | |
57 | Greek. Set up the program, build it, and run it. | |
58 | ||
59 | Problem 1: Basic Transliterator (Easy) | |
60 | ||
61 | The Greek text shows up almost entirely as Unicode escapes. These | |
62 | are unreadable on a US machine. Use an existing system | |
63 | transliterator to transliterate the Greek text to Latin so it can be | |
64 | phonetically read on a US machine. If you don't know the names of | |
65 | the system transliterators, use Transliterator::getAvailableID() and | |
66 | Transliterator::countAvailableIDs(), or look directly in the index | |
67 | table icu/data/translit_index.txt. | |
68 | ||
69 | Problem 2: RuleBasedTransliterator (Medium) | |
70 | ||
71 | Some of the text is still unreadable and shows up as Unicode escape | |
72 | sequences. Create a RuleBasedTransliterator to change the | |
73 | unreadable characters to close ASCII equivalents. For example, the | |
74 | rule "\u00C0 > A;" will change an 'A' with a grave accent to a plain | |
75 | 'A'. | |
76 | ||
77 | To save typing, use UnicodeSets to handle ranges of characters. | |
78 | ||
79 | See the included file "U0080.pdf" for a table covering the U+00C0 to | |
80 | U+00FF range of the Latin-1 Supplement block. | |
81 | ||
82 | Problem 3: Transliterator subclassing; Normalizer (Difficult) | |
83 | ||
84 | The rule-based approach is flexible and, in most cases, the best | |
85 | choice for creating a new transliterator. Sometimes, however, a | |
86 | more elegant algorithmic solution is available. Instead of typing | |
87 | in a list of rules, you can write C++ code to accomplish the desired | |
88 | transliteration. | |
89 | ||
90 | Use a Normalizer to remove accents from characters. You will need | |
91 | to convert each character to a sequence of base and combining | |
92 | characters by applying canonical decomposition (NFD). | |
93 | Then discard the combining characters (the accents etc.) leaving the | |
94 | base character. Wrap this all up in a subclass of the | |
95 | Transliterator class that overrides the pure virtual | |
96 | handleTransliterate() method. | |
97 | ||
98 | ||
99 | ANSWERS | |
100 | ------- | |
101 | ||
102 | The exercise includes answers. These are in the "answers" directory, | |
103 | and are numbered 1, 2, etc. In some cases new files that the user | |
104 | needs to create are included in the answers directory. | |
105 | ||
106 | If you get stuck and want to move on to the next step, copy the | |
107 | corresponding answer file into the main directory to proceed. For example, | |
108 | "main_1.cpp" contains the original "main.cpp" file, "main_2.cpp" | |
109 | contains the "main.cpp" file after problem 1, and so on. | |
110 | ||
111 | ||
112 | Have fun! |