/////////////////////////////////////////////////////////////////////////////
// Name:        mbconvclasses
// Purpose:     topic overview
// Author:      wxWidgets team
// RCS-ID:      $Id$
// Licence:     wxWindows license
/////////////////////////////////////////////////////////////////////////////

/*!

 @page mbconvclasses_overview wxMBConv classes overview

 Classes: #wxMBConv, wxMBConvLibc,
 #wxMBConvUTF7, #wxMBConvUTF8,
 #wxCSConv,
 #wxMBConvUTF16, #wxMBConvUTF32

 The wxMBConv classes in wxWidgets enable a Unicode-aware application to
 easily convert between Unicode and the variety of 8-bit encoding systems still
 in use.

 @ref needforconversion
 @ref conversionandwxstring
 @ref wxmbconvclasses
 @ref wxmbconvobjects
 @ref wxcsconvclass
 @ref convertingstrings
 @ref convertingbuffers

 @section needforconversion Background: The need for conversion

 As programs become more and more globalized, and users exchange documents
 across country boundaries as never before, applications increasingly need to
 take into account all the different character sets in use around the world. It
 is no longer enough to just depend on the default byte-sized character set that
 computers have traditionally used.

 A few years ago, a solution was proposed: the Unicode standard. Able to contain
 the complete set of characters in use in one unified global coding system,
 it would resolve the character set problems once and for all.

 But that hasn't happened yet, and the migration towards Unicode has created new
 challenges, resulting in "compatibility encodings" such as UTF-8. A large
 number of systems still depend on the old 8-bit encodings, hampered
 by the huge amounts of legacy code still widely deployed. Even sending
 Unicode data from one Unicode-aware system to another may require converting
 it to an 8-bit multibyte encoding (UTF-7 or UTF-8 is typically used for this
 purpose), so that it passes unhindered through any traditional transport
 channels.

 @section conversionandwxstring Background: The wxString class

 If you have compiled wxWidgets in Unicode mode, the wxChar type becomes
 identical to wchar_t rather than char, and a wxString stores wxChars. Hence,
 all wxString manipulation in your application then operates on Unicode
 strings, almost as easily as working with ordinary char strings (you
 just need to remember to wrap any string literals in the wxT() macro).

 But often, your environment doesn't want Unicode strings. You could be sending
 data over a network, or processing a text file for some other application. You
 need a way to quickly convert your easily-handled Unicode data to and from a
 traditional 8-bit encoding. This is what the wxMBConv classes do.

 @section wxmbconvclasses wxMBConv classes

 The base class for all these conversions is the wxMBConv class (which itself
 implements standard libc locale conversion). Derived classes include
 wxMBConvLibc, several different wxMBConvUTFxxx classes, and wxCSConv, which
 implement different kinds of conversions. You can also derive your own class
 for a custom encoding, should you need one. All you need to do
 is override the MB2WC and WC2MB methods.
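
 To make the idea of overriding a pair of conversion methods concrete, here is
 a minimal sketch that does not use wxWidgets at all. ConverterSketch,
 Latin1Sketch and kConvError are hypothetical stand-ins invented for this
 illustration; the method names and the convention assumed here (a null output
 pointer means "only count", an error is reported as (size_t)-1) are modelled
 on MB2WC and WC2MB, so check the wxMBConv reference before relying on them.

```cpp
#include <cassert>
#include <cstddef>

// Error marker used by this sketch (assumption: errors are reported as (size_t)-1).
const std::size_t kConvError = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a converter base class; this is NOT wxMBConv.
struct ConverterSketch
{
    virtual ~ConverterSketch() {}

    // Multibyte to wide. Returns the number of characters converted, not
    // counting the terminator. If out is null, nothing is written and the
    // function only counts.
    virtual std::size_t MB2WC(wchar_t* out, const char* in, std::size_t outLen) const = 0;

    // Wide to multibyte, with the same conventions.
    virtual std::size_t WC2MB(char* out, const wchar_t* in, std::size_t outLen) const = 0;
};

// A custom encoding: ISO-8859-1, where byte value N is code point N.
struct Latin1Sketch : ConverterSketch
{
    std::size_t MB2WC(wchar_t* out, const char* in, std::size_t outLen) const override
    {
        assert(in != nullptr);
        std::size_t n = 0;
        while (in[n] != '\0')
        {
            if (out)
            {
                if (n == outLen)
                    return n; // output full: stop without a terminator
                out[n] = static_cast<wchar_t>(static_cast<unsigned char>(in[n]));
            }
            ++n;
        }
        if (out && n < outLen)
            out[n] = L'\0';
        return n;
    }

    std::size_t WC2MB(char* out, const wchar_t* in, std::size_t outLen) const override
    {
        assert(in != nullptr);
        std::size_t n = 0;
        while (in[n] != L'\0')
        {
            // Anything above U+00FF has no Latin-1 representation.
            if (static_cast<unsigned long>(in[n]) > 0xFF)
                return kConvError;
            if (out)
            {
                if (n == outLen)
                    return n; // output full: stop without a terminator
                out[n] = static_cast<char>(static_cast<unsigned char>(in[n]));
            }
            ++n;
        }
        if (out && n < outLen)
            out[n] = '\0';
        return n;
    }
};
```

 A caller would typically invoke each method twice: once with a null output
 pointer to learn the required length, then again with a buffer of that size
 plus one for the terminator.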
36c9828f | 68 | |
15b6757b | 69 | @section wxmbconvobjects wxMBConv objects |
36c9828f | 70 | |
15b6757b FM |
71 | Several of the wxWidgets-provided wxMBConv classes have predefined instances |
72 | (wxConvLibc, wxConvFileName, wxConvUTF7, wxConvUTF8, wxConvLocal). You can use | |
73 | these predefined objects directly, or you can instantiate your own objects. | |
74 | A variable, wxConvCurrent, points to the conversion object that the user | |
75 | interface is supposed to use, in the case that the user interface is not | |
76 | Unicode-based (like with GTK+ 1.2). By default, it points to wxConvLibc or | |
77 | wxConvLocal, depending on which works best on the current platform. | |

 @section wxcsconvclass wxCSConv

 The wxCSConv class is special because you tell it which character set to use
 when you instantiate it. This makes it meaningful to keep many
 instances around, each with a different character set (or you can
 create a wxCSConv instance on the fly).

 The predefined wxCSConv instance, wxConvLocal, is preset to use the
 default user character set, but you should rarely need to use it directly;
 it is better to go through wxConvCurrent.

 @section convertingstrings Converting strings

 Once you have chosen which object you want to use to convert your text,
 here is how you would use it with wxString. These examples all assume
 that you are using a Unicode build of wxWidgets, although they will still
 compile in a non-Unicode build (they just won't convert anything).

 Example 1: Constructing a wxString from input in current encoding.

 @code
 wxString str(input_data, *wxConvCurrent);
 @endcode

 Example 2: Input in UTF-8 encoding.

 @code
 wxString str(input_data, wxConvUTF8);
 @endcode

 Example 3: Input in KOI8-R. Construction of wxCSConv instance on the fly.

 @code
 wxString str(input_data, wxCSConv(wxT("koi8-r")));
 @endcode

 Example 4: Printing a wxString to stdout in UTF-8 encoding.

 @code
 puts(str.mb_str(wxConvUTF8));
 @endcode

 Example 5: Printing a wxString to stdout in custom encoding.
 Using preconstructed wxCSConv instance.

 @code
 wxCSConv cust(user_encoding);
 printf("Data: %s\n", (const char*) str.mb_str(cust));
 @endcode

 Note: Since mb_str() returns a temporary wxCharBuffer to hold the result
 of the conversion, you need to explicitly cast it to const char* if you use
 it in a vararg context (like with printf).
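
 The reason for the cast can be shown without wxWidgets. In the sketch below,
 CharBufferSketch, ConvertSketch and FormatData are hypothetical names invented
 for this illustration; CharBufferSketch merely imitates a buffer object that
 converts implicitly to const char*. A variadic function has no parameter type
 that could trigger that implicit conversion, so without the cast the object
 itself would be passed where "%s" expects a pointer.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a buffer object that owns converted text and
// converts implicitly to const char*; this is NOT wxCharBuffer.
class CharBufferSketch
{
public:
    explicit CharBufferSketch(const std::string& s) : m_data(s) {}
    operator const char*() const { return m_data.c_str(); }

private:
    std::string m_data;
};

// Hypothetical conversion function that returns its result by value,
// so every call yields a temporary buffer object.
CharBufferSketch ConvertSketch(const std::string& s)
{
    return CharBufferSketch(s);
}

std::string FormatData(const std::string& text)
{
    char out[64];
    // The explicit cast runs the conversion operator before the argument
    // enters the "..." list. The temporary lives until the end of the full
    // expression, so the pointer stays valid for the whole snprintf call.
    const int n = std::snprintf(out, sizeof out, "Data: %s",
                                (const char*) ConvertSketch(text));
    assert(n >= 0); // snprintf reports an encoding error as a negative value
    return out;
}
```

 With a typed parameter, such as the argument of puts(), the compiler applies
 the conversion by itself, which is why Example 4 needs no cast.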

 @section convertingbuffers Converting buffers

 If you have specialized needs, or just don't want to use wxString, you
 can also use the conversion methods of the conversion objects directly.
 This can even be useful if you need to do conversion in a non-Unicode
 build of wxWidgets; converting a string from UTF-8 to the current
 encoding should be possible by doing this:

 @code
 wxString str(wxConvUTF8.cMB2WC(input_data), *wxConvCurrent);
 @endcode

 Here, cMB2WC of the wxConvUTF8 object returns a wxWCharBuffer containing a
 Unicode string. The wxString constructor then converts it back to an 8-bit
 character set using the passed conversion object, *wxConvCurrent. (In a
 Unicode build of wxWidgets, the constructor ignores the passed conversion
 object and retains the Unicode data.)

 This could also be done by first making a wxString of the original data:

 @code
 wxString input_str(input_data);
 wxString str(input_str.wc_str(wxConvUTF8), *wxConvCurrent);
 @endcode
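
 The two-step route used here (8-bit data to wide characters, then wide
 characters back to 8-bit data) can also be sketched with the C library alone.
 MultiByteToWide and WideToMultiByte are hypothetical helper names invented
 for this illustration. Note the difference from the wxWidgets code: both
 helpers use the encoding of the current C locale, whereas the examples above
 pair two different conversion objects (wxConvUTF8 and *wxConvCurrent).

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Bytes in the current C locale's encoding to wide characters.
// Returns false if the input is not valid in that encoding.
bool MultiByteToWide(const char* in, std::wstring& out)
{
    assert(in != nullptr);
    // A multibyte string never yields more wide characters than it has
    // bytes, so strlen(in) + 1 elements always suffice.
    std::vector<wchar_t> buf(std::strlen(in) + 1);
    const std::size_t n = std::mbstowcs(buf.data(), in, buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;
    out.assign(buf.data(), n);
    return true;
}

// Wide characters to bytes in the current C locale's encoding.
// Returns false if a character cannot be represented.
bool WideToMultiByte(const std::wstring& in, std::string& out)
{
    // MB_CUR_MAX is the largest number of bytes a single character can
    // occupy in the current locale.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;
    out.assign(buf.data(), n);
    return true;
}
```

 Both helpers stop at the first terminating null, so data with embedded nulls
 would need a length-aware approach instead.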

 To print a wxChar buffer to a non-Unicode stdout:

 @code
 printf("Data: %s\n", (const char*) wxConvCurrent->cWX2MB(unicode_data));
 @endcode

 If you need to do more complex processing on the converted data, you
 may want to store the temporary buffer in a local variable:

 @code
 // Keep the buffer object alive for as long as the pointer is in use.
 const wxWX2MBbuf tmp_buf = wxConvCurrent->cWX2MB(unicode_data);
 const char *tmp_str = (const char*) tmp_buf;
 printf("Data: %s\n", tmp_str);
 process_data(tmp_str);
 @endcode

 If a conversion took place in cWX2MB (i.e. in a Unicode build),
 the buffer is deallocated as soon as tmp_buf goes out of scope.
 (The macro wxWX2MBbuf reflects the correct return type of cWX2MB
 (either char* or wxCharBuffer), except for the const.)

*/