Commit | Line | Data |
---|---|---|
f6bcfd97 BP |
1 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
2 | %% Name: tmbconv.tex | |
fc2171bd | 3 | %% Purpose: Overview of the wxMBConv classes in wxWidgets |
f6bcfd97 BP |
4 | %% Author: Ove Kaaven |
5 | %% Modified by: | |
6 | %% Created: 25.03.00 | |
7 | %% RCS-ID: $Id$ | |
8 | %% Copyright: (c) 2000 Ove Kaaven | |
8795498c | 9 | %% Licence: wxWindows licence |
f6bcfd97 BP |
10 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
11 | ||
12 | \section{wxMBConv classes overview}\label{mbconvclasses} | |
13 | ||
46c81560 | 14 | Classes: \helpref{wxMBConv}{wxmbconv}, wxMBConvLibc, |
f6bcfd97 | 15 | \helpref{wxMBConvUTF7}{wxmbconvutf7}, \helpref{wxMBConvUTF8}{wxmbconvutf8}, |
802fa226 | 16 | \helpref{wxCSConv}{wxcsconv}, |
845f4268 | 17 | \helpref{wxMBConvUTF16}{wxmbconvutf16}, \helpref{wxMBConvUTF32}{wxmbconvutf32} |
f6bcfd97 | 18 | |
802fa226 | 19 | The wxMBConv classes in wxWidgets enable a Unicode-aware application to |
f6bcfd97 BP |
20 | easily convert between Unicode and the variety of 8-bit encoding systems still |
21 | in use. | |
22 | ||
a203f6c0 | 23 | \subsection{Background: The need for conversion}\label{needforconversion} |
f6bcfd97 BP |
24 | |
25 | As programs are becoming more and more globalized, and users exchange documents | |
26 | across country boundaries as never before, applications increasingly need to | |
27 | take into account all the different character sets in use around the world. It | |
28 | is no longer enough to just depend on the default byte-sized character set that | |
29 | computers have traditionally used. | |
30 | ||
31 | A few years ago, a solution was proposed: the Unicode standard. Able to contain | |
32 | the complete set of characters in use in one unified global coding system, | |
33 | it would resolve the character set problems once and for all. | |
34 | ||
35 | But it hasn't happened yet, and the migration towards Unicode has created new | |
36 | challenges, resulting in "compatibility encodings" such as UTF-8. A large | |
37 | number of systems out there still depend on the old 8-bit encodings, hampered | |
38 | by the huge amounts of legacy code still widely deployed. Even sending | |
39 | Unicode data from one Unicode-aware system to another may need encoding to an | |
40 | 8-bit multibyte encoding (UTF-7 or UTF-8 is typically used for this purpose), to | |
41 | pass unhindered through any traditional transport channels. | |
42 | ||
a203f6c0 | 43 | \subsection{Background: The wxString class}\label{conversionandwxstring} |
f6bcfd97 | 44 | |
fc2171bd | 45 | If you have compiled wxWidgets in Unicode mode, the wxChar type will become |
f6bcfd97 BP |
46 | identical to wchar\_t rather than char, and a wxString stores wxChars. Hence, |
47 | all wxString manipulation in your application will then operate on Unicode | |
48 | strings, and almost as easily as working with ordinary char strings (you | |
49 | just need to remember to use the wxT() macro to encapsulate any string | |
50 | literals). | |
51 | ||
52 | But often, your environment doesn't want Unicode strings. You could be sending | |
53 | data over a network, or processing a text file for some other application. You | |
54 | need a way to quickly convert your easily-handled Unicode data to and from a | |
43e8916f | 55 | traditional 8-bit encoding. And this is what the wxMBConv classes do. |
f6bcfd97 | 56 | |
a203f6c0 | 57 | \subsection{wxMBConv classes}\label{wxmbconvclasses} |
f6bcfd97 BP |
58 | |
59 | The base class for all these conversions is the wxMBConv class (which itself | |
60 | implements standard libc locale conversion). Derived classes include | |
845f4268 VZ |
61 | wxMBConvLibc, several different wxMBConvUTFxxx classes, and wxCSConv, which |
62 | implement different kinds of conversions. You can also derive your own class | |
63 | for your own custom encoding and use it, should you need it. All you need to do | |
64 | is override the MB2WC and WC2MB methods. | |
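A minimal sketch of what such a custom conversion might look like, using ISO 8859-1 (Latin-1) as the encoding. To keep it compilable without wxWidgets, the class below is a hypothetical stand-in that does not derive from wxMBConv; the name Latin1ConvSketch, the parameter lists and the return conventions (result length, or (size\_t)-1 on failure, with a null output pointer meaning "only compute the length") are assumptions made for illustration and should be checked against the wxMBConv reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-in for a class derived from wxMBConv. It includes no
// wxWidgets header so that it compiles on its own; in real code the two
// methods would override the corresponding wxMBConv methods.
class Latin1ConvSketch
{
public:
    // Multibyte -> wide. Returns the number of wide characters written
    // (excluding the terminating NUL). With out == nullptr only the
    // required length is computed (assumed convention).
    std::size_t MB2WC(wchar_t *out, const char *in, std::size_t outLen) const
    {
        const std::size_t len = std::strlen(in);
        if (!out)
            return len;

        std::size_t i = 0;
        for (; i < len && i < outLen; ++i)
            out[i] = static_cast<wchar_t>(static_cast<unsigned char>(in[i]));
        if (i < outLen)
            out[i] = L'\0';
        return i;
    }

    // Wide -> multibyte. Returns (size_t)-1 if any character lies outside
    // the Latin-1 range and therefore cannot be represented.
    std::size_t WC2MB(char *out, const wchar_t *in, std::size_t outLen) const
    {
        const std::size_t len = std::wcslen(in);
        for (std::size_t k = 0; k < len; ++k)
            if (static_cast<unsigned long>(in[k]) > 0xFFul)
                return static_cast<std::size_t>(-1);
        if (!out)
            return len;

        std::size_t i = 0;
        for (; i < len && i < outLen; ++i)
            out[i] = static_cast<char>(static_cast<unsigned char>(in[i]));
        if (i < outLen)
            out[i] = '\0';
        return i;
    }
};
```

Latin-1 is used here only because its byte values coincide with the first 256 Unicode code points, which keeps both directions to a simple cast plus a range check.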
f6bcfd97 | 65 | |
a203f6c0 | 66 | \subsection{wxMBConv objects}\label{wxmbconvobjects} |
f6bcfd97 | 67 | |
fc2171bd | 68 | Several of the wxWidgets-provided wxMBConv classes have predefined instances |
9c3d92c5 | 69 | (wxConvLibc, wxConvFileName, wxConvUTF7, wxConvUTF8, wxConvLocal). You can use |
845f4268 | 70 | these predefined objects directly, or you can instantiate your own objects. |
f6bcfd97 | 71 | |
845f4268 VZ |
72 | A variable, wxConvCurrent, points to the conversion object that the user |
73 | interface is supposed to use, in the case that the user interface is not | |
74 | Unicode-based (like with GTK+ 1.2). By default, it points to wxConvLibc or | |
75 | wxConvLocal, depending on which works best on the current platform. | |
f6bcfd97 | 76 | |
a203f6c0 | 77 | \subsection{wxCSConv}\label{wxcsconvclass} |
f6bcfd97 BP |
78 | |
79 | The wxCSConv class is special because when it is instantiated, you can tell it | |
80 | which character set it should use, which makes it meaningful to keep many | |
81 | instances of it around, each with a different character set (or you can | |
82 | create a wxCSConv instance on the fly). | |
83 | ||
84 | The predefined wxCSConv instance, wxConvLocal, is preset to use the | |
85 | default user character set, but you should rarely need to use it directly; | |
86 | it is better to go through wxConvCurrent. | |
87 | ||
a203f6c0 | 88 | \subsection{Converting strings}\label{convertingstrings} |
f6bcfd97 BP |
89 | |
90 | Once you have chosen which object you want to use to convert your text, | |
91 | here is how you would use it with wxString. These examples all assume | |
fc2171bd | 92 | that you are using a Unicode build of wxWidgets, although they will still |
f6bcfd97 BP |
93 | compile in a non-Unicode build (they just won't convert anything). |
94 | ||
95 | Example 1: Constructing a wxString from input in current encoding. | |
96 | ||
97 | \begin{verbatim} | |
98 | wxString str(input_data, *wxConvCurrent); | |
99 | \end{verbatim} | |
100 | ||
101 | Example 2: Input in UTF-8 encoding. | |
102 | ||
103 | \begin{verbatim} | |
104 | wxString str(input_data, wxConvUTF8); | |
105 | \end{verbatim} | |
106 | ||
107 | Example 3: Input in KOI8-R. Construction of wxCSConv instance on the fly. | |
108 | ||
109 | \begin{verbatim} | |
110 | wxString str(input_data, wxCSConv(wxT("koi8-r"))); | |
111 | \end{verbatim} | |
112 | ||
113 | Example 4: Printing a wxString to stdout in UTF-8 encoding. | |
114 | ||
115 | \begin{verbatim} | |
116 | puts(str.mb_str(wxConvUTF8)); | |
117 | \end{verbatim} | |
118 | ||
119 | Example 5: Printing a wxString to stdout in custom encoding. | |
120 | Using preconstructed wxCSConv instance. | |
121 | ||
122 | \begin{verbatim} | |
123 | wxCSConv cust(user_encoding); | |
124 | printf("Data: %s\n", (const char*) str.mb_str(cust)); | |
125 | \end{verbatim} | |
126 | ||
e7240349 | 127 | Note: Since mb\_str() returns a temporary wxCharBuffer to hold the result |
f6bcfd97 BP |
128 | of the conversion, you need to explicitly cast it to const char* if you use |
129 | it in a vararg context (like with printf). | |
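The reason for the cast can be sketched without wxWidgets. The class CharBufferSketch below is a hypothetical stand-in for a buffer object with an implicit conversion to const char*; it is not the wxCharBuffer implementation, and FormatData is likewise an invented helper. A function such as puts() declares a const char* parameter, so the conversion operator is applied automatically, whereas the arguments matched by "..." have no declared type and the object would be passed as-is unless it is cast explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for the temporary buffer returned by a conversion;
// only the implicit conversion to const char* matters for this sketch.
class CharBufferSketch
{
public:
    explicit CharBufferSketch(const std::string& text) : m_data(text) {}

    // Implicit conversion, applied whenever a const char* is expected.
    operator const char*() const { return m_data.c_str(); }

private:
    std::string m_data;
};

// Formats "Data: <text>" into out and returns the snprintf result.
int FormatData(char *out, std::size_t outLen, const CharBufferSketch& buf)
{
    // The explicit cast is needed here: snprintf takes its trailing
    // arguments through "...", so no conversion would be applied to buf.
    return std::snprintf(out, outLen, "Data: %s", (const char*) buf);
}
```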
130 | ||
a203f6c0 | 131 | \subsection{Converting buffers}\label{convertingbuffers} |
f6bcfd97 BP |
132 | |
133 | If you have specialized needs, or just don't want to use wxString, you | |
134 | can also use the conversion methods of the conversion objects directly. | |
135 | This can even be useful if you need to do conversion in a non-Unicode | |
fc2171bd | 136 | build of wxWidgets; converting a string from UTF-8 to the current |
f6bcfd97 BP |
137 | encoding should be possible by doing this: |
138 | ||
139 | \begin{verbatim} | |
140 | wxString str(wxConvUTF8.cMB2WC(input_data), *wxConvCurrent); | |
141 | \end{verbatim} | |
142 | ||
143 | Here, cMB2WC of the wxConvUTF8 object returns a wxWCharBuffer containing a Unicode | |
144 | string. The wxString constructor then converts it back to an 8-bit character | |
145 | set using the passed conversion object, *wxConvCurrent. (In a Unicode build | |
fc2171bd | 146 | of wxWidgets, the constructor ignores the passed conversion object and |
f6bcfd97 BP |
147 | retains the Unicode data.) |
148 | ||
149 | This could also be done by first making a wxString of the original data: | |
150 | ||
151 | \begin{verbatim} | |
152 | wxString input_str(input_data); | |
153 | wxString str(input_str.wc_str(wxConvUTF8), *wxConvCurrent); | |
154 | \end{verbatim} | |
155 | ||
156 | To print a wxChar buffer to a non-Unicode stdout: | |
157 | ||
158 | \begin{verbatim} | |
159 | printf("Data: %s\n", (const char*) wxConvCurrent->cWX2MB(unicode_data)); | |
160 | \end{verbatim} | |
161 | ||
162 | If you need to do more complex processing on the converted data, you | |
163 | may want to store the temporary buffer in a local variable: | |
164 | ||
165 | \begin{verbatim} | |
166 | const wxWX2MBbuf tmp_buf = wxConvCurrent->cWX2MB(unicode_data); | |
167 | const char *tmp_str = (const char*) tmp_buf; | |
168 | printf("Data: %s\n", tmp_str); | |
169 | process_data(tmp_str); | |
170 | \end{verbatim} | |
171 | ||
172 | If a conversion took place in cWX2MB (i.e. in a Unicode build), | |
e7240349 | 173 | the buffer will be deallocated as soon as tmp\_buf goes out of scope. |
f6bcfd97 BP |
174 | (The macro wxWX2MBbuf reflects the correct return value of cWX2MB |
175 | (either char* or wxCharBuffer), except for the const.) | |
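The scope-bound lifetime described above can be illustrated with a small stand-alone sketch. OwningBufferSketch is a hypothetical class, not the wxWidgets buffer type; it owns a copy of the text and keeps a live-instance counter (added purely for demonstration) so that the moment of deallocation can be observed. UseBuffer is an invented helper that mirrors the pattern of holding the buffer in a local variable while the raw pointer is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical owning buffer: allocates in the constructor and frees in the
// destructor, i.e. when the variable holding it goes out of scope.
class OwningBufferSketch
{
public:
    explicit OwningBufferSketch(const char *text)
        : m_data(new char[std::strlen(text) + 1])
    {
        std::strcpy(m_data, text);
        ++ms_live;
    }

    ~OwningBufferSketch()
    {
        delete[] m_data;
        --ms_live;
    }

    // Copying is disabled to keep ownership unambiguous in this sketch.
    OwningBufferSketch(const OwningBufferSketch&) = delete;
    OwningBufferSketch& operator=(const OwningBufferSketch&) = delete;

    operator const char*() const { return m_data; }

    // Number of buffers currently alive (demonstration aid only).
    static int LiveCount() { return ms_live; }

private:
    char *m_data;
    static int ms_live;
};

int OwningBufferSketch::ms_live = 0;

// Keeps the buffer in a local variable so that the raw pointer stays valid
// for the whole function body; the memory is released on return.
std::size_t UseBuffer(const char *text)
{
    const OwningBufferSketch tmp_buf(text);
    const char *tmp_str = tmp_buf;   // valid as long as tmp_buf exists
    return std::strlen(tmp_str);
}
```

The raw pointer must not be stored anywhere that outlives the buffer variable, since it refers to memory the buffer frees in its destructor.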
176 |