/////////////////////////////////////////////////////////////////////////////
// Name: unicode.h
// Purpose: topic overview
// Author: wxWidgets team
// RCS-ID: $Id$
// Licence: wxWindows license
/////////////////////////////////////////////////////////////////////////////

/**

@page overview_unicode Unicode Support in wxWidgets

This section briefly describes the state of the Unicode support in wxWidgets.
Read it if you want to know more about how to write programs able to work with
characters from languages other than English.

@li @ref overview_unicode_what
@li @ref overview_unicode_ansi
@li @ref overview_unicode_supportin
@li @ref overview_unicode_supportout
@li @ref overview_unicode_settings

<hr>


@section overview_unicode_what What is Unicode?

wxWidgets has support for compiling in Unicode mode on the platforms which
support it. Unicode is a standard for character encoding which addresses the
shortcomings of the previous 8-bit standards by using at least 16 (and
possibly 32) bits for encoding each character. This allows for at least
65536 characters (the so-called BMP, or Basic Multilingual Plane) and
possibly up to 2^32 of them, instead of the usual 256, which is sufficient to
encode all of the world's languages at once. A different approach is to encode
all strings in UTF-8, which does not require the use of wide characters and
is additionally backwards compatible with 7-bit ASCII. UTF-8 is the preferred
solution under Linux and, partially, under OS X.
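
To make the ASCII compatibility of UTF-8 concrete, the following minimal
sketch encodes a single code point by hand. EncodeUtf8() is a hypothetical
helper written for this overview, not a wxWidgets function, and it assumes
that its argument is a valid code point:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of wxWidgets): encode one Unicode code point
// as UTF-8. Assumes cp is a valid code point (at most 0x10FFFF, no surrogate).
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 7-bit ASCII is stored unchanged in a single byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes cover the rest of the BMP: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes for code points beyond the BMP.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Here EncodeUtf8('A') yields the single byte 0x41, exactly as in ASCII, while
the euro sign U+20AC yields the three bytes 0xE2 0x82 0xAC.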

More details about Unicode may be found at <http://www.unicode.org/>.

Writing internationalized programs is much easier with Unicode. Moreover,
even a program which uses only standard ASCII can benefit from using Unicode
for string representation, because there is then no need to convert the
strings the program uses to and from Unicode each time a system call is made.

@section overview_unicode_ansi Unicode and ANSI Modes

Until wxWidgets 3.0 it was possible to compile the library both in
ANSI (= 8-bit) mode and in wide char mode (16 bits per character
on Windows and 32 bits on most Unix versions, Linux and OS X). This
changed in wxWidgets 3.0 with the removal of the ANSI mode,
but much effort has been made so that most existing ANSI
code should still compile and work as before.

@section overview_unicode_supportin Unicode Support in wxWidgets

Since wxWidgets 3.0, Unicode support is always enabled, meaning
that the wxString class always uses Unicode to encode its content.
Under Windows, wxString uses UCS-2 (basically an array of 16-bit
wchar_t). Under Unix, Linux and OS X, however, wxString uses UTF-8
to encode its content.

For the programmer, the biggest change is that iterating over
a string by index can be slower than before: in UTF-8 a character
occupies a variable number of bytes, so wxString may have to parse
the string in order to find its n-th character. Iterating over a
string should therefore no longer be done by index but using
iterators. Old code will still work but might be less efficient.

Old code like this:

@code
wxString s = wxT("hello");
size_t i;
for (i = 0; i < s.Len(); i++)
{
    wxChar ch = s[i];

    // do something with it
}
@endcode

should be replaced (especially in time critical places) with:

@code
wxString s = "hello";
wxString::const_iterator i;
for (i = s.begin(); i != s.end(); ++i)
{
    wxUniChar uni_ch = *i;
    wxChar ch = uni_ch;
    // same as: wxChar ch = *i

    // do something with it
}
@endcode
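
The cost of index-based access in a UTF-8 string can be illustrated with
plain std::string. The sketch below is not how wxString is implemented;
Utf8OffsetOfChar() is a hypothetical helper that merely shows why locating
the n-th character requires walking over all the bytes before it, whereas
an iterator simply remembers where it stopped:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not wxWidgets API): return the byte offset at which
// the n-th character (code point, counted from 0) of a well-formed UTF-8
// string starts, or utf8.size() if the string has no such character.
std::size_t Utf8OffsetOfChar(const std::string& utf8, std::size_t n)
{
    std::size_t seen = 0;  // characters counted so far
    for (std::size_t pos = 0; pos < utf8.size(); ++pos) {
        // Continuation bytes look like 10xxxxxx; any other byte starts
        // a new character.
        const unsigned char byte = static_cast<unsigned char>(utf8[pos]);
        const bool startsCharacter = (byte & 0xC0) != 0x80;
        if (startsCharacter) {
            if (seen == n)
                return pos;
            ++seen;
        }
    }
    return utf8.size();
}
```

For a string whose second character occupies two bytes, the third character
starts at byte offset 3 rather than 2. Calling such a lookup once per index
makes a loop over the whole string quadratic in its length, which is the
overhead that iterators avoid.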

If you want to replace individual characters in the string, you
need to get a reference to each character:

@code
wxString s = "hello";
wxString::iterator i;
for (i = s.begin(); i != s.end(); ++i)
{
    wxUniCharRef ch = *i;
    ch = 'a';
    // same as: *i = 'a';
}
@endcode

which will change the content of the wxString s from "hello" to "aaaaa".

String literals are converted to Unicode when they are assigned to
a wxString object, so code can be written like this:

@code
wxString s = "Hello, world!";
size_t len = s.Len();
@endcode

wxWidgets provides wrappers around most POSIX C functions (like printf())
whose syntax has been adapted to accept wxString, normal C-style strings
and wchar_t strings as input:

@code
wxString s;
s.Printf( "%s %s %s", "hello1", L"hello2", wxString("hello3") );
wxPrintf( "Three times hello %s\n", s );
@endcode

@section overview_unicode_supportout Unicode and the Outside World

Writing Unicode programs with wxWidgets types and macros is easy, but that
alone is not quite enough. Although everything works fine inside the program,
things can get nasty when it tries to communicate with the outside world
which, sadly, often expects ANSI strings. A notable exception is the entire
Win32 API, which accepts either Unicode or ANSI strings and thus makes it
unnecessary to ever perform any conversions in the program. GTK 2.0, on the
other hand, only accepts UTF-8 strings.

To get an ANSI string from a wxString, you may use the mb_str() function,
which always returns an ANSI string (independently of the mode - while the
usual c_str() returns a pointer to the internal representation, which is
either ASCII or Unicode). More rarely used, but still useful, is the wc_str()
function, which always returns the Unicode string.

Sometimes it is also necessary to go from ANSI strings to wxStrings. In this
case, you can use the converter-constructor, as follows:

@code
const char* ascii_str = "Some text";
wxString str(ascii_str, wxConvUTF8);
@endcode
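
Conceptually, a UTF-8 converter has to turn a sequence of bytes into Unicode
code points. The following sketch shows that step in standard C++.
DecodeUtf8() is a hypothetical helper, it assumes well-formed input, and it
is not meant to describe how wxConvUTF8 is actually implemented:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not wxWidgets API): decode UTF-8 bytes into UTF-32
// code points. Assumes the input is well-formed; nothing is validated.
std::u32string DecodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;  // number of continuation bytes that follow
        std::uint32_t cp;   // code point being assembled
        if (lead < 0x80)      { extra = 0; cp = lead; }         // ASCII
        else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }  // 110xxxxx
        else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }  // 1110xxxx
        else                  { extra = 3; cp = lead & 0x07; }  // 11110xxx
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < bytes.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out += static_cast<char32_t>(cp);
    }
    return out;
}
```

Plain ASCII input such as "Some text" passes through unchanged, one code
point per byte, which is why an ASCII literal can safely be handed to a
UTF-8 converter.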

For more information about converters and Unicode, see @ref overview_mbconv.


@section overview_unicode_settings Unicode Related Compilation Settings

@c wxUSE_UNICODE must be defined as 1 to compile your program in Unicode
mode. Since wxWidgets 3.0 this is always the case. When the library is
compiled in UTF-8 mode, @c wxUSE_UNICODE_UTF8 is also defined.
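
Code that depends on the internal representation can test this symbol at
compile time. The sketch below assumes that @c wxUSE_UNICODE_UTF8 evaluates
to a non-zero value in UTF-8 builds; the fallback definition and the
InternalEncodingName() helper are hypothetical and exist only so that the
example compiles without the wxWidgets headers:

```cpp
#include <string>

// Assumption: in a real program this symbol comes from the wxWidgets headers.
// The fallback below exists only so that this sketch compiles on its own.
#ifndef wxUSE_UNICODE_UTF8
    #define wxUSE_UNICODE_UTF8 0
#endif

// Hypothetical helper: name the internal string representation that was
// selected at compile time.
std::string InternalEncodingName()
{
#if wxUSE_UNICODE_UTF8
    return "UTF-8";
#else
    return "wchar_t";
#endif
}
```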

*/