/////////////////////////////////////////////////////////////////////////////
// Name: unicode.h
// Purpose: topic overview
// Author: wxWidgets team
// RCS-ID: $Id$
// Licence: wxWindows license
/////////////////////////////////////////////////////////////////////////////

/**

@page overview_unicode Unicode Support in wxWidgets

This section briefly describes the state of the Unicode support in wxWidgets.
Read it if you want to know more about how to write programs able to work with
characters from languages other than English.

@li @ref overview_unicode_what
@li @ref overview_unicode_ansi
@li @ref overview_unicode_supportin
@li @ref overview_unicode_supportout
@li @ref overview_unicode_settings

<hr>


@section overview_unicode_what What is Unicode?

wxWidgets supports compiling in Unicode mode on the platforms which
support it. Unicode is a standard for character encoding which addresses the
shortcomings of the previous 8-bit standards by using at least 16 (and
possibly 32) bits to encode each character. This allows at least
65536 characters (the so-called BMP, or Basic Multilingual Plane) and
possibly 2^32 of them, instead of the usual 256, which is sufficient to encode
all of the world's languages at once. A different approach is to encode all
strings in UTF-8, which does not require the use of wide characters and
is additionally backwards compatible with 7-bit ASCII. Using UTF-8
is preferred under Linux and, partially, OS X.
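
To make the size trade-off concrete, here is a minimal sketch of UTF-8 encoding
in standard C++. @c EncodeUtf8 is a hypothetical helper written for this
overview, not a wxWidgets function. It shows that ASCII characters occupy a
single byte while all other characters take two to four bytes.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Hypothetical helper for illustration only, not part of wxWidgets.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 7-bit ASCII is stored unchanged in a single byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // surrogates are not valid characters
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Characters outside the BMP need four bytes.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this helper, the letter A (U+0041) yields one byte, the euro sign (U+20AC)
yields three bytes and a character outside the BMP such as U+1F600 yields four.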

More details about Unicode may be found at <http://www.unicode.org/>.

Writing internationalized programs is much easier with Unicode. Moreover,
even a program which uses only standard ASCII can benefit from using Unicode
for string representation because there will be no need to convert all
strings the program uses to/from Unicode each time a system call is made.

@section overview_unicode_ansi Unicode and ANSI Modes

Until wxWidgets 3.0 it was possible to compile the library either in
ANSI (i.e. 8-bit) mode or in wide char mode (16 bits per character
on Windows and 32 bits on most Unix versions, Linux and OS X). This
has changed in wxWidgets 3.0 with the removal of the ANSI mode,
but much effort has been made so that most existing ANSI
code should still compile and work as before.

@section overview_unicode_supportin Unicode Support in wxWidgets

Since wxWidgets 3.0 Unicode support is always enabled, meaning
that the wxString class always uses Unicode to encode its content.
Under Windows wxString uses UCS-2 (basically an array of 16-bit
wchar_t). Under Unix, Linux and OS X, however, wxString uses UTF-8
to encode its content.

For the programmer, the biggest change is that iterating over
a string can be slower than before, since wxString has to parse
the string from its beginning in order to find the n-th character.
Iterating over a string should therefore no longer
be done by index but with iterators. Old code will still work
but might be less efficient.

Old code like this:

@code
wxString s = wxT("hello");
size_t i;
for (i = 0; i < s.Len(); i++)
{
    wxChar ch = s[i];

    // do something with it
}
@endcode

should be replaced (especially in time critical places) with:

@code
wxString s = "hello";
wxString::const_iterator i;
for (i = s.begin(); i != s.end(); ++i)
{
    wxUniChar uni_ch = *i;
    wxChar ch = uni_ch;
    // same as: wxChar ch = *i

    // do something with it
}
@endcode
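
The cost of indexed access in a UTF-8 string can be illustrated without
wxWidgets. The sketch below assumes valid UTF-8 held in a std::string, and
@c Utf8OffsetOfChar is a hypothetical helper, not the way wxString is
implemented internally. Locating the n-th character means scanning from the
start of the string, so a loop that indexes every character repeats that scan,
while an iterator only has to step to the next lead byte.

```cpp
#include <cstddef>
#include <string>

// Return the byte offset of the n-th character (code point) of a UTF-8
// string, or std::string::npos if the string has n or fewer characters.
// Hypothetical helper for illustration; assumes valid UTF-8 input.
std::size_t Utf8OffsetOfChar(const std::string& utf8, std::size_t n)
{
    std::size_t seen = 0; // number of characters skipped so far
    for (std::size_t pos = 0; pos < utf8.size(); ++pos) {
        const unsigned char byte = static_cast<unsigned char>(utf8[pos]);

        // Continuation bytes have the bit pattern 10xxxxxx and never
        // start a character, so they are skipped.
        if ((byte & 0xC0) == 0x80)
            continue;

        if (seen == n)
            return pos;
        ++seen;
    }
    return std::string::npos;
}
```

For a five-character string whose second character occupies two bytes, the
third character starts at byte offset 3 rather than 2, which is why the offset
cannot be computed from the index alone.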

If you want to replace individual characters in the string, you
need to get a reference to that character:

@code
wxString s = "hello";
wxString::iterator i;
for (i = s.begin(); i != s.end(); ++i)
{
    wxUniCharRef ch = *i;
    ch = 'a';
    // same as: *i = 'a';
}
@endcode

which will change the content of the wxString s from "hello" to "aaaaa".

String literals are translated to Unicode when they are assigned to
a wxString object, so code can be written like this:

@code
wxString s = "Hello, world!";
int len = s.Len();
@endcode

wxWidgets provides wrappers around most POSIX C functions (like printf())
and their syntax has been adapted to accept wxString, normal
C-style strings and wchar_t strings as arguments:

@code
wxString s;
s.Printf( "%s %s %s", "hello1", L"hello2", wxString("hello3") );
wxPrintf( "Three times hello %s\n", s );
@endcode

@section overview_unicode_supportout Unicode and the Outside World

We have seen that it is easy to write Unicode programs using wxWidgets types
and macros, but this alone isn't quite enough. Although
everything works fine inside the program, things can get nasty when it tries to
communicate with the outside world which, sadly, often expects ANSI strings (a
notable exception is the entire Win32 API, which accepts either Unicode or ANSI
strings and thus makes it unnecessary to ever perform any conversions in
the program). GTK 2.0 only accepts UTF-8 strings.

To get an ANSI string from a wxString, you may use the mb_str() function, which
always returns an ANSI string (independently of the mode, whereas the usual
c_str() returns a pointer to the internal representation, which is either ASCII
or Unicode). More rarely used, but still useful, is the wc_str() function, which
always returns the Unicode string.

Sometimes it is also necessary to go from ANSI strings to wxStrings. In this
case, you can use the converter-constructor, as follows:

@code
const char* ascii_str = "Some text";
wxString str(ascii_str, wxConvUTF8);
@endcode
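
Conceptually, a converter such as wxConvUTF8 turns a sequence of bytes into
Unicode code points. The following standard C++ sketch shows the idea for UTF-8
only. @c DecodeUtf8 is a hypothetical helper: it assumes well-formed input,
performs no validation, and is not the code wxWidgets itself uses.

```cpp
#include <cstddef>
#include <string>

// Decode a UTF-8 byte string into UTF-32 code points.
// Hypothetical helper for illustration; assumes well-formed UTF-8
// and does no error checking.
std::u32string DecodeUtf8(const std::string& utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow

        // The lead byte tells how long the sequence is and carries
        // the most significant bits of the code point.
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;

        // Each continuation byte contributes six more bits.
        for (; extra > 0 && i < utf8.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);

        out.push_back(cp);
    }
    return out;
}
```

Pure ASCII input such as "Some text" decodes to one code point per byte, which
is why ASCII-only data can be passed through a UTF-8 converter unchanged.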

For more information about converters and Unicode see the @ref overview_mbconv.


@section overview_unicode_settings Unicode Related Compilation Settings

You should define @c wxUSE_UNICODE to 1 to compile your program in Unicode
mode. Since wxWidgets 3.0 this is always the case. When compiled in UTF-8
mode, @c wxUSE_UNICODE_UTF8 is also defined.
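
Code that must behave differently depending on the build can test these symbols
with the preprocessor. In the sketch below, @c DescribeUnicodeBuild is a
hypothetical function. It assumes that the symbols, when present, come from the
wxWidgets headers; since the sketch includes none of them so that it stands
alone, it falls through to the last branch when compiled as shown.

```cpp
#include <string>

// Report which string build mode this translation unit sees.
// Hypothetical helper for illustration. wxUSE_UNICODE and
// wxUSE_UNICODE_UTF8 are expected to be provided by the wxWidgets
// headers, none of which is included in this standalone sketch.
std::string DescribeUnicodeBuild()
{
#if defined(wxUSE_UNICODE_UTF8) && wxUSE_UNICODE_UTF8
    return "Unicode (UTF-8)";
#elif defined(wxUSE_UNICODE) && wxUSE_UNICODE
    return "Unicode (wchar_t)";
#else
    return "wxWidgets headers not included";
#endif
}
```

In a real program the same conditions would be placed after including a
wxWidgets header, so that the first or second branch is selected.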

*/