/////////////////////////////////////////////////////////////////////////////
// Purpose:     topic overview
// Author:      wxWidgets team
// Licence:     wxWindows license
/////////////////////////////////////////////////////////////////////////////

@page overview_unicode Unicode Support in wxWidgets

This section briefly describes the state of the Unicode support in wxWidgets.
Read it if you want to know more about how to write programs able to work with
characters from languages other than English.

@li @ref overview_unicode_what
@li @ref overview_unicode_ansi
@li @ref overview_unicode_supportin
@li @ref overview_unicode_supportout
@li @ref overview_unicode_settings

@section overview_unicode_what What is Unicode?

wxWidgets has support for compiling in Unicode mode on the platforms which
support it. Unicode is a standard for character encoding which addresses the
shortcomings of the previous 8-bit standards by using at least 16 (and
possibly 32) bits for encoding each character. Sixteen bits allow at least
65536 characters (what is called the BMP, or Basic Multilingual Plane) instead
of the usual 256, and wider encodings reach the rest of the Unicode code space,
which is sufficient to encode all of the world's languages at once. A different
approach is to encode all strings in UTF-8, which does not require the use of
wide characters and is additionally backwards compatible with 7-bit ASCII.
UTF-8 is the preferred solution under Linux and, partially, OS X.
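
As a rough illustration of the UTF-8 approach described above, the following
sketch encodes a single code point by hand. It is plain standard C++ rather
than wxWidgets code, and @c EncodeUtf8 is a hypothetical helper name introduced
only for this example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not a wxWidgets API): encode one Unicode code point
// as UTF-8. Returns an empty string for values that cannot be encoded
// (surrogates or anything above U+10FFFF).
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: identical to 7-bit ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // surrogate range is not encodable
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this helper, U+0041 ('A') encodes to the single byte 0x41, exactly as in
ASCII, while U+20AC (the euro sign) needs the three bytes 0xE2 0x82 0xAC.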

More details about Unicode may be found at <http://www.unicode.org/>.

Writing internationalized programs is much easier with Unicode. Moreover,
even a program which uses only standard ASCII can benefit from using Unicode
for string representation, because there will be no need to convert all
strings the program uses to/from Unicode each time a system call is made.

@section overview_unicode_ansi Unicode and ANSI Modes

Until wxWidgets 3.0 it was possible to compile the library either in
ANSI (8-bit) mode or in wide char mode (16 bits per character on Windows
and 32 bits on most Unix versions, Linux and OS X). This changed in
wxWidgets 3.0 with the removal of the ANSI mode.

@section overview_unicode_supportin Unicode Support in wxWidgets

Since wxWidgets 3.0 Unicode support is always enabled, meaning
that the wxString class always uses Unicode to encode its content.
Under Windows, wxString uses the standard Windows encoding, UCS-2
(basically an array of 16-bit wchar_t). Under Unix and OS X, however,
wxString uses UTF-8 to encode its content.

For the programmer, the biggest change is that accessing the characters
of a string by index can be slower than before: with a variable-width
encoding such as UTF-8, wxString has to parse the string from its start
in order to find the n-th character. Iterating over a string should
therefore no longer be done by index but using iterators. Old code will
still work but might be less efficient.

Old code like this:

@code
wxString s = wxT("hello");
for (size_t i = 0; i < s.Len(); i++)
{
    wxChar ch = s[i];
    // do something with it
}
@endcode

should be replaced (especially in time-critical places) with:

@code
wxString s = "hello";
for (wxString::const_iterator i = s.begin(); i != s.end(); ++i)
{
    wxUniChar uni_ch = *i;
    wxChar ch = uni_ch;
    // same as:   wxChar ch = *i
    // do something with it
}
@endcode
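
To make the cost difference more concrete, here is a minimal sketch in plain
standard C++ operating on a UTF-8 encoded @c std::string. It is not how
wxString is implemented; @c IsLeadByte, @c OffsetOfNthChar and @c CountChars
are hypothetical helpers, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if the byte starts a code point, i.e. it is not a 10xxxxxx
// continuation byte.
static bool IsLeadByte(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

// Hypothetical helper: byte offset of the n-th code point (0-based), or
// std::string::npos if the string is shorter. The scan always starts at the
// beginning, so every lookup costs time proportional to n.
std::size_t OffsetOfNthChar(const std::string& utf8, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (IsLeadByte(static_cast<unsigned char>(utf8[i]))) {
            if (seen == n)
                return i;
            ++seen;
        }
    }
    return std::string::npos;
}

// Hypothetical helper: visits every code point once in a single pass,
// which is what an iterator-based loop amounts to.
std::size_t CountChars(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char b : utf8) {
        if (IsLeadByte(b))
            ++count;
    }
    return count;
}
```

Calling @c OffsetOfNthChar once per index inside a loop rescans the string
each time, whereas a single forward pass touches every byte only once. This
is the reason iterators are the better fit for a UTF-8 representation.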

If you want to replace individual characters in the string, you
need to get a reference to that character:

@code
wxString s = "hello";
wxString::iterator i;
for (i = s.begin(); i != s.end(); ++i)
{
    wxUniCharRef ch = *i;
    ch = 'a';
    // same as:  *i = 'a';
}
@endcode

which will change the content of the wxString s from "hello" to "aaaaa".

String literals are translated to Unicode when they are assigned to
a wxString object, so code can be written like this:

@code
wxString s = "Hello, world!";
@endcode

wxWidgets provides wrappers around most POSIX C functions (like printf()),
and their syntax has been adapted to accept wxString, normal C-style strings
and wchar_t strings as input:

@code
wxString s;
s.Printf( "%s %s %s", "hello1", L"hello2", wxString("hello3") );
wxPrintf( "Three times hello %s\n", s );
@endcode

@section overview_unicode_supportout Unicode and the Outside World

We have seen that it is easy to write Unicode programs using wxWidgets types
and macros, but it has also been mentioned that this isn't quite enough.
Although everything works fine inside the program, things can get nasty when
it tries to communicate with the outside world which, sadly, often expects
ANSI strings (a notable exception is the entire Win32 API, which accepts
either Unicode or ANSI strings and thus makes it unnecessary to ever perform
any conversions in the program). GTK 2.0 only accepts UTF-8 strings.

To get an ANSI string from a wxString, you may use the mb_str() function,
which always returns an ANSI string (independently of the mode, whereas the
usual c_str() returns a pointer to the internal representation, which is
either ASCII or Unicode). More rarely used, but still useful, is the wc_str()
function, which always returns the Unicode string.
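
One reason such conversions need care is that an 8-bit target encoding cannot
hold every Unicode character. The sketch below shows this in plain standard
C++, using Latin-1 as an example 8-bit encoding. It does not use or reproduce
mb_str(); @c ToLatin1 and its fallback parameter are hypothetical names.

```cpp
#include <string>

// Hypothetical helper: convert code points to Latin-1, where only
// U+0000..U+00FF have a one-byte equivalent. Anything else is replaced
// by the fallback character, so the conversion can lose information.
std::string ToLatin1(const std::u32string& text, char fallback = '?')
{
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        out += (cp <= 0xFF) ? static_cast<char>(cp) : fallback;
    }
    return out;
}
```

Text that stays within the target encoding's range converts cleanly, while a
character such as the euro sign (U+20AC) becomes the fallback. Code that
hands strings to an 8-bit interface should be prepared for this.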

Sometimes it is also necessary to go from ANSI strings to wxStrings. In this
case, you can use the converter-constructor, as follows:

@code
const char* ascii_str = "Some text";
wxString str(ascii_str, wxConvUTF8);
@endcode
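
The converter argument matters because the same bytes mean different things
in different encodings. The following standard C++ sketch illustrates the
idea only. @c SourceEncoding and @c Decode are hypothetical stand-ins, not
wxWidgets converters, and malformed UTF-8 input is not diagnosed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a converter argument; not a wxWidgets type.
enum class SourceEncoding { Latin1, Utf8 };

// Decode bytes into code points according to the stated source encoding.
// Assumes well-formed input.
std::u32string Decode(const std::string& bytes, SourceEncoding enc)
{
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); ) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (enc == SourceEncoding::Latin1 || b < 0x80) {
            // One byte maps to exactly one character.
            out += static_cast<char32_t>(b);
            ++i;
            continue;
        }
        // The lead byte tells how many bytes this UTF-8 sequence uses.
        const std::size_t len = (b >= 0xF0) ? 4 : (b >= 0xE0) ? 3 : 2;
        // Keep only the payload bits of the lead byte.
        char32_t cp = b & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out += cp;
        i += len;
    }
    return out;
}
```

For the two bytes 0xC3 0xA9, decoding as UTF-8 yields the single character
U+00E9, whereas decoding as Latin-1 yields the two characters U+00C3 and
U+00A9. Pure ASCII input such as "Some text" decodes identically either way.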

For more information about converters and Unicode see the @ref overview_mbconv.

@section overview_unicode_settings Unicode Related Compilation Settings

You should define @c wxUSE_UNICODE to 1 to compile your program in Unicode
mode. Since wxWidgets 3.0 this is always the case. When compiled in UTF-8
mode, @c wxUSE_UNICODE_UTF8 is also defined.
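
Code that needs to know which representation is in use could test these
symbols at compile time. The sketch below is only an illustration:
@c DescribeStringBuild is a hypothetical helper, and the fallback definitions
are an assumption added so that the example compiles without the wxWidgets
headers, which would normally supply these symbols.

```cpp
#include <string>

// Assumption: outside a wxWidgets build these symbols are undefined, so
// fallbacks are supplied here purely to keep the sketch compilable.
#ifndef wxUSE_UNICODE
#define wxUSE_UNICODE 1
#endif
#ifndef wxUSE_UNICODE_UTF8
#define wxUSE_UNICODE_UTF8 0
#endif

// Hypothetical helper describing the string configuration of the build.
std::string DescribeStringBuild()
{
#if !wxUSE_UNICODE
    return "ANSI";
#elif wxUSE_UNICODE_UTF8
    return "Unicode, UTF-8 internal representation";
#else
    return "Unicode, wchar_t internal representation";
#endif
}
```

Keeping such checks in one small function avoids scattering preprocessor
conditionals throughout the program.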