+++ /dev/null
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% Name: tunicode.tex
-%% Purpose: Overview of the Unicode support in wxWidgets
-%% Author: Vadim Zeitlin
-%% Modified by:
-%% Created: 22.09.99
-%% RCS-ID: $Id$
-%% Copyright: (c) 1999 Vadim Zeitlin <zeitlin@dptmaths.ens-cachan.fr>
-%% Licence: wxWindows license
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-
-\section{Unicode support in wxWidgets}\label{unicode}
-
-This section briefly describes the state of the Unicode support in wxWidgets.
-Read it if you want to know more about how to write programs able to work with
-characters from languages other than English.
-
-\subsection{What is Unicode?}\label{whatisunicode}
-
-wxWidgets has support for compiling in Unicode mode
-on the platforms which support it. Unicode is a standard for character
-encoding which addresses the shortcomings of the previous, 8 bit standards, by
-using at least 16 (and possibly 32) bits for encoding each character. This
-allows to have at least 65536 characters (what is called the BMP, or basic
-multilingual plane) and possible $2^{32}$ of them instead of the usual 256 and
-is sufficient to encode all of the world languages at once. More details about
-Unicode may be found at \urlref{http://www.unicode.org}{http://www.unicode.org}.
-
-% TODO expand on it, say that Unicode extends ASCII, mention ISO8859, ...
-
-As this solution is obviously preferable to the previous ones (think of
-incompatible encodings for the same language, locale chaos and so on), many
-modern operating systems support it. The probably first example is Windows NT
-which uses only Unicode internally since its very first version.
-
-Writing internationalized programs is much easier with Unicode and, as the
-support for it improves, it should become more and more so. Moreover, in the
-Windows NT/2000 case, even the program which uses only standard ASCII can profit
-from using Unicode because they will work more efficiently - there will be no
-need for the system to convert all strings the program uses to/from Unicode
-each time a system call is made.
-
-\subsection{Unicode and ANSI modes}\label{unicodeandansi}
-
-As not all platforms supported by wxWidgets support Unicode (fully) yet, in
-many cases it is unwise to write a program which can only work in Unicode
-environment. A better solution is to write programs in such way that they may
-be compiled either in ANSI (traditional) mode or in the Unicode one.
-
-This can be achieved quite simply by using the means provided by wxWidgets.
-Basically, there are only a few things to watch out for:
-
-\begin{itemize}
-\item Character type ({\tt char} or {\tt wchar\_t})
-\item Literal strings (i.e. {\tt "Hello, world!"} or {\tt '*'})
-\item String functions ({\tt strlen()}, {\tt strcpy()}, ...)
-\item Special preprocessor tokens ({\tt \_\_FILE\_\_}, {\tt \_\_DATE\_\_}
-and {\tt \_\_TIME\_\_})
-\end{itemize}
-
-Let's look at them in order. First of all, each character in an Unicode
-program takes 2 bytes instead of usual one, so another type should be used to
-store the characters ({\tt char} only holds 1 byte usually). This type is
-called {\tt wchar\_t} which stands for {\it wide-character type}.
-
-Also, the string and character constants should be encoded using wide
-characters ({\tt wchar\_t} type) which typically take $2$ or $4$ bytes instead
-of {\tt char} which only takes one. This is achieved by using the standard C
-(and C++) way: just put the letter {\tt 'L'} after any string constant and it
-becomes a {\it long} constant, i.e. a wide character one. To make things a bit
-more readable, you are also allowed to prefix the constant with {\tt 'L'}
-instead of putting it after it.
-
-Of course, the usual standard C functions don't work with {\tt wchar\_t}
-strings, so another set of functions exists which do the same thing but accept
-{\tt wchar\_t *} instead of {\tt char *}. For example, a function to get the
-length of a wide-character string is called {\tt wcslen()} (compare with
-{\tt strlen()} - you see that the only difference is that the "str" prefix
-standing for "string" has been replaced with "wcs" standing for "wide-character
-string").
-
-And finally, the standard preprocessor tokens enumerated above expand to ANSI
-strings but it is more likely that Unicode strings are wanted in the Unicode
-build. wxWidgets provides the macros {\tt \_\_TFILE\_\_}, {\tt \_\_TDATE\_\_}
-and {\tt \_\_TTIME\_\_} which behave exactly as the standard ones except that
-they produce ANSI strings in ANSI build and Unicode ones in the Unicode build.
-
-To summarize, here is a brief example of how a program which can be compiled
-in both ANSI and Unicode modes could look like:
-
-\begin{verbatim}
-#ifdef __UNICODE__
- wchar_t wch = L'*';
- const wchar_t *ws = L"Hello, world!";
- int len = wcslen(ws);
-
- wprintf(L"Compiled at %s\n", __TDATE__);
-#else // ANSI
- char ch = '*';
- const char *s = "Hello, world!";
- int len = strlen(s);
-
- printf("Compiled at %s\n", __DATE__);
-#endif // Unicode/ANSI
-\end{verbatim}
-
-Of course, it would be nearly impossibly to write such programs if it had to
-be done this way (try to imagine the number of {\tt \#ifdef UNICODE} an average
-program would have had!). Luckily, there is another way - see the next
-section.
-
-\subsection{Unicode support in wxWidgets}\label{unicodeinsidewxw}
-
-In wxWidgets, the code fragment from above should be written instead:
-
-\begin{verbatim}
- wxChar ch = wxT('*');
- wxString s = wxT("Hello, world!");
- int len = s.Len();
-\end{verbatim}
-
-What happens here? First of all, you see that there are no more {\tt \#ifdef}s
-at all. Instead, we define some types and macros which behave differently in
-the Unicode and ANSI builds and allow us to avoid using conditional
-compilation in the program itself.
-
-We have a {\tt wxChar} type which maps either on {\tt char} or {\tt wchar\_t}
-depending on the mode in which program is being compiled. There is no need for
-a separate type for strings though, because the standard
-\helpref{wxString}{wxstring} supports Unicode, i.e. it stores either ANSI or
-Unicode strings depending on the compile mode.
-
-Finally, there is a special \helpref{wxT()}{wxt} macro which should enclose all
-literal strings in the program. As it is easy to see comparing the last
-fragment with the one above, this macro expands to nothing in the (usual) ANSI
-mode and prefixes {\tt 'L'} to its argument in the Unicode mode.
-
-The important conclusion is that if you use {\tt wxChar} instead of
-{\tt char}, avoid using C style strings and use {\tt wxString} instead and
-don't forget to enclose all string literals inside \helpref{wxT()}{wxt} macro, your
-program automatically becomes (almost) Unicode compliant!
-
-Just let us state once again the rules:
-
-\begin{itemize}
-\item Always use {\tt wxChar} instead of {\tt char}
-\item Always enclose literal string constants in \helpref{wxT()}{wxt} macro
-unless they're already converted to the right representation (another standard
-wxWidgets macro \helpref{\_()}{underscore} does it, for example, so there is no
-need for {\tt wxT()} in this case) or you intend to pass the constant directly
-to an external function which doesn't accept wide-character strings.
-\item Use {\tt wxString} instead of C style strings.
-\end{itemize}
-
-\subsection{Unicode and the outside world}\label{unicodeoutsidewxw}
-
-We have seen that it was easy to write Unicode programs using wxWidgets types
-and macros, but it has been also mentioned that it isn't quite enough.
-Although everything works fine inside the program, things can get nasty when
-it tries to communicate with the outside world which, sadly, often expects
-ANSI strings (a notable exception is the entire Win32 API which accepts either
-Unicode or ANSI strings and which thus makes it unnecessary to ever perform
-any conversions in the program). GTK 2.0 only accepts UTF-8 strings.
-
-To get an ANSI string from a wxString, you may use the
-mb\_str() function which always returns an ANSI
-string (independently of the mode - while the usual
-\helpref{c\_str()}{wxstringcstr} returns a pointer to the internal
-representation which is either ASCII or Unicode). More rarely used, but still
-useful, is wc\_str() function which always returns
-the Unicode string.
-
-Sometimes it is also necessary to go from ANSI strings to wxStrings.
-In this case, you can use the converter-constructor, as follows:
-
-\begin{verbatim}
- const char* ascii_str = "Some text";
- wxString str(ascii_str, wxConvUTF8);
-\end{verbatim}
-
-This code also compiles fine under a non-Unicode build of wxWidgets,
-but in that case the converter is ignored.
-
-For more information about converters and Unicode see
-the \helpref{wxMBConv classes overview}{mbconvclasses}.
-
-% TODO describe fn_str(), wx_str(), wxCharBuf classes, ...
-
-\subsection{Unicode-related compilation settings}\label{unicodesettings}
-
-You should define {\tt wxUSE\_UNICODE} to $1$ to compile your program in
-Unicode mode. This currently works for wxMSW, wxGTK, wxMac and wxX11. If you
-compile your program in ANSI mode you can still define {\tt wxUSE\_WCHAR\_T}
-to get some limited support for {\tt wchar\_t} type.
-
-This will allow your program to perform conversions between Unicode strings and
-ANSI ones (using \helpref{wxMBConv classes}{mbconvclasses})
-and construct wxString objects from Unicode strings (presumably read
-from some external file or elsewhere).
-
-\subsection{Traps for the unwary}
-
-\begin{itemize}
- \item Casting c\_str() to void* is now char*, not wxChar*
- \item Passing c\_str(), mb\_str() or wc\_str() to variadic functions
-doesn't work
-\end{itemize}