// Purpose: topic overview
// Author: wxWidgets team
// RCS-ID: $Id$
-// Licence: wxWindows license
+// Licence: wxWindows licence
/////////////////////////////////////////////////////////////////////////////
/**
@page overview_string wxString Overview
-Classes: wxString, wxArrayString, wxStringTokenizer
-
-@li @ref overview_string_intro
-@li @ref overview_string_internal
-@li @ref overview_string_binary
-@li @ref overview_string_comparison
-@li @ref overview_string_advice
-@li @ref overview_string_related
-@li @ref overview_string_tuning
-@li @ref overview_string_settings
-
-
-<hr>
-
-
-@section overview_string_intro Introduction
+@tableofcontents
wxString is a class which represents a Unicode string of arbitrary length and
containing arbitrary Unicode characters.
in previous versions.
-@section overview_string_internal Internal wxString encoding
+@section overview_string_internal Internal wxString Encoding
Since wxWidgets 3.0 wxString internally uses <b>UTF-16</b> (with Unicode
code units stored in @c wchar_t) under Windows and <b>UTF-8</b> (with Unicode
For simplicity of implementation, wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
(e.g. on Windows) uses <em>per code unit indexing</em> instead of
<em>per code point indexing</em> and doesn't know anything about surrogate pairs;
-in other words it always considers code points to be composed by 1 code point,
+in other words it always considers code points to be composed of 1 code unit,
while this is really true only for characters in the @e BMP (Basic Multilingual Plane).
Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user
code has to take care of <em>surrogate pairs</em> itself.
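
As an illustration of what handling surrogate pairs in user code involves, here
is a minimal sketch in standard C++. It does not use wxString at all:
@c DecodeUtf16 is a hypothetical helper, not a wxWidgets function, and
@c std::u16string merely stands in for a buffer of UTF-16 code units.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of wxWidgets): converts UTF-16 code units
// into Unicode code points, merging each valid surrogate pair into one value.
std::vector<char32_t> DecodeUtf16(const std::u16string& units)
{
    std::vector<char32_t> codePoints;
    for (std::size_t i = 0; i < units.size(); ++i)
    {
        const char16_t unit = units[i];

        // A high (lead) surrogate lies in the range 0xD800..0xDBFF.
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < units.size())
        {
            const char16_t next = units[i + 1];

            // A low (trail) surrogate lies in the range 0xDC00..0xDFFF.
            if (next >= 0xDC00 && next <= 0xDFFF)
            {
                const char32_t high = static_cast<char32_t>(unit) - 0xD800;
                const char32_t low = static_cast<char32_t>(next) - 0xDC00;
                codePoints.push_back(0x10000 + (high << 10) + low);
                ++i; // the pair consumed two code units
                continue;
            }
        }

        // BMP character, or an unpaired surrogate passed through unchanged.
        codePoints.push_back(unit);
    }
    return codePoints;
}
```

For a sequence containing one character outside the @e BMP, the number of code
points returned is smaller than the number of code units, which is exactly the
difference that per code unit indexing hides.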
@remarks
Note that while the behaviour of wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
resembles UCS-2 encoding, it's not completely correct to refer to wxString as
-UCS-2 encoded since you can encode characters outside the @e BMP in a wxString.
+UCS-2 encoded since you can encode code points outside the @e BMP in a wxString
+as two code units (i.e. as a surrogate pair; as already mentioned, however, wxString
+will "see" them as two different code points).
When instead <tt>wxUSE_UNICODE_UTF8==1</tt> (e.g. on Linux and Mac OS X)
wxString handles UTF-8 multibyte sequences just fine also for characters outside
See wxCSConv and wxMBConv.
-@subsection overview_string_iterating Iterating wxString's characters
+@subsection overview_string_iterating Iterating wxString Characters
As previously described, when <tt>wxUSE_UNICODE_UTF8==1</tt>, wxString internally
uses the variable-length UTF-8 encoding.
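
To make "variable-length" concrete, the following sketch counts the code points
in a UTF-8 byte sequence using plain standard C++. @c CountUtf8CodePoints is a
hypothetical helper, not a wxWidgets function, and it assumes that the input is
well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of wxWidgets): counts code points in a
// well-formed UTF-8 string by skipping continuation bytes (10xxxxxx).
std::size_t CountUtf8CodePoints(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char b : bytes)
    {
        // Every code point starts with a byte that is not a continuation byte.
        if ((b & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

Since a code point may occupy between one and four bytes, its position cannot be
computed from its index alone; the bytes before it have to be scanned.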
case-insensitive string comparison function known either as @c stricmp() or
@c strcasecmp() on different platforms.
-The <tt>@<wx/string.h@></tt> header also defines ::wxSnprintf and ::wxVsnprintf
+The <tt>@<wx/string.h@></tt> header also defines wxSnprintf() and wxVsnprintf()
functions which should be used instead of the inherently dangerous standard
@c sprintf() and which use @c snprintf(), which does buffer size checks,
whenever possible. Of course, you may also use wxString::Printf which is also
See also @ref page_wxusedef_important.
*/
-