+For simplicity of implementation, wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
+(e.g. on Windows) uses <em>per code unit indexing</em> instead of
+<em>per code point indexing</em> and doesn't know anything about surrogate pairs;
+in other words it always considers code points to be composed by 1 code unit,
+while this is really true only for characters in the @e BMP (Basic Multilingual Plane).
+Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user
+code has to take care of <em>surrogate pairs</em> himself.
+(Note however that Windows itself has built-in support for surrogate pairs in UTF-16,
+such as for drawing strings on screen.)
+
+@remarks
+Note that while the behaviour of wxString when <tt>wxUSE_UNICODE_WCHAR==1</tt>
+resembles UCS-2 encoding, it's not completely correct to refer to wxString as
+UCS-2 encoded since you can encode code points outside the @e BMP in a wxString
+as two code units (i.e. as a surrogate pair; as already mentioned however wxString
+will "see" them as two different code points)
+
+When instead <tt>wxUSE_UNICODE_UTF8==1</tt> (e.g. on Linux and Mac OS X)
+wxString handles UTF8 multi-bytes sequences just fine also for characters outside
+the BMP (it implements <em>per code point indexing</em>), so that you can use
+UTF8 in a completely transparent way:
+
+Example:
+@code
+ // first test, using exotic characters outside of the Unicode BMP:
+
+ wxString test = wxString::FromUTF8("\xF0\x90\x8C\x80");
+ // U+10300 is "OLD ITALIC LETTER A" and is part of Unicode Plane 1
+ // in UTF8 it's encoded as 0xF0 0x90 0x8C 0x80
+
+ // it's a single Unicode code-point encoded as:
+ // - a UTF16 surrogate pair under Windows
+ // - a UTF8 multiple-bytes sequence under Linux
+ // (without considering the final NULL)
+
+ wxPrintf("wxString reports a length of %d character(s)", test.length());
+ // prints "wxString reports a length of 1 character(s)" on Linux
+ // prints "wxString reports a length of 2 character(s)" on Windows
+ // since wxString on Windows doesn't have surrogate pairs support!
+
+
+ // second test, this time using characters part of the Unicode BMP:
+
+ wxString test2 = wxString::FromUTF8("\x41\xC3\xA0\xE2\x82\xAC");
+ // this is the UTF8 encoding of capital letter A followed by
+ // 'small case letter a with grave' followed by the 'euro sign'
+
+ // they are 3 Unicode code-points encoded as:
+ // - 3 UTF16 code units under Windows
+ // - 6 UTF8 code units under Linux
+ // (without considering the final NULL)
+
+ wxPrintf("wxString reports a length of %d character(s)", test2.length());
+ // prints "wxString reports a length of 3 character(s)" on Linux
+ // prints "wxString reports a length of 3 character(s)" on Windows
+@endcode
+
+To better explain what stated above, consider the second string of the example
+above; it's composed by 3 characters and the final @c NULL:
+
+@image html overview_wxstring_encoding.png
+
+As you can see, UTF16 encoding is straightforward (for characters in the @e BMP)
+and in this example the UTF16-encoded wxString takes 8 bytes.
+UTF8 encoding is more elaborated and in this example takes 7 bytes.
+
+In general, for strings containing many latin characters UTF8 provides a big
+advantage with regards to the memory footprint respect UTF16, but requires some
+more processing for common operations like e.g. length calculation.
+
+Finally, note that the type used by wxString to store Unicode code units
+(@c wchar_t or @c char) is always @c typedef-ined to be ::wxStringCharType.
+
+
+@section overview_string_binary Using wxString to store binary data
+
+wxString can be used to store binary data (even if it contains @c NULs) using the
+functions wxString::To8BitData and wxString::From8BitData.
+
+Beware that even if @c NUL character is allowed, in the current string implementation
+some methods might not work correctly with them.
+
+Note however that other classes like wxMemoryBuffer are more suited to this task.
+For handling binary data you may also want to look at the wxStreamBuffer,
+wxMemoryOutputStream, wxMemoryInputStream classes.