X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/2f365fcbd591ef4da63f5ca44d1f4b22ab20d287..cd95f7e65c4e1ee61a5d90eb13687ff468cb13ad:/docs/doxygen/overviews/string.h
diff --git a/docs/doxygen/overviews/string.h b/docs/doxygen/overviews/string.h
index 3829548e3c..927a208d0e 100644
--- a/docs/doxygen/overviews/string.h
+++ b/docs/doxygen/overviews/string.h
@@ -3,7 +3,7 @@
// Purpose: topic overview
// Author: wxWidgets team
// RCS-ID: $Id$
-// Licence: wxWindows license
+// Licence: wxWindows licence
/////////////////////////////////////////////////////////////////////////////
/**
@@ -56,7 +56,7 @@ see the @ref overview_unicode_encodings paragraph.
For simplicity of implementation, wxString when wxUSE_UNICODE_WCHAR==1
(e.g. on Windows) uses per code unit indexing instead of
per code point indexing and doesn't know anything about surrogate pairs;
-in other words it always considers code points to be composed by 1 code point,
+in other words it always considers code points to be composed of 1 code unit,
while this is really true only for characters in the @e BMP (Basic Multilingual Plane).
Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user
code has to take care of surrogate pairs himself.
@@ -66,7 +66,9 @@ such as for drawing strings on screen.)
@remarks
Note that while the behaviour of wxString when wxUSE_UNICODE_WCHAR==1
resembles UCS-2 encoding, it's not completely correct to refer to wxString as
-UCS-2 encoded since you can encode characters outside the @e BMP in a wxString.
+UCS-2 encoded since you can encode code points outside the @e BMP in a wxString
+as two code units (i.e. as a surrogate pair; as already mentioned however wxString
+will "see" them as two different code points).
When instead wxUSE_UNICODE_UTF8==1 (e.g. on Linux and Mac OS X)
wxString handles UTF8 multi-bytes sequences just fine also for characters outside