From: Francesco Montorsi Date: Sat, 6 Dec 2008 16:24:52 +0000 (+0000) Subject: moved many things from wxString reference page to the wxString overview; updated... X-Git-Url: https://git.saurik.com/wxWidgets.git/commitdiff_plain/727aa9062ba6ffa3153069e15df38dca958172d5 moved many things from wxString reference page to the wxString overview; updated some old/incoherent informations; added some DIA-drawn graphs showing UTF8/UCS2 different representation used by wxString git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@57140 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775 --- diff --git a/docs/doxygen/images/overview_unicode_codes.dia b/docs/doxygen/images/overview_unicode_codes.dia new file mode 100644 index 0000000000..e8bd50f988 Binary files /dev/null and b/docs/doxygen/images/overview_unicode_codes.dia differ diff --git a/docs/doxygen/images/overview_unicode_codes.png b/docs/doxygen/images/overview_unicode_codes.png index c936ea0661..0da2d8ffa8 100644 Binary files a/docs/doxygen/images/overview_unicode_codes.png and b/docs/doxygen/images/overview_unicode_codes.png differ diff --git a/docs/doxygen/images/overview_wxstring_encoding.dia b/docs/doxygen/images/overview_wxstring_encoding.dia new file mode 100644 index 0000000000..4d42a4a1a0 Binary files /dev/null and b/docs/doxygen/images/overview_wxstring_encoding.dia differ diff --git a/docs/doxygen/images/overview_wxstring_encoding.png b/docs/doxygen/images/overview_wxstring_encoding.png new file mode 100644 index 0000000000..f81af5d1a2 Binary files /dev/null and b/docs/doxygen/images/overview_wxstring_encoding.png differ diff --git a/docs/doxygen/overviews/mbconvclasses.h b/docs/doxygen/overviews/mbconvclasses.h index 4dbb18b64c..5cec572421 100644 --- a/docs/doxygen/overviews/mbconvclasses.h +++ b/docs/doxygen/overviews/mbconvclasses.h @@ -51,6 +51,8 @@ unhindered through any traditional transport channels. 
@section overview_mbconv_string Background: The wxString Class +@todo rewrite this overview; it's not up2date with wxString changes + If you have compiled wxWidgets in Unicode mode, the wxChar type will become identical to wchar_t rather than char, and a wxString stores wxChars. Hence, all wxString manipulation in your application will then operate on Unicode diff --git a/docs/doxygen/overviews/string.h b/docs/doxygen/overviews/string.h index 42247d28d4..54513b6bb4 100644 --- a/docs/doxygen/overviews/string.h +++ b/docs/doxygen/overviews/string.h @@ -13,10 +13,12 @@ Classes: wxString, wxArrayString, wxStringTokenizer @li @ref overview_string_intro +@li @ref overview_string_internal @li @ref overview_string_comparison @li @ref overview_string_advice @li @ref overview_string_related @li @ref overview_string_tuning +@li @ref overview_string_settings
@@ -24,25 +26,104 @@ Classes: wxString, wxArrayString, wxStringTokenizer @section overview_string_intro Introduction -wxString is a class which represents a character string of arbitrary length and -containing arbitrary characters. The ASCII NUL character is allowed, but be -aware that in the current string implementation some methods might not work -correctly in this case. +wxString is a class which represents a Unicode string of arbitrary length and +containing arbitrary characters. -Since wxWidgets 3.0 wxString internally uses UCS-2 (basically 2-byte per -character wchar_t) under Windows and UTF-8 under Unix, Linux and -OS X to store its content. Much work has been done to make -existing code using ANSI string literals work as before. +The @c NUL character is allowed, but be +aware that in the current string implementation some methods might not work +correctly in this case. @todo still true? This class has all the standard operations you can expect to find in a string class: dynamic memory management (string extends to accommodate new -characters), construction from other strings, C strings, wide character C strings +characters), construction from other strings, C strings, wide character C strings and characters, assignment operators, access to individual characters, string -concatenation and comparison, substring extraction, case conversion, trimming and padding (with -spaces), searching and replacing and both C-like @c printf (wxString::Printf) +concatenation and comparison, substring extraction, case conversion, trimming and +padding (with spaces), searching and replacing and both C-like @c printf (wxString::Printf) and stream-like insertion functions as well as much more - see wxString for a list of all functions. +The wxString class has been completely rewritten for wxWidgets 3.0 but much work +has been done to make existing code using ANSI string literals work as it did +in previous versions. 
+ + +@section overview_string_internal Internal wxString encoding + +Since wxWidgets 3.0 wxString internally uses UCS-2 (with Unicode +code units stored in @c wchar_t) under Windows and UTF-8 (with Unicode +code units stored in @c char) under Unix, Linux and Mac OS X to store its content. + +For definitions of code units and code points terms, please +see the @ref overview_unicode_encodings paragraph. + +Note that there is a difference about UCS-2 and UTF-16: the first is a fixed-length +encoding, without surrogate pairs, while the latter is a +variable-length encoding. Except for this the two encodings are identical. + +For simplicity of implementation, wxString when wxUSE_UNICODE_WCHAR==1 +(e.g. on Windows) uses UCS-2 and thus doesn't know anything about surrogate pairs; +it always consider 1 code unit per 1 code point, while this is really true only for +characters in the @e BMP (Basic Multilingual Plane). +Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user +code has to take care of surrogate pair handling himself. +(Note however that Windows itself has built-in support for surrogate pairs in UTF-16, +such as for drawing strings on screen.) + +When instead wxUSE_UNICODE_UTF8==1 (e.g. 
on Linux and Mac OS X) +wxString handles UTF8 multi-bytes sequences just fine, so that you can use +UTF8 in a completely transparent way: + +Example: +@code + // first test, using exotic characters outside of the Unicode BMP: + + wxString test = wxString::FromUTF8("\xF0\x90\x8C\x80"); + // U+10300 is "OLD ITALIC LETTER A" and is part of Unicode Plane 1 + // in UTF8 it's encoded as 0xF0 0x90 0x8C 0x80 + + // it's a single Unicode code-point encoded as: + // - a UTF16 surrogate pair under Windows + // - a UTF8 multiple-bytes sequence under Linux + // (without considering the final NULL) + + wxPrintf("wxString reports a length of %d character(s)", test.length()); + // prints "wxString reports a length of 1 character(s)" on Linux + // prints "wxString reports a length of 2 character(s)" on Windows + // since Windows doesn't have surrogate pairs support! + + + // second test, this time using characters part of the Unicode BMP: + + wxString test2 = wxString::FromUTF8("\x41\xC3\xA0\xE2\x82\xAC"); + // this is the UTF8 encoding of capital letter A followed by + // 'small case letter a with grave' followed by the 'euro sign' + + // they are 3 Unicode code-points encoded as: + // - 3 UTF16 code units under Windows + // - 6 UTF8 code units under Linux + // (without considering the final NULL) + + wxPrintf("wxString reports a length of %d character(s)", test2.length()); + // prints "wxString reports a length of 3 character(s)" on Linux + // prints "wxString reports a length of 3 character(s)" on Windows +@endcode + +To better explain what stated above, consider the second string of the example +above; it's composed by 3 characters and the final @c NULL: + +@image html overview_wxstring_encoding.png + +As you can see, UCS2/UTF16 encoding is straightforward (for characters in the @e BMP) +and in this example the UCS2-encoded wxString takes 8 bytes. +UTF8 encoding is more elaborated and in this example takes 7 bytes. 
+ +The type used by wxString to store Unicode code units is called wxStringCharType. + +In general, for strings containing many latin characters UTF8 provides a big +advantage in memory footprint respect UTF16, but requires some more processing +for common operations like e.g. length calculation. + + @section overview_string_comparison Comparison to Other String Classes @@ -50,52 +131,53 @@ The advantages of using a special string class instead of working directly with C strings are so obvious that there is a huge number of such classes available. The most important advantage is the need to always remember to allocate/free memory for C strings; working with fixed size buffers almost inevitably leads -to buffer overflows. At last, C++ has a standard string class (std::string). So +to buffer overflows. At last, C++ has a standard string class (@c std::string). So why the need for wxString? There are several advantages: -@li Efficiency: Since wxWidgets 3.0 wxString uses std::string (UTF8 - mode under Linux, Unix and OS X) or std::wstring (MSW) internally by - default to store its constent. wxString will therefore inherit the - performance characteristics from std::string. +@li Efficiency: Since wxWidgets 3.0 wxString uses @c std::string (in UTF8 + mode under Linux, Unix and OS X) or @c std::wstring (in UTF16 mode under Windows) + internally by default to store its contents. wxString will therefore inherit the + performance characteristics from @c std::string. @li Compatibility: This class tries to combine almost full compatibility - with the old wxWidgets 1.xx wxString class, some reminiscence to MFC - CString class and 90% of the functionality of std::string class. -@li Rich set of functions: Some of the functions present in wxString are very - useful but don't exist in most of other string classes: for example, - wxString::AfterFirst, wxString::BeforeLast, wxString::operators or - wxString::Printf. 
Of course, all the standard string operations are - supported as well. -@li Unicode wxString is Unicode friendly: it allows to easily convert to - and from ANSI and Unicode strings (see the @ref overview_unicode "unicode overview" - for more details) and maps to @c wstring transparently. + with the old wxWidgets 1.xx wxString class, some reminiscence of MFC's + CString class and 90% of the functionality of @c std::string class. +@li Rich set of functions: Some of the functions present in wxString are + very useful but don't exist in most of other string classes: for example, + wxString::AfterFirst, wxString::BeforeLast, wxString::Printf. + Of course, all the standard string operations are supported as well. +@li wxString is Unicode friendly: it allows to easily convert to + and from ANSI and Unicode strings (see @ref overview_unicode + for more details) and maps to @c std::wstring transparently. @li Used by wxWidgets: And, of course, this class is used everywhere inside wxWidgets so there is no performance loss which would result from - conversions of objects of any other string class (including std::string) to + conversions of objects of any other string class (including @c std::string) to wxString internally by wxWidgets. However, there are several problems as well. The most important one is probably that there are often several functions to do exactly the same thing: for example, to get the length of the string either one of wxString::length(), wxString::Len() or wxString::Length() may be used. The first function, as -almost all the other functions in lowercase, is std::string compatible. The +almost all the other functions in lowercase, is @c std::string compatible. The second one is the "native" wxString version and the last one is the wxWidgets 1.xx way. -So which is better to use? The usage of the std::string compatible functions is +So which is better to use? The usage of the @c std::string compatible functions is strongly advised! 
It will both make your code more familiar to other C++ -programmers (who are supposed to have knowledge of std::string but not of +programmers (who are supposed to have knowledge of @c std::string but not of wxString), let you reuse the same code in both wxWidgets and other programs (by -just typedefing wxString as std::string when used outside wxWidgets) and by +just typedefing wxString as @c std::string when used outside wxWidgets) and by staying compatible with future versions of wxWidgets which will probably start -using std::string sooner or later too. +using @c std::string sooner or later too. -In the situations where there is no corresponding std::string function, please +In the situations where there is no corresponding @c std::string function, please try to use the new wxString methods and not the old wxWidgets 1.xx variants which are deprecated and may disappear in future versions. @section overview_string_advice Advice About Using wxString +@subsection overview_string_implicitconv Implicit conversions + Probably the main trap with using this class is the implicit conversion operator to const char*. It is advised that you use wxString::c_str() instead to clearly indicate when the conversion is done. Specifically, the @@ -124,8 +206,8 @@ because the argument of @c puts() is known to be of the type const char*, this is @b not done for @c printf() which is a function with variable number of arguments (and whose arguments are of unknown types). So this call may do any number of things (including displaying the correct -string on screen), although the most likely result is a program crash. The -solution is to use wxString::c_str(). Just replace this line with this: +string on screen), although the most likely result is a program crash. +The solution is to use wxString::c_str(). Just replace this line with this: @code printf("Hello, %s!\n", output.c_str()); @@ -138,10 +220,43 @@ its contents are completely arbitrary. 
The solution to this problem is also easy, just make the function return wxString instead of a C string. This leads us to the following general advice: all functions taking string -arguments should take const wxString (this makes assignment to the +arguments should take const wxString& (this makes assignment to the strings inside the function faster) and all functions returning strings should return wxString - this makes it safe to return local variables. +Finally note that wxString uses the current locale encoding to convert any C string +literal to Unicode. The same is done for converting to and from @c std::string +and for the return value of c_str(). +For this conversion, the @a wxConvLibc class instance is used. +See wxCSConv and wxMBConv. + + +@subsection overview_string_iterating Iterating wxString's characters + +As previously described, when wxUSE_UNICODE_UTF8==1, wxString internally +uses the variable-length UTF8 encoding. +Accessing a UTF-8 string by index can be very @b inefficient because +a single character is represented by a variable number of bytes so that +the entire string has to be parsed in order to find the character. +Since iterating over a string by index is a common programming technique and +was also possible and encouraged by wxString using the access operator[]() +wxString implements caching of the last used index so that iterating over +a string is a linear operation even in UTF-8 mode. + +It is nonetheless recommended to use @b iterators (instead of index based +access) like this: + +@code +wxString s = "hello"; +wxString::const_iterator i; +for (i = s.begin(); i != s.end(); ++i) +{ + wxUniChar uni_ch = *i; + // do something with it +} +@endcode + + @section overview_string_related String Related Functions and Classes @@ -158,7 +273,7 @@ these problems: wxIsEmpty() verifies whether the string is empty (returning case-insensitive string comparison function known either as @c stricmp() or @c strcasecmp() on different platforms. 
-The @ header also defines wxSnprintf and wxVsnprintf +The @ header also defines ::wxSnprintf and ::wxVsnprintf functions which should be used instead of the inherently dangerous standard @c sprintf() and which use @c snprintf() instead which does buffer size checks whenever possible. Of course, you may also use wxString::Printf which is also @@ -180,7 +295,7 @@ wxStrings. @note This section is strictly about performance issues and is absolutely not necessary to read for using wxString class. Please skip it unless you feel -familiar with profilers and relative tools. +familiar with profilers and relative tools. For the performance reasons wxString doesn't allocate exactly the amount of memory needed for each string. Instead, it adds a small amount of space to each @@ -244,5 +359,16 @@ really consider fine tuning wxString for your application). It goes without saying that a profiler should be used to measure the precise difference the change to @c EXTRA_ALLOC makes to your program. + +@section overview_string_settings wxString Related Compilation Settings + +Much work has been done to make existing code using ANSI string literals +work as before version 3.0. +If you nonetheless need to have a wxString that uses @c wchar_t +on Unix and Linux, too, you can specify this on the command line with the +@c configure @c --disable-utf8 switch or you can consider using wxUString +or @c std::wstring instead. + + */ diff --git a/docs/doxygen/overviews/unicode.h b/docs/doxygen/overviews/unicode.h index e372007b1e..e50454a1cd 100644 --- a/docs/doxygen/overviews/unicode.h +++ b/docs/doxygen/overviews/unicode.h @@ -49,30 +49,34 @@ other services should be ready to deal with Unicode. When working with Unicode, it's important to define the meaning of some terms. -A @e glyph is a particular image that represents a @e character or part of a character. +A glyph is a particular image that represents a character or part +of a character. 
Any character may have one or more glyph associated; e.g. some of the possible glyphs for the capital letter 'A' are: @image html overview_unicode_glyphs.png Unicode assigns each character of almost any existing alphabet/script a number, -which is called code point; it's typically indicated in documentation +which is called code point; it's typically indicated in documentation manuals and in the Unicode website as @c U+xxxx where @c xxxx is an hexadecimal number. The Unicode standard divides the space of all possible code points in @e planes; a plane is a range of 65,536 (0x10000) contiguous Unicode code points. Planes are numbered from 0 to 16, where the first one is the @e BMP, or Basic Multilingual Plane. +The BMP contains characters for all modern languages, and a large number of +special characters. The other planes in fact contain mainly historic scripts, +special-purpose characters or are unused. Code points are represented in computer memory as a sequence of one or more -code units, where a code unit is a unit of memory: 8, 16, or 32 bits. +code units, where a code unit is a unit of memory: 8, 16, or 32 bits. More precisely, a code unit is the minimal bit combination that can represent a unit of encoded text for processing or interchange. The @e UTF or Unicode Transformation Formats are algorithms mapping the Unicode code points to code unit sequences. The simplest of them is UTF-32 where -each code unit is composed by 32 bits (4 bytes) and each code point is represented -by a single code unit. +each code unit is composed by 32 bits (4 bytes) and each code point is always +represented by a single code unit (fixed length encoding). (Note that even UTF-32 is still not completely trivial as the mapping is different for little and big-endian architectures). UTF-32 is commonly used under Unix systems for internal representation of Unicode strings. 
@@ -81,6 +85,7 @@ Another very widespread standard is UTF-16 which is used by Microsoft Win it encodes the first (approximately) 64 thousands of Unicode code points (the BMP plane) using 16-bit code units (2 bytes) and uses a pair of 16-bit code units to encode the characters beyond this. These pairs are called @e surrogate. +Thus UTF16 uses a variable number of code units to encode each code point. Finally, the most widespread encoding used for the external Unicode storage (e.g. files and network protocols) is UTF-8 which is byte-oriented and so @@ -107,7 +112,7 @@ Typically when UTF8 is used, code units are stored into @c char types, since @c char are 8bit wide on almost all systems; when using UTF16 typically code units are stored into @c wchar_t types since @c wchar_t is at least 16bits on all systems. This is also the approach used by wxString. -See @ref overview_wxstring for more info. +See @ref overview_string for more info. See also http://unicode.org/glossary/ for the official definitions of the terms reported above. @@ -123,8 +128,8 @@ programs require the Microsoft Layer for Unicode to run on Windows 95/98/ME. However, unlike the Unicode build mode of the previous versions of wxWidgets, this support is mostly transparent: you can still continue to work with the @b narrow -(i.e. current-locale-encoded @c char*) strings even if @b wide -(i.e. UTF16/UCS2-encoded @c wchar_t* or UTF8-encoded @c char) strings are also +(i.e. current locale-encoded @c char*) strings even if @b wide +(i.e. UTF16/UCS2-encoded @c wchar_t* or UTF8-encoded @c char*) strings are also supported. 
Any wxWidgets function accepts arguments of either type as both kinds of strings are implicitly converted to wxString, so both @code @@ -132,7 +137,7 @@ wxMessageBox("Hello, world!"); @endcode and the somewhat less usual @code -wxMessageBox(L"Salut \u00e0 toi!"); // 00E0 is "Latin Small Letter a with Grave" +wxMessageBox(L"Salut \u00E0 toi!"); // U+00E0 is "Latin Small Letter a with Grave" @endcode work as expected. @@ -147,9 +152,10 @@ in the case of gcc). In particular, the most common encoding used under modern Unix systems is UTF-8 and as the string above is not a valid UTF-8 byte sequence, nothing would be displayed at all in this case. Thus it is important to never use 8-bit (instead of 7-bit) characters directly in the program source -but use wide strings or, alternatively, write +but use wide strings or, alternatively, write: @code -wxMessageBox(wxString::FromUTF8("Salut \xc3\xa0 toi!")); +wxMessageBox(wxString::FromUTF8("Salut \xC3\xA0 toi!")); + // in UTF8 the character U+00E0 is encoded as 0xC3A0 @endcode In a similar way, wxString provides access to its contents as either @c wchar_t or @@ -327,6 +333,7 @@ different encoding of it. So you need to be able to convert the data to various representations and the wxString methods wxString::ToAscii(), wxString::ToUTF8() (or its synonym wxString::utf8_str()), wxString::mb_str(), wxString::c_str() and wxString::wc_str() can be used for this. + The first of them should be only used for the string containing 7-bit ASCII characters only, anything else will be replaced by some substitution character. 
wxString::mb_str() converts the string to the encoding used by the current locale diff --git a/interface/wx/string.h b/interface/wx/string.h index ed2afa915f..7d57b228a6 100644 --- a/interface/wx/string.h +++ b/interface/wx/string.h @@ -6,59 +6,6 @@ // Licence: wxWindows license ///////////////////////////////////////////////////////////////////////////// -/** - @class wxStringBuffer - - This tiny class allows you to conveniently access the wxString internal buffer - as a writable pointer without any risk of forgetting to restore the string - to the usable state later. - - For example, assuming you have a low-level OS function called - @c "GetMeaningOfLifeAsString(char *)" returning the value in the provided - buffer (which must be writable, of course) you might call it like this: - - @code - wxString theAnswer; - GetMeaningOfLifeAsString(wxStringBuffer(theAnswer, 1024)); - if ( theAnswer != "42" ) - wxLogError("Something is very wrong!"); - @endcode - - Note that the exact usage of this depends on whether or not wxUSE_STL is - enabled. If wxUSE_STL is enabled, wxStringBuffer creates a separate empty - character buffer, and if wxUSE_STL is disabled, it uses GetWriteBuf() from - wxString, keeping the same buffer wxString uses intact. In other words, - relying on wxStringBuffer containing the old wxString data is not a good - idea if you want to build your program both with and without wxUSE_STL. - - @library{wxbase} - @category{data} -*/ -class wxStringBuffer -{ -public: - /** - Constructs a writable string buffer object associated with the given string - and containing enough space for at least @a len characters. - Basically, this is equivalent to calling wxString::GetWriteBuf() and - saving the result. - */ - wxStringBuffer(const wxString& str, size_t len); - - /** - Restores the string passed to the constructor to the usable state by calling - wxString::UngetWriteBuf() on it. 
- */ - ~wxStringBuffer(); - - /** - Returns the writable pointer to a buffer of the size at least equal to the - length specified in the constructor. - */ - wxStringCharType* operator wxStringCharType *(); -}; - - /** @class wxString @@ -68,66 +15,29 @@ public: version wxWidgets 3.0. wxString is a class representing a Unicode character string. - wxString uses @c std::string internally to store its content - unless this is not supported by the compiler or disabled - specifically when building wxWidgets and it therefore inherits - many features from @c std::string. Most implementations of - @c std::string are thread-safe and don't use reference counting. - By default, wxString uses @c std::string internally even if - wxUSE_STL is not defined. - - wxString now internally uses UTF-16 under Windows and UTF-8 under - Unix, Linux and OS X to store its content. Note that when iterating - over a UTF-16 string under Windows, the user code has to take care - of surrogate pair handling whereas Windows itself has built-in - support pairs in UTF-16, such as for drawing strings on screen. - - Much work has been done to make existing code using ANSI string literals - work as before. If you nonetheless need to have a wxString that uses wchar_t - on Unix and Linux, too, you can specify this on the command line with the - @c configure @c --disable-utf8 switch or you can consider using wxUString - or std::wstring instead. - - Accessing a UTF-8 string by index can be very inefficient because - a single character is represented by a variable number of bytes so that - the entire string has to be parsed in order to find the character. - Since iterating over a string by index is a common programming technique and - was also possible and encouraged by wxString using the access operator[]() - wxString implements caching of the last used index so that iterating over - a string is a linear operation even in UTF-8 mode. 
- - It is nonetheless recommended to use iterators (instead of index based - access) like this: - - @code - wxString s = "hello"; - wxString::const_iterator i; - for (i = s.begin(); i != s.end(); ++i) - { - wxUniChar uni_ch = *i; - // do something with it - } - @endcode - - Please see the @ref overview_string and the @ref overview_unicode for more - information about it. - - wxString uses the current locale encoding to convert any C string - literal to Unicode. The same is done for converting to and from - @c std::string and for the return value of c_str(). - For this conversion, the @a wxConvLibc class instance is used. - See wxCSConv and wxMBConv. - - wxString implements most of the methods of the @c std::string class. - These standard functions are only listed here, but they are not - fully documented in this manual. Please see the STL documentation. + wxString uses @c std::basic_string internally (even if @c wxUSE_STL is not defined) + to store its content (unless this is not supported by the compiler or disabled + specifically when building wxWidgets) and it therefore inherits + many features from @c std::basic_string. (Note that most implementations of + @c std::basic_string are thread-safe and don't use reference counting.) + + These @c std::basic_string standard functions are only listed here, but + they are not fully documented in this manual; see the STL documentation + (http://www.cppreference.com/wiki/string/start) for more info. The behaviour of all these functions is identical to the behaviour described there. You may notice that wxString sometimes has several functions which do - the same thing like Length(), Len() and length() which - all return the string length. In all cases of such duplication the - @c std::string compatible method should be used. + the same thing like Length(), Len() and length() which all return the + string length. In all cases of such duplication the @c std::string + compatible methods should be used. 
+ + For informations about the internal encoding used by wxString and + for important warnings and advices for using it, please read + the @ref overview_string. + + In wxWidgets 3.0 wxString always stores Unicode strings, so you should + be sure to read also @ref overview_unicode. @section string_construct Constructors and assignment operators @@ -229,6 +139,7 @@ public: original string is not modified and the function returns the extracted substring. + @li at() @li substr() @li Mid() @li operator()() @@ -1344,14 +1255,6 @@ public: STL reference for their documentation. */ //@{ - size_t length() const; - size_type size() const; - size_type max_size() const; - size_type capacity() const; - void reserve(size_t sz); - - void resize(size_t nSize, wxUniChar ch = '\0'); - wxString& append(const wxString& str, size_t pos, size_t n); wxString& append(const wxString& str); wxString& append(const char *sz, size_t n); @@ -1366,8 +1269,13 @@ public: wxString& assign(size_t n, wxUniChar ch); wxString& assign(const_iterator first, const_iterator last); + wxUniChar at(size_t n) const; + wxUniCharRef at(size_t n); + void clear(); + size_type capacity() const; + int compare(const wxString& str) const; int compare(size_t nStart, size_t nLen, const wxString& str) const; int compare(size_t nStart, size_t nLen, @@ -1377,6 +1285,8 @@ public: int compare(size_t nStart, size_t nLen, const wchar_t* sz, size_t nCount = npos) const; + wxCStrData data() const; + bool empty() const; wxString& erase(size_type pos = 0, size_type n = npos); @@ -1387,6 +1297,28 @@ public: size_t find(const char* sz, size_t nStart = 0, size_t n = npos) const; size_t find(const wchar_t* sz, size_t nStart = 0, size_t n = npos) const; size_t find(wxUniChar ch, size_t nStart = 0) const; + size_t find_first_of(const char* sz, size_t nStart = 0) const; + size_t find_first_of(const wchar_t* sz, size_t nStart = 0) const; + size_t find_first_of(const char* sz, size_t nStart, size_t n) const; + size_t find_first_of(const 
wchar_t* sz, size_t nStart, size_t n) const; + size_t find_first_of(wxUniChar c, size_t nStart = 0) const + size_t find_last_of (const wxString& str, size_t nStart = npos) const + size_t find_last_of (const char* sz, size_t nStart = npos) const; + size_t find_last_of (const wchar_t* sz, size_t nStart = npos) const; + size_t find_last_of(const char* sz, size_t nStart, size_t n) const; + size_t find_last_of(const wchar_t* sz, size_t nStart, size_t n) const; + size_t find_last_of(wxUniChar c, size_t nStart = npos) const + size_t find_first_not_of(const wxString& str, size_t nStart = 0) const + size_t find_first_not_of(const char* sz, size_t nStart = 0) const; + size_t find_first_not_of(const wchar_t* sz, size_t nStart = 0) const; + size_t find_first_not_of(const char* sz, size_t nStart, size_t n) const; + size_t find_first_not_of(const wchar_t* sz, size_t nStart, size_t n) const; + size_t find_first_not_of(wxUniChar ch, size_t nStart = 0) const; + size_t find_last_not_of(const wxString& str, size_t nStart = npos) const + size_t find_last_not_of(const char* sz, size_t nStart = npos) const; + size_t find_last_not_of(const wchar_t* sz, size_t nStart = npos) const; + size_t find_last_not_of(const char* sz, size_t nStart, size_t n) const; + size_t find_last_not_of(const wchar_t* sz, size_t nStart, size_t n) const; wxString& insert(size_t nPos, const wxString& str); wxString& insert(size_t nPos, const wxString& str, size_t nStart, size_t n); @@ -1397,6 +1329,13 @@ public: void insert(iterator it, const_iterator first, const_iterator last); void insert(iterator it, size_type n, wxUniChar ch); + size_t length() const; + + size_type max_size() const; + + void reserve(size_t sz); + void resize(size_t nSize, wxUniChar ch = '\0'); + wxString& replace(size_t nStart, size_t nLen, const wxString& str); wxString& replace(size_t nStart, size_t nLen, size_t nCount, wxUniChar ch); wxString& replace(size_t nStart, size_t nLen, @@ -1423,12 +1362,10 @@ public: size_t rfind(const wchar_t* 
sz, size_t nStart = npos, size_t n = npos) const; size_t rfind(wxUniChar ch, size_t nStart = npos) const; + size_type size() const; wxString substr(size_t nStart = 0, size_t nLen = npos) const; - void swap(wxString& str); - //@} - }; /** @@ -1510,3 +1447,55 @@ public: wxChar* operator wxChar *(); }; + +/** + @class wxStringBuffer + + This tiny class allows you to conveniently access the wxString internal buffer + as a writable pointer without any risk of forgetting to restore the string + to the usable state later. + + For example, assuming you have a low-level OS function called + @c "GetMeaningOfLifeAsString(char *)" returning the value in the provided + buffer (which must be writable, of course) you might call it like this: + + @code + wxString theAnswer; + GetMeaningOfLifeAsString(wxStringBuffer(theAnswer, 1024)); + if ( theAnswer != "42" ) + wxLogError("Something is very wrong!"); + @endcode + + Note that the exact usage of this depends on whether or not @c wxUSE_STL is + enabled. If @c wxUSE_STL is enabled, wxStringBuffer creates a separate empty + character buffer, and if @c wxUSE_STL is disabled, it uses GetWriteBuf() from + wxString, keeping the same buffer wxString uses intact. In other words, + relying on wxStringBuffer containing the old wxString data is not a good + idea if you want to build your program both with and without @c wxUSE_STL. + + @library{wxbase} + @category{data} +*/ +class wxStringBuffer +{ +public: + /** + Constructs a writable string buffer object associated with the given string + and containing enough space for at least @a len characters. + Basically, this is equivalent to calling wxString::GetWriteBuf() and + saving the result. + */ + wxStringBuffer(const wxString& str, size_t len); + + /** + Restores the string passed to the constructor to the usable state by calling + wxString::UngetWriteBuf() on it. 
+ */ + ~wxStringBuffer(); + + /** + Returns the writable pointer to a buffer of the size at least equal to the + length specified in the constructor. + */ + wxStringCharType* operator wxStringCharType *(); +};
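
Note on the encoding figures quoted in the new overview_string_internal section of this commit: the code unit counts given there (one code point U+10300 taking 4 UTF-8 code units or a 2-unit UTF-16 surrogate pair; the three BMP characters taking 6 UTF-8 code units or 3 UTF-16 code units) can be checked without wxWidgets. The following is only a minimal sketch in standard C++. The names Utf8Units, Utf16Units, UnitCounts and CountUnits are hypothetical helpers invented for this illustration and are not part of wxString or of this patch, and the input is assumed to be a list of valid Unicode scalar values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a wxWidgets API): number of UTF-8 code units
// (bytes) needed to encode one code point.
// Assumes cp is a valid Unicode scalar value (not a surrogate, <= U+10FFFF).
static std::size_t Utf8Units(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80)    return 1;   // ASCII
    if (cp < 0x800)   return 2;   // e.g. U+00E0
    if (cp < 0x10000) return 3;   // rest of the BMP, e.g. U+20AC
    return 4;                     // planes 1..16, e.g. U+10300
}

// Hypothetical helper: number of UTF-16 code units for one code point.
// One unit inside the BMP, a surrogate pair (two units) beyond it.
static std::size_t Utf16Units(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    return cp < 0x10000 ? 1 : 2;
}

// Totals for a whole sequence of code points (terminator not included).
struct UnitCounts
{
    std::size_t utf8;
    std::size_t utf16;
};

static UnitCounts CountUnits(const std::vector<std::uint32_t>& codePoints)
{
    UnitCounts counts = {0, 0};
    for (std::uint32_t cp : codePoints)
    {
        counts.utf8  += Utf8Units(cp);
        counts.utf16 += Utf16Units(cp);
    }
    return counts;
}
```

Calling CountUnits on the single code point U+10300 and on the sequence U+0041, U+00E0, U+20AC gives the per-encoding code unit counts. Adding one terminating code unit and multiplying by the code unit size (1 byte for UTF-8, 2 bytes for UTF-16) gives the buffer sizes in bytes. A string class that reports its length in code units, as the overview describes for the wchar_t build on Windows, would return the utf16 figure rather than the number of code points.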