X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/57ab6f2314860f6efd2d1339913c91a302020a8e..9d94efd8b0385f1afd70f71c8242a28d61527a68:/interface/wx/string.h diff --git a/interface/wx/string.h b/interface/wx/string.h index 93ff1cbee8..951719f524 100644 --- a/interface/wx/string.h +++ b/interface/wx/string.h @@ -2,7 +2,6 @@ // Name: string.h // Purpose: interface of wxStringBuffer, wxString // Author: wxWidgets team -// RCS-ID: $Id$ // Licence: wxWindows licence ///////////////////////////////////////////////////////////////////////////// @@ -10,34 +9,275 @@ /** @class wxString - The wxString class has been completely rewritten for wxWidgets 3.0 - and this change was actually the main reason for the calling that - version wxWidgets 3.0. + String class for passing textual data to or receiving it from wxWidgets. + + @note + While the use of wxString is unavoidable in wxWidgets program, you are + encouraged to use the standard string classes @c std::string or @c + std::wstring in your applications and convert them to and from wxString + only when interacting with wxWidgets. + + + wxString is a class representing a Unicode character string but with + methods taking or returning both @c wchar_t wide characters and @c wchar_t* + wide strings and traditional @c char characters and @c char* strings. The + dual nature of wxString API makes it simple to use in all cases and, + importantly, allows the code written for either ANSI or Unicode builds of + the previous wxWidgets versions to compile and work correctly with the + single unified Unicode build of wxWidgets 3.0. It is also mostly + transparent when using wxString with the few exceptions described below. + + + @section string_api API overview + + wxString tries to be similar to both @c std::string and @c std::wstring and + can mostly be used as either class. It provides practically all of the + methods of these classes, which behave exactly the same as in the standard + C++, and so are not documented here (please see any standard library + documentation, for example http://en.cppreference.com/w/cpp/string for more + details). + + In addition to these standard methods, wxString adds functions dealing with + the conversions between different string encodings, described below, as + well as many extra helpers such as functions for formatted output + (Printf(), Format(), ...), case conversion (MakeUpper(), Capitalize(), ...) + and various others (Trim(), StartsWith(), Matches(), ...). All of the + non-standard methods follow wxWidgets "CamelCase" naming convention and are + documented here. + + Notice that some wxString methods exist in several versions for + compatibility reasons. For example all of length(), Length() and Len() are + provided. In such cases it is recommended to use the standard string-like + method, i.e. length() in this case. + + + @section string_conv Converting to and from wxString + + wxString can be created from: + - ASCII string guaranteed to contain only 7 bit characters using + wxString::FromAscii(). + - Narrow @c char* string in the current locale encoding using implicit + wxString::wxString(const char*) constructor. + - Narrow @c char* string in UTF-8 encoding using wxString::FromUTF8(). + - Narrow @c char* string in the given encoding using + wxString::wxString(const char*, const wxMBConv&) constructor passing a + wxCSConv corresponding to the encoding as the second argument. + - Standard @c std::string using implicit wxString::wxString(const + std::string&) constructor. Notice that this constructor supposes that + the string contains data in the current locale encoding, use FromUTF8() + or the constructor taking wxMBConv if this is not the case. + - Wide @c wchar_t* string using implicit wxString::wxString(const + wchar_t*) constructor. + - Standard @c std::wstring using implicit wxString::wxString(const + std::wstring&) constructor. + + Notice that many of the constructors are implicit, meaning that you don't + even need to write them at all to pass the existing string to some + wxWidgets function taking a wxString. + + Similarly, wxString can be converted to: + - ASCII string using wxString::ToAscii(). This is a potentially + destructive operation as all non-ASCII string characters are replaced + with a placeholder character. + - String in the current locale encoding implicitly or using c_str() or + mb_str() methods. This is a potentially destructive operation as an @e + empty string is returned if the conversion fails. + - String in UTF-8 encoding using wxString::utf8_str(). + - String in any given encoding using mb_str() with the appropriate + wxMBConv object. This is also a potentially destructive operation. + - Standard @c std::string using wxString::ToStdString(). The contents + of the returned string use the current locale encoding, so this + conversion is potentially destructive as well. + - Wide C string using wxString::wc_str(). + - Standard @c std::wstring using wxString::ToStdWstring(). + + @note If you built wxWidgets with @c wxUSE_STL set to 1, the implicit + conversions to both narrow and wide C strings are disabled and replaced + with implicit conversions to @c std::string and @c std::wstring. + + Please notice that the conversions marked as "potentially destructive" + above can result in loss of data if their result is not checked, so you + need to verify that converting the contents of a non-empty Unicode string + to a non-UTF-8 multibyte encoding results in non-empty string. The simplest + and best way to ensure that the conversion never fails is to always use + UTF-8. + + + @section string_gotchas Traps for the unwary + + As mentioned above, wxString tries to be compatible with both narrow and + wide standard string classes and mostly does it transparently, but there + are some exceptions. + + @subsection string_gotchas_element String element access + + Some problems are caused by wxString::operator[]() which returns an object + of a special proxy class allowing to assign either a simple @c char or a @c + wchar_t to the given index. Because of this, the return type of this + operator is neither @c char nor @c wchar_t nor a reference to one of these + types but wxUniCharRef which is not a primitive type and hence can't be + used in the @c switch statement. So the following code does @e not compile + @code + wxString s(...); + switch ( s[n] ) { + case 'A': + ... + break; + } + @endcode + and you need to use + @code + switch ( s[n].GetValue() ) { + ... + } + @endcode + instead. Alternatively, you can use an explicit cast: + @code + switch ( static_cast(s[n]) ) { + ... + } + @endcode + but notice that this will result in an assert failure if the character at + the given position is not representable as a single @c char in the current + encoding, so you may want to cast to @c int instead if non-ASCII values can + be used. + + Another consequence of this unusual return type arises when it is used with + template deduction or C++11 @c auto keyword. Unlike with the normal + references which are deduced to be of the referenced type, the deduced type + for wxUniCharRef is wxUniCharRef itself. This results in potentially + unexpected behaviour, for example: + @code + wxString s("abc"); + auto c = s[0]; + c = 'x'; // Modifies the string! + wxASSERT( s == "xbc" ); + @endcode + Due to this, either explicitly specify the variable type: + @code + int c = s[0]; + c = 'x'; // Doesn't modify the string any more. + wxASSERT( s == "abc" ); + @endcode + or explicitly convert the return value: + @code + auto c = s[0].GetValue(); + c = 'x'; // Doesn't modify the string neither. + wxASSERT( s == "abc" ); + @endcode - wxString is a class representing a Unicode character string. - wxString uses @c std::basic_string internally (even if @c wxUSE_STL is not defined) - to store its content (unless this is not supported by the compiler or disabled - specifically when building wxWidgets) and it therefore inherits - many features from @c std::basic_string. (Note that most implementations of - @c std::basic_string are thread-safe and don't use reference counting.) - These @c std::basic_string standard functions are only listed here, but - they are not fully documented in this manual; see the STL documentation - (http://www.cppreference.com/wiki/string/start) for more info. - The behaviour of all these functions is identical to the behaviour - described there. + @subsection string_gotchas_conv Conversion to C string - You may notice that wxString sometimes has several functions which do - the same thing like Length(), Len() and length() which all return the - string length. In all cases of such duplication the @c std::string - compatible methods should be used. + A different class of problems happens due to the dual nature of the return + value of wxString::c_str() method, which is also used for implicit + conversions. The result of calls to this method is convertible to either + narrow @c char* string or wide @c wchar_t* string and so, again, has + neither the former nor the latter type. Usually, the correct type will be + chosen depending on how you use the result but sometimes the compiler can't + choose it because of an ambiguity, e.g.: + @code + // Some non-wxWidgets functions existing for both narrow and wide + // strings: + void dump_text(const char* text); // Version (1) + void dump_text(const wchar_t* text); // Version (2) + + wxString s(...); + dump_text(s); // ERROR: ambiguity. + dump_text(s.c_str()); // ERROR: still ambiguous. + @endcode + In this case you need to explicitly convert to the type that you need to + use or use a different, non-ambiguous, conversion function (which is + usually the best choice): + @code + dump_text(static_cast(s)); // OK, calls (1) + dump_text(static_cast(s.c_str())); // OK, calls (2) + dump_text(s.mb_str()); // OK, calls (1) + dump_text(s.wc_str()); // OK, calls (2) + dump_text(s.wx_str()); // OK, calls ??? + @endcode - For informations about the internal encoding used by wxString and - for important warnings and advices for using it, please read - the @ref overview_string. + @subsection string_vararg Using wxString with vararg functions - Since wxWidgets 3.0 wxString always stores Unicode strings, so you should - be sure to read also @ref overview_unicode. + A special subclass of the problems arising due to the polymorphic nature of + wxString::c_str() result type happens when using functions taking an + arbitrary number of arguments, such as the standard @c printf(). Due to the + rules of the C++ language, the types for the "variable" arguments of such + functions are not specified and hence the compiler cannot convert wxString + objects, or the objects returned by wxString::c_str(), to these unknown + types automatically. Hence neither wxString objects nor the results of most + of the conversion functions can be passed as vararg arguments: + @code + // ALL EXAMPLES HERE DO NOT WORK, DO NOT USE THEM! + printf("Don't do this: %s", s); + printf("Don't do that: %s", s.c_str()); + printf("Nor even this: %s", s.mb_str()); + wprintf("And even not always this: %s", s.wc_str()); + @endcode + Instead you need to either explicitly cast to the needed type: + @code + // These examples work but are not the best solution, see below. + printf("You can do this: %s", static_cast(s)); + printf("Or this: %s", static_cast(s.c_str())); + printf("And this: %s", static_cast(s.mb_str())); + wprintf("Or this: %s", static_cast(s.wc_str())); + @endcode + But a better solution is to use wxWidgets-provided functions, if possible, + as is the case for @c printf family of functions: + @code + // This is the recommended way. + wxPrintf("You can do just this: %s", s); + wxPrintf("And this (but it is redundant): %s", s.c_str()); + wxPrintf("And this (not using Unicode): %s", s.mb_str()); + wxPrintf("And this (always Unicode): %s", s.wc_str()); + @endcode + Notice that wxPrintf() replaces both @c printf() and @c wprintf() and + accepts wxString objects, results of c_str() calls but also @c char* and + @c wchar_t* strings directly. + + wxWidgets provides wx-prefixed equivalents to all the standard vararg + functions and a few more, notably wxString::Format(), wxLogMessage(), + wxLogError() and other log functions. But if you can't use one of those + functions and need to pass wxString objects to non-wx vararg functions, you + need to use the explicit casts as explained above. + + + @section string_performance Performance characteristics + + wxString uses @c std::basic_string internally to store its content (unless + this is not supported by the compiler or disabled specifically when + building wxWidgets) and it therefore inherits many features from @c + std::basic_string. In particular, most modern implementations of @c + std::basic_string are thread-safe and don't use reference counting (making + copying large strings potentially expensive) and so wxString has the same + characteristics. + + By default, wxString uses @c std::basic_string specialized for the + platform-dependent @c wchar_t type, meaning that it is not memory-efficient + for ASCII strings, especially under Unix platforms where every ASCII + character, normally fitting in a byte, is represented by a 4 byte @c + wchar_t. + + It is possible to build wxWidgets with @c wxUSE_UNICODE_UTF8 set to 1 in + which case an UTF-8-encoded string representation is stored in @c + std::basic_string specialized for @c char, i.e. the usual @c std::string. + In this case the memory efficiency problem mentioned above doesn't arise + but run-time performance of many wxString methods changes dramatically, in + particular accessing the N-th character of the string becomes an operation + taking O(N) time instead of O(1), i.e. constant, time by default. Thus, if + you do use this so called UTF-8 build, you should avoid using indices to + access the strings whenever possible and use the iterators instead. As an + example, traversing the string using iterators is an O(N), where N is the + string length, operation in both the normal ("wchar_t") and UTF-8 builds + but doing it using indices becomes O(N^2) in UTF-8 case meaning that simply + checking every character of a reasonably long (e.g. a couple of millions + elements) string can take an unreasonably long time. + + However, if you do use iterators, UTF-8 build can be a better choice than + the default build, especially for the memory-constrained embedded systems. + Notice also that GTK+ and DirectFB use UTF-8 internally, so using this + build not only saves memory for ASCII strings but also avoids conversions + between wxWidgets and the underlying toolkit. @section string_index Index of the member groups @@ -497,9 +737,12 @@ public: /** Converts the string to an ASCII, 7-bit string in the form of a wxCharBuffer (Unicode builds only) or a C string (ANSI builds). - Note that this conversion only works if the string contains only ASCII - characters. The @ref mb_str() "mb_str" method provides more - powerful means of converting wxString to C string. + + Note that this conversion is only lossless if the string contains only + ASCII characters as all the non-ASCII ones are replaced with the @c '_' + (underscore) character. + + Use mb_str() or utf8_str() to convert to other encodings. */ const char* ToAscii() const; @@ -1795,7 +2038,7 @@ public: you can do: @code if (wxStringCheck(myString)) - ... // the entire string contains oly digits! + ... // the entire string contains only digits! else ... // at least one character of myString is not a digit @endcode