X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/2839804c3530db6756d2eb4a081a442ad8b2f773..d3fa4bc22e84e3ca4d88cc1772f2d414140a1017:/interface/wx/string.h diff --git a/interface/wx/string.h b/interface/wx/string.h index db5eb2965e..54f6357eb6 100644 --- a/interface/wx/string.h +++ b/interface/wx/string.h @@ -3,41 +3,282 @@ // Purpose: interface of wxStringBuffer, wxString // Author: wxWidgets team // RCS-ID: $Id$ -// Licence: wxWindows license +// Licence: wxWindows licence ///////////////////////////////////////////////////////////////////////////// /** @class wxString - The wxString class has been completely rewritten for wxWidgets 3.0 - and this change was actually the main reason for the calling that - version wxWidgets 3.0. + String class for passing textual data to or receiving it from wxWidgets. + + @note + While the use of wxString is unavoidable in wxWidgets program, you are + encouraged to use the standard string classes @c std::string or @c + std::wstring in your applications and convert them to and from wxString + only when interacting with wxWidgets. + + + wxString is a class representing a Unicode character string but with + methods taking or returning both @c wchar_t wide characters and @c wchar_t* + wide strings and traditional @c char characters and @c char* strings. The + dual nature of wxString API makes it simple to use in all cases and, + importantly, allows the code written for either ANSI or Unicode builds of + the previous wxWidgets versions to compile and work correctly with the + single unified Unicode build of wxWidgets 3.0. It is also mostly + transparent when using wxString with the few exceptions described below. + + + @section string_api API overview + + wxString tries to be similar to both @c std::string and @c std::wstring and + can mostly be used as either class. It provides practically all of the + methods of these classes, which behave exactly the same as in the standard + C++, and so are not documented here (please see any standard library + documentation, for example http://en.cppreference.com/w/cpp/string for more + details). + + In addition to these standard methods, wxString adds functions dealing with + the conversions between different string encodings, described below, as + well as many extra helpers such as functions for formatted output + (Printf(), Format(), ...), case conversion (MakeUpper(), Capitalize(), ...) + and various others (Trim(), StartsWith(), Matches(), ...). All of the + non-standard methods follow wxWidgets "CamelCase" naming convention and are + documented here. + + Notice that some wxString methods exist in several versions for + compatibility reasons. For example all of length(), Length() and Len() are + provided. In such cases it is recommended to use the standard string-like + method, i.e. length() in this case. + + + @section string_conv Converting to and from wxString + + wxString can be created from: + - ASCII string guaranteed to contain only 7 bit characters using + wxString::FromAscii(). + - Narrow @c char* string in the current locale encoding using implicit + wxString::wxString(const char*) constructor. + - Narrow @c char* string in UTF-8 encoding using wxString::FromUTF8(). + - Narrow @c char* string in the given encoding using + wxString::wxString(const char*, const wxMBConv&) constructor passing a + wxCSConv corresponding to the encoding as the second argument. + - Standard @c std::string using implicit wxString::wxString(const + std::string&) constructor. Notice that this constructor supposes that + the string contains data in the current locale encoding, use FromUTF8() + or the constructor taking wxMBConv if this is not the case. + - Wide @c wchar_t* string using implicit wxString::wxString(const + wchar_t*) constructor. + - Standard @c std::wstring using implicit wxString::wxString(const + std::wstring&) constructor. + + Notice that many of the constructors are implicit, meaning that you don't + even need to write them at all to pass the existing string to some + wxWidgets function taking a wxString. + + Similarly, wxString can be converted to: + - ASCII string using wxString::ToAscii(). This is a potentially + destructive operation as all non-ASCII string characters are replaced + with a placeholder character. + - String in the current locale encoding implicitly or using c_str() or + mb_str() methods. This is a potentially destructive operation as an @e + empty string is returned if the conversion fails. + - String in UTF-8 encoding using wxString::utf8_str(). + - String in any given encoding using mb_str() with the appropriate + wxMBConv object. This is also a potentially destructive operation. + - Standard @c std::string using wxString::ToStdString(). The contents + of the returned string use the current locale encoding, so this + conversion is potentially destructive as well. + - Wide C string using wxString::wc_str(). + - Standard @c std::wstring using wxString::ToStdWstring(). + + @note If you built wxWidgets with @c wxUSE_STL set to 1, the implicit + conversions to both narrow and wide C strings are disabled and replaced + with implicit conversions to @c std::string and @c std::wstring. + + Please notice that the conversions marked as "potentially destructive" + above can result in loss of data if their result is not checked, so you + need to verify that converting the contents of a non-empty Unicode string + to a non-UTF-8 multibyte encoding results in non-empty string. The simplest + and best way to ensure that the conversion never fails is to always use + UTF-8. + + + @section string_gotchas Traps for the unwary + + As mentioned above, wxString tries to be compatible with both narrow and + wide standard string classes and mostly does it transparently, but there + are some exceptions. + + @subsection string_gotchas_element String element access + + Some problems are caused by wxString::operator[]() which returns an object + of a special proxy class allowing to assign either a simple @c char or a @c + wchar_t to the given index. Because of this, the return type of this + operator is neither @c char nor @c wchar_t nor a reference to one of these + types but wxUniCharRef which is not a primitive type and hence can't be + used in the @c switch statement. So the following code does @e not compile + @code + wxString s(...); + switch ( s[n] ) { + case 'A': + ... + break; + } + @endcode + and you need to use + @code + switch ( s[n].GetValue() ) { + ... + } + @endcode + instead. Alternatively, you can use an explicit cast: + @code + switch ( static_cast(s[n]) ) { + ... + } + @endcode + but notice that this will result in an assert failure if the character at + the given position is not representable as a single @c char in the current + encoding, so you may want to cast to @c int instead if non-ASCII values can + be used. + + Another consequence of this unusual return type arises when it is used with + template deduction or C++11 @c auto keyword. Unlike with the normal + references which are deduced to be of the referenced type, the deduced type + for wxUniCharRef is wxUniCharRef itself. This results in potentially + unexpected behaviour, for example: + @code + wxString s("abc"); + auto c = s[0]; + c = 'x'; // Modifies the string! + wxASSERT( s == "xbc" ); + @endcode + Due to this, either explicitly specify the variable type: + @code + int c = s[0]; + c = 'x'; // Doesn't modify the string any more. + wxASSERT( s == "abc" ); + @endcode + or explicitly convert the return value: + @code + auto c = s[0].GetValue(); + c = 'x'; // Doesn't modify the string neither. + wxASSERT( s == "abc" ); + @endcode - wxString is a class representing a Unicode character string. - wxString uses @c std::basic_string internally (even if @c wxUSE_STL is not defined) - to store its content (unless this is not supported by the compiler or disabled - specifically when building wxWidgets) and it therefore inherits - many features from @c std::basic_string. (Note that most implementations of - @c std::basic_string are thread-safe and don't use reference counting.) - These @c std::basic_string standard functions are only listed here, but - they are not fully documented in this manual; see the STL documentation - (http://www.cppreference.com/wiki/string/start) for more info. - The behaviour of all these functions is identical to the behaviour - described there. + @subsection string_gotchas_conv Conversion to C string - You may notice that wxString sometimes has several functions which do - the same thing like Length(), Len() and length() which all return the - string length. In all cases of such duplication the @c std::string - compatible methods should be used. + A different class of problems happens due to the dual nature of the return + value of wxString::c_str() method, which is also used for implicit + conversions. The result of calls to this method is convertible to either + narrow @c char* string or wide @c wchar_t* string and so, again, has + neither the former nor the latter type. Usually, the correct type will be + chosen depending on how you use the result but sometimes the compiler can't + choose it because of an ambiguity, e.g.: + @code + // Some non-wxWidgets functions existing for both narrow and wide + // strings: + void dump_text(const char* text); // Version (1) + void dump_text(const wchar_t* text); // Version (2) + + wxString s(...); + dump_text(s); // ERROR: ambiguity. + dump_text(s.c_str()); // ERROR: still ambiguous. + @endcode + In this case you need to explicitly convert to the type that you need to + use or use a different, non-ambiguous, conversion function (which is + usually the best choice): + @code + dump_text(static_cast(s)); // OK, calls (1) + dump_text(static_cast(s.c_str())); // OK, calls (2) + dump_text(s.mb_str()); // OK, calls (1) + dump_text(s.wc_str()); // OK, calls (2) + dump_text(s.wx_str()); // OK, calls ??? + @endcode - For informations about the internal encoding used by wxString and - for important warnings and advices for using it, please read - the @ref overview_string. + @subsection string_vararg Using wxString with vararg functions - Since wxWidgets 3.0 wxString always stores Unicode strings, so you should - be sure to read also @ref overview_unicode. + A special subclass of the problems arising due to the polymorphic nature of + wxString::c_str() result type happens when using functions taking an + arbitrary number of arguments, such as the standard @c printf(). Due to the + rules of the C++ language, the types for the "variable" arguments of such + functions are not specified and hence the compiler cannot convert wxString + objects, or the objects returned by wxString::c_str(), to these unknown + types automatically. Hence neither wxString objects nor the results of most + of the conversion functions can be passed as vararg arguments: + @code + // ALL EXAMPLES HERE DO NOT WORK, DO NOT USE THEM! + printf("Don't do this: %s", s); + printf("Don't do that: %s", s.c_str()); + printf("Nor even this: %s", s.mb_str()); + wprintf("And even not always this: %s", s.wc_str()); + @endcode + Instead you need to either explicitly cast to the needed type: + @code + // These examples work but are not the best solution, see below. + printf("You can do this: %s", static_cast(s)); + printf("Or this: %s", static_cast(s.c_str())); + printf("And this: %s", static_cast(s.mb_str())); + wprintf("Or this: %s", static_cast(s.wc_str())); + @endcode + But a better solution is to use wxWidgets-provided functions, if possible, + as is the case for @c printf family of functions: + @code + // This is the recommended way. + wxPrintf("You can do just this: %s", s); + wxPrintf("And this (but it is redundant): %s", s.c_str()); + wxPrintf("And this (not using Unicode): %s", s.mb_str()); + wxPrintf("And this (always Unicode): %s", s.wc_str()); + @endcode + Notice that wxPrintf() replaces both @c printf() and @c wprintf() and + accepts wxString objects, results of c_str() calls but also @c char* and + @c wchar_t* strings directly. + + wxWidgets provides wx-prefixed equivalents to all the standard vararg + functions and a few more, notably wxString::Format(), wxLogMessage(), + wxLogError() and other log functions. But if you can't use one of those + functions and need to pass wxString objects to non-wx vararg functions, you + need to use the explicit casts as explained above. + + + @section string_performance Performance characteristics + + wxString uses @c std::basic_string internally to store its content (unless + this is not supported by the compiler or disabled specifically when + building wxWidgets) and it therefore inherits many features from @c + std::basic_string. In particular, most modern implementations of @c + std::basic_string are thread-safe and don't use reference counting (making + copying large strings potentially expensive) and so wxString has the same + characteristics. + + By default, wxString uses @c std::basic_string specialized for the + platform-dependent @c wchar_t type, meaning that it is not memory-efficient + for ASCII strings, especially under Unix platforms where every ASCII + character, normally fitting in a byte, is represented by a 4 byte @c + wchar_t. + + It is possible to build wxWidgets with @c wxUSE_UNICODE_UTF8 set to 1 in + which case an UTF-8-encoded string representation is stored in @c + std::basic_string specialized for @c char, i.e. the usual @c std::string. + In this case the memory efficiency problem mentioned above doesn't arise + but run-time performance of many wxString methods changes dramatically, in + particular accessing the N-th character of the string becomes an operation + taking O(N) time instead of O(1), i.e. constant, time by default. Thus, if + you do use this so called UTF-8 build, you should avoid using indices to + access the strings whenever possible and use the iterators instead. As an + example, traversing the string using iterators is an O(N), where N is the + string length, operation in both the normal ("wchar_t") and UTF-8 builds + but doing it using indices becomes O(N^2) in UTF-8 case meaning that simply + checking every character of a reasonably long (e.g. a couple of millions + elements) string can take an unreasonably long time. + + However, if you do use iterators, UTF-8 build can be a better choice than + the default build, especially for the memory-constrained embedded systems. + Notice also that GTK+ and DirectFB use UTF-8 internally, so using this + build not only saves memory for ASCII strings but also avoids conversions + between wxWidgets and the underlying toolkit. @section string_index Index of the member groups @@ -112,6 +353,26 @@ public: */ wxString(const wxString& stringSrc); + /** + Construct a string consisting of @a nRepeat copies of ch. + */ + wxString(wxUniChar ch, size_t nRepeat = 1); + + /** + Construct a string consisting of @a nRepeat copies of ch. + */ + wxString(wxUniCharRef ch, size_t nRepeat = 1); + + /** + Construct a string consisting of @a nRepeat copies of ch + converted to Unicode using the current locale encoding. + */ + wxString(char ch, size_t nRepeat = 1); + + /** + Construct a string consisting of @a nRepeat copies of ch. + */ + wxString(wchar_t ch, size_t nRepeat = 1); /** Constructs a string from the string literal @a psz using @@ -161,11 +422,15 @@ public: /** Constructs a string from @a str using the using the current locale encoding to convert it to Unicode (wxConvLibc). + + @see ToStdString() */ wxString(const std::string& str); /** Constructs a string from @a str. + + @see ToStdWstring() */ wxString(const std::wstring& str); @@ -300,7 +565,7 @@ public: void SetChar(size_t n, wxUniChar ch); /** - Returns a the last character. + Returns the last character. This is a wxWidgets 1.xx compatibility function; you should not use it in new code. @@ -423,7 +688,7 @@ public: const wxScopedCharBuffer utf8_str() const; /** - Converts the strings contents to the wide character represention + Converts the strings contents to the wide character representation and returns it as a temporary wxWCharBuffer object (Unix and OS X) or returns a pointer to the internal string contents in wide character mode (Windows). @@ -468,19 +733,17 @@ public: @see wxString::From8BitData() */ - const char* To8BitData() const; - - /** - @overload - */ - const wxCharBuffer To8BitData() const; + const wxScopedCharBuffer To8BitData() const; /** Converts the string to an ASCII, 7-bit string in the form of a wxCharBuffer (Unicode builds only) or a C string (ANSI builds). - Note that this conversion only works if the string contains only ASCII - characters. The @ref mb_str() "mb_str" method provides more - powerful means of converting wxString to C string. + + Note that this conversion is only lossless if the string contains only + ASCII characters as all the non-ASCII ones are replaced with the @c '_' + (underscore) character. + + Use mb_str() or utf8_str() to convert to other encodings. */ const char* ToAscii() const; @@ -489,6 +752,36 @@ public: */ const wxCharBuffer ToAscii() const; + /** + Return the string as an std::string in current locale encoding. + + Note that if the conversion of (Unicode) string contents to the current + locale fails, the return string will be empty. Be sure to check for + this to avoid silent data loss. + + Instead of using this function it's also possible to write + @code + std::string s; + wxString wxs; + ... + s = std::string(wxs); + @endcode + but using ToStdString() may make the code more clear. + + @since 2.9.1 + */ + std::string ToStdString() const; + + /** + Return the string as an std::wstring. + + Unlike ToStdString(), there is no danger of data loss when using this + function. + + @since 2.9.1 + */ + std::wstring ToStdWstring() const; + /** Same as utf8_str(). */ @@ -564,6 +857,7 @@ public: wxString& operator<<(wchar_t ch); wxString& operator<<(const wxCharBuffer& s); wxString& operator<<(const wxWCharBuffer& s); + wxString& operator<<(wxUniChar ch); wxString& operator<<(wxUniCharRef ch); wxString& operator<<(unsigned int ui); wxString& operator<<(long l); @@ -733,14 +1027,32 @@ public: /** Gets all characters before the first occurrence of @e ch. Returns the whole string if @a ch is not found. + + @param ch The character to look for. + @param rest Filled with the part of the string following the first + occurrence of @a ch or cleared if it was not found. The same string + is returned by AfterFirst() but it is more efficient to use this + output parameter if both the "before" and "after" parts are needed + than calling both functions one after the other. This parameter is + available in wxWidgets version 2.9.2 and later only. + @return Part of the string before the first occurrence of @a ch. */ - wxString BeforeFirst(wxUniChar ch) const; + wxString BeforeFirst(wxUniChar ch, wxString *rest = NULL) const; /** Gets all characters before the last occurrence of @e ch. Returns the empty string if @a ch is not found. + + @param ch The character to look for. + @param rest Filled with the part of the string following the last + occurrence of @a ch or the copy of this string if it was not found. + The same string is returned by AfterLast() but it is more efficient + to use this output parameter if both the "before" and "after" parts + are needed than calling both functions one after the other. This + parameter is available in wxWidgets version 2.9.2 and later only. + @return Part of the string before the last occurrence of @a ch. */ - wxString BeforeLast(wxUniChar ch) const; + wxString BeforeLast(wxUniChar ch, wxString *rest = NULL) const; //@} @@ -884,10 +1196,14 @@ public: @member_group_name{numconv, Conversion to numbers} The string provides functions for conversion to signed and unsigned integer and - floating point numbers. All functions take a pointer to the variable to - put the numeric value in and return @true if the @b entire string could be - converted to a number. - */ + floating point numbers. + + All functions take a pointer to the variable to put the numeric value + in and return @true if the @b entire string could be converted to a + number. Notice if there is a valid number in the beginning of the + string, it is returned in the output parameter even if the function + returns @false because there is more text following it. + */ //@{ /** @@ -895,7 +1211,7 @@ public: Returns @true on success (the number is stored in the location pointed to by @a val) or @false if the string does not represent such number (the value of - @a val is not modified in this case). + @a val may still be modified in this case). Note that unlike ToCDouble() this function uses a localized version of @c wxStrtod() and thus needs as decimal point (and thousands separator) the @@ -903,14 +1219,21 @@ public: you are sure that this string contains a floating point number formatted with the rules of the locale currently in use (see wxLocale). - Refer to the docs of the standard function @c strtod() for more details about - the supported syntax. + Also notice that even this function is locale-specific it does not + support strings with thousands separators in them, even if the current + locale uses digits grouping. You may use wxNumberFormatter::FromString() + to parse such strings. + + Please refer to the documentation of the standard function @c strtod() + for more details about the supported syntax. @see ToCDouble(), ToLong(), ToULong() */ bool ToDouble(double* val) const; /** + Variant of ToDouble() always working in "C" locale. + Works like ToDouble() but unlike it this function expects the floating point number to be formatted always with the rules dictated by the "C" locale (in particular, the decimal point must be a dot), independently from the @@ -925,8 +1248,8 @@ public: Returns @true on success in which case the number is stored in the location pointed to by @a val or @false if the string does not represent a - valid number in the given base (the value of @a val is not modified - in this case). + valid number in the given base (the value of @a val may still be + modified in this case). The value of @a base must be comprised between 2 and 36, inclusive, or be a special value 0 which means that the usual rules of @c C numbers are @@ -941,14 +1264,20 @@ public: that this string contains an integer number formatted with the rules of the locale currently in use (see wxLocale). - Refer to the docs of the standard function @c strtol() for more details about - the supported syntax. + As with ToDouble(), this function does not support strings containing + thousands separators even if the current locale uses digits grouping. + You may use wxNumberFormatter::FromString() to parse such strings. + + Please refer to the documentation of the standard function @c strtol() + for more details about the supported syntax. @see ToCDouble(), ToDouble(), ToULong() */ bool ToLong(long* val, int base = 10) const; /** + Variant of ToLong() always working in "C" locale. + Works like ToLong() but unlike it this function expects the integer number to be formatted always with the rules dictated by the "C" locale, independently from the current application-wide locale (see wxLocale). @@ -973,8 +1302,8 @@ public: Returns @true on success in which case the number is stored in the location pointed to by @a val or @false if the string does not - represent a valid number in the given base (the value of @a val is not - modified in this case). + represent a valid number in the given base (the value of @a val may + still be modified in this case). Please notice that this function behaves in the same way as the standard @c strtoul() and so it simply converts negative numbers to unsigned @@ -988,6 +1317,8 @@ public: bool ToULong(unsigned long* val, int base = 10) const; /** + Variant of ToULong() always working in "C" locale. + Works like ToULong() but unlike it this function expects the integer number to be formatted always with the rules dictated by the "C" locale, independently from the current application-wide locale (see wxLocale). @@ -997,8 +1328,9 @@ public: bool ToCULong(unsigned long* val, int base = 10) const; /** - This is exactly the same as ToULong() but works with 64 - bit integer numbers. + This is exactly the same as ToULong() but works with 64 bit integer + numbers. + Please see ToLongLong() for additional remarks. */ bool ToULongLong(wxULongLong_t* val, int base = 10) const; @@ -1022,6 +1354,16 @@ public: Note that if @c wxUSE_PRINTF_POS_PARAMS is set to 1, then this function supports Unix98-style positional parameters: + @code + wxString str; + + str.Printf(wxT("%d %d %d"), 1, 2, 3); + // str now contains "1 2 3" + + str.Printf(wxT("%2$d %3$d %1$d"), 1, 2, 3); + // str now contains "2 3 1" + @endcode + @note This function will use a safe version of @e vsprintf() (usually called @e vsnprintf()) whenever available to always allocate the buffer of correct size. Unfortunately, this function is not available on all platforms and the @@ -1228,7 +1570,7 @@ public: /** @member_group_name{iter, Iterator interface} - These methods return iterators to the beginnnig or end of the string. + These methods return iterators to the beginning or end of the string. Please see any STL reference (e.g. http://www.cppreference.com/wiki/string/start) for their documentation. @@ -1375,7 +1717,7 @@ public: // STATIC FUNCTIONS - // Keep these functions separed from the other groups or Doxygen gets confused + // Keep these functions separated from the other groups or Doxygen gets confused // ----------------------------------------------------------------------------- /** @@ -1431,6 +1773,44 @@ public: static wxString FromAscii(char c); //@} + /** + Returns a string with the textual representation of the number in C + locale. + + Unlike FromDouble() the string returned by this function always uses + the period character as decimal separator, independently of the current + locale. Otherwise its behaviour is identical to the other function. + + @since 2.9.1 + + @see ToCDouble() + */ + static wxString FromCDouble(double val, int precision = -1); + + /** + Returns a string with the textual representation of the number. + + For the default value of @a precision, this function behaves as a + simple wrapper for @code wxString::Format("%g", val) @endcode. If @a + precision is positive (or zero), the @c %.Nf format is used with the + given precision value. + + Notice that the string returned by this function uses the decimal + separator appropriate for the current locale, e.g. @c "," and not a + period in French locale. Use FromCDouble() if this is unwanted. + + @param val + The value to format. + @param precision + The number of fractional digits to use in or -1 to use the most + appropriate format. This parameter is new in wxWidgets 2.9.2. + + @since 2.9.1 + + @see ToDouble() + */ + static wxString FromDouble(double val, int precision = -1); + //@{ /** Converts C string encoded in UTF-8 to wxString. @@ -1659,7 +2039,7 @@ public: you can do: @code if (wxStringCheck(myString)) - ... // the entire string contains oly digits! + ... // the entire string contains only digits! else ... // at least one character of myString is not a digit @endcode