| 1 | \section{wxString overview}\label{wxstringoverview} |
| 2 | |
| 3 | Classes: \helpref{wxString}{wxstring}, \helpref{wxArrayString}{wxarraystring}, \helpref{wxStringTokenizer}{wxstringtokenizer} |
| 4 | |
| 5 | \subsection{Introduction}\label{introductiontowxstring} |
| 6 | |
| 7 | wxString is a class which represents a character string of arbitrary length (limited by |
| 8 | {\it INT\_MAX} which is usually 2147483647 on 32-bit machines) and containing |
| 9 | arbitrary characters. The ASCII NUL character is allowed, but be aware that |
| 10 | in the current string implementation some methods might not work correctly |
| 11 | in this case. |
| 12 | |
| 13 | wxString works with both ASCII (traditional 7- or 8-bit characters) and |
| 14 | Unicode (wide characters) strings. |
| 15 | |
| 16 | This class has all the standard operations you can expect to find in a string class: |
| 17 | dynamic memory management (string extends to accommodate new characters), |
| 18 | construction from other strings, C strings and characters, assignment operators, |
| 19 | access to individual characters, string concatenation and comparison, substring |
| 20 | extraction, case conversion, trimming and padding (with spaces), searching and |
| 21 | replacing and both C-like \helpref{Printf()}{wxstringprintf} and stream-like |
| 22 | insertion functions as well as much more - see \helpref{wxString}{wxstring} |
| 23 | for a list of all functions. |
| 24 | |
| 25 | \subsection{Comparison of wxString to other string classes}\label{otherstringclasses} |
| 26 | |
| 27 | The advantages of using a special string class instead of working directly with |
| 28 | C strings are so obvious that there is a huge number of such classes available. |
| 29 | The most important advantage is that you no longer need to |
| 30 | remember to allocate and free memory for C strings; working with fixed size buffers almost |
| 31 | inevitably leads to buffer overflows. At last, C++ has a standard string class |
| 32 | (std::string). So why the need for wxString? |
| 33 | |
| 34 | There are several advantages: |
| 35 | |
| 36 | \begin{enumerate}\itemsep=0pt |
| 37 | \item {\bf Efficiency} This class was made to be as efficient as possible: both |
| 38 | in terms of size (each wxString object takes exactly the same space as a {\it |
| 39 | char *} pointer, using \helpref{reference counting}{wxstringrefcount}) and speed. |
| 40 | It also provides performance \helpref{statistics gathering code}{wxstringtuning} |
| 41 | which may be enabled to fine tune the memory allocation strategy for your |
| 42 | particular application - and the gain might be quite big. |
| 43 | \item {\bf Compatibility} This class tries to combine almost full compatibility |
| 44 | with the old wxWidgets 1.xx wxString class, some resemblance to the MFC CString |
| 45 | class and 90\% of the functionality of the std::string class. |
| 46 | \item {\bf Rich set of functions} Some of the functions present in wxString are |
| 47 | very useful but don't exist in most other string classes: for example, |
| 48 | \helpref{AfterFirst}{wxstringafterfirst}, |
| 49 | \helpref{BeforeLast}{wxstringbeforelast}, \helpref{operator<<}{wxstringoperatorout} |
| 50 | or \helpref{Printf}{wxstringprintf}. Of course, all the standard string |
| 51 | operations are supported as well. |
| 52 | \item {\bf Unicode} wxString is Unicode friendly: it lets you easily convert |
| 53 | to and from ANSI and Unicode strings in any build mode (see the |
| 54 | \helpref{Unicode overview}{unicode} for more details) and maps to either |
| 55 | {\tt string} or {\tt wstring} transparently depending on the current mode. |
| 56 | \item {\bf Used by wxWidgets} And, of course, this class is used everywhere |
| 57 | inside wxWidgets so there is no performance loss which would result from |
| 58 | conversions of objects of any other string class (including std::string) to |
| 59 | wxString internally by wxWidgets. |
| 60 | \end{enumerate} |
| 61 | |
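To make the third item more concrete, here is a minimal sketch of what AfterFirst() and BeforeLast() do, written with std::string so that it compiles without wxWidgets. The two free functions are hypothetical stand-ins for illustration, not the wxWidgets implementation; they assume that an empty string is returned when the character is not found.

```cpp
#include <string>

// Hypothetical std::string analogues of wxString::AfterFirst() and
// wxString::BeforeLast(); this is not the wxWidgets implementation.

// Returns everything after the first occurrence of ch, or an empty
// string if ch does not occur in s.
std::string AfterFirst(const std::string& s, char ch)
{
    const std::string::size_type pos = s.find(ch);
    return pos == std::string::npos ? std::string() : s.substr(pos + 1);
}

// Returns everything before the last occurrence of ch, or an empty
// string if ch does not occur in s.
std::string BeforeLast(const std::string& s, char ch)
{
    const std::string::size_type pos = s.rfind(ch);
    return pos == std::string::npos ? std::string() : s.substr(0, pos);
}
```

For example, applying the first function to a {\tt key=value} pair with the equals sign as separator yields the value part, and applying the second one to a file name with a dot as separator strips the last extension.
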
| 62 | However, there are several problems as well. The most important one is probably |
| 63 | that there are often several functions to do exactly the same thing: for |
| 64 | example, to get the length of the string either one of |
| 65 | length(), \helpref{Len()}{wxstringlen} or |
| 66 | \helpref{Length()}{wxstringlength} may be used. The first function, like almost |
| 67 | all the other functions in lowercase, is std::string compatible. The second one |
| 68 | is the "native" wxString version and the last one is the wxWidgets 1.xx way. So the |
| 69 | question is: which one is better to use? And the answer is: |
| 70 | |
| 71 | {\bf The usage of std::string compatible functions is strongly advised!} It will |
| 72 | make your code more familiar to other C++ programmers (who are supposed to |
| 73 | have knowledge of std::string but not of wxString), let you reuse the same code |
| 74 | in both wxWidgets and other programs (by just typedefing wxString as std::string |
| 75 | when used outside wxWidgets) and keep it compatible with future versions of |
| 76 | wxWidgets which will probably start using std::string sooner or later too. |
| 77 | |
| 78 | In the situations where there is no corresponding std::string function, please |
| 79 | try to use the new wxString methods and not the old wxWidgets 1.xx variants |
| 80 | which are deprecated and may disappear in future versions. |
| 81 | |
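The reuse argument can be sketched as follows. The alias {\tt AppString} and the function {\tt CountChar} are hypothetical names introduced only for this illustration; the assumption is that the alias names std::string outside wxWidgets and could name wxString inside it.

```cpp
#include <cstddef>
#include <string>

// Assumption: a project-wide alias chosen at build time. Outside wxWidgets
// it is std::string; inside wxWidgets it could name wxString instead.
typedef std::string AppString;

// Counts occurrences of ch using only the std::string-style interface
// (length() and operator[]), so the function body does not depend on
// which class AppString actually names.
std::size_t CountChar(const AppString& s, char ch)
{
    std::size_t count = 0;
    for ( std::size_t n = 0; n < s.length(); n++ )
    {
        if ( s[n] == ch )
            count++;
    }
    return count;
}
```

Because only lowercase, std::string compatible members are used, changing the typedef should be the only edit needed to move such code between the two environments.
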
| 82 | \subsection{Some advice about using wxString}\label{wxstringadvices} |
| 83 | |
| 84 | Probably the main trap with using this class is the implicit conversion operator to |
| 85 | {\it const char *}. It is advised that you use \helpref{c\_str()}{wxstringcstr} |
| 86 | instead to clearly indicate when the conversion is done. Specifically, the |
| 87 | danger of this implicit conversion may be seen in the following code fragment: |
| 88 | |
| 89 | \begin{verbatim} |
| 90 | // this function converts the input string to uppercase, outputs it to the screen |
| 91 | // and returns the result |
| 92 | const char *SayHELLO(const wxString& input) |
| 93 | { |
| 94 |     wxString output = input.Upper(); |
| 95 | |
| 96 |     printf("Hello, %s!\n", output); |
| 97 | |
| 98 |     return output; |
| 99 | } |
| 100 | \end{verbatim} |
| 101 | |
| 102 | There are two nasty bugs in these three lines. The first of them is in the call to the |
| 103 | {\it printf()} function. Although the implicit conversion to C strings is applied |
| 104 | automatically by the compiler in the case of |
| 105 | |
| 106 | \begin{verbatim} |
| 107 | puts(output); |
| 108 | \end{verbatim} |
| 109 | |
| 110 | because the argument of {\it puts()} is known to be of the type {\it const char *}, |
| 111 | this is {\bf not} done for {\it printf()} which is a function with variable |
| 112 | number of arguments (and whose arguments are of unknown types). So this call may |
| 113 | do anything at all (including displaying the correct string on screen), although |
| 114 | the most likely result is a program crash. The solution is to use |
| 115 | \helpref{c\_str()}{wxstringcstr}: just replace this line with |
| 116 | |
| 117 | \begin{verbatim} |
| 118 | printf("Hello, %s!\n", output.c_str()); |
| 119 | \end{verbatim} |
| 120 | |
| 121 | The second bug is that returning {\it output} doesn't work. The implicit cast is |
| 122 | used again, so the code compiles, but as it returns a pointer to a buffer |
| 123 | belonging to a local variable which is deleted as soon as the function exits, |
| 124 | its contents are totally arbitrary. The solution to this problem is also easy: |
| 125 | just make the function return wxString instead of a C string. |
| 126 | |
| 127 | This leads us to the following general advice: all functions taking string |
| 128 | arguments should take {\it const wxString\&} (this makes assignment to the |
| 129 | strings inside the function faster because of |
| 130 | \helpref{reference counting}{wxstringrefcount}) and all functions returning |
| 131 | strings should return {\it wxString} - this makes it safe to return local |
| 132 | variables. |
| 133 | |
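Both fixes can be seen together in the following sketch. It uses std::string in place of wxString so that it can be compiled without wxWidgets; the name {\tt SayHello} and the manual uppercase loop are choices made for this illustration only.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Sketch of the advice above with std::string standing in for wxString:
// take the argument by const reference and return the result by value.
std::string SayHello(const std::string& input)
{
    std::string output = input;
    for ( std::string::size_type n = 0; n < output.length(); n++ )
    {
        const unsigned char ch = static_cast<unsigned char>(output[n]);
        output[n] = static_cast<char>(std::toupper(ch));
    }

    // explicit conversion: printf() cannot convert its variadic arguments
    std::printf("Hello, %s!\n", output.c_str());

    // returning by value keeps the data alive after the function exits
    return output;
}
```

The caller receives its own string object, so nothing refers to the local variable once the function has returned.
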
| 134 | \subsection{Other string related functions and classes}\label{relatedtostring} |
| 135 | |
| 136 | As most programs use character strings, the standard C library provides quite |
| 137 | a few functions to work with them. Unfortunately, some of them have rather |
| 138 | counter-intuitive behaviour (like strncpy() which doesn't always terminate the |
| 139 | resulting string with a NUL character) and are in general not very safe (passing NULL |
| 140 | to them will probably lead to a program crash). Moreover, some very useful |
| 141 | functions are not standard at all. This is why in addition to all wxString |
| 142 | functions, there are also a few global string functions which try to correct |
| 143 | these problems: \helpref{wxIsEmpty()}{wxisempty} verifies whether the string |
| 144 | is empty (returning {\tt true} for {\tt NULL} pointers), |
| 145 | \helpref{wxStrlen()}{wxstrlen} also handles NULLs correctly and returns 0 for |
| 146 | them and \helpref{wxStricmp()}{wxstricmp} is just a platform-independent |
| 147 | version of case-insensitive string comparison function known either as |
| 148 | stricmp() or strcasecmp() on different platforms. |
| 149 | |
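The behaviour described above can be sketched with a few small helpers. The names {\tt IsEmptySafe}, {\tt StrlenSafe} and {\tt CopyTerminated} are hypothetical and the code is only an illustration of the idea, not the wxWidgets implementation of the functions mentioned in the text.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helpers illustrating the behaviour described above;
// they are not the wxWidgets implementations.

// true for a NULL pointer as well as for an empty string
bool IsEmptySafe(const char *s)
{
    return s == NULL || *s == '\0';
}

// length of the string, 0 for a NULL pointer
std::size_t StrlenSafe(const char *s)
{
    return s == NULL ? 0 : std::strlen(s);
}

// copies at most size - 1 characters and always terminates dst, unlike
// a bare strncpy() which leaves dst unterminated when src is too long
void CopyTerminated(char *dst, const char *src, std::size_t size)
{
    if ( size == 0 )
        return;

    std::strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```
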
| 150 | The {\tt <wx/string.h>} header also defines \helpref{wxSnprintf}{wxsnprintf} |
| 151 | and \helpref{wxVsnprintf}{wxvsnprintf} functions which should be used instead |
| 152 | of the inherently dangerous standard {\tt sprintf()}: they use {\tt |
| 153 | snprintf()}, which checks the buffer size, whenever possible. Of |
| 154 | course, you may also use \helpref{wxString::Printf}{wxstringprintf} which is |
| 155 | also safe. |
| 156 | |
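As an illustration of why the size-checked variant is safer, the following sketch uses the standard {\tt snprintf()} directly. The function name {\tt FormatPair} and its parameters are hypothetical; the wx wrappers named above are not used here so that the example stays independent of wxWidgets.

```cpp
#include <cstddef>
#include <cstdio>

// Formats "name=value" into buf without ever writing more than size bytes.
// Returns true if the whole text fitted, false if it was truncated
// (or if formatting failed).
bool FormatPair(char *buf, std::size_t size, const char *name, int value)
{
    const int needed = std::snprintf(buf, size, "%s=%d", name, value);
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

With {\tt sprintf()} a too small buffer would be silently overrun; here the output is truncated, the buffer stays NUL-terminated and the caller can detect the problem from the return value.
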
| 157 | There is another class which might be useful when working with wxString: |
| 158 | \helpref{wxStringTokenizer}{wxstringtokenizer}. It is helpful when a string must |
| 159 | be broken into tokens and replaces the standard C library {\it |
| 160 | strtok()} function. |
| 161 | |
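The kind of work such a tokenizer does can be sketched with std::string as follows. The function {\tt Tokenize} is a hypothetical illustration that skips empty tokens in the same way repeated {\it strtok()} calls do; it does not reproduce the interface or the modes of wxStringTokenizer.

```cpp
#include <string>
#include <vector>

// Splits s at any of the characters in delims and skips empty tokens,
// much like repeated strtok() calls, but without modifying s and
// without keeping hidden static state between calls.
std::vector<std::string> Tokenize(const std::string& s, const std::string& delims)
{
    std::vector<std::string> tokens;

    std::string::size_type start = s.find_first_not_of(delims);
    while ( start != std::string::npos )
    {
        const std::string::size_type end = s.find_first_of(delims, start);
        tokens.push_back(s.substr(start, end - start));
        start = s.find_first_not_of(delims, end);
    }

    return tokens;
}
```
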
| 162 | And the very last string-related class is \helpref{wxArrayString}{wxarraystring}: it |
| 163 | is just a version of the "template" dynamic array class which is specialized to work |
| 164 | with strings. Please note that this class is specially optimized (using its |
| 165 | knowledge of the internal structure of wxString) for storing strings and so it is |
| 166 | vastly better from a performance point of view than a wxObjectArray of wxStrings. |
| 167 | |
| 168 | \subsection{Reference counting and why you shouldn't care about it}\label{wxstringrefcount} |
| 169 | |
| 170 | wxString objects use a technique known as {\it copy on write} (COW). This means |
| 171 | that when a string is assigned to another, no copying really takes place: only |
| 172 | the reference count on the shared string data is incremented and both strings |
| 173 | share the same data. |
| 174 | |
| 175 | But as soon as one of the two (or more) strings is modified, the data has to be |
| 176 | copied because the changes to one of the strings shouldn't be seen in the |
| 177 | others. As data copying only happens when the string is written to, this is |
| 178 | known as COW. |
| 179 | |
| 180 | What is important to understand is that all this happens absolutely |
| 181 | transparently to the class users and that whether a string is shared or not is |
| 182 | not seen from the outside of the class - in any case, the result of any |
| 183 | operation on it is the same. |
| 184 | |
| 185 | Probably the only case when you might want to think about reference |
| 186 | counting is when a string character is taken from a string which is not a |
| 187 | constant (or a constant reference). In this case, due to C++ rules, the |
| 188 | "read-only" {\it operator[]} (which is the same as |
| 189 | \helpref{GetChar()}{wxstringgetchar}) cannot be chosen and the "read/write" |
| 190 | {\it operator[]} (the same as |
| 191 | \helpref{GetWritableChar()}{wxstringgetwritablechar}) is used instead. As the |
| 192 | call to this operator may modify the string, its data is unshared (COW is done) |
| 193 | and so if the string was really shared there is some performance loss (both in |
| 194 | terms of speed and memory consumption). In the rare cases when this may be |
| 195 | important, you might prefer using \helpref{GetChar()}{wxstringgetchar} instead |
| 196 | of the array subscript operator for this reason. Please note that the |
| 197 | \helpref{at()}{wxstringat} method has the same problem as the subscript operator in |
| 198 | this situation and so using it is not really better. Also note that if all |
| 199 | string arguments to your functions are passed as {\it const wxString\&} (see the |
| 200 | section \helpref{Some advice}{wxstringadvices}) this situation will almost |
| 201 | never arise because for constant references the correct operator is called automatically. |
| 202 | |
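The mechanism described in this section can be sketched in a few lines. The class {\tt CowString} below is a hypothetical, deliberately minimal illustration built on {\tt std::shared\_ptr}; the real wxString implementation is different, but the difference between the two subscript operators is the same.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Minimal copy-on-write sketch; wxString's real implementation differs.
class CowString
{
public:
    explicit CowString(const std::string& s)
        : m_data(std::make_shared<std::string>(s)) { }

    // read-only access never unshares the data
    char GetChar(std::size_t n) const { return (*m_data)[n]; }
    char operator[](std::size_t n) const { return (*m_data)[n]; }

    // read/write access must unshare first, even if the caller only reads
    char& operator[](std::size_t n)
    {
        Unshare();
        return (*m_data)[n];
    }

    // true if another CowString currently uses the same buffer
    bool IsShared() const { return m_data.use_count() > 1; }

private:
    // give this object a private copy of the data if it is shared
    void Unshare()
    {
        if ( IsShared() )
            m_data = std::make_shared<std::string>(*m_data);
    }

    std::shared_ptr<std::string> m_data;
};
```

Copying a {\tt CowString} only copies the pointer. Reading a character through {\tt GetChar()} leaves the data shared, while reading it through the subscript operator of a non-constant object triggers the copy, which is exactly the performance loss discussed above.
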
| 203 | \subsection{Tuning wxString for your application}\label{wxstringtuning} |
| 204 | |
| 205 | \normalbox{{\bf Note:} this section is strictly about performance issues and |
| 206 | you do not need to read it in order to use the wxString class. Please skip it unless |
| 207 | you feel familiar with profilers and related tools. If you do read it, please |
| 208 | also read the preceding section about |
| 209 | \helpref{reference counting}{wxstringrefcount}.} |
| 210 | |
| 211 | For performance reasons wxString doesn't allocate exactly the amount of |
| 212 | memory needed for each string. Instead, it adds a small amount of space to each |
| 213 | allocated block which allows it to not reallocate memory (a relatively |
| 214 | expensive operation) too often when, for example, a string is constructed by |
| 215 | successively adding one character at a time to it, as in: |
| 216 | |
| 217 | \begin{verbatim} |
| 218 | // delete all vowels from the string |
| 219 | wxString DeleteAllVowels(const wxString& original) |
| 220 | { |
| 221 |     wxString result; |
| 222 | |
| 223 |     size_t len = original.length(); |
| 224 |     for ( size_t n = 0; n < len; n++ ) |
| 225 |     { |
| 226 |         if ( strchr("aeuio", tolower(original[n])) == NULL ) |
| 227 |             result += original[n]; |
| 228 |     } |
| 229 | |
| 230 |     return result; |
| 231 | } |
| 232 | \end{verbatim} |
| 233 | |
| 234 | This is quite a common situation and not allocating extra memory at all would |
| 235 | lead to very bad performance in this case because there would be as many memory |
| 236 | (re)allocations as there are consonants in the original string. Allocating too |
| 237 | much extra memory would help to improve the speed in this situation, but due to |
| 238 | the great number of wxString objects typically used in a program would also |
| 239 | increase the memory consumption too much. |
| 240 | |
| 241 | The very best solution in precisely this case would be to use the |
| 242 | \helpref{Alloc()}{wxstringalloc} function to preallocate, for example, len bytes |
| 243 | from the beginning - this will lead to exactly one memory allocation being |
| 244 | performed (because the result is at most as long as the original string). |
| 245 | |
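The preallocation idea can be sketched with std::string, where {\tt reserve()} is assumed to play the role that Alloc() plays for wxString. The helper {\tt IsVowel} is a hypothetical name introduced for this illustration.

```cpp
#include <cctype>
#include <string>

// true if ch is one of the five ASCII vowels, in either case
static bool IsVowel(char ch)
{
    const int lower = std::tolower(static_cast<unsigned char>(ch));
    return std::string("aeiou").find(static_cast<char>(lower)) != std::string::npos;
}

// std::string analogue of the example above: reserve() is used here
// in the role that Alloc() has for wxString (an assumption made for
// illustration only).
std::string DeleteAllVowels(const std::string& original)
{
    std::string result;
    result.reserve(original.length());  // the result can never be longer

    for ( std::string::size_type n = 0; n < original.length(); n++ )
    {
        if ( !IsVowel(original[n]) )
            result += original[n];
    }

    return result;
}
```

Since the buffer is sized once before the loop, the appends inside it should not need to allocate memory again.
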
| 246 | However, using Alloc() is tedious and so wxString tries to do its best. The |
| 247 | default algorithm assumes that memory allocation is done in granularity of at |
| 248 | least 16 bytes (which is the case on almost all widespread platforms) and so |
| 249 | nothing is lost if the amount of memory to allocate is rounded up to the next |
| 250 | multiple of 16. This way, no memory is lost and 15 iterations out of 16 in the |
| 251 | example above won't allocate memory but use the already allocated pool. |
| 252 | |
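The arithmetic behind this rounding can be sketched as follows. The names {\tt ALLOC\_GRANULARITY}, {\tt RoundUpAllocSize} and {\tt CountReallocations} are hypothetical, and the model is simplified: it ignores the terminating NUL character and any bookkeeping data, so it illustrates the idea rather than the actual code in string.cpp.

```cpp
#include <cstddef>

// Assumed allocation granularity, taken from the text above.
const std::size_t ALLOC_GRANULARITY = 16;

// Rounds the requested size up to the next multiple of the granularity.
std::size_t RoundUpAllocSize(std::size_t needed)
{
    return (needed + ALLOC_GRANULARITY - 1) / ALLOC_GRANULARITY * ALLOC_GRANULARITY;
}

// Counts how many times the buffer would have to grow while appending
// count characters one at a time to an initially empty string.
std::size_t CountReallocations(std::size_t count)
{
    std::size_t capacity = 0;
    std::size_t reallocations = 0;

    for ( std::size_t len = 1; len <= count; len++ )
    {
        if ( len > capacity )
        {
            capacity = RoundUpAllocSize(len);
            reallocations++;
        }
    }

    return reallocations;
}
```

In this model, appending 16 characters one by one grows the buffer only once, which corresponds to the 15 iterations out of 16 that reuse already allocated memory.
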
| 253 | The default approach is quite conservative. Allocating more memory may bring |
| 254 | important performance benefits for programs using (relatively) few very long |
| 255 | strings. The amount of memory allocated is configured by the setting of {\it |
| 256 | EXTRA\_ALLOC} in the file string.cpp during compilation (be sure to understand |
| 257 | why its default value is what it is before modifying it!). You may try setting |
| 258 | it to a greater amount (say twice nLen) or to 0 (to see the performance degradation |
| 259 | which will follow) and analyse its impact on your program. If you do it, |
| 260 | you will probably find it helpful to also define the WXSTRING\_STATISTICS symbol |
| 261 | which tells the wxString class to collect performance statistics and to show |
| 262 | them on stderr on program termination. This will show you the average length of |
| 263 | strings your program manipulates, their average initial length and also the |
| 264 | percentage of times when memory wasn't reallocated when string concatenation was |
| 265 | done but the already preallocated memory was used (this value should be about |
| 266 | 98\% for the default allocation policy; if it is less than 90\% you should |
| 267 | really consider fine tuning wxString for your application). |
| 268 | |
| 269 | It goes without saying that a profiler should be used to measure the precise |
| 270 | difference the change to EXTRA\_ALLOC makes to your program. |
| 271 | |