\section{wxString overview}\label{wxstringoverview}

Classes: \helpref{wxString}{wxstring}, \helpref{wxArrayString}{wxarraystring}, \helpref{wxStringTokenizer}{wxstringtokenizer}

\subsection{Introduction}

wxString is a class which represents a character string of arbitrary length (limited by
{\it INT\_MAX}, which is usually 2147483647 on 32 bit machines) and containing
arbitrary characters. The ASCII NUL character is allowed, although care should be
taken when passing strings containing it to other functions.

wxString works with both ASCII (traditional, 7 or 8 bit, characters) and
Unicode (wide characters) strings.

This class has all the standard operations you can expect to find in a string class:
dynamic memory management (the string extends to accommodate new characters),
construction from other strings, C strings and characters, assignment operators,
access to individual characters, string concatenation and comparison, substring
extraction, case conversion, trimming and padding (with spaces), searching and
replacing, and both C-like \helpref{Printf()}{wxstringprintf} and stream-like
insertion functions, as well as much more - see \helpref{wxString}{wxstring}
for a list of all functions.

\subsection{Comparison of wxString to other string classes}

The advantages of using a special string class instead of working directly with
C strings are so obvious that there is a huge number of such classes available.
The most important advantage is that you no longer need to
remember to allocate/free memory for C strings; working with fixed size buffers almost
inevitably leads to buffer overflows. C++ now also has a standard string class
(std::string). So why the need for wxString?

There are several advantages:

\begin{enumerate}\itemsep=0pt
\item {\bf Efficiency} This class was made to be as efficient as possible: both
in terms of size (each wxString object takes exactly the same space as a {\it
char *} pointer, using \helpref{reference counting}{wxstringrefcount}) and speed.
It also provides performance \helpref{statistics gathering code}{wxstringtuning}
which may be enabled to fine tune the memory allocation strategy for your
particular application - and the gain might be quite big.
\item {\bf Compatibility} This class tries to combine almost full compatibility
with the old wxWindows 1.xx wxString class, some resemblance to the MFC CString
class and 90\% of the functionality of the std::string class.
\item {\bf Rich set of functions} Some of the functions present in wxString are
very useful but don't exist in most other string classes: for example,
\helpref{AfterFirst}{wxstringafterfirst},
\helpref{BeforeLast}{wxstringbeforelast}, \helpref{operator<<}{wxstringoperatorout}
or \helpref{Printf}{wxstringprintf}. Of course, all the standard string
operations are supported as well.
\item {\bf Unicode} wxString is Unicode friendly: it allows easy conversion
to and from ANSI and Unicode strings in any build mode (see the
\helpref{Unicode overview}{unicode} for more details) and maps to either
{\tt string} or {\tt wstring} transparently depending on the current mode.
\item {\bf Used by wxWindows} And, of course, this class is used everywhere
inside wxWindows so there is no performance loss which would result from
conversions of objects of any other string class (including std::string) to
wxString internally by wxWindows.
\end{enumerate}

However, there are several problems as well. The most important one is probably
that there are often several functions to do exactly the same thing: for
example, to get the length of the string either one of
length(), \helpref{Len()}{wxstringlen} or
\helpref{Length()}{wxstringlength} may be used. The first function, as almost
all the other functions in lowercase, is std::string compatible. The second one
is the "native" wxString version and the last one is the wxWindows 1.xx way. So the
question is: which one is better to use? And the answer is that:

{\bf The usage of std::string compatible functions is strongly advised!} It will
make your code more familiar to other C++ programmers (who are supposed to
have knowledge of std::string but not of wxString), let you reuse the same code
in both wxWindows and other programs (by just typedefing wxString as std::string
when used outside wxWindows) and keep your code compatible with future versions of
wxWindows which will probably start using std::string sooner or later too.

In the situations where there is no corresponding std::string function, please
try to use the new wxString methods and not the old wxWindows 1.xx variants
which are deprecated and may disappear in future versions.

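As a rough sketch of what this advice buys you, the helper below restricts itself to
std::string compatible calls ({\tt length()} and {\it operator[]}), so it should compile
unchanged against either class. The typedef is the one suggested above for use outside
wxWindows; the function name {\tt CountChar} is hypothetical and only serves the illustration.

```cpp
#include <cstddef>
#include <string>

// Assumption: we are outside wxWindows, so std::string stands in for wxString.
// Inside wxWindows this typedef would simply be left out.
typedef std::string wxString;

// Hypothetical helper: counts the occurrences of ch in s using only
// std::string compatible member functions.
std::size_t CountChar(const wxString& s, char ch)
{
    std::size_t count = 0;
    for ( std::size_t n = 0; n < s.length(); n++ )
    {
        if ( s[n] == ch )
            count++;
    }

    return count;
}
```

Had the loop used {\tt Len()} or {\tt Length()} instead, the typedef trick would no longer work.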
\subsection{Some advice about using wxString}\label{wxstringadvices}

Probably the main trap with using this class is the implicit conversion operator to
{\it const char *}. It is advised that you use \helpref{c\_str()}{wxstringcstr}
instead to clearly indicate when the conversion is done. Specifically, the
danger of this implicit conversion may be seen in the following code fragment:

\begin{verbatim}
// this function converts the input string to uppercase, outputs it to the screen
// and returns the result
const char *SayHELLO(const wxString& input)
{
    wxString output = input.Upper();

    printf("Hello, %s!\n", output);

    return output;
}
\end{verbatim}

There are two nasty bugs in these three lines. The first of them is in the call to the
{\it printf()} function. Although the implicit conversion to C strings is applied
automatically by the compiler in the case of

\begin{verbatim}
    puts(output);
\end{verbatim}

because the argument of {\it puts()} is known to be of the type {\it const char *},
this is {\bf not} done for {\it printf()} which is a function with a variable
number of arguments (and whose arguments are of unknown types). So this call may
do anything at all (including displaying the correct string on screen), although
the most likely result is a program crash. The solution is to use
\helpref{c\_str()}{wxstringcstr}: just replace this line with

\begin{verbatim}
    printf("Hello, %s!\n", output.c_str());
\end{verbatim}

The second bug is that returning {\it output} doesn't work. The implicit cast is
used again, so the code compiles, but as it returns a pointer to a buffer
belonging to a local variable which is deleted as soon as the function exits,
its contents are totally arbitrary. The solution to this problem is also easy:
just make the function return wxString instead of a C string.

This leads us to the following general advice: all functions taking string
arguments should take {\it const wxString\&} (this makes assignment to the
strings inside the function faster because of
\helpref{reference counting}{wxstringrefcount}) and all functions returning
strings should return {\it wxString} - this makes it safe to return local
variables.

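Putting both fixes together, a corrected {\tt SayHELLO} might look roughly like the
sketch below. So that it compiles without wxWindows, it assumes std::string as a
stand-in for wxString; std::string has no {\tt Upper()}, so the case conversion is
spelled out by hand, whereas with the real class you would keep {\tt input.Upper()}.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Assumption: std::string stands in for wxString in this sketch.
typedef std::string wxString;

// converts the input string to uppercase, outputs it to the screen
// and returns the result
wxString SayHELLO(const wxString& input)
{
    wxString output = input;
    for ( std::size_t n = 0; n < output.length(); n++ )
    {
        // the cast to unsigned char keeps toupper() well defined for 8 bit characters
        output[n] = static_cast<char>(
                        std::toupper(static_cast<unsigned char>(output[n])));
    }

    // explicit conversion: printf() now really receives a const char *
    std::printf("Hello, %s!\n", output.c_str());

    // returned by value, so the caller never sees a dangling pointer
    return output;
}
```

The signature follows the general advice: the argument is a constant reference and the result is returned as a string object.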
\subsection{Other string related functions and classes}

As most programs use character strings, the standard C library provides quite
a few functions to work with them. Unfortunately, some of them have rather
counter-intuitive behaviour (like strncpy() which doesn't always terminate the
resulting string with a NUL) and are in general not very safe (passing NULL
to them will probably lead to a program crash). Moreover, some very useful
functions are not standard at all. This is why in addition to all wxString
functions, there are also a few global string functions which try to correct
these problems: \helpref{wxIsEmpty()}{wxisempty} verifies whether the string
is empty (returning {\tt TRUE} for {\tt NULL} pointers),
\helpref{wxStrlen()}{wxstrlen} also handles NULLs correctly and returns 0 for
them and \helpref{wxStricmp()}{wxstricmp} is just a platform-independent
version of the case-insensitive string comparison function known either as
stricmp() or strcasecmp() on different platforms.

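The NULL handling described above can be pictured with the following sketch. The two
functions are hypothetical stand-ins written for this illustration (the names
{\tt IsEmptySketch} and {\tt StrlenSketch} are made up) and are not taken from the
wxWindows sources; they only mirror the documented behaviour.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for wxIsEmpty(): a NULL pointer counts as an empty string.
inline bool IsEmptySketch(const char *p)
{
    return p == NULL || *p == '\0';
}

// Hypothetical stand-in for wxStrlen(): returns 0 for NULL instead of crashing.
inline std::size_t StrlenSketch(const char *p)
{
    return p ? std::strlen(p) : 0;
}
```

Calling plain {\tt strlen()} with a NULL pointer is undefined behaviour, which is exactly the case these wrappers guard against.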
The {\tt <wx/string.h>} header also defines \helpref{wxSnprintf}{wxsnprintf}
and \helpref{wxVsnprintf}{wxvsnprintf} functions which should be used instead
of the inherently dangerous standard {\tt sprintf()}. They use {\tt
snprintf()}, which does buffer size checks, whenever possible. Of
course, you may also use \helpref{wxString::Printf}{wxstringprintf} which is
also safe.

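To see why the size-checked variant matters, here is a small sketch using the standard
{\tt snprintf()} (available as {\tt std::snprintf} since C++11) rather than the wx
wrappers. The helper name {\tt FormatGreeting} and the 16 byte buffer are arbitrary
choices made for the illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a greeting into a deliberately small buffer.
std::string FormatGreeting(const char *name)
{
    char buf[16];

    // snprintf() writes at most sizeof(buf) bytes including the terminating NUL,
    // so an overlong name is truncated instead of overflowing the buffer
    std::snprintf(buf, sizeof(buf), "Hello, %s!", name);

    return std::string(buf);
}
```

With {\tt sprintf()} the same call would write past the end of {\tt buf} for any name longer than 7 characters.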
There is another class which might be useful when working with wxString:
\helpref{wxStringTokenizer}{wxstringtokenizer}. It is helpful when a string must
be broken into tokens and replaces the standard C library {\it
strtok()} function.

And the very last string-related class is \helpref{wxArrayString}{wxarraystring}: it
is just a version of the "template" dynamic array class which is specialized to work
with strings. Please note that this class is specially optimized (using its
knowledge of the internal structure of wxString) for storing strings and so it is
vastly better from a performance point of view than a wxObjectArray of wxStrings.

\subsection{Reference counting and why you shouldn't care about it}\label{wxstringrefcount}

wxString objects use a technique known as {\it copy on write} (COW). This means
that when a string is assigned to another, no copying really takes place: only
the reference count on the shared string data is incremented and both strings
share the same data.

But as soon as one of the two (or more) strings is modified, the data has to be
copied because the changes to one of the strings shouldn't be seen in the
others. As data copying only happens when the string is written to, this is
known as COW.

What is important to understand is that all this happens absolutely
transparently to the class users and that whether a string is shared or not is
not seen from the outside of the class - in any case, the result of any
operation on it is the same.

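The mechanism can be pictured with the toy class below. It is only a sketch of the
general copy on write idea, built on {\tt std::shared\_ptr} for brevity, and is not how
wxString itself is implemented; the class and method names are invented for the example.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Toy copy on write string; illustrative only, not the wxString implementation.
class CowString
{
public:
    explicit CowString(const std::string& s)
        : m_data(std::make_shared<std::string>(s)) { }

    // copying a CowString (compiler generated) only copies the shared pointer,
    // i.e. it increments the reference count

    // read access never copies the data
    char GetChar(std::size_t n) const { return (*m_data)[n]; }

    // write access unshares the data first if anybody else still refers to it
    void SetChar(std::size_t n, char ch)
    {
        if ( m_data.use_count() > 1 )
            m_data = std::make_shared<std::string>(*m_data);

        (*m_data)[n] = ch;
    }

    bool SharesDataWith(const CowString& other) const
        { return m_data == other.m_data; }

    const std::string& str() const { return *m_data; }

private:
    std::shared_ptr<std::string> m_data;
};
```

Whether two objects share their data or not, {\tt str()} returns the same characters, which is the transparency described above.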
Probably the only case when you might want to think about reference
counting is when a string character is taken from a string which is not a
constant (or a constant reference). In this case, due to C++ rules, the
"read-only" {\it operator[]} (which is the same as
\helpref{GetChar()}{wxstringgetchar}) cannot be chosen and the "read/write"
{\it operator[]} (the same as
\helpref{GetWritableChar()}{wxstringgetwritablechar}) is used instead. As the
call to this operator may modify the string, its data is unshared (COW is done)
and so if the string was really shared there is some performance loss (both in
terms of speed and memory consumption). In the rare cases when this may be
important, you might prefer using \helpref{GetChar()}{wxstringgetchar} instead
of the array subscript operator for this reason. Please note that the
\helpref{at()}{wxstringat} method has the same problem as the subscript operator in
this situation and so using it is not really better. Also note that if all
string arguments to your functions are passed as {\it const wxString\&} (see the
section \helpref{Some advice}{wxstringadvices}) this situation will almost
never arise because for constant references the correct operator is called automatically.

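The C++ rule in question is ordinary overload resolution on the constness of the
object, which the following sketch makes visible. The {\tt Probe} type is hypothetical
and merely counts which of its two {\it operator[]} overloads gets called.

```cpp
#include <cstddef>

// Hypothetical type which records which operator[] overload was selected.
struct Probe
{
    mutable int reads = 0;   // calls to the "read-only" overload
    int writes = 0;          // calls to the "read/write" overload
    char data[4] = { 'a', 'b', 'c', '\0' };

    // chosen only for constant objects and constant references
    char operator[](std::size_t n) const { ++reads; return data[n]; }

    // chosen for non-constant objects, even if the caller only reads the result
    char& operator[](std::size_t n) { ++writes; return data[n]; }
};
```

Reading {\tt p[0]} from a non-constant {\tt Probe p} bumps {\tt writes}, while the same read through a {\it const Probe\&} bumps {\tt reads}; in a copy on write class the former is where the unsharing would take place.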
\subsection{Tuning wxString for your application}\label{wxstringtuning}

\normalbox{{\bf Note:} this section is strictly about performance issues and is
absolutely not necessary to read for using wxString class. Please skip it unless
you feel familiar with profilers and related tools. If you do read it, please
also read the preceding section about
\helpref{reference counting}{wxstringrefcount}.}

For performance reasons wxString doesn't allocate exactly the amount of
memory needed for each string. Instead, it adds a small amount of space to each
allocated block which allows it to not reallocate memory (a relatively
expensive operation) too often, as when a string is constructed by
successively adding one character at a time to it, as for example in:

\begin{verbatim}
// delete all vowels from the string
wxString DeleteAllVowels(const wxString& original)
{
    wxString result;

    size_t len = original.length();
    for ( size_t n = 0; n < len; n++ )
    {
        if ( strchr("aeuio", tolower(original[n])) == NULL )
            result += original[n];
    }

    return result;
}
\end{verbatim}

This is quite a common situation and not allocating extra memory at all would
lead to very bad performance in this case because there would be as many memory
(re)allocations as there are consonants in the original string. Allocating too
much extra memory would help to improve the speed in this situation, but due to
a great number of wxString objects typically used in a program would also
increase the memory consumption too much.

The very best solution in precisely this case would be to use
\helpref{Alloc()}{wxstringalloc} function to preallocate, for example, len bytes
from the beginning - this will lead to exactly one memory allocation being
performed (because the result is at most as long as the original string).

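A preallocating variant of {\tt DeleteAllVowels} could look like the sketch below. To
keep it compilable without wxWindows it assumes std::string in place of wxString and
therefore calls {\tt reserve()}, the std::string counterpart of preallocation; with
wxString the marked line would call {\tt Alloc(len)} instead.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: std::string stands in for wxString in this sketch.
typedef std::string wxString;

// delete all vowels from the string, preallocating the result
wxString DeleteAllVowels(const wxString& original)
{
    wxString result;

    std::size_t len = original.length();
    result.reserve(len);    // with wxString: result.Alloc(len);

    for ( std::size_t n = 0; n < len; n++ )
    {
        const char ch = original[n];
        if ( std::strchr("aeuio",
                         std::tolower(static_cast<unsigned char>(ch))) == NULL )
            result += ch;
    }

    return result;
}
```

Because the result can never be longer than the input, none of the appends in the loop needs to grow the buffer.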
However, using Alloc() is tedious and so wxString tries to do its best. The
default algorithm assumes that memory allocation is done in granularity of at
least 16 bytes (which is the case on almost all widespread platforms) and so
nothing is lost if the amount of memory to allocate is rounded up to the next
multiple of 16. Like this, no memory is lost and 15 iterations out of 16 in the
example above won't allocate memory but use the already allocated pool.

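The arithmetic behind the "15 out of 16" claim can be checked with a small simulation.
This is only a model of the rounding policy described above, under the assumption that
the capacity is always the requested length rounded up to a multiple of 16; the names
are hypothetical and the code is not taken from string.cpp.

```cpp
#include <cstddef>

// Assumed allocation granularity, as stated in the text.
const std::size_t ALLOC_GRANULARITY = 16;

// Rounds n up to the next multiple of ALLOC_GRANULARITY.
std::size_t RoundUpAllocSize(std::size_t n)
{
    return (n + ALLOC_GRANULARITY - 1) / ALLOC_GRANULARITY * ALLOC_GRANULARITY;
}

// Counts how many (re)allocations happen while a string grows one character
// at a time from empty to finalLen characters under the rounding policy.
std::size_t CountReallocs(std::size_t finalLen)
{
    std::size_t capacity = 0;
    std::size_t reallocs = 0;

    for ( std::size_t len = 1; len <= finalLen; len++ )
    {
        if ( len > capacity )
        {
            capacity = RoundUpAllocSize(len);
            reallocs++;
        }
    }

    return reallocs;
}
```

In this model, growing a string to 32 characters costs 2 allocations instead of 32.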
The default approach is quite conservative. Allocating more memory may bring
important performance benefits for programs using (relatively) few very long
strings. The amount of memory allocated is configured by the setting of {\it
EXTRA\_ALLOC} in the file string.cpp during compilation (be sure to understand
why its default value is what it is before modifying it!). You may try setting
it to a greater amount (say twice nLen) or to 0 (to see the performance degradation
which will follow) and analyse the impact of it on your program. If you do it,
you will probably find it helpful to also define the WXSTRING\_STATISTICS symbol
which tells the wxString class to collect performance statistics and to show
them on stderr on program termination. This will show you the average length of
strings your program manipulates, their average initial length and also the
percentage of times when memory wasn't reallocated when string concatenation was
done but the already preallocated memory was used (this value should be about
98\% for the default allocation policy; if it is less than 90\% you should
really consider fine tuning wxString for your application).

It goes without saying that a profiler should be used to measure the precise
difference the change to EXTRA\_ALLOC makes to your program.