/////////////////////////////////////////////////////////////////////////////
// Name:        string
// Purpose:     topic overview
// Author:      wxWidgets team
// RCS-ID:      $Id$
// Licence:     wxWindows license
/////////////////////////////////////////////////////////////////////////////

/*!

@page string_overview wxString overview

Classes: #wxString, #wxArrayString, #wxStringTokenizer

@li @ref introductiontowxstring
@li @ref otherstringclasses
@li @ref wxstringadvices
@li @ref relatedtostring
@li @ref wxstringrefcount
@li @ref wxstringtuning


@section introductiontowxstring Introduction

wxString is a class which represents a character string of arbitrary length
(limited by @e INT_MAX, which is usually 2147483647 on 32 bit machines) and
containing arbitrary characters. The ASCII NUL character is allowed, but be
aware that in the current string implementation some methods might not work
correctly in this case.

wxString works with both ASCII (traditional, 7 or 8 bit, characters) and
Unicode (wide characters) strings.

This class has all the standard operations you can expect to find in a string
class: dynamic memory management (the string grows to accommodate new
characters), construction from other strings, C strings and characters,
assignment operators, access to individual characters, string concatenation
and comparison, substring extraction, case conversion, trimming and padding
(with spaces), searching and replacing, and both C-like wxString::Printf() and
stream-like insertion functions, as well as much more. See #wxString for a list
of all functions.

@section otherstringclasses Comparison of wxString to other string classes

The advantages of using a special string class instead of working directly with
C strings are so obvious that there is a huge number of such classes available.
The most important advantage is that you no longer need to remember to
allocate and free memory for C strings, and you avoid fixed size buffers, which
almost inevitably lead to buffer overflows. Moreover, C++ now has a standard
string class (std::string). So why the need for wxString?

There are several advantages:

@li @b Efficiency: This class was made to be as efficient as possible, both
    in terms of size (each wxString object takes exactly the same space as a
    @e char * pointer, using @ref wxstringrefcount "reference counting") and
    speed. It also provides performance
    @ref wxstringtuning "statistics gathering code" which may be enabled to
    fine tune the memory allocation strategy for your particular application,
    and the gain might be quite big.
@li @b Compatibility: This class tries to combine almost full compatibility
    with the old wxWidgets 1.xx wxString class, some resemblance to the MFC
    CString class and 90% of the functionality of the std::string class.
@li @b Rich @b set @b of @b functions: Some of the functions present in
    wxString are very useful but don't exist in most other string classes:
    for example, wxString::AfterFirst(), wxString::BeforeLast(), operator<<
    or wxString::Printf(). Of course, all the standard string operations are
    supported as well.
@li @b Unicode: wxString is Unicode friendly: it makes it easy to convert
    to and from ANSI and Unicode strings in any build mode (see the
    @ref unicode_overview for more details) and maps to either
    @c string or @c wstring transparently depending on the current mode.
@li @b Used @b by @b wxWidgets: And, of course, this class is used everywhere
    inside wxWidgets so there is no performance loss which would result from
    conversions of objects of any other string class (including std::string) to
    wxString internally by wxWidgets.

However, there are several problems as well. The most important one is probably
that there are often several functions to do exactly the same thing: for
example, to get the length of the string any one of length(), wxString::Len()
or wxString::Length() may be used. The first function, like almost all the
other functions in lowercase, is std::string compatible. The second one is the
"native" wxString version and the last one is the wxWidgets 1.xx way. So the
question is: which one is better to use? And the answer is:

<b>The usage of std::string compatible functions is strongly advised!</b> It
will make your code more familiar to other C++ programmers (who are supposed to
have knowledge of std::string but not of wxString), let you reuse the same code
in both wxWidgets and other programs (by just typedefing wxString as std::string
when used outside wxWidgets) and keep it compatible with future versions of
wxWidgets, which will probably start using std::string sooner or later too.

In the situations where there is no corresponding std::string function, please
try to use the new wxString methods and not the old wxWidgets 1.xx variants,
which are deprecated and may disappear in future versions.
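
As a rough illustration of this advice, the sketch below uses only
std::string compatible methods, so the same helper compiles whether wxString
is the real class or a typedef for std::string. The MYAPP_USE_WX macro and the
CountWords() helper are hypothetical names invented for this example and are
not part of wxWidgets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// MYAPP_USE_WX is a hypothetical macro: a wxWidgets build would define it,
// any other program falls back to a plain typedef for std::string.
#ifdef MYAPP_USE_WX
    #include <wx/string.h>
#else
    typedef std::string wxString;
#endif

// Counts space-separated words using only std::string compatible methods
// (length() and the read-only operator[]).
std::size_t CountWords(const wxString& text)
{
    std::size_t words = 0;
    bool inWord = false;

    for ( std::size_t n = 0; n < text.length(); n++ )
    {
        const bool isSpace = text[n] == ' ';
        if ( !isSpace && !inWord )
            words++;

        inWord = !isSpace;
    }

    return words;
}
```

Because the helper never calls Len() or Length(), nothing in it has to change
when the typedef is swapped for the real class.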

@section wxstringadvices Some advice about using wxString

Probably the main trap with using this class is the implicit conversion operator
to @e const char *. It is advised that you use wxString::c_str() instead to
clearly indicate when the conversion is done. Specifically, the danger of this
implicit conversion may be seen in the following code fragment:

@code
// this function converts the input string to uppercase, outputs it to the
// screen and returns the result
const char *SayHELLO(const wxString& input)
{
    wxString output = input.Upper();

    printf("Hello, %s!\n", output);

    return output;
}
@endcode

There are two nasty bugs in these three lines. The first is in the call to the
@e printf() function. Although the implicit conversion to C strings is applied
automatically by the compiler in the case of

@code
puts(output);
@endcode

because the argument of @e puts() is known to be of the type @e const char *,
this is @b not done for @e printf(), which is a function with a variable
number of arguments (and whose arguments are of unknown types). So this call may
do anything at all (including displaying the correct string on screen), although
the most likely result is a program crash. The solution is to use
wxString::c_str(): just replace this line with

@code
printf("Hello, %s!\n", output.c_str());
@endcode

The second bug is that returning @e output doesn't work. The implicit cast is
used again, so the code compiles, but as it returns a pointer to a buffer
belonging to a local variable which is destroyed as soon as the function exits,
the contents of that buffer are totally arbitrary. The solution to this problem
is also easy: just make the function return wxString instead of a C string.

This leads us to the following general advice: all functions taking string
arguments should take @e const wxString& (this makes assignment to the
strings inside the function faster because of
@ref wxstringrefcount "reference counting") and all functions returning
strings should return @e wxString, which makes it safe to return local
variables.
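
With both fixes applied, a corrected version might look like the sketch below.
So that it compiles without wxWidgets, it uses std::string as a stand-in for
wxString and a hypothetical ToUpper() helper in place of wxString::Upper().
With the real class the shape of the function would be the same.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Stand-in for wxString::Upper(): returns an upper-cased copy of the input.
// ToUpper() is a hypothetical helper, not part of wxWidgets.
std::string ToUpper(const std::string& input)
{
    std::string result(input);
    for ( std::size_t n = 0; n < result.length(); n++ )
    {
        const unsigned char ch = static_cast<unsigned char>(result[n]);
        result[n] = static_cast<char>(std::toupper(ch));
    }

    return result;
}

// Takes the argument by const reference and returns the string by value, so
// the caller never receives a pointer into a destroyed local variable.
std::string SayHELLO(const std::string& input)
{
    std::string output = ToUpper(input);

    // c_str() makes the conversion to a C string explicit for printf()
    std::printf("Hello, %s!\n", output.c_str());

    return output;
}
```

Returning by value copies the string object rather than a raw pointer, which
is what makes returning the local variable safe here.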

@section relatedtostring Other string related functions and classes

As most programs use character strings, the standard C library provides quite
a few functions to work with them. Unfortunately, some of them have rather
counter-intuitive behaviour (like strncpy(), which doesn't always terminate the
resulting string with a NUL) and are in general not very safe (passing @NULL
to them will probably lead to a program crash). Moreover, some very useful
functions are not standard at all. This is why, in addition to all the wxString
functions, there are also a few global string functions which try to correct
these problems: #wxIsEmpty() verifies whether the string is empty (returning
@true for @NULL pointers), #wxStrlen() also handles @NULL pointers correctly
and returns 0 for them, and #wxStricmp() is just a platform-independent version
of the case-insensitive string comparison function known either as stricmp() or
strcasecmp() on different platforms.

The @c wx/string.h header also defines the #wxSnprintf and #wxVsnprintf
functions, which should be used instead of the inherently dangerous standard
@c sprintf(). They use @c snprintf(), which does buffer size checks, whenever
possible. Of course, you may also use wxString::Printf(), which is also safe.

There is another class which might be useful when working with wxString:
#wxStringTokenizer. It is helpful when a string must be broken into tokens and
replaces the standard C library @e strtok() function.

And the very last string-related class is #wxArrayString: it is just a version
of the "template" dynamic array class which is specialized to work with
strings. Please note that this class is specially optimized (using its
knowledge of the internal structure of wxString) for storing strings and so it
is vastly better from a performance point of view than a wxObjectArray of
wxStrings.
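
The null-tolerant behaviour described for wxIsEmpty() and wxStrlen() can be
sketched in plain C++ as shown below. SafeIsEmpty(), SafeStrlen() and
FormatGreeting() are hypothetical stand-ins written for this illustration and
are not the wxWidgets implementations. The last function shows the bounded
formatting idea behind wxSnprintf() using the standard snprintf().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Returns true for a null pointer as well as for an empty string.
bool SafeIsEmpty(const char *s)
{
    return s == NULL || *s == '\0';
}

// Returns 0 for a null pointer, where strlen() would typically crash.
std::size_t SafeStrlen(const char *s)
{
    return s ? std::strlen(s) : 0;
}

// Formats a greeting into a fixed-size buffer. snprintf() never writes more
// than size bytes and always NUL-terminates the output when size > 0.
// Returns true only if the whole text fitted into the buffer.
bool FormatGreeting(char *buf, std::size_t size, const char *name)
{
    const int needed = std::snprintf(buf, size, "Hello, %s!", name);

    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

When the text does not fit, the buffer holds a truncated but still terminated
string, and the return value lets the caller detect the truncation.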

@section wxstringrefcount Reference counting and why you shouldn't care about it

All considerations for wxObject-derived
@ref trefcount_overview "reference-counted" objects are also valid for
wxString, even though it does not derive from wxObject.

Probably the only case when you might want to think about reference counting
is when a single character is read from a string which is not a constant (or a
constant reference). In this case, due to C++ rules, the "read-only"
@e operator[] (which is the same as wxString::GetChar()) cannot be chosen and
the "read/write" @e operator[] (the same as wxString::GetWritableChar()) is
used instead. As the call to this operator may modify the string, its data is
unshared (copy-on-write is done), so if the string was really shared there is
some performance loss (both in terms of speed and memory consumption). In the
rare cases when this may be important, you might prefer using
wxString::GetChar() instead of the array subscript operator for this reason.
Please note that the at() method has the same problem as the subscript
operator in this situation and so using it is not really better. Also note that
if all string arguments to your functions are passed as @e const wxString&
(see the section @ref wxstringadvices) this situation will almost never arise
because for constant references the correct operator is called automatically.
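
The effect of the two subscript operators can be seen in a toy copy-on-write
class such as the one below. CowString is a hypothetical class built on
std::shared_ptr purely for illustration. It is not how wxString is implemented,
and only the overload selection rules are the point of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string (hypothetical, for illustration only).
class CowString
{
public:
    explicit CowString(const char *s)
        : m_data(std::make_shared<std::string>(s))
    {
    }

    // "Read-only" access: never needs to unshare the data.
    char GetChar(std::size_t n) const { return (*m_data)[n]; }
    char operator[](std::size_t n) const { return (*m_data)[n]; }

    // "Read/write" access: the caller may modify the character, so a shared
    // buffer has to be copied first, even if the caller only reads it.
    char& operator[](std::size_t n)
    {
        if ( m_data.use_count() > 1 )
            m_data = std::make_shared<std::string>(*m_data);

        return (*m_data)[n];
    }

    // True if both objects still use the same underlying buffer.
    bool SharesDataWith(const CowString& other) const
    {
        return m_data == other.m_data;
    }

private:
    std::shared_ptr<std::string> m_data;
};
```

Copying a CowString shares the buffer. Reading through GetChar() or through a
const reference keeps it shared, while reading a character with operator[] on
a non-const object triggers the copy.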

@section wxstringtuning Tuning wxString for your application

@b Note: this section is strictly about performance issues and is
absolutely not necessary to read for using the wxString class. Please skip it
unless you feel familiar with profilers and related tools. If you do read it,
please also read the preceding section about
@ref wxstringrefcount "reference counting".

For performance reasons wxString doesn't allocate exactly the amount of
memory needed for each string. Instead, it adds a small amount of space to each
allocated block, which allows it to avoid reallocating memory (a relatively
expensive operation) too often, for example when a string is constructed by
successively appending one character at a time to it, as in:

@code
// delete all vowels from the string
wxString DeleteAllVowels(const wxString& original)
{
    wxString result;

    size_t len = original.length();
    for ( size_t n = 0; n < len; n++ )
    {
        if ( strchr("aeuio", tolower(original[n])) == NULL )
            result += original[n];
    }

    return result;
}
@endcode

This is quite a common situation and not allocating extra memory at all would
lead to very bad performance in this case because there would be as many memory
(re)allocations as there are consonants in the original string. Allocating too
much extra memory would help to improve the speed in this situation, but due to
the great number of wxString objects typically used in a program it would also
increase the memory consumption too much.

The very best solution in precisely this case would be to use the
wxString::Alloc() function to preallocate, for example, len bytes from the
beginning: this will lead to exactly one memory allocation being performed
(because the result is at most as long as the original string).

However, using Alloc() is tedious and so wxString tries to do its best. The
default algorithm assumes that memory allocation is done in granularity of at
least 16 bytes (which is the case on almost all widespread platforms) and so
nothing is lost if the amount of memory to allocate is rounded up to the next
multiple of 16. This way, no memory is lost and 15 iterations out of 16 in the
example above won't allocate memory but use the already allocated pool.
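
The preallocation idea and the rounding rule can be sketched as follows. To
stay compilable without wxWidgets, the sketch uses std::string, with reserve()
standing in for wxString::Alloc(). DeleteAllVowelsPrealloc() and RoundUpTo16()
are hypothetical names, and the rounding helper only illustrates the
arithmetic; it is not taken from string.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Same algorithm as DeleteAllVowels(), but with the result preallocated.
// std::string::reserve() plays the role of wxString::Alloc() here.
std::string DeleteAllVowelsPrealloc(const std::string& original)
{
    std::string result;

    // The result is at most as long as the input, so one allocation is enough.
    const std::size_t len = original.length();
    result.reserve(len);

    for ( std::size_t n = 0; n < len; n++ )
    {
        const unsigned char ch = static_cast<unsigned char>(original[n]);

        // strchr() also matches the terminating NUL of the vowel list, so
        // embedded NUL characters are kept explicitly.
        if ( ch == '\0' || std::strchr("aeuio", std::tolower(ch)) == NULL )
            result += original[n];
    }

    return result;
}

// Rounds a requested size up to the next multiple of 16.
std::size_t RoundUpTo16(std::size_t n)
{
    return (n + 15) & ~static_cast<std::size_t>(15);
}
```

With the buffer reserved up front, none of the appends in the loop can need a
reallocation, because the result never exceeds the reserved length.
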

The default approach is quite conservative. Allocating more memory may bring
important performance benefits for programs using (relatively) few very long
strings. The amount of memory allocated is configured by the setting of
@e EXTRA_ALLOC in the file string.cpp during compilation (be sure to understand
why its default value is what it is before modifying it!). You may try setting
it to a greater amount (say twice nLen) or to 0 (to see the performance
degradation which will follow) and analyse its impact on your program. If you
do this, you will probably find it helpful to also define the
WXSTRING_STATISTICS symbol, which tells the wxString class to collect
performance statistics and to show them on stderr on program termination. This
will show you the average length of the strings your program manipulates, their
average initial length and also the percentage of times when memory wasn't
reallocated when string concatenation was done but the already preallocated
memory was used (this value should be about 98% for the default allocation
policy; if it is less than 90% you should really consider fine tuning wxString
for your application).

It goes without saying that a profiler should be used to measure the precise
difference the change to EXTRA_ALLOC makes to your program.

*/