Commit | Line | Data |
---|---|---|
1 | ///////////////////////////////////////////////////////////////////////////// | |
2 | // Name: string.h | |
3 | // Purpose: topic overview | |
4 | // Author: wxWidgets team | |
5 | // RCS-ID: $Id$ | |
6 | // Licence: wxWindows license | |
7 | ///////////////////////////////////////////////////////////////////////////// | |
8 | ||
9 | /*! | |
10 | ||
11 | @page overview_string wxString Overview | |
12 | ||
13 | Classes: wxString, wxArrayString, wxStringTokenizer | |
14 | ||
15 | @li @ref overview_string_intro | |
16 | @li @ref overview_string_comparison | |
17 | @li @ref overview_string_advice | |
18 | @li @ref overview_string_related | |
19 | @li @ref overview_string_refcount | |
20 | @li @ref overview_string_tuning | |
21 | ||
22 | ||
23 | <hr> | |
24 | ||
25 | ||
26 | @section overview_string_intro Introduction | |
27 | ||
28 | wxString is a class which represents a character string of arbitrary length | |
29 | (limited by @c INT_MAX which is usually 2147483647 on 32-bit machines) and | |
30 | containing arbitrary characters. The ASCII NUL character is allowed, but be | |
31 | aware that in the current string implementation some methods might not work | |
32 | correctly in this case. | |
33 | ||
34 | wxString works with both ASCII (traditional 7- or 8-bit characters) and | |
35 | Unicode (wide character) strings. | |
36 | ||
37 | This class has all the standard operations you can expect to find in a string | |
38 | class: dynamic memory management (string extends to accommodate new | |
39 | characters), construction from other strings, C strings and characters, | |
40 | assignment operators, access to individual characters, string concatenation and | |
41 | comparison, substring extraction, case conversion, trimming and padding (with | |
42 | spaces), searching and replacing and both C-like @c printf (wxString::Printf) | |
43 | and stream-like insertion functions as well as much more - see wxString for a | |
44 | list of all functions. | |
45 | ||
46 | ||
47 | @section overview_string_comparison Comparison to Other String Classes | |
48 | ||
49 | The advantages of using a special string class instead of working directly with | |
50 | C strings are so obvious that there is a huge number of such classes available. | |
51 | The most important advantage is that you no longer need to remember to | |
52 | allocate/free memory for C strings; working with fixed size buffers almost | |
53 | inevitably leads to buffer overflows. At last, C++ has a standard string class | |
54 | (std::string). So why the need for wxString? There are several advantages: | |
55 | ||
56 | @li <b>Efficiency:</b> This class was made to be as efficient as possible: both in | |
57 | terms of size (each wxString object takes exactly the same space as a | |
58 | <tt>char*</tt> pointer, see @ref overview_string_refcount | |
59 | "reference counting") and speed. It also provides performance | |
60 | @ref overview_string_tuning "statistics gathering code" which may be | |
61 | enabled to fine tune the memory allocation strategy for your particular | |
62 | application - and the gain might be quite big. | |
63 | @li <b>Compatibility:</b> This class tries to combine almost full compatibility | |
64 | with the old wxWidgets 1.xx wxString class, some resemblance to the MFC | |
65 | CString class and 90% of the functionality of the std::string class. | |
66 | @li <b>Rich set of functions:</b> Some of the functions present in wxString are very | |
67 | useful but don't exist in most other string classes: for example, | |
68 | wxString::AfterFirst, wxString::BeforeLast, wxString::operators or | |
69 | wxString::Printf. Of course, all the standard string operations are | |
70 | supported as well. | |
71 | @li <b>Unicode:</b> wxString is Unicode friendly: it allows you to easily convert | |
72 | to and from ANSI and Unicode strings in any build mode (see the | |
73 | @ref overview_unicode "unicode overview" for more details) and maps to | |
74 | either @c string or @c wstring transparently depending on the current mode. | |
75 | @li <b>Used by wxWidgets:</b> And, of course, this class is used everywhere | |
76 | inside wxWidgets so there is no performance loss which would result from | |
77 | conversions of objects of any other string class (including std::string) to | |
78 | wxString internally by wxWidgets. | |
79 | ||
80 | However, there are several problems as well. The most important one is probably | |
81 | that there are often several functions to do exactly the same thing: for | |
82 | example, to get the length of the string either one of @c length(), | |
83 | wxString::Len() or wxString::Length() may be used. The first function, as | |
84 | almost all the other functions in lowercase, is std::string compatible. The | |
85 | second one is the "native" wxString version and the last one is the wxWidgets | |
86 | 1.xx way. | |
87 | ||
88 | So which is better to use? The usage of the std::string compatible functions is | |
89 | strongly advised! It will make your code more familiar to other C++ | |
90 | programmers (who are supposed to have knowledge of std::string but not of | |
91 | wxString), let you reuse the same code in both wxWidgets and other programs (by | |
92 | just typedefing wxString as std::string when used outside wxWidgets) and keep | |
93 | it compatible with future versions of wxWidgets which will probably start | |
94 | using std::string sooner or later too. | |
95 | ||
96 | In the situations where there is no corresponding std::string function, please | |
97 | try to use the new wxString methods and not the old wxWidgets 1.xx variants | |
98 | which are deprecated and may disappear in future versions. | |
99 | ||
100 | ||
101 | @section overview_string_advice Advice About Using wxString | |
102 | ||
103 | Probably the main trap with using this class is the implicit conversion | |
104 | operator to <tt>const char*</tt>. It is advised that you use wxString::c_str() | |
105 | instead to clearly indicate when the conversion is done. Specifically, the | |
106 | danger of this implicit conversion may be seen in the following code fragment: | |
107 | ||
108 | @code | |
109 | // this function converts the input string to uppercase, | |
110 | // outputs it to the screen and returns the result | |
111 | const char *SayHELLO(const wxString& input) | |
112 | { | |
113 | wxString output = input.Upper(); | |
114 | printf("Hello, %s!\n", output); | |
115 | return output; | |
116 | } | |
117 | @endcode | |
118 | ||
119 | There are two nasty bugs in these three lines. The first is in the call to the | |
120 | @c printf() function. Although the implicit conversion to C strings is applied | |
121 | automatically by the compiler in the case of | |
122 | ||
123 | @code | |
124 | puts(output); | |
125 | @endcode | |
126 | ||
127 | because the argument of @c puts() is known to be of the type | |
128 | <tt>const char*</tt>, this is @b not done for @c printf() which is a function | |
129 | with a variable number of arguments (and whose arguments are of unknown types). | |
130 | So this call may do any number of things (including displaying the correct | |
131 | string on screen), although the most likely result is a program crash. The | |
132 | solution is to use wxString::c_str(). Just replace this line with this: | |
133 | ||
134 | @code | |
135 | printf("Hello, %s!\n", output.c_str()); | |
136 | @endcode | |
137 | ||
138 | The second bug is that returning @c output doesn't work. The implicit cast is | |
139 | used again, so the code compiles, but as it returns a pointer to a buffer | |
140 | belonging to a local variable which is deleted as soon as the function exits, | |
141 | its contents are completely arbitrary. The solution to this problem is also | |
142 | easy: just make the function return wxString instead of a C string. | |
143 | ||
144 | This leads us to the following general advice: all functions taking string | |
145 | arguments should take <tt>const wxString&</tt> (this makes assignment to the | |
146 | strings inside the function faster because of | |
147 | @ref overview_string_refcount "reference counting") and all functions returning | |
148 | strings should return wxString - this makes it safe to return local variables. | |
149 | ||
150 | ||
151 | @section overview_string_related String Related Functions and Classes | |
152 | ||
153 | As most programs use character strings, the standard C library provides quite | |
154 | a few functions to work with them. Unfortunately, some of them have rather | |
155 | counter-intuitive behaviour (like @c strncpy() which doesn't always terminate | |
156 | the resulting string with a NUL character) and are in general not very safe | |
157 | (passing @NULL to them will probably lead to a program crash). Moreover, some | |
158 | very useful functions are not standard at all. This is why in addition to all | |
159 | wxString functions, there are also a few global string functions which try to | |
160 | correct these problems: wxIsEmpty() verifies whether the string is empty | |
161 | (returning @true for @NULL pointers), wxStrlen() also handles @NULL correctly | |
162 | and returns 0 for them and wxStricmp() is just a platform-independent version | |
163 | of the case-insensitive string comparison function known either as @c stricmp() | |
164 | or @c strcasecmp() on different platforms. | |
165 | ||
166 | The <tt>@<wx/string.h@></tt> header also defines the wxSnprintf and wxVsnprintf | |
167 | functions which should be used instead of the inherently dangerous standard | |
168 | @c sprintf(): they use @c snprintf(), which does buffer size checks, whenever | |
169 | possible. Of course, you may also use wxString::Printf which is also | |
170 | safe. | |
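
As a minimal sketch of the buffer size check mentioned above, the following uses the standard @c snprintf() directly rather than wxSnprintf, so that it compiles without wxWidgets. The helper names and the 16-byte buffer are illustrative assumptions and not part of any wxWidgets API.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats "Hello, <name>!" into buf, never writing more
// than bufSize bytes (including the terminating NUL). Returns true only if
// the whole text fit without truncation.
static bool FormatGreeting(char *buf, std::size_t bufSize, const char *name)
{
    const int needed = std::snprintf(buf, bufSize, "Hello, %s!", name);
    return needed >= 0 && static_cast<std::size_t>(needed) < bufSize;
}

// Hypothetical helper: formats the greeting into a deliberately small
// 16-byte buffer to show that over-long output is truncated, not overflowed.
static std::string GreetingInSmallBuffer(const char *name)
{
    char buf[16] = "";
    FormatGreeting(buf, sizeof(buf), name);
    return std::string(buf);
}
```

With the 16-byte buffer, a greeting that needs 17 characters is cut to 15 characters plus the terminating NUL instead of overrunning the buffer.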
171 | ||
172 | There is another class which might be useful when working with wxString: | |
173 | wxStringTokenizer. It is helpful when a string must be broken into tokens and | |
174 | replaces the standard C library @c strtok() function. | |
175 | ||
176 | And the very last string-related class is wxArrayString: it is just a version | |
177 | of the "template" dynamic array class which is specialized to work with | |
178 | strings. Please note that this class is specially optimized (using its | |
179 | knowledge of the internal structure of wxString) for storing strings and so it | |
180 | is vastly better from a performance point of view than a wxObjectArray of | |
181 | wxStrings. | |
182 | ||
183 | ||
184 | @section overview_string_refcount Reference Counting and Why You Shouldn't Care | |
185 | ||
186 | All considerations for wxObject-derived | |
187 | @ref overview_refcount "reference counted" objects are valid also for wxString, | |
188 | even if it does not derive from wxObject. | |
189 | ||
190 | Probably the only case when you might want to think about reference counting | |
191 | is when a string character is taken from a string which is not a constant (or | |
192 | a constant reference). In this case, due to C++ rules, the "read-only" | |
193 | @c operator[] (which is the same as wxString::GetChar()) cannot be chosen and | |
194 | the "read/write" @c operator[] (the same as wxString::GetWritableChar()) is | |
195 | used instead. As the call to this operator may modify the string, its data is | |
196 | unshared (COW is done) and so if the string was really shared there is some | |
197 | performance loss (both in terms of speed and memory consumption). In the rare | |
198 | cases when this may be important, you might prefer using wxString::GetChar() | |
199 | instead of the array subscript operator for this reason. Please note that the | |
200 | wxString::at() method has the same problem as the subscript operator in this | |
201 | situation and so using it is not really better. Also note that if all string | |
202 | arguments to your functions are passed as <tt>const wxString&</tt> (see the | |
203 | @ref overview_string_advice section) this situation will almost never arise | |
204 | because for constant references the correct operator is called automatically. | |
205 | ||
206 | ||
207 | @section overview_string_tuning Tuning wxString for Your Application | |
208 | ||
209 | @note This section is strictly about performance issues and is absolutely not | |
210 | necessary to read for using wxString class. Please skip it unless you feel | |
211 | familiar with profilers and related tools. If you do read it, please also | |
212 | read the preceding section about | |
213 | @ref overview_string_refcount "reference counting". | |
214 | ||
215 | For performance reasons wxString doesn't allocate exactly the amount of | |
216 | memory needed for each string. Instead, it adds a small amount of space to each | |
217 | allocated block which allows it to not reallocate memory (a relatively | |
218 | expensive operation) too often when, for example, a string is constructed by | |
219 | successively adding one character at a time to it, as in: | |
220 | ||
221 | @code | |
222 | // delete all vowels from the string | |
223 | wxString DeleteAllVowels(const wxString& original) | |
224 | { | |
225 | wxString result; | |
226 | ||
227 | size_t len = original.length(); | |
228 | for ( size_t n = 0; n < len; n++ ) | |
229 | { | |
230 | if ( strchr("aeuio", tolower(original[n])) == NULL ) | |
231 | result += original[n]; | |
232 | } | |
233 | ||
234 | return result; | |
235 | } | |
236 | @endcode | |
237 | ||
238 | This is quite a common situation and not allocating extra memory at all would | |
239 | lead to very bad performance in this case because there would be as many memory | |
240 | (re)allocations as there are consonants in the original string. Allocating too | |
241 | much extra memory would help to improve the speed in this situation but, due to | |
242 | the great number of wxString objects typically used in a program, would also | |
243 | increase the memory consumption too much. | |
244 | ||
245 | The very best solution in precisely this case would be to use the | |
246 | wxString::Alloc() function to preallocate, for example, len bytes from the | |
247 | beginning - this will lead to exactly one memory allocation being performed | |
248 | (because the result is at most as long as the original string). | |
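
The preallocation idea can be sketched as follows. To keep the example compilable without wxWidgets it is written against std::string, with @c reserve() assumed to play the role that wxString::Alloc() plays in the text; the function names are hypothetical.

```cpp
#include <cctype>
#include <string>

// Returns true for the five ASCII vowels, ignoring case.
static bool IsVowel(char ch)
{
    switch ( std::tolower(static_cast<unsigned char>(ch)) )
    {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            return true;
    }
    return false;
}

// Same algorithm as DeleteAllVowels() above, but on std::string and with the
// storage for the result reserved up front.
static std::string DeleteAllVowelsPrealloc(const std::string& original)
{
    std::string result;
    result.reserve(original.length()); // result can't be longer than the input

    for ( std::string::size_type n = 0; n < original.length(); n++ )
    {
        if ( !IsVowel(original[n]) )
            result += original[n];
    }

    return result;
}
```

Because the reserved capacity already covers the longest possible result, the appends inside the loop should not need to reallocate.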
249 | ||
250 | However, using wxString::Alloc() is tedious and so wxString tries to do its | |
251 | best. The default algorithm assumes that memory allocation is done in | |
252 | granularity of at least 16 bytes (which is the case on almost all | |
253 | widespread platforms) and so nothing is lost if the amount of memory to | |
254 | allocate is rounded up to the next multiple of 16. This way, no memory is lost | |
255 | and 15 iterations out of 16 in the example above won't allocate memory but use | |
256 | the already allocated pool. | |
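
The effect of rounding up to a multiple of 16 can be illustrated with a small simulation. This is only a sketch of the idea and not the actual code in string.cpp: the function names are hypothetical, and the terminating NUL and any bookkeeping data stored with the string are ignored.

```cpp
#include <cstddef>

// Hypothetical helper: rounds a requested size up to the next multiple of 16.
static std::size_t RoundUpTo16(std::size_t n)
{
    return (n + 15) / 16 * 16;
}

// Counts how many (re)allocations happen when a string grows one character
// at a time up to finalLen, assuming every allocation request is rounded up
// with RoundUpTo16().
static std::size_t CountAllocations(std::size_t finalLen)
{
    std::size_t capacity = 0;
    std::size_t allocations = 0;

    for ( std::size_t len = 1; len <= finalLen; len++ )
    {
        if ( len > capacity )
        {
            // the current block is full: allocate a bigger, rounded up one
            capacity = RoundUpTo16(len);
            allocations++;
        }
    }

    return allocations;
}
```

Under these assumptions, building a 100-character string one character at a time triggers 7 allocations rather than 100.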
257 | ||
258 | The default approach is quite conservative. Allocating more memory may bring | |
259 | important performance benefits for programs using (relatively) few very long | |
260 | strings. The amount of memory allocated is configured by the setting of | |
261 | @c EXTRA_ALLOC in the file string.cpp during compilation (be sure to understand | |
262 | why its default value is what it is before modifying it!). You may try setting | |
263 | it to a greater amount (say twice nLen) or to 0 (to see the performance | |
264 | degradation which will follow) and analyse its impact on your program. If you | |
265 | do it, you will probably find it helpful to also define the @c WXSTRING_STATISTICS | |
266 | symbol which tells the wxString class to collect performance statistics and to | |
267 | show them on stderr on program termination. This will show you the average | |
268 | length of strings your program manipulates, their average initial length and | |
269 | also the percentage of times when memory wasn't reallocated when string | |
270 | concatenation was done but the already preallocated memory was used (this value | |
271 | should be about 98% for the default allocation policy; if it is less than 90% | |
272 | you should really consider fine tuning wxString for your application). | |
273 | ||
274 | It goes without saying that a profiler should be used to measure the precise | |
275 | difference the change to @c EXTRA_ALLOC makes to your program. | |
276 | ||
277 | */ | |
278 |