/////////////////////////////////////////////////////////////////////////////
// Name:        string.h
// Purpose:     topic overview
// Author:      wxWidgets team
// RCS-ID:      $Id$
// Licence:     wxWindows license
/////////////////////////////////////////////////////////////////////////////

/**

@page overview_string wxString Overview

Classes: wxString, wxArrayString, wxStringTokenizer

@li @ref overview_string_intro
@li @ref overview_string_comparison
@li @ref overview_string_advice
@li @ref overview_string_related
@li @ref overview_string_refcount
@li @ref overview_string_tuning


<hr>


@section overview_string_intro Introduction

wxString is a class which represents a character string of arbitrary length
containing arbitrary characters. The ASCII NUL character is allowed, but be
aware that in the current string implementation some methods might not work
correctly in this case.

Since wxWidgets 3.0, wxString internally uses UCS-2 (basically a 2-byte
wchar_t per character) under Windows and UTF-8 under Unix, Linux and OS X to
store its content. Much work has been done to make existing code using ANSI
string literals work as before.

This class has all the standard operations you can expect to find in a string
class: dynamic memory management (the string grows to accommodate new
characters), construction from other strings, C strings, wide character C
strings and characters, assignment operators, access to individual characters,
string concatenation and comparison, substring extraction, case conversion,
trimming and padding (with spaces), searching and replacing, both C-like
@c printf (wxString::Printf) and stream-like insertion functions, and much
more. See wxString for a list of all functions.


@section overview_string_comparison Comparison to Other String Classes

The advantages of using a special string class instead of working directly with
C strings are so obvious that there is a huge number of such classes available.
The most important advantage is that you no longer need to remember to
allocate and free memory for C strings; working with fixed size buffers almost
inevitably leads to buffer overflows. C++ also finally has a standard string
class (std::string). So why the need for wxString? There are several
advantages:

@li <b>Efficiency:</b> This class was made to be as efficient as possible, both
    in terms of size (each wxString object takes exactly the same space as a
    <tt>char*</tt> pointer, see @ref overview_string_refcount
    "reference counting") and speed. It also provides performance
    @ref overview_string_tuning "statistics gathering code" which may be
    enabled to fine tune the memory allocation strategy for your particular
    application, and the gain might be quite big.
@li <b>Compatibility:</b> This class tries to combine almost full compatibility
    with the old wxWidgets 1.xx wxString class, some resemblance to the MFC
    CString class and 90% of the functionality of the std::string class.
@li <b>Rich set of functions:</b> Some of the functions present in wxString are
    very useful but don't exist in most other string classes: for example,
    wxString::AfterFirst, wxString::BeforeLast, wxString::operators or
    wxString::Printf. Of course, all the standard string operations are
    supported as well.
@li <b>Unicode:</b> wxString is Unicode friendly: it lets you easily convert to
    and from ANSI and Unicode strings in any build mode (see the
    @ref overview_unicode "unicode overview" for more details) and maps to
    either @c string or @c wstring transparently depending on the current mode.
@li <b>Used by wxWidgets:</b> And, of course, this class is used everywhere
    inside wxWidgets so there is no performance loss which would result from
    conversions of objects of any other string class (including std::string) to
    wxString internally by wxWidgets.

However, there are several problems as well. The most important one is probably
that there are often several functions to do exactly the same thing: for
example, to get the length of the string any of @c length(),
wxString::Len() or wxString::Length() may be used. The first function, like
almost all the other functions in lowercase, is std::string compatible. The
second one is the "native" wxString version and the last one is the wxWidgets
1.xx way.

So which is better to use? The usage of the std::string compatible functions is
strongly advised! It will make your code more familiar to other C++
programmers (who are supposed to have knowledge of std::string but not of
wxString), let you reuse the same code in both wxWidgets and other programs (by
just typedefing wxString as std::string when used outside wxWidgets) and keep
your code compatible with future versions of wxWidgets which will probably
start using std::string sooner or later too.

In the situations where there is no corresponding std::string function, please
try to use the new wxString methods and not the old wxWidgets 1.xx variants
which are deprecated and may disappear in future versions.

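As a rough illustration of this advice, the following sketch sticks to the
std::string compatible subset (@c length(), @c rfind(), @c substr()) and hides
the concrete class behind a typedef. The names @c MyString and
@c FileExtension are hypothetical, and std::string is used only as a stand-in
so that the fragment compiles without wxWidgets; it assumes that the wxString
build in use offers the same lowercase methods.

```cpp
#include <cassert>
#include <string>

// Hypothetical alias: outside wxWidgets it names std::string; inside a
// wxWidgets program it could name wxString instead (an assumption made
// for this example only).
typedef std::string MyString;

// Returns the part of a file name after the last dot, or an empty string
// if there is no dot. Only std::string compatible calls are used.
MyString FileExtension(const MyString& name)
{
    const MyString::size_type dot = name.rfind('.');
    if ( dot == MyString::npos )
        return MyString();

    assert( dot < name.length() );
    return name.substr(dot + 1);
}
```

Because nothing in @c FileExtension depends on the concrete class, changing
the typedef is the only edit needed to move it between the two worlds.
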

@section overview_string_advice Advice About Using wxString

Probably the main trap with using this class is the implicit conversion
operator to <tt>const char*</tt>. It is advised that you use wxString::c_str()
instead to clearly indicate when the conversion is done. Specifically, the
danger of this implicit conversion may be seen in the following code fragment:

@code
// this function converts the input string to uppercase,
// outputs it to the screen and returns the result
const char *SayHELLO(const wxString& input)
{
    wxString output = input.Upper();
    printf("Hello, %s!\n", output);
    return output;
}
@endcode

There are two nasty bugs in these three lines. The first is in the call to the
@c printf() function. Although the implicit conversion to C strings is applied
automatically by the compiler in the case of

@code
puts(output);
@endcode

because the argument of @c puts() is known to be of the type
<tt>const char*</tt>, this is @b not done for @c printf() which is a function
with a variable number of arguments (and whose arguments are of unknown types).
So this call may do any number of things (including displaying the correct
string on screen), although the most likely result is a program crash. The
solution is to use wxString::c_str(). Just replace this line with this:

@code
printf("Hello, %s!\n", output.c_str());
@endcode

The second bug is that returning @c output doesn't work. The implicit cast is
used again, so the code compiles, but as it returns a pointer to a buffer
belonging to a local variable which is destroyed as soon as the function exits,
its contents are completely arbitrary. The solution to this problem is also
easy: just make the function return wxString instead of a C string.

This leads us to the following general advice: all functions taking string
arguments should take <tt>const wxString&</tt> (this makes assignment to the
strings inside the function faster because of
@ref overview_string_refcount "reference counting") and all functions returning
strings should return wxString, which makes it safe to return local variables.

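Putting both fixes together might look like the sketch below. To keep it
compilable without wxWidgets it uses std::string as a stand-in for wxString
and a hand-written uppercase loop in place of wxString::Upper(); the name
@c SayHello is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Converts the input to uppercase, prints it and returns the result BY VALUE,
// so the caller never sees a pointer into a destroyed local variable.
std::string SayHello(const std::string& input)
{
    std::string output = input;
    for ( std::string::size_type n = 0; n < output.length(); n++ )
    {
        // the cast avoids passing a negative char value to toupper()
        output[n] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(output[n])));
    }
    assert( output.length() == input.length() );

    // explicit c_str(): printf() cannot apply implicit conversions to its
    // variable arguments
    std::printf("Hello, %s!\n", output.c_str());

    return output;
}
```

The parameter is taken by const reference and the result is returned as a
string object, which is exactly the shape recommended above.
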

@section overview_string_related String Related Functions and Classes

As most programs use character strings, the standard C library provides quite
a few functions to work with them. Unfortunately, some of them have rather
counter-intuitive behaviour (like @c strncpy() which doesn't always terminate
the resulting string with a NUL character) and are in general not very safe
(passing @NULL to them will probably lead to a program crash). Moreover, some
very useful functions are not standard at all. This is why, in addition to all
the wxString functions, there are also a few global string functions which try
to correct these problems: wxIsEmpty() verifies whether the string is empty
(returning @true for @NULL pointers), wxStrlen() also handles @NULL correctly
and returns 0 for it, and wxStricmp() is just a platform-independent version of
the case-insensitive string comparison function known either as @c stricmp() or
@c strcasecmp() on different platforms.

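The behaviour described for wxIsEmpty() and wxStrlen() can be pictured with
the following plain C++ sketch. @c SafeIsEmpty and @c SafeStrlen are
hypothetical stand-ins written for this example, not the wxWidgets
implementations. The last function shows the @c strncpy() pitfall and one
common way to guard against it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical equivalent of wxIsEmpty(): true for NULL and for "".
bool SafeIsEmpty(const char *s)
{
    return s == NULL || *s == '\0';
}

// Hypothetical equivalent of wxStrlen(): 0 for NULL instead of a crash.
std::size_t SafeStrlen(const char *s)
{
    return s ? std::strlen(s) : 0;
}

// A plain strncpy(dst, src, size) leaves dst unterminated when src has
// size or more characters, so the terminating NUL is written explicitly.
void CopyTruncated(char *dst, const char *src, std::size_t size)
{
    assert( dst != NULL && src != NULL && size > 0 );

    std::strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}
```
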
The <tt>@<wx/string.h@></tt> header also defines the wxSnprintf and wxVsnprintf
functions which should be used instead of the inherently dangerous standard
@c sprintf(). Whenever possible they use @c snprintf(), which does buffer size
checks. Of course, you may also use wxString::Printf which is also safe.

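What the buffer size check buys can be seen with the standard @c snprintf()
itself. The sketch below uses only the C++11 standard library;
@c FormatGreeting is a hypothetical helper, and the wx functions are
deliberately not called here because their exact signatures are not given in
this overview.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats a greeting into buf without ever writing more than size bytes.
// Returns true if the whole text fitted, false if it had to be truncated.
bool FormatGreeting(char *buf, std::size_t size, const char *name)
{
    assert( buf != NULL && size > 0 && name != NULL );

    // snprintf() returns the length the full text would have had, so a
    // result >= size means that the output was cut short
    const int needed = std::snprintf(buf, size, "Hello, %s!", name);

    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}
```

With @c sprintf() the same call on a too small buffer would silently write
past its end; here the caller gets a truncated but terminated string and a
@false return value instead.
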
There is another class which might be useful when working with wxString:
wxStringTokenizer. It is helpful when a string must be broken into tokens and
replaces the standard C library @c strtok() function.

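For readers unfamiliar with the problems of @c strtok() (it modifies its input
and keeps hidden state between calls), the sketch below shows the general idea
of a tokenizer that avoids both. It is written against std::string with a
hypothetical @c SplitTokens function and does not reproduce the
wxStringTokenizer interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits text at any of the characters in delims, skipping empty tokens.
// The input is left untouched and no state survives between calls.
std::vector<std::string> SplitTokens(const std::string& text,
                                     const std::string& delims)
{
    assert( !delims.empty() );

    std::vector<std::string> tokens;

    std::string::size_type start = text.find_first_not_of(delims);
    while ( start != std::string::npos )
    {
        const std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));

        // when end is npos, the search below returns npos and the loop ends
        start = text.find_first_not_of(delims, end);
    }

    return tokens;
}
```
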
And the very last string-related class is wxArrayString: it is just a version
of the "template" dynamic array class which is specialized to work with
strings. Please note that this class is specially optimized (using its
knowledge of the internal structure of wxString) for storing strings and so it
is vastly better from a performance point of view than a wxObjectArray of
wxStrings.


@section overview_string_refcount Reference Counting and Why You Shouldn't Care

All considerations for wxObject-derived
@ref overview_refcount "reference counted" objects are also valid for wxString,
even though it does not derive from wxObject.

Probably the only case when you might want to think about reference counting
is when a character is read from a string which is not a constant (or
a constant reference). In this case, due to C++ rules, the "read-only"
@c operator[] (which is the same as wxString::GetChar()) cannot be chosen and
the "read/write" @c operator[] (the same as wxString::GetWritableChar()) is
used instead. As the call to this operator may modify the string, its data is
unshared (copy-on-write is performed) and so if the string was really shared
there is some performance loss (both in terms of speed and memory consumption).
In the rare cases when this may be important, you might prefer using
wxString::GetChar() instead of the array subscript operator for this reason.
Please note that the wxString::at() method has the same problem as the
subscript operator in this situation and so using it is not really better. Also
note that if all string arguments to your functions are passed as
<tt>const wxString&</tt> (see the @ref overview_string_advice section) this
situation will almost never arise because for constant references the correct
operator is called automatically.

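The effect of the two @c operator[] overloads can be reproduced with a
deliberately simplified copy-on-write class. @c CowString below is a
hypothetical teaching aid built on @c std::shared_ptr (C++11); it is not how
wxString is implemented and only shows why a non-const subscript has to
unshare the data while a const one does not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

class CowString
{
public:
    explicit CowString(const std::string& s)
        : m_data(std::make_shared<std::string>(s)) { }

    // read-only access: chosen for const objects and const references,
    // never copies
    char operator[](std::size_t n) const
    {
        assert( n < m_data->size() );
        return (*m_data)[n];
    }

    // read/write access: chosen for non-const objects even if the caller
    // only reads, so the data must be unshared first
    char& operator[](std::size_t n)
    {
        assert( n < m_data->size() );
        if ( m_data.use_count() > 1 )
            m_data = std::make_shared<std::string>(*m_data);
        return (*m_data)[n];
    }

    // true if both objects still point at the same buffer
    bool SharesDataWith(const CowString& other) const
    {
        return m_data == other.m_data;
    }

private:
    std::shared_ptr<std::string> m_data;
};
```

Reading a character through a const reference to a copy keeps the buffer
shared, whereas the same read on the non-const copy itself already triggers
the duplication, even though nothing is modified.
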

@section overview_string_tuning Tuning wxString for Your Application

@note This section is strictly about performance issues and is absolutely not
necessary to read for using the wxString class. Please skip it unless you feel
familiar with profilers and related tools. If you do read it, please also
read the preceding section about
@ref overview_string_refcount "reference counting".

For performance reasons wxString doesn't allocate exactly the amount of
memory needed for each string. Instead, it adds a small amount of space to each
allocated block which allows it to avoid reallocating memory (a relatively
expensive operation) too often, for example when a string is constructed by
successively adding one character at a time to it, as in:

@code
// delete all vowels from the string
wxString DeleteAllVowels(const wxString& original)
{
    wxString result;

    size_t len = original.length();
    for ( size_t n = 0; n < len; n++ )
    {
        if ( strchr("aeuio", tolower(original[n])) == NULL )
            result += original[n];
    }

    return result;
}
@endcode

This is quite a common situation and not allocating extra memory at all would
lead to very bad performance in this case because there would be as many memory
(re)allocations as there are consonants in the original string. Allocating too
much extra memory would help to improve the speed in this situation, but due to
the great number of wxString objects typically used in a program it would also
increase the memory consumption too much.

The very best solution in precisely this case would be to use the
wxString::Alloc() function to preallocate, for example, len bytes from the
beginning. This will lead to exactly one memory allocation being performed
(because the result is at most as long as the original string).

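A preallocating variant of the function above might look as follows. The
sketch uses std::string and its @c reserve() method as a stand-in so that it
compiles without wxWidgets; with wxString the corresponding call would be the
wxString::Alloc() discussed here. The name @c DeleteAllVowelsPrealloc is
hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>

std::string DeleteAllVowelsPrealloc(const std::string& original)
{
    const std::string::size_type len = original.length();

    std::string result;
    // the result can never be longer than the input, so reserving len
    // characters up front means the appends below never reallocate
    result.reserve(len);

    for ( std::string::size_type n = 0; n < len; n++ )
    {
        const unsigned char ch = static_cast<unsigned char>(original[n]);

        // strchr() would also match the terminating NUL, hence the extra test
        if ( ch == '\0' || std::strchr("aeuio", std::tolower(ch)) == NULL )
            result += original[n];
    }

    assert( result.length() <= len );
    return result;
}
```
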
However, using wxString::Alloc() is tedious and so wxString tries to do its
best. The default algorithm assumes that memory allocation is done in
granularity of at least 16 bytes (which is the case on almost all
widespread platforms) and so nothing is lost if the amount of memory to
allocate is rounded up to the next multiple of 16. This way, no memory is lost
and 15 iterations out of 16 in the example above won't allocate memory but use
the already allocated pool.

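The arithmetic behind this claim can be checked with a few lines of code. The
sketch below illustrates the rounding rule as described here, using
hypothetical names (@c RoundUpTo16, @c CountAllocations); it is not taken from
string.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a requested size up to the next multiple of 16
// (sizes that already are multiples of 16 are returned unchanged).
std::size_t RoundUpTo16(std::size_t size)
{
    return (size + 15) & ~static_cast<std::size_t>(15);
}

// Simulates appending one byte at a time and counts how often the
// capacity has to grow when every request is rounded up to 16 bytes.
std::size_t CountAllocations(std::size_t finalLength)
{
    std::size_t capacity = 0;
    std::size_t allocations = 0;

    for ( std::size_t length = 1; length <= finalLength; length++ )
    {
        if ( length > capacity )
        {
            capacity = RoundUpTo16(length);
            allocations++;
        }
    }

    assert( capacity >= finalLength );
    return allocations;
}
```

For a 160-byte string built one byte at a time, @c CountAllocations(160)
yields 10, i.e. only one append in 16 has to allocate.
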
The default approach is quite conservative. Allocating more memory may bring
important performance benefits for programs using (relatively) few very long
strings. The amount of memory allocated is configured by the setting of
@c EXTRA_ALLOC in the file string.cpp during compilation (be sure to understand
why its default value is what it is before modifying it!). You may try setting
it to a greater amount (say twice nLen) or to 0 (to see the performance
degradation which will follow) and analyse the impact of it on your program. If
you do it, you will probably find it helpful to also define the
@c WXSTRING_STATISTICS symbol which tells the wxString class to collect
performance statistics and to show them on stderr on program termination. This
will show you the average length of strings your program manipulates, their
average initial length and also the percentage of times when memory wasn't
reallocated when string concatenation was done but the already preallocated
memory was used (this value should be about 98% for the default allocation
policy; if it is less than 90% you should really consider fine tuning wxString
for your application).

It goes without saying that a profiler should be used to measure the precise
difference the change to @c EXTRA_ALLOC makes to your program.

*/
