X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/bd0df01f3f59940fdbca7a3472c256be2d034ab9..5c5428f9132822d5d39b02c46dd89464adff9f1f:/docs/latex/wx/tstring.tex diff --git a/docs/latex/wx/tstring.tex b/docs/latex/wx/tstring.tex index 41b70d34bf..dce111c7cb 100644 --- a/docs/latex/wx/tstring.tex +++ b/docs/latex/wx/tstring.tex @@ -1,6 +1,270 @@ \section{wxString overview}\label{wxstringoverview} -Classes: \helpref{wxString}{wxstring} +Classes: \helpref{wxString}{wxstring}, \helpref{wxArrayString}{wxarraystring}, \helpref{wxStringTokenizer}{wxstringtokenizer} -TODO. +\subsection{Introduction} + +wxString is a class which represents a character string of arbitrary length (limited by +{\it MAX\_INT} which is usually 2147483647 on 32 bit machines) and containing +arbitrary characters. The ASCII NUL character is allowed, although care should be +taken when passing strings containing it to other functions. + +wxString works with both ASCII (traditional, 7 or 8 bit, characters) as well as +Unicode (wide characters) strings. + +This class has all the standard operations you can expect to find in a string class: +dynamic memory management (string extends to accommodate new characters), +construction from other strings, C strings and characters, assignment operators, +access to individual characters, string concatenation and comparison, substring +extraction, case conversion, trimming and padding (with spaces), searching and +replacing and both C-like \helpref{Printf()}{wxstringprintf} and stream-like +insertion functions as well as much more - see \helpref{wxString}{wxstring} +for a list of all functions. + +\subsection{Comparison of wxString to other string classes} + +The advantages of using a special string class instead of working directly with +C strings are so obvious that there is a huge number of such classes available. +The most important advantage is the need to always +remember to allocate/free memory for C strings; working with fixed size buffers almost +inevitably leads to buffer overflows. At last, C++ has a standard string class +(std::string). So why the need for wxString? + +There are several advantages: + +\begin{enumerate}\itemsep=0pt +\item {\bf Efficiency} This class was made to be as efficient as possible: both +in terms of size (each wxString objects takes exactly the same space as a {\it +char *} pointer, sing \helpref{reference counting}{wxstringrefcount}) and speed. +It also provides performance \helpref{statistics gathering code}{wxstringtuning} +which may be enabled to fine tune the memory allocation strategy for your +particular application - and the gain might be quite big. +\item {\bf Compatibility} This class tries to combine almost full compatibility +with the old wxWindows 1.xx wxString class, some reminiscence to MFC CString +class and 90\% of the functionality of std::string class. +\item {\bf Rich set of functions} Some of the functions present in wxString are +very useful but don't exist in most of other string classes: for example, +\helpref{AfterFirst}{wxstringafterfirst}, +\helpref{BeforeLast}{wxstringbeforelast}, \helpref{operator<<}{wxstringoperatorout} +or \helpref{Printf}{wxstringprintf}. Of course, all the standard string +operations are supported as well. +\item {\bf Unicode} wxString is Unicode friendly: it allows to easily convert +to and from ANSI and Unicode strings in any build mode (see the +\helpref{Unicode overview}{unicode} for more details) and maps to either +{\tt string} or {\tt wstring} transparently depending on the current mode. +\item {\bf Used by wxWindows} And, of course, this class is used everywhere +inside wxWindows so there is no performance loss which would result from +conversions of objects of any other string class (including std::string) to +wxString internally by wxWindows. +\end{enumerate} + +However, there are several problems as well. The most important one is probably +that there are often several functions to do exactly the same thing: for +example, to get the length of the string either one of +length(), \helpref{Len()}{wxstringlen} or +\helpref{Length()}{wxstringlength} may be used. The first function, as almost +all the other functions in lowercase, is std::string compatible. The second one +is "native" wxString version and the last one is wxWindows 1.xx way. So the +question is: which one is better to use? And the answer is that: + +{\bf The usage of std::string compatible functions is strongly advised!} It will +both make your code more familiar to other C++ programmers (who are supposed to +have knowledge of std::string but not of wxString), let you reuse the same code +in both wxWindows and other programs (by just typedefing wxString as std::string +when used outside wxWindows) and by staying compatible with future versions of +wxWindows which will probably start using std::string sooner or later too. + +In the situations where there is no corresponding std::string function, please +try to use the new wxString methods and not the old wxWindows 1.xx variants +which are deprecated and may disappear in future versions. + +\subsection{Some advice about using wxString}\label{wxstringadvices} + +Probably the main trap with using this class is the implicit conversion operator to +{\it const char *}. It is advised that you use \helpref{c\_str()}{wxstringcstr} +instead to clearly indicate when the conversion is done. Specifically, the +danger of this implicit conversion may be seen in the following code fragment: + +\begin{verbatim} +// this function converts the input string to uppercase, output it to the screen +// and returns the result +const char *SayHELLO(const wxString& input) +{ + wxString output = input.Upper(); + + printf("Hello, %s!\n", output); + + return output; +} +\end{verbatim} + +There are two nasty bugs in these three lines. First of them is in the call to the +{\it printf()} function. Although the implicit conversion to C strings is applied +automatically by the compiler in the case of + +\begin{verbatim} + puts(output); +\end{verbatim} + +because the argument of {\it puts()} is known to be of the type {\it const char *}, +this is {\bf not} done for {\it printf()} which is a function with variable +number of arguments (and whose arguments are of unknown types). So this call may +do anything at all (including displaying the correct string on screen), although +the most likely result is a program crash. The solution is to use +\helpref{c\_str()}{wxstringcstr}: just replace this line with + +\begin{verbatim} + printf("Hello, %s!\n", output.c_str()); +\end{verbatim} + +The second bug is that returning {\it output} doesn't work. The implicit cast is +used again, so the code compiles, but as it returns a pointer to a buffer +belonging to a local variable which is deleted as soon as the function exits, +its contents is totally arbitrary. The solution to this problem is also easy: +just make the function return wxString instead of a C string. + +This leads us to the following general advice: all functions taking string +arguments should take {\it const wxString\&} (this makes assignment to the +strings inside the function faster because of +\helpref{reference counting}{wxstringrefcount}) and all functions returning +strings should return {\it wxString} - this makes it safe to return local +variables. + +\subsection{Other string related functions and classes} + +As most programs use character strings, the standard C library provides quite +a few functions to work with them. Unfortunately, some of them have rather +counter-intuitive behaviour (like strncpy() which doesn't always terminate the +resulting string with a NULL) and are in general not very safe (passing NULL +to them will probably lead to program crash). Moreover, some very useful +functions are not standard at all. This is why in addition to all wxString +functions, there are also a few global string functions which try to correct +these problems: \helpref{wxIsEmpty()}{wxisempty} verifies whether the string +is empty (returning {\tt true} for {\tt NULL} pointers), +\helpref{wxStrlen()}{wxstrlen} also handles NULLs correctly and returns 0 for +them and \helpref{wxStricmp()}{wxstricmp} is just a platform-independent +version of case-insensitive string comparison function known either as +stricmp() or strcasecmp() on different platforms. + +The {\tt } header also defines \helpref{wxSnprintf}{wxsnprintf} +and \helpref{wxVsnprintf}{wxvsnprintf} functions which should be used instead +of the inherently dangerous standard {\tt sprintf()} and which use {\tt +snprintf()} instead which does buffer size checks whenever possible. Of +course, you may also use \helpref{wxString::Printf}{wxstringprintf} which is +also safe. + +There is another class which might be useful when working with wxString: +\helpref{wxStringTokenizer}{wxstringtokenizer}. It is helpful when a string must +be broken into tokens and replaces the standard C library {\it +strtok()} function. + +And the very last string-related class is \helpref{wxArrayString}{wxarraystring}: it +is just a version of the "template" dynamic array class which is specialized to work +with strings. Please note that this class is specially optimized (using its +knowledge of the internal structure of wxString) for storing strings and so it is +vastly better from a performance point of view than a wxObjectArray of wxStrings. + +\subsection{Reference counting and why you shouldn't care about it}\label{wxstringrefcount} + +wxString objects use a technique known as {\it copy on write} (COW). This means +that when a string is assigned to another, no copying really takes place: only +the reference count on the shared string data is incremented and both strings +share the same data. + +But as soon as one of the two (or more) strings is modified, the data has to be +copied because the changes to one of the strings shouldn't be seen in the +others. As data copying only happens when the string is written to, this is +known as COW. + +What is important to understand is that all this happens absolutely +transparently to the class users and that whether a string is shared or not is +not seen from the outside of the class - in any case, the result of any +operation on it is the same. + +Probably the unique case when you might want to think about reference +counting is when a string character is taken from a string which is not a +constant (or a constant reference). In this case, due to C++ rules, the +"read-only" {\it operator[]} (which is the same as +\helpref{GetChar()}{wxstringgetchar}) cannot be chosen and the "read/write" +{\it operator[]} (the same as +\helpref{GetWritableChar()}{wxstringgetwritablechar}) is used instead. As the +call to this operator may modify the string, its data is unshared (COW is done) +and so if the string was really shared there is some performance loss (both in +terms of speed and memory consumption). In the rare cases when this may be +important, you might prefer using \helpref{GetChar()}{wxstringgetchar} instead +of the array subscript operator for this reasons. Please note that +\helpref{at()}{wxstringat} method has the same problem as the subscript operator in +this situation and so using it is not really better. Also note that if all +string arguments to your functions are passed as {\it const wxString\&} (see the +section \helpref{Some advice}{wxstringadvices}) this situation will almost +never arise because for constant references the correct operator is called automatically. + +\subsection{Tuning wxString for your application}\label{wxstringtuning} + +\normalbox{{\bf Note:} this section is strictly about performance issues and is +absolutely not necessary to read for using wxString class. Please skip it unless +you feel familiar with profilers and relative tools. If you do read it, please +also read the preceding section about +\helpref{reference counting}{wxstringrefcount}.} + +For the performance reasons wxString doesn't allocate exactly the amount of +memory needed for each string. Instead, it adds a small amount of space to each +allocated block which allows it to not reallocate memory (a relatively +expensive operation) too often as when, for example, a string is constructed by +subsequently adding one character at a time to it, as for example in: + +\begin{verbatim} +// delete all vowels from the string +wxString DeleteAllVowels(const wxString& original) +{ + wxString result; + + size_t len = original.length(); + for ( size_t n = 0; n < len; n++ ) + { + if ( strchr("aeuio", tolower(original[n])) == NULL ) + result += original[n]; + } + + return result; +} +\end{verbatim} + +This is quite a common situation and not allocating extra memory at all would +lead to very bad performance in this case because there would be as many memory +(re)allocations as there are consonants in the original string. Allocating too +much extra memory would help to improve the speed in this situation, but due to +a great number of wxString objects typically used in a program would also +increase the memory consumption too much. + +The very best solution in precisely this case would be to use +\helpref{Alloc()}{wxstringalloc} function to preallocate, for example, len bytes +from the beginning - this will lead to exactly one memory allocation being +performed (because the result is at most as long as the original string). + +However, using Alloc() is tedious and so wxString tries to do its best. The +default algorithm assumes that memory allocation is done in granularity of at +least 16 bytes (which is the case on almost all of wide-spread platforms) and so +nothing is lost if the amount of memory to allocate is rounded up to the next +multiple of 16. Like this, no memory is lost and 15 iterations from 16 in the +example above won't allocate memory but use the already allocated pool. + +The default approach is quite conservative. Allocating more memory may bring +important performance benefits for programs using (relatively) few very long +strings. The amount of memory allocated is configured by the setting of {\it +EXTRA\_ALLOC} in the file string.cpp during compilation (be sure to understand +why its default value is what it is before modifying it!). You may try setting +it to greater amount (say twice nLen) or to 0 (to see performance degradation +which will follow) and analyse the impact of it on your program. If you do it, +you will probably find it helpful to also define WXSTRING\_STATISTICS symbol +which tells the wxString class to collect performance statistics and to show +them on stderr on program termination. This will show you the average length of +strings your program manipulates, their average initial length and also the +percent of times when memory wasn't reallocated when string concatenation was +done but the already preallocated memory was used (this value should be about +98\% for the default allocation policy, if it is less than 90\% you should +really consider fine tuning wxString for your application). + +It goes without saying that a profiler should be used to measure the precise +difference the change to EXTRA\_ALLOC makes to your program.