X-Git-Url: https://git.saurik.com/wxWidgets.git/blobdiff_plain/36c9828f702fb504b07968703bcd82f04196070a..fbfe2d4e7caa118e1b609151bc72e7e5c7ac0f32:/docs/doxygen/overviews/string.h diff --git a/docs/doxygen/overviews/string.h b/docs/doxygen/overviews/string.h index a8f67d014a..42247d28d4 100644 --- a/docs/doxygen/overviews/string.h +++ b/docs/doxygen/overviews/string.h @@ -1,262 +1,248 @@ ///////////////////////////////////////////////////////////////////////////// -// Name: string +// Name: string.h // Purpose: topic overview // Author: wxWidgets team // RCS-ID: $Id$ // Licence: wxWindows license ///////////////////////////////////////////////////////////////////////////// -/*! - - @page string_overview wxString overview - - Classes: #wxString, #wxArrayString, #wxStringTokenizer - #Introduction - @ref otherstringclasses_overview - @ref stringadvices_overview - @ref relatedtostring_overview - @ref stringrefcount_overview - @ref stringtuning_overview - - - @section introductiontowxstring Introduction - - wxString is a class which represents a character string of arbitrary length (limited by - @e MAX_INT which is usually 2147483647 on 32 bit machines) and containing - arbitrary characters. The ASCII NUL character is allowed, but be aware that - in the current string implementation some methods might not work correctly - in this case. - wxString works with both ASCII (traditional, 7 or 8 bit, characters) as well as - Unicode (wide characters) strings. - This class has all the standard operations you can expect to find in a string class: - dynamic memory management (string extends to accommodate new characters), - construction from other strings, C strings and characters, assignment operators, - access to individual characters, string concatenation and comparison, substring - extraction, case conversion, trimming and padding (with spaces), searching and - replacing and both C-like #Printf() and stream-like - insertion functions as well as much more - see #wxString - for a list of all functions. - - @section otherstringclasses Comparison of wxString to other string classes - - The advantages of using a special string class instead of working directly with - C strings are so obvious that there is a huge number of such classes available. - The most important advantage is the need to always - remember to allocate/free memory for C strings; working with fixed size buffers almost - inevitably leads to buffer overflows. At last, C++ has a standard string class - (std::string). So why the need for wxString? - There are several advantages: - - - @b Efficiency This class was made to be as efficient as possible: both - in terms of size (each wxString objects takes exactly the same space as a @e char * pointer, sing @ref stringrefcount_overview) and speed. - It also provides performance @ref stringtuning_overview - which may be enabled to fine tune the memory allocation strategy for your - particular application - and the gain might be quite big. - @b Compatibility This class tries to combine almost full compatibility - with the old wxWidgets 1.xx wxString class, some reminiscence to MFC CString - class and 90% of the functionality of std::string class. - @b Rich set of functions Some of the functions present in wxString are - very useful but don't exist in most of other string classes: for example, - #AfterFirst, - #BeforeLast, #operator - or #Printf. Of course, all the standard string - operations are supported as well. - @b Unicode wxString is Unicode friendly: it allows to easily convert - to and from ANSI and Unicode strings in any build mode (see the - @ref unicode_overview for more details) and maps to either - @c string or @c wstring transparently depending on the current mode. - @b Used by wxWidgets And, of course, this class is used everywhere - inside wxWidgets so there is no performance loss which would result from - conversions of objects of any other string class (including std::string) to - wxString internally by wxWidgets. - - - However, there are several problems as well. The most important one is probably - that there are often several functions to do exactly the same thing: for - example, to get the length of the string either one of - length(), #Len() or - #Length() may be used. The first function, as almost - all the other functions in lowercase, is std::string compatible. The second one - is "native" wxString version and the last one is wxWidgets 1.xx way. So the - question is: which one is better to use? And the answer is that: - @b The usage of std::string compatible functions is strongly advised! It will - both make your code more familiar to other C++ programmers (who are supposed to - have knowledge of std::string but not of wxString), let you reuse the same code - in both wxWidgets and other programs (by just typedefing wxString as std::string - when used outside wxWidgets) and by staying compatible with future versions of - wxWidgets which will probably start using std::string sooner or later too. - In the situations where there is no corresponding std::string function, please - try to use the new wxString methods and not the old wxWidgets 1.xx variants - which are deprecated and may disappear in future versions. - - @section wxstringadvices Some advice about using wxString - - Probably the main trap with using this class is the implicit conversion operator to - @e const char *. It is advised that you use #c_str() - instead to clearly indicate when the conversion is done. Specifically, the - danger of this implicit conversion may be seen in the following code fragment: - - @code - // this function converts the input string to uppercase, output it to the screen - // and returns the result - const char *SayHELLO(const wxString& input) - { - wxString output = input.Upper(); - - printf("Hello, %s!\n", output); - - return output; - } - @endcode - - There are two nasty bugs in these three lines. First of them is in the call to the - @e printf() function. Although the implicit conversion to C strings is applied - automatically by the compiler in the case of - - @code - puts(output); - @endcode - - because the argument of @e puts() is known to be of the type @e const char *, - this is @b not done for @e printf() which is a function with variable - number of arguments (and whose arguments are of unknown types). So this call may - do anything at all (including displaying the correct string on screen), although - the most likely result is a program crash. The solution is to use - #c_str(): just replace this line with - - @code - printf("Hello, %s!\n", output.c_str()); - @endcode - - The second bug is that returning @e output doesn't work. The implicit cast is - used again, so the code compiles, but as it returns a pointer to a buffer - belonging to a local variable which is deleted as soon as the function exits, - its contents is totally arbitrary. The solution to this problem is also easy: - just make the function return wxString instead of a C string. - This leads us to the following general advice: all functions taking string - arguments should take @e const wxString (this makes assignment to the - strings inside the function faster because of - @ref stringrefcount_overview) and all functions returning - strings should return @e wxString - this makes it safe to return local - variables. - - @section relatedtostring Other string related functions and classes - - As most programs use character strings, the standard C library provides quite - a few functions to work with them. Unfortunately, some of them have rather - counter-intuitive behaviour (like strncpy() which doesn't always terminate the - resulting string with a @NULL) and are in general not very safe (passing @NULL - to them will probably lead to program crash). Moreover, some very useful - functions are not standard at all. This is why in addition to all wxString - functions, there are also a few global string functions which try to correct - these problems: #wxIsEmpty() verifies whether the string - is empty (returning @true for @NULL pointers), - #wxStrlen() also handles @NULLs correctly and returns 0 for - them and #wxStricmp() is just a platform-independent - version of case-insensitive string comparison function known either as - stricmp() or strcasecmp() on different platforms. - The @c wx/string.h header also defines #wxSnprintf - and #wxVsnprintf functions which should be used instead - of the inherently dangerous standard @c sprintf() and which use @c snprintf() instead which does buffer size checks whenever possible. Of - course, you may also use wxString::Printf which is - also safe. - There is another class which might be useful when working with wxString: - #wxStringTokenizer. It is helpful when a string must - be broken into tokens and replaces the standard C library @e strtok() function. - And the very last string-related class is #wxArrayString: it - is just a version of the "template" dynamic array class which is specialized to work - with strings. Please note that this class is specially optimized (using its - knowledge of the internal structure of wxString) for storing strings and so it is - vastly better from a performance point of view than a wxObjectArray of wxStrings. - - @section wxstringrefcount Reference counting and why you shouldn't care about it - - All considerations for wxObject-derived @ref trefcount_overview objects - are valid also for wxString, even if it does not derive from wxObject. - Probably the unique case when you might want to think about reference - counting is when a string character is taken from a string which is not a - constant (or a constant reference). In this case, due to C++ rules, the - "read-only" @e operator[] (which is the same as - #GetChar()) cannot be chosen and the "read/write" - @e operator[] (the same as - #GetWritableChar()) is used instead. As the - call to this operator may modify the string, its data is unshared (COW is done) - and so if the string was really shared there is some performance loss (both in - terms of speed and memory consumption). In the rare cases when this may be - important, you might prefer using #GetChar() instead - of the array subscript operator for this reasons. Please note that - #at() method has the same problem as the subscript operator in - this situation and so using it is not really better. Also note that if all - string arguments to your functions are passed as @e const wxString (see the - section @ref stringadvices_overview) this situation will almost - never arise because for constant references the correct operator is called automatically. - - @section wxstringtuning Tuning wxString for your application - - - @b Note: this section is strictly about performance issues and is - absolutely not necessary to read for using wxString class. Please skip it unless - you feel familiar with profilers and relative tools. If you do read it, please - also read the preceding section about - @ref stringrefcount_overview. - - For the performance reasons wxString doesn't allocate exactly the amount of - memory needed for each string. Instead, it adds a small amount of space to each - allocated block which allows it to not reallocate memory (a relatively - expensive operation) too often as when, for example, a string is constructed by - subsequently adding one character at a time to it, as for example in: - - @code - // delete all vowels from the string - wxString DeleteAllVowels(const wxString& original) - { - wxString result; - - size_t len = original.length(); - for ( size_t n = 0; n len; n++ ) - { - if ( strchr("aeuio", tolower(original[n])) == @NULL ) - result += original[n]; - } - - return result; - } - @endcode - - This is quite a common situation and not allocating extra memory at all would - lead to very bad performance in this case because there would be as many memory - (re)allocations as there are consonants in the original string. Allocating too - much extra memory would help to improve the speed in this situation, but due to - a great number of wxString objects typically used in a program would also - increase the memory consumption too much. - The very best solution in precisely this case would be to use - #Alloc() function to preallocate, for example, len bytes - from the beginning - this will lead to exactly one memory allocation being - performed (because the result is at most as long as the original string). - However, using Alloc() is tedious and so wxString tries to do its best. The - default algorithm assumes that memory allocation is done in granularity of at - least 16 bytes (which is the case on almost all of wide-spread platforms) and so - nothing is lost if the amount of memory to allocate is rounded up to the next - multiple of 16. Like this, no memory is lost and 15 iterations from 16 in the - example above won't allocate memory but use the already allocated pool. - The default approach is quite conservative. Allocating more memory may bring - important performance benefits for programs using (relatively) few very long - strings. The amount of memory allocated is configured by the setting of @e EXTRA_ALLOC in the file string.cpp during compilation (be sure to understand - why its default value is what it is before modifying it!). You may try setting - it to greater amount (say twice nLen) or to 0 (to see performance degradation - which will follow) and analyse the impact of it on your program. If you do it, - you will probably find it helpful to also define WXSTRING_STATISTICS symbol - which tells the wxString class to collect performance statistics and to show - them on stderr on program termination. This will show you the average length of - strings your program manipulates, their average initial length and also the - percent of times when memory wasn't reallocated when string concatenation was - done but the already preallocated memory was used (this value should be about - 98% for the default allocation policy, if it is less than 90% you should - really consider fine tuning wxString for your application). - It goes without saying that a profiler should be used to measure the precise - difference the change to EXTRA_ALLOC makes to your program. - - */ - +/** + +@page overview_string wxString Overview + +Classes: wxString, wxArrayString, wxStringTokenizer + +@li @ref overview_string_intro +@li @ref overview_string_comparison +@li @ref overview_string_advice +@li @ref overview_string_related +@li @ref overview_string_tuning + + +