From: Bryan Petty Date: Mon, 25 Feb 2008 10:50:43 +0000 (+0000) Subject: Doxygen topic overview cleanups. X-Git-Url: https://git.saurik.com/wxWidgets.git/commitdiff_plain/728449503cca147635cb0b33bfcabf5d66268f5c Doxygen topic overview cleanups. git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@52088 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775 --- diff --git a/docs/doxygen/mainpages/topics.h b/docs/doxygen/mainpages/topics.h index 8c1d03154e..3c8188244b 100644 --- a/docs/doxygen/mainpages/topics.h +++ b/docs/doxygen/mainpages/topics.h @@ -37,7 +37,7 @@ @li @subpage overview_backwardcompat @li @subpage overview_runtimeclass - @li @subpage overview_trefcount + @li @subpage overview_refcount @li @subpage overview_app @li @subpage overview_unicode @li @subpage overview_mbconvclasses @@ -63,7 +63,7 @@ @li @subpage overview_thread @li @subpage overview_config @li @subpage overview_fs - @li @subpage overview_resyn + @li @subpage overview_resyntax @li @subpage overview_arc @li @subpage overview_ipc diff --git a/docs/doxygen/overviews/refcount.h b/docs/doxygen/overviews/refcount.h index 08f23649d6..5feb42767e 100644 --- a/docs/doxygen/overviews/refcount.h +++ b/docs/doxygen/overviews/refcount.h @@ -1,5 +1,5 @@ ///////////////////////////////////////////////////////////////////////////// -// Name: trefcount +// Name: refcount.h // Purpose: topic overview // Author: wxWidgets team // RCS-ID: $Id$ @@ -8,114 +8,117 @@ /*! - @page trefcount_overview Reference counting +@page overview_refcount Reference Counting - @ref refcount_overview - @ref refcountequality_overview - @ref refcountdestruct_overview - @ref refcountlist_overview - @ref object_overview +@li @ref overview_refcount_ignore +@li @ref overview_refcount_equality +@li @ref overview_refcount_destruct +@li @ref overview_refcount_list +@li @ref overview_refcount_object - @section refcount Why you shouldn't care about it +
- Many wxWidgets objects use a technique known as @e reference counting, also known - as @e copy on write (COW). - This means that when an object is assigned to another, no copying really takes place: - only the reference count on the shared object data is incremented and both objects - share the same data (a very fast operation). - But as soon as one of the two (or more) objects is modified, the data has to be - copied because the changes to one of the objects shouldn't be seen in the - others. As data copying only happens when the object is written to, this is - known as COW. - What is important to understand is that all this happens absolutely - transparently to the class users and that whether an object is shared or not - is not seen from the outside of the class - in any case, the result of any - operation on it is the same. - @section refcountequality Object comparison +@section overview_refcount_ignore Why You Shouldn't Care About It - The == and != operators of @ref refcountlist_overview - always do a @c deep comparison. - This means that the equality operator will return @true if two objects are - identic and not only if they share the same data. - Note that wxWidgets follows the @e STL philosophy: when a comparison operator cannot - be implemented efficiently (like for e.g. wxImage's == operator which would need to - compare pixel-by-pixel the entire image's data), it's not implemented at all. - That's why not all reference-counted wxWidgets classes provide comparison operators. - Also note that if you only need to do a @c shallow comparison between two - #wxObject-derived classes, you should not use the == and != operators - but rather the wxObject::IsSameAs function. +Many wxWidgets objects use a technique known as reference counting, +also known as copy on write (COW). This means that when an object is +assigned to another, no copying really takes place. 
Only the reference count on +the shared object data is incremented and both objects share the same data (a +very fast operation). +But as soon as one of the two (or more) objects is modified, the data has to be +copied because the changes to one of the objects shouldn't be seen in the +others. As data copying only happens when the object is written to, this is +known as COW. - @section refcountdestruct Object destruction +What is important to understand is that all this happens absolutely +transparently to the class users and that whether an object is shared or not is +not seen from the outside of the class - in any case, the result of any +operation on it is the same. - When a COW object destructor is called, it may not delete the data: if it's shared, - the destructor will just decrement the shared data's reference count without destroying it. - Only when the destructor of the last object owning the data is called, the data is really - destroyed. As for all other COW-things, this happens transparently to the class users so - that you shouldn't care about it. +@section overview_refcount_equality Object Comparison - @section refcountlist List of reference-counted wxWidgets classes +The == and != operators of the reference counted classes always do a @c deep +comparison. This means that the equality operator will return @true if two +objects are identical and not only if they share the same data. - The following classes in wxWidgets have efficient (i.e. fast) assignment operators - and copy constructors since they are reference-counted: - #wxAcceleratorTable +Note that wxWidgets follows the STL philosophy: when a comparison +operator can not be implemented efficiently (like for e.g. wxImage's == +operator which would need to compare the entire image's data, pixel-by-pixel), +it's not implemented at all. That's why not all reference counted classes +provide comparison operators. 
- #wxAnimation +Also note that if you only need to do a @c shallow comparison between two +#wxObject derived classes, you should not use the == and != operators but +rather the wxObject::IsSameAs function. - #wxBitmap - #wxBrush +@section overview_refcount_destruct Object Destruction - #wxCursor +When a COW object destructor is called, it may not delete the data: if it's +shared, the destructor will just decrement the shared data's reference count +without destroying it. Only when the destructor of the last object owning the +data is called, the data is really destroyed. Just like all other COW-things, +this happens transparently to the class users so that you shouldn't care about +it. - #wxFont - #wxIcon +@section overview_refcount_list List of Reference Counted Classes - #wxImage +The following classes in wxWidgets have efficient (i.e. fast) assignment +operators and copy constructors since they are reference-counted: - #wxMetafile +@li #wxAcceleratorTable +@li #wxAnimation +@li #wxBitmap +@li #wxBrush +@li #wxCursor +@li #wxFont +@li #wxIcon +@li #wxImage +@li #wxMetafile +@li #wxPalette +@li #wxPen +@li #wxRegion +@li #wxString +@li #wxVariant +@li #wxVariantData - #wxPalette +Note that the list above reports the objects which are reference counted in all +ports of wxWidgets; some ports may use this technique also for other classes. - #wxPen - #wxRegion +@section overview_refcount_object Making Your Own Reference Counted Class - #wxString +Reference counting can be implemented easily using #wxObject and +#wxObjectRefData classes. Alternatively, you can also use the +#wxObjectDataPtr template. - #wxVariant +First, derive a new class from #wxObjectRefData and put there the +memory-consuming data. - #wxVariantData - Note that the list above reports the objects which are reference-counted in all ports of - wxWidgets; some ports may use this tecnique also for other classes. 
+Then derive a new class from #wxObject and implement there the public interface +which will be seen by the user of your class. You'll probably want to add a +function to your class which does the cast from #wxObjectRefData to your +class-specific shared data. For example: - @section wxobjectoverview Make your own reference-counted class +@code +MyClassRefData* GetData() const +{ + return wx_static_cast(MyClassRefData*, m_refData); +} +@endcode - Reference counting can be implemented easily using #wxObject - and #wxObjectRefData classes. Alternatively, you - can also use the #wxObjectDataPtrT template. - First, derive a new class from #wxObjectRefData and - put there the memory-consuming data. - Then derive a new class from #wxObject and implement there - the public interface which will be seen by the user of your class. - You'll probably want to add a function to your class which does the cast from - #wxObjectRefData to your class-specific shared data; e.g.: +In fact, any time you need to read the data from your wxObject-derived class, +you will need to call this function. - @code - MyClassRefData *GetData() const { return wx_static_cast(MyClassRefData*, m_refData); } - @endcode - - in fact, all times you'll need to read the data from your wxObject-derived class, - you'll need to call such function. - Very important, all times you need to actually modify the data placed inside your - wxObject-derived class, you must first call the wxObject::UnShare - function to be sure that the modifications won't affect other instances which are - eventually sharing your object's data. - - */ +@note Any time you need to actually modify the data placed inside your wxObject +derived class, you must first call the wxObject::UnShare function to ensure +that the modifications won't affect other instances which are eventually +sharing your object's data. 
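As a rough illustration of the copy-on-write behaviour described in the refcount overview, the sketch below mimics the wxObject / wxObjectRefData split using only the C++ standard library. It is not the wxWidgets implementation: the names TextData and CowText are hypothetical, and std::shared_ptr stands in for the reference count that wxObject maintains internally. The private UnShare() helper only plays the role that wxObject::UnShare plays in the text.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared payload, standing in for a wxObjectRefData subclass.
struct TextData
{
    std::string text;
};

// Hypothetical handle class, standing in for a wxObject subclass.
// Copying a CowText copies only the pointer, so assignment is cheap.
class CowText
{
public:
    explicit CowText(std::string s)
        : m_data(std::make_shared<TextData>(TextData{std::move(s)}))
    {
    }

    // Reading never copies the shared data.
    const std::string& Get() const { return m_data->text; }

    // Writing detaches from other owners first, then modifies.
    void Set(std::string s)
    {
        UnShare();
        m_data->text = std::move(s);
    }

    // Shallow comparison: true only if both handles share the same data.
    bool IsSameAs(const CowText& other) const { return m_data == other.m_data; }

    // Deep comparison: true if the contents are equal, shared or not.
    friend bool operator==(const CowText& a, const CowText& b)
    {
        return a.m_data->text == b.m_data->text;
    }

private:
    // Make a private copy of the data if any other handle still shares it.
    void UnShare()
    {
        if (m_data.use_count() > 1)
            m_data = std::make_shared<TextData>(*m_data);
    }

    std::shared_ptr<TextData> m_data;
};
```

With this sketch, `CowText b = a;` leaves `a.IsSameAs(b)` true until `b.Set(...)` is called, after which the two handles own separate data and `a` is unchanged. The `use_count()` check is adequate for a single-threaded illustration only.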
+*/ diff --git a/docs/doxygen/overviews/referencenotes.h b/docs/doxygen/overviews/referencenotes.h index fd7d631fbe..82c2308b6e 100644 --- a/docs/doxygen/overviews/referencenotes.h +++ b/docs/doxygen/overviews/referencenotes.h @@ -1,5 +1,5 @@ ///////////////////////////////////////////////////////////////////////////// -// Name: referencenotes +// Name: referencenotes.h // Purpose: topic overview // Author: wxWidgets team // RCS-ID: $Id$ @@ -8,25 +8,24 @@ /*! - @page referencenotes_overview Notes on using the reference - - In the descriptions of the wxWidgets classes and their member - functions, note that descriptions of inherited member functions are not - duplicated in derived classes unless their behaviour is different. So in - using a class such as wxScrolledWindow, be aware that wxWindow functions may be - relevant. - Note also that arguments with default values may be omitted from a - function call, for brevity. Size and position arguments may usually be - given a value of -1 (the default), in which case wxWidgets will choose a - suitable value. - Most strings are returned as wxString objects. However, for remaining - char * return values, the strings are allocated and - deallocated by wxWidgets. Therefore, return values should always be - copied for long-term use, especially since the same buffer is often - used by wxWidgets. - The member functions are given in alphabetical order except for - constructors and destructors which appear first. - - */ +@page overview_referencenotes Notes on Using the Reference +In the descriptions of the wxWidgets classes and their member functions, note +that descriptions of inherited member functions are not duplicated in derived +classes unless their behaviour is different. So in using a class such as +wxScrolledWindow, be aware that wxWindow functions may be relevant. + +Note also that arguments with default values may be omitted from a function +call, for brevity. 
Size and position arguments may usually be given a value of +-1 (the default), in which case wxWidgets will choose a suitable value. + +Most strings are returned as wxString objects. However, for remaining char * +return values, the strings are allocated and deallocated by wxWidgets. +Therefore, return values should always be copied for long-term use, especially +since the same buffer is often used by wxWidgets. + +The member functions are given in alphabetical order except for constructors +and destructors which appear first. + +*/ diff --git a/docs/doxygen/overviews/resyntax.h b/docs/doxygen/overviews/resyntax.h index 505c3e8a43..5fd054f47a 100644 --- a/docs/doxygen/overviews/resyntax.h +++ b/docs/doxygen/overviews/resyntax.h @@ -1,5 +1,5 @@ ///////////////////////////////////////////////////////////////////////////// -// Name: resyn +// Name: resyntax.h // Purpose: topic overview // Author: wxWidgets team // RCS-ID: $Id$ @@ -8,2310 +8,1768 @@ /*! - @page resyn_overview Syntax of the builtin regular expression library +@page overview_resyntax Syntax of the Built-in Regular Expression Library - A @e regular expression describes strings of characters. It's a - pattern that matches certain strings and doesn't match others. - @b See also - #wxRegEx - @ref differentflavors_overview - @ref resyntax_overview - @ref resynbracket_overview - #Escapes - #Metasyntax - #Matching - @ref relimits_overview - @ref resynbre_overview - @ref resynchars_overview +A regular expression describes strings of characters. It's a pattern +that matches certain strings and doesn't match others. 
+@seealso #wxRegEx - @section differentflavors Different Flavors of REs +@li @ref overview_resyntax_differentflavors +@li @ref overview_resyntax_syntax +@li @ref overview_resyntax_bracket +@li @ref overview_resyntax_escapes +@li @ref overview_resyntax_metasyntax +@li @ref overview_resyntax_matching +@li @ref overview_resyntax_limits +@li @ref overview_resyntax_bre +@li @ref overview_resyntax_characters - @ref resyn_overview - Regular expressions ("RE''s), as defined by POSIX, come in two - flavors: @e extended REs ("EREs'') and @e basic REs ("BREs''). EREs are roughly those - of the traditional @e egrep, while BREs are roughly those of the traditional - @e ed. This implementation adds a third flavor, @e advanced REs ("AREs''), basically - EREs with some significant extensions. - This manual page primarily describes - AREs. BREs mostly exist for backward compatibility in some old programs; - they will be discussed at the #end. POSIX EREs are almost an exact subset - of AREs. Features of AREs that are not present in EREs will be indicated. - @section resyntax Regular Expression Syntax +
- @ref resyn_overview - These regular expressions are implemented using - the package written by Henry Spencer, based on the 1003.2 spec and some - (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description - of regular expressions below is copied verbatim from his manual entry. - An ARE is one or more @e branches, separated by '@b |', matching anything that matches - any of the branches. - A branch is zero or more @e constraints or @e quantified - atoms, concatenated. It matches a match for the first, followed by a match - for the second, etc; an empty branch matches the empty string. - A quantified atom is an @e atom possibly followed by a single @e quantifier. Without a quantifier, - it matches a match for the atom. The quantifiers, and what a so-quantified - atom matches, are: +@section overview_resyntax_differentflavors Different Flavors of REs +Regular expressions ("RE''s), as defined by POSIX, come in two +flavors: @e extended REs ("EREs'') and @e basic REs ("BREs''). EREs are roughly those +of the traditional @e egrep, while BREs are roughly those of the traditional +@e ed. This implementation adds a third flavor, @e advanced REs ("AREs''), basically +EREs with some significant extensions. +This manual page primarily describes +AREs. BREs mostly exist for backward compatibility in some old programs; +they will be discussed at the #end. POSIX EREs are almost an exact subset +of AREs. Features of AREs that are not present in EREs will be indicated. +@section overview_resyntax_syntax Regular Expression Syntax +These regular expressions are implemented using +the package written by Henry Spencer, based on the 1003.2 spec and some +(not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description +of regular expressions below is copied verbatim from his manual entry. +An ARE is one or more @e branches, separated by '@b |', matching anything that matches +any of the branches. 
+A branch is zero or more @e constraints or @e quantified +atoms, concatenated. It matches a match for the first, followed by a match +for the second, etc; an empty branch matches the empty string. +A quantified atom is an @e atom possibly followed by a single @e quantifier. Without a quantifier, +it matches a match for the atom. The quantifiers, and what a so-quantified +atom matches, are: - @b * +@b * +a sequence of 0 or more matches of the atom - a sequence of 0 or more matches of the atom +@b + +a sequence of 1 or more matches of the atom +@b ? +a sequence of 0 or 1 matches of the atom +@b {m} - @b + +a sequence of exactly @e m matches of the atom +@b {m,} +a sequence of @e m or more matches of the atom +@b {m,n} - a sequence of 1 or more matches of the atom +a sequence of @e m through @e n (inclusive) +matches of the atom; @e m may not exceed @e n +@b *? +? ?? {m}? {m,}? {m,n}? +@e non-greedy quantifiers, +which match the same possibilities, but prefer the +smallest number rather than the largest number of matches (see #Matching) +The forms using @b { and @b } are known as @e bounds. The numbers @e m and @e n are unsigned +decimal integers with permissible values from 0 to 255 inclusive. +An atom is one of: +@b (re) - @b ? +(where @e re is any regular expression) matches a match for +@e re, with the match noted for possible reporting +@b (?:re) +as previous, but +does no reporting (a "non-capturing'' set of parentheses) +@b () - a sequence of 0 or 1 matches of the atom +matches an empty +string, noted for possible reporting +@b (?:) +matches an empty string, without reporting +@b [chars] +a @e bracket expression, matching any one of the @e chars +(see @ref resynbracket_overview for more detail) - @b {m} +@b . +matches any single character +@b \k +(where @e k is a non-alphanumeric character) +matches that character taken as an ordinary character, e.g. 
\\ matches a backslash +character - a sequence of exactly @e m matches of the atom +@b \c +where @e c is alphanumeric (possibly followed by other characters), +an @e escape (AREs only), see #Escapes below +@b { +when followed by a character +other than a digit, matches the left-brace character '@b {'; when followed by +a digit, it is the beginning of a @e bound (see above) +@b x - @b {m,} +where @e x is a single +character with no other significance, matches that character. +A @e constraint matches an empty string when specific conditions are met. A constraint may +not be followed by a quantifier. The simple constraints are as follows; +some more constraints are described later, under #Escapes. +@b ^ +matches at the beginning of a line - a sequence of @e m or more matches of the atom +@b $ +matches at the end of a line +@b (?=re) +@e positive lookahead +(AREs only), matches at any point where a substring matching @e re begins +@b (?!re) - @b {m,n} +@e negative lookahead (AREs only), +matches at any point where no substring matching @e re begins +The lookahead constraints may not contain back references +(see later), and all parentheses within them are considered non-capturing. +An RE may not end with '@b \'. - a sequence of @e m through @e n (inclusive) - matches of the atom; @e m may not exceed @e n +@section overview_resyntax_bracket Bracket Expressions +A @e bracket expression is a list +of characters enclosed in '@b []'. It normally matches any single character from +the list (but see below). If the list begins with '@b ^', it matches any single +character (but see below) @e not from the rest of the list. +If two characters +in the list are separated by '@b -', this is shorthand for the full @e range of +characters between those two (inclusive) in the collating sequence, e.g. +@b [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint, +so e.g. @b a-c-e is illegal. 
Ranges are very collating-sequence-dependent, and portable +programs should avoid relying on them. +To include a literal @b ] or @b - in the +list, the simplest method is to enclose it in @b [. and @b .] to make it a collating +element (see below). Alternatively, make it the first character (following +a possible '@b ^'), or (AREs only) precede it with '@b \'. +Alternatively, for '@b -', make +it the last character, or the second endpoint of a range. To use a literal +@b - as the first endpoint of a range, make it a collating element or (AREs +only) precede it with '@b \'. With the exception of these, some combinations using +@b [ (see next paragraphs), and escapes, all other special characters lose +their special significance within a bracket expression. +Within a bracket +expression, a collating element (a character, a multi-character sequence +that collates as if it were a single character, or a collating-sequence +name for either) enclosed in @b [. and @b .] stands for the +sequence of characters of that collating element. +@e wxWidgets: Currently no multi-character collating elements are defined. +So in @b [.X.], @e X can either be a single character literal or +the name of a character. For example, the following are both identical +@b [[.0.]-[.9.]] and @b [[.zero.]-[.nine.]] and mean the same as +@b [0-9]. +See @ref resynchars_overview. +Within a bracket expression, a collating element enclosed in @b [= and @b =] +is an equivalence class, standing for the sequences of characters of all +collating elements equivalent to that one, including itself. +An equivalence class may not be an endpoint of a range. +@e wxWidgets: Currently no equivalence classes are defined, so +@b [=X=] stands for just the single character @e X. +@e X can either be a single character literal or the name of a character, +see @ref resynchars_overview. 
+Within a bracket expression, +the name of a @e character class enclosed in @b [: and @b :] stands for the list +of all characters (not all collating elements!) belonging to that class. +Standard character classes are: - @b *? +? ?? {m}? {m,}? {m,n}? +@b alpha +A letter. +@b upper +An upper-case letter. - @e non-greedy quantifiers, - which match the same possibilities, but prefer the - smallest number rather than the largest number of matches (see #Matching) +@b lower +A lower-case letter. +@b digit +A decimal digit. +@b xdigit - The forms using @b { and @b } are known as @e bounds. The numbers @e m and @e n are unsigned - decimal integers with permissible values from 0 to 255 inclusive. - An atom is one of: +A hexadecimal digit. +@b alnum +An alphanumeric (letter or digit). +@b print +An alphanumeric (same as alnum). +@b blank - @b (re) +A space or tab character. +@b space +A character producing white space in displayed text. +@b punct - (where @e re is any regular expression) matches a match for - @e re, with the match noted for possible reporting +A punctuation character. +@b graph +A character with a visible representation. +@b cntrl +A control character. - @b (?:re) +A character class may not be used as an endpoint of a range. +@e wxWidgets: In a non-Unicode build, these character classifications depend on the +current locale, and correspond to the values returned by the ANSI C 'is' +functions: isalpha, isupper, etc. In Unicode mode they are based on +Unicode classifications, and are not affected by the current locale. +There are two special cases of bracket expressions: +the bracket expressions @b [[:<:]] and @b [[:>:]] are constraints, matching empty +strings at the beginning and end of a word respectively. A word is defined +as a sequence of word characters that is neither preceded nor followed +by word characters. A word character is an @e alnum character or an underscore +(@b _). 
These special bracket expressions are deprecated; users of AREs should +use constraint escapes instead (see #Escapes below). - as previous, but - does no reporting (a "non-capturing'' set of parentheses) +@section overview_resyntax_escapes Escapes +Escapes (AREs only), +which begin with a @b \ followed by an alphanumeric character, come in several +varieties: character entry, class shorthands, constraint escapes, and back +references. A @b \ followed by an alphanumeric character but not constituting +a valid escape is illegal in AREs. In EREs, there are no escapes: outside +a bracket expression, a @b \ followed by an alphanumeric character merely stands +for that character as an ordinary character, and inside a bracket expression, +@b \ is an ordinary character. (The latter is the one actual incompatibility +between EREs and AREs.) +Character-entry escapes (AREs only) exist to make +it easier to specify non-printing and otherwise inconvenient characters +in REs: +@b \a - @b () +alert (bell) character, as in C +@b \b +backspace, as in C +@b \B - matches an empty - string, noted for possible reporting +synonym +for @b \ to help reduce backslash doubling in some applications where there +are multiple levels of backslash processing +@b \c@e X +(where X is any character) +the character whose low-order 5 bits are the same as those of @e X, and whose +other bits are all zero +@b \e +the character whose collating-sequence name is +'@b ESC', or failing that, the character with octal value 033 - @b (?:) +@b \f +formfeed, as in C +@b \n +newline, as in C - matches an empty string, without reporting +@b \r +carriage return, as in C +@b \t +horizontal tab, as in C +@b \u@e wxyz - @b [chars] +(where @e wxyz is exactly four hexadecimal digits) +the Unicode +character @b U+@e wxyz in the local byte ordering +@b \U@e stuvwxyz +(where @e stuvwxyz is +exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode +extension to 32 bits +@b \v - a @e bracket expression, 
matching any one of the @e chars - (see @ref resynbracket_overview for more detail) +vertical tab, as in C are all available. +@b \x@e hhh +(where +@e hhh is any sequence of hexadecimal digits) the character whose hexadecimal +value is @b 0x@e hhh (a single character no matter how many hexadecimal digits +are used). +@b \0 +the character whose value is @b 0 - @b . +@b \@e xy +(where @e xy is exactly two +octal digits, and is not a @e back reference (see below)) the character whose +octal value is @b 0@e xy +@b \@e xyz +(where @e xyz is exactly three octal digits, and is +not a back reference (see below)) +the character whose octal value is @b 0@e xyz - matches any single character +Hexadecimal digits are '@b 0'-'@b 9', '@b a'-'@b f', and '@b A'-'@b F'. Octal +digits are '@b 0'-'@b 7'. +The character-entry +escapes are always taken as ordinary characters. For example, @b \135 is @b ] in +ASCII, but @b \135 does not terminate a bracket expression. Beware, however, +that some applications (e.g., C compilers) interpret such sequences themselves +before the regular-expression package gets to see them, which may require +doubling (quadrupling, etc.) the '@b \'. +Class-shorthand escapes (AREs only) provide +shorthands for certain commonly-used character classes: - @b \k +@b \d +@b [[:digit:]] +@b \s +@b [[:space:]] - (where @e k is a non-alphanumeric character) - matches that character taken as an ordinary character, e.g. \\ matches a backslash - character +@b \w +@b [[:alnum:]_] (note underscore) +@b \D +@b [^[:digit:]] +@b \S - @b \c +@b [^[:space:]] +@b \W +@b [^[:alnum:]_] (note underscore) - where @e c is alphanumeric (possibly followed by other characters), - an @e escape (AREs only), see #Escapes below +Within bracket expressions, '@b \d', '@b \s', and +'@b \w' lose their outer brackets, and '@b \D', +'@b \S', and '@b \W' are illegal. (So, for example, +@b [a-c\d] is equivalent to @b [a-c[:digit:]]. 
+Also, @b [a-c\D], which is equivalent to +@b [a-c^[:digit:]], is illegal.) +A constraint escape (AREs only) is a constraint, +matching the empty string if specific conditions are met, written as an +escape: +@b \A - @b { +matches only at the beginning of the string +(see #Matching, below, +for how this differs from '@b ^') +@b \m +matches only at the beginning of a word +@b \M - when followed by a character - other than a digit, matches the left-brace character '@b {'; when followed by - a digit, it is the beginning of a @e bound (see above) +matches only at the end of a word +@b \y +matches only at the beginning or end of a word +@b \Y +matches only at a point that is not the beginning or end of +a word - @b x +@b \Z +matches only at the end of the string +(see #Matching, below, for +how this differs from '@b $') +@b \@e m +(where @e m is a nonzero digit) a @e back reference, +see below - where @e x is a single - character with no other significance, matches that character. +@b \@e mnn +(where @e m is a nonzero digit, and @e nn is some more digits, +and the decimal value @e mnn is not greater than the number of closing capturing +parentheses seen so far) a @e back reference, see below +A word is defined +as in the specification of @b [[:<:]] and @b [[:>:]] above. Constraint escapes are +illegal within bracket expressions. +A back reference (AREs only) matches +the same string matched by the parenthesized subexpression specified by +the number, so that (e.g.) @b ([bc])\1 matches @b bb or @b cc but not '@b bc'. +The subexpression +must entirely precede the back reference in the RE. Subexpressions are numbered +in the order of their leading parentheses. Non-capturing parentheses do not +define subexpressions. +There is an inherent historical ambiguity between +octal character-entry escapes and back references, which is resolved by +heuristics, as hinted at above. A leading zero always indicates an octal +escape. 
A single non-zero digit, not followed by another digit, is always +taken as a back reference. A multi-digit sequence not starting with a zero +is taken as a back reference if it comes after a suitable subexpression +(i.e. the number is in the legal range for a back reference), and otherwise +is taken as octal. - A @e constraint matches an empty string when specific conditions are met. A constraint may - not be followed by a quantifier. The simple constraints are as follows; - some more constraints are described later, under #Escapes. +@section overview_resyntax_metasyntax Metasyntax +In addition to the main syntax described above, +there are some special forms and miscellaneous syntactic facilities available. +Normally the flavor of RE being used is specified by application-dependent +means. However, this can be overridden by a @e director. If an RE of any flavor +begins with '@b ***:', the rest of the RE is an ARE. If an RE of any flavor begins +with '@b ***=', the rest of the RE is taken to be a literal string, with all +characters considered ordinary characters. +An ARE may begin with @e embedded options: a sequence @b (?xyz) +(where @e xyz is one or more alphabetic characters) +specifies options affecting the rest of the RE. These supplement, and can +override, any options specified by the application. 
The available option +letters are: +@b b - @b ^ +rest of RE is a BRE +@b c +case-sensitive matching (usual default) +@b e - matches at the beginning of a line +rest of RE is an ERE +@b i +case-insensitive matching (see #Matching, below) +@b m +historical synonym for @b n - @b $ +@b n +newline-sensitive matching (see #Matching, below) +@b p +partial newline-sensitive matching (see #Matching, below) - matches at the end of a line +@b q +rest of RE +is a literal ("quoted'') string, all ordinary characters +@b s +non-newline-sensitive matching (usual default) +@b t - @b (?=re) +tight syntax (usual default; see below) +@b w +inverse +partial newline-sensitive ("weird'') matching (see #Matching, below) +@b x + +expanded syntax (see below) + + + +Embedded options take effect at the @b ) terminating the +sequence. They are available only at the start of an ARE, and may not be +used later within it. +In addition to the usual (@e tight) RE syntax, in which +all characters are significant, there is an @e expanded syntax, available +in AREs with the embedded +x option. In the expanded syntax, white-space characters are ignored and +all characters between a @b # and the following newline (or the end of the +RE) are ignored, permitting paragraphing and commenting a complex RE. There +are three exceptions to that basic rule: + + +a white-space character or '@b #' preceded +by '@b \' is retained +white space or '@b #' within a bracket expression is retained +white space and comments are illegal within multi-character symbols like +the ARE '@b (?:' or the BRE '@b \(' + + +Expanded-syntax white-space characters are blank, +tab, newline, and any character that belongs to the @e space character class. +Finally, in an ARE, outside bracket expressions, the sequence '@b (?#ttt)' (where +@e ttt is any text not containing a '@b )') is a comment, completely ignored. Again, +this is not allowed between the characters of multi-character symbols like +'@b (?:'. 
Such comments are more a historical artifact than a useful facility, +and their use is deprecated; use the expanded syntax instead. +@e None of these +metasyntax extensions is available if the application (or an initial @b ***= +director) has specified that the user's input be treated as a literal string +rather than as an RE. + + +@section overview_resyntax_matching Matching + +In the event that an RE could match more than +one substring of a given string, the RE matches the one starting earliest +in the string. If the RE could match more than one substring starting at +that point, its choice is determined by its @e preference: either the longest +substring, or the shortest. +Most atoms, and all constraints, have no preference. +A parenthesized RE has the same preference (possibly none) as the RE. A +quantified atom with quantifier @b {m} or @b {m}? has the same preference (possibly +none) as the atom itself. A quantified atom with other normal quantifiers +(including @b {m,n} with @e m equal to @e n) prefers longest match. A quantified +atom with other non-greedy quantifiers (including @b {m,n}? with @e m equal to +@e n) prefers shortest match. A branch has the same preference as the first +quantified atom in it which has a preference. An RE consisting of two or +more branches connected by the @b | operator prefers longest match. +Subject to the constraints imposed by the rules for matching the whole RE, subexpressions +also match the longest or shortest possible substrings, based on their +preferences, with subexpressions starting earlier in the RE taking priority +over ones starting later. Note that outer subexpressions thus take priority +over their component subexpressions. +Note that the quantifiers @b {1,1} and +@b {1,1}? can be used to force longest and shortest preference, respectively, +on a subexpression or a whole RE. +Match lengths are measured in characters, +not collating elements. An empty string is considered longer than no match +at all. 
For example, @b bb* matches the three middle characters +of '@b abbbc', @b (week|wee)(night|knights) +matches all ten characters of '@b weeknights', when @b (.*).* is matched against +@b abc the parenthesized subexpression matches all three characters, and when +@b (a*)* is matched against @b bc both the whole RE and the parenthesized subexpression +match an empty string. +If case-independent matching is specified, the effect +is much as if all case distinctions had vanished from the alphabet. When +an alphabetic that exists in multiple cases appears as an ordinary character +outside a bracket expression, it is effectively transformed into a bracket +expression containing both cases, so that @b x becomes '@b [xX]'. When it appears +inside a bracket expression, all case counterparts of it are added to the +bracket expression, so that @b [x] becomes @b [xX] and @b [^x] becomes '@b [^xX]'. +If newline-sensitive +matching is specified, @b . and bracket expressions using @b ^ will never match +the newline character (so that matches will never cross newlines unless +the RE explicitly arranges it) and @b ^ and @b $ will match the empty string after +and before a newline respectively, in addition to matching at beginning +and end of string respectively. ARE @b \A and @b \Z continue to match beginning +or end of string @e only. +If partial newline-sensitive matching is specified, +this affects @b . and bracket expressions as with newline-sensitive matching, +but not @b ^ and '@b $'. +If inverse partial newline-sensitive matching is specified, +this affects @b ^ and @b $ as with newline-sensitive matching, but not @b . and bracket +expressions. This isn't very useful but is provided for symmetry. + + +@section overview_resyntax_limits Limits and Compatibility + +No particular limit is imposed on the length of REs. Programs +intended to be highly portable should not employ REs longer than 256 bytes, +as a POSIX-compliant implementation can refuse to accept such REs. 
+The only +feature of AREs that is actually incompatible with POSIX EREs is that @b \ +does not lose its special significance inside bracket expressions. All other +ARE features use syntax which is illegal or has undefined or unspecified +effects in POSIX EREs; the @b *** syntax of directors likewise is outside +the POSIX syntax for both BREs and EREs. +Many of the ARE extensions are +borrowed from Perl, but some have been changed to clean them up, and a +few Perl extensions are not present. Incompatibilities of note include '@b \b', +'@b \B', the lack of special treatment for a trailing newline, the addition of +complemented bracket expressions to the things affected by newline-sensitive +matching, the restrictions on parentheses and back references in lookahead +constraints, and the longest/shortest-match (rather than first-match) matching +semantics. +The matching rules for REs containing both normal and non-greedy +quantifiers have changed since early beta-test versions of this package. +(The new rules are much simpler and cleaner, but don't work as hard at guessing +the user's real intentions.) +Henry Spencer's original 1986 @e regexp package, still in widespread use, +implemented an early version of today's EREs. There are four incompatibilities between @e regexp's +near-EREs ('RREs' for short) and AREs. In roughly increasing order of significance: - @e positive lookahead - (AREs only), matches at any point where a substring matching @e re begins +In AREs, @b \ followed by an alphanumeric character is either an escape or +an error, while in RREs, it was just another way of writing the alphanumeric. +This should not be a problem because there was no reason to write such +a sequence in RREs. +@b { followed by a digit in an ARE is the beginning of +a bound, while in RREs, @b { was always an ordinary character. Such sequences +should be rare, and will often result in an error because following characters +will not look like a valid bound. 
+In AREs, @b \ remains a special character +within '@b []', so a literal @b \ within @b [] must be +written '@b \\'. @b \\ also gives a literal +@b \ within @b [] in RREs, but only truly paranoid programmers routinely doubled +the backslash. +AREs report the longest/shortest match for the RE, rather +than the first found in a specified search order. This may affect some RREs +which were written in the expectation that the first match would be reported. +(The careful crafting of RREs to optimize the search order for fast matching +is obsolete (AREs examine all possible matches in parallel, and their performance +is largely insensitive to their complexity) but cases where the search +order was exploited to deliberately find a match which was @e not the longest/shortest +will need rewriting.) +@section overview_resyntax_bre Basic Regular Expressions +BREs differ from EREs in +several respects. '@b |', '@b +', and @b ? are ordinary characters and there is no equivalent +for their functionality. The delimiters for bounds +are @b \{ and '@b \}', with @b { and +@b } by themselves ordinary characters. The parentheses for nested subexpressions +are @b \( and '@b \)', with @b ( and @b ) by themselves +ordinary characters. @b ^ is an ordinary +character except at the beginning of the RE or the beginning of a parenthesized +subexpression, @b $ is an ordinary character except at the end of the RE or +the end of a parenthesized subexpression, and @b * is an ordinary character +if it appears at the beginning of the RE or the beginning of a parenthesized +subexpression (after a possible leading '@b ^'). Finally, single-digit back references +are available, and @b \< and @b \> are synonyms +for @b [[:<:]] and @b [[:>:]] respectively; +no other escapes are available. - @b (?!re) +@section overview_resyntax_characters Regular Expression Character Names +Note that the character names are case sensitive.
- @e negative lookahead (AREs only), - matches at any point where no substring matching @e re begins +NUL - The lookahead constraints may not contain back references - (see later), and all parentheses within them are considered non-capturing. - An RE may not end with '@b \'. - @section wxresynbracket Bracket Expressions - @ref resyn_overview - A @e bracket expression is a list - of characters enclosed in '@b []'. It normally matches any single character from - the list (but see below). If the list begins with '@b ^', it matches any single - character (but see below) @e not from the rest of the list. - If two characters - in the list are separated by '@b -', this is shorthand for the full @e range of - characters between those two (inclusive) in the collating sequence, e.g. - @b [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint, - so e.g. @b a-c-e is illegal. Ranges are very collating-sequence-dependent, and portable - programs should avoid relying on them. - To include a literal @b ] or @b - in the - list, the simplest method is to enclose it in @b [. and @b .] to make it a collating - element (see below). Alternatively, make it the first character (following - a possible '@b ^'), or (AREs only) precede it with '@b \'. - Alternatively, for '@b -', make - it the last character, or the second endpoint of a range. To use a literal - @b - as the first endpoint of a range, make it a collating element or (AREs - only) precede it with '@b \'. With the exception of these, some combinations using - @b [ (see next paragraphs), and escapes, all other special characters lose - their special significance within a bracket expression. - Within a bracket - expression, a collating element (a character, a multi-character sequence - that collates as if it were a single character, or a collating-sequence - name for either) enclosed in @b [. and @b .] stands for the - sequence of characters of that collating element. 
- @e wxWidgets: Currently no multi-character collating elements are defined. - So in @b [.X.], @e X can either be a single character literal or - the name of a character. For example, the following are both identical - @b [[.0.]-[.9.]] and @b [[.zero.]-[.nine.]] and mean the same as - @b [0-9]. - See @ref resynchars_overview. - Within a bracket expression, a collating element enclosed in @b [= and @b =] - is an equivalence class, standing for the sequences of characters of all - collating elements equivalent to that one, including itself. - An equivalence class may not be an endpoint of a range. - @e wxWidgets: Currently no equivalence classes are defined, so - @b [=X=] stands for just the single character @e X. - @e X can either be a single character literal or the name of a character, - see @ref resynchars_overview. - Within a bracket expression, - the name of a @e character class enclosed in @b [: and @b :] stands for the list - of all characters (not all collating elements!) belonging to that class. - Standard character classes are: +'\0' +SOH - @b alpha +'\001' - A letter. +STX - @b upper +'\002' - An upper-case letter. +ETX - @b lower +'\003' - A lower-case letter. +EOT - @b digit +'\004' - A decimal digit. +ENQ - @b xdigit +'\005' - A hexadecimal digit. +ACK - @b alnum +'\006' - An alphanumeric (letter or digit). +BEL - @b print +'\007' - An alphanumeric (same as alnum). +alert - @b blank +'\007' - A space or tab character. +BS - @b space +'\010' - A character producing white space in displayed text. +backspace - @b punct +'\b' - A punctuation character. +HT - @b graph +'\011' - A character with a visible representation. +tab - @b cntrl +'\t' - A control character. +LF - A character class may not be used as an endpoint of a range. - @e wxWidgets: In a non-Unicode build, these character classifications depend on the - current locale, and correspond to the values return by the ANSI C 'is' - functions: isalpha, isupper, etc. 
In Unicode mode they are based on - Unicode classifications, and are not affected by the current locale. - There are two special cases of bracket expressions: - the bracket expressions @b [[:<:]] and @b [[:>:]] are constraints, matching empty - strings at the beginning and end of a word respectively. A word is defined - as a sequence of word characters that is neither preceded nor followed - by word characters. A word character is an @e alnum character or an underscore - (@b _). These special bracket expressions are deprecated; users of AREs should - use constraint escapes instead (see #Escapes below). - @section wxresynescapes Escapes - @ref resyn_overview - Escapes (AREs only), - which begin with a @b \ followed by an alphanumeric character, come in several - varieties: character entry, class shorthands, constraint escapes, and back - references. A @b \ followed by an alphanumeric character but not constituting - a valid escape is illegal in AREs. In EREs, there are no escapes: outside - a bracket expression, a @b \ followed by an alphanumeric character merely stands - for that character as an ordinary character, and inside a bracket expression, - @b \ is an ordinary character. (The latter is the one actual incompatibility - between EREs and AREs.)
- Character-entry escapes (AREs only) exist to make - it easier to specify non-printing and otherwise inconvenient characters - in REs: +'\012' - @b \a +newline - alert (bell) character, as in C +'\n' - @b \b +VT - backspace, as in C +'\013' - @b \B +vertical-tab - synonym - for @b \ to help reduce backslash doubling in some applications where there - are multiple levels of backslash processing +'\v' - @b \c@e X +FF - (where X is any character) - the character whose low-order 5 bits are the same as those of @e X, and whose - other bits are all zero +'\014' - @b \e +form-feed - the character whose collating-sequence name is - '@b ESC', or failing that, the character with octal value 033 +'\f' - @b \f +CR - formfeed, as in C +'\015' - @b \n +carriage-return - newline, as in C +'\r' - @b \r +SO - carriage return, as in C +'\016' - @b \t +SI - horizontal tab, as in C +'\017' - @b \u@e wxyz +DLE - (where @e wxyz is exactly four hexadecimal digits) - the Unicode - character @b U+@e wxyz in the local byte ordering +'\020' - @b \U@e stuvwxyz +DC1 - (where @e stuvwxyz is - exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode - extension to 32 bits +'\021' - @b \v +DC2 - vertical tab, as in C are all available. +'\022' - @b \x@e hhh +DC3 - (where - @e hhh is any sequence of hexadecimal digits) the character whose hexadecimal - value is @b 0x@e hhh (a single character no matter how many hexadecimal digits - are used). +'\023' - @b \0 +DC4 - the character whose value is @b 0 +'\024' - @b \@e xy +NAK - (where @e xy is exactly two - octal digits, and is not a @e back reference (see below)) the character whose - octal value is @b 0@e xy +'\025' - @b \@e xyz +SYN - (where @e xyz is exactly three octal digits, and is - not a back reference (see below)) - the character whose octal value is @b 0@e xyz +'\026' - Hexadecimal digits are '@b 0'-'@b 9', '@b a'-'@b f', and '@b A'-'@b F'. Octal - digits are '@b 0'-'@b 7'. 
- The character-entry - escapes are always taken as ordinary characters. For example, @b \135 is @b ] in - ASCII, but @b \135 does not terminate a bracket expression. Beware, however, - that some applications (e.g., C compilers) interpret such sequences themselves - before the regular-expression package gets to see them, which may require - doubling (quadrupling, etc.) the '@b \'. - Class-shorthand escapes (AREs only) provide - shorthands for certain commonly-used character classes: +ETB +'\027' - @b \d +CAN - @b [[:digit:]] +'\030' - @b \s +EM - @b [[:space:]] +'\031' - @b \w +SUB - @b [[:alnum:]_] (note underscore) +'\032' - @b \D +ESC - @b [^[:digit:]] +'\033' - @b \S +IS4 - @b [^[:space:]] +'\034' - @b \W +FS - @b [^[:alnum:]_] (note underscore) +'\034' - Within bracket expressions, '@b \d', '@b \s', and - '@b \w' lose their outer brackets, and '@b \D', - '@b \S', and '@b \W' are illegal. (So, for example, - @b [a-c\d] is equivalent to @b [a-c[:digit:]]. - Also, @b [a-c\D], which is equivalent to - @b [a-c^[:digit:]], is illegal.) - A constraint escape (AREs only) is a constraint, - matching the empty string if specific conditions are met, written as an - escape: +IS3 - @b \A +'\035' - matches only at the beginning of the string - (see #Matching, below, - for how this differs from '@b ^') +GS - @b \m +'\035' - matches only at the beginning of a word +IS2 - @b \M +'\036' - matches only at the end of a word +RS - @b \y +'\036' - matches only at the beginning or end of a word +IS1 - @b \Y +'\037' - matches only at a point that is not the beginning or end of - a word +US - @b \Z +'\037' - matches only at the end of the string - (see #Matching, below, for - how this differs from '@b $') +space - @b \@e m +' ' - (where @e m is a nonzero digit) a @e back reference, - see below +exclamation-mark - @b \@e mnn +'!' 
- (where @e m is a nonzero digit, and @e nn is some more digits, - and the decimal value @e mnn is not greater than the number of closing capturing - parentheses seen so far) a @e back reference, see below +quotation-mark - A word is defined - as in the specification of @b [[:<:]] and @b [[:>:]] above. Constraint escapes are - illegal within bracket expressions. - A back reference (AREs only) matches - the same string matched by the parenthesized subexpression specified by - the number, so that (e.g.) @b ([bc])\1 matches @b bb or @b cc but not '@b bc'. - The subexpression - must entirely precede the back reference in the RE. Subexpressions are numbered - in the order of their leading parentheses. Non-capturing parentheses do not - define subexpressions. - There is an inherent historical ambiguity between - octal character-entry escapes and back references, which is resolved by - heuristics, as hinted at above. A leading zero always indicates an octal - escape. A single non-zero digit, not followed by another digit, is always - taken as a back reference. A multi-digit sequence not starting with a zero - is taken as a back reference if it comes after a suitable subexpression - (i.e. the number is in the legal range for a back reference), and otherwise - is taken as octal. - @section remetasyntax Metasyntax +'"' - @ref resyn_overview - In addition to the main syntax described above, - there are some special forms and miscellaneous syntactic facilities available. - Normally the flavor of RE being used is specified by application-dependent - means. However, this can be overridden by a @e director. If an RE of any flavor - begins with '@b ***:', the rest of the RE is an ARE. If an RE of any flavor begins - with '@b ***=', the rest of the RE is taken to be a literal string, with all - characters considered ordinary characters.
- An ARE may begin with @e embedded options: a sequence @b (?xyz) - (where @e xyz is one or more alphabetic characters) - specifies options affecting the rest of the RE. These supplement, and can - override, any options specified by the application. The available option - letters are: +number-sign - @b b +'#' - rest of RE is a BRE +dollar-sign - @b c +'$' - case-sensitive matching (usual default) +percent-sign - @b e +'%' - rest of RE is an ERE +ampersand - @b i +'&' - case-insensitive matching (see #Matching, below) +apostrophe - @b m +'\'' - historical synonym for @b n +left-parenthesis - @b n +'(' - newline-sensitive matching (see #Matching, below) +right-parenthesis - @b p +')' - partial newline-sensitive matching (see #Matching, below) +asterisk - @b q +'*' - rest of RE - is a literal ("quoted'') string, all ordinary characters +plus-sign - @b s +'+' - non-newline-sensitive matching (usual default) +comma - @b t +',' - tight syntax (usual default; see below) +hyphen - @b w +'-' - inverse - partial newline-sensitive ("weird'') matching (see #Matching, below) +hyphen-minus - @b x +'-' - expanded syntax (see below) +period - Embedded options take effect at the @b ) terminating the - sequence. They are available only at the start of an ARE, and may not be - used later within it. - In addition to the usual (@e tight) RE syntax, in which - all characters are significant, there is an @e expanded syntax, available - in AREs with the embedded - x option. In the expanded syntax, white-space characters are ignored and - all characters between a @b # and the following newline (or the end of the - RE) are ignored, permitting paragraphing and commenting a complex RE. There - are three exceptions to that basic rule: - a white-space character or '@b #' preceded - by '@b \' is retained - white space or '@b #' within a bracket expression is retained - white space and comments are illegal within multi-character symbols like - the ARE '@b (?:' or the BRE '@b \(' +'.'
- Expanded-syntax white-space characters are blank, - tab, newline, and any character that belongs to the @e space character class. - Finally, in an ARE, outside bracket expressions, the sequence '@b (?#ttt)' (where - @e ttt is any text not containing a '@b )') is a comment, completely ignored. Again, - this is not allowed between the characters of multi-character symbols like - '@b (?:'. Such comments are more a historical artifact than a useful facility, - and their use is deprecated; use the expanded syntax instead. - @e None of these - metasyntax extensions is available if the application (or an initial @b ***= - director) has specified that the user's input be treated as a literal string - rather than as an RE. - @section wxresynmatching Matching - @ref resyn_overview - In the event that an RE could match more than - one substring of a given string, the RE matches the one starting earliest - in the string. If the RE could match more than one substring starting at - that point, its choice is determined by its @e preference: either the longest - substring, or the shortest. - Most atoms, and all constraints, have no preference. - A parenthesized RE has the same preference (possibly none) as the RE. A - quantified atom with quantifier @b {m} or @b {m}? has the same preference (possibly - none) as the atom itself. A quantified atom with other normal quantifiers - (including @b {m,n} with @e m equal to @e n) prefers longest match. A quantified - atom with other non-greedy quantifiers (including @b {m,n}? with @e m equal to - @e n) prefers shortest match. A branch has the same preference as the first - quantified atom in it which has a preference. An RE consisting of two or - more branches connected by the @b | operator prefers longest match. 
- Subject to the constraints imposed by the rules for matching the whole RE, subexpressions - also match the longest or shortest possible substrings, based on their - preferences, with subexpressions starting earlier in the RE taking priority - over ones starting later. Note that outer subexpressions thus take priority - over their component subexpressions. - Note that the quantifiers @b {1,1} and - @b {1,1}? can be used to force longest and shortest preference, respectively, - on a subexpression or a whole RE. - Match lengths are measured in characters, - not collating elements. An empty string is considered longer than no match - at all. For example, @b bb* matches the three middle characters - of '@b abbbc', @b (week|wee)(night|knights) - matches all ten characters of '@b weeknights', when @b (.*).* is matched against - @b abc the parenthesized subexpression matches all three characters, and when - @b (a*)* is matched against @b bc both the whole RE and the parenthesized subexpression - match an empty string. - If case-independent matching is specified, the effect - is much as if all case distinctions had vanished from the alphabet. When - an alphabetic that exists in multiple cases appears as an ordinary character - outside a bracket expression, it is effectively transformed into a bracket - expression containing both cases, so that @b x becomes '@b [xX]'. When it appears - inside a bracket expression, all case counterparts of it are added to the - bracket expression, so that @b [x] becomes @b [xX] and @b [^x] becomes '@b [^xX]'. - If newline-sensitive - matching is specified, @b . and bracket expressions using @b ^ will never match - the newline character (so that matches will never cross newlines unless - the RE explicitly arranges it) and @b ^ and @b $ will match the empty string after - and before a newline respectively, in addition to matching at beginning - and end of string respectively. 
ARE @b \A and @b \Z continue to match beginning - or end of string @e only. - If partial newline-sensitive matching is specified, - this affects @b . and bracket expressions as with newline-sensitive matching, - but not @b ^ and '@b $'. - If inverse partial newline-sensitive matching is specified, - this affects @b ^ and @b $ as with newline-sensitive matching, but not @b . and bracket - expressions. This isn't very useful but is provided for symmetry. - @section relimits Limits And Compatibility +full-stop - @ref resyn_overview - No particular limit is imposed on the length of REs. Programs - intended to be highly portable should not employ REs longer than 256 bytes, - as a POSIX-compliant implementation can refuse to accept such REs. - The only - feature of AREs that is actually incompatible with POSIX EREs is that @b \ - does not lose its special significance inside bracket expressions. All other - ARE features use syntax which is illegal or has undefined or unspecified - effects in POSIX EREs; the @b *** syntax of directors likewise is outside - the POSIX syntax for both BREs and EREs. - Many of the ARE extensions are - borrowed from Perl, but some have been changed to clean them up, and a - few Perl extensions are not present. Incompatibilities of note include '@b \b', - '@b \B', the lack of special treatment for a trailing newline, the addition of - complemented bracket expressions to the things affected by newline-sensitive - matching, the restrictions on parentheses and back references in lookahead - constraints, and the longest/shortest-match (rather than first-match) matching - semantics. - The matching rules for REs containing both normal and non-greedy - quantifiers have changed since early beta-test versions of this package. - (The new rules are much simpler and cleaner, but don't work as hard at guessing - the user's real intentions.) 
- Henry Spencer's original 1986 @e regexp package, still in widespread use, - implemented an early version of today's EREs. There are four incompatibilities between @e regexp's - near-EREs ('RREs' for short) and AREs. In roughly increasing order of significance: - In AREs, @b \ followed by an alphanumeric character is either an escape or - an error, while in RREs, it was just another way of writing the alphanumeric. - This should not be a problem because there was no reason to write such - a sequence in RREs. - @b { followed by a digit in an ARE is the beginning of - a bound, while in RREs, @b { was always an ordinary character. Such sequences - should be rare, and will often result in an error because following characters - will not look like a valid bound. - In AREs, @b \ remains a special character - within '@b []', so a literal @b \ within @b [] must be - written '@b \\'. @b \\ also gives a literal - @b \ within @b [] in RREs, but only truly paranoid programmers routinely doubled - the backslash. - AREs report the longest/shortest match for the RE, rather - than the first found in a specified search order. This may affect some RREs - which were written in the expectation that the first match would be reported. - (The careful crafting of RREs to optimize the search order for fast matching - is obsolete (AREs examine all possible matches in parallel, and their performance - is largely insensitive to their complexity) but cases where the search - order was exploited to deliberately find a match which was @e not the longest/shortest - will need rewriting.) +'.' - @section wxresynbre Basic Regular Expressions - @ref resyn_overview - BREs differ from EREs in - several respects. '@b |', '@b +', and @b ? are ordinary characters and there is no equivalent - for their functionality. The delimiters for bounds - are @b \{ and '@b \}', with @b { and - @b } by themselves ordinary characters. 
The parentheses for nested subexpressions - are @b \( and '@b \)', with @b ( and @b ) by themselves - ordinary characters. @b ^ is an ordinary - character except at the beginning of the RE or the beginning of a parenthesized - subexpression, @b $ is an ordinary character except at the end of the RE or - the end of a parenthesized subexpression, and @b * is an ordinary character - if it appears at the beginning of the RE or the beginning of a parenthesized - subexpression (after a possible leading '@b ^'). Finally, single-digit back references - are available, and @b \< and @b \> are synonyms - for @b [[:<:]] and @b [[:>:]] respectively; - no other escapes are available. - @section wxresynchars Regular Expression Character Names - @ref resyn_overview - Note that the character names are case sensitive. +slash +'/' - NUL +solidus - '\0' +'/' - SOH +zero - '\001' +'0' - STX +one - '\002' +'1' - ETX +two - '\003' +'2' - EOT +three - '\004' +'3' - ENQ +four - '\005' +'4' - ACK +five - '\006' +'5' - BEL +six - '\007' +'6' - alert +seven - '\007' +'7' - BS +eight - '\010' +'8' - backspace +nine - '\b' +'9' - HT +colon - '\011' +':' - tab +semicolon - '\t' +';' - LF +less-than-sign - '\012' +'<' - newline +equals-sign - '\n' +'=' - VT +greater-than-sign - '\013' +'>' - vertical-tab +question-mark - '\v' +'?'
- FF +commercial-at - '\014' +'@' - form-feed +left-square-bracket - '\f' +'[' - CR +backslash - '\015' +'\' - carriage-return +reverse-solidus - '\r' +'\' - SO +right-square-bracket - '\016' +']' - SI +circumflex - '\017' +'^' - DLE +circumflex-accent - '\020' +'^' - DC1 +underscore - '\021' +'_' - DC2 +low-line - '\022' +'_' - DC3 +grave-accent - '\023' +''' - DC4 +left-brace - '\024' +'{' - NAK +left-curly-bracket - '\025' +'{' - SYN +vertical-line - '\026' +'|' - ETB +right-brace - '\027' +'}' - CAN +right-curly-bracket - '\030' +'}' - EM +tilde - '\031' +'~' - SUB +DEL - '\032' +'\177' - - ESC - - - - - '\033' - - - - - - IS4 - - - - - '\034' - - - - - - FS - - - - - '\034' - - - - - - IS3 - - - - - '\035' - - - - - - GS - - - - - '\035' - - - - - - IS2 - - - - - '\036' - - - - - - RS - - - - - '\036' - - - - - - IS1 - - - - - '\037' - - - - - - US - - - - - '\037' - - - - - - space - - - - - ' ' - - - - - - exclamation-mark - - - - - '!' - - - - - - quotation-mark - - - - - '"' - - - - - - number-sign - - - - - '#' - - - - - - dollar-sign - - - - - '$' - - - - - - percent-sign - - - - - '%' - - - - - - ampersand - - - - - '' - - - - - - apostrophe - - - - - '\'' - - - - - - left-parenthesis - - - - - '(' - - - - - - right-parenthesis - - - - - ')' - - - - - - asterisk - - - - - '*' - - - - - - plus-sign - - - - - '+' - - - - - - comma - - - - - ',' - - - - - - hyphen - - - - - '-' - - - - - - hyphen-minus - - - - - '-' - - - - - - period - - - - - '.' - - - - - - full-stop - - - - - '.' 
- - - - - - slash - - - - - '/' - - - - - - solidus - - - - - '/' - - - - - - zero - - - - - '0' - - - - - - one - - - - - '1' - - - - - - two - - - - - '2' - - - - - - three - - - - - '3' - - - - - - four - - - - - '4' - - - - - - five - - - - - '5' - - - - - - six - - - - - '6' - - - - - - seven - - - - - '7' - - - - - - eight - - - - - '8' - - - - - - nine - - - - - '9' - - - - - - colon - - - - - ':' - - - - - - semicolon - - - - - ';' - - - - - - less-than-sign - - - - - '' - - - - - - equals-sign - - - - - '=' - - - - - - greater-than-sign - - - - - '' - - - - - - question-mark - - - - - '?' - - - - - - commercial-at - - - - - '@' - - - - - - left-square-bracket - - - - - '[' - - - - - - backslash - - - - - '\' - - - - - - reverse-solidus - - - - - '\' - - - - - - right-square-bracket - - - - - ']' - - - - - - circumflex - - - - - '^' - - - - - - circumflex-accent - - - - - '^' - - - - - - underscore - - - - - '_' - - - - - - low-line - - - - - '_' - - - - - - grave-accent - - - - - ''' - - - - - - left-brace - - - - - '{' - - - - - - left-curly-bracket - - - - - '{' - - - - - - vertical-line - - - - - '|' - - - - - - right-brace - - - - - '}' - - - - - - right-curly-bracket - - - - - '}' - - - - - - tilde - - - - - '~' - - - - - - DEL - - - - - '\177' - - */ - +*/