'\"
'\" Copyright (c) 1998 Sun Microsystems, Inc.
'\" Copyright (c) 1999 Scriptics Corporation
'\"
'\" This software is copyrighted by the Regents of the University of
'\" California, Sun Microsystems, Inc., Scriptics Corporation, ActiveState
'\" Corporation and other parties.  The following terms apply to all files
'\" associated with the software unless explicitly disclaimed in
'\" individual files.
'\"
'\" The authors hereby grant permission to use, copy, modify, distribute,
'\" and license this software and its documentation for any purpose, provided
'\" that existing copyright notices are retained in all copies and that this
'\" notice is included verbatim in any distributions. No written agreement,
'\" license, or royalty fee is required for any of the authorized uses.
'\" Modifications to this software may be copyrighted by their authors
'\" and need not follow the licensing terms described here, provided that
'\" the new terms are clearly indicated on the first page of each file where
'\" they apply.
'\"
'\" IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY
'\" FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
'\" ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY
'\" DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE
'\" POSSIBILITY OF SUCH DAMAGE.
'\"
'\" THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
'\" INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY,
'\" FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.  THIS SOFTWARE
'\" IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE
'\" NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR
'\" MODIFICATIONS.
'\"
'\" GOVERNMENT USE: If you are acquiring this software on behalf of the
'\" U.S. government, the Government shall have only "Restricted Rights"
'\" in the software and related documentation as defined in the Federal
'\" Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2).  If you
'\" are acquiring the software on behalf of the Department of Defense, the
'\" software shall be classified as "Commercial Computer Software" and the
'\" Government shall have only "Restricted Rights" as defined in Clause
'\" 252.227-7013 (c) (1) of DFARs.  Notwithstanding the foregoing, the
'\" authors grant the U.S. Government and others acting in its behalf
'\" permission to use and distribute the software in accordance with the
'\" terms specified in this license.
'\"
'\" RCS: @(#) Id: re_syntax.n,v 1.3 1999/07/14 19:09:36 jpeek Exp
 
.TH re_syntax n "8.1" Tcl "Tcl Built-In Commands"
.SH NAME
re_syntax \- Syntax of Tcl regular expressions.
.SH DESCRIPTION
.PP
A \fIregular expression\fR describes strings of characters.
It is a pattern that matches certain strings and does not match others.
.SH "DIFFERENT FLAVORS OF REs"
Regular expressions (``RE''s), as defined by POSIX, come in two
flavors: \fIextended\fR REs (``EREs'') and \fIbasic\fR REs (``BREs'').
EREs are roughly those of the traditional \fIegrep\fR, while BREs are
roughly those of the traditional \fIed\fR.  This implementation adds
a third flavor, \fIadvanced\fR REs (``AREs''), basically EREs with
some significant extensions.
.PP
This manual page primarily describes AREs.  BREs mostly exist for
backward compatibility in some old programs; they will be discussed at
the end.  POSIX EREs are almost an exact subset of AREs.  Features of
AREs that are not present in EREs will be indicated.
.SH "REGULAR EXPRESSION SYNTAX"
.PP
Tcl regular expressions are implemented using the package written by
Henry Spencer, based on the 1003.2 spec and some (not quite all) of
the Perl5 extensions (thanks, Henry!).  Much of the description of
regular expressions below is copied verbatim from his manual entry.
 
.PP
An ARE is one or more \fIbranches\fR,
separated by `\fB|\fR',
matching anything that matches any of the branches.
.PP
A branch is zero or more \fIconstraints\fR or \fIquantified atoms\fR,
concatenated.
It matches a match for the first, followed by a match for the second, etc.;
an empty branch matches the empty string.
.PP
A quantified atom is an \fIatom\fR possibly followed
by a single \fIquantifier\fR.
Without a quantifier, it matches a match for the atom.
The quantifiers,
and what a so-quantified atom matches, are:
.RS 2
.TP 6
\fB*\fR
a sequence of 0 or more matches of the atom
.TP
\fB+\fR
a sequence of 1 or more matches of the atom
.TP
\fB?\fR
a sequence of 0 or 1 matches of the atom
.TP
\fB{\fIm\fB}\fR
a sequence of exactly \fIm\fR matches of the atom
.TP
\fB{\fIm\fB,}\fR
a sequence of \fIm\fR or more matches of the atom
.TP
\fB{\fIm\fB,\fIn\fB}\fR
a sequence of \fIm\fR through \fIn\fR (inclusive) matches of the atom;
\fIm\fR may not exceed \fIn\fR
.TP
\fB*?  +?  ??  {\fIm\fB}?  {\fIm\fB,}?  {\fIm\fB,\fIn\fB}?\fR
\fInon-greedy\fR quantifiers,
which match the same possibilities,
but prefer the smallest number rather than the largest number
of matches (see MATCHING)
.RE
.PP
The forms using
\fB{\fR and \fB}\fR
are known as \fIbound\fRs.
The numbers
\fIm\fR and \fIn\fR are unsigned decimal integers
with permissible values from 0 to 255 inclusive.
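.PP
The quantifiers and bounds above can be sketched with Tcl's \fBregexp\fR command.
This is only an illustration; the sample string and the variable names
(\fBgreedy\fR, \fBlazy\fR, \fBbounded\fR) are assumptions made for the example.

```tcl
# Sample input: four consecutive "a" characters.
set s "aaaa"

# Greedy quantifier: takes as many matches of the atom as possible.
regexp {a+} $s greedy

# Non-greedy quantifier: same possibilities, but prefers the fewest.
regexp {a+?} $s lazy

# Bound: between 2 and 3 matches of the atom (greedy, so it takes 3).
regexp {a{2,3}} $s bounded

puts [list $greedy $lazy $bounded]   ;# aaaa a aaa
```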
 
.PP
An atom is one of:
.RS 2
.TP 6
\fB(\fIre\fB)\fR
(where \fIre\fR is any regular expression)
matches a match for
\fIre\fR, with the match noted for possible reporting
.TP
\fB(?:\fIre\fB)\fR
as previous,
but does no reporting
(a ``non-capturing'' set of parentheses)
.TP
\fB()\fR
matches an empty string,
noted for possible reporting
.TP
\fB(?:)\fR
matches an empty string,
without reporting
.TP
\fB[\fIchars\fB]\fR
a \fIbracket expression\fR,
matching any one of the \fIchars\fR (see BRACKET EXPRESSIONS for more detail)
.TP
\fB.\fR
matches any single character
.TP
\fB\e\fIk\fR
(where \fIk\fR is a non-alphanumeric character)
matches that character taken as an ordinary character,
e.g. \e\e matches a backslash character
.TP
\fB\e\fIc\fR
where \fIc\fR is alphanumeric
(possibly followed by other characters),
an \fIescape\fR (AREs only),
see ESCAPES below
.TP
\fB{\fR
when followed by a character other than a digit,
matches the left-brace character `\fB{\fR';
when followed by a digit, it is the beginning of a
\fIbound\fR (see above)
.TP
\fIx\fR
where \fIx\fR is
a single character with no other significance, matches that character.
.RE
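.PP
A small sketch of capturing versus non-capturing parentheses, using Tcl's
\fBregexp\fR command.
The date string and the variable names are illustrative assumptions, not part
of the original manual.

```tcl
set date "2024-05-17"

# (...) notes its match for reporting; (?:...) groups without reporting,
# so only two submatch variables are needed for three groups.
regexp {(\d+)-(?:\d+)-(\d+)} $date whole year day

puts "$whole $year $day"   ;# 2024-05-17 2024 17
```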
 
.PP
A \fIconstraint\fR matches an empty string when specific conditions
are met.
A constraint may not be followed by a quantifier.
The simple constraints are as follows; some more constraints are
described later, under ESCAPES.
.RS 2
.TP 8
\fB^\fR
matches at the beginning of a line
.TP
\fB$\fR
matches at the end of a line
.TP
\fB(?=\fIre\fB)\fR
\fIpositive lookahead\fR (AREs only), matches at any point
where a substring matching \fIre\fR begins
.TP
\fB(?!\fIre\fB)\fR
\fInegative lookahead\fR (AREs only), matches at any point
where no substring matching \fIre\fR begins
.RE
.PP
The lookahead constraints may not contain back references (see later),
and all parentheses within them are considered non-capturing.
.PP
An RE may not end with `\fB\e\fR'.
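.PP
The lookahead constraints can be illustrated as follows.
This is a sketch using \fBregexp \-indices\fR; the sample text and variable
names are assumptions chosen for the example.

```tcl
set text "foobar foobaz"

# Positive lookahead: "foo" only where "bar" begins immediately after it.
regexp -indices {foo(?=bar)} $text pos

# Negative lookahead: "foo" only where "bar" does NOT begin after it.
regexp -indices {foo(?!bar)} $text neg

# The lookahead itself consumes nothing, so each match is 3 characters long.
puts "$pos / $neg"   ;# 0 2 / 7 9
```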
 
.SH "BRACKET EXPRESSIONS"
A \fIbracket expression\fR is a list of characters enclosed in `\fB[\|]\fR'.
It normally matches any single character from the list (but see below).
If the list begins with `\fB^\fR',
it matches any single character
(but see below) \fInot\fR from the rest of the list.
.PP
If two characters in the list are separated by `\fB\-\fR',
this is shorthand
for the full \fIrange\fR of characters between those two (inclusive) in the
collating sequence,
e.g.
\fB[0\-9]\fR
in ASCII matches any decimal digit.
Two ranges may not share an
endpoint, so e.g.
\fBa\-c\-e\fR
is illegal.
Ranges are very collating-sequence-dependent,
and portable programs should avoid relying on them.
.PP
To include a literal
\fB]\fR
or
\fB\-\fR
in the list,
the simplest method is to
enclose it in
\fB[.\fR and \fB.]\fR
to make it a collating element (see below).
Alternatively,
make it the first character
(following a possible `\fB^\fR'),
or (AREs only) precede it with `\fB\e\fR'.
Alternatively, for `\fB\-\fR',
make it the last character,
or the second endpoint of a range.
To use a literal
\fB\-\fR
as the first endpoint of a range,
make it a collating element
or (AREs only) precede it with `\fB\e\fR'.
With the exception of these, some combinations using
\fB[\fR
(see next
paragraphs), and escapes,
all other special characters lose their
special significance within a bracket expression.
 
.PP
Within a bracket expression, a collating element (a character,
a multi-character sequence that collates as if it were a single character,
or a collating-sequence name for either)
enclosed in
\fB[.\fR and \fB.]\fR
stands for the
sequence of characters of that collating element.
The sequence is a single element of the bracket expression's list.
A bracket expression in a locale that has
multi-character collating elements
can thus match more than one character.
So (insidiously), a bracket expression that starts with \fB^\fR
can match multi-character collating elements even if none of them
appear in the bracket expression!
(\fINote:\fR Tcl currently has no multi-character collating elements.
This information is only for illustration.)
.PP
For example, assume the collating sequence includes a \fBch\fR
multi-character collating element.
Then the RE \fB[[.ch.]]*c\fR (zero or more \fBch\fP's followed by \fBc\fP)
matches the first five characters of `\fBchchcc\fR'.
Also, the RE \fB[^c]b\fR matches all of `\fBchb\fR'
(because \fB[^c]\fR matches the multi-character \fBch\fR).
.PP
Within a bracket expression, a collating element enclosed in
\fB[=\fR
and
\fB=]\fR
is an equivalence class, standing for the sequences of characters
of all collating elements equivalent to that one, including itself.
(If there are no other equivalent collating elements,
the treatment is as if the enclosing delimiters were `\fB[.\fR'\&
and `\fB.]\fR'.)
For example, if
\fBo\fR
and
\fB\o'o^'\fR
are the members of an equivalence class,
then `\fB[[=o=]]\fR', `\fB[[=\o'o^'=]]\fR',
and `\fB[o\o'o^']\fR'\&
are all synonymous.
An equivalence class may not be an endpoint
of a range.
(\fINote:\fR
Tcl currently implements only the Unicode locale.
It doesn't define any equivalence classes.
The examples above are just illustrations.)
 
.PP
Within a bracket expression, the name of a \fIcharacter class\fR enclosed
in
\fB[:\fR
and
\fB:]\fR
stands for the list of all characters
(not all collating elements!)
belonging to that
class.
Standard character classes are:
.PP
.RS
.ne 5
.nf
\fBalpha\fR     A letter.
\fBupper\fR     An upper-case letter.
\fBlower\fR     A lower-case letter.
\fBdigit\fR     A decimal digit.
\fBxdigit\fR    A hexadecimal digit.
\fBalnum\fR     An alphanumeric (letter or digit).
\fBprint\fR     An alphanumeric (same as alnum).
\fBblank\fR     A space or tab character.
\fBspace\fR     A character producing white space in displayed text.
\fBpunct\fR     A punctuation character.
\fBgraph\fR     A character with a visible representation.
\fBcntrl\fR     A control character.
.fi
.RE
.PP
A locale may provide others.
(Note that the current Tcl implementation has only one locale:
the Unicode locale.)
A character class may not be used as an endpoint of a range.
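.PP
A brief sketch of character classes inside bracket expressions, using
\fBregexp\fR.
The sample string and variable names are assumptions made for illustration.

```tcl
set s "Tcl 8.1, released 1999"

# First run of decimal digits.
regexp {[[:digit:]]+} $s num

# An upper-case letter followed by one or more lower-case letters.
regexp {[[:upper:]][[:lower:]]+} $s word

# Complemented list: first character that is neither alphanumeric nor space.
regexp {[^[:alnum:][:space:]]} $s punct

puts [list $num $word $punct]   ;# 8 Tcl .
```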
 
.PP
There are two special cases of bracket expressions:
the bracket expressions
\fB[[:<:]]\fR
and
\fB[[:>:]]\fR
are constraints, matching empty strings at
the beginning and end of a word respectively.
'\" note, discussion of escapes below references this definition of word
A word is defined as a sequence of
word characters
that is neither preceded nor followed by
word characters.
A word character is an
\fIalnum\fR
character
or an underscore
(\fB_\fR).
These special bracket expressions are deprecated;
users of AREs should use constraint escapes instead (see below).
 
.SH ESCAPES
Escapes (AREs only), which begin with a
\fB\e\fR
followed by an alphanumeric character,
come in several varieties:
character entry, class shorthands, constraint escapes, and back references.
A
\fB\e\fR
followed by an alphanumeric character but not constituting
a valid escape is illegal in AREs.
In EREs, there are no escapes:
outside a bracket expression,
a
\fB\e\fR
followed by an alphanumeric character merely stands for that
character as an ordinary character,
and inside a bracket expression,
\fB\e\fR
is an ordinary character.
(The latter is the one actual incompatibility between EREs and AREs.)
 
.PP
Character-entry escapes (AREs only) exist to make it easier to specify
non-printing and otherwise inconvenient characters in REs:
.RS 2
.TP 5
\fB\ea\fR
alert (bell) character, as in C
.TP
\fB\eb\fR
backspace, as in C
.TP
\fB\eB\fR
synonym for
\fB\e\fR
to help reduce backslash doubling in some
applications where there are multiple levels of backslash processing
.TP
\fB\ec\fIX\fR
(where X is any character) the character whose
low-order 5 bits are the same as those of
\fIX\fR,
and whose other bits are all zero
.TP
\fB\ee\fR
the character whose collating-sequence name
is `\fBESC\fR',
or failing that, the character with octal value 033
.TP
\fB\ef\fR
formfeed, as in C
.TP
\fB\en\fR
newline, as in C
.TP
\fB\er\fR
carriage return, as in C
.TP
\fB\et\fR
horizontal tab, as in C
.TP
\fB\eu\fIwxyz\fR
(where
\fIwxyz\fR
is exactly four hexadecimal digits)
the Unicode character
\fBU+\fIwxyz\fR
in the local byte ordering
.TP
\fB\eU\fIstuvwxyz\fR
(where
\fIstuvwxyz\fR
is exactly eight hexadecimal digits)
reserved for a somewhat-hypothetical Unicode extension to 32 bits
.TP
\fB\ev\fR
vertical tab, as in C
.TP
\fB\ex\fIhhh\fR
(where
\fIhhh\fR
is any sequence of hexadecimal digits)
the character whose hexadecimal value is
\fB0x\fIhhh\fR
(a single character no matter how many hexadecimal digits are used).
.TP
\fB\e0\fR
the character whose value is
\fB0\fR
.TP
\fB\e\fIxy\fR
(where
\fIxy\fR
is exactly two octal digits,
and is not a
\fIback reference\fR (see below))
the character whose octal value is
\fB0\fIxy\fR
.TP
\fB\e\fIxyz\fR
(where
\fIxyz\fR
is exactly three octal digits,
and is not a
back reference (see below))
the character whose octal value is
\fB0\fIxyz\fR
.RE
.PP
Hexadecimal digits are `\fB0\fR'-`\fB9\fR', `\fBa\fR'-`\fBf\fR',
and `\fBA\fR'-`\fBF\fR'.
Octal digits are `\fB0\fR'-`\fB7\fR'.
.PP
The character-entry escapes are always taken as ordinary characters.
For example,
\fB\e135\fR
is
\fB]\fR
in ASCII,
but
\fB\e135\fR
does not terminate a bracket expression.
Beware, however, that some applications (e.g., C compilers) interpret
such sequences themselves before the regular-expression package
gets to see them, which may require doubling (quadrupling, etc.) the `\fB\e\fR'.
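.PP
A short sketch of character-entry escapes in Tcl.
Writing the RE in braces keeps Tcl itself from processing the backslashes, so
the regular-expression package sees them; the sample strings and variable
names are assumptions for illustration.

```tcl
# The subject string contains a real tab (Tcl expands \t in double quotes).
set s "a\tb"

# In braces, the RE engine (not Tcl) interprets \t as a horizontal tab.
set tabMatch [regexp {^a\tb$} $s]

# \u takes exactly four hex digits; \x takes a sequence of hex digits.
set hexMatch [regexp {^\u0041\x42$} "AB"]

puts "$tabMatch $hexMatch"   ;# 1 1
```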
 
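.PP
Class-shorthand escapes (AREs only) provide shorthands for certain commonly-used
character classes:
.RS 2
.TP 10
\fB\ed\fR
\fB[[:digit:]]\fR
.TP
\fB\es\fR
\fB[[:space:]]\fR
.TP
\fB\ew\fR
\fB[[:alnum:]_]\fR
(note underscore)
.TP
\fB\eD\fR
\fB[^[:digit:]]\fR
.TP
\fB\eS\fR
\fB[^[:space:]]\fR
.TP
\fB\eW\fR
\fB[^[:alnum:]_]\fR
(note underscore)
.RE
.PP
Within bracket expressions, `\fB\ed\fR', `\fB\es\fR',
and `\fB\ew\fR'\&
lose their outer brackets,
and `\fB\eD\fR', `\fB\eS\fR',
and `\fB\eW\fR'\&
are illegal.
(So, for example, \fB[a-c\ed]\fR is equivalent to \fB[a-c[:digit:]]\fR.
Also, \fB[a-c\eD]\fR, which is equivalent to \fB[a-c^[:digit:]]\fR, is illegal.)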
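.PP
The class shorthands can be sketched as follows, again with \fBregexp\fR.
The input strings and variable names (\fBident\fR, \fBvalue\fR, \fBmixed\fR)
are illustrative assumptions.

```tcl
set s "id_42 = 7"

# \w covers alphanumerics plus underscore.
regexp {\w+} $s ident

# \s and \d outside brackets; "->" is just a throwaway variable name.
regexp {\s+=\s+(\d+)} $s -> value

# Inside a bracket expression \d loses its outer brackets: [a-c[:digit:]].
regexp {[a-c\d]+} "xx2b1zz" mixed

puts [list $ident $value $mixed]   ;# id_42 7 2b1
```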
 
.PP
A constraint escape (AREs only) is a constraint,
matching the empty string if specific conditions are met,
written as an escape:
.RS 2
.TP 6
\fB\eA\fR
matches only at the beginning of the string
(see MATCHING, below, for how this differs from `\fB^\fR')
.TP
\fB\em\fR
matches only at the beginning of a word
.TP
\fB\eM\fR
matches only at the end of a word
.TP
\fB\ey\fR
matches only at the beginning or end of a word
.TP
\fB\eY\fR
matches only at a point that is not the beginning or end of a word
.TP
\fB\eZ\fR
matches only at the end of the string
(see MATCHING, below, for how this differs from `\fB$\fR')
.TP
\fB\e\fIm\fR
(where
\fIm\fR
is a nonzero digit) a \fIback reference\fR, see below
.TP
\fB\e\fImnn\fR
(where
\fIm\fR
is a nonzero digit, and
\fInn\fR
is some more digits,
and the decimal value
\fImnn\fR
is not greater than the number of closing capturing parentheses seen so far)
a \fIback reference\fR, see below
.RE
.PP
A word is defined as in the specification of
\fB[[:<:]]\fR
and
\fB[[:>:]]\fR
above.
Constraint escapes are illegal within bracket expressions.
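.PP
As an illustration of the word-boundary constraint escapes, the sketch below
compares a plain match with one anchored by \fB\em\fR and \fB\eM\fR.
The sample sentence and variable names are assumptions.

```tcl
set s "concatenate the cat"

# Unconstrained: the first "cat" is found inside "concatenate".
regexp -indices {cat} $s plain

# \m and \M restrict the match to a whole word.
regexp -indices {\mcat\M} $s word

puts "$plain / $word"   ;# 3 5 / 16 18
```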
 
.PP
A back reference (AREs only) matches the same string matched by the parenthesized
subexpression specified by the number,
so that (e.g.)
\fB([bc])\e1\fR
matches
\fBbb\fR
or
\fBcc\fR
but not `\fBbc\fR'.
The subexpression must entirely precede the back reference in the RE.
Subexpressions are numbered in the order of their leading parentheses.
Non-capturing parentheses do not define subexpressions.
.PP
There is an inherent historical ambiguity between octal character-entry
escapes and back references, which is resolved by heuristics,
as follows.
A leading zero always indicates an octal escape.
A single non-zero digit, not followed by another digit,
is always taken as a back reference.
A multi-digit sequence not starting with a zero is taken as a back
reference if it comes after a suitable subexpression
(i.e. the number is in the legal range for a back reference),
and otherwise is taken as octal.
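.PP
The back-reference example from the text can be checked directly.
This is a minimal sketch; the variable names are assumptions.

```tcl
# ([bc])\1 requires the second character to repeat the captured one.
set bb [regexp {([bc])\1} "bb"]
set cc [regexp {([bc])\1} "cc"]
set bc [regexp {([bc])\1} "bc"]

puts "$bb $cc $bc"   ;# 1 1 0
```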
 
.SH "METASYNTAX"
In addition to the main syntax described above, there are some special
forms and miscellaneous syntactic facilities available.
.PP
Normally the flavor of RE being used is specified by
application-dependent means.
However, this can be overridden by a \fIdirector\fR.
If an RE of any flavor begins with `\fB***:\fR',
the rest of the RE is an ARE.
If an RE of any flavor begins with `\fB***=\fR',
the rest of the RE is taken to be a literal string,
with all characters considered ordinary characters.
.PP
An ARE may begin with \fIembedded options\fR:
a sequence
\fB(?\fIxyz\fB)\fR
(where
\fIxyz\fR
is one or more alphabetic characters)
specifies options affecting the rest of the RE.
These supplement, and can override,
any options specified by the application.
The available option letters are:
.RS 2
.TP 3
\fBb\fR
rest of RE is a BRE
.TP
\fBc\fR
case-sensitive matching (usual default)
.TP
\fBe\fR
rest of RE is an ERE
.TP
\fBi\fR
case-insensitive matching (see MATCHING, below)
.TP
\fBm\fR
historical synonym for
\fBn\fR
.TP
\fBn\fR
newline-sensitive matching (see MATCHING, below)
.TP
\fBp\fR
partial newline-sensitive matching (see MATCHING, below)
.TP
\fBq\fR
rest of RE is a literal (``quoted'') string, all ordinary characters
.TP
\fBs\fR
non-newline-sensitive matching (usual default)
.TP
\fBt\fR
tight syntax (usual default; see below)
.TP
\fBw\fR
inverse partial newline-sensitive (``weird'') matching (see MATCHING, below)
.TP
\fBx\fR
expanded syntax (see below)
.RE
.PP
Embedded options take effect at the
\fB)\fR
terminating the sequence.
They are available only at the start of an ARE,
and may not be used later within it.
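.PP
Directors and embedded options can be sketched as follows.
The example assumes the RE is passed to Tcl's \fBregexp\fR command (whose
default flavor is ARE); the variable names are illustrative.

```tcl
# (?i): case-insensitive matching for the rest of the RE.
set ci [regexp {(?i)tcl} "TCL"]

# (?q): the rest of the RE is a literal string, so "." is not a wildcard.
set quoted [regexp {(?q)a.c} "abc"]

# ***= director: likewise a literal string, in any flavor.
set litHit  [regexp {***=a.c} "a.c"]
set litMiss [regexp {***=a.c} "abc"]

puts "$ci $quoted $litHit $litMiss"   ;# 1 0 1 0
```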
 
.PP
In addition to the usual (\fItight\fR) RE syntax, in which all characters are
significant, there is an \fIexpanded\fR syntax,
available in all flavors of RE
with the \fB\-expanded\fR switch, or in AREs with the embedded x option.
In the expanded syntax,
white-space characters are ignored
and all characters between a
\fB#\fR
and the following newline (or the end of the RE) are ignored,
permitting paragraphing and commenting a complex RE.
There are three exceptions to that basic rule:
.RS 2
.PP
a white-space character or `\fB#\fR' preceded by `\fB\e\fR' is retained
.PP
white space or `\fB#\fR' within a bracket expression is retained
.PP
white space and comments are illegal within multi-character symbols
like the ARE `\fB(?:\fR' or the BRE `\fB\e(\fR'
.RE
.PP
Expanded-syntax white-space characters are blank, tab, newline, and
any character that belongs to the \fIspace\fR character class.
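.PP
A sketch of the expanded syntax, selected here with the embedded \fBx\fR
option.
The pattern, the phone-number-like sample text, and the variable names are
assumptions chosen only to show white space and comments being ignored.

```tcl
# With (?x), white space and "#" comments inside the RE are ignored.
set re {(?x)
    (\d{3})    # first group: three digits
    -          # a literal hyphen
    (\d{4})    # second group: four digits
}

regexp $re "call 555-1234" whole first second

puts [list $whole $first $second]   ;# 555-1234 555 1234
```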
 
.PP
Finally, in an ARE,
outside bracket expressions, the sequence `\fB(?#\fIttt\fB)\fR'
(where
\fIttt\fR
is any text not containing a `\fB)\fR')
is a comment,
completely ignored.
Again, this is not allowed between the characters of
multi-character symbols like `\fB(?:\fR'.
Such comments are more a historical artifact than a useful facility,
and their use is deprecated;
use the expanded syntax instead.
.PP
\fINone\fR of these metasyntax extensions is available if the application
(or an initial
\fB***=\fR
director)
has specified that the user's input be treated as a literal string
rather than as an RE.
 
.SH MATCHING
In the event that an RE could match more than one substring of a given
string,
the RE matches the one starting earliest in the string.
If the RE could match more than one substring starting at that point,
its choice is determined by its \fIpreference\fR:
either the longest substring, or the shortest.
.PP
Most atoms, and all constraints, have no preference.
A parenthesized RE has the same preference (possibly none) as the RE.
A quantified atom with quantifier
\fB{\fIm\fB}\fR
or
\fB{\fIm\fB}?\fR
has the same preference (possibly none) as the atom itself.
A quantified atom with other normal quantifiers (including
\fB{\fIm\fB,\fIn\fB}\fR
with
\fIm\fR
equal to
\fIn\fR)
prefers longest match.
A quantified atom with other non-greedy quantifiers (including
\fB{\fIm\fB,\fIn\fB}?\fR
with
\fIm\fR
equal to
\fIn\fR)
prefers shortest match.
A branch has the same preference as the first quantified atom in it
which has a preference.
An RE consisting of two or more branches connected by the
\fB|\fR
operator prefers longest match.
.PP
Subject to the constraints imposed by the rules for matching the whole RE,
subexpressions also match the longest or shortest possible substrings,
based on their preferences,
with subexpressions starting earlier in the RE taking priority over
those starting later.
Note that outer subexpressions thus take priority over
their component subexpressions.
.PP
Note that the quantifiers
\fB{1,1}\fR
and
\fB{1,1}?\fR
can be used to force longest and shortest preference, respectively,
on a subexpression or a whole RE.
.PP
Match lengths are measured in characters, not collating elements.
An empty string is considered longer than no match at all.
For example,
\fBbb*\fR
matches the three middle characters of `\fBabbbc\fR',
\fB(week|wee)(night|knights)\fR
matches all ten characters of `\fBweeknights\fR',
when
\fB(.*).*\fR
is matched against
\fBabc\fR
the parenthesized subexpression
matches all three characters, and
when
\fB(a*)*\fR
is matched against
\fBbc\fR
both the whole RE and the parenthesized
subexpression match an empty string.
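.PP
The first three examples above can be reproduced with \fBregexp\fR; a
non-greedy variant is added for contrast.
This is a sketch, and the variable names are assumptions.

```tcl
# Greedy: bb* takes the three middle characters of "abbbc".
regexp {bb*} "abbbc" longest

# Non-greedy: the same atom with +? prefers the shortest match.
regexp {b+?} "abbbc" shortest

# Alternation prefers the longest overall match, so "wee" + "knights" wins.
regexp {(week|wee)(night|knights)} "weeknights" all first second

# The earlier subexpression takes priority and grabs everything.
regexp {(.*).*} "abc" -> sub

puts [list $longest $shortest $all $first $second $sub]
```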
 
.PP
If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic character that exists in multiple cases appears as an
ordinary character outside a bracket expression, it is effectively
transformed into a bracket expression containing both cases,
so that
\fBx\fR
becomes `\fB[xX]\fR'.
When it appears inside a bracket expression, all case counterparts
of it are added to the bracket expression, so that
\fB[x]\fR
becomes
\fB[xX]\fR
and
\fB[^x]\fR
becomes `\fB[^xX]\fR'.
.PP
If newline-sensitive matching is specified, \fB.\fR
and bracket expressions using
\fB^\fR
will never match the newline character
(so that matches will never cross newlines unless the RE
explicitly arranges it)
and
\fB^\fR
and
\fB$\fR
will match the empty string after and before a newline
respectively, in addition to matching at beginning and end of string
respectively.
But the ARE escapes
\fB\eA\fR
and
\fB\eZ\fR
continue to match beginning or end of string \fIonly\fR.
.PP
If partial newline-sensitive matching is specified,
this affects \fB.\fR
and bracket expressions
as with newline-sensitive matching, but not
\fB^\fR
and
\fB$\fR.
.PP
If inverse partial newline-sensitive matching is specified,
this affects
\fB^\fR
and
\fB$\fR
as with
newline-sensitive matching,
but not \fB.\fR
and bracket expressions.
This isn't very useful but is provided for symmetry.
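.PP
Newline-sensitive matching can be sketched with the embedded \fBn\fR option.
The two-line sample string and the variable names are assumptions for
illustration.

```tcl
# A two-line string; Tcl expands \n inside double quotes.
set s "one\ntwo"

# Default: "." matches a newline, and ^/$ match only at the string ends.
set dotDefault  [regexp {one.two} $s]
set lineDefault [regexp {^two$} $s]

# Newline-sensitive: "." stops at newlines, ^/$ also match at line ends.
set dotNl  [regexp {(?n)one.two} $s]
set lineNl [regexp {(?n)^two$} $s]

# \A still matches only at the very beginning of the string.
set startOnly [regexp {(?n)\Atwo} $s]

puts "$dotDefault $lineDefault $dotNl $lineNl $startOnly"   ;# 1 0 0 1 0
```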
 
.SH "LIMITS AND COMPATIBILITY"
No particular limit is imposed on the length of REs.
Programs intended to be highly portable should not employ REs longer
than 256 bytes,
as a POSIX-compliant implementation can refuse to accept such REs.
.PP
The only feature of AREs that is actually incompatible with
POSIX EREs is that
\fB\e\fR
does not lose its special
significance inside bracket expressions.
All other ARE features use syntax which is illegal or has
undefined or unspecified effects in POSIX EREs;
the
\fB***\fR
syntax of directors likewise is outside the POSIX
syntax for both BREs and EREs.
.PP
Many of the ARE extensions are borrowed from Perl, but some have
been changed to clean them up, and a few Perl extensions are not present.
Incompatibilities of note include `\fB\eb\fR', `\fB\eB\fR',
the lack of special treatment for a trailing newline,
the addition of complemented bracket expressions to the things
affected by newline-sensitive matching,
the restrictions on parentheses and back references in lookahead constraints,
and the longest/shortest-match (rather than first-match) matching semantics.
.PP
The matching rules for REs containing both normal and non-greedy quantifiers
have changed since early beta-test versions of this package.
(The new rules are much simpler and cleaner,
but don't work as hard at guessing the user's real intentions.)
.PP
Henry Spencer's original 1986 \fIregexp\fR package,
still in widespread use (e.g., in pre-8.1 releases of Tcl),
implemented an early version of today's EREs.
There are four incompatibilities between \fIregexp\fR's near-EREs
(`RREs' for short) and AREs.
In roughly increasing order of significance:
.RS 2
.PP
In AREs,
\fB\e\fR
followed by an alphanumeric character is either an
escape or an error,
while in RREs, it was just another way of writing the
alphanumeric.
This should not be a problem because there was no reason to write
such a sequence in RREs.
.PP
\fB{\fR
followed by a digit in an ARE is the beginning of a bound,
while in RREs,
\fB{\fR
was always an ordinary character.
Such sequences should be rare,
and will often result in an error because following characters
will not look like a valid bound.
.PP
In AREs,
\fB\e\fR
remains a special character within `\fB[\|]\fR',
so a literal
\fB\e\fR
within
\fB[\|]\fR
must be written `\fB\e\e\fR'.
\fB\e\e\fR
also gives a literal
\fB\e\fR
within
\fB[\|]\fR
in RREs,
but only truly paranoid programmers routinely doubled the backslash.
.PP
AREs report the longest/shortest match for the RE,
rather than the first found in a specified search order.
This may affect some RREs which were written in the expectation that
the first match would be reported.
(The careful crafting of RREs to optimize the search order for fast
matching is obsolete (AREs examine all possible matches
in parallel, and their performance is largely insensitive to their
complexity) but cases where the search order was exploited to deliberately
find a match which was \fInot\fR the longest/shortest will need rewriting.)
.RE
 
.SH "BASIC REGULAR EXPRESSIONS"
BREs differ from EREs in several respects.  `\fB|\fR', `\fB+\fR',
and
\fB?\fR
are ordinary characters and there is no equivalent
for their functionality.
The delimiters for bounds are
\fB\e{\fR
and
`\fB\e}\fR',
with
\fB{\fR
and
\fB}\fR
by themselves ordinary characters.
The parentheses for nested subexpressions are
\fB\e(\fR
and
`\fB\e)\fR',
with
\fB(\fR
and
\fB)\fR
by themselves ordinary characters.
\fB^\fR
is an ordinary character except at the beginning of the
RE or the beginning of a parenthesized subexpression,
\fB$\fR
is an ordinary character except at the end of the
RE or the end of a parenthesized subexpression,
and
\fB*\fR
is an ordinary character if it appears at the beginning of the
RE or the beginning of a parenthesized subexpression
(after a possible leading `\fB^\fR').
Finally,
single-digit back references are available,
and
\fB\e<\fR
and
\fB\e>\fR
are synonyms for
\fB[[:<:]]\fR
and
\fB[[:>:]]\fR
respectively;
no other escapes are available.
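.PP
A few of the BRE differences can be sketched by selecting the BRE flavor with
the embedded \fBb\fR option described under METASYNTAX.
The sample strings and variable names are assumptions made for the example.

```tcl
# In a BRE "+" is an ordinary character, so this matches a literal "a+".
regexp {(?b)a+} "aa+" plus

# Bounds are delimited by \{ and \} in a BRE.
regexp {(?b)a\{2\}} "caab" bound

# Subexpressions use \( and \); single-digit back references are available.
set backref [regexp {(?b)\(ab\)\1} "abab"]

puts [list $plus $bound $backref]   ;# a+ aa 1
```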
 
.SH "SEE ALSO"
RegExp(3), regexp(n), regsub(n), lsearch(n), switch(n), text(n)
.SH KEYWORDS
match, regular expression, string