Commit | Line | Data |
---|---|---|
15b6757b | 1 | ///////////////////////////////////////////////////////////////////////////// |
72844950 | 2 | // Name: resyntax.h |
15b6757b FM |
3 | // Purpose: topic overview |
4 | // Author: wxWidgets team | |
5 | // RCS-ID: $Id$ | |
6 | // Licence: wxWindows license | |
7 | ///////////////////////////////////////////////////////////////////////////// | |
8 | ||
9 | /*! | |
36c9828f | 10 | |
72844950 | 11 | @page overview_resyntax Syntax of the Built-in Regular Expression Library |
36c9828f | 12 | |
72844950 BP |
13 | A <em>regular expression</em> describes strings of characters. It's a pattern |
14 | that matches certain strings and doesn't match others. | |
36c9828f | 15 | |
72844950 | 16 | @seealso #wxRegEx |
36c9828f | 17 | |
72844950 BP |
18 | @li @ref overview_resyntax_differentflavors |
19 | @li @ref overview_resyntax_syntax | |
20 | @li @ref overview_resyntax_bracket | |
21 | @li @ref overview_resyntax_escapes | |
22 | @li @ref overview_resyntax_metasyntax | |
23 | @li @ref overview_resyntax_matching | |
24 | @li @ref overview_resyntax_limits | |
25 | @li @ref overview_resyntax_bre | |
26 | @li @ref overview_resyntax_characters | |
36c9828f | 27 | |
36c9828f | 28 | |
72844950 | 29 | <hr> |
36c9828f | 30 | |
36c9828f | 31 | |
72844950 | 32 | @section overview_resyntax_differentflavors Different Flavors of REs |
36c9828f | 33 | |
72844950 BP |
34 | Regular expressions ("REs"), as defined by POSIX, come in two |
35 | flavors: @e extended REs ("EREs") and @e basic REs ("BREs"). EREs are roughly those | |
36 | of the traditional @e egrep, while BREs are roughly those of the traditional | |
37 | @e ed. This implementation adds a third flavor, @e advanced REs ("AREs"), basically | |
38 | EREs with some significant extensions. | |
39 | This manual page primarily describes | |
40 | AREs. BREs mostly exist for backward compatibility in some old programs; | |
41 | they are discussed in @ref overview_resyntax_bre. POSIX EREs are almost an exact subset | |
42 | of AREs. Features of AREs that are not present in EREs will be indicated. | |
36c9828f FM |
43 | |
44 | ||
72844950 | 45 | @section overview_resyntax_syntax Regular Expression Syntax |
36c9828f | 46 | |
72844950 BP |
47 | These regular expressions are implemented using |
48 | the package written by Henry Spencer, based on the 1003.2 spec and some | |
49 | (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description | |
50 | of regular expressions below is copied verbatim from his manual entry. | |
51 | An ARE is one or more @e branches, separated by '@b |', matching anything that matches | |
52 | any of the branches. | |
53 | A branch is zero or more @e constraints or @e quantified | |
54 | atoms, concatenated. It matches a match for the first, followed by a match | |
55 | for the second, etc.; an empty branch matches the empty string. | |
56 | A quantified atom is an @e atom possibly followed by a single @e quantifier. Without a quantifier, | |
57 | it matches a match for the atom. The quantifiers, and what a so-quantified | |
58 | atom matches, are: | |
36c9828f | 59 | |
36c9828f FM |
60 | |
61 | ||
72844950 | 62 | @b * |
36c9828f | 63 | |
72844950 | 64 | a sequence of 0 or more matches of the atom |
36c9828f | 65 | |
72844950 | 66 | @b + |
36c9828f | 67 | |
72844950 | 68 | a sequence of 1 or more matches of the atom |
36c9828f | 69 | |
72844950 | 70 | @b ? |
36c9828f | 71 | |
72844950 | 72 | a sequence of 0 or 1 matches of the atom |
36c9828f | 73 | |
72844950 | 74 | @b {m} |
36c9828f | 75 | |
72844950 | 76 | a sequence of exactly @e m matches of the atom |
36c9828f | 77 | |
72844950 | 78 | @b {m,} |
36c9828f | 79 | |
72844950 | 80 | a sequence of @e m or more matches of the atom |
36c9828f | 81 | |
72844950 | 82 | @b {m,n} |
36c9828f | 83 | |
72844950 BP |
84 | a sequence of @e m through @e n (inclusive) |
85 | matches of the atom; @e m may not exceed @e n | |
36c9828f | 86 | |
72844950 | 87 | @b *? +? ?? {m}? {m,}? {m,n}? |
36c9828f | 88 | |
72844950 BP |
89 | @e non-greedy quantifiers, |
90 | which match the same possibilities, but prefer the | |
91 | smallest number rather than the largest number of matches (see @ref overview_resyntax_matching) | |
36c9828f | 92 | |
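The quantifier forms listed above can be tried out with a short sketch. Python's built-in `re` module is used here purely as a stand-in (an assumption of this example, not the library described on this page); it shares this quantifier syntax. The sample string and variable names are illustrative only.

```python
import re

text = "aaaa"

# Greedy quantifiers prefer the largest number of matches of the atom.
star = re.match(r"a*", text).group()          # zero or more
bounded = re.match(r"a{2,3}", text).group()   # two through three, inclusive
exact = re.match(r"a{2}", text).group()       # exactly two

# Non-greedy quantifiers match the same possibilities but prefer the smallest number.
star_lazy = re.match(r"a*?", text).group()
bounded_lazy = re.match(r"a{2,3}?", text).group()

# '?' allows zero or one match of the atom 'b'.
optional = re.match(r"ab?", text).group()
```

Here `star` takes all four characters while `star_lazy` is empty, and `bounded` takes three characters while `bounded_lazy` takes two.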
72844950 BP |
93 | The forms using @b { and @b } are known as @e bounds. The numbers @e m and @e n are unsigned |
94 | decimal integers with permissible values from 0 to 255 inclusive. | |
95 | An atom is one of: | |
36c9828f | 96 | |
72844950 | 97 | @b (re) |
36c9828f | 98 | |
72844950 BP |
99 | (where @e re is any regular expression) matches a match for |
100 | @e re, with the match noted for possible reporting | |
36c9828f | 101 | |
72844950 | 102 | @b (?:re) |
36c9828f | 103 | |
72844950 BP |
104 | as previous, but |
105 | does no reporting (a "non-capturing" set of parentheses) | |
36c9828f | 106 | |
72844950 | 107 | @b () |
36c9828f | 108 | |
72844950 BP |
109 | matches an empty |
110 | string, noted for possible reporting | |
36c9828f | 111 | |
72844950 | 112 | @b (?:) |
36c9828f | 113 | |
72844950 | 114 | matches an empty string, without reporting |
36c9828f | 115 | |
72844950 | 116 | @b [chars] |
36c9828f | 117 | |
72844950 BP |
118 | a @e bracket expression, matching any one of the @e chars |
119 | (see @ref overview_resyntax_bracket for more detail) | |
36c9828f | 120 | |
72844950 | 121 | @b . |
36c9828f | 122 | |
72844950 | 123 | matches any single character |
36c9828f | 124 | |
72844950 | 125 | @b \k |
36c9828f | 126 | |
72844950 BP |
127 | (where @e k is a non-alphanumeric character) |
128 | matches that character taken as an ordinary character, e.g. \\ matches a backslash | |
129 | character | |
36c9828f | 130 | |
72844950 | 131 | @b \c |
36c9828f | 132 | |
72844950 BP |
133 | where @e c is alphanumeric (possibly followed by other characters), |
134 | an @e escape (AREs only), see @ref overview_resyntax_escapes below | |
36c9828f | 135 | |
72844950 | 136 | @b { |
36c9828f | 137 | |
72844950 BP |
138 | when followed by a character |
139 | other than a digit, matches the left-brace character '@b {'; when followed by | |
140 | a digit, it is the beginning of a @e bound (see above) | |
36c9828f | 141 | |
72844950 | 142 | @b x |
36c9828f | 143 | |
72844950 BP |
144 | where @e x is a single |
145 | character with no other significance, matches that character. | |
36c9828f | 146 | |
72844950 BP |
147 | A @e constraint matches an empty string when specific conditions are met. A constraint may |
148 | not be followed by a quantifier. The simple constraints are as follows; | |
149 | some more constraints are described later, under @ref overview_resyntax_escapes. | |
36c9828f | 150 | |
72844950 | 151 | @b ^ |
36c9828f | 152 | |
72844950 | 153 | matches at the beginning of a line |
36c9828f | 154 | |
72844950 | 155 | @b $ |
36c9828f | 156 | |
72844950 | 157 | matches at the end of a line |
36c9828f | 158 | |
72844950 | 159 | @b (?=re) |
36c9828f | 160 | |
72844950 BP |
161 | @e positive lookahead |
162 | (AREs only), matches at any point where a substring matching @e re begins | |
36c9828f | 163 | |
72844950 | 164 | @b (?!re) |
36c9828f | 165 | |
72844950 BP |
166 | @e negative lookahead (AREs only), |
167 | matches at any point where no substring matching @e re begins | |
36c9828f FM |
168 | |
169 | ||
170 | ||
72844950 BP |
171 | The lookahead constraints may not contain back references |
172 | (see later), and all parentheses within them are considered non-capturing. | |
173 | An RE may not end with '@b \'. | |
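As a rough illustration of the lookahead constraints and of capturing versus non-capturing parentheses, the sketch below again uses Python's `re` module as a stand-in engine (an assumption of this example); the patterns and sample text are hypothetical.

```python
import re

text = "foobar foobaz"

# Positive lookahead: 'foo' matches only where a substring matching 'bar' begins.
positive_spans = [m.span() for m in re.finditer(r"foo(?=bar)", text)]

# Negative lookahead: 'foo' matches only where no substring matching 'bar' begins.
negative_spans = [m.span() for m in re.finditer(r"foo(?!bar)", text)]

# Only plain parentheses are noted for reporting; (?:...) groups are not.
reported = re.fullmatch(r"(a)(?:b)(c)", "abc").groups()
```

The lookahead itself consumes nothing, so both reported spans cover only the three characters of `foo`.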
36c9828f | 174 | |
36c9828f | 175 | |
72844950 | 176 | @section overview_resyntax_bracket Bracket Expressions |
36c9828f | 177 | |
72844950 BP |
178 | A @e bracket expression is a list |
179 | of characters enclosed in '@b []'. It normally matches any single character from | |
180 | the list (but see below). If the list begins with '@b ^', it matches any single | |
181 | character (but see below) @e not from the rest of the list. | |
182 | If two characters | |
183 | in the list are separated by '@b -', this is shorthand for the full @e range of | |
184 | characters between those two (inclusive) in the collating sequence, e.g. | |
185 | @b [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint, | |
186 | so e.g. @b a-c-e is illegal. Ranges are very collating-sequence-dependent, and portable | |
187 | programs should avoid relying on them. | |
188 | To include a literal @b ] or @b - in the | |
189 | list, the simplest method is to enclose it in @b [. and @b .] to make it a collating | |
190 | element (see below). Alternatively, make it the first character (following | |
191 | a possible '@b ^'), or (AREs only) precede it with '@b \'. | |
192 | Alternatively, for '@b -', make | |
193 | it the last character, or the second endpoint of a range. To use a literal | |
194 | @b - as the first endpoint of a range, make it a collating element or (AREs | |
195 | only) precede it with '@b \'. With the exception of these, some combinations using | |
196 | @b [ (see next paragraphs), and escapes, all other special characters lose | |
197 | their special significance within a bracket expression. | |
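The rules for ranges and for literal @b ] and @b - can be checked with a small sketch. Python's `re` module stands in for the engine here (an assumption of this example); it follows the same placement rules for these simple cases but does not support the @b [. .], @b [= =] or @b [: :] forms described below.

```python
import re

def matches(pattern, char):
    """Return True if the bracket expression matches the single character."""
    return re.fullmatch(pattern, char) is not None

digit_in_range = matches(r"[0-9]", "7")        # range shorthand
negated_rejects = matches(r"[^0-9]", "7")      # leading '^' negates the list
bracket_first = matches(r"[]a]", "]")          # ']' first is a literal
bracket_after_caret = matches(r"[^]a]", "]")   # ']' after '^' is still a literal
hyphen_last = matches(r"[a-]", "-")            # '-' last is a literal
hyphen_escaped = matches(r"[a\-z]", "-")       # '\-' is a literal, not a range
escaped_is_no_range = matches(r"[a\-z]", "b")  # so 'b' is not in the list
```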
198 | Within a bracket | |
199 | expression, a collating element (a character, a multi-character sequence | |
200 | that collates as if it were a single character, or a collating-sequence | |
201 | name for either) enclosed in @b [. and @b .] stands for the | |
202 | sequence of characters of that collating element. | |
203 | @e wxWidgets: Currently no multi-character collating elements are defined. | |
204 | So in @b [.X.], @e X can either be a single character literal or | |
205 | the name of a character. For example, the following are identical: | |
206 | @b [[.0.]-[.9.]] and @b [[.zero.]-[.nine.]]; both mean the same as | |
207 | @b [0-9]. | |
208 | See @ref overview_resyntax_characters. | |
209 | Within a bracket expression, a collating element enclosed in @b [= and @b =] | |
210 | is an equivalence class, standing for the sequences of characters of all | |
211 | collating elements equivalent to that one, including itself. | |
212 | An equivalence class may not be an endpoint of a range. | |
213 | @e wxWidgets: Currently no equivalence classes are defined, so | |
214 | @b [=X=] stands for just the single character @e X. | |
215 | @e X can either be a single character literal or the name of a character, | |
216 | see @ref overview_resyntax_characters. | |
217 | Within a bracket expression, | |
218 | the name of a @e character class enclosed in @b [: and @b :] stands for the list | |
219 | of all characters (not all collating elements!) belonging to that class. | |
220 | Standard character classes are: | |
36c9828f FM |
221 | |
222 | ||
223 | ||
72844950 | 224 | @b alpha |
36c9828f | 225 | |
72844950 | 226 | A letter. |
36c9828f | 227 | |
72844950 | 228 | @b upper |
36c9828f | 229 | |
72844950 | 230 | An upper-case letter. |
36c9828f | 231 | |
72844950 | 232 | @b lower |
36c9828f | 233 | |
72844950 | 234 | A lower-case letter. |
36c9828f | 235 | |
72844950 | 236 | @b digit |
36c9828f | 237 | |
72844950 | 238 | A decimal digit. |
36c9828f | 239 | |
72844950 | 240 | @b xdigit |
36c9828f | 241 | |
72844950 | 242 | A hexadecimal digit. |
36c9828f | 243 | |
72844950 | 244 | @b alnum |
36c9828f | 245 | |
72844950 | 246 | An alphanumeric (letter or digit). |
36c9828f | 247 | |
72844950 | 248 | @b print |
36c9828f | 249 | |
72844950 | 250 | An alphanumeric (same as alnum). |
36c9828f | 251 | |
72844950 | 252 | @b blank |
36c9828f | 253 | |
72844950 | 254 | A space or tab character. |
36c9828f | 255 | |
72844950 | 256 | @b space |
36c9828f | 257 | |
72844950 | 258 | A character producing white space in displayed text. |
36c9828f | 259 | |
72844950 | 260 | @b punct |
36c9828f | 261 | |
72844950 | 262 | A punctuation character. |
36c9828f | 263 | |
72844950 | 264 | @b graph |
36c9828f | 265 | |
72844950 | 266 | A character with a visible representation. |
36c9828f | 267 | |
72844950 | 268 | @b cntrl |
36c9828f | 269 | |
72844950 | 270 | A control character. |
36c9828f | 271 | |
36c9828f FM |
272 | |
273 | ||
72844950 BP |
274 | A character class may not be used as an endpoint of a range. |
275 | @e wxWidgets: In a non-Unicode build, these character classifications depend on the | |
276 | current locale, and correspond to the values returned by the ANSI C 'is' | |
277 | functions: isalpha, isupper, etc. In Unicode mode they are based on | |
278 | Unicode classifications, and are not affected by the current locale. | |
279 | There are two special cases of bracket expressions: | |
280 | the bracket expressions @b [[:<:]] and @b [[:>:]] are constraints, matching empty | |
281 | strings at the beginning and end of a word respectively. A word is defined | |
282 | as a sequence of word characters that is neither preceded nor followed | |
283 | by word characters. A word character is an @e alnum character or an underscore | |
284 | (@b _). These special bracket expressions are deprecated; users of AREs should | |
285 | use constraint escapes instead (see @ref overview_resyntax_escapes below). | |
36c9828f FM |
286 | |
287 | ||
72844950 | 288 | @section overview_resyntax_escapes Escapes |
36c9828f | 289 | |
72844950 BP |
290 | Escapes (AREs only), |
291 | which begin with a @b \ followed by an alphanumeric character, come in several | |
292 | varieties: character entry, class shorthands, constraint escapes, and back | |
293 | references. A @b \ followed by an alphanumeric character but not constituting | |
294 | a valid escape is illegal in AREs. In EREs, there are no escapes: outside | |
295 | a bracket expression, a @b \ followed by an alphanumeric character merely stands | |
296 | for that character as an ordinary character, and inside a bracket expression, | |
297 | @b \ is an ordinary character. (The latter is the one actual incompatibility | |
298 | between EREs and AREs.) | |
299 | Character-entry escapes (AREs only) exist to make | |
300 | it easier to specify non-printing and otherwise inconvenient characters | |
301 | in REs: | |
36c9828f FM |
302 | |
303 | ||
304 | ||
72844950 | 305 | @b \a |
36c9828f | 306 | |
72844950 | 307 | alert (bell) character, as in C |
36c9828f | 308 | |
72844950 | 309 | @b \b |
36c9828f | 310 | |
72844950 | 311 | backspace, as in C |
36c9828f | 312 | |
72844950 | 313 | @b \B |
36c9828f | 314 | |
72844950 BP |
315 | synonym |
316 | for @b \ to help reduce backslash doubling in some applications where there | |
317 | are multiple levels of backslash processing | |
36c9828f | 318 | |
72844950 | 319 | @b \c@e X |
36c9828f | 320 | |
72844950 BP |
321 | (where X is any character) |
322 | the character whose low-order 5 bits are the same as those of @e X, and whose | |
323 | other bits are all zero | |
36c9828f | 324 | |
72844950 | 325 | @b \e |
36c9828f | 326 | |
72844950 BP |
327 | the character whose collating-sequence name is |
328 | '@b ESC', or failing that, the character with octal value 033 | |
36c9828f | 329 | |
72844950 | 330 | @b \f |
36c9828f | 331 | |
72844950 | 332 | formfeed, as in C |
36c9828f | 333 | |
72844950 | 334 | @b \n |
36c9828f | 335 | |
72844950 | 336 | newline, as in C |
36c9828f | 337 | |
72844950 | 338 | @b \r |
36c9828f | 339 | |
72844950 | 340 | carriage return, as in C |
36c9828f | 341 | |
72844950 | 342 | @b \t |
36c9828f | 343 | |
72844950 | 344 | horizontal tab, as in C |
36c9828f | 345 | |
72844950 | 346 | @b \u@e wxyz |
36c9828f | 347 | |
72844950 BP |
348 | (where @e wxyz is exactly four hexadecimal digits) |
349 | the Unicode | |
350 | character @b U+@e wxyz in the local byte ordering | |
36c9828f | 351 | |
72844950 | 352 | @b \U@e stuvwxyz |
36c9828f | 353 | |
72844950 BP |
354 | (where @e stuvwxyz is |
355 | exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode | |
356 | extension to 32 bits | |
36c9828f | 357 | |
72844950 | 358 | @b \v |
36c9828f | 359 | |
72844950 | 360 | vertical tab, as in C |
36c9828f | 361 | |
72844950 | 362 | @b \x@e hhh |
36c9828f | 363 | |
72844950 BP |
364 | (where |
365 | @e hhh is any sequence of hexadecimal digits) the character whose hexadecimal | |
366 | value is @b 0x@e hhh (a single character no matter how many hexadecimal digits | |
367 | are used). | |
36c9828f | 368 | |
72844950 | 369 | @b \0 |
36c9828f | 370 | |
72844950 | 371 | the character whose value is @b 0 |
36c9828f | 372 | |
72844950 | 373 | @b \@e xy |
36c9828f | 374 | |
72844950 BP |
375 | (where @e xy is exactly two |
376 | octal digits, and is not a @e back reference (see below)) the character whose | |
377 | octal value is @b 0@e xy | |
36c9828f | 378 | |
72844950 | 379 | @b \@e xyz |
36c9828f | 380 | |
72844950 BP |
381 | (where @e xyz is exactly three octal digits, and is |
382 | not a back reference (see below)) | |
383 | the character whose octal value is @b 0@e xyz | |
36c9828f | 384 | |
36c9828f FM |
385 | |
386 | ||
72844950 BP |
387 | Hexadecimal digits are '@b 0'-'@b 9', '@b a'-'@b f', and '@b A'-'@b F'. Octal |
388 | digits are '@b 0'-'@b 7'. | |
389 | The character-entry | |
390 | escapes are always taken as ordinary characters. For example, @b \135 is @b ] in | |
391 | ASCII, but @b \135 does not terminate a bracket expression. Beware, however, | |
392 | that some applications (e.g., C compilers) interpret such sequences themselves | |
393 | before the regular-expression package gets to see them, which may require | |
394 | doubling (quadrupling, etc.) the '@b \'. | |
395 | Class-shorthand escapes (AREs only) provide | |
396 | shorthands for certain commonly-used character classes: | |
36c9828f FM |
397 | |
398 | ||
399 | ||
72844950 | 400 | @b \d |
36c9828f | 401 | |
72844950 | 402 | @b [[:digit:]] |
36c9828f | 403 | |
72844950 | 404 | @b \s |
36c9828f | 405 | |
72844950 | 406 | @b [[:space:]] |
36c9828f | 407 | |
72844950 | 408 | @b \w |
36c9828f | 409 | |
72844950 | 410 | @b [[:alnum:]_] (note underscore) |
36c9828f | 411 | |
72844950 | 412 | @b \D |
36c9828f | 413 | |
72844950 | 414 | @b [^[:digit:]] |
36c9828f | 415 | |
72844950 | 416 | @b \S |
36c9828f | 417 | |
72844950 | 418 | @b [^[:space:]] |
36c9828f | 419 | |
72844950 | 420 | @b \W |
36c9828f | 421 | |
72844950 | 422 | @b [^[:alnum:]_] (note underscore) |
36c9828f FM |
423 | |
424 | ||
36c9828f | 425 | |
72844950 BP |
426 | Within bracket expressions, '@b \d', '@b \s', and |
427 | '@b \w' lose their outer brackets, and '@b \D', | |
428 | '@b \S', and '@b \W' are illegal. (So, for example, | |
429 | @b [a-c\d] is equivalent to @b [a-c[:digit:]]. | |
430 | Also, @b [a-c\D], which is equivalent to | |
431 | @b [a-c^[:digit:]], is illegal.) | |
432 | A constraint escape (AREs only) is a constraint, | |
433 | matching the empty string if specific conditions are met, written as an | |
434 | escape: | |
36c9828f FM |
435 | |
436 | ||
437 | ||
72844950 | 438 | @b \A |
36c9828f | 439 | |
72844950 BP |
440 | matches only at the beginning of the string |
441 | (see @ref overview_resyntax_matching, below, | |
442 | for how this differs from '@b ^') | |
36c9828f | 443 | |
72844950 | 444 | @b \m |
36c9828f | 445 | |
72844950 | 446 | matches only at the beginning of a word |
36c9828f | 447 | |
72844950 | 448 | @b \M |
36c9828f | 449 | |
72844950 | 450 | matches only at the end of a word |
36c9828f | 451 | |
72844950 | 452 | @b \y |
36c9828f | 453 | |
72844950 | 454 | matches only at the beginning or end of a word |
36c9828f | 455 | |
72844950 | 456 | @b \Y |
36c9828f | 457 | |
72844950 BP |
458 | matches only at a point that is not the beginning or end of |
459 | a word | |
36c9828f | 460 | |
72844950 | 461 | @b \Z |
36c9828f | 462 | |
72844950 BP |
463 | matches only at the end of the string |
464 | (see @ref overview_resyntax_matching, below, for | |
465 | how this differs from '@b $') | |
36c9828f | 466 | |
72844950 | 467 | @b \@e m |
36c9828f | 468 | |
72844950 BP |
469 | (where @e m is a nonzero digit) a @e back reference, |
470 | see below | |
36c9828f | 471 | |
72844950 | 472 | @b \@e mnn |
36c9828f | 473 | |
72844950 BP |
474 | (where @e m is a nonzero digit, and @e nn is some more digits, |
475 | and the decimal value @e mnn is not greater than the number of closing capturing | |
476 | parentheses seen so far) a @e back reference, see below | |
36c9828f FM |
477 | |
478 | ||
479 | ||
72844950 BP |
480 | A word is defined |
481 | as in the specification of @b [[:<:]] and @b [[:>:]] above. Constraint escapes are | |
482 | illegal within bracket expressions. | |
483 | A back reference (AREs only) matches | |
484 | the same string matched by the parenthesized subexpression specified by | |
485 | the number, so that (e.g.) @b ([bc])\1 matches @b bb or @b cc but not '@b bc'. | |
486 | The subexpression | |
487 | must entirely precede the back reference in the RE. Subexpressions are numbered | |
488 | in the order of their leading parentheses. Non-capturing parentheses do not | |
489 | define subexpressions. | |
490 | There is an inherent historical ambiguity between | |
491 | octal character-entry escapes and back references, which is resolved by | |
492 | heuristics, as hinted at above. A leading zero always indicates an octal | |
493 | escape. A single non-zero digit, not followed by another digit, is always | |
494 | taken as a back reference. A multi-digit sequence not starting with a zero | |
495 | is taken as a back reference if it comes after a suitable subexpression | |
496 | (i.e. the number is in the legal range for a back reference), and otherwise | |
497 | is taken as octal. | |
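Back references and subexpression numbering can be sketched as follows. Python's `re` module is used as a stand-in (an assumption of this example); its single-digit back references behave as described here, although its rules for multi-digit and octal sequences are its own and are not shown.

```python
import re

# '([bc])\1' matches 'bb' or 'cc' but not 'bc': \1 must repeat the same string.
doubled = re.compile(r"([bc])\1")
matches_bb = doubled.fullmatch("bb") is not None
matches_cc = doubled.fullmatch("cc") is not None
matches_bc = doubled.fullmatch("bc") is not None

# Subexpressions are numbered by their leading parentheses, and
# non-capturing parentheses do not define subexpressions.
nested = re.fullmatch(r"(?:x)(a(b))\2", "xabb")
nested_groups = nested.groups()
```

In the second pattern, `\2` refers to the inner `(b)`, because `(?:x)` is skipped when numbering and `(a(b))` opens first.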
36c9828f | 498 | |
36c9828f | 499 | |
72844950 | 500 | @section overview_resyntax_metasyntax Metasyntax |
36c9828f | 501 | |
72844950 BP |
502 | In addition to the main syntax described above, |
503 | there are some special forms and miscellaneous syntactic facilities available. | |
504 | Normally the flavor of RE being used is specified by application-dependent | |
505 | means. However, this can be overridden by a @e director. If an RE of any flavor | |
506 | begins with '@b ***:', the rest of the RE is an ARE. If an RE of any flavor begins | |
507 | with '@b ***=', the rest of the RE is taken to be a literal string, with all | |
508 | characters considered ordinary characters. | |
509 | An ARE may begin with @e embedded options: a sequence @b (?xyz) | |
510 | (where @e xyz is one or more alphabetic characters) | |
511 | specifies options affecting the rest of the RE. These supplement, and can | |
512 | override, any options specified by the application. The available option | |
513 | letters are: | |
36c9828f FM |
514 | |
515 | ||
516 | ||
72844950 | 517 | @b b |
36c9828f | 518 | |
72844950 | 519 | rest of RE is a BRE |
36c9828f | 520 | |
72844950 | 521 | @b c |
36c9828f | 522 | |
72844950 | 523 | case-sensitive matching (usual default) |
36c9828f | 524 | |
72844950 | 525 | @b e |
36c9828f | 526 | |
72844950 | 527 | rest of RE is an ERE |
36c9828f | 528 | |
72844950 | 529 | @b i |
36c9828f | 530 | |
72844950 | 531 | case-insensitive matching (see @ref overview_resyntax_matching, below) |
36c9828f | 532 | |
72844950 | 533 | @b m |
36c9828f | 534 | |
72844950 | 535 | historical synonym for @b n |
36c9828f | 536 | |
72844950 | 537 | @b n |
36c9828f | 538 | |
72844950 | 539 | newline-sensitive matching (see @ref overview_resyntax_matching, below) |
36c9828f | 540 | |
72844950 | 541 | @b p |
36c9828f | 542 | |
72844950 | 543 | partial newline-sensitive matching (see @ref overview_resyntax_matching, below) |
36c9828f | 544 | |
72844950 | 545 | @b q |
36c9828f | 546 | |
72844950 BP |
547 | rest of RE |
548 | is a literal ("quoted") string, all ordinary characters | |
36c9828f | 549 | |
72844950 | 550 | @b s |
36c9828f | 551 | |
72844950 | 552 | non-newline-sensitive matching (usual default) |
36c9828f | 553 | |
72844950 | 554 | @b t |
36c9828f | 555 | |
72844950 | 556 | tight syntax (usual default; see below) |
36c9828f | 557 | |
72844950 | 558 | @b w |
36c9828f | 559 | |
72844950 BP |
560 | inverse |
561 | partial newline-sensitive ("weird") matching (see @ref overview_resyntax_matching, below) | |
36c9828f | 562 | |
72844950 BP |
563 | @b x |
564 | ||
565 | expanded syntax (see below) | |
566 | ||
567 | ||
568 | ||
569 | Embedded options take effect at the @b ) terminating the | |
570 | sequence. They are available only at the start of an ARE, and may not be | |
571 | used later within it. | |
572 | In addition to the usual (@e tight) RE syntax, in which | |
573 | all characters are significant, there is an @e expanded syntax, available | |
574 | in AREs with the embedded | |
575 | @b x option. In the expanded syntax, white-space characters are ignored and | |
576 | all characters between a @b # and the following newline (or the end of the | |
577 | RE) are ignored, permitting paragraphing and commenting a complex RE. There | |
578 | are three exceptions to that basic rule: | |
579 | ||
580 | ||
581 | @li a white-space character or '@b #' preceded | |
582 | by '@b \' is retained | |
583 | @li white space or '@b #' within a bracket expression is retained | |
584 | @li white space and comments are illegal within multi-character symbols like | |
585 | the ARE '@b (?:' or the BRE '@b \(' | |
586 | ||
587 | ||
588 | Expanded-syntax white-space characters are blank, | |
589 | tab, newline, and any character that belongs to the @e space character class. | |
590 | Finally, in an ARE, outside bracket expressions, the sequence '@b (?#ttt)' (where | |
591 | @e ttt is any text not containing a '@b )') is a comment, completely ignored. Again, | |
592 | this is not allowed between the characters of multi-character symbols like | |
593 | '@b (?:'. Such comments are more a historical artifact than a useful facility, | |
594 | and their use is deprecated; use the expanded syntax instead. | |
595 | @e None of these | |
596 | metasyntax extensions is available if the application (or an initial @b ***= | |
597 | director) has specified that the user's input be treated as a literal string | |
598 | rather than as an RE. | |
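The embedded options and the expanded syntax can be illustrated with a short sketch. Python's `re` module serves as a stand-in (an assumption of this example); it accepts the @b i and @b x option letters with similar meaning, but not the directors or the other option letters listed above. The pattern and sample strings are hypothetical.

```python
import re

# (?x) at the start switches on expanded syntax: white space and
# '#' comments are ignored, which allows a complex RE to be laid out.
date_tag = r"""(?x)
    (\d{4}) - (\d{2})   # year and month; the spaces around '-' are ignored
    [ #]                # white space or '#' inside brackets is retained
    \#                  # a '#' preceded by a backslash is retained
"""

expanded = re.fullmatch(date_tag, "2024-05 #")
expanded_groups = expanded.groups()

# (?i) at the start requests case-insensitive matching for the rest of the RE.
case_insensitive = re.fullmatch(r"(?i)abc", "ABC") is not None
```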
599 | ||
600 | ||
601 | @section overview_resyntax_matching Matching | |
602 | ||
603 | In the event that an RE could match more than | |
604 | one substring of a given string, the RE matches the one starting earliest | |
605 | in the string. If the RE could match more than one substring starting at | |
606 | that point, its choice is determined by its @e preference: either the longest | |
607 | substring, or the shortest. | |
608 | Most atoms, and all constraints, have no preference. | |
609 | A parenthesized RE has the same preference (possibly none) as the RE. A | |
610 | quantified atom with quantifier @b {m} or @b {m}? has the same preference (possibly | |
611 | none) as the atom itself. A quantified atom with other normal quantifiers | |
612 | (including @b {m,n} with @e m equal to @e n) prefers longest match. A quantified | |
613 | atom with other non-greedy quantifiers (including @b {m,n}? with @e m equal to | |
614 | @e n) prefers shortest match. A branch has the same preference as the first | |
615 | quantified atom in it which has a preference. An RE consisting of two or | |
616 | more branches connected by the @b | operator prefers longest match. | |
617 | Subject to the constraints imposed by the rules for matching the whole RE, subexpressions | |
618 | also match the longest or shortest possible substrings, based on their | |
619 | preferences, with subexpressions starting earlier in the RE taking priority | |
620 | over ones starting later. Note that outer subexpressions thus take priority | |
621 | over their component subexpressions. | |
622 | Note that the quantifiers @b {1,1} and | |
623 | @b {1,1}? can be used to force longest and shortest preference, respectively, | |
624 | on a subexpression or a whole RE. | |
625 | Match lengths are measured in characters, | |
626 | not collating elements. An empty string is considered longer than no match | |
627 | at all. For example, @b bb* matches the three middle characters | |
628 | of '@b abbbc'; @b (week|wee)(night|knights) | |
629 | matches all ten characters of '@b weeknights'; when @b (.*).* is matched against | |
630 | @b abc, the parenthesized subexpression matches all three characters; and when | |
631 | @b (a*)* is matched against @b bc, both the whole RE and the parenthesized subexpression | |
632 | match an empty string. | |
633 | If case-independent matching is specified, the effect | |
634 | is much as if all case distinctions had vanished from the alphabet. When | |
635 | an alphabetic that exists in multiple cases appears as an ordinary character | |
636 | outside a bracket expression, it is effectively transformed into a bracket | |
637 | expression containing both cases, so that @b x becomes '@b [xX]'. When it appears | |
638 | inside a bracket expression, all case counterparts of it are added to the | |
639 | bracket expression, so that @b [x] becomes @b [xX] and @b [^x] becomes '@b [^xX]'. | |
640 | If newline-sensitive | |
641 | matching is specified, @b . and bracket expressions using @b ^ will never match | |
642 | the newline character (so that matches will never cross newlines unless | |
643 | the RE explicitly arranges it) and @b ^ and @b $ will match the empty string after | |
644 | and before a newline respectively, in addition to matching at beginning | |
645 | and end of string respectively. ARE @b \A and @b \Z continue to match beginning | |
646 | or end of string @e only. | |
647 | If partial newline-sensitive matching is specified, | |
648 | this affects @b . and bracket expressions as with newline-sensitive matching, | |
649 | but not @b ^ and '@b $'. | |
650 | If inverse partial newline-sensitive matching is specified, | |
651 | this affects @b ^ and @b $ as with newline-sensitive matching, but not @b . and bracket | |
652 | expressions. This isn't very useful but is provided for symmetry. | |
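Some of the matching examples above can be reproduced in a short sketch, and one of them shows where a first-match engine differs. Python's `re` module is used as a stand-in (an assumption of this example); it is a Perl-style first-match engine, not the longest/shortest-match engine described here, so the last result is included only to show the contrast.

```python
import re

# 'bb*' takes the three middle characters of 'abbbc'.
middle = re.search(r"bb*", "abbbc").group()

# In '(.*).*' the earlier subexpression takes priority: it gets all of 'abc'.
outer_first = re.fullmatch(r"(.*).*", "abc").group(1)

# Case-independent matching: 'x' behaves like '[xX]' and '[^x]' like '[^xX]'.
same_as_class = re.fullmatch(r"x", "X", re.IGNORECASE) is not None
negated_excludes_both = re.fullmatch(r"[^x]", "X", re.IGNORECASE) is None

# Contrast: Python tries the alternatives in order and stops at the first
# success, so it reports nine characters rather than all ten.
first_match = re.search(r"(week|wee)(night|knights)", "weeknights").group()
```

With the rules described in this section the last RE would match all of `weeknights`, whereas the stand-in engine stops at `weeknight`.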
653 | ||
654 | ||
655 | @section overview_resyntax_limits Limits and Compatibility | |
656 | ||
657 | No particular limit is imposed on the length of REs. Programs | |
658 | intended to be highly portable should not employ REs longer than 256 bytes, | |
659 | as a POSIX-compliant implementation can refuse to accept such REs. | |
660 | The only | |
661 | feature of AREs that is actually incompatible with POSIX EREs is that @b \ | |
662 | does not lose its special significance inside bracket expressions. All other | |
663 | ARE features use syntax which is illegal or has undefined or unspecified | |
664 | effects in POSIX EREs; the @b *** syntax of directors likewise is outside | |
665 | the POSIX syntax for both BREs and EREs. | |
666 | Many of the ARE extensions are | |
667 | borrowed from Perl, but some have been changed to clean them up, and a | |
668 | few Perl extensions are not present. Incompatibilities of note include '@b \b', | |
669 | '@b \B', the lack of special treatment for a trailing newline, the addition of | |
670 | complemented bracket expressions to the things affected by newline-sensitive | |
671 | matching, the restrictions on parentheses and back references in lookahead | |
672 | constraints, and the longest/shortest-match (rather than first-match) matching | |
673 | semantics. | |
674 | The matching rules for REs containing both normal and non-greedy | |
675 | quantifiers have changed since early beta-test versions of this package. | |
676 | (The new rules are much simpler and cleaner, but don't work as hard at guessing | |
677 | the user's real intentions.) | |
678 | Henry Spencer's original 1986 @e regexp package, still in widespread use, | |
679 | implemented an early version of today's EREs. There are four incompatibilities between @e regexp's | |
680 | near-EREs ('RREs' for short) and AREs. In roughly increasing order of significance: | |
36c9828f | 681 | |
72844950 BP |
682 | @li In AREs, @b \ followed by an alphanumeric character is either an escape or |
683 | an error, while in RREs, it was just another way of writing the alphanumeric. | |
684 | This should not be a problem because there was no reason to write such | |
685 | a sequence in RREs. | |
686 | @li @b { followed by a digit in an ARE is the beginning of | |
687 | a bound, while in RREs, @b { was always an ordinary character. Such sequences | |
688 | should be rare, and will often result in an error because following characters | |
689 | will not look like a valid bound. | |
690 | @li In AREs, @b \ remains a special character | |
691 | within '@b []', so a literal @b \ within @b [] must be | |
692 | written '@b \\'. @b \\ also gives a literal | |
693 | @b \ within @b [] in RREs, but only truly paranoid programmers routinely doubled | |
694 | the backslash. | |
695 | @li AREs report the longest/shortest match for the RE, rather | |
696 | than the first found in a specified search order. This may affect some RREs | |
697 | which were written in the expectation that the first match would be reported. | |
698 | (The careful crafting of RREs to optimize the search order for fast matching | |
699 | is obsolete: AREs examine all possible matches in parallel, and their performance | |
700 | is largely insensitive to their complexity. However, cases where the search | |
701 | order was exploited to deliberately find a match which was @e not the longest/shortest | |
702 | will need rewriting.) | |
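As a rough illustration of the last point, the sketch below contrasts longest-match and first-match reporting. It does not use wxRegEx. It assumes the C++ standard library's std::regex as a stand-in, with its POSIX extended grammar playing the role of a longest-match engine and its default ECMAScript grammar playing the role of a first-match engine. The helper name first_match is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of wxWidgets): returns the text of the first
// match of `pattern` found in `text`, or an empty string if nothing matches.
// `grammar` selects which std::regex grammar compiles the pattern.
std::string first_match(const std::string& pattern,
                        const std::string& text,
                        std::regex::flag_type grammar)
{
    const std::regex re(pattern, grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```

Under these assumptions, and on a standard library that follows the POSIX leftmost-longest rule, first_match("a|ab", "ab", std::regex::extended) should report the longer alternative, while the same call with std::regex::ECMAScript stops at the first alternative that succeeds. An RE that relied on the order of its alternatives in a first-match engine would need the same kind of review before being used as an ARE.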
36c9828f FM |
703 | |
704 | ||
72844950 | 705 | @section overview_resyntax_bre Basic Regular Expressions |
36c9828f | 706 | |
72844950 BP |
707 | BREs differ from EREs in |
708 | several respects. '@b |', '@b +', and @b ? are ordinary characters and there is no equivalent | |
709 | for their functionality. The delimiters for bounds | |
710 | are @b \{ and '@b \}', with @b { and | |
711 | @b } by themselves ordinary characters. The parentheses for nested subexpressions | |
712 | are @b \( and '@b \)', with @b ( and @b ) by themselves | |
713 | ordinary characters. @b ^ is an ordinary | |
714 | character except at the beginning of the RE or the beginning of a parenthesized | |
715 | subexpression, @b $ is an ordinary character except at the end of the RE or | |
716 | the end of a parenthesized subexpression, and @b * is an ordinary character | |
717 | if it appears at the beginning of the RE or the beginning of a parenthesized | |
718 | subexpression (after a possible leading '@b ^'). Finally, single-digit back references | |
719 | are available, and @b \< and @b \> are synonyms | |
720 | for @b [[:<:]] and @b [[:>:]] respectively; | |
721 | no other escapes are available. | |
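Some of the BRE rules above can be tried out with a short sketch. It assumes the C++ standard library's std::regex with the std::regex::basic grammar as a stand-in for a POSIX BRE engine; it is not the wxRegEx implementation, and only the features listed in the note after the code are exercised. The helper name bre_full_match is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of wxWidgets): true if the whole of `text`
// matches `pattern` when the pattern is compiled with the POSIX basic (BRE)
// grammar of std::regex.
bool bre_full_match(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::basic);
    return std::regex_match(text, re);
}
```

Under these assumptions, the pattern a+ matches only the literal text a+, the pattern (ab) matches the four literal characters (ab), the pattern \(ab\)\1 matches abab through a single-digit back reference, and the pattern a\{2\} matches aa. In C++ source each backslash in these patterns has to be doubled inside an ordinary string literal.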
36c9828f FM |
722 | |
723 | ||
72844950 | 724 | @section overview_resyntax_characters Regular Expression Character Names |
36c9828f | 725 | |
72844950 | 726 | Note that the character names are case sensitive. |
36c9828f FM |
727 | |
728 | ||
729 | ||
36c9828f FM |
730 | |
731 | ||
732 | ||
72844950 | 733 | NUL |
36c9828f FM |
734 | |
735 | ||
36c9828f | 736 | |
36c9828f | 737 | |
72844950 | 738 | '\0' |
36c9828f FM |
739 | |
740 | ||
741 | ||
742 | ||
743 | ||
72844950 | 744 | SOH |
36c9828f | 745 | |
36c9828f FM |
746 | |
747 | ||
748 | ||
72844950 | 749 | '\001' |
36c9828f | 750 | |
36c9828f FM |
751 | |
752 | ||
753 | ||
754 | ||
72844950 | 755 | STX |
36c9828f | 756 | |
36c9828f FM |
757 | |
758 | ||
759 | ||
72844950 | 760 | '\002' |
36c9828f | 761 | |
36c9828f FM |
762 | |
763 | ||
764 | ||
765 | ||
72844950 | 766 | ETX |
36c9828f | 767 | |
36c9828f FM |
768 | |
769 | ||
770 | ||
72844950 | 771 | '\003' |
36c9828f | 772 | |
36c9828f FM |
773 | |
774 | ||
775 | ||
776 | ||
72844950 | 777 | EOT |
36c9828f | 778 | |
36c9828f FM |
779 | |
780 | ||
781 | ||
72844950 | 782 | '\004' |
36c9828f | 783 | |
36c9828f FM |
784 | |
785 | ||
786 | ||
787 | ||
72844950 | 788 | ENQ |
36c9828f | 789 | |
36c9828f FM |
790 | |
791 | ||
792 | ||
72844950 | 793 | '\005' |
36c9828f | 794 | |
36c9828f FM |
795 | |
796 | ||
797 | ||
798 | ||
72844950 | 799 | ACK |
36c9828f | 800 | |
36c9828f FM |
801 | |
802 | ||
803 | ||
72844950 | 804 | '\006' |
36c9828f | 805 | |
36c9828f FM |
806 | |
807 | ||
808 | ||
809 | ||
72844950 | 810 | BEL |
36c9828f | 811 | |
36c9828f FM |
812 | |
813 | ||
814 | ||
72844950 | 815 | '\007' |
36c9828f | 816 | |
36c9828f FM |
817 | |
818 | ||
819 | ||
820 | ||
72844950 | 821 | alert |
36c9828f | 822 | |
36c9828f FM |
823 | |
824 | ||
825 | ||
72844950 | 826 | '\007' |
36c9828f | 827 | |
36c9828f FM |
828 | |
829 | ||
830 | ||
831 | ||
72844950 | 832 | BS |
36c9828f | 833 | |
36c9828f FM |
834 | |
835 | ||
836 | ||
72844950 | 837 | '\010' |
36c9828f | 838 | |
36c9828f FM |
839 | |
840 | ||
841 | ||
842 | ||
72844950 | 843 | backspace |
36c9828f | 844 | |
36c9828f FM |
845 | |
846 | ||
847 | ||
72844950 | 848 | '\b' |
36c9828f | 849 | |
36c9828f FM |
850 | |
851 | ||
852 | ||
853 | ||
72844950 | 854 | HT |
36c9828f | 855 | |
36c9828f FM |
856 | |
857 | ||
858 | ||
72844950 | 859 | '\011' |
36c9828f | 860 | |
36c9828f FM |
861 | |
862 | ||
863 | ||
864 | ||
72844950 | 865 | tab |
36c9828f | 866 | |
36c9828f FM |
867 | |
868 | ||
869 | ||
72844950 | 870 | '\t' |
36c9828f | 871 | |
36c9828f FM |
872 | |
873 | ||
874 | ||
875 | ||
72844950 | 876 | LF |
36c9828f | 877 | |
36c9828f | 878 | |
36c9828f | 879 | |
36c9828f | 880 | |
72844950 | 881 | '\012' |
36c9828f FM |
882 | |
883 | ||
884 | ||
885 | ||
886 | ||
72844950 | 887 | newline |
36c9828f FM |
888 | |
889 | ||
890 | ||
891 | ||
72844950 | 892 | '\n' |
36c9828f FM |
893 | |
894 | ||
895 | ||
896 | ||
897 | ||
72844950 | 898 | VT |
36c9828f FM |
899 | |
900 | ||
901 | ||
902 | ||
72844950 | 903 | '\013' |
36c9828f FM |
904 | |
905 | ||
906 | ||
907 | ||
908 | ||
72844950 | 909 | vertical-tab |
36c9828f FM |
910 | |
911 | ||
912 | ||
913 | ||
72844950 | 914 | '\v' |
36c9828f FM |
915 | |
916 | ||
917 | ||
918 | ||
919 | ||
72844950 | 920 | FF |
36c9828f FM |
921 | |
922 | ||
923 | ||
924 | ||
72844950 | 925 | '\014' |
36c9828f FM |
926 | |
927 | ||
928 | ||
929 | ||
930 | ||
72844950 | 931 | form-feed |
36c9828f FM |
932 | |
933 | ||
934 | ||
935 | ||
72844950 | 936 | '\f' |
36c9828f FM |
937 | |
938 | ||
939 | ||
940 | ||
941 | ||
72844950 | 942 | CR |
36c9828f FM |
943 | |
944 | ||
945 | ||
946 | ||
72844950 | 947 | '\015' |
36c9828f FM |
948 | |
949 | ||
950 | ||
951 | ||
952 | ||
72844950 | 953 | carriage-return |
36c9828f FM |
954 | |
955 | ||
956 | ||
957 | ||
72844950 | 958 | '\r' |
36c9828f FM |
959 | |
960 | ||
961 | ||
962 | ||
963 | ||
72844950 | 964 | SO |
36c9828f FM |
965 | |
966 | ||
967 | ||
968 | ||
72844950 | 969 | '\016' |
36c9828f FM |
970 | |
971 | ||
972 | ||
973 | ||
974 | ||
72844950 | 975 | SI |
36c9828f FM |
976 | |
977 | ||
978 | ||
979 | ||
72844950 | 980 | '\017' |
36c9828f FM |
981 | |
982 | ||
983 | ||
984 | ||
985 | ||
72844950 | 986 | DLE |
36c9828f FM |
987 | |
988 | ||
989 | ||
990 | ||
72844950 | 991 | '\020' |
36c9828f FM |
992 | |
993 | ||
994 | ||
995 | ||
996 | ||
72844950 | 997 | DC1 |
36c9828f FM |
998 | |
999 | ||
1000 | ||
1001 | ||
72844950 | 1002 | '\021' |
36c9828f FM |
1003 | |
1004 | ||
1005 | ||
1006 | ||
1007 | ||
72844950 | 1008 | DC2 |
36c9828f FM |
1009 | |
1010 | ||
1011 | ||
1012 | ||
72844950 | 1013 | '\022' |
36c9828f FM |
1014 | |
1015 | ||
1016 | ||
1017 | ||
1018 | ||
72844950 | 1019 | DC3 |
36c9828f FM |
1020 | |
1021 | ||
1022 | ||
1023 | ||
72844950 | 1024 | '\023' |
36c9828f FM |
1025 | |
1026 | ||
1027 | ||
1028 | ||
1029 | ||
72844950 | 1030 | DC4 |
36c9828f FM |
1031 | |
1032 | ||
1033 | ||
1034 | ||
72844950 | 1035 | '\024' |
36c9828f FM |
1036 | |
1037 | ||
1038 | ||
1039 | ||
1040 | ||
72844950 | 1041 | NAK |
36c9828f FM |
1042 | |
1043 | ||
1044 | ||
1045 | ||
72844950 | 1046 | '\025' |
36c9828f FM |
1047 | |
1048 | ||
1049 | ||
1050 | ||
1051 | ||
72844950 | 1052 | SYN |
36c9828f FM |
1053 | |
1054 | ||
1055 | ||
1056 | ||
72844950 | 1057 | '\026' |
36c9828f FM |
1058 | |
1059 | ||
1060 | ||
1061 | ||
1062 | ||
72844950 | 1063 | ETB |
36c9828f FM |
1064 | |
1065 | ||
1066 | ||
1067 | ||
72844950 | 1068 | '\027' |
36c9828f FM |
1069 | |
1070 | ||
36c9828f FM |
1071 | |
1072 | ||
1073 | ||
72844950 | 1074 | CAN |
36c9828f | 1075 | |
36c9828f FM |
1076 | |
1077 | ||
1078 | ||
72844950 | 1079 | '\030' |
36c9828f FM |
1080 | |
1081 | ||
36c9828f FM |
1082 | |
1083 | ||
1084 | ||
72844950 | 1085 | EM |
36c9828f | 1086 | |
36c9828f FM |
1087 | |
1088 | ||
1089 | ||
72844950 | 1090 | '\031' |
36c9828f FM |
1091 | |
1092 | ||
36c9828f FM |
1093 | |
1094 | ||
1095 | ||
72844950 | 1096 | SUB |
36c9828f | 1097 | |
36c9828f FM |
1098 | |
1099 | ||
1100 | ||
72844950 | 1101 | '\032' |
36c9828f FM |
1102 | |
1103 | ||
36c9828f FM |
1104 | |
1105 | ||
1106 | ||
72844950 | 1107 | ESC |
36c9828f | 1108 | |
36c9828f FM |
1109 | |
1110 | ||
1111 | ||
72844950 | 1112 | '\033' |
36c9828f FM |
1113 | |
1114 | ||
36c9828f FM |
1115 | |
1116 | ||
1117 | ||
72844950 | 1118 | IS4 |
36c9828f | 1119 | |
36c9828f FM |
1120 | |
1121 | ||
1122 | ||
72844950 | 1123 | '\034' |
36c9828f FM |
1124 | |
1125 | ||
36c9828f FM |
1126 | |
1127 | ||
1128 | ||
72844950 | 1129 | FS |
36c9828f | 1130 | |
36c9828f FM |
1131 | |
1132 | ||
1133 | ||
72844950 | 1134 | '\034' |
36c9828f FM |
1135 | |
1136 | ||
36c9828f FM |
1137 | |
1138 | ||
1139 | ||
72844950 | 1140 | IS3 |
36c9828f FM |
1141 | |
1142 | ||
1143 | ||
36c9828f | 1144 | |
72844950 | 1145 | '\035' |
36c9828f FM |
1146 | |
1147 | ||
1148 | ||
36c9828f FM |
1149 | |
1150 | ||
72844950 | 1151 | GS |
36c9828f FM |
1152 | |
1153 | ||
1154 | ||
36c9828f | 1155 | |
72844950 | 1156 | '\035' |
36c9828f FM |
1157 | |
1158 | ||
1159 | ||
36c9828f FM |
1160 | |
1161 | ||
72844950 | 1162 | IS2 |
36c9828f FM |
1163 | |
1164 | ||
1165 | ||
36c9828f | 1166 | |
72844950 | 1167 | '\036' |
36c9828f FM |
1168 | |
1169 | ||
1170 | ||
36c9828f FM |
1171 | |
1172 | ||
72844950 | 1173 | RS |
36c9828f FM |
1174 | |
1175 | ||
1176 | ||
36c9828f | 1177 | |
72844950 | 1178 | '\036' |
36c9828f FM |
1179 | |
1180 | ||
1181 | ||
36c9828f FM |
1182 | |
1183 | ||
72844950 | 1184 | IS1 |
36c9828f FM |
1185 | |
1186 | ||
1187 | ||
36c9828f | 1188 | |
72844950 | 1189 | '\037' |
36c9828f FM |
1190 | |
1191 | ||
1192 | ||
36c9828f FM |
1193 | |
1194 | ||
72844950 | 1195 | US |
36c9828f FM |
1196 | |
1197 | ||
1198 | ||
36c9828f | 1199 | |
72844950 | 1200 | '\037' |
36c9828f FM |
1201 | |
1202 | ||
1203 | ||
36c9828f FM |
1204 | |
1205 | ||
72844950 | 1206 | space |
36c9828f FM |
1207 | |
1208 | ||
1209 | ||
36c9828f | 1210 | |
72844950 | 1211 | ' ' |
36c9828f FM |
1212 | |
1213 | ||
1214 | ||
36c9828f FM |
1215 | |
1216 | ||
72844950 | 1217 | exclamation-mark |
36c9828f FM |
1218 | |
1219 | ||
1220 | ||
36c9828f | 1221 | |
72844950 | 1222 | '!' |
36c9828f FM |
1223 | |
1224 | ||
1225 | ||
36c9828f FM |
1226 | |
1227 | ||
72844950 | 1228 | quotation-mark |
36c9828f FM |
1229 | |
1230 | ||
1231 | ||
36c9828f | 1232 | |
72844950 | 1233 | '"' |
36c9828f | 1234 | |
36c9828f FM |
1235 | |
1236 | ||
1237 | ||
1238 | ||
72844950 | 1239 | number-sign |
36c9828f FM |
1240 | |
1241 | ||
36c9828f FM |
1242 | |
1243 | ||
72844950 | 1244 | '#' |
36c9828f FM |
1245 | |
1246 | ||
36c9828f FM |
1247 | |
1248 | ||
1249 | ||
72844950 | 1250 | dollar-sign |
36c9828f FM |
1251 | |
1252 | ||
36c9828f FM |
1253 | |
1254 | ||
72844950 | 1255 | '$' |
36c9828f FM |
1256 | |
1257 | ||
36c9828f FM |
1258 | |
1259 | ||
1260 | ||
72844950 | 1261 | percent-sign |
36c9828f FM |
1262 | |
1263 | ||
36c9828f FM |
1264 | |
1265 | ||
72844950 | 1266 | '%' |
36c9828f FM |
1267 | |
1268 | ||
36c9828f FM |
1269 | |
1270 | ||
1271 | ||
72844950 | 1272 | ampersand |
36c9828f FM |
1273 | |
1274 | ||
36c9828f FM |
1275 | |
1276 | ||
72844950 | 1277 | '&' |
36c9828f FM |
1278 | |
1279 | ||
36c9828f FM |
1280 | |
1281 | ||
1282 | ||
72844950 | 1283 | apostrophe |
36c9828f FM |
1284 | |
1285 | ||
36c9828f FM |
1286 | |
1287 | ||
72844950 | 1288 | '\'' |
36c9828f FM |
1289 | |
1290 | ||
36c9828f FM |
1291 | |
1292 | ||
1293 | ||
72844950 | 1294 | left-parenthesis |
36c9828f FM |
1295 | |
1296 | ||
36c9828f FM |
1297 | |
1298 | ||
72844950 | 1299 | '(' |
36c9828f FM |
1300 | |
1301 | ||
36c9828f FM |
1302 | |
1303 | ||
1304 | ||
72844950 | 1305 | right-parenthesis |
36c9828f FM |
1306 | |
1307 | ||
36c9828f FM |
1308 | |
1309 | ||
72844950 | 1310 | ')' |
36c9828f FM |
1311 | |
1312 | ||
36c9828f FM |
1313 | |
1314 | ||
1315 | ||
72844950 | 1316 | asterisk |
36c9828f FM |
1317 | |
1318 | ||
36c9828f FM |
1319 | |
1320 | ||
72844950 | 1321 | '*' |
36c9828f FM |
1322 | |
1323 | ||
36c9828f FM |
1324 | |
1325 | ||
1326 | ||
72844950 | 1327 | plus-sign |
36c9828f FM |
1328 | |
1329 | ||
36c9828f FM |
1330 | |
1331 | ||
72844950 | 1332 | '+' |
36c9828f FM |
1333 | |
1334 | ||
36c9828f FM |
1335 | |
1336 | ||
1337 | ||
72844950 | 1338 | comma |
36c9828f FM |
1339 | |
1340 | ||
36c9828f FM |
1341 | |
1342 | ||
72844950 | 1343 | ',' |
36c9828f FM |
1344 | |
1345 | ||
36c9828f FM |
1346 | |
1347 | ||
1348 | ||
72844950 | 1349 | hyphen |
36c9828f FM |
1350 | |
1351 | ||
36c9828f FM |
1352 | |
1353 | ||
72844950 | 1354 | '-' |
36c9828f FM |
1355 | |
1356 | ||
36c9828f FM |
1357 | |
1358 | ||
1359 | ||
72844950 | 1360 | hyphen-minus |
36c9828f FM |
1361 | |
1362 | ||
36c9828f FM |
1363 | |
1364 | ||
72844950 | 1365 | '-' |
36c9828f FM |
1366 | |
1367 | ||
36c9828f FM |
1368 | |
1369 | ||
1370 | ||
72844950 | 1371 | period |
36c9828f FM |
1372 | |
1373 | ||
36c9828f FM |
1374 | |
1375 | ||
72844950 | 1376 | '.' |
36c9828f FM |
1377 | |
1378 | ||
36c9828f | 1379 | |
36c9828f | 1380 | |
36c9828f | 1381 | |
72844950 | 1382 | full-stop |
36c9828f | 1383 | |
36c9828f FM |
1384 | |
1385 | ||
36c9828f | 1386 | |
72844950 | 1387 | '.' |
36c9828f FM |
1388 | |
1389 | ||
36c9828f | 1390 | |
36c9828f | 1391 | |
36c9828f | 1392 | |
72844950 | 1393 | slash |
36c9828f FM |
1394 | |
1395 | ||
1396 | ||
1397 | ||
72844950 | 1398 | '/' |
36c9828f FM |
1399 | |
1400 | ||
36c9828f FM |
1401 | |
1402 | ||
1403 | ||
72844950 | 1404 | solidus |
36c9828f | 1405 | |
36c9828f FM |
1406 | |
1407 | ||
1408 | ||
72844950 | 1409 | '/' |
36c9828f FM |
1410 | |
1411 | ||
36c9828f FM |
1412 | |
1413 | ||
1414 | ||
72844950 | 1415 | zero |
36c9828f | 1416 | |
36c9828f FM |
1417 | |
1418 | ||
1419 | ||
72844950 | 1420 | '0' |
36c9828f FM |
1421 | |
1422 | ||
36c9828f FM |
1423 | |
1424 | ||
1425 | ||
72844950 | 1426 | one |
36c9828f | 1427 | |
36c9828f FM |
1428 | |
1429 | ||
1430 | ||
72844950 | 1431 | '1' |
36c9828f FM |
1432 | |
1433 | ||
36c9828f FM |
1434 | |
1435 | ||
1436 | ||
72844950 | 1437 | two |
36c9828f | 1438 | |
36c9828f FM |
1439 | |
1440 | ||
1441 | ||
72844950 | 1442 | '2' |
36c9828f FM |
1443 | |
1444 | ||
36c9828f FM |
1445 | |
1446 | ||
1447 | ||
72844950 | 1448 | three |
36c9828f | 1449 | |
36c9828f FM |
1450 | |
1451 | ||
1452 | ||
72844950 | 1453 | '3' |
36c9828f FM |
1454 | |
1455 | ||
36c9828f FM |
1456 | |
1457 | ||
1458 | ||
72844950 | 1459 | four |
36c9828f | 1460 | |
36c9828f FM |
1461 | |
1462 | ||
1463 | ||
72844950 | 1464 | '4' |
36c9828f FM |
1465 | |
1466 | ||
36c9828f FM |
1467 | |
1468 | ||
1469 | ||
72844950 | 1470 | five |
36c9828f | 1471 | |
36c9828f FM |
1472 | |
1473 | ||
1474 | ||
72844950 | 1475 | '5' |
36c9828f FM |
1476 | |
1477 | ||
36c9828f FM |
1478 | |
1479 | ||
1480 | ||
72844950 | 1481 | six |
36c9828f | 1482 | |
36c9828f FM |
1483 | |
1484 | ||
1485 | ||
72844950 | 1486 | '6' |
36c9828f FM |
1487 | |
1488 | ||
36c9828f FM |
1489 | |
1490 | ||
1491 | ||
72844950 | 1492 | seven |
36c9828f | 1493 | |
36c9828f FM |
1494 | |
1495 | ||
1496 | ||
72844950 | 1497 | '7' |
36c9828f FM |
1498 | |
1499 | ||
36c9828f FM |
1500 | |
1501 | ||
1502 | ||
72844950 | 1503 | eight |
36c9828f | 1504 | |
36c9828f FM |
1505 | |
1506 | ||
1507 | ||
72844950 | 1508 | '8' |
36c9828f FM |
1509 | |
1510 | ||
36c9828f FM |
1511 | |
1512 | ||
1513 | ||
72844950 | 1514 | nine |
36c9828f | 1515 | |
36c9828f FM |
1516 | |
1517 | ||
1518 | ||
72844950 | 1519 | '9' |
36c9828f FM |
1520 | |
1521 | ||
36c9828f FM |
1522 | |
1523 | ||
1524 | ||
72844950 | 1525 | colon |
36c9828f | 1526 | |
36c9828f FM |
1527 | |
1528 | ||
1529 | ||
72844950 | 1530 | ':' |
36c9828f FM |
1531 | |
1532 | ||
36c9828f FM |
1533 | |
1534 | ||
1535 | ||
72844950 | 1536 | semicolon |
36c9828f | 1537 | |
36c9828f FM |
1538 | |
1539 | ||
1540 | ||
72844950 | 1541 | ';' |
36c9828f FM |
1542 | |
1543 | ||
36c9828f FM |
1544 | |
1545 | ||
1546 | ||
72844950 | 1547 | less-than-sign |
36c9828f | 1548 | |
36c9828f FM |
1549 | |
1550 | ||
1551 | ||
72844950 | 1552 | '<' |
36c9828f FM |
1553 | |
1554 | ||
36c9828f FM |
1555 | |
1556 | ||
1557 | ||
72844950 | 1558 | equals-sign |
36c9828f | 1559 | |
36c9828f FM |
1560 | |
1561 | ||
1562 | ||
72844950 | 1563 | '=' |
36c9828f FM |
1564 | |
1565 | ||
36c9828f FM |
1566 | |
1567 | ||
1568 | ||
72844950 | 1569 | greater-than-sign |
36c9828f | 1570 | |
36c9828f FM |
1571 | |
1572 | ||
1573 | ||
72844950 | 1574 | '>' |
36c9828f FM |
1575 | |
1576 | ||
36c9828f FM |
1577 | |
1578 | ||
1579 | ||
72844950 | 1580 | question-mark |
36c9828f | 1581 | |
36c9828f FM |
1582 | |
1583 | ||
1584 | ||
72844950 | 1585 | '?' |
36c9828f FM |
1586 | |
1587 | ||
36c9828f FM |
1588 | |
1589 | ||
1590 | ||
72844950 | 1591 | commercial-at |
36c9828f | 1592 | |
36c9828f FM |
1593 | |
1594 | ||
1595 | ||
72844950 | 1596 | '@' |
36c9828f FM |
1597 | |
1598 | ||
36c9828f FM |
1599 | |
1600 | ||
1601 | ||
72844950 | 1602 | left-square-bracket |
36c9828f | 1603 | |
36c9828f FM |
1604 | |
1605 | ||
1606 | ||
72844950 | 1607 | '[' |
36c9828f FM |
1608 | |
1609 | ||
36c9828f FM |
1610 | |
1611 | ||
1612 | ||
72844950 | 1613 | backslash |
36c9828f | 1614 | |
36c9828f FM |
1615 | |
1616 | ||
1617 | ||
72844950 | 1618 | '\\' |
36c9828f FM |
1619 | |
1620 | ||
36c9828f FM |
1621 | |
1622 | ||
1623 | ||
72844950 | 1624 | reverse-solidus |
36c9828f | 1625 | |
36c9828f FM |
1626 | |
1627 | ||
1628 | ||
72844950 | 1629 | '\\' |
36c9828f FM |
1630 | |
1631 | ||
36c9828f FM |
1632 | |
1633 | ||
1634 | ||
72844950 | 1635 | right-square-bracket |
36c9828f | 1636 | |
36c9828f FM |
1637 | |
1638 | ||
1639 | ||
72844950 | 1640 | ']' |
36c9828f FM |
1641 | |
1642 | ||
36c9828f FM |
1643 | |
1644 | ||
1645 | ||
72844950 | 1646 | circumflex |
36c9828f | 1647 | |
36c9828f FM |
1648 | |
1649 | ||
1650 | ||
72844950 | 1651 | '^' |
36c9828f FM |
1652 | |
1653 | ||
36c9828f FM |
1654 | |
1655 | ||
1656 | ||
72844950 | 1657 | circumflex-accent |
36c9828f | 1658 | |
36c9828f FM |
1659 | |
1660 | ||
1661 | ||
72844950 | 1662 | '^' |
36c9828f FM |
1663 | |
1664 | ||
36c9828f FM |
1665 | |
1666 | ||
1667 | ||
72844950 | 1668 | underscore |
36c9828f | 1669 | |
36c9828f FM |
1670 | |
1671 | ||
1672 | ||
72844950 | 1673 | '_' |
36c9828f FM |
1674 | |
1675 | ||
36c9828f FM |
1676 | |
1677 | ||
1678 | ||
72844950 | 1679 | low-line |
36c9828f | 1680 | |
36c9828f FM |
1681 | |
1682 | ||
1683 | ||
72844950 | 1684 | '_' |
36c9828f FM |
1685 | |
1686 | ||
36c9828f FM |
1687 | |
1688 | ||
1689 | ||
72844950 | 1690 | grave-accent |
36c9828f | 1691 | |
36c9828f FM |
1692 | |
1693 | ||
1694 | ||
72844950 | 1695 | '`' |
36c9828f FM |
1696 | |
1697 | ||
36c9828f FM |
1698 | |
1699 | ||
1700 | ||
72844950 | 1701 | left-brace |
36c9828f | 1702 | |
36c9828f FM |
1703 | |
1704 | ||
1705 | ||
72844950 | 1706 | '{' |
36c9828f FM |
1707 | |
1708 | ||
36c9828f FM |
1709 | |
1710 | ||
1711 | ||
72844950 | 1712 | left-curly-bracket |
36c9828f | 1713 | |
36c9828f FM |
1714 | |
1715 | ||
1716 | ||
72844950 | 1717 | '{' |
36c9828f FM |
1718 | |
1719 | ||
36c9828f FM |
1720 | |
1721 | ||
1722 | ||
72844950 | 1723 | vertical-line |
36c9828f | 1724 | |
36c9828f FM |
1725 | |
1726 | ||
1727 | ||
72844950 | 1728 | '|' |
36c9828f FM |
1729 | |
1730 | ||
36c9828f FM |
1731 | |
1732 | ||
1733 | ||
72844950 | 1734 | right-brace |
36c9828f | 1735 | |
36c9828f FM |
1736 | |
1737 | ||
1738 | ||
72844950 | 1739 | '}' |
36c9828f FM |
1740 | |
1741 | ||
36c9828f FM |
1742 | |
1743 | ||
1744 | ||
72844950 | 1745 | right-curly-bracket |
36c9828f | 1746 | |
36c9828f FM |
1747 | |
1748 | ||
1749 | ||
72844950 | 1750 | '}' |
36c9828f FM |
1751 | |
1752 | ||
36c9828f FM |
1753 | |
1754 | ||
1755 | ||
72844950 | 1756 | tilde |
36c9828f | 1757 | |
36c9828f FM |
1758 | |
1759 | ||
1760 | ||
72844950 | 1761 | '~' |
36c9828f FM |
1762 | |
1763 | ||
36c9828f FM |
1764 | |
1765 | ||
1766 | ||
72844950 | 1767 | DEL |
36c9828f | 1768 | |
36c9828f FM |
1769 | |
1770 | ||
1771 | ||
72844950 | 1772 | '\177' |
36c9828f | 1773 | |
72844950 | 1774 | */ |
36c9828f | 1775 |