1 /////////////////////////////////////////////////////////////////////////////
2 // Name: resyntax.h
3 // Purpose: topic overview
4 // Author: wxWidgets team
5 // RCS-ID: $Id$
6 // Licence: wxWindows license
7 /////////////////////////////////////////////////////////////////////////////
8
9 /*!
10
11 @page overview_resyntax Syntax of the Built-in Regular Expression Library
12
13 A <em>regular expression</em> describes strings of characters. It's a pattern
14 that matches certain strings and doesn't match others.
15
16 @seealso #wxRegEx
17
18 @li @ref overview_resyntax_differentflavors
19 @li @ref overview_resyntax_syntax
20 @li @ref overview_resyntax_bracket
21 @li @ref overview_resyntax_escapes
22 @li @ref overview_resyntax_metasyntax
23 @li @ref overview_resyntax_matching
24 @li @ref overview_resyntax_limits
25 @li @ref overview_resyntax_bre
26 @li @ref overview_resyntax_characters
27
28
29 <hr>
30
31
32 @section overview_resyntax_differentflavors Different Flavors of REs
33
Regular expressions ("REs"), as defined by POSIX, come in two
flavors: @e extended REs ("EREs") and @e basic REs ("BREs"). EREs are roughly those
of the traditional @e egrep, while BREs are roughly those of the traditional
@e ed. This implementation adds a third flavor, @e advanced REs ("AREs"), basically
EREs with some significant extensions.
This manual page primarily describes
AREs. BREs mostly exist for backward compatibility in some old programs;
they are discussed at the end, in @ref overview_resyntax_bre. POSIX EREs are almost an exact subset
of AREs. Features of AREs that are not present in EREs are indicated.
43
44
45 @section overview_resyntax_syntax Regular Expression Syntax
46
47 These regular expressions are implemented using
48 the package written by Henry Spencer, based on the 1003.2 spec and some
49 (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description
50 of regular expressions below is copied verbatim from his manual entry.
51 An ARE is one or more @e branches, separated by '@b |', matching anything that matches
52 any of the branches.
53 A branch is zero or more @e constraints or @e quantified
54 atoms, concatenated. It matches a match for the first, followed by a match
55 for the second, etc; an empty branch matches the empty string.
56 A quantified atom is an @e atom possibly followed by a single @e quantifier. Without a quantifier,
57 it matches a match for the atom. The quantifiers, and what a so-quantified
58 atom matches, are:
59
60
61
62 @b *
63
64 a sequence of 0 or more matches of the atom
65
66 @b +
67
68 a sequence of 1 or more matches of the atom
69
70 @b ?
71
72 a sequence of 0 or 1 matches of the atom
73
74 @b {m}
75
76 a sequence of exactly @e m matches of the atom
77
78 @b {m,}
79
80 a sequence of @e m or more matches of the atom
81
82 @b {m,n}
83
84 a sequence of @e m through @e n (inclusive)
85 matches of the atom; @e m may not exceed @e n
86
87 @b *? +? ?? {m}? {m,}? {m,n}?
88
89 @e non-greedy quantifiers,
90 which match the same possibilities, but prefer the
91 smallest number rather than the largest number of matches (see #Matching)
92
93 The forms using @b { and @b } are known as @e bounds. The numbers @e m and @e n are unsigned
94 decimal integers with permissible values from 0 to 255 inclusive.
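
As a rough illustration of greedy versus non-greedy quantifiers, the sketch
below uses C++ std::regex with its default grammar instead of wxRegEx. That
engine is an assumption made only to keep the example self-contained; it
happens to accept the same quantifier syntax. The helper name first_match is
hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text`,
// or an empty string when nothing matches.
// Assumption: std::regex (default grammar) stands in for the ARE engine.
std::string first_match(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);
    return std::string();
}
```

For the subject string "caaat", the RE a+ takes all three letters, while a+?
and a{2,}? stop as soon as the minimum count is satisfied.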
95 An atom is one of:
96
97 @b (re)
98
99 (where @e re is any regular expression) matches a match for
100 @e re, with the match noted for possible reporting
101
102 @b (?:re)
103
104 as previous, but
does no reporting (a "non-capturing" set of parentheses)
106
107 @b ()
108
109 matches an empty
110 string, noted for possible reporting
111
112 @b (?:)
113
114 matches an empty string, without reporting
115
116 @b [chars]
117
a @e bracket expression, matching any one of the @e chars
(see @ref overview_resyntax_bracket for more detail)
120
121 @b .
122
123 matches any single character
124
125 @b \k
126
127 (where @e k is a non-alphanumeric character)
128 matches that character taken as an ordinary character, e.g. \\ matches a backslash
129 character
130
131 @b \c
132
133 where @e c is alphanumeric (possibly followed by other characters),
134 an @e escape (AREs only), see #Escapes below
135
136 @b {
137
138 when followed by a character
139 other than a digit, matches the left-brace character '@b {'; when followed by
140 a digit, it is the beginning of a @e bound (see above)
141
142 @b x
143
144 where @e x is a single
145 character with no other significance, matches that character.
146
147 A @e constraint matches an empty string when specific conditions are met. A constraint may
148 not be followed by a quantifier. The simple constraints are as follows;
149 some more constraints are described later, under #Escapes.
150
151 @b ^
152
153 matches at the beginning of a line
154
155 @b $
156
157 matches at the end of a line
158
159 @b (?=re)
160
161 @e positive lookahead
162 (AREs only), matches at any point where a substring matching @e re begins
163
164 @b (?!re)
165
166 @e negative lookahead (AREs only),
167 matches at any point where no substring matching @e re begins
168
169
170
171 The lookahead constraints may not contain back references
172 (see later), and all parentheses within them are considered non-capturing.
173 An RE may not end with '@b \'.
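
The difference between capturing parentheses, non-capturing parentheses and
lookahead constraints can be sketched as follows. The code assumes C++
std::regex as a stand-in for the ARE engine (it accepts the same four forms);
capture_count and matched_text are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex (default grammar) stands in for the ARE engine;
// it accepts the same (re), (?:re), (?=re) and (?!re) forms.

// Number of capturing groups the pattern defines.
unsigned capture_count(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}

// Text consumed by the first match; lookahead text is never part of it.
std::string matched_text(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str(0)
                                                           : std::string();
}
```

Only the plain parentheses in (a)(?:b)(c) are counted, and foo(?=bar) matches
just "foo" because a constraint matches an empty string.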
174
175
176 @section overview_resyntax_bracket Bracket Expressions
177
178 A @e bracket expression is a list
179 of characters enclosed in '@b []'. It normally matches any single character from
180 the list (but see below). If the list begins with '@b ^', it matches any single
181 character (but see below) @e not from the rest of the list.
182 If two characters
183 in the list are separated by '@b -', this is shorthand for the full @e range of
184 characters between those two (inclusive) in the collating sequence, e.g.
185 @b [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint,
186 so e.g. @b a-c-e is illegal. Ranges are very collating-sequence-dependent, and portable
187 programs should avoid relying on them.
188 To include a literal @b ] or @b - in the
189 list, the simplest method is to enclose it in @b [. and @b .] to make it a collating
190 element (see below). Alternatively, make it the first character (following
191 a possible '@b ^'), or (AREs only) precede it with '@b \'.
192 Alternatively, for '@b -', make
193 it the last character, or the second endpoint of a range. To use a literal
194 @b - as the first endpoint of a range, make it a collating element or (AREs
195 only) precede it with '@b \'. With the exception of these, some combinations using
196 @b [ (see next paragraphs), and escapes, all other special characters lose
197 their special significance within a bracket expression.
198 Within a bracket
199 expression, a collating element (a character, a multi-character sequence
200 that collates as if it were a single character, or a collating-sequence
201 name for either) enclosed in @b [. and @b .] stands for the
202 sequence of characters of that collating element.
@e wxWidgets: Currently no multi-character collating elements are defined.
So in @b [.X.], @e X can either be a single character literal or
the name of a character. For example, @b [[.0.]-[.9.]] and
@b [[.zero.]-[.nine.]] are identical and mean the same as
@b [0-9].
See @ref overview_resyntax_characters.
209 Within a bracket expression, a collating element enclosed in @b [= and @b =]
210 is an equivalence class, standing for the sequences of characters of all
211 collating elements equivalent to that one, including itself.
212 An equivalence class may not be an endpoint of a range.
@e wxWidgets: Currently no equivalence classes are defined, so
@b [=X=] stands for just the single character @e X.
@e X can either be a single character literal or the name of a character,
see @ref overview_resyntax_characters.
217 Within a bracket expression,
218 the name of a @e character class enclosed in @b [: and @b :] stands for the list
219 of all characters (not all collating elements!) belonging to that class.
220 Standard character classes are:
221
222
223
224 @b alpha
225
226 A letter.
227
228 @b upper
229
230 An upper-case letter.
231
232 @b lower
233
234 A lower-case letter.
235
236 @b digit
237
238 A decimal digit.
239
240 @b xdigit
241
242 A hexadecimal digit.
243
244 @b alnum
245
246 An alphanumeric (letter or digit).
247
248 @b print
249
250 An alphanumeric (same as alnum).
251
252 @b blank
253
254 A space or tab character.
255
256 @b space
257
258 A character producing white space in displayed text.
259
260 @b punct
261
262 A punctuation character.
263
264 @b graph
265
266 A character with a visible representation.
267
268 @b cntrl
269
270 A control character.
271
272
273
274 A character class may not be used as an endpoint of a range.
275 @e wxWidgets: In a non-Unicode build, these character classifications depend on the
current locale, and correspond to the values returned by the ANSI C 'is'
277 functions: isalpha, isupper, etc. In Unicode mode they are based on
278 Unicode classifications, and are not affected by the current locale.
279 There are two special cases of bracket expressions:
the bracket expressions @b [[:<:]] and @b [[:>:]] are constraints, matching empty
281 strings at the beginning and end of a word respectively. A word is defined
282 as a sequence of word characters that is neither preceded nor followed
283 by word characters. A word character is an @e alnum character or an underscore
284 (@b _). These special bracket expressions are deprecated; users of AREs should
285 use constraint escapes instead (see #Escapes below).
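
A few bracket expressions using ranges, negation and character classes are
exercised below. This is a sketch that assumes C++ std::regex in place of
wxRegEx; its default grammar accepts POSIX class names such as [:digit:]
inside a bracket expression. The helper name whole_match is hypothetical.

```cpp
#include <regex>
#include <string>

// True when `pattern` matches the whole of `text`.
// Assumption: std::regex stands in for the ARE engine.
bool whole_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

For example, [[:upper:]][[:lower:]]+ accepts "Regex", and [^[:digit:]]+
rejects "abc1" because the final character belongs to the excluded class.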
286
287
288 @section overview_resyntax_escapes Escapes
289
290 Escapes (AREs only),
291 which begin with a @b \ followed by an alphanumeric character, come in several
292 varieties: character entry, class shorthands, constraint escapes, and back
293 references. A @b \ followed by an alphanumeric character but not constituting
294 a valid escape is illegal in AREs. In EREs, there are no escapes: outside
295 a bracket expression, a @b \ followed by an alphanumeric character merely stands
296 for that character as an ordinary character, and inside a bracket expression,
297 @b \ is an ordinary character. (The latter is the one actual incompatibility
298 between EREs and AREs.)
299 Character-entry escapes (AREs only) exist to make
300 it easier to specify non-printing and otherwise inconvenient characters
301 in REs:
302
303
304
305 @b \a
306
307 alert (bell) character, as in C
308
309 @b \b
310
311 backspace, as in C
312
313 @b \B
314
315 synonym
316 for @b \ to help reduce backslash doubling in some applications where there
317 are multiple levels of backslash processing
318
319 @b \c@e X
320
321 (where X is any character)
322 the character whose low-order 5 bits are the same as those of @e X, and whose
323 other bits are all zero
324
325 @b \e
326
327 the character whose collating-sequence name is
328 '@b ESC', or failing that, the character with octal value 033
329
330 @b \f
331
332 formfeed, as in C
333
334 @b \n
335
336 newline, as in C
337
338 @b \r
339
340 carriage return, as in C
341
342 @b \t
343
344 horizontal tab, as in C
345
346 @b \u@e wxyz
347
348 (where @e wxyz is exactly four hexadecimal digits)
349 the Unicode
350 character @b U+@e wxyz in the local byte ordering
351
352 @b \U@e stuvwxyz
353
354 (where @e stuvwxyz is
355 exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode
356 extension to 32 bits
357
358 @b \v
359
vertical tab, as in C
361
362 @b \x@e hhh
363
364 (where
365 @e hhh is any sequence of hexadecimal digits) the character whose hexadecimal
366 value is @b 0x@e hhh (a single character no matter how many hexadecimal digits
367 are used).
368
369 @b \0
370
371 the character whose value is @b 0
372
373 @b \@e xy
374
375 (where @e xy is exactly two
376 octal digits, and is not a @e back reference (see below)) the character whose
377 octal value is @b 0@e xy
378
379 @b \@e xyz
380
381 (where @e xyz is exactly three octal digits, and is
382 not a back reference (see below))
383 the character whose octal value is @b 0@e xyz
384
385
386
387 Hexadecimal digits are '@b 0'-'@b 9', '@b a'-'@b f', and '@b A'-'@b F'. Octal
388 digits are '@b 0'-'@b 7'.
389 The character-entry
390 escapes are always taken as ordinary characters. For example, @b \135 is @b ] in
391 ASCII, but @b \135 does not terminate a bracket expression. Beware, however,
392 that some applications (e.g., C compilers) interpret such sequences themselves
393 before the regular-expression package gets to see them, which may require
394 doubling (quadrupling, etc.) the '@b \'.
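
The arithmetic behind two of the character-entry escapes can be written out
directly. The function names below are hypothetical and the code is only a
sketch of the rules as stated above, not the library's implementation.

```cpp
#include <string>

// Value denoted by the escape \cX: keep the low-order 5 bits of X
// and clear all other bits.
unsigned char control_escape(unsigned char x)
{
    return static_cast<unsigned char>(x & 0x1F);
}

// Value of an octal character-entry escape such as \135, given its digits.
int octal_escape_value(const std::string& digits)
{
    return std::stoi(digits, nullptr, 8);
}
```

With these, control_escape('[') is 27 (ESC in ASCII) and
octal_escape_value("135") is 93, the ASCII code of the right square bracket.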
395 Class-shorthand escapes (AREs only) provide
396 shorthands for certain commonly-used character classes:
397
398
399
400 @b \d
401
402 @b [[:digit:]]
403
404 @b \s
405
406 @b [[:space:]]
407
408 @b \w
409
410 @b [[:alnum:]_] (note underscore)
411
412 @b \D
413
414 @b [^[:digit:]]
415
416 @b \S
417
418 @b [^[:space:]]
419
420 @b \W
421
422 @b [^[:alnum:]_] (note underscore)
423
424
425
426 Within bracket expressions, '@b \d', '@b \s', and
427 '@b \w' lose their outer brackets, and '@b \D',
428 '@b \S', and '@b \W' are illegal. (So, for example,
429 @b [a-c\d] is equivalent to @b [a-c[:digit:]].
430 Also, @b [a-c\D], which is equivalent to
431 @b [a-c^[:digit:]], is illegal.)
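
The equivalence between the shorthands and their bracket-expression spellings
can be checked with a small experiment. The sketch assumes C++ std::regex as a
stand-in for the ARE engine, since it supports both notations; all_matches is
a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every non-overlapping match of `pattern` in `text`.
// Assumption: std::regex stands in for the ARE engine; it supports both
// the shorthand escapes and the POSIX class names.
std::vector<std::string> all_matches(const std::string& pattern,
                                     const std::string& text)
{
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Note that in C++ source the backslash has to be doubled, so the RE \d+ is
written as the string literal "\\d+".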
432 A constraint escape (AREs only) is a constraint,
433 matching the empty string if specific conditions are met, written as an
434 escape:
435
436
437
438 @b \A
439
440 matches only at the beginning of the string
441 (see #Matching, below,
442 for how this differs from '@b ^')
443
444 @b \m
445
446 matches only at the beginning of a word
447
448 @b \M
449
450 matches only at the end of a word
451
452 @b \y
453
454 matches only at the beginning or end of a word
455
456 @b \Y
457
458 matches only at a point that is not the beginning or end of
459 a word
460
461 @b \Z
462
463 matches only at the end of the string
464 (see #Matching, below, for
465 how this differs from '@b $')
466
467 @b \@e m
468
469 (where @e m is a nonzero digit) a @e back reference,
470 see below
471
472 @b \@e mnn
473
474 (where @e m is a nonzero digit, and @e nn is some more digits,
475 and the decimal value @e mnn is not greater than the number of closing capturing
476 parentheses seen so far) a @e back reference, see below
477
478
479
480 A word is defined
as in the specification of @b [[:<:]] and @b [[:>:]] above. Constraint escapes are
482 illegal within bracket expressions.
483 A back reference (AREs only) matches
484 the same string matched by the parenthesized subexpression specified by
485 the number, so that (e.g.) @b ([bc])\1 matches @b bb or @b cc but not '@b bc'.
486 The subexpression
487 must entirely precede the back reference in the RE. Subexpressions are numbered
488 in the order of their leading parentheses. Non-capturing parentheses do not
489 define subexpressions.
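
The back reference example above can be tried out as follows. The sketch
assumes C++ std::regex in place of wxRegEx (it uses the same \1 notation);
the helper name matches_entirely is hypothetical.

```cpp
#include <regex>
#include <string>

// True when `pattern` matches the whole of `text`.
// Assumption: std::regex stands in for the ARE engine.
bool matches_entirely(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

The RE ([bc])\1, written "([bc])\\1" in C++ source, accepts "bb" and "cc" but
not "bc", because \1 must repeat exactly what the group matched.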
490 There is an inherent historical ambiguity between
491 octal character-entry escapes and back references, which is resolved by
492 heuristics, as hinted at above. A leading zero always indicates an octal
493 escape. A single non-zero digit, not followed by another digit, is always
494 taken as a back reference. A multi-digit sequence not starting with a zero
495 is taken as a back reference if it comes after a suitable subexpression
496 (i.e. the number is in the legal range for a back reference), and otherwise
497 is taken as octal.
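
The disambiguation rules just described can be summarised in a few lines. This
is a sketch of the stated heuristics only; the enum and function names are
hypothetical and do not come from the library.

```cpp
#include <string>

enum class DigitEscape { Octal, BackReference };

// Classifies the digits following a backslash, per the rules above.
// `digits` is the non-empty run of decimal digits after the backslash and
// `captures_so_far` is the number of capturing groups closed before it.
DigitEscape classify_digit_escape(const std::string& digits,
                                  unsigned long captures_so_far)
{
    if (digits[0] == '0')
        return DigitEscape::Octal;           // leading zero: always octal
    if (digits.size() == 1)
        return DigitEscape::BackReference;   // single non-zero digit
    // Multi-digit: a back reference only if the number is in the legal range.
    return std::stoul(digits) <= captures_so_far ? DigitEscape::BackReference
                                                 : DigitEscape::Octal;
}
```

So \12 is a back reference after twelve capturing groups have been closed, and
an octal escape after only three.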
498
499
500 @section overview_resyntax_metasyntax Metasyntax
501
502 In addition to the main syntax described above,
503 there are some special forms and miscellaneous syntactic facilities available.
504 Normally the flavor of RE being used is specified by application-dependent
505 means. However, this can be overridden by a @e director. If an RE of any flavor
506 begins with '@b ***:', the rest of the RE is an ARE. If an RE of any flavor begins
507 with '@b ***=', the rest of the RE is taken to be a literal string, with all
508 characters considered ordinary characters.
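
Recognising a director amounts to a prefix check. The following sketch shows
one way an application might split it off; the enum and function names are
hypothetical and are not part of wxRegEx.

```cpp
#include <string>
#include <utility>

enum class Director { None, Advanced, Literal };

// Splits an RE into its leading director (if any) and the remaining text.
std::pair<Director, std::string> split_director(const std::string& re)
{
    if (re.compare(0, 4, "***:") == 0)
        return {Director::Advanced, re.substr(4)};
    if (re.compare(0, 4, "***=") == 0)
        return {Director::Literal, re.substr(4)};
    return {Director::None, re};
}
```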
509 An ARE may begin with @e embedded options: a sequence @b (?xyz)
510 (where @e xyz is one or more alphabetic characters)
511 specifies options affecting the rest of the RE. These supplement, and can
512 override, any options specified by the application. The available option
513 letters are:
514
515
516
517 @b b
518
519 rest of RE is a BRE
520
521 @b c
522
523 case-sensitive matching (usual default)
524
525 @b e
526
527 rest of RE is an ERE
528
529 @b i
530
531 case-insensitive matching (see #Matching, below)
532
533 @b m
534
535 historical synonym for @b n
536
537 @b n
538
539 newline-sensitive matching (see #Matching, below)
540
541 @b p
542
543 partial newline-sensitive matching (see #Matching, below)
544
545 @b q
546
547 rest of RE
is a literal ("quoted") string, all ordinary characters
549
550 @b s
551
552 non-newline-sensitive matching (usual default)
553
554 @b t
555
556 tight syntax (usual default; see below)
557
558 @b w
559
560 inverse
partial newline-sensitive ("weird") matching (see #Matching, below)
562
563 @b x
564
565 expanded syntax (see below)
566
567
568
569 Embedded options take effect at the @b ) terminating the
570 sequence. They are available only at the start of an ARE, and may not be
571 used later within it.
572 In addition to the usual (@e tight) RE syntax, in which
573 all characters are significant, there is an @e expanded syntax, available
574 in AREs with the embedded
575 x option. In the expanded syntax, white-space characters are ignored and
576 all characters between a @b # and the following newline (or the end of the
577 RE) are ignored, permitting paragraphing and commenting a complex RE. There
578 are three exceptions to that basic rule:
579
580
@li a white-space character or '@b #' preceded by '@b \' is retained
@li white space or '@b #' within a bracket expression is retained
@li white space and comments are illegal within multi-character symbols like
    the ARE '@b (?:' or the BRE '@b \('
586
587
588 Expanded-syntax white-space characters are blank,
589 tab, newline, and any character that belongs to the @e space character class.
590 Finally, in an ARE, outside bracket expressions, the sequence '@b (?#ttt)' (where
591 @e ttt is any text not containing a '@b )') is a comment, completely ignored. Again,
592 this is not allowed between the characters of multi-character symbols like
593 '@b (?:'. Such comments are more a historical artifact than a useful facility,
594 and their use is deprecated; use the expanded syntax instead.
595 @e None of these
596 metasyntax extensions is available if the application (or an initial @b ***=
597 director) has specified that the user's input be treated as a literal string
598 rather than as an RE.
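
To make the expanded-syntax rules concrete, here is a sketch of a function
that reduces an expanded RE to its tight form. It is an illustration under
simplifying assumptions, not the library's parser: a bracket expression is
taken to end at the first ']' that is not its first member, and the
restriction on multi-character symbols is not checked. The name
strip_expanded is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Removes what the expanded syntax ignores: white space and #-comments
// outside bracket expressions. A backslash keeps the character after it.
std::string strip_expanded(const std::string& re)
{
    std::string out;
    bool in_bracket = false;
    for (std::size_t i = 0; i < re.size(); ++i) {
        const char c = re[i];
        if (c == '\\' && i + 1 < re.size()) {   // escaped character is retained
            out += c;
            out += re[++i];
        } else if (in_bracket) {                // everything inside [] is retained
            out += c;
            if (c == ']')
                in_bracket = false;
        } else if (c == '[') {
            in_bracket = true;
            out += c;
            if (i + 1 < re.size() && re[i + 1] == '^')
                out += re[++i];                 // optional negation
            if (i + 1 < re.size() && re[i + 1] == ']')
                out += re[++i];                 // a leading ']' is literal
        } else if (c == '#') {                  // comment runs to end of line
            while (i + 1 < re.size() && re[i + 1] != '\n')
                ++i;
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            out += c;
        }
    }
    return out;
}
```

For instance, the two-line input "a b # comment" followed by "c" reduces to
"abc", while "[ #] x" keeps the blank and the '#' inside the brackets and
becomes "[ #]x".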
599
600
601 @section overview_resyntax_matching Matching
602
603 In the event that an RE could match more than
604 one substring of a given string, the RE matches the one starting earliest
605 in the string. If the RE could match more than one substring starting at
606 that point, its choice is determined by its @e preference: either the longest
607 substring, or the shortest.
608 Most atoms, and all constraints, have no preference.
609 A parenthesized RE has the same preference (possibly none) as the RE. A
610 quantified atom with quantifier @b {m} or @b {m}? has the same preference (possibly
611 none) as the atom itself. A quantified atom with other normal quantifiers
612 (including @b {m,n} with @e m equal to @e n) prefers longest match. A quantified
613 atom with other non-greedy quantifiers (including @b {m,n}? with @e m equal to
614 @e n) prefers shortest match. A branch has the same preference as the first
615 quantified atom in it which has a preference. An RE consisting of two or
616 more branches connected by the @b | operator prefers longest match.
617 Subject to the constraints imposed by the rules for matching the whole RE, subexpressions
618 also match the longest or shortest possible substrings, based on their
619 preferences, with subexpressions starting earlier in the RE taking priority
620 over ones starting later. Note that outer subexpressions thus take priority
621 over their component subexpressions.
622 Note that the quantifiers @b {1,1} and
623 @b {1,1}? can be used to force longest and shortest preference, respectively,
624 on a subexpression or a whole RE.
625 Match lengths are measured in characters,
626 not collating elements. An empty string is considered longer than no match
627 at all. For example, @b bb* matches the three middle characters
628 of '@b abbbc', @b (week|wee)(night|knights)
629 matches all ten characters of '@b weeknights', when @b (.*).* is matched against
630 @b abc the parenthesized subexpression matches all three characters, and when
631 @b (a*)* is matched against @b bc both the whole RE and the parenthesized subexpression
632 match an empty string.
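
The earliest-start rule can be observed with a short experiment. The sketch
assumes C++ std::regex as a stand-in for the ARE engine; its default grammar
uses first-match rather than longest-match rules, so it agrees with the
description above only for simple REs like these. The helper name match_span
is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Offset and length of the first match, or {-1, 0} when there is none.
// Assumption: std::regex stands in for the ARE engine.
std::pair<long, long> match_span(const std::string& pattern,
                                 const std::string& text)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return {-1, 0};
    return {static_cast<long>(m.position(0)), static_cast<long>(m.length(0))};
}
```

Against "abbbc", bb* yields offset 1 and length 3, whereas b* yields an empty
match at offset 0: an empty match that starts earlier wins over a longer one
that starts later.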
633 If case-independent matching is specified, the effect
634 is much as if all case distinctions had vanished from the alphabet. When
635 an alphabetic that exists in multiple cases appears as an ordinary character
636 outside a bracket expression, it is effectively transformed into a bracket
637 expression containing both cases, so that @b x becomes '@b [xX]'. When it appears
638 inside a bracket expression, all case counterparts of it are added to the
639 bracket expression, so that @b [x] becomes @b [xX] and @b [^x] becomes '@b [^xX]'.
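
The rewriting of an ordinary character outside a bracket expression can be
sketched like this. The function name is hypothetical, and the code only
mirrors the description above for single-byte characters in the C locale.

```cpp
#include <cctype>
#include <string>

// Rewrites one ordinary character the way case-independent matching
// treats it outside a bracket expression: x becomes [xX].
std::string fold_case_atom(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    const char lo = static_cast<char>(std::tolower(u));
    const char up = static_cast<char>(std::toupper(u));
    if (lo == up)
        return std::string(1, c);       // no case counterpart: unchanged
    return std::string("[") + lo + up + "]";
}
```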
640 If newline-sensitive
641 matching is specified, @b . and bracket expressions using @b ^ will never match
642 the newline character (so that matches will never cross newlines unless
643 the RE explicitly arranges it) and @b ^ and @b $ will match the empty string after
644 and before a newline respectively, in addition to matching at beginning
645 and end of string respectively. ARE @b \A and @b \Z continue to match beginning
646 or end of string @e only.
647 If partial newline-sensitive matching is specified,
648 this affects @b . and bracket expressions as with newline-sensitive matching,
649 but not @b ^ and '@b $'.
650 If inverse partial newline-sensitive matching is specified,
651 this affects @b ^ and @b $ as with newline-sensitive matching, but not @b . and bracket
652 expressions. This isn't very useful but is provided for symmetry.
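
The effect of newline-sensitive matching on '^' can be illustrated without
any regex engine at all. The sketch below simply lists the offsets at which
'^' is allowed to match; caret_positions is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Offsets at which ^ can match. With newline-sensitive matching it also
// matches immediately after every newline.
std::vector<std::size_t> caret_positions(const std::string& text,
                                         bool newline_sensitive)
{
    std::vector<std::size_t> out{0};    // beginning of string always qualifies
    if (newline_sensitive)
        for (std::size_t i = 0; i < text.size(); ++i)
            if (text[i] == '\n')
                out.push_back(i + 1);
    return out;
}
```

For a string consisting of the lines "a" and "b", each ended by a newline,
the newline-sensitive offsets are 0, 2 and 4; otherwise only offset 0
qualifies.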
653
654
655 @section overview_resyntax_limits Limits and Compatibility
656
657 No particular limit is imposed on the length of REs. Programs
658 intended to be highly portable should not employ REs longer than 256 bytes,
659 as a POSIX-compliant implementation can refuse to accept such REs.
660 The only
661 feature of AREs that is actually incompatible with POSIX EREs is that @b \
662 does not lose its special significance inside bracket expressions. All other
663 ARE features use syntax which is illegal or has undefined or unspecified
664 effects in POSIX EREs; the @b *** syntax of directors likewise is outside
665 the POSIX syntax for both BREs and EREs.
666 Many of the ARE extensions are
667 borrowed from Perl, but some have been changed to clean them up, and a
668 few Perl extensions are not present. Incompatibilities of note include '@b \b',
669 '@b \B', the lack of special treatment for a trailing newline, the addition of
670 complemented bracket expressions to the things affected by newline-sensitive
671 matching, the restrictions on parentheses and back references in lookahead
672 constraints, and the longest/shortest-match (rather than first-match) matching
673 semantics.
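
The difference between first-match and longest-match semantics shows up with
a Perl-style engine. The sketch below assumes C++ std::regex with its default
grammar, which tries alternatives in the order written; it is not the ARE
engine described here, and perl_style_match is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Text of the first match under first-match (ordered alternation) rules.
// Assumption: std::regex's default grammar, not the ARE engine.
std::string perl_style_match(const std::string& pattern,
                             const std::string& text)
{
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str(0)
                                                           : std::string();
}
```

Applied to "weeknights", (week|wee)(night|knights) gives the nine characters
"weeknight" under these rules, whereas an ARE matches all ten characters, as
noted in the Matching section.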
674 The matching rules for REs containing both normal and non-greedy
675 quantifiers have changed since early beta-test versions of this package.
676 (The new rules are much simpler and cleaner, but don't work as hard at guessing
677 the user's real intentions.)
678 Henry Spencer's original 1986 @e regexp package, still in widespread use,
679 implemented an early version of today's EREs. There are four incompatibilities between @e regexp's
680 near-EREs ('RREs' for short) and AREs. In roughly increasing order of significance:
681
@li In AREs, @b \ followed by an alphanumeric character is either an escape or
    an error, while in RREs, it was just another way of writing the alphanumeric.
    This should not be a problem because there was no reason to write such
    a sequence in RREs.
@li @b { followed by a digit in an ARE is the beginning of
    a bound, while in RREs, @b { was always an ordinary character. Such sequences
    should be rare, and will often result in an error because following characters
    will not look like a valid bound.
@li In AREs, @b \ remains a special character
    within '@b []', so a literal @b \ within @b [] must be
    written '@b \\'. @b \\ also gives a literal
    @b \ within @b [] in RREs, but only truly paranoid programmers routinely doubled
    the backslash.
@li AREs report the longest/shortest match for the RE, rather
    than the first found in a specified search order. This may affect some RREs
    which were written in the expectation that the first match would be reported.
    (The careful crafting of RREs to optimize the search order for fast matching
    is obsolete (AREs examine all possible matches in parallel, and their performance
    is largely insensitive to their complexity) but cases where the search
    order was exploited to deliberately find a match which was @e not the longest/shortest
    will need rewriting.)
703
704
705 @section overview_resyntax_bre Basic Regular Expressions
706
707 BREs differ from EREs in
708 several respects. '@b |', '@b +', and @b ? are ordinary characters and there is no equivalent
709 for their functionality. The delimiters for bounds
710 are @b \{ and '@b \}', with @b { and
711 @b } by themselves ordinary characters. The parentheses for nested subexpressions
712 are @b \( and '@b \)', with @b ( and @b ) by themselves
713 ordinary characters. @b ^ is an ordinary
714 character except at the beginning of the RE or the beginning of a parenthesized
715 subexpression, @b $ is an ordinary character except at the end of the RE or
716 the end of a parenthesized subexpression, and @b * is an ordinary character
717 if it appears at the beginning of the RE or the beginning of a parenthesized
718 subexpression (after a possible leading '@b ^'). Finally, single-digit back references
are available, and @b \< and @b \> are synonyms
for @b [[:<:]] and @b [[:>:]] respectively;
721 no other escapes are available.
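
Some of these BRE rules can be tried with the basic grammar of C++
std::regex. That grammar is assumed here as an approximation of POSIX BREs;
it is not wxRegEx, and details may differ between standard library
implementations. The helper name bre_whole_match is hypothetical.

```cpp
#include <regex>
#include <string>

// True when the BRE `pattern` matches the whole of `text`.
// Assumption: std::regex::basic approximates the BREs described above.
bool bre_whole_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern, std::regex::basic));
}
```

In a BRE, a+b matches only the literal text "a+b", and the bound in
\(ab\)\{2\} must be written with backslashed braces.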
722
723
724 @section overview_resyntax_characters Regular Expression Character Names
725
726 Note that the character names are case sensitive.

<table>
<tr><th>Name</th><th>Character</th></tr>
<tr><td>NUL</td><td>'\0'</td></tr>
<tr><td>SOH</td><td>'\001'</td></tr>
<tr><td>STX</td><td>'\002'</td></tr>
<tr><td>ETX</td><td>'\003'</td></tr>
<tr><td>EOT</td><td>'\004'</td></tr>
<tr><td>ENQ</td><td>'\005'</td></tr>
<tr><td>ACK</td><td>'\006'</td></tr>
<tr><td>BEL</td><td>'\007'</td></tr>
<tr><td>alert</td><td>'\007'</td></tr>
<tr><td>BS</td><td>'\010'</td></tr>
<tr><td>backspace</td><td>'\b'</td></tr>
<tr><td>HT</td><td>'\011'</td></tr>
<tr><td>tab</td><td>'\t'</td></tr>
<tr><td>LF</td><td>'\012'</td></tr>
<tr><td>newline</td><td>'\n'</td></tr>
<tr><td>VT</td><td>'\013'</td></tr>
<tr><td>vertical-tab</td><td>'\v'</td></tr>
<tr><td>FF</td><td>'\014'</td></tr>
<tr><td>form-feed</td><td>'\f'</td></tr>
<tr><td>CR</td><td>'\015'</td></tr>
<tr><td>carriage-return</td><td>'\r'</td></tr>
<tr><td>SO</td><td>'\016'</td></tr>
<tr><td>SI</td><td>'\017'</td></tr>
<tr><td>DLE</td><td>'\020'</td></tr>
<tr><td>DC1</td><td>'\021'</td></tr>
<tr><td>DC2</td><td>'\022'</td></tr>
<tr><td>DC3</td><td>'\023'</td></tr>
<tr><td>DC4</td><td>'\024'</td></tr>
<tr><td>NAK</td><td>'\025'</td></tr>
<tr><td>SYN</td><td>'\026'</td></tr>
<tr><td>ETB</td><td>'\027'</td></tr>
<tr><td>CAN</td><td>'\030'</td></tr>
<tr><td>EM</td><td>'\031'</td></tr>
<tr><td>SUB</td><td>'\032'</td></tr>
<tr><td>ESC</td><td>'\033'</td></tr>
<tr><td>IS4</td><td>'\034'</td></tr>
<tr><td>FS</td><td>'\034'</td></tr>
<tr><td>IS3</td><td>'\035'</td></tr>
<tr><td>GS</td><td>'\035'</td></tr>
<tr><td>IS2</td><td>'\036'</td></tr>
<tr><td>RS</td><td>'\036'</td></tr>
<tr><td>IS1</td><td>'\037'</td></tr>
<tr><td>US</td><td>'\037'</td></tr>
<tr><td>space</td><td>' '</td></tr>
<tr><td>exclamation-mark</td><td>'!'</td></tr>
<tr><td>quotation-mark</td><td>'"'</td></tr>
<tr><td>number-sign</td><td>'#'</td></tr>
<tr><td>dollar-sign</td><td>'$'</td></tr>
<tr><td>percent-sign</td><td>'%'</td></tr>
<tr><td>ampersand</td><td>'&amp;'</td></tr>
<tr><td>apostrophe</td><td>'\''</td></tr>
<tr><td>left-parenthesis</td><td>'('</td></tr>
<tr><td>right-parenthesis</td><td>')'</td></tr>
<tr><td>asterisk</td><td>'*'</td></tr>
<tr><td>plus-sign</td><td>'+'</td></tr>
<tr><td>comma</td><td>','</td></tr>
<tr><td>hyphen</td><td>'-'</td></tr>
<tr><td>hyphen-minus</td><td>'-'</td></tr>
<tr><td>period</td><td>'.'</td></tr>
<tr><td>full-stop</td><td>'.'</td></tr>
<tr><td>slash</td><td>'/'</td></tr>
<tr><td>solidus</td><td>'/'</td></tr>
<tr><td>zero</td><td>'0'</td></tr>
<tr><td>one</td><td>'1'</td></tr>
<tr><td>two</td><td>'2'</td></tr>
<tr><td>three</td><td>'3'</td></tr>
<tr><td>four</td><td>'4'</td></tr>
<tr><td>five</td><td>'5'</td></tr>
<tr><td>six</td><td>'6'</td></tr>
<tr><td>seven</td><td>'7'</td></tr>
<tr><td>eight</td><td>'8'</td></tr>
<tr><td>nine</td><td>'9'</td></tr>
<tr><td>colon</td><td>':'</td></tr>
<tr><td>semicolon</td><td>';'</td></tr>
<tr><td>less-than-sign</td><td>'&lt;'</td></tr>
<tr><td>equals-sign</td><td>'='</td></tr>
<tr><td>greater-than-sign</td><td>'&gt;'</td></tr>
<tr><td>question-mark</td><td>'?'</td></tr>
<tr><td>commercial-at</td><td>'@'</td></tr>
<tr><td>left-square-bracket</td><td>'['</td></tr>
<tr><td>backslash</td><td>'\'</td></tr>
<tr><td>reverse-solidus</td><td>'\'</td></tr>
<tr><td>right-square-bracket</td><td>']'</td></tr>
<tr><td>circumflex</td><td>'^'</td></tr>
<tr><td>circumflex-accent</td><td>'^'</td></tr>
<tr><td>underscore</td><td>'_'</td></tr>
<tr><td>low-line</td><td>'_'</td></tr>
<tr><td>grave-accent</td><td>'`'</td></tr>
<tr><td>left-brace</td><td>'{'</td></tr>
<tr><td>left-curly-bracket</td><td>'{'</td></tr>
<tr><td>vertical-line</td><td>'|'</td></tr>
<tr><td>right-brace</td><td>'}'</td></tr>
<tr><td>right-curly-bracket</td><td>'}'</td></tr>
<tr><td>tilde</td><td>'~'</td></tr>
<tr><td>DEL</td><td>'\177'</td></tr>
</table>
1773
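
A name-to-character lookup of the kind implied by this table could look like
the sketch below. Only a handful of entries are included for illustration,
and lookup_char_name is a hypothetical function, not part of wxRegEx.

```cpp
#include <map>
#include <string>

// Resolves a character name, as written inside [. .], to its character.
// Only an illustrative subset of the table is included; names are case
// sensitive. Returns false when the name is unknown.
bool lookup_char_name(const std::string& name, char& out)
{
    static const std::map<std::string, char> names = {
        {"NUL", '\0'}, {"ESC", '\033'}, {"tab", '\t'}, {"newline", '\n'},
        {"space", ' '}, {"hyphen", '-'}, {"zero", '0'}, {"nine", '9'},
        {"backslash", '\\'}, {"DEL", '\177'},
    };
    const auto it = names.find(name);
    if (it == names.end())
        return false;
    out = it->second;
    return true;
}
```

Because the names are case sensitive, "zero" resolves to '0' while "ZERO" is
not found.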
1774 */
1775