From 8f8e45c4d941c9b17a77ec71595e3d7a39058234 Mon Sep 17 00:00:00 2001 From: =?utf8?q?V=C3=A1clav=20Slav=C3=ADk?= Date: Mon, 8 Mar 2004 09:41:41 +0000 Subject: [PATCH] removed more unneeded files, see patch 890642 git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@26134 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775 --- src/regex/Makefile | 28 --- src/regex/WHATSNEW | 108 ----------- src/regex/engine.ih | 35 ---- src/regex/mkh | 76 -------- src/regex/regcomp.ih | 53 ------ src/regex/regerror.ih | 12 -- src/regex/regex.3 | 509 -------------------------------------------------- 7 files changed, 821 deletions(-) delete mode 100644 src/regex/Makefile delete mode 100644 src/regex/WHATSNEW delete mode 100644 src/regex/engine.ih delete mode 100644 src/regex/mkh delete mode 100644 src/regex/regcomp.ih delete mode 100644 src/regex/regerror.ih delete mode 100644 src/regex/regex.3 diff --git a/src/regex/Makefile b/src/regex/Makefile deleted file mode 100644 index cb76d37..0000000 --- a/src/regex/Makefile +++ /dev/null @@ -1,28 +0,0 @@ -#------------------------------------------------------------------------- -# -# Makefile-- -# Makefile for backend/regex -# -# IDENTIFICATION -# $Header: /projects/cvsroot/pgsql-server/src/backend/regex/Makefile,v 1.20 2003/02/05 17:41:32 tgl Exp $ -# -#------------------------------------------------------------------------- - -subdir = src/backend/regex -top_builddir = ../../.. -include $(top_builddir)/src/Makefile.global - -OBJS = regcomp.o regerror.o regexec.o regfree.o - -all: SUBSYS.o - -SUBSYS.o: $(OBJS) - $(LD) $(LDREL) $(LDOUT) SUBSYS.o $(OBJS) - -# mark inclusion dependencies between .c files explicitly -regcomp.o: regcomp.c regc_lex.c regc_color.c regc_nfa.c regc_cvec.c regc_locale.c - -regexec.o: regexec.c rege_dfa.c - -clean: - rm -f SUBSYS.o $(OBJS) diff --git a/src/regex/WHATSNEW b/src/regex/WHATSNEW deleted file mode 100644 index 1295343..0000000 --- a/src/regex/WHATSNEW +++ /dev/null @@ -1,108 +0,0 @@ -New in alpha3.8: Bug fix for signed/unsigned mixup, found and fixed -by the FreeBSD folks. - -New in alpha3.7: A bit of cleanup aimed at maximizing portability, -possibly at slight cost in efficiency. "ul" suffixes and "unsigned long" -no longer appear, in particular. - -New in alpha3.6: A couple more portability glitches fixed. - -New in alpha3.5: Active development of this code has been stopped -- -I'm working on a complete reimplementation -- but folks have found some -minor portability glitches and the like, hence this release to fix them. -One penalty: slightly reduced compatibility with old compilers, because -the ANSI C `unsigned long' type and `ul' constant suffix are used in a -few places (I could avoid this but it would be considerably more work). - -New in alpha3.4: The complex bug alluded to below has been fixed (in a -slightly kludgey temporary way that may hurt efficiency a bit; this is -another "get it out the door for 4.4" release). The tests at the end of -the tests file have accordingly been uncommented. The primary sign of -the bug was that something like a?b matching ab matched b rather than ab. -(The bug was essentially specific to this exact situation, else it would -have shown up earlier.) - -New in alpha3.3: The definition of word boundaries has been altered -slightly, to more closely match the usual programming notion that "_" -is an alphabetic. Stuff used for pre-ANSI systems is now in a subdir, -and the makefile no longer alludes to it in mysterious ways. The -makefile has generally been cleaned up some. Fixes have been made -(again!) so that the regression test will run without -DREDEBUG, at -the cost of weaker checking. A workaround for a bug in some folks' - has been added. And some more things have been added to -tests, including a couple right at the end which are commented out -because the code currently flunks them (complex bug; fix coming). -Plus the usual minor cleanup. - -New in alpha3.2: Assorted bits of cleanup and portability improvement -(the development base is now a BSDI system using GCC instead of an ancient -Sun system, and the newer compiler exposed some glitches). Fix for a -serious bug that affected REs using many [] (including REG_ICASE REs -because of the way they are implemented), *sometimes*, depending on -memory-allocation patterns. The header-file prototypes no longer name -the parameters, avoiding possible name conflicts. The possibility that -some clot has defined CHAR_MIN as (say) `-128' instead of `(-128)' is -now handled gracefully. "uchar" is no longer used as an internal type -name (too many people have the same idea). Still the same old lousy -performance, alas. - -New in alpha3.1: Basically nothing, this release is just a bookkeeping -convenience. Stay tuned. - -New in alpha3.0: Performance is no better, alas, but some fixes have been -made and some functionality has been added. (This is basically the "get -it out the door in time for 4.4" release.) One bug fix: regfree() didn't -free the main internal structure (how embarrassing). It is now possible -to put NULs in either the RE or the target string, using (resp.) a new -REG_PEND flag and the old REG_STARTEND flag. The REG_NOSPEC flag to -regcomp() makes all characters ordinary, so you can match a literal -string easily (this will become more useful when performance improves!). -There are now primitives to match beginnings and ends of words, although -the syntax is disgusting and so is the implementation. The REG_ATOI -debugging interface has changed a bit. And there has been considerable -internal cleanup of various kinds. - -New in alpha2.3: Split change list out of README, and moved flags notes -into Makefile. Macro-ized the name of regex(7) in regex(3), since it has -to change for 4.4BSD. Cleanup work in engine.c, and some new regression -tests to catch tricky cases thereof. - -New in alpha2.2: Out-of-date manpages updated. Regerror() acquires two -small extensions -- REG_ITOA and REG_ATOI -- which avoid debugging kludges -in my own test program and might be useful to others for similar purposes. -The regression test will now compile (and run) without REDEBUG. The -BRE \$ bug is fixed. Most uses of "uchar" are gone; it's all chars now. -Char/uchar parameters are now written int/unsigned, to avoid possible -portability problems with unpromoted parameters. Some unsigned casts have -been introduced to minimize portability problems with shifting into sign -bits. - -New in alpha2.1: Lots of little stuff, cleanup and fixes. The one big -thing is that regex.h is now generated, using mkh, rather than being -supplied in the distribution; due to circularities in dependencies, -you have to build regex.h explicitly by "make h". The two known bugs -have been fixed (and the regression test now checks for them), as has a -problem with assertions not being suppressed in the absence of REDEBUG. -No performance work yet. - -New in alpha2: Backslash-anything is an ordinary character, not an -error (except, of course, for the handful of backslashed metacharacters -in BREs), which should reduce script breakage. The regression test -checks *where* null strings are supposed to match, and has generally -been tightened up somewhat. Small bug fixes in parameter passing (not -harmful, but technically errors) and some other areas. Debugging -invoked by defining REDEBUG rather than not defining NDEBUG. - -New in alpha+3: full prototyping for internal routines, using a little -helper program, mkh, which extracts prototypes given in stylized comments. -More minor cleanup. Buglet fix: it's CHAR_BIT, not CHAR_BITS. Simple -pre-screening of input when a literal string is known to be part of the -RE; this does wonders for performance. - -New in alpha+2: minor bits of cleanup. Notably, the number "32" for the -word width isn't hardwired into regexec.c any more, the public header -file prototypes the functions if __STDC__ is defined, and some small typos -in the manpages have been fixed. - -New in alpha+1: improvements to the manual pages, and an important -extension, the REG_STARTEND option to regexec(). diff --git a/src/regex/engine.ih b/src/regex/engine.ih deleted file mode 100644 index cc98334..0000000 --- a/src/regex/engine.ih +++ /dev/null @@ -1,35 +0,0 @@ -/* ========= begin header generated by ./mkh ========= */ -#ifdef __cplusplus -extern "C" { -#endif - -/* === engine.c === */ -static int matcher(register struct re_guts *g, char *string, size_t nmatch, regmatch_t pmatch[], int eflags); -static char *dissect(register struct match *m, char *start, char *stop, sopno startst, sopno stopst); -static char *backref(register struct match *m, char *start, char *stop, sopno startst, sopno stopst, sopno lev); -static char *fast(register struct match *m, char *start, char *stop, sopno startst, sopno stopst); -static char *slow(register struct match *m, char *start, char *stop, sopno startst, sopno stopst); -static states step(register struct re_guts *g, sopno start, sopno stop, register states bef, int ch, register states aft); -#define BOL (OUT+1) -#define EOL (BOL+1) -#define BOLEOL (BOL+2) -#define NOTHING (BOL+3) -#define BOW (BOL+4) -#define EOW (BOL+5) -#define CODEMAX (BOL+5) /* highest code used */ -#define NONCHAR(c) ((c) > CHAR_MAX) -#define NNONCHAR (CODEMAX-CHAR_MAX) -#ifdef REDEBUG -static void print(struct match *m, char *caption, states st, int ch, FILE *d); -#endif -#ifdef REDEBUG -static void at(struct match *m, char *title, char *start, char *stop, sopno startst, sopno stopst); -#endif -#ifdef REDEBUG -static char *pchar(int ch); -#endif - -#ifdef __cplusplus -} -#endif -/* ========= end header generated by ./mkh ========= */ diff --git a/src/regex/mkh b/src/regex/mkh deleted file mode 100644 index 252b246..0000000 --- a/src/regex/mkh +++ /dev/null @@ -1,76 +0,0 @@ -#! /bin/sh -# mkh - pull headers out of C source -PATH=/bin:/usr/bin ; export PATH - -# egrep pattern to pick out marked lines -egrep='^ =([ ]|$)' - -# Sed program to process marked lines into lines for the header file. -# The markers have already been removed. Two things are done here: removal -# of backslashed newlines, and some fudging of comments. The first is done -# because -o needs to have prototypes on one line to strip them down. -# Getting comments into the output is tricky; we turn C++-style // comments -# into /* */ comments, after altering any existing */'s to avoid trouble. -peel=' /\\$/N - /\\\n[ ]*/s///g - /\/\//s;\*/;* /;g - /\/\//s;//\(.*\);/*\1 */;' - -for a -do - case "$a" in - -o) # old (pre-function-prototype) compiler - # add code to comment out argument lists - peel="$peel - "'/^\([^#\/][^\/]*[a-zA-Z0-9_)]\)(\(.*\))/s;;\1(/*\2*/);' - shift - ;; - -b) # funny Berkeley __P macro - peel="$peel - "'/^\([^#\/][^\/]*[a-zA-Z0-9_)]\)(\(.*\))/s;;\1 __P((\2));' - shift - ;; - -s) # compiler doesn't like `static foo();' - # add code to get rid of the `static' - peel="$peel - "'/^static[ ][^\/]*[a-zA-Z0-9_)](.*)/s;static.;;' - shift - ;; - -p) # private declarations - egrep='^ ==([ ]|$)' - shift - ;; - -i) # wrap in #ifndef, argument is name - ifndef="$2" - shift ; shift - ;; - *) break - ;; - esac -done - -if test " $ifndef" != " " -then - echo "#ifndef $ifndef" - echo "#define $ifndef /* never again */" -fi -echo "/* ========= begin header generated by $0 ========= */" -echo '#ifdef __cplusplus' -echo 'extern "C" {' -echo '#endif' -for f -do - echo - echo "/* === $f === */" - egrep "$egrep" $f | sed 's/^ ==*[ ]//;s/^ ==*$//' | sed "$peel" - echo -done -echo '#ifdef __cplusplus' -echo '}' -echo '#endif' -echo "/* ========= end header generated by $0 ========= */" -if test " $ifndef" != " " -then - echo "#endif" -fi -exit 0 diff --git a/src/regex/regcomp.ih b/src/regex/regcomp.ih deleted file mode 100644 index 6efafeb..0000000 --- a/src/regex/regcomp.ih +++ /dev/null @@ -1,53 +0,0 @@ -/* ========= begin header generated by ./mkh ========= */ -#ifdef __cplusplus -extern "C" { -#endif - -/* === regcomp.c === */ -static void p_ere(register struct parse *p, int stop); -static void p_ere_exp(register struct parse *p); -static void p_str(register struct parse *p); -static void p_bre(register struct parse *p, register int end1, register int end2); -static int p_simp_re(register struct parse *p, int starordinary); -static int p_count(register struct parse *p); -static void p_bracket(register struct parse *p); -static void p_b_term(register struct parse *p, register cset *cs); -static void p_b_cclass(register struct parse *p, register cset *cs); -static void p_b_eclass(register struct parse *p, register cset *cs); -static char p_b_symbol(register struct parse *p); -static char p_b_coll_elem(register struct parse *p, int endc); -static char othercase(int ch); -static void bothcases(register struct parse *p, int ch); -static void ordinary(register struct parse *p, register int ch); -static void nonnewline(register struct parse *p); -static void repeat(register struct parse *p, sopno start, int from, int to); -static int seterr(register struct parse *p, int e); -static cset *allocset(register struct parse *p); -static void freeset(register struct parse *p, register cset *cs); -static int freezeset(register struct parse *p, register cset *cs); -static int firstch(register struct parse *p, register cset *cs); -static int nch(register struct parse *p, register cset *cs); -static void mcadd(register struct parse *p, register cset *cs, register char *cp); -#if 0 -static void mcsub(register cset *cs, register char *cp); -static int mcin(register cset *cs, register char *cp); -static char *mcfind(register cset *cs, register char *cp); -#endif -static void mcinvert(register struct parse *p, register cset *cs); -static void mccase(register struct parse *p, register cset *cs); -static int isinsets(register struct re_guts *g, int c); -static int samesets(register struct re_guts *g, int c1, int c2); -static void categorize(struct parse *p, register struct re_guts *g); -static sopno dupl(register struct parse *p, sopno start, sopno finish); -static void doemit(register struct parse *p, sop op, size_t opnd); -static void doinsert(register struct parse *p, sop op, size_t opnd, sopno pos); -static void dofwd(register struct parse *p, sopno pos, sop value); -static void enlarge(register struct parse *p, sopno size); -static void stripsnug(register struct parse *p, register struct re_guts *g); -static void findmust(register struct parse *p, register struct re_guts *g); -static sopno pluscount(register struct parse *p, register struct re_guts *g); - -#ifdef __cplusplus -} -#endif -/* ========= end header generated by ./mkh ========= */ diff --git a/src/regex/regerror.ih b/src/regex/regerror.ih deleted file mode 100644 index 2cb668c..0000000 --- a/src/regex/regerror.ih +++ /dev/null @@ -1,12 +0,0 @@ -/* ========= begin header generated by ./mkh ========= */ -#ifdef __cplusplus -extern "C" { -#endif - -/* === regerror.c === */ -static char *regatoi(const regex_t *preg, char *localbuf); - -#ifdef __cplusplus -} -#endif -/* ========= end header generated by ./mkh ========= */ diff --git a/src/regex/regex.3 b/src/regex/regex.3 deleted file mode 100644 index bc74709..0000000 --- a/src/regex/regex.3 +++ /dev/null @@ -1,509 +0,0 @@ -.TH REGEX 3 "25 Sept 1997" -.BY "Henry Spencer" -.de ZR -.\" one other place knows this name: the SEE ALSO section -.IR regex (7) \\$1 -.. -.SH NAME -regcomp, regexec, regerror, regfree \- regular-expression library -.SH SYNOPSIS -.ft B -.\".na -#include -.br -#include -.HP 10 -int regcomp(regex_t\ *preg, const\ char\ *pattern, int\ cflags); -.HP -int\ regexec(const\ regex_t\ *preg, const\ char\ *string, -size_t\ nmatch, regmatch_t\ pmatch[], int\ eflags); -.HP -size_t\ regerror(int\ errcode, const\ regex_t\ *preg, -char\ *errbuf, size_t\ errbuf_size); -.HP -void\ regfree(regex_t\ *preg); -.\".ad -.ft -.SH DESCRIPTION -These routines implement POSIX 1003.2 regular expressions (``RE''s); -see -.ZR . -.I Regcomp -compiles an RE written as a string into an internal form, -.I regexec -matches that internal form against a string and reports results, -.I regerror -transforms error codes from either into human-readable messages, -and -.I regfree -frees any dynamically-allocated storage used by the internal form -of an RE. -.PP -The header -.I -declares two structure types, -.I regex_t -and -.IR regmatch_t , -the former for compiled internal forms and the latter for match reporting. -It also declares the four functions, -a type -.IR regoff_t , -and a number of constants with names starting with ``REG_''. -.PP -.I Regcomp -compiles the regular expression contained in the -.I pattern -string, -subject to the flags in -.IR cflags , -and places the results in the -.I regex_t -structure pointed to by -.IR preg . -.I Cflags -is the bitwise OR of zero or more of the following flags: -.IP REG_EXTENDED \w'REG_EXTENDED'u+2n -Compile modern (``extended'') REs, -rather than the obsolete (``basic'') REs that -are the default. -.IP REG_BASIC -This is a synonym for 0, -provided as a counterpart to REG_EXTENDED to improve readability. -This is an extension, -compatible with but not specified by POSIX 1003.2, -and should be used with -caution in software intended to be portable to other systems. -.IP REG_NOSPEC -Compile with recognition of all special characters turned off. -All characters are thus considered ordinary, -so the ``RE'' is a literal string. -This is an extension, -compatible with but not specified by POSIX 1003.2, -and should be used with -caution in software intended to be portable to other systems. -REG_EXTENDED and REG_NOSPEC may not be used -in the same call to -.IR regcomp . -.IP REG_ICASE -Compile for matching that ignores upper/lower case distinctions. -See -.ZR . -.IP REG_NOSUB -Compile for matching that need only report success or failure, -not what was matched. -.IP REG_NEWLINE -Compile for newline-sensitive matching. -By default, newline is a completely ordinary character with no special -meaning in either REs or strings. -With this flag, -`[^' bracket expressions and `.' never match newline, -a `^' anchor matches the null string after any newline in the string -in addition to its normal function, -and the `$' anchor matches the null string before any newline in the -string in addition to its normal function. -.IP REG_PEND -The regular expression ends, -not at the first NUL, -but just before the character pointed to by the -.I re_endp -member of the structure pointed to by -.IR preg . -The -.I re_endp -member is of type -.IR const\ char\ * . -This flag permits inclusion of NULs in the RE; -they are considered ordinary characters. -This is an extension, -compatible with but not specified by POSIX 1003.2, -and should be used with -caution in software intended to be portable to other systems. -.PP -When successful, -.I regcomp -returns 0 and fills in the structure pointed to by -.IR preg . -One member of that structure -(other than -.IR re_endp ) -is publicized: -.IR re_nsub , -of type -.IR size_t , -contains the number of parenthesized subexpressions within the RE -(except that the value of this member is undefined if the -REG_NOSUB flag was used). -If -.I regcomp -fails, it returns a non-zero error code; -see DIAGNOSTICS. -.PP -.I Regexec -matches the compiled RE pointed to by -.I preg -against the -.IR string , -subject to the flags in -.IR eflags , -and reports results using -.IR nmatch , -.IR pmatch , -and the returned value. -The RE must have been compiled by a previous invocation of -.IR regcomp . -The compiled form is not altered during execution of -.IR regexec , -so a single compiled RE can be used simultaneously by multiple threads. -.PP -By default, -the NUL-terminated string pointed to by -.I string -is considered to be the text of an entire line, -with the NUL indicating the end of the line. -(That is, -any other end-of-line marker is considered to have been removed -and replaced by the NUL.) -The -.I eflags -argument is the bitwise OR of zero or more of the following flags: -.IP REG_NOTBOL \w'REG_STARTEND'u+2n -The first character of -the string -is not the beginning of a line, so the `^' anchor should not match before it. -This does not affect the behavior of newlines under REG_NEWLINE. -.IP REG_NOTEOL -The NUL terminating -the string -does not end a line, so the `$' anchor should not match before it. -This does not affect the behavior of newlines under REG_NEWLINE. -.IP REG_STARTEND -The string is considered to start at -\fIstring\fR\ + \fIpmatch\fR[0].\fIrm_so\fR -and to have a terminating NUL located at -\fIstring\fR\ + \fIpmatch\fR[0].\fIrm_eo\fR -(there need not actually be a NUL at that location), -regardless of the value of -.IR nmatch . -See below for the definition of -.IR pmatch -and -.IR nmatch . -This is an extension, -compatible with but not specified by POSIX 1003.2, -and should be used with -caution in software intended to be portable to other systems. -Note that a non-zero \fIrm_so\fR does not imply REG_NOTBOL; -REG_STARTEND affects only the location of the string, -not how it is matched. -.PP -See -.ZR -for a discussion of what is matched in situations where an RE or a -portion thereof could match any of several substrings of -.IR string . -.PP -Normally, -.I regexec -returns 0 for success and the non-zero code REG_NOMATCH for failure. -Other non-zero error codes may be returned in exceptional situations; -see DIAGNOSTICS. -.PP -If REG_NOSUB was specified in the compilation of the RE, -or if -.I nmatch -is 0, -.I regexec -ignores the -.I pmatch -argument (but see below for the case where REG_STARTEND is specified). -Otherwise, -.I pmatch -points to an array of -.I nmatch -structures of type -.IR regmatch_t . -Such a structure has at least the members -.I rm_so -and -.IR rm_eo , -both of type -.I regoff_t -(a signed arithmetic type at least as large as an -.I off_t -and a -.IR ssize_t ), -containing respectively the offset of the first character of a substring -and the offset of the first character after the end of the substring. -Offsets are measured from the beginning of the -.I string -argument given to -.IR regexec . -An empty substring is denoted by equal offsets, -both indicating the character following the empty substring. -.PP -The 0th member of the -.I pmatch -array is filled in to indicate what substring of -.I string -was matched by the entire RE. -Remaining members report what substring was matched by parenthesized -subexpressions within the RE; -member -.I i -reports subexpression -.IR i , -with subexpressions counted (starting at 1) by the order of their opening -parentheses in the RE, left to right. -Unused entries in the array\(emcorresponding either to subexpressions that -did not participate in the match at all, or to subexpressions that do not -exist in the RE (that is, \fIi\fR\ > \fIpreg\fR\->\fIre_nsub\fR)\(emhave both -.I rm_so -and -.I rm_eo -set to \-1. -If a subexpression participated in the match several times, -the reported substring is the last one it matched. -(Note, as an example in particular, that when the RE `(b*)+' matches `bbb', -the parenthesized subexpression matches the three `b's and then -an infinite number of empty strings following the last `b', -so the reported substring is one of the empties.) -.PP -If REG_STARTEND is specified, -.I pmatch -must point to at least one -.I regmatch_t -(even if -.I nmatch -is 0 or REG_NOSUB was specified), -to hold the input offsets for REG_STARTEND. -Use for output is still entirely controlled by -.IR nmatch ; -if -.I nmatch -is 0 or REG_NOSUB was specified, -the value of -.IR pmatch [0] -will not be changed by a successful -.IR regexec . -.PP -.I Regerror -maps a non-zero -.I errcode -from either -.I regcomp -or -.I regexec -to a human-readable, printable message. -If -.I preg -is non-NULL, -the error code should have arisen from use of -the -.I regex_t -pointed to by -.IR preg , -and if the error code came from -.IR regcomp , -it should have been the result from the most recent -.I regcomp -using that -.IR regex_t . -.RI ( Regerror -may be able to supply a more detailed message using information -from the -.IR regex_t .) -.I Regerror -places the NUL-terminated message into the buffer pointed to by -.IR errbuf , -limiting the length (including the NUL) to at most -.I errbuf_size -bytes. -If the whole message won't fit, -as much of it as will fit before the terminating NUL is supplied. -In any case, -the returned value is the size of buffer needed to hold the whole -message (including terminating NUL). -If -.I errbuf_size -is 0, -.I errbuf -is ignored but the return value is still correct. -.PP -If the -.I errcode -given to -.I regerror -is first ORed with REG_ITOA, -the ``message'' that results is the printable name of the error code, -e.g. ``REG_NOMATCH'', -rather than an explanation thereof. -If -.I errcode -is REG_ATOI, -then -.I preg -shall be non-NULL and the -.I re_endp -member of the structure it points to -must point to the printable name of an error code; -in this case, the result in -.I errbuf -is the decimal digits of -the numeric value of the error code -(0 if the name is not recognized). -REG_ITOA and REG_ATOI are intended primarily as debugging facilities; -they are extensions, -compatible with but not specified by POSIX 1003.2, -and should be used with -caution in software intended to be portable to other systems. -Be warned also that they are considered experimental and changes are possible. -.PP -.I Regfree -frees any dynamically-allocated storage associated with the compiled RE -pointed to by -.IR preg . -The remaining -.I regex_t -is no longer a valid compiled RE -and the effect of supplying it to -.I regexec -or -.I regerror -is undefined. -.PP -None of these functions references global variables except for tables -of constants; -all are safe for use from multiple threads if the arguments are safe. -.SH IMPLEMENTATION CHOICES -There are a number of decisions that 1003.2 leaves up to the implementor, -either by explicitly saying ``undefined'' or by virtue of them being -forbidden by the RE grammar. -This implementation treats them as follows. -.PP -See -.ZR -for a discussion of the definition of case-independent matching. -.PP -There is no particular limit on the length of REs, -except insofar as memory is limited. -Memory usage is approximately linear in RE size, and largely insensitive -to RE complexity, except for bounded repetitions. -See BUGS for one short RE using them -that will run almost any system out of memory. -.PP -A backslashed character other than one specifically given a magic meaning -by 1003.2 (such magic meanings occur only in obsolete [``basic''] REs) -is taken as an ordinary character. -.PP -Any unmatched [ is a REG_EBRACK error. -.PP -Equivalence classes cannot begin or end bracket-expression ranges. -The endpoint of one range cannot begin another. -.PP -RE_DUP_MAX, the limit on repetition counts in bounded repetitions, is 255. -.PP -A repetition operator (?, *, +, or bounds) cannot follow another -repetition operator. -A repetition operator cannot begin an expression or subexpression -or follow `^' or `|'. -.PP -`|' cannot appear first or last in a (sub)expression or after another `|', -i.e. an operand of `|' cannot be an empty subexpression. -An empty parenthesized subexpression, `()', is legal and matches an -empty (sub)string. -An empty string is not a legal RE. -.PP -A `{' followed by a digit is considered the beginning of bounds for a -bounded repetition, which must then follow the syntax for bounds. -A `{' \fInot\fR followed by a digit is considered an ordinary character. -.PP -`^' and `$' beginning and ending subexpressions in obsolete (``basic'') -REs are anchors, not ordinary characters. -.SH SEE ALSO -grep(1), regex(7) -.PP -POSIX 1003.2, sections 2.8 (Regular Expression Notation) -and -B.5 (C Binding for Regular Expression Matching). -.SH DIAGNOSTICS -Non-zero error codes from -.I regcomp -and -.I regexec -include the following: -.PP -.nf -.ta \w'REG_ECOLLATE'u+3n -REG_NOMATCH regexec() failed to match -REG_BADPAT invalid regular expression -REG_ECOLLATE invalid collating element -REG_ECTYPE invalid character class -REG_EESCAPE \e applied to unescapable character -REG_ESUBREG invalid backreference number -REG_EBRACK brackets [ ] not balanced -REG_EPAREN parentheses ( ) not balanced -REG_EBRACE braces { } not balanced -REG_BADBR invalid repetition count(s) in { } -REG_ERANGE invalid character range in [ ] -REG_ESPACE ran out of memory -REG_BADRPT ?, *, or + operand invalid -REG_EMPTY empty (sub)expression -REG_ASSERT ``can't happen''\(emyou found a bug -REG_INVARG invalid argument, e.g. negative-length string -.fi -.SH HISTORY -Written by Henry Spencer, -henry@zoo.toronto.edu. -.SH BUGS -This is an alpha release with known defects. -Please report problems. -.PP -There is one known functionality bug. -The implementation of internationalization is incomplete: -the locale is always assumed to be the default one of 1003.2, -and only the collating elements etc. of that locale are available. -.PP -The back-reference code is subtle and doubts linger about its correctness -in complex cases. -.PP -.I Regexec -performance is poor. -This will improve with later releases. -.I Nmatch -exceeding 0 is expensive; -.I nmatch -exceeding 1 is worse. -.I Regexec -is largely insensitive to RE complexity \fIexcept\fR that back -references are massively expensive. -RE length does matter; in particular, there is a strong speed bonus -for keeping RE length under about 30 characters, -with most special characters counting roughly double. -.PP -.I Regcomp -implements bounded repetitions by macro expansion, -which is costly in time and space if counts are large -or bounded repetitions are nested. -An RE like, say, -`((((a{1,100}){1,100}){1,100}){1,100}){1,100}' -will (eventually) run almost any existing machine out of swap space. -.PP -There are suspected problems with response to obscure error conditions. -Notably, -certain kinds of internal overflow, -produced only by truly enormous REs or by multiply nested bounded repetitions, -are probably not handled well. -.PP -Due to a mistake in 1003.2, things like `a)b' are legal REs because `)' is -a special character only in the presence of a previous unmatched `('. -This can't be fixed until the spec is fixed. -.PP -The standard's definition of back references is vague. -For example, does -`a\e(\e(b\e)*\e2\e)*d' match `abbbd'? -Until the standard is clarified, -behavior in such cases should not be relied on. -.PP -The implementation of word-boundary matching is a bit of a kludge, -and bugs may lurk in combinations of word-boundary matching and anchoring. -- 2.7.4