X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/18b519c0d6020e2ab2431c3ada64b8ebe522b504..41f83caf1cbad0e79f7fcd3e35386a523e463784:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 967b01f8..652da664 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -119,7 +119,8 @@ Reference sections: * Copying This Manual:: License for copying this manual. * Index:: Cross-references to the text. -@detailmenu --- The Detailed Node Listing --- +@detailmenu + --- The Detailed Node Listing --- The Concepts of Bison @@ -130,6 +131,8 @@ The Concepts of Bison a semantic value (the value of an integer, the name of an identifier, etc.). * Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages +* Locations Overview:: Tracking Locations. * Bison Parser:: What are Bison's input and output, how is the output used? * Stages:: Stages in writing and running Bison grammars. @@ -143,8 +146,8 @@ Examples Operator precedence is introduced. * Simple Error Recovery:: Continuing after syntax errors. * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. +* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. * Exercises:: Ideas for improving the multi-function calculator. Reverse Polish Notation Calculator @@ -182,15 +185,16 @@ Bison Grammar Files * Rules:: How to write grammar rules. * Recursion:: Writing recursive rules. * Semantics:: Semantic values and actions. +* Locations:: Locations and actions. * Declarations:: All kinds of Bison declarations are described here. * Multiple Parsers:: Putting more than one Bison parser in one program. Outline of a Bison Grammar -* Prologue:: Syntax and usage of the prologue (declarations section). +* Prologue:: Syntax and usage of the prologue. 
* Bison Declarations:: Syntax and usage of the Bison declarations section. * Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue (additional code section). +* Epilogue:: Syntax and usage of the epilogue. Defining Language Semantics @@ -202,6 +206,12 @@ Defining Language Semantics This says when, why and how to use the exceptional action in the middle of a rule. +Tracking Locations + +* Location Type:: Specifying a data type for locations. +* Actions and Locations:: Using locations in actions. +* Location Default Action:: Defining a general way to compute locations. + Bison Declarations * Token Decl:: Declaring terminal symbols. @@ -229,7 +239,7 @@ The Lexical Analyzer Function @code{yylex} of the token it has read. * Token Positions:: How @code{yylex} must return the text position (line number, etc.) of the token, if the - actions want that. + actions want that. * Pure Calling:: How the calling convention differs in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). @@ -259,7 +269,7 @@ Handling Context Dependencies * Tie-in Recovery:: Lexical tie-ins have implications for how error recovery rules must be written. -Understanding or Debugging Your Parser +Debugging Your Parser * Understanding:: Understanding the structure of your parser. * Tracing:: Tracing the execution of your parser. @@ -269,6 +279,7 @@ Invoking Bison * Bison Options:: All the options described in detail, in alphabetical order by short options. * Option Cross Key:: Alphabetical list of long options. +* Yacc Library:: Yacc-compatible @code{yyerror} and @code{main}. Frequently Asked Questions @@ -412,42 +423,41 @@ more information on this.
@cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars @cindex non-deterministic parsing -Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, -meaning roughly that -the next grammar rule to apply at any point in the input is uniquely -determined by the preceding input and a fixed, finite portion (called -a @dfn{look-ahead}) of the remaining input. -A context-free grammar can be @dfn{ambiguous}, meaning that -there are multiple ways to apply the grammar rules to get the some inputs. -Even unambiguous grammars can be @dfn{non-deterministic}, meaning that no -fixed look-ahead always suffices to determine the next grammar rule to apply. -With the proper declarations, Bison is also able to parse these more general -context-free grammars, using a technique known as @acronym{GLR} parsing (for -Generalized @acronym{LR}). Bison's @acronym{GLR} parsers are able to -handle any context-free -grammar for which the number of possible parses of any given string -is finite. + +Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning +roughly that the next grammar rule to apply at any point in the input is +uniquely determined by the preceding input and a fixed, finite portion +(called a @dfn{look-ahead}) of the remaining input. A context-free +grammar can be @dfn{ambiguous}, meaning that there are multiple ways to +apply the grammar rules to get the same inputs. Even unambiguous +grammars can be @dfn{non-deterministic}, meaning that no fixed +look-ahead always suffices to determine the next grammar rule to apply. +With the proper declarations, Bison is also able to parse these more +general context-free grammars, using a technique known as @acronym{GLR} +parsing (for Generalized @acronym{LR}). Bison's @acronym{GLR} parsers +are able to handle any context-free grammar for which the number of +possible parses of any given string is finite.
@cindex symbols (abstract) @cindex token @cindex syntactic grouping @cindex grouping, syntactic -In the formal grammatical rules for a language, each kind of syntactic unit -or grouping is named by a @dfn{symbol}. Those which are built by grouping -smaller constructs according to grammatical rules are called +In the formal grammatical rules for a language, each kind of syntactic +unit or grouping is named by a @dfn{symbol}. Those which are built by +grouping smaller constructs according to grammatical rules are called @dfn{nonterminal symbols}; those which can't be subdivided are called @dfn{terminal symbols} or @dfn{token types}. We call a piece of input corresponding to a single terminal symbol a @dfn{token}, and a piece corresponding to a single nonterminal symbol a @dfn{grouping}. We can use the C language as an example of what symbols, terminal and -nonterminal, mean. The tokens of C are identifiers, constants (numeric and -string), and the various keywords, arithmetic operators and punctuation -marks. So the terminal symbols of a grammar for C include `identifier', -`number', `string', plus one symbol for each keyword, operator or -punctuation mark: `if', `return', `const', `static', `int', `char', -`plus-sign', `open-brace', `close-brace', `comma' and many more. (These -tokens can be subdivided into characters, but that is a matter of +nonterminal, mean. The tokens of C are identifiers, constants (numeric +and string), and the various keywords, arithmetic operators and +punctuation marks. So the terminal symbols of a grammar for C include +`identifier', `number', `string', plus one symbol for each keyword, +operator or punctuation mark: `if', `return', `const', `static', `int', +`char', `plus-sign', `open-brace', `close-brace', `comma' and many more. +(These tokens can be subdivided into characters, but that is a matter of lexicography, not grammar.) 
Here is a simple C function subdivided into tokens: @@ -642,28 +652,28 @@ from the values of the two subexpressions. @cindex conflicts @cindex shift/reduce conflicts -In some grammars, there will be cases where Bison's standard @acronym{LALR}(1) -parsing algorithm cannot decide whether to apply a certain grammar rule -at a given point. That is, it may not be able to decide (on the basis -of the input read so far) which of two possible reductions (applications -of a grammar rule) applies, or whether to apply a reduction or read more -of the input and apply a reduction later in the input. These are known -respectively as @dfn{reduce/reduce} conflicts (@pxref{Reduce/Reduce}), -and @dfn{shift/reduce} conflicts (@pxref{Shift/Reduce}). - -To use a grammar that is not easily modified to be @acronym{LALR}(1), a more -general parsing algorithm is sometimes necessary. If you include +In some grammars, there will be cases where Bison's standard +@acronym{LALR}(1) parsing algorithm cannot decide whether to apply a +certain grammar rule at a given point. That is, it may not be able to +decide (on the basis of the input read so far) which of two possible +reductions (applications of a grammar rule) applies, or whether to apply +a reduction or read more of the input and apply a reduction later in the +input. These are known respectively as @dfn{reduce/reduce} conflicts +(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts +(@pxref{Shift/Reduce}). + +To use a grammar that is not easily modified to be @acronym{LALR}(1), a +more general parsing algorithm is sometimes necessary. If you include @code{%glr-parser} among the Bison declarations in your file -(@pxref{Grammar Outline}), the result will be a Generalized -@acronym{LR} (@acronym{GLR}) -parser. These parsers handle Bison grammars that contain no unresolved -conflicts (i.e., after applying precedence declarations) identically to -@acronym{LALR}(1) parsers. 
However, when faced with unresolved -shift/reduce and reduce/reduce conflicts, @acronym{GLR} parsers use -the simple expedient of doing -both, effectively cloning the parser to follow both possibilities. Each -of the resulting parsers can again split, so that at any given time, -there can be any number of possible parses being explored. The parsers +(@pxref{Grammar Outline}), the result will be a Generalized @acronym{LR} +(@acronym{GLR}) parser. These parsers handle Bison grammars that +contain no unresolved conflicts (i.e., after applying precedence +declarations) identically to @acronym{LALR}(1) parsers. However, when +faced with unresolved shift/reduce and reduce/reduce conflicts, +@acronym{GLR} parsers use the simple expedient of doing both, +effectively cloning the parser to follow both possibilities. Each of +the resulting parsers can again split, so that at any given time, there +can be any number of possible parses being explored. The parsers proceed in lockstep; that is, all of them consume (shift) a given input symbol before any of them proceed to the next. Each of the cloned parsers eventually meets one of two possible fates: either it runs into @@ -686,7 +696,10 @@ Let's consider an example, vastly simplified from a C++ grammar. 
@example %@{ - #define YYSTYPE const char* + #include + #define YYSTYPE char const * + int yylex (void); + void yyerror (char const *); %@} %token TYPENAME ID @@ -784,7 +797,8 @@ stmt : expr ';' %merge and define the @code{stmtMerge} function as: @example -static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1) +static YYSTYPE +stmtMerge (YYSTYPE x0, YYSTYPE x1) @{ printf (" "); return ""; @@ -797,7 +811,7 @@ in the C declarations at the beginning of the file: @example %@{ - #define YYSTYPE const char* + #define YYSTYPE char const * static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1); %@} @end example @@ -810,6 +824,33 @@ as both an @code{expr} and a @code{decl}, and print "x" y z + T x T y z + = @end example +@sp 1 + +@cindex @code{inline} +@cindex @acronym{GLR} parsers and @code{inline} +The @acronym{GLR} parsers require a compiler for @acronym{ISO} C89 or +later. In addition, they use the @code{inline} keyword, which is not +C89, but is C99 and is a common extension in pre-C99 compilers. It is +up to the user of these parsers to handle +portability issues. For instance, if using Autoconf and the Autoconf +macro @code{AC_C_INLINE}, a mere + +@example +%@{ + #include +%@} +@end example + +@noindent +will suffice. Otherwise, we suggest + +@example +%@{ + #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline + #define inline + #endif +%@} +@end example @node Locations Overview @section Locations @@ -967,6 +1008,9 @@ in every Bison grammar file to separate the sections. The prologue may define types and variables used in the actions. You can also use preprocessor commands to define macros used there, and use @code{#include} to include header files that do any of these things. +You need to declare the lexical analyzer @code{yylex} and the error +printer @code{yyerror} here, along with any other global identifiers +used by the actions in the grammar rules.
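A minimal sketch of the layout described above, in plain C and without running Bison: the declarations of yylex and yyerror come first (as they would in the prologue), a parser-like function calls them, and their definitions follow (as they would in the epilogue). The function demo_parse, the token array demo_tokens and the counter demo_errors are hypothetical illustrations only; a real Bison-generated yyparse does far more than read tokens until end of input.

```c
#include <stdio.h>

/* Prologue part: declare the functions the parser calls, because C
   requires a declaration before the first call.  */
int yylex (void);
void yyerror (char const *);

/* Hypothetical stand-in for a generated parser: it reads tokens until
   yylex returns 0 (end of input) and reports negative token codes as
   syntax errors.  */
int
demo_parse (void)
{
  int token;
  while ((token = yylex ()) != 0)
    if (token < 0)
      {
        yyerror ("syntax error");
        return 1;
      }
  return 0;
}

/* Epilogue part: the definitions may come after their first use,
   since the declarations above are already in scope.  */

/* Hypothetical input: token codes terminated by 0.  */
static int const demo_tokens[] = { 258, '+', 258, 0 };
static int demo_pos;
int demo_errors;

int
yylex (void)
{
  return demo_tokens[demo_pos++];
}

void
yyerror (char const *msg)
{
  demo_errors++;
  fprintf (stderr, "%s\n", msg);
}
```

Without the two declarations at the top, a C99 compiler would reject (and a C89 compiler would implicitly and incorrectly declare) the calls inside demo_parse.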
The Bison declarations declare the names of the terminal and nonterminal symbols, and may also describe operator precedence and the data types of @@ -975,10 +1019,9 @@ semantic values of various symbols. The grammar rules define how to construct each nonterminal symbol from its parts. -The epilogue can contain any code you want to use. Often the definition of -the lexical analyzer @code{yylex} goes here, plus subroutines called by the -actions in the grammar rules. In a simple program, all the rest of the -program can go here. +The epilogue can contain any code you want to use. Often the +definitions of functions declared in the prologue go here. In a +simple program, all the rest of the program can go here. @node Examples @chapter Examples @@ -1045,8 +1088,10 @@ calculator. As in C, comments are placed between @samp{/*@dots{}*/}. /* Reverse polish notation calculator. */ %@{ -#define YYSTYPE double -#include + #define YYSTYPE double + #include + int yylex (void); + void yyerror (char const *); %@} %token NUM @@ -1055,7 +1100,7 @@ calculator. As in C, comments are placed between @samp{/*@dots{}*/}. @end example The declarations section (@pxref{Prologue, , The prologue}) contains two -preprocessor directives. +preprocessor directives and two forward declarations. The @code{#define} directive defines the macro @code{YYSTYPE}, thus specifying the C data type for semantic values of both tokens and @@ -1068,6 +1113,12 @@ which is a floating point number. The @code{#include} directive is used to declare the exponentiation function @code{pow}. +The forward declarations for @code{yylex} and @code{yyerror} are +needed because the C language requires that functions be declared +before they are used. These functions will be defined in the +epilogue, but the parser calls them so they must be declared in the +prologue. + The second section, Bison declarations, provides information to Bison about the token types (@pxref{Bison Declarations, ,The Bison Declarations Section}). 
Each terminal symbol that is not a @@ -1348,7 +1399,7 @@ main (void) When @code{yyparse} detects a syntax error, it calls the error reporting function @code{yyerror} to print an error message (usually but not -always @code{"parse error"}). It is up to the programmer to supply +always @code{"syntax error"}). It is up to the programmer to supply @code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so here is the definition we will use: @@ -1356,8 +1407,9 @@ here is the definition we will use: @group #include +/* Called by yyparse on error. */ void -yyerror (const char *s) /* Called by yyparse on error. */ +yyerror (char const *s) @{ printf ("%s\n", s); @} @@ -1457,23 +1509,25 @@ parentheses nested to arbitrary depth. Here is the Bison code for @file{calc.y}, an infix desk-top calculator. @example -/* Infix notation calculator--calc */ +/* Infix notation calculator. */ %@{ -#define YYSTYPE double -#include + #define YYSTYPE double + #include + #include + int yylex (void); + void yyerror (char const *); %@} -/* Bison Declarations */ +/* Bison declarations. */ %token NUM %left '-' '+' %left '*' '/' %left NEG /* negation--unary minus */ -%right '^' /* exponentiation */ +%right '^' /* exponentiation */ -/* Grammar follows */ -%% -input: /* empty string */ +%% /* The grammar follows. */ +input: /* empty */ | input line ; @@ -1558,7 +1612,7 @@ line: '\n' @end example This addition to the grammar allows for simple error recovery in the -event of a parse error. If an expression that cannot be evaluated is +event of a syntax error. If an expression that cannot be evaluated is read, the error will be recognized by the third rule for @code{line}, and parsing will continue. (The @code{yyerror} function is still called upon to print its message as well.) The action executes the statement @@ -1603,8 +1657,10 @@ the same as the declarations for the infix notation calculator. /* Location tracking calculator. 
*/ %@{ -#define YYSTYPE int -#include + #define YYSTYPE int + #include + int yylex (void); + void yyerror (char const *); %@} /* Bison declarations. */ @@ -1615,7 +1671,7 @@ the same as the declarations for the infix notation calculator. %left NEG %right '^' -%% /* Grammar follows */ +%% /* The grammar follows. */ @end example @noindent @@ -1838,29 +1894,30 @@ Here are the C and Bison declarations for the multi-function calculator. @smallexample @group %@{ -#include /* For math functions, cos(), sin(), etc. */ -#include "calc.h" /* Contains definition of `symrec' */ + #include /* For math functions, cos(), sin(), etc. */ + #include "calc.h" /* Contains definition of `symrec'. */ + int yylex (void); + void yyerror (char const *); %@} @end group @group %union @{ - double val; /* For returning numbers. */ - symrec *tptr; /* For returning symbol-table pointers. */ + double val; /* For returning numbers. */ + symrec *tptr; /* For returning symbol-table pointers. */ @} @end group -%token NUM /* Simple double precision number. */ -%token VAR FNCT /* Variable and Function. */ +%token NUM /* Simple double precision number. */ +%token VAR FNCT /* Variable and Function. */ %type exp @group %right '=' %left '-' '+' %left '*' '/' -%left NEG /* Negation--unary minus */ -%right '^' /* Exponentiation */ +%left NEG /* negation--unary minus */ +%right '^' /* exponentiation */ @end group -/* Grammar follows */ -%% +%% /* The grammar follows. */ @end smallexample The above grammar introduces only two new features of the Bison language. @@ -1921,7 +1978,7 @@ exp: NUM @{ $$ = $1; @} | '(' exp ')' @{ $$ = $2; @} ; @end group -/* End of grammar */ +/* End of grammar. */ %% @end smallexample @@ -1940,33 +1997,33 @@ provides for either functions or variables to be placed in the table. @smallexample @group -/* Function type. */ +/* Function type. */ typedef double (*func_t) (double); @end group @group -/* Data type for links in the chain of symbols. 
*/ +/* Data type for links in the chain of symbols. */ struct symrec @{ - char *name; /* name of symbol */ + char *name; /* name of symbol */ int type; /* type of symbol: either VAR or FNCT */ union @{ - double var; /* value of a VAR */ - func_t fnctptr; /* value of a FNCT */ + double var; /* value of a VAR */ + func_t fnctptr; /* value of a FNCT */ @} value; - struct symrec *next; /* link field */ + struct symrec *next; /* link field */ @}; @end group @group typedef struct symrec symrec; -/* The symbol table: a chain of `struct symrec'. */ +/* The symbol table: a chain of `struct symrec'. */ extern symrec *sym_table; -symrec *putsym (const char *, func_t); -symrec *getsym (const char *); +symrec *putsym (char const *, int); +symrec *getsym (char const *); @end group @end smallexample @@ -1978,17 +2035,9 @@ function that initializes the symbol table. Here it is, and #include @group -int -main (void) -@{ - init_table (); - return yyparse (); -@} -@end group - -@group +/* Called by yyparse on error. */ void -yyerror (const char *s) /* Called by yyparse on error. */ +yyerror (char const *s) @{ printf ("%s\n", s); @} @@ -1997,13 +2046,13 @@ yyerror (const char *s) /* Called by yyparse on error. */ @group struct init @{ - char *fname; - double (*fnct)(double); + char const *fname; + double (*fnct) (double); @}; @end group @group -struct init arith_fncts[] = +struct init const arith_fncts[] = @{ "sin", sin, "cos", cos, @@ -2017,7 +2066,7 @@ struct init arith_fncts[] = @group /* The symbol table: a chain of `struct symrec'. */ -symrec *sym_table = (symrec *) 0; +symrec *sym_table; @end group @group @@ -2034,6 +2083,15 @@ init_table (void) @} @} @end group + +@group +int +main (void) +@{ + init_table (); + return yyparse (); +@} +@end group @end smallexample By simply editing the initialization list and adding the necessary include @@ -2048,7 +2106,7 @@ found, a pointer to that symbol is returned; otherwise zero is returned.
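A self-contained sketch of the linked-list symbol table, written so it compiles on its own. It follows the struct symrec layout shown above and passes the symbol type to putsym as an int. The numeric values given to VAR and FNCT are hypothetical placeholders; in mfcalc they come from the token definitions Bison generates. Unlike the manual's version, this sketch also returns a null pointer when malloc fails instead of dereferencing it.

```c
#include <stdlib.h>
#include <string.h>

/* Function type.  */
typedef double (*func_t) (double);

/* Hypothetical token codes; Bison would normally define these.  */
enum { VAR = 258, FNCT = 259 };

/* Data type for links in the chain of symbols.  */
typedef struct symrec
{
  char *name;                  /* name of symbol */
  int type;                    /* type of symbol: either VAR or FNCT */
  union
  {
    double var;                /* value of a VAR */
    func_t fnctptr;            /* value of a FNCT */
  } value;
  struct symrec *next;         /* link field */
} symrec;

/* The symbol table: a chain of `struct symrec'.  */
symrec *sym_table;

/* Push a new symbol onto the front of the chain and return it,
   or return NULL if memory is exhausted.  */
symrec *
putsym (char const *sym_name, int sym_type)
{
  symrec *ptr = malloc (sizeof *ptr);
  if (!ptr)
    return NULL;
  ptr->name = malloc (strlen (sym_name) + 1);
  if (!ptr->name)
    {
      free (ptr);
      return NULL;
    }
  strcpy (ptr->name, sym_name);
  ptr->type = sym_type;
  ptr->value.var = 0;          /* Set value to 0 even if FNCT.  */
  ptr->next = sym_table;
  sym_table = ptr;
  return ptr;
}

/* Return the most recently added symbol named SYM_NAME, or NULL.  */
symrec *
getsym (char const *sym_name)
{
  symrec *ptr;
  for (ptr = sym_table; ptr; ptr = ptr->next)
    if (strcmp (ptr->name, sym_name) == 0)
      return ptr;
  return NULL;
}
```

Because putsym pushes onto the front of the chain, getsym finds the newest entry first, so a later putsym of the same name shadows an earlier one.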
@smallexample symrec * -putsym (char *sym_name, int sym_type) +putsym (char const *sym_name, int sym_type) @{ symrec *ptr; ptr = (symrec *) malloc (sizeof (symrec)); @@ -2062,7 +2120,7 @@ putsym (char *sym_name, int sym_type) @} symrec * -getsym (const char *sym_name) +getsym (char const *sym_name) @{ symrec *ptr; for (ptr = sym_table; ptr != (symrec *) 0; @@ -2220,7 +2278,7 @@ appropriate delimiters: @example %@{ -@var{Prologue} + @var{Prologue} %@} @var{Bison declarations} @@ -2243,7 +2301,7 @@ continues until end of line. * Epilogue:: Syntax and usage of the epilogue. @end menu -@node Prologue, Bison Declarations, , Grammar Outline +@node Prologue @subsection The prologue @cindex declarations section @cindex Prologue @@ -2267,8 +2325,8 @@ can be done with two @var{Prologue} blocks, one before and one after the @smallexample %@{ -#include -#include "ptypes.h" + #include + #include "ptypes.h" %@} %union @{ @@ -2277,8 +2335,8 @@ can be done with two @var{Prologue} blocks, one before and one after the @} %@{ -static void yyprint(FILE *, int, YYSTYPE); -#define YYPRINT(F, N, L) yyprint(F, N, L) + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(F, N, L) print_token_value (F, N, L) %@} @dots{} @@ -2306,7 +2364,7 @@ There must always be at least one grammar rule, and the first @samp{%%} (which precedes the grammar rules) may never be omitted even if it is the first thing in the file. -@node Epilogue, , Grammar Rules, Grammar Outline +@node Epilogue @subsection The epilogue @cindex additional C code section @cindex epilogue @@ -2316,14 +2374,17 @@ The @var{Epilogue} is copied verbatim to the end of the parser file, just as the @var{Prologue} is copied to the beginning. This is the most convenient place to put anything that you want to have in the parser file but which need not come before the definition of @code{yyparse}. For example, the -definitions of @code{yylex} and @code{yyerror} often go here. 
+definitions of @code{yylex} and @code{yyerror} often go here. Because +C requires functions to be declared before being used, you often need +to declare functions like @code{yylex} and @code{yyerror} in the Prologue, +even if you define them in the Epilogue. @xref{Interface, ,Parser C-Language Interface}. If the last section is empty, you may omit the @samp{%%} that separates it from the grammar rules. -The Bison parser itself contains many static variables whose names start -with @samp{yy} and many macros whose names start with @samp{YY}. It is a +The Bison parser itself contains many macros and identifiers whose +names start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using any such names (except those documented in this manual) in the epilogue of the grammar file. @@ -3646,7 +3707,7 @@ Generate the code processing the locations (@pxref{Action Features, ,Special Features for Use in Actions}). This mode is enabled as soon as the grammar uses the special @samp{@@@var{n}} tokens, but if your grammar does not use it, using @samp{%locations} allows for more -accurate parse error messages. +accurate syntax error messages. @end deffn @deffn {Directive} %name-prefix="@var{prefix}" @@ -3822,19 +3883,20 @@ If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. To do so, use the declaration @code{%parse-param}: -@deffn {Directive} %parse-param @var{argument-declaration} @var{argument-name} +@deffn {Directive} %parse-param @{@var{argument-declaration}@} @findex %parse-param -Declare that @code{argument-name} is an additional @code{yyparse} -argument. This argument is also passed to @code{yyerror}. The -@var{argument-declaration} is used when declaring functions or -prototypes. +Declare that an argument declared by @var{argument-declaration} is an +additional @code{yyparse} argument. This argument is also passed to +@code{yyerror}.
The @var{argument-declaration} is used when declaring +functions or prototypes. The last identifier in +@var{argument-declaration} must be the argument name. @end deffn Here's an example. Write this in the parser: @example -%parse-param "int *nastiness" "nastiness" -%parse-param "int *randomness" "randomness" +%parse-param @{int *nastiness@} +%parse-param @{int *randomness@} @end example @noindent @@ -4065,18 +4127,18 @@ If you wish to pass the additional parameter data to @code{yylex}, use @code{%lex-param} just like @code{%parse-param} (@pxref{Parser Function}). -@deffn {Directive} lex-param @var{argument-declaration} @var{argument-name} +@deffn {Directive} %lex-param @{@var{argument-declaration}@} @findex %lex-param -Declare that @code{argument-name} is an additional @code{yylex} -argument. +Declare that @var{argument-declaration} is an additional @code{yylex} +argument declaration. @end deffn For instance: @example -%parse-param "int *nastiness" "nastiness" -%lex-param "int *nastiness" "nastiness" -%parse-param "int *randomness" "randomness" +%parse-param @{int *nastiness@} +%lex-param @{int *nastiness@} +%parse-param @{int *randomness@} @end example @noindent @@ -4109,7 +4171,7 @@ int yyparse (int *nastiness, int *randomness); @cindex parse error @cindex syntax error -The Bison parser detects a @dfn{parse error} or @dfn{syntax error} +The Bison parser detects a @dfn{syntax error} or @dfn{parse error} whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use @@ -4118,14 +4180,14 @@ in Actions}). The Bison parser expects to report the error by calling an error reporting function named @code{yyerror}, which you must supply. It is called by @code{yyparse} whenever a syntax error is found, and it -receives one argument.
For a syntax error, the string is normally +@w{@code{"syntax error"}}. @findex %error-verbose If you invoke the directive @code{%error-verbose} in the Bison declarations section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose and specific error message -string instead of just plain @w{@code{"parse error"}}. +string instead of just plain @w{@code{"syntax error"}}. The parser can detect one other kind of error: stack overflow. This happens when the input contains constructions that are very deeply @@ -4140,7 +4202,7 @@ The following definition suffices in simple programs: @example @group void -yyerror (const char *s) +yyerror (char const *s) @{ @end group @group @@ -4161,15 +4223,15 @@ parsers, but not for the Yacc parser, for historical reasons. I.e., if @code{yyerror} are: @example -void yyerror (const char *msg); /* Yacc parsers. */ -void yyerror (YYLTYPE *locp, const char *msg); /* GLR parsers. */ +void yyerror (char const *msg); /* Yacc parsers. */ +void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ @end example -If @samp{%parse-param "int *nastiness" "nastiness"} is used, then: +If @samp{%parse-param @{int *nastiness@}} is used, then: @example -void yyerror (int *randomness, const char *msg); /* Yacc parsers. */ -void yyerror (int *randomness, const char *msg); /* GLR parsers. */ +void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */ +void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ @end example Finally, GLR and Yacc parsers share the same @code{yyerror} calling @@ -4182,10 +4244,10 @@ convention of @code{yylex} @emph{and} the calling convention of %locations /* Pure yylex. */ %pure-parser -%lex-param "int *nastiness" "nastiness" +%lex-param @{int *nastiness@} /* Pure yyparse.
*/ -%parse-param "int *nastiness" "nastiness" -%parse-param "int *randomness" "randomness" +%parse-param @{int *nastiness@} +%parse-param @{int *randomness@} @end example @noindent @@ -4196,14 +4258,20 @@ int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); int yyparse (int *nastiness, int *randomness); void yyerror (YYLTYPE *locp, int *nastiness, int *randomness, - const char *msg); + char const *msg); @end example @noindent -Please, note that the prototypes are only indications of how the code -produced by Bison will use @code{yyerror}; you still have freedom on the -exit value, and even on making @code{yyerror} a variadic function. It -is precisely to enable this that the message is always passed last. +The prototypes are only indications of how the code produced by Bison +uses @code{yyerror}. Bison-generated code always ignores the returned +value, so @code{yyerror} can return any type, including @code{void}. +Also, @code{yyerror} can be a variadic function; that is why the +message is always passed last. + +Traditionally @code{yyerror} returns an @code{int} that is always +ignored, but this is purely for historical reasons, and @code{void} is +preferable since it more accurately describes the return type for +@code{yyerror}. @vindex yynerrs The variable @code{yynerrs} contains the number of syntax errors @@ -5131,7 +5199,7 @@ provided which addresses this issue. @cindex error recovery @cindex recovery from errors -It is not usually acceptable to have a program terminate on a parse +It is not usually acceptable to have a program terminate on a syntax error. For example, a compiler should recover sufficiently to parse the rest of the input file and check it for errors; a calculator should accept another expression. @@ -5234,7 +5302,7 @@ this is unacceptable, then the macro @code{yyclearin} may be used to clear this token. Write the statement @samp{yyclearin;} in the error rule's action. 
-For example, suppose that on a parse error, an error handling routine is +For example, suppose that on a syntax error, an error handling routine is called that advances the input stream to some point where parsing should once again commence. The next symbol returned by the lexical scanner is probably correct. The previous look-ahead token ought to be discarded @@ -5357,7 +5425,9 @@ as an identifier if it appears in that context. Here is how you can do it: @example @group %@{ -int hexflag; + int hexflag; + int yylex (void); + void yyerror (char const *); %@} %% @dots{} @@ -5621,7 +5691,7 @@ after having reduced a rule that produced an @code{exp}, the control flow jumps to state 2. If there is no such transition on a nonterminal symbol, and the lookahead is a @code{NUM}, then this token is shifted on the parse stack, and the control flow jumps to state 1. Any other -lookahead triggers a parse error.'' +lookahead triggers a syntax error.'' @cindex core, item set @cindex item set core @@ -5689,7 +5759,7 @@ because of the item @samp{exp -> exp . '+' exp}, if the lookahead if @samp{+}, it will be shifted on the parse stack, and the automaton control will jump to state 4, corresponding to the item @samp{exp -> exp '+' . exp}. Since there is no default action, any other token than -those listed above will trigger a parse error. +those listed above will trigger a syntax error. 
The state 3 is named the @dfn{final state}, or the @dfn{accepting state}: @@ -5955,15 +6025,20 @@ Here is an example of @code{YYPRINT} suitable for the multi-function calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}): @smallexample -#define YYPRINT(file, type, value) yyprint (file, type, value) +%@{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(file, type, value) print_token_value (file, type, value) +%@} + +@dots{} %% @dots{} %% @dots{} static void -yyprint (FILE *file, int type, YYSTYPE value) +print_token_value (FILE *file, int type, YYSTYPE value) @{ if (type == VAR) - fprintf (file, " %s", value.tptr->name); + fprintf (file, "%s", value.tptr->name); else if (type == NUM) - fprintf (file, " %d", value.val); + fprintf (file, "%g", value.val); @} @end smallexample @@ -6011,6 +6086,7 @@ will produce @file{output.c++} and @file{outfile.h++}. * Bison Options:: All the options described in detail, in alphabetical order by short options. * Option Cross Key:: Alphabetical list of long options. +* Yacc Library:: Yacc-compatible @code{yyerror} and @code{main}. @end menu @node Bison Options @@ -6202,6 +6278,32 @@ the corresponding short option. @end example @end ifinfo +@node Yacc Library +@section Yacc Library + +The Yacc library contains default implementations of the +@code{yyerror} and @code{main} functions. These default +implementations are normally not useful, but @acronym{POSIX} requires +them. To use the Yacc library, link your program with the +@option{-ly} option. Note that Bison's implementation of the Yacc +library is distributed under the terms of the @acronym{GNU} General +Public License (@pxref{Copying}). + +If you use the Yacc library's @code{yyerror} function, you should +declare @code{yyerror} as follows: + +@example +int yyerror (char const *); +@end example + +Bison ignores the @code{int} value returned by this @code{yyerror}.
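A sketch of a user-supplied yyerror that keeps the declaration the Yacc library expects, paired with a hypothetical stand-in for a parser that has the int yyparse (void) signature. This is not the Yacc library's source; it only illustrates that the int result can be any value, because the calling parser discards it.

```c
#include <stdio.h>

/* Declared the way the Yacc library's yyerror is declared.  */
int yyerror (char const *);

/* Hypothetical stand-in for a Bison-generated parser, with the
   signature the Yacc library's main expects.  Like generated code,
   it discards the value returned by yyerror.  */
int
yyparse (void)
{
  yyerror ("syntax error");
  return 1;                     /* nonzero: parsing failed */
}

/* A compatible definition you could supply instead of linking the
   library's version; yyparse never uses the returned value.  */
int
yyerror (char const *msg)
{
  fprintf (stderr, "%s\n", msg);
  return 0;
}
```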
+If you use the Yacc library's @code{main} function, your +@code{yyparse} function should have the following type signature: + +@example +int yyparse (void); +@end example + @c ================================================= Invoking Bison @node FAQ @@ -6275,7 +6377,7 @@ The predefined token onto which all undefined values returned by A token name reserved for error recovery. This token may be used in grammar rules so as to allow the Bison parser to recognize an error in the grammar without halting the process. In effect, a sentence -containing an error may be recognized as valid. On a parse error, the +containing an error may be recognized as valid. On a syntax error, the token @code{error} becomes the current look-ahead token. Actions corresponding to @code{error} are then executed, and the look-ahead token is reset to the token that originally caused the violation. @@ -6313,8 +6415,8 @@ Macro to pretend that a syntax error has just been detected: call @end deffn @deffn {Macro} YYERROR_VERBOSE -An obsolete macro that you define with @code{#define} in the Bison -declarations section to request verbose, specific error message strings +An obsolete macro that you define with @code{#define} in the prologue +to request verbose, specific error message strings when @code{yyerror} is called. It doesn't matter what definition you use for @code{YYERROR_VERBOSE}, just whether you define it. Using @code{%error-verbose} is preferred. @@ -6390,13 +6492,12 @@ symbols and parser action. @xref{Tracing, ,Tracing Your Parser}. @deffn {Macro} yyerrok Macro to cause parser to recover immediately to its normal mode -after a parse error. @xref{Error Recovery}. +after a syntax error. @xref{Error Recovery}. @end deffn @deffn {Function} yyerror -User-supplied function to be called by @code{yyparse} on error. The -function receives one argument, a pointer to a character string -containing an error message. 
@xref{Error Reporting, ,The Error +User-supplied function to be called by @code{yyparse} on error. +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @@ -6423,7 +6524,7 @@ variable within @code{yyparse}, and its address is passed to @end deffn @deffn {Variable} yynerrs -Global variable which Bison increments each time there is a parse error. +Global variable which Bison increments each time there is a syntax error. (In a pure parser, it is a local variable within @code{yyparse}.) @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @@ -6473,7 +6574,7 @@ Bison declaration to assign left associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn -@deffn {Directive} %lex-param "@var{argument-declaration}" "@var{argument-name}" +@deffn {Directive} %lex-param @{@var{argument-declaration}@} Bison declaration to specifying an additional parameter that @code{yylex} should accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}. @@ -6505,7 +6606,7 @@ Bison declaration to set the name of the parser file. @xref{Decl Summary}. @end deffn -@deffn {Directive} %parse-param "@var{argument-declaration}" "@var{argument-name}" +@deffn {Directive} %parse-param @{@var{argument-declaration}@} Bison declaration to specifying an additional parameter that @code{yyparse} should accept. @xref{Parser Function,, The Parser Function @code{yyparse}}. @@ -6690,10 +6791,6 @@ A grammar symbol standing for a grammatical construct that can be expressed through rules in terms of smaller constructs; in other words, a construct that is not a token. @xref{Symbols}. -@item Parse error -An error encountered during parsing of an input stream due to invalid -syntax. @xref{Error Recovery}. 
- @item Parser A function that recognizes valid sentences of a language by analyzing the syntax structure of a set of tokens passed to it from a lexical @@ -6746,6 +6843,10 @@ A data structure where symbol names and associated data are stored during parsing to allow for recognition and use of existing information in repeated uses of a symbol. @xref{Multi-function Calc}. +@item Syntax error +An error encountered during parsing of an input stream due to invalid +syntax. @xref{Error Recovery}. + @item Token A basic, grammatically indivisible unit of a language. The symbol that describes a token in the grammar is a terminal symbol.
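The new "Yacc Library" node in this patch states two contracts. A program using the library's @code{yyerror} should declare it as @code{int yyerror (char const *)}. A program using the library's @code{main} needs @code{int yyparse (void)}. The C sketch below shows a translation unit written against those declarations. It is illustrative only and makes several assumptions:

- The hand-written @code{yyparse} is a hypothetical stand-in for Bison output. It accepts @code{DIGIT ('+' DIGIT)*} and follows the usual 0-on-success, 1-on-syntax-error convention.
- @code{demo_input} and @code{expect_digit} are invented for this example.
- The @code{yyerror} body is a guess at typical default behavior. It is not Bison's actual library source.

```c
#include <ctype.h>
#include <stdio.h>

/* Declared in the form the Yacc Library section recommends. */
int yyerror (char const *);
int yyparse (void);

/* Hypothetical input for this sketch; a Bison-generated parser would
   obtain its tokens from yylex instead. */
static char const *demo_input = "1+2+3";

/* Stand-in for the library's yyerror: print the message to stderr.
   Bison ignores the int this function returns. */
int
yyerror (char const *msg)
{
  fprintf (stderr, "%s\n", msg);
  return 0;
}

/* Consume one digit at *P.  On failure, report a syntax error through
   yyerror.  Returns 1 if a digit was consumed, 0 otherwise. */
static int
expect_digit (char const **p)
{
  if (!isdigit ((unsigned char) **p))
    {
      yyerror ("syntax error");
      return 0;
    }
  ++*p;
  return 1;
}

/* Hand-written stand-in for a Bison-generated parser, accepting
   DIGIT ('+' DIGIT)*.  Returns 0 on success, 1 on a syntax error. */
int
yyparse (void)
{
  char const *p = demo_input;
  if (!expect_digit (&p))
    return 1;
  while (*p == '+')
    {
      ++p;
      if (!expect_digit (&p))
        return 1;
    }
  if (*p != '\0')
    {
      yyerror ("syntax error");
      return 1;
    }
  return 0;
}
```

On a system where the Yacc library is installed, building this file with something like @code{cc demo.c -ly} would let the library supply @code{main}, which calls @code{yyparse}. Because this file defines its own @code{yyerror}, the linker would not pull in the library's version. Deleting that definition would make the default one get linked instead.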