X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/23ae3b48e481b7ddc0e4d6147a864091be110906..847bf1f53803bb6b605d19bcde720de43ae57d48:/doc/bison.texinfo?ds=inline diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 61f7b12f..f3ec3932 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -1,20 +1,19 @@ \input texinfo @c -*-texinfo-*- @comment %**start of header @setfilename bison.info -@settitle Bison 1.25 +@include version.texi +@settitle Bison @value{VERSION} @setchapternewpage odd @iftex @finalout @end iftex -@c SMALL BOOK version +@c SMALL BOOK version @c This edition has been formatted so that you can format and print it in -@c the smallbook format. +@c the smallbook format. @c @smallbook -@c next time, consider using @set for edition number, etc... - @c Set following if you have the new `shorttitlepage' command @c @clear shorttitlepage-enabled @c @set shorttitlepage-enabled @@ -36,10 +35,19 @@ @end ifinfo @comment %**end of header +@ifinfo +@format +START-INFO-DIR-ENTRY +* bison: (bison). GNU Project parser generator (yacc replacement). +END-INFO-DIR-ENTRY +@end format +@end ifinfo + @ifinfo This file documents the Bison parser generator. -Copyright (C) 1988, 89, 90, 91, 92, 93, 1995 Free Software Foundation, Inc. +Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, 2000 +Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice @@ -73,21 +81,22 @@ instead of in the original English. @titlepage @title Bison @subtitle The YACC-compatible Parser Generator -@subtitle November 1995, Bison Version 1.25 +@subtitle @value{UPDATED}, Bison Version @value{VERSION} @author by Charles Donnelly and Richard Stallman @page @vskip 0pt plus 1filll -Copyright @copyright{} 1988, 89, 90, 91, 92, 93, 1995 Free Software -Foundation +Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, +1999, 2000 +Free Software Foundation, Inc. 
@sp 2 Published by the Free Software Foundation @* 59 Temple Place, Suite 330 @* Boston, MA 02111-1307 USA @* -Printed copies are available for $15 each.@* -ISBN 1-882114-45-0 +Printed copies are available from the Free Software Foundation.@* +ISBN 1-882114-44-2 Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice @@ -116,17 +125,18 @@ instead of in the original English. @sp 2 Cover art by Etienne Suvasa. @end titlepage -@page + +@contents @node Top, Introduction, (dir), (dir) @ifinfo -This manual documents version 1.25 of Bison. +This manual documents version @value{VERSION} of Bison. @end ifinfo @menu -* Introduction:: -* Conditions:: +* Introduction:: +* Conditions:: * Copying:: The GNU General Public License says how you can copy and share Bison @@ -186,9 +196,9 @@ Reverse Polish Notation Calculator Grammar Rules for @code{rpcalc} -* Rpcalc Input:: -* Rpcalc Line:: -* Rpcalc Expr:: +* Rpcalc Input:: +* Rpcalc Line:: +* Rpcalc Expr:: Multi-Function Calculator: @code{mfcalc} @@ -237,7 +247,7 @@ Bison Declarations Parser C-Language Interface * Parser Function:: How to call @code{yyparse} and what it returns. -* Lexical:: You must supply a function @code{yylex} +* Lexical:: You must supply a function @code{yylex} which reads tokens. * Error Reporting:: You must supply a function @code{yyerror}. * Action Features:: Special features for use in actions. @@ -253,7 +263,7 @@ The Lexical Analyzer Function @code{yylex} * Pure Calling:: How the calling convention differs in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). -The Bison Parser Algorithm +The Bison Parser Algorithm * Look-Ahead:: Parser looks one token ahead when deciding what to do. * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. 
@@ -280,7 +290,7 @@ Handling Context Dependencies Invoking Bison -* Bison Options:: All the options described in detail, +* Bison Options:: All the options described in detail, in alphabetical order by short options. * Option Cross Key:: Alphabetical list of long options. * VMS Invocation:: Bison command syntax on VMS. @@ -308,20 +318,20 @@ chapters follow which describe specific aspects of Bison in detail. Bison was written primarily by Robert Corbett; Richard Stallman made it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added -multicharacter string literals and other features. +multi-character string literals and other features. -This edition corresponds to version 1.25 of Bison. +This edition corresponds to version @value{VERSION} of Bison. @node Conditions, Copying, Introduction, Top @unnumbered Conditions for Using Bison As of Bison version 1.24, we have changed the distribution terms for -@code{yyparse} to permit using Bison's output in non-free programs. +@code{yyparse} to permit using Bison's output in nonfree programs. Formerly, Bison parsers could be used only in programs that were free software. The other GNU programming tools, such as the GNU C compiler, have never -had such a requirement. They could always be used for non-free +had such a requirement. They could always be used for nonfree software. The reason Bison was different was not due to a special policy decision; it resulted from applying the usual General Public License to all of the Bison source code. @@ -346,7 +356,7 @@ using the other GNU tools. @display Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc. -675 Mass Ave, Cambridge, MA 02139, USA +59 Temple Place - Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. @@ -692,7 +702,8 @@ GNU General Public License for more details. 
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software -Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. +Foundation, Inc., 59 Temple Place - Suite 330, +Boston, MA 02111-1307, USA. @end smallexample Also add information on how to contact you by electronic and paper mail. @@ -702,7 +713,7 @@ when it starts in an interactive mode: @smallexample Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author} -Gnomovision comes with ABSOLUTELY NO WARRANTY; for details +Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. @@ -747,6 +758,7 @@ use Bison or Yacc, we suggest you start by reading this chapter carefully. a semantic value (the value of an integer, the name of an identifier, etc.). * Semantic Actions:: Each rule can have an action containing C code. +* Locations Overview:: Tracking Locations. * Bison Parser:: What are Bison's input and output, how is the output used? * Stages:: Stages in writing and running Bison grammars. @@ -949,7 +961,7 @@ semantic value that is a number. In a compiler for a programming language, an expression typically has a semantic value that is a tree structure describing the meaning of the expression. -@node Semantic Actions, Bison Parser, Semantic Values, Concepts +@node Semantic Actions, Locations Overview, Semantic Values, Concepts @section Semantic Actions @cindex semantic actions @cindex actions, semantic @@ -959,7 +971,7 @@ also produce some output based on the input. In a Bison grammar, a grammar rule can have an @dfn{action} made up of C statements. Each time the parser recognizes a match for that rule, the action is executed. @xref{Actions}. - + Most of the time, the purpose of an action is to compute the semantic value of the whole construct from the semantic values of its parts. 
For example, suppose we have a rule which says an expression can be the sum of two @@ -980,7 +992,36 @@ expr: expr '+' expr @{ $$ = $1 + $3; @} The action says how to produce the semantic value of the sum expression from the values of the two subexpressions. -@node Bison Parser, Stages, Semantic Actions, Concepts +@node Locations Overview, Bison Parser, Semantic Actions, Concepts +@section Locations +@cindex location +@cindex textual position +@cindex position, textual + +Many applications, like interpreters or compilers, have to produce verbose +and useful error messages. To achieve this, one must be able to keep track of +the @dfn{textual position}, or @dfn{location}, of each syntactic construct. +Bison provides a mechanism for handling these locations. + +Each token has a semantic value. In a similar fashion, each token has an +associated location, but the type of locations is the same for all tokens and +groupings. Moreover, the output parser is equipped with a default data +structure for storing locations (@pxref{Locations}, for more details). + +Like semantic values, locations can be reached in actions using a dedicated +set of constructs. In the example above, the location of the whole grouping +is @code{@@$}, while the locations of the subexpressions are @code{@@1} and +@code{@@3}. + +When a rule is matched, a default action is used to compute the semantic value +of its left hand side (@pxref{Actions}). In the same way, another default +action is used for locations. However, the action for locations is general +enough for most cases, meaning there is usually no need to describe for each +rule how @code{@@$} should be formed. When building a new location for a given +grouping, the default behavior of the output parser is to take the beginning +of the first symbol, and the end of the last symbol. 
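As a rough illustration of that default behavior, the following C sketch builds the location of a grouping from the locations of its first and last components. The type name @code{loc_t} and the function @code{span_locations} are hypothetical names chosen for this example; only the four member names follow the default location structure described in this manual, and the code is not taken from the Bison parser itself.

```c
#include <assert.h>

/* Hypothetical type name; the four members mirror the default
   location structure described in the manual.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} loc_t;

/* Hypothetical helper: compute the location of a grouping whose
   first component is FIRST and whose last component is LAST.  */
static loc_t
span_locations (loc_t first, loc_t last)
{
  loc_t result = first;         /* beginning of the first symbol */

  /* Sanity check: the last symbol must not end before the first
     symbol begins.  */
  assert (first.first_line < last.last_line
          || (first.first_line == last.last_line
              && first.first_column <= last.last_column));

  result.last_line = last.last_line;      /* end of the last symbol */
  result.last_column = last.last_column;
  return result;
}
```

For a rule such as @code{expr '+' expr}, passing the location of the first @code{expr} and the location of the second @code{expr} yields a location that starts where the left operand starts and ends where the right operand ends.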
+ +@node Bison Parser, Stages, Locations Overview, Concepts @section Bison Output: the Parser File @cindex Bison parser @cindex Bison utility @@ -1250,9 +1291,9 @@ main job of most actions. The semantic values of the components of the rule are referred to as @code{$1}, @code{$2}, and so on. @menu -* Rpcalc Input:: -* Rpcalc Line:: -* Rpcalc Expr:: +* Rpcalc Input:: +* Rpcalc Line:: +* Rpcalc Expr:: @end menu @node Rpcalc Input, Rpcalc Line, , Rpcalc Rules @@ -1411,7 +1452,7 @@ Here is the code for the lexical analyzer: @example @group -/* Lexical analyzer returns a double floating point +/* Lexical analyzer returns a double floating point number on the stack and the token NUM, or the ASCII character read if not a number. Skips all blanks and tabs, returns 0 for EOF. */ @@ -1420,17 +1461,18 @@ Here is the code for the lexical analyzer: @end group @group -yylex () +int +yylex (void) @{ int c; /* skip white space */ - while ((c = getchar ()) == ' ' || c == '\t') + while ((c = getchar ()) == ' ' || c == '\t') ; @end group @group /* process numbers */ - if (c == '.' || isdigit (c)) + if (c == '.' || isdigit (c)) @{ ungetc (c, stdin); scanf ("%lf", &yylval); @@ -1439,10 +1481,10 @@ yylex () @end group @group /* return end-of-file */ - if (c == EOF) + if (c == EOF) return 0; /* return single chars */ - return c; + return c; @} @end group @end example @@ -1458,9 +1500,10 @@ kept to the bare minimum. The only requirement is that it call @example @group -main () +int +main (void) @{ - yyparse (); + return yyparse (); @} @end group @end example @@ -1470,16 +1513,17 @@ main () @cindex error reporting routine When @code{yyparse} detects a syntax error, it calls the error reporting -function @code{yyerror} to print an error message (usually but not always -@code{"parse error"}). 
It is up to the programmer to supply @code{yyerror} -(@pxref{Interface, ,Parser C-Language Interface}), so here is the definition we will use: +function @code{yyerror} to print an error message (usually but not +always @code{"parse error"}). It is up to the programmer to supply +@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so +here is the definition we will use: @example @group #include <stdio.h> -yyerror (s) /* Called by yyparse on error */ - char *s; +void +yyerror (const char *s) /* Called by yyparse on error */ @{ printf ("%s\n", s); @} @@ -1491,17 +1535,18 @@ and continue parsing if the grammar contains a suitable error rule (@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We have not written any error rules in this example, so any invalid input will cause the calculator program to exit. This is not clean behavior for a -real calculator, but it is adequate in the first example. +real calculator, but it is adequate for the first example. @node Rpcalc Gen, Rpcalc Compile, Rpcalc Error, RPN Calc @subsection Running Bison to Make the Parser @cindex running Bison (introduction) -Before running Bison to produce a parser, we need to decide how to arrange -all the source code in one or more source files. For such a simple example, -the easiest thing is to put everything in one file. The definitions of -@code{yylex}, @code{yyerror} and @code{main} go at the end, in the -``additional C code'' section of the file (@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}). +Before running Bison to produce a parser, we need to decide how to +arrange all the source code in one or more source files. For such a +simple example, the easiest thing is to put everything in one file. The +definitions of @code{yylex}, @code{yyerror} and @code{main} go at the +end, in the ``additional C code'' section of the file (@pxref{Grammar +Layout, ,The Overall Layout of a Bison Grammar}). 
For a large project, you would probably have several source files, and use @code{make} to arrange to recompile them. @@ -1615,8 +1660,8 @@ exp: NUM @{ $$ = $1; @} @end example @noindent -The functions @code{yylex}, @code{yyerror} and @code{main} can be the same -as before. +The functions @code{yylex}, @code{yyerror} and @code{main} can be the +same as before. There are two important new features shown in this code. @@ -1658,10 +1703,10 @@ Here is a sample run of @file{calc.y}: Up to this point, this manual has not addressed the issue of @dfn{error recovery}---how to continue parsing after the parser detects a syntax -error. All we have handled is error reporting with @code{yyerror}. Recall -that by default @code{yyparse} returns after calling @code{yyerror}. This -means that an erroneous input line causes the calculator program to exit. -Now we show how to rectify this deficiency. +error. All we have handled is error reporting with @code{yyerror}. +Recall that by default @code{yyparse} returns after calling +@code{yyerror}. This means that an erroneous input line causes the +calculator program to exit. Now we show how to rectify this deficiency. The Bison language itself includes the reserved word @code{error}, which may be included in the grammar rules. In the example below it has @@ -1676,14 +1721,15 @@ line: '\n' @end group @end example -This addition to the grammar allows for simple error recovery in the event -of a parse error. If an expression that cannot be evaluated is read, the -error will be recognized by the third rule for @code{line}, and parsing -will continue. (The @code{yyerror} function is still called upon to print -its message as well.) The action executes the statement @code{yyerrok}, a -macro defined automatically by Bison; its meaning is that error recovery is -complete (@pxref{Error Recovery}). 
Note the difference between -@code{yyerrok} and @code{yyerror}; neither one is a misprint.@refill +This addition to the grammar allows for simple error recovery in the +event of a parse error. If an expression that cannot be evaluated is +read, the error will be recognized by the third rule for @code{line}, +and parsing will continue. (The @code{yyerror} function is still called +upon to print its message as well.) The action executes the statement +@code{yyerrok}, a macro defined automatically by Bison; its meaning is +that error recovery is complete (@pxref{Error Recovery}). Note the +difference between @code{yyerrok} and @code{yyerror}; neither one is a +misprint.@refill This form of error recovery deals with syntax errors. There are other kinds of errors; for example, division by zero, which raises an exception @@ -1707,7 +1753,7 @@ as @code{sin}, @code{cos}, etc. It is easy to add new operators to the infix calculator as long as they are only single-character literals. The lexical analyzer @code{yylex} passes -back all non-number characters as tokens, so new grammar rules suffice for +back all nonnumber characters as tokens, so new grammar rules suffice for adding a new operator. But we want something more flexible: built-in functions whose syntax has this form: @@ -1845,14 +1891,18 @@ provides for either functions or variables to be placed in the table. @smallexample @group +/* Function type. */ +typedef double (*func_t) (double); + /* Data type for links in the chain of symbols. */ struct symrec @{ char *name; /* name of symbol */ int type; /* type of symbol: either VAR or FNCT */ - union @{ - double var; /* value of a VAR */ - double (*fnctptr)(); /* value of a FNCT */ + union + @{ + double var; /* value of a VAR */ + func_t fnctptr; /* value of a FNCT */ @} value; struct symrec *next; /* link field */ @}; @@ -1864,8 +1914,8 @@ typedef struct symrec symrec; /* The symbol table: a chain of `struct symrec'. 
*/ extern symrec *sym_table; -symrec *putsym (); -symrec *getsym (); +symrec *putsym (const char *, int); +symrec *getsym (const char *); @end group @end smallexample @@ -1877,16 +1927,17 @@ function that initializes the symbol table. Here it is, and @group #include <stdio.h> -main () +int +main (void) @{ init_table (); - yyparse (); + return yyparse (); @} @end group @group -yyerror (s) /* Called by yyparse on error */ - char *s; +void +yyerror (const char *s) /* Called by yyparse on error */ @{ printf ("%s\n", s); @} @@ -1894,28 +1945,30 @@ yyerror (s) /* Called by yyparse on error */ struct init @{ char *fname; - double (*fnct)(); + double (*fnct)(double); @}; @end group @group -struct init arith_fncts[] - = @{ - "sin", sin, - "cos", cos, - "atan", atan, - "ln", log, - "exp", exp, - "sqrt", sqrt, - 0, 0 - @}; +struct init arith_fncts[] = +@{ + "sin", sin, + "cos", cos, + "atan", atan, + "ln", log, + "exp", exp, + "sqrt", sqrt, + 0, 0 +@}; /* The symbol table: a chain of `struct symrec'. */ -symrec *sym_table = (symrec *)0; +symrec *sym_table = (symrec *) 0; @end group @group -init_table () /* puts arithmetic functions in table. */ +/* Put arithmetic functions in table. */ +void +init_table (void) @{ int i; symrec *ptr; @@ -1940,9 +1993,7 @@ found, a pointer to that symbol is returned; otherwise zero is returned. @smallexample symrec * -putsym (sym_name,sym_type) - char *sym_name; - int sym_type; +putsym (const char *sym_name, int sym_type) @{ symrec *ptr; ptr = (symrec *) malloc (sizeof (symrec)); @@ -1956,8 +2007,7 @@ putsym (sym_name,sym_type) @} symrec * -getsym (sym_name) - char *sym_name; +getsym (const char *sym_name) @{ symrec *ptr; for (ptr = sym_table; ptr != (symrec *) 0; @@ -1970,7 +2020,7 @@ getsym (sym_name) The function @code{yylex} must now recognize variables, numeric values, and the single-character arithmetic operators. 
Strings of alphanumeric -characters with a leading nondigit are recognized as either variables or +characters with a leading non-digit are recognized as either variables or functions depending on what the symbol table says about them. The string is passed to @code{getsym} for look up in the symbol table. If @@ -1986,7 +2036,9 @@ operators in @code{yylex}. @smallexample @group #include <ctype.h> -yylex () + +int +yylex (void) @{ int c; @@ -2097,6 +2149,7 @@ The Bison grammar input file conventionally has a name ending in @samp{.y}. * Rules:: How to write grammar rules. * Recursion:: Writing recursive rules. * Semantics:: Semantic values and actions. +* Locations:: Locations and actions. * Declarations:: All kinds of Bison declarations are described here. * Multiple Parsers:: Putting more than one Bison parser in one program. @end menu @@ -2170,12 +2223,13 @@ if it is the first thing in the file. @cindex additional C code section @cindex C code, section for additional -The @var{additional C code} section is copied verbatim to the end of -the parser file, just as the @var{C declarations} section is copied to -the beginning. This is the most convenient place to put anything -that you want to have in the parser file but which need not come before -the definition of @code{yyparse}. For example, the definitions of -@code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser C-Language Interface}. +The @var{additional C code} section is copied verbatim to the end of the +parser file, just as the @var{C declarations} section is copied to the +beginning. This is the most convenient place to put anything that you +want to have in the parser file but which need not come before the +definition of @code{yyparse}. For example, the definitions of +@code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser +C-Language Interface}. If the last section is empty, you may omit the @samp{%%} that separates it from the grammar rules. 
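The symbol-table lookup that the @code{mfcalc} lexer relies on can be sketched in isolation. The fragment below is a simplified, self-contained variant of the @code{putsym} and @code{getsym} pair, not the manual's exact code. Two things in it are assumptions made for this sketch: the token codes @code{VAR} and @code{FNCT} are given arbitrary values (in a real parser Bison assigns them), and allocation failures are reported by returning a null pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed token codes for this sketch only; in a real parser
   Bison defines VAR and FNCT.  */
enum { VAR = 258, FNCT = 259 };

typedef double (*func_t) (double);

/* One link in the chain of symbols.  */
typedef struct symrec
{
  char *name;                   /* name of symbol */
  int type;                     /* either VAR or FNCT */
  union
  {
    double var;                 /* value of a VAR */
    func_t fnctptr;             /* value of a FNCT */
  } value;
  struct symrec *next;          /* link field */
} symrec;

/* The symbol table: a chain of symrec, newest entry first.  */
static symrec *sym_table = NULL;

/* Add a symbol at the head of the chain and return it,
   or NULL if memory cannot be allocated.  */
symrec *
putsym (const char *sym_name, int sym_type)
{
  symrec *ptr;

  assert (sym_name != NULL);
  ptr = malloc (sizeof (symrec));
  if (ptr == NULL)
    return NULL;
  ptr->name = malloc (strlen (sym_name) + 1);
  if (ptr->name == NULL)
    {
      free (ptr);
      return NULL;
    }
  strcpy (ptr->name, sym_name);
  ptr->type = sym_type;
  ptr->value.var = 0;           /* a new VAR starts at zero */
  ptr->next = sym_table;
  sym_table = ptr;
  return ptr;
}

/* Return the entry named SYM_NAME, or NULL if there is none.  */
symrec *
getsym (const char *sym_name)
{
  symrec *ptr;

  for (ptr = sym_table; ptr != NULL; ptr = ptr->next)
    if (strcmp (ptr->name, sym_name) == 0)
      return ptr;
  return NULL;
}
```

A lexer built on this would call @code{getsym} on each identifier it reads, fall back to @code{putsym} with type @code{VAR} when the lookup fails, and return the @code{type} field of the entry as the token code.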
@@ -2246,11 +2300,11 @@ for @code{yylex}}). @item @cindex string token @cindex literal string token -@cindex multi-character literal +@cindex multicharacter literal A @dfn{literal string token} is written like a C string constant; for example, @code{"<="} is a literal string token. A literal string token doesn't need to be declared unless you need to specify its semantic -value data type (@pxref{Value Type}), associativity, precedence +value data type (@pxref{Value Type}), associativity, or precedence (@pxref{Precedence}). You can associate the literal string token with a symbolic name as an @@ -2264,7 +2318,7 @@ retrieve the token number for the literal string token from the By convention, a literal string token is used only to represent a token that consists of that particular string. Thus, you should use the token type @code{"<="} to represent the string @samp{<=} as a token. Bison -does not enforces this convention, but if you depart from it, people who +does not enforce this convention, but if you depart from it, people who read your program will be confused. All the escape sequences used in string literals in C can be used in @@ -2284,7 +2338,7 @@ The numeric code for a character token type is simply the ASCII code for the character, so @code{yylex} can use the identical character constant to generate the requisite code. Each named token type becomes a C macro in the parser file, so @code{yylex} can use the name to stand for the code. -(This is why periods don't make sense in terminal symbols.) +(This is why periods don't make sense in terminal symbols.) @xref{Calling Convention, ,Calling Convention for @code{yylex}}. 
If @code{yylex} is defined in a separate file, you need to arrange for the @@ -2313,9 +2367,9 @@ A Bison grammar rule has the following general form: @end example @noindent -where @var{result} is the nonterminal symbol that this rule describes +where @var{result} is the nonterminal symbol that this rule describes, and @var{components} are various terminal and nonterminal symbols that -are put together by this rule (@pxref{Symbols}). +are put together by this rule (@pxref{Symbols}). For example, @@ -2399,8 +2453,8 @@ with no components. A rule is called @dfn{recursive} when its @var{result} nonterminal appears also on its right hand side. Nearly all Bison grammars need to use recursion, because that is the only way to define a sequence of any number -of somethings. Consider this recursive definition of a comma-separated -sequence of one or more expressions: +of a particular thing. Consider this recursive definition of a +comma-separated sequence of one or more expressions: @example @group @@ -2439,7 +2493,7 @@ further explanation of this. @dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the rule does not appear directly on its right hand side, but does appear in rules for other nonterminals which do appear on its right hand -side. +side. For example: @@ -2461,10 +2515,10 @@ primary: constant defines two mutually-recursive nonterminals, since each refers to the other. -@node Semantics, Declarations, Recursion, Grammar File +@node Semantics, Locations, Recursion, Grammar File @section Defining Language Semantics @cindex defining language semantics -@cindex language semantics, defining +@cindex language semantics, defining The grammar rules for a language determine only the syntax. The semantics are determined by the semantic values associated with various tokens and @@ -2524,10 +2578,11 @@ Specify the entire collection of possible data types, with the @code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}). 
@item -Choose one of those types for each symbol (terminal or nonterminal) -for which semantic values are used. This is done for tokens with the -@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings -with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}). +Choose one of those types for each symbol (terminal or nonterminal) for +which semantic values are used. This is done for tokens with the +@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) +and for groupings with the @code{%type} Bison declaration (@pxref{Type +Decl, ,Nonterminal Symbols}). @end itemize @node Actions, Action Types, Multiple Types, Semantics @@ -2813,7 +2868,119 @@ the action is now at the end of its rule. Any mid-rule action can be converted to an end-of-rule action in this way, and this is what Bison actually does to implement mid-rule actions. -@node Declarations, Multiple Parsers, Semantics, Grammar File +@node Locations, Declarations, Semantics, Grammar File +@section Tracking Locations +@cindex location +@cindex textual position +@cindex position, textual + +Though grammar rules and semantic actions are enough to write a fully +functional parser, it can be useful to process some additional information, +especially locations of tokens and groupings. + +The way locations are handled is defined by providing a data type, and actions +to take when rules are matched. + +@menu +* Location Type:: Specifying a data type for locations. +* Actions and Locations:: Using locations in actions. +* Location Default Action:: Defining a general way to compute locations. +@end menu + +@node Location Type, Actions and Locations, , Locations +@subsection Data Type of Locations +@cindex data type of locations +@cindex default location type + +Defining a data type for locations is much simpler than for semantic values, +since all tokens and groupings always use the same type. 
+ +The type of locations is specified by defining a macro called @code{YYLTYPE}. +When @code{YYLTYPE} is not defined, Bison uses a default structure type with +four members: + +@example +struct +@{ + int first_line; + int first_column; + int last_line; + int last_column; +@} +@end example + +@node Actions and Locations, Location Default Action, Location Type, Locations +@subsection Actions and Locations +@cindex location actions +@cindex actions, location +@vindex @@$ +@vindex @@@var{n} + +Actions are not only useful for defining language semantics, but also for +describing the behavior of the output parser with locations. + +The most obvious way of building locations of syntactic groupings is very +similar to the way semantic values are computed. In a given rule, several +constructs can be used to access the locations of the elements being matched. +The location of the @var{n}th component of the right hand side is +@code{@@@var{n}}, while the location of the left hand side grouping is +@code{@@$}. + +Here is a simple example using the default data type for locations: + +@example +@group +exp: @dots{} + | exp '+' exp + @{ + @@$.last_column = @@3.last_column; + @@$.last_line = @@3.last_line; + $$ = $1 + $3; + @} +@end group +@end example + +@noindent +In the example above, there is no need to set the beginning of @code{@@$}. The +output parser always sets @code{@@$} to @code{@@1} before executing the C +code of a given action, whether or not the action itself processes locations. + +@node Location Default Action, , Actions and Locations, Locations +@subsection Default Action for Locations +@vindex YYLLOC_DEFAULT + +Actually, actions are not the best place to compute locations. Since locations +are much more general than semantic values, there is room in the output parser +to define a default action to take for each rule. The @code{YYLLOC_DEFAULT} +macro is called each time a rule is matched, before the associated action is +run. + +@c Documentation for the old (?) 
YYLLOC_DEFAULT + +This macro takes two parameters, the first one being the location of the +grouping (the result of the computation), and the second one being the +location of the last element matched. Of course, before @code{YYLLOC_DEFAULT} +is run, the result is set to the location of the first component matched. + +By default, this macro computes a location that ranges from the beginning of +the first element to the end of the last element. It is defined this way: + +@example +@group +#define YYLLOC_DEFAULT(Current, Last) \ + Current.last_line = Last.last_line; \ + Current.last_column = Last.last_column; +@end group +@end example + +@c not Documentation for the old (?) YYLLOC_DEFAULT + +@noindent + +Most of the time, the default action for locations is general enough that +most actions need no location-specific code. + +@node Declarations, Multiple Parsers, Locations, Grammar File @section Bison Declarations @cindex declarations, Bison @cindex Bison declarations @@ -2859,9 +3026,10 @@ Bison will convert this into a @code{#define} directive in the parser, so that the function @code{yylex} (if it is in this file) can use the name @var{name} to stand for this token type's code. -Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc} -instead of @code{%token}, if you wish to specify precedence. -@xref{Precedence Decl, ,Operator Precedence}. +Alternatively, you can use @code{%left}, @code{%right}, or +@code{%nonassoc} instead of @code{%token}, if you wish to specify +associativity and precedence. @xref{Precedence Decl, ,Operator +Precedence}. You can explicitly specify the numeric code for a token type by appending an integer value in the field immediately following the token name: @@ -2877,7 +3045,7 @@ with each other or with ASCII characters. 
In the event that the stack type is a union, you must augment the @code{%token} or other token declaration to include the data type -alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}). +alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}). For example: @@ -2973,7 +3141,7 @@ the one declared later has the higher precedence and is grouped first. The @code{%union} declaration specifies the entire collection of possible data types for semantic values. The keyword @code{%union} is followed by a pair of braces containing the same thing that goes inside a @code{union} in -C. +C. For example: @@ -3094,28 +3262,36 @@ may override this restriction with the @code{%start} declaration as follows: A @dfn{reentrant} program is one which does not alter in the course of execution; in other words, it consists entirely of @dfn{pure} (read-only) code. Reentrancy is important whenever asynchronous execution is possible; -for example, a nonreentrant program may not be safe to call from a signal -handler. In systems with multiple threads of control, a nonreentrant +for example, a non-reentrant program may not be safe to call from a signal +handler. In systems with multiple threads of control, a non-reentrant program must be called only within interlocks. -The Bison parser is not normally a reentrant program, because it uses -statically allocated variables for communication with @code{yylex}. These -variables include @code{yylval} and @code{yylloc}. +Normally, Bison generates a parser which is not reentrant. This is +suitable for most uses, and it permits compatibility with YACC. (The +standard YACC interfaces are inherently nonreentrant, because they use +statically allocated variables for communication with @code{yylex}, +including @code{yylval} and @code{yylloc}.) -The Bison declaration @code{%pure_parser} says that you want the parser -to be reentrant. 
It looks like this: +Alternatively, you can generate a pure, reentrant parser. The Bison +declaration @code{%pure_parser} says that you want the parser to be +reentrant. It looks like this: @example %pure_parser @end example -The effect is that the two communication variables become local -variables in @code{yyparse}, and a different calling convention is used -for the lexical analyzer function @code{yylex}. @xref{Pure Calling, -,Calling Conventions for Pure Parsers}, for the details of this. The -variable @code{yynerrs} also becomes local in @code{yyparse} -(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). -The convention for calling @code{yyparse} itself is unchanged. +The result is that the communication variables @code{yylval} and +@code{yylloc} become local variables in @code{yyparse}, and a different +calling convention is used for the lexical analyzer function +@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure +Parsers}, for the details of this. The variable @code{yynerrs} also +becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error +Reporting Function @code{yyerror}}). The convention for calling +@code{yyparse} itself is unchanged. + +Whether the parser is pure has nothing to do with the grammar rules. +You can generate either a pure parser or a nonreentrant parser from any +valid grammar. @node Decl Summary, , Pure Decl, Declarations @subsection Bison Declaration Summary @@ -3152,14 +3328,37 @@ Declare the type of semantic values for a nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}). @item %start -Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}). +Specify the grammar's start symbol (@pxref{Start Decl, ,The +Start-Symbol}). @item %expect Declare the expected number of shift-reduce conflicts (@pxref{Expect Decl, ,Suppressing Conflict Warnings}). 
+@item %yacc +@itemx %fixed_output_files +Pretend the option @option{--yacc} was given, i.e., imitate Yacc, +including its naming conventions. @xref{Bison Options}, for more. + +@item %locations +Generate the code processing the locations (@pxref{Action Features, +,Special Features for Use in Actions}). This mode is enabled as soon as +the grammar uses the special @samp{@@@var{n}} tokens, but if your +grammar does not use it, using @samp{%locations} allows for more +accurate parse error messages. + @item %pure_parser -Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure +(Reentrant) Parser}). + +@item %no_parser +Do not include any C code in the parser file; generate tables only. The +parser file contains just @code{#define} directives and static variable +declarations. + +This option also tells Bison to write the C code for the grammar actions +into a file named @file{@var{filename}.act}, in the form of a +brace-surrounded body fit for a @code{switch} statement. @item %no_lines Don't generate any @code{#line} preprocessor commands in the parser @@ -3169,12 +3368,38 @@ your source file (the grammar file). This directive causes them to associate errors with the parser file, treating it as an independent source file in its own right. -@item %raw -The output file @file{@var{name}.h} normally defines the tokens with -Yacc-compatible token numbers. If this option is specified, the -internal Bison numbers are used instead. (Yacc-compatible numbers start -at 257 except for single character tokens; Bison assigns token numbers -sequentially for all tokens starting at 3.) +@item %debug +Output a definition of the macro @code{YYDEBUG} into the parser file, so +that the debugging facilities are compiled. @xref{Debugging, ,Debugging +Your Parser}.
+ +@item %defines +Write an extra output file containing macro definitions for the token +type names defined in the grammar and the semantic value type +@code{YYSTYPE}, as well as a few @code{extern} variable declarations. + +If the parser output file is named @file{@var{name}.c} then this file +is named @file{@var{name}.h}.@refill + +This output file is essential if you wish to put the definition of +@code{yylex} in a separate source file, because @code{yylex} needs to +be able to refer to token type codes and the variable +@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill + +@item %verbose +Write an extra output file containing verbose descriptions of the +parser states and what is done for each type of look-ahead token in +that state. + +This file also describes all the conflicts, both those resolved by +operator precedence and the unresolved ones. + +The file's name is made by removing @samp{.tab.c} or @samp{.c} from +the parser output file name, and adding @samp{.output} instead.@refill + +Therefore, if the input file is @file{foo.y}, then the parser file is +called @file{foo.tab.c} by default. As a consequence, the verbose +output file is called @file{foo.output}.@refill @item %token_table Generate an array of token names in the parser file. The name of the @@ -3202,7 +3427,7 @@ definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, and @item YYNTOKENS The highest token number, plus one. @item YYNNTS -The number of non-terminal symbols. +The number of nonterminal symbols. @item YYNRULES The number of grammar rules, @item YYNSTATES @@ -3256,7 +3481,7 @@ C code in the grammar file, you are likely to run into trouble. @menu * Parser Function:: How to call @code{yyparse} and what it returns. -* Lexical:: You must supply a function @code{yylex} +* Lexical:: You must supply a function @code{yylex} which reads tokens. * Error Reporting:: You must supply a function @code{yyerror}. * Action Features:: Special features for use in actions. 
@@ -3269,8 +3494,8 @@ C code in the grammar file, you are likely to run into trouble. You call the function @code{yyparse} to cause parsing to occur. This function reads tokens, executes actions, and ultimately returns when it encounters end-of-input or an unrecoverable syntax error. You can also -write an action which directs @code{yyparse} to return immediately without -reading further. +write an action which directs @code{yyparse} to return immediately +without reading further. The value returned by @code{yyparse} is 0 if parsing was successful (return is due to end-of-input). @@ -3339,7 +3564,8 @@ signifies end-of-input. Here is an example showing these things: @example -yylex () +int +yylex (void) @{ @dots{} if (c == EOF) /* Detect end of file. */ @@ -3368,9 +3594,9 @@ all others. In this case, the use of the literal string tokens in the grammar file has no effect on @code{yylex}. @item -@code{yylex} can find the multi-character token in the @code{yytname} +@code{yylex} can find the multicharacter token in the @code{yytname} table. The index of the token in the table is the token type's code. -The name of a multi-character token is recorded in @code{yytname} with a +The name of a multicharacter token is recorded in @code{yytname} with a double-quote, the token's characters, and another double-quote. The token's characters are not escaped in any way; they appear verbatim in the contents of the string in the table. 
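The table lookup described above can be sketched as a small C function. This is only a minimal illustration, not Bison's own code: the helper name `find_literal_token` and the table `demo_names` are hypothetical stand-ins for a scanner routine and for the generated `yytname` array, and the count passed in plays the role of `YYNTOKENS`. Note that `strncmp` returns 0 on a match, so its result is compared against 0.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the entry in NAMES that spells TEXT between
   double quotes (the way literal string tokens are recorded), or -1
   if there is none.  NAMES stands in for yytname and NTOKENS for
   YYNTOKENS; both parameter names are assumptions of this sketch.  */
static int
find_literal_token (const char *const *names, int ntokens, const char *text)
{
  size_t len = strlen (text);
  int i;

  for (i = 0; i < ntokens; i++)
    {
      const char *name = names[i];

      if (name != 0
          && name[0] == '"'
          && strncmp (name + 1, text, len) == 0  /* 0 means "equal" */
          && name[len + 1] == '"'
          && name[len + 2] == 0)
        return i;
    }
  return -1;
}

/* Hypothetical table laid out in the style of yytname.  */
static const char *const demo_names[] =
  { "$", "error", "NUM", "\"<=\"", "\"<\"", 0 };
```

In a scanner, the returned index would serve as the token type's code, as described above; a result of -1 means the text is not one of the grammar's literal string tokens.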
@@ -3383,7 +3609,8 @@ for (i = 0; i < YYNTOKENS; i++) @{ if (yytname[i] != 0 && yytname[i][0] == '"' - && strncmp (yytname[i] + 1, token_buffer, strlen (token_buffer)) + && strncmp (yytname[i] + 1, token_buffer, + strlen (token_buffer)) == 0 && yytname[i][strlen (token_buffer) + 1] == '"' && yytname[i][strlen (token_buffer) + 2] == 0) break; @@ -3398,7 +3625,7 @@ The @code{yytname} table is generated only if you use the @subsection Semantic Values of Tokens @vindex yylval -In an ordinary (nonreentrant) parser, the semantic value of the token must +In an ordinary (non-reentrant) parser, the semantic value of the token must be stored into the global variable @code{yylval}. When you are using just one data type for semantic values, @code{yylval} has that type. Thus, if the type is @code{int} (the default), you might write this in @@ -3444,16 +3671,19 @@ then the code in @code{yylex} might look like this: @subsection Textual Positions of Tokens @vindex yylloc -If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in -actions to keep track of the textual locations of tokens and groupings, -then you must provide this information in @code{yylex}. The function -@code{yyparse} expects to find the textual location of a token just parsed -in the global variable @code{yylloc}. So @code{yylex} must store the -proper data in that variable. The value of @code{yylloc} is a structure -and you need only initialize the members that are going to be used by the -actions. The four members are called @code{first_line}, -@code{first_column}, @code{last_line} and @code{last_column}. Note that -the use of this feature makes the parser noticeably slower. +If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , +Tracking Locations}) in actions to keep track of the +textual locations of tokens and groupings, then you must provide this +information in @code{yylex}.
The function @code{yyparse} expects to +find the textual location of a token just parsed in the global variable +@code{yylloc}. So @code{yylex} must store the proper data in that +variable. + +By default, the value of @code{yylloc} is a structure and you need only +initialize the members that are going to be used by the actions. The +four members are called @code{first_line}, @code{first_column}, +@code{last_line} and @code{last_column}. Note that the use of this +feature makes the parser noticeably slower. @tindex YYLTYPE The data type of @code{yylloc} has the name @code{YYLTYPE}. @@ -3470,9 +3700,8 @@ shown here, and pass the information back by storing it through those pointers. @example -yylex (lvalp, llocp) - YYSTYPE *lvalp; - YYLTYPE *llocp; +int +yylex (YYSTYPE *lvalp, YYLTYPE *llocp) @{ @dots{} *lvalp = value; /* Put value onto Bison stack. */ @@ -3574,9 +3803,10 @@ with no arguments, as usual. @cindex syntax error The Bison parser detects a @dfn{parse error} or @dfn{syntax error} -whenever it reads a token which cannot satisfy any syntax rule. A +whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the -macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use in Actions}). +macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use +in Actions}). The Bison parser expects to report the error by calling an error reporting function named @code{yyerror}, which you must supply. It is @@ -3586,10 +3816,11 @@ receives one argument. For a parse error, the string is normally @findex YYERROR_VERBOSE If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations -section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose -and specific error message string instead of just plain @w{@code{"parse -error"}}. It doesn't matter what definition you use for -@code{YYERROR_VERBOSE}, just whether you define it. 
+section (@pxref{Bison Declarations, ,The Bison Declarations Section}), +then Bison provides a more verbose and specific error message string +instead of just plain @w{@code{"parse error"}}. It doesn't matter what +definition you use for @code{YYERROR_VERBOSE}, just whether you define +it. The parser can detect one other kind of error: stack overflow. This happens when the input contains constructions that are very deeply @@ -3603,8 +3834,8 @@ The following definition suffices in simple programs: @example @group -yyerror (s) - char *s; +void +yyerror (char *s) @{ @end group @group @@ -3647,7 +3878,7 @@ specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of @item $<@var{typealt}>@var{n} Like @code{$@var{n}} but specifies alternative @var{typealt} in the -union specified by the @code{%union} declaration. +union specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of Values in Actions}.@refill @item YYABORT; @@ -3702,36 +3933,45 @@ error rules. @xref{Error Recovery}. @item yyerrok; Resume generating error messages immediately for subsequent syntax -errors. This is useful primarily in error rules. +errors. This is useful primarily in error rules. @xref{Error Recovery}. -@item @@@var{n} -@findex @@@var{n} -Acts like a structure variable containing information on the line -numbers and column numbers of the @var{n}th component of the current -rule. The structure has four members, like this: +@item @@$ +@findex @@$ +Acts like a structure variable containing information on the textual position +of the grouping made by the current rule. @xref{Locations, , +Tracking Locations}. -@example -struct @{ - int first_line, last_line; - int first_column, last_column; -@}; -@end example +@c Check if those paragraphs are still useful or not. -Thus, to get the starting line number of the third component, use -@samp{@@3.first_line}. 
+@c @example +@c struct @{ +@c int first_line, last_line; +@c int first_column, last_column; +@c @}; +@c @end example -In order for the members of this structure to contain valid information, -you must make @code{yylex} supply this information about each token. -If you need only certain members, then @code{yylex} need only fill in -those members. +@c Thus, to get the starting line number of the third component, you would +@c use @samp{@@3.first_line}. + +@c In order for the members of this structure to contain valid information, +@c you must make @code{yylex} supply this information about each token. +@c If you need only certain members, then @code{yylex} need only fill in +@c those members. + +@c The use of this feature makes the parser noticeably slower. + +@item @@@var{n} +@findex @@@var{n} +Acts like a structure variable containing information on the textual position +of the @var{n}th component of the current rule. @xref{Locations, , +Tracking Locations}. -The use of this feature makes the parser noticeably slower. @end table @node Algorithm, Error Recovery, Interface, Top -@chapter The Bison Parser Algorithm -@cindex Bison parser algorithm +@chapter The Bison Parser Algorithm +@cindex Bison parser algorithm @cindex algorithm of parser @cindex shifting @cindex reduction @@ -3983,33 +4223,33 @@ expr: expr '-' expr @noindent Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2}; -should it reduce them via the rule for the addition operator? It depends -on the next token. Of course, if the next token is @samp{)}, we must -reduce; shifting is invalid because no single rule can reduce the token -sequence @w{@samp{- 2 )}} or anything starting with that. But if the next -token is @samp{*} or @samp{<}, we have a choice: either shifting or -reduction would allow the parse to complete, but with different -results. - -To decide which one Bison should do, we must consider the -results. 
If the next operator token @var{op} is shifted, then it -must be reduced first in order to permit another opportunity to -reduce the sum. The result is (in effect) @w{@samp{1 - (2 -@var{op} 3)}}. On the other hand, if the subtraction is reduced -before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op} -3}}. Clearly, then, the choice of shift or reduce should depend -on the relative precedence of the operators @samp{-} and -@var{op}: @samp{*} should be shifted first, but not @samp{<}. +should it reduce them via the rule for the subtraction operator? It +depends on the next token. Of course, if the next token is @samp{)}, we +must reduce; shifting is invalid because no single rule can reduce the +token sequence @w{@samp{- 2 )}} or anything starting with that. But if +the next token is @samp{*} or @samp{<}, we have a choice: either +shifting or reduction would allow the parse to complete, but with +different results. + +To decide which one Bison should do, we must consider the results. If +the next operator token @var{op} is shifted, then it must be reduced +first in order to permit another opportunity to reduce the difference. +The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other +hand, if the subtraction is reduced before shifting @var{op}, the result +is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or +reduce should depend on the relative precedence of the operators +@samp{-} and @var{op}: @samp{*} should be shifted first, but not +@samp{<}. @cindex associativity What about input such as @w{@samp{1 - 2 - 5}}; should this be -@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For -most operators we prefer the former, which is called @dfn{left -association}. The latter alternative, @dfn{right association}, is -desirable for assignment operators. 
The choice of left or right -association is a matter of whether the parser chooses to shift or -reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead -token is @samp{-}: shifting makes right-associativity. +@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most +operators we prefer the former, which is called @dfn{left association}. +The latter alternative, @dfn{right association}, is desirable for +assignment operators. The choice of left or right association is a +matter of whether the parser chooses to shift or reduce when the stack +contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting +makes right-associativity. @node Using Precedence, Precedence Examples, Why Precedence, Precedence @subsection Specifying Operator Precedence @@ -4317,7 +4557,7 @@ name_list: @end example It would seem that this grammar can be parsed with only a single token -of look-ahead: when a @code{param_spec} is being read, an @code{ID} is +of look-ahead: when a @code{param_spec} is being read, an @code{ID} is a @code{name} if a comma or colon follows, or a @code{type} if another @code{ID} follows. In other words, this grammar is LR(1). @@ -4444,7 +4684,7 @@ recognize the special token @code{error}. This is a terminal symbol that is always defined (you need not declare it) and reserved for error handling. The Bison parser generates an @code{error} token whenever a syntax error happens; if you have provided a rule to recognize this token -in the current context, the parse can continue. +in the current context, the parse can continue. For example: @@ -4600,11 +4840,11 @@ static int foo (lose); /* @r{redeclare @code{foo} as function} */ Unfortunately, the name being declared is separated from the declaration construct itself by a complicated syntactic structure---the ``declarator''. 
-As a result, the part of Bison parser for C needs to be duplicated, with -all the nonterminal names changed: once for parsing a declaration in which -a typedef name can be redefined, and once for parsing a declaration in -which that can't be done. Here is a part of the duplication, with actions -omitted for brevity: +As a result, part of the Bison parser for C needs to be duplicated, with +all the nonterminal names changed: once for parsing a declaration in +which a typedef name can be redefined, and once for parsing a +declaration in which that can't be done. Here is a part of the +duplication, with actions omitted for brevity: @example initdcl: @@ -4682,7 +4922,7 @@ it is nonzero, all integers are parsed in hexadecimal, and tokens starting with letters are parsed as integers if possible. The declaration of @code{hexflag} shown in the C declarations section of -the parser file is needed to make it accessible to the actions +the parser file is needed to make it accessible to the actions (@pxref{C Declarations, ,The C Declarations Section}). You must also write the code in @code{yylex} to obey the flag. @@ -4754,7 +4994,7 @@ runs, the @code{yydebug} parser-trace feature can help you figure out why. To enable compilation of trace facilities, you must define the macro @code{YYDEBUG} when you compile the parser. You could use @samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define -YYDEBUG 1} in the C declarations section of the grammar file +YYDEBUG 1} in the C declarations section of the grammar file (@pxref{C Declarations, ,The C Declarations Section}). Alternatively, use the @samp{-t} option when you run Bison (@pxref{Invocation, ,Invoking Bison}). We always define @code{YYDEBUG} so that debugging is always possible. 
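The compile-time switch can be pictured with a small, self-contained C sketch. It assumes nothing about the generated parser: `trace_flag` is a hypothetical stand-in for the parser's `yydebug` variable, and `request_trace` is an illustrative helper, not a Bison function. The point is only that code touching the trace variable sits inside `#if YYDEBUG`, so the program compiles whether or not the trace facilities are enabled.

```c
/* Normally supplied with -DYYDEBUG=1 or in the C declarations section;
   defaulted here only so that the sketch compiles on its own.  */
#ifndef YYDEBUG
# define YYDEBUG 1
#endif

#if YYDEBUG
/* Stand-in for the parser's yydebug variable (an assumption of this
   sketch; the real variable lives in the generated parser file).  */
static int trace_flag = 0;
#endif

/* Turn tracing on when WANTED is nonzero.  Returns 1 if tracing is now
   enabled, 0 if it is not (including when trace support is compiled
   out because YYDEBUG is 0).  */
static int
request_trace (int wanted)
{
#if YYDEBUG
  trace_flag = (wanted != 0);
  return trace_flag;
#else
  (void) wanted;
  return 0;
#endif
}
```

A program built this way could call the helper from its option handling and behave the same, apart from tracing, when `YYDEBUG` is defined as 0.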
@@ -4814,10 +5054,7 @@ calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}): #define YYPRINT(file, type, value) yyprint (file, type, value) static void -yyprint (file, type, value) - FILE *file; - int type; - YYSTYPE value; +yyprint (FILE *file, int type, YYSTYPE value) @{ if (type == VAR) fprintf (file, " %s", value.tptr->name); @@ -4845,13 +5082,14 @@ with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields @file{hack/foo.tab.c}.@refill @menu -* Bison Options:: All the options described in detail, +* Bison Options:: All the options described in detail, in alphabetical order by short options. +* Environment Variables:: Variables which affect Bison execution. * Option Cross Key:: Alphabetical list of long options. * VMS Invocation:: Bison command syntax on VMS. @end menu -@node Bison Options, Option Cross Key, , Invocation +@node Bison Options, Environment Variables, , Invocation @section Bison Options Bison supports both traditional single-letter options and mnemonic long @@ -4865,50 +5103,50 @@ Here is a list of options that can be used with Bison, alphabetized by short option. It is followed by a cross key alphabetized by long option. -@table @samp -@item -b @var{file-prefix} -@itemx --file-prefix=@var{prefix} -Specify a prefix to use for all Bison output file names. The names are -chosen as if the input file were named @file{@var{prefix}.c}. - -@item -d -@itemx --defines -Write an extra output file containing macro definitions for the token -type names defined in the grammar and the semantic value type -@code{YYSTYPE}, as well as a few @code{extern} variable declarations. +@c Please, keep this ordered as in `bison --help'. +@noindent +Operation modes: +@table @option +@item -h +@itemx --help +Print a summary of the command-line options to Bison and exit. -If the parser output file is named @file{@var{name}.c} then this file -is named @file{@var{name}.h}.@refill +@item -V +@itemx --version +Print the version number of Bison and exit.
-This output file is essential if you wish to put the definition of -@code{yylex} in a separate source file, because @code{yylex} needs to -be able to refer to token type codes and the variable -@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill +@need 1750 +@item -y +@itemx --yacc +@itemx --fixed-output-files +Equivalent to @samp{-o y.tab.c}; the parser output file is called +@file{y.tab.c}, and the other outputs are called @file{y.output} and +@file{y.tab.h}. The purpose of this option is to imitate Yacc's output +file name conventions. Thus, the following shell script can substitute +for Yacc:@refill -@item -l -@itemx --no-lines -Don't put any @code{#line} preprocessor commands in the parser file. -Ordinarily Bison puts them in the parser file so that the C compiler -and debuggers will associate errors with your source file, the -grammar file. This option causes them to associate errors with the -parser file, treating it as an independent source file in its own right. +@example +bison -y $* +@end example +@end table -@item -n -@itemx --no-parser -Do not include any C code in the parser file; generate tables only. The -parser file contains just @code{#define} directives and static variable -declarations. +@noindent +Tuning the parser: -This option also tells Bison to write the C code for the grammar actions -into a file named @file{@var{filename}.act}, in the form of a -brace-surrounded body fit for a @code{switch} statement. +@table @option +@item -S @var{file} +@itemx --skeleton=@var{file} +Specify the skeleton to use. You probably don't need this option unless +you are developing Bison. -@item -o @var{outfile} -@itemx --output-file=@var{outfile} -Specify the name @var{outfile} for the parser file. +@item -t +@itemx --debug +Output a definition of the macro @code{YYDEBUG} into the parser file, so +that the debugging facilities are compiled. @xref{Debugging, ,Debugging +Your Parser}. 
-The other output files' names are constructed from @var{outfile} -as described under the @samp{-v} and @samp{-d} options. +@item --locations +Pretend that @code{%locations} was specified. @xref{Decl Summary}. @item -p @var{prefix} @itemx --name-prefix=@var{prefix} @@ -4922,55 +5160,79 @@ For example, if you use @samp{-p c}, the names become @code{cparse}, @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. -@item -r -@itemx --raw -Pretend that @code{%raw} was specified. @xref{Decl Summary}. +@item -l +@itemx --no-lines +Don't put any @code{#line} preprocessor commands in the parser file. +Ordinarily Bison puts them in the parser file so that the C compiler +and debuggers will associate errors with your source file, the +grammar file. This option causes them to associate errors with the +parser file, treating it as an independent source file in its own right. -@item -t -@itemx --debug -Output a definition of the macro @code{YYDEBUG} into the parser file, -so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}. +@item -n +@itemx --no-parser +Pretend that @code{%no_parser} was specified. @xref{Decl Summary}. + +@item -k +@itemx --token-table +Pretend that @code{%token_table} was specified. @xref{Decl Summary}. +@end table + +@noindent +Adjust the output: + +@table @option +@item -d +@itemx --defines +Pretend that @code{%defines} was specified, i.e., write an extra output +file containing macro definitions for the token type names defined in +the grammar and the semantic value type @code{YYSTYPE}, as well as a few +@code{extern} variable declarations. @xref{Decl Summary}. + +@item -b @var{file-prefix} +@itemx --file-prefix=@var{prefix} +Specify a prefix to use for all Bison output file names. The names are +chosen as if the input file were named @file{@var{prefix}.c}.
@item -v @itemx --verbose -Write an extra output file containing verbose descriptions of the -parser states and what is done for each type of look-ahead token in -that state. - -This file also describes all the conflicts, both those resolved by -operator precedence and the unresolved ones. +Pretend that @code{%verbose} was specified, i.e., write an extra output +file containing verbose descriptions of the grammar and +parser. @xref{Decl Summary}, for more. -The file's name is made by removing @samp{.tab.c} or @samp{.c} from -the parser output file name, and adding @samp{.output} instead.@refill +@item -o @var{outfile} +@itemx --output-file=@var{outfile} +Specify the name @var{outfile} for the parser file. -Therefore, if the input file is @file{foo.y}, then the parser file is -called @file{foo.tab.c} by default. As a consequence, the verbose -output file is called @file{foo.output}.@refill +The other output files' names are constructed from @var{outfile} +as described under the @samp{-v} and @samp{-d} options. +@end table -@item -V -@itemx --version -Print the version number of Bison and exit. +@node Environment Variables, Option Cross Key, Bison Options, Invocation +@section Environment Variables +@cindex environment variables +@cindex BISON_HAIRY +@cindex BISON_SIMPLE -@item -h -@itemx --help -Print a summary of the command-line options to Bison and exit. +Here is a list of environment variables which affect the way Bison +runs. -@need 1750 -@item -y -@itemx --yacc -@itemx --fixed-output-files -Equivalent to @samp{-o y.tab.c}; the parser output file is called -@file{y.tab.c}, and the other outputs are called @file{y.output} and -@file{y.tab.h}. The purpose of this option is to imitate Yacc's output -file name conventions. Thus, the following shell script can substitute -for Yacc:@refill +@table @samp +@item BISON_SIMPLE +@itemx BISON_HAIRY +Much of the parser generated by Bison is copied verbatim from a file +called @file{bison.simple}.
If Bison cannot find that file, or if you +would like to direct Bison to use a different copy, setting the +environment variable @code{BISON_SIMPLE} to the path of the file will +cause Bison to use that copy instead. + +When the @samp{%semantic_parser} declaration is used, Bison copies from +a file called @file{bison.hairy} instead. The location of this file can +also be specified or overridden in a similar fashion, with the +@code{BISON_HAIRY} environment variable. -@example -bison -y $* -@end example @end table -@node Option Cross Key, VMS Invocation, Bison Options, Invocation +@node Option Cross Key, VMS Invocation, Environment Variables, Invocation @section Option Cross Key Here is a list of options, alphabetized by long option, to help you find @@ -4989,7 +5251,6 @@ the corresponding short option. \line{ --no-lines \leaderfill -l} \line{ --no-parser \leaderfill -n} \line{ --output-file \leaderfill -o} -\line{ --raw \leaderfill -r} \line{ --token-table \leaderfill -k} \line{ --verbose \leaderfill -v} \line{ --version \leaderfill -V} @@ -5008,7 +5269,6 @@ the corresponding short option. --no-lines -l --no-parser -n --output-file=@var{outfile} -o @var{outfile} ---raw -r --token-table -k --verbose -v --version -V @@ -5062,11 +5322,12 @@ token is reset to the token that originally caused the violation. @item YYABORT Macro to pretend that an unrecoverable syntax error has occurred, by making @code{yyparse} return 1 immediately. The error reporting -function @code{yyerror} is not called. @xref{Parser Function, ,The Parser Function @code{yyparse}}. +function @code{yyerror} is not called. @xref{Parser Function, ,The +Parser Function @code{yyparse}}. @item YYACCEPT Macro to pretend that a complete utterance of the language has been -read, by making @code{yyparse} return 0 immediately. +read, by making @code{yyparse} return 0 immediately. @xref{Parser Function, ,The Parser Function @code{yyparse}}. @item YYBACKUP @@ -5095,7 +5356,7 @@ Conventions for Pure Parsers}. 
@item YYLTYPE Macro for the data type of @code{yylloc}; a structure with four -members. @xref{Token Positions, ,Textual Positions of Tokens}. +members. @xref{Location Type, , Data Types of Locations}. @item yyltype Default value for YYLTYPE. @@ -5117,10 +5378,10 @@ Macro for the data type of semantic values; @code{int} by default. @xref{Value Type, ,Data Types of Semantic Values}. @item yychar -External integer variable that contains the integer value of the -current look-ahead token. (In a pure parser, it is a local variable -within @code{yyparse}.) Error-recovery rule actions may examine this -variable. @xref{Action Features, ,Special Features for Use in Actions}. +External integer variable that contains the integer value of the current +look-ahead token. (In a pure parser, it is a local variable within +@code{yyparse}.) Error-recovery rule actions may examine this variable. +@xref{Action Features, ,Special Features for Use in Actions}. @item yyclearin Macro used in error-recovery rule actions. It clears the previous @@ -5138,7 +5399,8 @@ after a parse error. @xref{Error Recovery}. @item yyerror User-supplied function to be called by @code{yyparse} on error. The function receives one argument, a pointer to a character string -containing an error message. @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. +containing an error message. @xref{Error Reporting, ,The Error +Reporting Function @code{yyerror}}. @item yylex User-supplied lexical analyzer function, called with no arguments @@ -5151,21 +5413,29 @@ variable within @code{yyparse}, and its address is passed to @code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}. @item yylloc -External variable in which @code{yylex} should place the line and -column numbers associated with a token. (In a pure parser, it is a -local variable within @code{yyparse}, and its address is passed to +External variable in which @code{yylex} should place the line and column +numbers associated with a token. 
(In a pure parser, it is a local +variable within @code{yyparse}, and its address is passed to @code{yylex}.) You can ignore this variable if you don't use the -@samp{@@} feature in the grammar actions. @xref{Token Positions, ,Textual Positions of Tokens}. +@samp{@@} feature in the grammar actions. @xref{Token Positions, +,Textual Positions of Tokens}. @item yynerrs -Global variable which Bison increments each time there is a parse -error. (In a pure parser, it is a local variable within -@code{yyparse}.) @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. +Global variable which Bison increments each time there is a parse error. +(In a pure parser, it is a local variable within @code{yyparse}.) +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @item yyparse The parser function produced by Bison; call this function to start parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. +@item %debug +Equip the parser for debugging. @xref{Decl Summary}. + +@item %defines +Bison declaration to create a header file meant for the scanner. +@xref{Decl Summary}. + @item %left Bison declaration to assign left associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @@ -5175,7 +5445,7 @@ Bison declaration to avoid generating @code{#line} directives in the parser file. @xref{Decl Summary}. @item %nonassoc -Bison declaration to assign nonassociativity to token(s). +Bison declaration to assign non-associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @item %prec @@ -5186,11 +5456,6 @@ Bison declaration to assign a precedence to a specific rule. Bison declaration to request a pure (reentrant) parser. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. -@item %raw -Bison declaration to use Bison internal token code numbers in token -tables instead of the usual Yacc-compatible token code numbers. -@xref{Decl Summary}. - @item %right Bison declaration to assign right associativity to token(s). 
@xref{Precedence Decl, ,Operator Precedence}. @@ -5223,15 +5488,17 @@ Bison declarations section or the additional C code section. @xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}. @item %@{ %@} -All code listed between @samp{%@{} and @samp{%@}} is copied directly -to the output file uninterpreted. Such code forms the ``C -declarations'' section of the input file. @xref{Grammar Outline, ,Outline of a Bison Grammar}. +All code listed between @samp{%@{} and @samp{%@}} is copied directly to +the output file uninterpreted. Such code forms the ``C declarations'' +section of the input file. @xref{Grammar Outline, ,Outline of a Bison +Grammar}. @item /*@dots{}*/ Comment delimiters, as in C. @item : -Separates a rule's result from its components. @xref{Rules, ,Syntax of Grammar Rules}. +Separates a rule's result from its components. @xref{Rules, ,Syntax of +Grammar Rules}. @item ; Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}. @@ -5248,13 +5515,15 @@ Separates alternate rules for the same result nonterminal. @table @asis @item Backus-Naur Form (BNF) Formal method of specifying context-free grammars. BNF was first used -in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, +,Languages and Context-Free Grammars}. @item Context-free grammars Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an expression, integers are allowed @emph{anywhere} an expression is -permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +permitted. @xref{Language and Grammar, ,Languages and Context-Free +Grammars}. @item Dynamic allocation Allocation of memory that occurs during execution, rather than at @@ -5274,7 +5543,7 @@ rules. @xref{Algorithm, ,The Bison Parser Algorithm }. 
@item Grouping A language construct that is (in general) grammatically divisible; -for example, `expression' or `declaration' in C. +for example, `expression' or `declaration' in C. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. @item Infix operator @@ -5295,8 +5564,9 @@ Operators having left associativity are analyzed from left to right: @samp{c}. @xref{Precedence, ,Operator Precedence}. @item Left recursion -A rule whose result symbol is also its first component symbol; -for example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive Rules}. +A rule whose result symbol is also its first component symbol; for +example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive +Rules}. @item Left-to-right parsing Parsing a sentence of a language by analyzing it token by token from @@ -5311,11 +5581,11 @@ A flag, set by actions in the grammar rules, which alters the way tokens are parsed. @xref{Lexical Tie-ins}. @item Literal string token -A token which constists of two or more fixed characters. -@xref{Symbols}. +A token which consists of two or more fixed characters. @xref{Symbols}. @item Look-ahead token -A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}. +A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead +Tokens}. @item LALR(1) The class of context-free grammars that Bison (like most other parser @@ -5346,7 +5616,8 @@ performs some operation. @item Reduction Replacing a string of nonterminals and/or terminals with a single -nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison Parser Algorithm }. +nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison +Parser Algorithm }. @item Reentrant A reentrant subprogram is a subprogram which can be in invoked any @@ -5357,8 +5628,9 @@ invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. A language in which all operators are postfix operators. 
@item Right recursion -A rule whose result symbol is also its last component symbol; -for example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive Rules}. +A rule whose result symbol is also its last component symbol; for +example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive +Rules}. @item Semantics In computer languages, the semantics are specified by the actions @@ -5377,7 +5649,7 @@ A single character that is recognized and interpreted as is. @item Start symbol The nonterminal symbol that stands for a complete valid utterance in the language being parsed. The start symbol is usually listed as the -first nonterminal symbol in a language specification. +first nonterminal symbol in a language specification. @xref{Start Decl, ,The Start-Symbol}. @item Symbol table @@ -5392,9 +5664,9 @@ The input of the Bison parser is a stream of tokens which comes from the lexical analyzer. @xref{Symbols}. @item Terminal symbol -A grammar symbol that has no rules in the grammar and therefore -is grammatically indivisible. The piece of text it represents -is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +A grammar symbol that has no rules in the grammar and therefore is +grammatically indivisible. The piece of text it represents is a token. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. @end table @node Index, , Glossary, Top @@ -5402,34 +5674,4 @@ is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. @printindex cp -@contents - @bye - - - - -@c old menu - -* Introduction:: -* Conditions:: -* Copying:: The GNU General Public License says - how you can copy and share Bison - -Tutorial sections: -* Concepts:: Basic concepts for understanding Bison. -* Examples:: Three simple explained examples of using Bison. - -Reference sections: -* Grammar File:: Writing Bison declarations and rules. -* Interface:: C-language interface to the parser function @code{yyparse}. 
-* Algorithm:: How the Bison parser works at run-time.
-* Error Recovery:: Writing rules for error recovery.
-* Context Dependency::What to do if your language syntax is too
- messy for Bison to handle straightforwardly.
-* Debugging:: Debugging Bison parsers that parse wrong.
-* Invocation:: How to run Bison (to produce the parser source file).
-* Table of Symbols:: All the keywords of the Bison language are explained.
-* Glossary:: Basic concepts are explained.
-* Index:: Cross-references to the text.
-