This manual is for @acronym{GNU} Bison (version @value{VERSION},
@value{UPDATED}), the @acronym{GNU} parser generator.
-Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002 Free Software Foundation, Inc.
+Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 2003,
+1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
* Calling Convention:: How @code{yyparse} calls @code{yylex}.
* Token Values:: How @code{yylex} must return the semantic value
of the token it has read.
-* Token Positions:: How @code{yylex} must return the text position
+* Token Locations:: How @code{yylex} must return the text location
(line number, etc.) of the token, if the
actions want that.
* Pure Calling:: How the calling convention differs
Frequently Asked Questions
* Parser Stack Overflow:: Breaking the Stack Limits
+* How Can I Reset @code{yyparse}:: @code{yyparse} Keeps some State
+* Strings are Destroyed:: @code{yylval} Loses Track of Strings
Copying This Manual
@node Locations Overview
@section Locations
@cindex location
-@cindex textual position
-@cindex position, textual
+@cindex textual location
+@cindex location, textual
Many applications, like interpreters or compilers, have to produce verbose
and useful error messages. To achieve this, one must be able to keep track of
-the @dfn{textual position}, or @dfn{location}, of each syntactic construct.
+the @dfn{textual location}, or @dfn{location}, of each syntactic construct.
Bison provides a mechanism for handling these locations.
Each token has a semantic value. In a similar fashion, each token has an
@node Locations
@section Tracking Locations
@cindex location
-@cindex textual position
-@cindex position, textual
+@cindex textual location
+@cindex location, textual
Though grammar rules and semantic actions are enough to write a fully
functional parser, it can be useful to process some additional information,
four members:
@example
-struct
+typedef struct YYLTYPE
@{
int first_line;
int first_column;
int last_line;
int last_column;
-@}
+@} YYLTYPE;
@end example
@node Actions and Locations
locations are much more general than semantic values, there is room in
the output parser to redefine the default action to take for each
rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
-matched, before the associated action is run.
+matched, before the associated action is run. It is also invoked
+while processing a syntax error, to compute the error's location.
Most of the time, this macro is general enough to suppress location
dedicated code from semantic actions.
The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
-the location of the grouping (the result of the computation). The second one
-is an array holding locations of all right hand side elements of the rule
-being matched. The last one is the size of the right hand side rule.
+the location of the grouping (the result of the computation). When a
+rule is matched, the second parameter is an array holding locations of
+all right hand side elements of the rule being matched, and the third
+parameter is the size of the rule's right hand side. When processing
+a syntax error, the second parameter is an array holding locations of
+the symbols that were discarded during error processing, and the third
+parameter is the number of discarded symbols.
-By default, it is defined this way for simple @acronym{LALR}(1) parsers:
+By default, @code{YYLLOC_DEFAULT} is defined this way for simple
+@acronym{LALR}(1) parsers:
@example
@group
in the @code{%token} and @code{%type} declarations to pick one of the types
for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
-Note that, unlike making a @code{union} declaration in C, you do not write
+As an extension to @acronym{POSIX}, a tag is allowed after the
+@code{union}. For example:
+
+@example
+@group
+%union value @{
+ double val;
+ symrec *tptr;
+@}
+@end group
+@end example
+
+specifies the union tag @code{value}, so the corresponding C type is
+@code{union value}. If you do not specify a tag, it defaults to
+@code{YYSTYPE}.
+
+Note that, unlike making a @code{union} declaration in C, you need not write
a semicolon after the closing brace.
@node Type Decl
(@pxref{Parser Function, , The Parser Function @code{yyparse}}).
@strong{Warning:} as of Bison 1.875, this feature is still considered as
-experimental, as there was not enough users feedback. In particular,
+experimental, as there was not enough user feedback. In particular,
the syntax might still change.
@end deffn
Here @var{n} is a decimal integer. The declaration says there should be
no warning if there are @var{n} shift/reduce conflicts and no
-reduce/reduce conflicts. An error, instead of the usual warning, is
+reduce/reduce conflicts. The usual warning is
given if there are either more or fewer conflicts, or if there are any
reduce/reduce conflicts.
number which Bison printed.
@end itemize
-Now Bison will stop annoying you about the conflicts you have checked, but
-it will warn you again if changes in the grammar result in additional
-conflicts.
+Now Bison will stop annoying you if you do not change the number of
+conflicts, but it will warn you again if changes in the grammar result
+in more or fewer conflicts.
@node Start Decl
@subsection The Start-Symbol
Generate an array of token names in the parser file. The name of the
array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
token whose internal Bison token code number is @var{i}. The first
-three elements of @code{yytname} are always @code{"$end"},
+three elements of @code{yytname} correspond to the predefined tokens
+@code{"$end"},
@code{"error"}, and @code{"$undefined"}; after these come the symbols
defined in the grammar file.
Return immediately with value 1 (to report failure).
@end defmac
-@c For now, do not document %lex-param and %parse-param, since it's
-@c not clear that the current behavior is stable enough. For example,
-@c we may need to add %error-param.
-@clear documentparam
-
-@ifset documentparam
If you use a reentrant parser, you can optionally pass additional
parameter information to it in a reentrant way. To do so, use the
declaration @code{%parse-param}:
@example
exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
@end example
-@end ifset
@node Lexical
* Calling Convention:: How @code{yyparse} calls @code{yylex}.
* Token Values:: How @code{yylex} must return the semantic value
of the token it has read.
-* Token Positions:: How @code{yylex} must return the text position
+* Token Locations:: How @code{yylex} must return the text location
(line number, etc.) of the token, if the
actions want that.
* Pure Calling:: How the calling convention differs
@end group
@end example
-@node Token Positions
-@subsection Textual Positions of Tokens
+@node Token Locations
+@subsection Textual Locations of Tokens
@vindex yylloc
If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
@end example
If the grammar file does not use the @samp{@@} constructs to refer to
-textual positions, then the type @code{YYLTYPE} will not be defined. In
+textual locations, then the type @code{YYLTYPE} will not be defined. In
this case, omit the second argument; @code{yylex} will be called with
only one argument.
-@ifset documentparam
If you wish to pass the additional parameter data to @code{yylex}, use
@code{%lex-param} just like @code{%parse-param} (@pxref{Parser
Function}).
int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
int yyparse (int *nastiness, int *randomness);
@end example
-@end ifset
@node Error Reporting
@section The Error Reporting Function @code{yyerror}
void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */
@end example
-@ifset documentparam
If @samp{%parse-param @{int *nastiness@}} is used, then:
@example
int *nastiness, int *randomness,
char const *msg);
@end example
-@end ifset
@noindent
The prototypes are only indications of how the code produced by Bison
@deffn {Value} @@$
@findex @@$
-Acts like a structure variable containing information on the textual position
+Acts like a structure variable containing information on the textual location
of the grouping made by the current rule. @xref{Locations, ,
Tracking Locations}.
@deffn {Value} @@@var{n}
@findex @@@var{n}
-Acts like a structure variable containing information on the textual position
+Acts like a structure variable containing information on the textual location
of the @var{n}th component of the current rule. @xref{Locations, ,
Tracking Locations}.
@end deffn
grammar, in particular, it is only slightly slower than with the default
Bison parser.
+For a more detailed exposition of GLR parsers, please see: Elizabeth
+Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style
+Generalised @acronym{LR} Parsers, Royal Holloway, University of
+London, Department of Computer Science, TR-00-12,
+@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps},
+(2000-12-24).
+
@node Stack Overflow
@section Stack Overflow, and How to Avoid It
@cindex stack overflow
@example
calc.y: warning: 1 useless nonterminal and 1 useless rule
calc.y:11.1-7: warning: useless nonterminal: useless
-calc.y:11.8-12: warning: useless rule: useless: STR
-calc.y contains 7 shift/reduce conflicts.
+calc.y:11.10-12: warning: useless rule: useless: STR
+calc.y: conflicts: 7 shift/reduce
@end example
When given @option{--report=state}, in addition to @file{calc.tab.c}, it
The next section lists states that still have conflicts.
@example
-State 8 contains 1 shift/reduce conflict.
-State 9 contains 1 shift/reduce conflict.
-State 10 contains 1 shift/reduce conflict.
-State 11 contains 4 shift/reduce conflicts.
+State 8 conflicts: 1 shift/reduce
+State 9 conflicts: 1 shift/reduce
+State 10 conflicts: 1 shift/reduce
+State 11 conflicts: 4 shift/reduce
@end example
@noindent
exp go to state 11
@end example
-As was announced in beginning of the report, @samp{State 8 contains 1
-shift/reduce conflict}:
+As was announced in beginning of the report, @samp{State 8 conflicts:
+1 shift/reduce}:
@example
state 8
@menu
* Parser Stack Overflow:: Breaking the Stack Limits
+* How Can I Reset @code{yyparse}:: @code{yyparse} Keeps some State
+* Strings are Destroyed:: @code{yylval} Loses Track of Strings
@end menu
@node Parser Stack Overflow
This question is already addressed elsewhere, @xref{Recursion,
,Recursive Rules}.
+@node How Can I Reset @code{yyparse}
+@section How Can I Reset @code{yyparse}
+
+The following phenomenon gives raise to several incarnations,
+resulting in the following typical questions:
+
+@display
+I invoke @code{yyparse} several times, and on correct input it works
+properly; but when a parse error is found, all the other calls fail
+too. How can I reset @code{yyparse}'s error flag?
+@end display
+
+@noindent
+or
+
+@display
+My parser includes support for a @samp{#include} like feature, in
+which case I run @code{yyparse} from @code{yyparse}. This fails
+although I did specify I needed a @code{%pure-parser}.
+@end display
+
+These problems are not related to Bison itself, but with the Lex
+generated scanners. Because these scanners use large buffers for
+speed, they might not notice a change of input file. As a
+demonstration, consider the following source file,
+@file{first-line.l}:
+
+@verbatim
+%{
+#include <stdio.h>
+#include <stdlib.h>
+%}
+%%
+.*\n ECHO; return 1;
+%%
+int
+yyparse (const char *file)
+{
+ yyin = fopen (file, "r");
+ if (!yyin)
+ exit (2);
+ /* One token only. */
+ yylex ();
+ if (!fclose (yyin))
+ exit (3);
+ return 0;
+}
+
+int
+main ()
+{
+ yyparse ("input");
+ yyparse ("input");
+ return 0;
+}
+@end verbatim
+
+@noindent
+If the file @file{input} contains
+
+@verbatim
+input:1: Hello,
+input:2: World!
+@end verbatim
+
+@noindent
+then instead of getting twice the first line, you get:
+
+@example
+$ @kbd{flex -ofirst-line.c first-line.l}
+$ @kbd{gcc -ofirst-line first-line.c -ll}
+$ @kbd{./first-line}
+input:1: Hello,
+input:2: World!
+@end example
+
+Therefore, whenever you change @code{yyin}, you must tell the Lex
+generated scanner to discard its current buffer, and to switch to the
+new one. This depends upon your implementation of Lex, see its
+documentation for more. For instance, in the case of Flex, a simple
+call @samp{yyrestart (yyin)} suffices after each change to
+@code{yyin}.
+
+@node Strings are Destroyed
+@section Strings are Destroyed
+
+@display
+My parser seems to destroy old strings, or maybe it losses track of
+them. Instead of reporting @samp{"foo", "bar"}, it reports
+@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
+@end display
+
+This error is probably the single most frequent ``bug report'' sent to
+Bison lists, but is only concerned with a misunderstanding of the role
+of scanner. Consider the following Lex code:
+
+@verbatim
+%{
+#include <stdio.h>
+char *yylval = NULL;
+%}
+%%
+.* yylval = yytext; return 1;
+\n /* IGNORE */
+%%
+int
+main ()
+{
+ /* Similar to using $1, $2 in a Bison action. */
+ char *fst = (yylex (), yylval);
+ char *snd = (yylex (), yylval);
+ printf ("\"%s\", \"%s\"\n", fst, snd);
+ return 0;
+}
+@end verbatim
+
+If you compile and run this code, you get:
+
+@example
+$ @kbd{flex -osplit-lines.c split-lines.l}
+$ @kbd{gcc -osplit-lines split-lines.c -ll}
+$ @kbd{printf 'one\ntwo\n' | ./split-lines}
+"one
+two", "two"
+@end example
+
+@noindent
+this is because @code{yytext} is a buffer provided for @emph{reading}
+in the action, but if you want to keep it, you have to duplicate it
+(e.g., using @code{strdup}). Note that the output may depend on how
+your implementation of Lex handles @code{yytext}. For instance, when
+given the Lex compatibility option @option{-l} (which triggers the
+option @samp{%array}) Flex generates a different behavior:
+
+@example
+$ @kbd{flex -l -osplit-lines.c split-lines.l}
+$ @kbd{gcc -osplit-lines split-lines.c -ll}
+$ @kbd{printf 'one\ntwo\n' | ./split-lines}
+"two", "two"
+@end example
+
+
@c ================================================= Table of Symbols
@node Table of Symbols
@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
@end deffn
-@deffn {Macro} YYLTYPE
-Macro for the data type of @code{yylloc}; a structure with four
+@deffn {Type} YYLTYPE
+Data type of @code{yylloc}; by default, a structure with four
members. @xref{Location Type, , Data Types of Locations}.
@end deffn
-@deffn {Type} yyltype
-Default value for YYLTYPE.
-@end deffn
-
@deffn {Macro} YYMAXDEPTH
Macro for specifying the maximum size of the parser stack. @xref{Stack
Overflow}.
to anything else.
@end deffn
-@deffn {Macro} YYSTYPE
-Macro for the data type of semantic values; @code{int} by default.
+@deffn {Type} YYSTYPE
+Data type of semantic values; @code{int} by default.
@xref{Value Type, ,Data Types of Semantic Values}.
@end deffn
numbers associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
@code{yylex}.) You can ignore this variable if you don't use the
-@samp{@@} feature in the grammar actions. @xref{Token Positions,
-,Textual Positions of Tokens}.
+@samp{@@} feature in the grammar actions. @xref{Token Locations,
+,Textual Locations of Tokens}.
@end deffn
@deffn {Variable} yynerrs
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn
-@ifset documentparam
@deffn {Directive} %lex-param @{@var{argument-declaration}@}
Bison declaration to specifying an additional parameter that
@code{yylex} should accept. @xref{Pure Calling,, Calling Conventions
for Pure Parsers}.
@end deffn
-@end ifset
@deffn {Directive} %merge
Bison declaration to assign a merging function to a rule. If there is a
Summary}.
@end deffn
-@ifset documentparam
@deffn {Directive} %parse-param @{@var{argument-declaration}@}
Bison declaration to specifying an additional parameter that
@code{yyparse} should accept. @xref{Parser Function,, The Parser
Function @code{yyparse}}.
@end deffn
-@end ifset
@deffn {Directive} %prec
Bison declaration to assign a precedence to a specific rule.