This manual is for @acronym{GNU} Bison (version @value{VERSION},
@value{UPDATED}), the @acronym{GNU} parser generator.
-Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002 Free Software Foundation, Inc.
+Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 2003,
+1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
* Calling Convention:: How @code{yyparse} calls @code{yylex}.
* Token Values:: How @code{yylex} must return the semantic value
of the token it has read.
-* Token Positions:: How @code{yylex} must return the text position
+* Token Locations:: How @code{yylex} must return the text location
(line number, etc.) of the token, if the
actions want that.
* Pure Calling:: How the calling convention differs
Frequently Asked Questions
* Parser Stack Overflow:: Breaking the Stack Limits
+* Strings are Destroyed:: @code{yylval} Loses Track of Strings
Copying This Manual
@node Locations Overview
@section Locations
@cindex location
-@cindex textual position
-@cindex position, textual
+@cindex textual location
+@cindex location, textual
Many applications, like interpreters or compilers, have to produce verbose
and useful error messages. To achieve this, one must be able to keep track of
-the @dfn{textual position}, or @dfn{location}, of each syntactic construct.
+the @dfn{textual location}, or @dfn{location}, of each syntactic construct.
Bison provides a mechanism for handling these locations.
Each token has a semantic value. In a similar fashion, each token has an
@node Locations
@section Tracking Locations
@cindex location
-@cindex textual position
-@cindex position, textual
+@cindex textual location
+@cindex location, textual
Though grammar rules and semantic actions are enough to write a fully
functional parser, it can be useful to process some additional information,
four members:
@example
-struct
+typedef struct YYLTYPE
@{
int first_line;
int first_column;
int last_line;
int last_column;
-@}
+@} YYLTYPE;
@end example
@node Actions and Locations
locations are much more general than semantic values, there is room in
the output parser to redefine the default action to take for each
rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
-matched, before the associated action is run.
+matched, before the associated action is run. It is also invoked
+while processing a syntax error, to compute the error's location.
Most of the time, this macro is general enough to suppress location
dedicated code from semantic actions.
The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
-the location of the grouping (the result of the computation). The second one
-is an array holding locations of all right hand side elements of the rule
-being matched. The last one is the size of the right hand side rule.
+the location of the grouping (the result of the computation). When a
+rule is matched, the second parameter is an array holding locations of
+all right hand side elements of the rule being matched, and the third
+parameter is the size of the rule's right hand side. When processing
+a syntax error, the second parameter is an array holding locations of
+the symbols that were discarded during error processing, and the third
+parameter is the number of discarded symbols.
-By default, it is defined this way for simple @acronym{LALR}(1) parsers:
+By default, @code{YYLLOC_DEFAULT} is defined this way for simple
+@acronym{LALR}(1) parsers:
@example
@group
in the @code{%token} and @code{%type} declarations to pick one of the types
for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
-Note that, unlike making a @code{union} declaration in C, you do not write
+As an extension to @acronym{POSIX}, a tag is allowed after the
+@code{union}. For example:
+
+@example
+@group
+%union value @{
+ double val;
+ symrec *tptr;
+@}
+@end group
+@end example
+
+specifies the union tag @code{value}, so the corresponding C type is
+@code{union value}. If you do not specify a tag, it defaults to
+@code{YYSTYPE}.
+
+Note that, unlike making a @code{union} declaration in C, you need not write
a semicolon after the closing brace.
@node Type Decl
(@pxref{Parser Function, , The Parser Function @code{yyparse}}).
@strong{Warning:} as of Bison 1.875, this feature is still considered as
-experimental, as there was not enough users feedback. In particular,
+experimental, as there was not enough user feedback. In particular,
the syntax might still change.
@end deffn
Here @var{n} is a decimal integer. The declaration says there should be
no warning if there are @var{n} shift/reduce conflicts and no
-reduce/reduce conflicts. An error, instead of the usual warning, is
+reduce/reduce conflicts. The usual warning is
given if there are either more or fewer conflicts, or if there are any
reduce/reduce conflicts.
number which Bison printed.
@end itemize
-Now Bison will stop annoying you about the conflicts you have checked, but
-it will warn you again if changes in the grammar result in additional
-conflicts.
+Now Bison will stop annoying you if you do not change the number of
+conflicts, but it will warn you again if changes in the grammar result
+in more or fewer conflicts.
@node Start Decl
@subsection The Start-Symbol
Generate an array of token names in the parser file. The name of the
array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
token whose internal Bison token code number is @var{i}. The first
-three elements of @code{yytname} are always @code{"$end"},
+three elements of @code{yytname} correspond to the predefined tokens
+@code{"$end"},
@code{"error"}, and @code{"$undefined"}; after these come the symbols
defined in the grammar file.
@deffn {Directive} %parse-param @{@var{argument-declaration}@}
@findex %parse-param
Declare that an argument declared by @code{argument-declaration} is an
-additional @code{yyparse} argument. This argument is also passed to
-@code{yyerror}. The @var{argument-declaration} is used when declaring
+additional @code{yyparse} argument.
+The @var{argument-declaration} is used when declaring
functions or prototypes. The last identifier in
@var{argument-declaration} must be the argument name.
@end deffn
* Calling Convention:: How @code{yyparse} calls @code{yylex}.
* Token Values:: How @code{yylex} must return the semantic value
of the token it has read.
-* Token Positions:: How @code{yylex} must return the text position
+* Token Locations:: How @code{yylex} must return the text location
(line number, etc.) of the token, if the
actions want that.
* Pure Calling:: How the calling convention differs
@end group
@end example
-@node Token Positions
-@subsection Textual Positions of Tokens
+@node Token Locations
+@subsection Textual Locations of Tokens
@vindex yylloc
If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
@end example
If the grammar file does not use the @samp{@@} constructs to refer to
-textual positions, then the type @code{YYLTYPE} will not be defined. In
+textual locations, then the type @code{YYLTYPE} will not be defined. In
this case, omit the second argument; @code{yylex} will be called with
only one argument.
If @samp{%parse-param @{int *nastiness@}} is used, then:
@example
-void yyerror (int *randomness, char const *msg); /* Yacc parsers. */
-void yyerror (int *randomness, char const *msg); /* GLR parsers. */
+void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */
+void yyerror (int *nastiness, char const *msg); /* GLR parsers. */
@end example
Finally, GLR and Yacc parsers share the same @code{yyerror} calling
@deffn {Value} @@$
@findex @@$
-Acts like a structure variable containing information on the textual position
+Acts like a structure variable containing information on the textual location
of the grouping made by the current rule. @xref{Locations, ,
Tracking Locations}.
@deffn {Value} @@@var{n}
@findex @@@var{n}
-Acts like a structure variable containing information on the textual position
+Acts like a structure variable containing information on the textual location
of the @var{n}th component of the current rule. @xref{Locations, ,
Tracking Locations}.
@end deffn
grammar, in particular, it is only slightly slower than with the default
Bison parser.
+For a more detailed exposition of GLR parsers, please see: Elizabeth
+Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style
+Generalised @acronym{LR} Parsers, Royal Holloway, University of
+London, Department of Computer Science, TR-00-12,
+@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps},
+(2000-12-24).
+
@node Stack Overflow
@section Stack Overflow, and How to Avoid It
@cindex stack overflow
@example
calc.y: warning: 1 useless nonterminal and 1 useless rule
calc.y:11.1-7: warning: useless nonterminal: useless
-calc.y:11.8-12: warning: useless rule: useless: STR
-calc.y contains 7 shift/reduce conflicts.
+calc.y:11.10-12: warning: useless rule: useless: STR
+calc.y: conflicts: 7 shift/reduce
@end example
When given @option{--report=state}, in addition to @file{calc.tab.c}, it
The next section lists states that still have conflicts.
@example
-State 8 contains 1 shift/reduce conflict.
-State 9 contains 1 shift/reduce conflict.
-State 10 contains 1 shift/reduce conflict.
-State 11 contains 4 shift/reduce conflicts.
+State 8 conflicts: 1 shift/reduce
+State 9 conflicts: 1 shift/reduce
+State 10 conflicts: 1 shift/reduce
+State 11 conflicts: 4 shift/reduce
@end example
@noindent
exp go to state 11
@end example
-As was announced in beginning of the report, @samp{State 8 contains 1
-shift/reduce conflict}:
+As was announced in beginning of the report, @samp{State 8 conflicts:
+1 shift/reduce}:
@example
state 8
@noindent
will produce @file{output.c++} and @file{outfile.h++}.
+For compatibility with @acronym{POSIX}, the standard Bison
+distribution also contains a shell script called @command{yacc} that
+invokes Bison with the @option{-y} option.
+
@menu
* Bison Options:: All the options described in detail,
in alphabetical order by short options.
@file{y.tab.c}, and the other outputs are called @file{y.output} and
@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
file name conventions. Thus, the following shell script can substitute
-for Yacc:
+for Yacc, and the Bison distribution contains such a script for
+compatibility with @acronym{POSIX}:
@example
-bison -y $*
+#! /bin/sh
+bison -y "$@"
@end example
@end table
@menu
* Parser Stack Overflow:: Breaking the Stack Limits
+* Strings are Destroyed:: @code{yylval} Loses Track of Strings
@end menu
@node Parser Stack Overflow
This question is already addressed elsewhere, @xref{Recursion,
,Recursive Rules}.
+@node Strings are Destroyed
+@section Strings are Destroyed
+
+@display
+My parser seems to destroy old strings, or maybe it losses track of
+them. Instead of reporting @samp{"foo", "bar"}, it reports
+@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
+@end display
+
+This error is probably the single most frequent ``bug report'' sent to
+Bison lists, but is only concerned with a misunderstanding of the role
+of scanner. Consider the following Lex code:
+
+@verbatim
+%{
+#include <stdio.h>
+char *yylval = NULL;
+%}
+%%
+.* yylval = yytext; return 1;
+\n /* IGNORE */
+%%
+int
+main ()
+{
+ /* Similar to using $1, $2 in a Bison action. */
+ char *fst = (yylex (), yylval);
+ char *snd = (yylex (), yylval);
+ printf ("\"%s\", \"%s\"\n", fst, snd);
+ return 0;
+}
+@end verbatim
+
+If you compile and run this code, you get:
+
+@example
+$ @kbd{flex -osplit-lines.c split-lines.l}
+$ @kbd{gcc -osplit-lines split-lines.c -ll}
+$ @kbd{printf 'one\ntwo\n' | ./split-lines}
+"one
+two", "two"
+@end example
+
+@noindent
+this is because @code{yytext} is a buffer provided for @emph{reading}
+in the action, but if you want to keep it, you have to duplicate it
+(e.g., using @code{strdup}). Note that the output may depend on how
+your implementation of Lex handles @code{yytext}. For instance, when
+given the Lex compatibility option @option{-l} (which triggers the
+option @samp{%array}) Flex generates a different behavior:
+
+@example
+$ @kbd{flex -l -osplit-lines.c split-lines.l}
+$ @kbd{gcc -osplit-lines split-lines.c -ll}
+$ @kbd{printf 'one\ntwo\n' | ./split-lines}
+"two", "two"
+@end example
+
+
@c ================================================= Table of Symbols
@node Table of Symbols
@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
@end deffn
-@deffn {Macro} YYLTYPE
-Macro for the data type of @code{yylloc}; a structure with four
+@deffn {Type} YYLTYPE
+Data type of @code{yylloc}; by default, a structure with four
members. @xref{Location Type, , Data Types of Locations}.
@end deffn
-@deffn {Type} yyltype
-Default value for YYLTYPE.
-@end deffn
-
@deffn {Macro} YYMAXDEPTH
Macro for specifying the maximum size of the parser stack. @xref{Stack
Overflow}.
to anything else.
@end deffn
-@deffn {Macro} YYSTYPE
-Macro for the data type of semantic values; @code{int} by default.
+@deffn {Type} YYSTYPE
+Data type of semantic values; @code{int} by default.
@xref{Value Type, ,Data Types of Semantic Values}.
@end deffn
numbers associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
@code{yylex}.) You can ignore this variable if you don't use the
-@samp{@@} feature in the grammar actions. @xref{Token Positions,
-,Textual Positions of Tokens}.
+@samp{@@} feature in the grammar actions. @xref{Token Locations,
+,Textual Locations of Tokens}.
@end deffn
@deffn {Variable} yynerrs