@c the smallbook format.
@c @smallbook
-@c Set following if you have the new `shorttitlepage' command
-@c @clear shorttitlepage-enabled
-@c @set shorttitlepage-enabled
-
@c Set following if you want to document %default-prec and %no-default-prec.
@c This feature is experimental and may change in future Bison versions.
@c @set defaultprec
-@c ISPELL CHECK: done, 14 Jan 1993 --bob
-
-@c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo
-@c titlepage; should NOT be changed in the GPL. --mew
-
-@c FIXME: I don't understand this `iftex'. Obsolete? --akim.
-@iftex
+@ifnotinfo
@syncodeindex fn cp
@syncodeindex vr cp
@syncodeindex tp cp
-@end iftex
+@end ifnotinfo
@ifinfo
@synindex fn cp
@synindex vr cp
@value{UPDATED}), the @acronym{GNU} parser generator.
Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software
+Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
(a) below. A copy of the license is included in the section entitled
``@acronym{GNU} Free Documentation License.''
-(a) The @acronym{FSF}'s Back-Cover Text is: ``You have freedom to copy
-and modify this @acronym{GNU} Manual, like @acronym{GNU} software.
-Copies published by the Free Software Foundation raise funds for
-@acronym{GNU} development.''
+(a) The @acronym{FSF}'s Back-Cover Text is: ``You have the freedom to copy and
+modify this @acronym{GNU} manual. Buying copies from the @acronym{FSF}
+supports it in developing @acronym{GNU} and promoting software
+freedom.''
@end quotation
@end copying
* bison: (bison). @acronym{GNU} parser generator (Yacc replacement).
@end direntry
-@ifset shorttitlepage-enabled
-@shorttitlepage Bison
-@end ifset
@titlepage
@title Bison
@subtitle The Yacc-compatible Parser Generator
messy for Bison to handle straightforwardly.
* Debugging:: Understanding or debugging Bison parsers.
* Invocation:: How to run Bison (to produce the parser source file).
-* C++ Language Interface:: Creating C++ parser objects.
+* Other Languages:: Creating C++ and Java parsers.
* FAQ:: Frequently Asked Questions
* Table of Symbols:: All the keywords of the Bison language are explained.
* Glossary:: Basic concepts are explained.
Writing @acronym{GLR} Parsers
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
+* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
Examples
Outline of a Bison Grammar
* Prologue:: Syntax and usage of the prologue.
+* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
* Bison Declarations:: Syntax and usage of the Bison declarations section.
* Grammar Rules:: Syntax and usage of the grammar rules section.
* Epilogue:: Syntax and usage of the epilogue.
* Expect Decl:: Suppressing warnings about parsing conflicts.
* Start Decl:: Specifying the start symbol.
* Pure Decl:: Requesting a reentrant parser.
+* Push Decl:: Requesting a push parser.
* Decl Summary:: Table of all Bison declarations.
Parser C-Language Interface
The Bison Parser Algorithm
-* Look-Ahead:: Parser looks one token ahead when deciding what to do.
+* Lookahead:: Parser looks one token ahead when deciding what to do.
* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
* Precedence:: Operator precedence works by resolving conflicts.
* Contextual Precedence:: When an operator's precedence depends on context.
* Option Cross Key:: Alphabetical list of long options.
* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}.
-C++ Language Interface
+Parsers Written In Other Languages
* C++ Parsers:: The interface to generate C++ parser classes
-* A Complete C++ Example:: Demonstrating their use
+* Java Parsers:: The interface to generate Java parser classes
C++ Parsers
* C++ Location Values:: The position and location classes
* C++ Parser Interface:: Instantiating and running the parser
* C++ Scanner Interface:: Exchanges between yylex and parse
+* A Complete C++ Example:: Demonstrating their use
A Complete C++ Example
* Calc++ Scanner:: A pure C++ Flex scanner
* Calc++ Top Level:: Conducting the band
+Java Parsers
+
+* Java Bison Interface:: Asking for Java parser generation
+* Java Semantic Values:: %type and %token vs. Java
+* Java Location Values:: The position and location classes
+* Java Parser Interface:: Instantiating and running the parser
+* Java Scanner Interface:: Java scanners, and pure parsers
+* Java Differences:: Differences between C/C++ and Java Grammars
+
Frequently Asked Questions
* Memory Exhausted:: Breaking the Stack Limits
* How Can I Reset the Parser:: @code{yyparse} Keeps some State
* Strings are Destroyed:: @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops:: Control Flow in the Calculator
+* Multiple start-symbols:: Factoring closely related grammars
+* Secure? Conform?:: Is Bison @acronym{POSIX} safe?
+* I can't build Bison:: Troubleshooting
+* Where can I find help?:: Troubleshooting
+* Bug Reports:: Troublereporting
+* Other Languages:: Parsers in Java and others
+* Beta Testing:: Experimenting with development versions
+* Mailing Lists:: Meeting other Bison users
Copying This Manual
-* GNU Free Documentation License:: License for copying this manual.
+* Copying This Manual:: License for copying this manual.
@end detailmenu
@end menu
@unnumbered Introduction
@cindex introduction
-@dfn{Bison} is a general-purpose parser generator that converts a
-grammar description for an @acronym{LALR}(1) context-free grammar into a C
-program to parse that grammar. Once you are proficient with Bison,
-you may use it to develop a wide range of language parsers, from those
+@dfn{Bison} is a general-purpose parser generator that converts an
+annotated context-free grammar into an @acronym{LALR}(1) or
+@acronym{GLR} parser for that grammar. Once you are proficient with
+Bison, you can use it to develop a wide range of language parsers, from those
used in simple desk calculators to complex programming languages.
Bison is upward compatible with Yacc: all properly-written Yacc grammars
ought to work with Bison with no change. Anyone familiar with Yacc
should be able to use Bison with little trouble. You need to be fluent in
-C programming in order to use Bison or to understand this manual.
+C or C++ programming in order to use Bison or to understand this manual.
We begin with tutorial chapters that explain the basic concepts of using
Bison and show three explained examples, each building on the last. If you
@node Conditions
@unnumbered Conditions for Using Bison
-As of Bison version 1.24, we have changed the distribution terms for
-@code{yyparse} to permit using Bison's output in nonfree programs when
-Bison is generating C code for @acronym{LALR}(1) parsers. Formerly, these
+The distribution terms for Bison-generated parsers permit using the
+parsers in nonfree programs. Before Bison version 2.2, these extra
+permissions applied only when Bison was generating @acronym{LALR}(1)
+parsers in C@. And before Bison version 1.24, Bison-generated
parsers could be used only in programs that were free software.
The other @acronym{GNU} programming tools, such as the @acronym{GNU} C
The output of the Bison utility---the Bison parser file---contains a
verbatim copy of a sizable piece of Bison, which is the code for the
-@code{yyparse} function. (The actions from your grammar are inserted
-into this function at one point, but the rest of the function is not
-changed.) When we applied the @acronym{GPL} terms to the code for
-@code{yyparse},
+parser's implementation. (The actions from your grammar are inserted
+into this implementation at one point, but most of the rest of the
+implementation is not changed.) When we applied the @acronym{GPL}
+terms to the skeleton code for the parser's implementation,
the effect was to restrict the use of Bison output to free software.
We didn't change the terms because of sympathy for people who want to
practical conditions for using Bison match the practical conditions for
using the other @acronym{GNU} tools.
-This exception applies only when Bison is generating C code for an
-@acronym{LALR}(1) parser; otherwise, the @acronym{GPL} terms operate
-as usual. You can
-tell whether the exception applies to your @samp{.c} output file by
-inspecting it to see whether it says ``As a special exception, when
-this file is copied by Bison into a Bison output file, you may use
-that output file without restriction.''
+This exception applies when Bison is generating code for a parser.
+You can tell whether the exception applies to a Bison output file by
+inspecting the file for text beginning with ``As a special
+exception@dots{}''. The text spells out the exact terms of the
+exception.
-@include gpl.texi
+@node Copying
+@unnumbered GNU GENERAL PUBLIC LICENSE
+@include gpl-3.0.texi
@node Concepts
@chapter The Concepts of Bison
are called @acronym{LALR}(1) grammars.
In brief, in these grammars, it must be possible to
tell how to parse any portion of an input string with just a single
-token of look-ahead. Strictly speaking, that is a description of an
+token of lookahead. Strictly speaking, that is a description of an
@acronym{LR}(1) grammar, and @acronym{LALR}(1) involves additional
restrictions that are
hard to explain simply; but it is rare in actual practice to find an
Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning
roughly that the next grammar rule to apply at any point in the input is
uniquely determined by the preceding input and a fixed, finite portion
-(called a @dfn{look-ahead}) of the remaining input. A context-free
+(called a @dfn{lookahead}) of the remaining input. A context-free
grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
apply the grammar rules to get the same inputs. Even unambiguous
grammars can be @dfn{nondeterministic}, meaning that no fixed
-look-ahead always suffices to determine the next grammar rule to apply.
+lookahead always suffices to determine the next grammar rule to apply.
With the proper declarations, Bison is also able to parse these more
general context-free grammars, using a technique known as @acronym{GLR}
parsing (for Generalized @acronym{LR}). Bison's @acronym{GLR} parsers
square (int x) /* @r{identifier, open-paren, keyword `int',}
@r{identifier, close-paren} */
@{ /* @r{open-brace} */
- return x * x; /* @r{keyword `return', identifier, asterisk,
- identifier, semicolon} */
+ return x * x; /* @r{keyword `return', identifier, asterisk,}
+ @r{identifier, semicolon} */
@} /* @r{close-brace} */
@end example
@end ifinfo
merged result.
@menu
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
+* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
@end menu
@node Simple GLR Parsers
In the simplest cases, you can use the @acronym{GLR} algorithm
to parse grammars that are unambiguous, but fail to be @acronym{LALR}(1).
-Such grammars typically require more than one symbol of look-ahead,
+Such grammars typically require more than one symbol of lookahead,
or (in rare cases) fall into the category of grammars in which the
@acronym{LALR}(1) algorithm throws away too much information (they are in
@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}).
valid, and more-complicated cases can come up in practical programs.)
These two declarations look identical until the @samp{..} token.
-With normal @acronym{LALR}(1) one-token look-ahead it is not
+With normal @acronym{LALR}(1) one-token lookahead it is not
possible to decide between the two forms when the identifier
@samp{a} is parsed. It is, however, desirable
for a parser to decide this, since in the latter case
The effect of all this is that the parser seems to ``guess'' the
correct branch to take, or in other words, it seems to use more
-look-ahead than the underlying @acronym{LALR}(1) algorithm actually allows
+lookahead than the underlying @acronym{LALR}(1) algorithm actually allows
for. In this example, @acronym{LALR}(2) would suffice, but also some cases
that are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way.
limited syntax above, transparently. In fact, the user does not even
notice when the parser splits.
-So here we have a case where we can use the benefits of @acronym{GLR}, almost
-without disadvantages. Even in simple cases like this, however, there
-are at least two potential problems to beware.
-First, always analyze the conflicts reported by
-Bison to make sure that @acronym{GLR} splitting is only done where it is
-intended. A @acronym{GLR} parser splitting inadvertently may cause
-problems less obvious than an @acronym{LALR} parser statically choosing the
-wrong alternative in a conflict.
-Second, consider interactions with the lexer (@pxref{Semantic Tokens})
-with great care. Since a split parser consumes tokens
-without performing any actions during the split, the lexer cannot
-obtain information via parser actions. Some cases of
-lexer interactions can be eliminated by using @acronym{GLR} to
-shift the complications from the lexer to the parser. You must check
-the remaining cases for correctness.
-
-In our example, it would be safe for the lexer to return tokens
-based on their current meanings in some symbol table, because no new
-symbols are defined in the middle of a type declaration. Though it
-is possible for a parser to define the enumeration
-constants as they are parsed, before the type declaration is
-completed, it actually makes no difference since they cannot be used
-within the same enumerated type declaration.
+So here we have a case where we can use the benefits of @acronym{GLR},
+almost without disadvantages. Even in simple cases like this, however,
+there are at least two potential problems to beware. First, always
+analyze the conflicts reported by Bison to make sure that @acronym{GLR}
+splitting is only done where it is intended. A @acronym{GLR} parser
+splitting inadvertently may cause problems less obvious than an
+@acronym{LALR} parser statically choosing the wrong alternative in a
+conflict. Second, consider interactions with the lexer (@pxref{Semantic
+Tokens}) with great care. Since a split parser consumes tokens without
+performing any actions during the split, the lexer cannot obtain
+information via parser actions. Some cases of lexer interactions can be
+eliminated by using @acronym{GLR} to shift the complications from the
+lexer to the parser. You must check the remaining cases for
+correctness.
+
+In our example, it would be safe for the lexer to return tokens based on
+their current meanings in some symbol table, because no new symbols are
+defined in the middle of a type declaration. Though it is possible for
+a parser to define the enumeration constants as they are parsed, before
+the type declaration is completed, it actually makes no difference since
+they cannot be used within the same enumerated type declaration.
@node Merging GLR Parses
@subsection Using @acronym{GLR} to Resolve Ambiguities
and the parser will report an error during any parse that results in
the offending merge.
+@node GLR Semantic Actions
+@subsection @acronym{GLR} Semantic Actions
+
+@cindex deferred semantic actions
+By definition, a deferred semantic action is not performed at the same time as
+the associated reduction.
+This raises caveats for several Bison features you might use in a semantic
+action in a @acronym{GLR} parser.
+
+@vindex yychar
+@cindex @acronym{GLR} parsers and @code{yychar}
+@vindex yylval
+@cindex @acronym{GLR} parsers and @code{yylval}
+@vindex yylloc
+@cindex @acronym{GLR} parsers and @code{yylloc}
+In any semantic action, you can examine @code{yychar} to determine the type of
+the lookahead token present at the time of the associated reduction.
+After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF},
+you can then examine @code{yylval} and @code{yylloc} to determine the
+lookahead token's semantic value and location, if any.
+In a nondeferred semantic action, you can also modify any of these variables to
+influence syntax analysis.
+@xref{Lookahead, ,Lookahead Tokens}.
+
+@findex yyclearin
+@cindex @acronym{GLR} parsers and @code{yyclearin}
+In a deferred semantic action, it's too late to influence syntax analysis.
+In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
+shallow copies of the values they had at the time of the associated reduction.
+For this reason alone, modifying them is dangerous.
+Moreover, the result of modifying them is undefined and subject to change with
+future versions of Bison.
+For example, if a semantic action might be deferred, you should never write it
+to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
+memory referenced by @code{yylval}.
+
+@findex YYERROR
+@cindex @acronym{GLR} parsers and @code{YYERROR}
+Another Bison feature requiring special consideration is @code{YYERROR}
+(@pxref{Action Features}), which you can invoke in a semantic action to
+initiate error recovery.
+During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is
+the same as its effect in an @acronym{LALR}(1) parser.
+In a deferred semantic action, its effect is undefined.
+@c The effect is probably a syntax error at the split point.
+
+Also, see @ref{Location Default Action, ,Default Action for Locations}, which
+describes a special usage of @code{YYLLOC_DEFAULT} in @acronym{GLR} parsers.
+
@node Compiler Requirements
@subsection Considerations when Compiling @acronym{GLR} Parsers
@cindex @code{inline}
desk-top calculator.
These examples are simple, but Bison grammars for real programming
-languages are written the same way.
-@ifinfo
-You can copy these examples out of the Info file and into a source file
-to try them.
-@end ifinfo
+languages are written the same way. You can copy these examples into a
+source file to try them.
@menu
* RPN Calc:: Reverse polish notation calculator;
The groupings of the rpcalc ``language'' defined here are the expression
(given the name @code{exp}), the line of input (@code{line}), and the
complete input transcript (@code{input}). Each of these nonterminal
-symbols has several alternate rules, joined by the @samp{|} punctuator
+symbols has several alternate rules, joined by the vertical bar @samp{|}
which is read as ``or''. The following sections explain what these rules
mean.
by default (@pxref{Location Type, ,Data Types of Locations}), which is a
four member structure with the following integer fields:
@code{first_line}, @code{first_column}, @code{last_line} and
-@code{last_column}.
+@code{last_column}. By convention, and in accordance with the GNU
+Coding Standards and common practice, the line and column count both
+start at 1.
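As a minimal sketch of this convention, the following C fragment tracks a location whose line and column counts both start at 1. The type @code{location} and the helpers @code{location_init} and @code{location_advance} are hypothetical names used only for illustration; they are not part of Bison. The sketch also assumes that @code{last_column} names the column just past the end of the token, which is one possible reading and not something this manual prescribes.

```c
/* Hypothetical stand-in for the default location type: the same four
   integer fields, under a name of our own.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Start a fresh location at line 1, column 1.  */
static void
location_init (location *loc)
{
  loc->first_line = loc->last_line = 1;
  loc->first_column = loc->last_column = 1;
}

/* Make LOC span TEXT, beginning where the previous token ended.
   Assumption: last_column is the column just past the token.  */
static void
location_advance (location *loc, const char *text)
{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
  for (; *text != '\0'; ++text)
    {
      if (*text == '\n')
        {
          /* A newline moves to the next line and resets the column to 1.  */
          ++loc->last_line;
          loc->last_column = 1;
        }
      else
        ++loc->last_column;
    }
}
```

A lexer written along these lines would call @code{location_init} once before the first token and @code{location_advance} with the text of each token it returns.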
@node Ltcalc Rules
@subsection Grammar Rules for @code{ltcalc}
@}
@end group
@group
- | '-' exp %preg NEG @{ $$ = -$2; @}
+ | '-' exp %prec NEG @{ $$ = -$2; @}
| exp '^' exp @{ $$ = pow ($1, $3); @}
| '(' exp ')' @{ $$ = $2; @}
@end group
@menu
* Prologue:: Syntax and usage of the prologue.
+* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
* Bison Declarations:: Syntax and usage of the Bison declarations section.
* Grammar Rules:: Syntax and usage of the grammar rules section.
* Epilogue:: Syntax and usage of the epilogue.
@cindex Prologue
@cindex declarations
-The @var{Prologue} section contains macro definitions and
-declarations of functions and variables that are used in the actions in the
-grammar rules. These are copied to the beginning of the parser file so
-that they precede the definition of @code{yyparse}. You can use
-@samp{#include} to get the declarations from a header file. If you don't
-need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
-delimiters that bracket this section.
+The @var{Prologue} section contains macro definitions and declarations
+of functions and variables that are used in the actions in the grammar
+rules. These are copied to the beginning of the parser file so that
+they precede the definition of @code{yyparse}. You can use
+@samp{#include} to get the declarations from a header file. If you
+don't need any C declarations, you may omit the @samp{%@{} and
+@samp{%@}} delimiters that bracket this section.
+
+The @var{Prologue} section is terminated by the first occurrence
+of @samp{%@}} that is outside a comment, a string literal, or a
+character constant.
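The termination rule can be illustrated with a small C sketch that searches a prologue body for the closing @samp{%@}} while skipping comments, string literals, and character constants. The function name @code{find_prologue_end} is hypothetical, and the code is only an illustration of the rule as stated here, not Bison's actual scanner. Treating @samp{//} line comments as comments is an additional assumption.

```c
#include <stddef.h>

/* Return a pointer to the `%}' that terminates a prologue whose body
   starts at S, or NULL if there is none.  Occurrences inside comments,
   string literals, and character constants are skipped.
   Hypothetical helper: an illustration, not Bison's scanner.  */
static const char *
find_prologue_end (const char *s)
{
  while (*s != '\0')
    {
      if (s[0] == '%' && s[1] == '}')
        return s;
      if (s[0] == '/' && s[1] == '*')
        {
          /* Block comment: skip past the closing star-slash.  */
          s += 2;
          while (*s != '\0' && !(s[0] == '*' && s[1] == '/'))
            ++s;
          if (*s != '\0')
            s += 2;
        }
      else if (s[0] == '/' && s[1] == '/')
        {
          /* Line comment (assumed to count as a comment): skip to
             the end of the line.  */
          while (*s != '\0' && *s != '\n')
            ++s;
        }
      else if (*s == '"' || *s == '\'')
        {
          /* String literal or character constant: skip to the
             matching quote, honoring backslash escapes.  */
          char quote = *s++;
          while (*s != '\0' && *s != quote)
            {
              if (*s == '\\' && s[1] != '\0')
                ++s;
              ++s;
            }
          if (*s != '\0')
            ++s;
        }
      else
        ++s;
    }
  return NULL;
}
```

Given the body of a prologue that contains @samp{%@}} inside a comment or a string, the function skips those occurrences and returns the position of the first one that lies in ordinary code.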
You may have more than one @var{Prologue} section, intermixed with the
@var{Bison declarations}. This allows you to have C and Bison
@smallexample
%@{
+ #define _GNU_SOURCE
+ #include <stdio.h>
+ #include "ptypes.h"
+%@}
+
+%union @{
+ long int n;
+ tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
+@}
+
+%@{
+ static void print_token_value (FILE *, int, YYSTYPE);
+ #define YYPRINT(F, N, L) print_token_value (F, N, L)
+%@}
+
+@dots{}
+@end smallexample
+
+When in doubt, it is usually safer to put prologue code before all
+Bison declarations, rather than after. For example, any definitions
+of feature test macros like @code{_GNU_SOURCE} or
+@code{_POSIX_C_SOURCE} should appear before all Bison declarations, as
+feature test macros can affect the behavior of Bison-generated
+@code{#include} directives.
+
+@node Prologue Alternatives
+@subsection Prologue Alternatives
+@cindex Prologue Alternatives
+
+@findex %code
+@findex %code requires
+@findex %code provides
+@findex %code top
+(The prologue alternatives described here are experimental.
+More user feedback will help to determine whether they should become permanent
+features.)
+
+The functionality of @var{Prologue} sections can often be subtle and
+inflexible.
+As an alternative, Bison provides a @code{%code} directive with an explicit
+qualifier field, which identifies the purpose of the code and thus the
+location(s) where Bison should generate it.
+For C/C++, the qualifier can be omitted for the default location, or it can be
+one of @code{requires}, @code{provides}, or @code{top}.
+@xref{Decl Summary,,%code}.
+
+Look again at the example of the previous section:
+
+@smallexample
+%@{
+ #define _GNU_SOURCE
#include <stdio.h>
#include "ptypes.h"
%@}
@dots{}
@end smallexample
+@noindent
+Notice that there are two @var{Prologue} sections here, but there's a subtle
+distinction between their functionality.
+For example, if you decide to override Bison's default definition for
+@code{YYLTYPE}, in which @var{Prologue} section should you write your new
+definition?
+You should write it in the first since Bison will insert that code into the
+parser source code file @emph{before} the default @code{YYLTYPE} definition.
+In which @var{Prologue} section should you prototype an internal function,
+@code{trace_token}, that accepts @code{YYLTYPE} and @code{yytokentype} as
+arguments?
+You should prototype it in the second since Bison will insert that code
+@emph{after} the @code{YYLTYPE} and @code{yytokentype} definitions.
+
+This distinction in functionality between the two @var{Prologue} sections is
+established by the appearance of the @code{%union} between them.
+This behavior raises a few questions.
+First, why should the position of a @code{%union} affect definitions related to
+@code{YYLTYPE} and @code{yytokentype}?
+Second, what if there is no @code{%union}?
+In that case, the second kind of @var{Prologue} section is not available.
+This behavior is not intuitive.
+
+To avoid this subtle @code{%union} dependency, rewrite the example using a
+@code{%code top} and an unqualified @code{%code}.
+Let's go ahead and add the new @code{YYLTYPE} definition and the
+@code{trace_token} prototype at the same time:
+
+@smallexample
+%code top @{
+ #define _GNU_SOURCE
+ #include <stdio.h>
+
+ /* WARNING: The following code really belongs
+ * in a `%code requires'; see below. */
+
+ #include "ptypes.h"
+ #define YYLTYPE YYLTYPE
+ typedef struct YYLTYPE
+ @{
+ int first_line;
+ int first_column;
+ int last_line;
+ int last_column;
+ char *filename;
+ @} YYLTYPE;
+@}
+
+%union @{
+ long int n;
+ tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
+@}
+
+%code @{
+ static void print_token_value (FILE *, int, YYSTYPE);
+ #define YYPRINT(F, N, L) print_token_value (F, N, L)
+ static void trace_token (enum yytokentype token, YYLTYPE loc);
+@}
+
+@dots{}
+@end smallexample
+
+@noindent
+In this way, @code{%code top} and the unqualified @code{%code} achieve the same
+functionality as the two kinds of @var{Prologue} sections, but it's always
+explicit which kind you intend.
+Moreover, both kinds are always available even in the absence of @code{%union}.
+
+The @code{%code top} block above logically contains two parts.
+The first two lines before the warning need to appear near the top of the
+parser source code file.
+The first line after the warning is required by @code{YYSTYPE} and thus also
+needs to appear in the parser source code file.
+However, if you've instructed Bison to generate a parser header file
+(@pxref{Decl Summary, ,%defines}), you probably want that line to appear before
+the @code{YYSTYPE} definition in that header file as well.
+The @code{YYLTYPE} definition should also appear in the parser header file to
+override the default @code{YYLTYPE} definition there.
+
+In other words, in the @code{%code top} block above, all but the first two
+lines are dependency code required by the @code{YYSTYPE} and @code{YYLTYPE}
+definitions.
+Thus, they belong in one or more @code{%code requires}:
+
+@smallexample
+%code top @{
+ #define _GNU_SOURCE
+ #include <stdio.h>
+@}
+
+%code requires @{
+ #include "ptypes.h"
+@}
+%union @{
+ long int n;
+ tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
+@}
+
+%code requires @{
+ #define YYLTYPE YYLTYPE
+ typedef struct YYLTYPE
+ @{
+ int first_line;
+ int first_column;
+ int last_line;
+ int last_column;
+ char *filename;
+ @} YYLTYPE;
+@}
+
+%code @{
+ static void print_token_value (FILE *, int, YYSTYPE);
+ #define YYPRINT(F, N, L) print_token_value (F, N, L)
+ static void trace_token (enum yytokentype token, YYLTYPE loc);
+@}
+
+@dots{}
+@end smallexample
+
+@noindent
+Now Bison will insert @code{#include "ptypes.h"} and the new @code{YYLTYPE}
+definition before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE}
+definitions in both the parser source code file and the parser header file.
+(By the same reasoning, @code{%code requires} would also be the appropriate
+place to write your own definition for @code{YYSTYPE}.)
+
+When you are writing dependency code for @code{YYSTYPE} and @code{YYLTYPE}, you
+should prefer @code{%code requires} over @code{%code top} regardless of whether
+you instruct Bison to generate a parser header file.
+When you are writing code that you need Bison to insert only into the parser
+source code file and that has no special need to appear at the top of that
+file, you should prefer the unqualified @code{%code} over @code{%code top}.
+These practices will make the purpose of each block of your code explicit to
+Bison and to other developers reading your grammar file.
+Following these practices, we expect the unqualified @code{%code} and
+@code{%code requires} to be the most important of the four @var{Prologue}
+alternatives.
+
+At some point while developing your parser, you might decide to provide
+@code{trace_token} to modules that are external to your parser.
+Thus, you might wish for Bison to insert the prototype into both the parser
+header file and the parser source code file.
+Since this function is not a dependency required by @code{YYSTYPE} or
+@code{YYLTYPE}, it doesn't make sense to move its prototype to a
+@code{%code requires}.
+More importantly, since it depends upon @code{YYLTYPE} and @code{yytokentype},
+@code{%code requires} is not sufficient.
+Instead, move its prototype from the unqualified @code{%code} to a
+@code{%code provides}:
+
+@smallexample
+%code top @{
+ #define _GNU_SOURCE
+ #include <stdio.h>
+@}
+
+%code requires @{
+ #include "ptypes.h"
+@}
+%union @{
+ long int n;
+ tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
+@}
+
+%code requires @{
+ #define YYLTYPE YYLTYPE
+ typedef struct YYLTYPE
+ @{
+ int first_line;
+ int first_column;
+ int last_line;
+ int last_column;
+ char *filename;
+ @} YYLTYPE;
+@}
+
+%code provides @{
+ void trace_token (enum yytokentype token, YYLTYPE loc);
+@}
+
+%code @{
+ static void print_token_value (FILE *, int, YYSTYPE);
+ #define YYPRINT(F, N, L) print_token_value (F, N, L)
+@}
+
+@dots{}
+@end smallexample
+
+@noindent
+Bison will insert the @code{trace_token} prototype into both the parser header
+file and the parser source code file after the definitions for
+@code{yytokentype}, @code{YYLTYPE}, and @code{YYSTYPE}.
+
+The above examples are careful to write directives in an order that reflects
+the layout of the generated parser source code and header files:
+@code{%code top}, @code{%code requires}, @code{%code provides}, and then
+@code{%code}.
+While your grammar files may generally be easier to read if you also follow
+this order, Bison does not require it.
+Instead, Bison lets you choose an organization that makes sense to you.
+
+You may declare any of these directives multiple times in the grammar file.
+In that case, Bison concatenates the contained code in declaration order.
+This is the only way in which the position of one of these directives within
+the grammar file affects its functionality.
+
+The result of the previous two properties is greater flexibility in how you may
+organize your grammar file.
+For example, you may organize semantic-type-related directives by semantic
+type:
+
+@smallexample
+%code requires @{ #include "type1.h" @}
+%union @{ type1 field1; @}
+%destructor @{ type1_free ($$); @} <field1>
+%printer @{ type1_print ($$); @} <field1>
+
+%code requires @{ #include "type2.h" @}
+%union @{ type2 field2; @}
+%destructor @{ type2_free ($$); @} <field2>
+%printer @{ type2_print ($$); @} <field2>
+@end smallexample
+
+@noindent
+You could even place each of the above directive groups in the rules section of
+the grammar file next to the set of rules that uses the associated semantic
+type.
+(In the rules section, you must terminate each of those directives with a
+semicolon.)
+And you don't have to worry that some directive (like a @code{%union}) in the
+definitions section is going to adversely affect their functionality in some
+counter-intuitive manner just because it comes first.
+Such an organization is not possible using @var{Prologue} sections.
+
+This section has been concerned with explaining the advantages of the four
+@var{Prologue} alternatives over the original Yacc @var{Prologue}.
+However, in most cases when using these directives, you shouldn't need to
+think about all the low-level ordering issues discussed here.
+Instead, you should simply use these directives to label each block of your
+code according to its purpose and let Bison handle the ordering.
+@code{%code} is the most generic label.
+Move code to @code{%code requires}, @code{%code provides}, or @code{%code top}
+as needed.
+
@node Bison Declarations
@subsection The Bison Declarations Section
@cindex Bison declarations (introduction)
If the last section is empty, you may omit the @samp{%%} that separates it
from the grammar rules.
-The Bison parser itself contains many macros and identifiers whose
-names start with @samp{yy} or @samp{YY}, so it is a
-good idea to avoid using any such names (except those documented in this
-manual) in the epilogue of the grammar file.
+The Bison parser itself contains many macros and identifiers whose names
+start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
+any such names (except those documented in this manual) in the epilogue
+of the grammar file.
@node Symbols
@section Symbols, Terminal and Nonterminal
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the @code{yylex}
-function returns a token type code to indicate what kind of token has been
-read. You don't need to know what the code value is; you can use the
-symbol to stand for it.
+function returns a token type code to indicate what kind of token has
+been read. You don't need to know what the code value is; you can use
+the symbol to stand for it.
-A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
-groupings. The symbol name is used in writing grammar rules. By convention,
-it should be all lower case.
+A @dfn{nonterminal symbol} stands for a class of syntactically
+equivalent groupings. The symbol name is used in writing grammar rules.
+By convention, it should be all lower case.
Symbol names can contain letters, digits (not at the beginning),
underscores and periods. Periods make sense only in nonterminals.
"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~"
@end example
-The @code{yylex} function and Bison must use a consistent character
-set and encoding for character tokens. For example, if you run Bison in an
-@acronym{ASCII} environment, but then compile and run the resulting program
-in an environment that uses an incompatible character set like
-@acronym{EBCDIC}, the resulting program may not work because the
-tables generated by Bison will assume @acronym{ASCII} numeric values for
-character tokens. It is standard
-practice for software distributions to contain C source files that
-were generated by Bison in an @acronym{ASCII} environment, so installers on
-platforms that are incompatible with @acronym{ASCII} must rebuild those
-files before compiling them.
+The @code{yylex} function and Bison must use a consistent character set
+and encoding for character tokens. For example, if you run Bison in an
+@acronym{ASCII} environment, but then compile and run the resulting
+program in an environment that uses an incompatible character set like
+@acronym{EBCDIC}, the resulting program may not work because the tables
+generated by Bison will assume @acronym{ASCII} numeric values for
+character tokens. It is standard practice for software distributions to
+contain C source files that were generated by Bison in an
+@acronym{ASCII} environment, so installers on platforms that are
+incompatible with @acronym{ASCII} must rebuild those files before
+compiling them.
The symbol @code{error} is a terminal symbol reserved for error recovery
(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
@end example
@noindent
+@cindex braced code
+This is an example of @dfn{braced code}, that is, C code surrounded by
+braces, much like a compound statement in C@. Braced code can contain
+any sequence of C tokens, so long as its braces are balanced. Bison
+does not check the braced code for correctness directly; it merely
+copies the code to the output file, where the C compiler can check it.
+
+Within braced code, the balanced-brace count is not affected by braces
+within comments, string literals, or character constants, but it is
+affected by the C digraphs @samp{<%} and @samp{%>} that represent
+braces. At the top level braced code must be terminated by @samp{@}}
+and not by a digraph. Bison does not look for trigraphs, so if braced
+code uses trigraphs you should ensure that they do not affect the
+nesting of braces or the boundaries of comments, string literals, or
+character constants.
+
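The brace-counting rules above can be sketched in C as follows.
This is only an illustration, not Bison's actual scanner: the function name
@code{braced_code_end} is hypothetical, trigraphs and backslash-newline
splices are ignored, and a closing digraph at the top level is simply
rejected, which is an assumption of this sketch.

```c
#include <stddef.h>

/* Return the index just past the '}' that closes the braced code that
   starts at s[0] == '{', or 0 if no such brace is found.
   Hypothetical helper for illustration; it is not part of Bison.  */
size_t
braced_code_end (const char *s)
{
  size_t i = 0;
  int depth = 0;

  if (s[0] != '{')
    return 0;

  while (s[i] != '\0')
    {
      char c = s[i];

      if (c == '/' && s[i + 1] == '*')
        {
          /* Block comment: braces inside it do not count.  */
          i += 2;
          while (s[i] != '\0' && !(s[i] == '*' && s[i + 1] == '/'))
            i++;
          if (s[i] == '\0')
            return 0;
          i += 2;
        }
      else if (c == '/' && s[i + 1] == '/')
        {
          /* Line comment: skip to the end of the line.  */
          while (s[i] != '\0' && s[i] != '\n')
            i++;
        }
      else if (c == '"' || c == '\'')
        {
          /* String literal or character constant: skip to the matching
             quote, stepping over backslash escapes.  */
          i++;
          while (s[i] != '\0' && s[i] != c)
            {
              if (s[i] == '\\' && s[i + 1] != '\0')
                i++;
              i++;
            }
          if (s[i] == '\0')
            return 0;
          i++;
        }
      else if (c == '{' || (c == '<' && s[i + 1] == '%'))
        {
          /* An opening brace or its digraph deepens the nesting.  */
          depth++;
          i += (c == '{') ? 1 : 2;
        }
      else if (c == '}')
        {
          i++;
          if (--depth == 0)
            return i;           /* Only '}' may end the top level.  */
        }
      else if (c == '%' && s[i + 1] == '>')
        {
          /* A closing digraph may end a nested level only.  Rejecting it
             at the top level is a simplification made by this sketch.  */
          if (depth <= 1)
            return 0;
          depth--;
          i += 2;
        }
      else
        i++;
    }
  return 0;
}
```

For instance, given the text @samp{@{ s = "@}"; @} tail}, the function skips
the brace inside the string literal and stops just after the second closing
brace.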
Usually there is only one action and it follows the components.
@xref{Actions}.
Multiple rules for the same @var{result} can be written separately or can
be joined with the vertical-bar character @samp{|} as follows:
-@ifinfo
-@example
-@var{result}: @var{rule1-components}@dots{}
- | @var{rule2-components}@dots{}
- @dots{}
- ;
-@end example
-@end ifinfo
-@iftex
@example
@group
@var{result}: @var{rule1-components}@dots{}
;
@end group
@end example
-@end iftex
@noindent
They are still considered distinct rules even when joined in this way.
@section Recursive Rules
@cindex recursive rule
-A rule is called @dfn{recursive} when its @var{result} nonterminal appears
-also on its right hand side. Nearly all Bison grammars need to use
-recursion, because that is the only way to define a sequence of any number
-of a particular thing. Consider this recursive definition of a
+A rule is called @dfn{recursive} when its @var{result} nonterminal
+appears also on its right hand side. Nearly all Bison grammars need to
+use recursion, because that is the only way to define a sequence of any
+number of a particular thing. Consider this recursive definition of a
comma-separated sequence of one or more expressions:
@example
@acronym{RPN} and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish
Notation Calculator}).
-Bison's default is to use type @code{int} for all semantic values. To
+Bison normally uses the type @code{int} for semantic values if your
+program uses the same data type for all language constructs. To
specify some other type, define @code{YYSTYPE} as a macro, like this:
@example
@end example
@noindent
+@code{YYSTYPE}'s replacement list should be a type name
+that does not contain parentheses or square brackets.
This macro definition must go in the prologue of the grammar file
(@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
In most programs, you will need different data types for different kinds
of tokens and groupings. For example, a numeric constant may need type
-@code{int} or @code{long int}, while a string constant needs type @code{char *},
-and an identifier might need a pointer to an entry in the symbol table.
+@code{int} or @code{long int}, while a string constant needs type
+@code{char *}, and an identifier might need a pointer to an entry in the
+symbol table.
To use more than one data type for semantic values in one parser, Bison
requires you to do two things:
@itemize @bullet
@item
-Specify the entire collection of possible data types, with the
+Specify the entire collection of possible data types, either by using the
@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of
-Value Types}).
+Value Types}), or by using a @code{typedef} or a @code{#define} to
+define @code{YYSTYPE} to be a union type whose member names are
+the type tags.
@item
Choose one of those types for each symbol (terminal or nonterminal) for
is to compute a semantic value for the grouping built by the rule from the
semantic values associated with tokens or smaller groupings.
-An action consists of C statements surrounded by braces, much like a
-compound statement in C@. An action can contain any sequence of C
-statements. Bison does not look for trigraphs, though, so if your C
-code uses trigraphs you should ensure that they do not affect the
-nesting of braces or the boundaries of comments, strings, or character
-literals.
-
-An action can be placed at any position in the rule;
+An action consists of braced code containing C statements, and can be
+placed at any position in the rule;
it is executed at that position. Most rules have just one action at the
end of the rule, following all the components. Actions in the middle of
a rule are tricky and used only for special purposes (@pxref{Mid-Rule
always refers to the @code{expr} which precedes @code{bar} in the
definition of @code{foo}.
+@vindex yylval
+It is also possible to access the semantic value of the lookahead token, if
+any, from a semantic action.
+This semantic value is stored in @code{yylval}.
+@xref{Action Features, ,Special Features for Use in Actions}.
+
@node Action Types
@subsection Data Types of Values in Actions
@cindex action data types
removes the temporary @code{let}-variable from the list so that it won't
appear to exist while the rest of the program is parsed.
+@findex %destructor
+@cindex discarded symbols, mid-rule actions
+@cindex error recovery, mid-rule actions
+In the above example, if the parser initiates error recovery (@pxref{Error
+Recovery}) while parsing the tokens in the embedded statement @code{stmt},
+it might discard the previous semantic context @code{$<context>5} without
+restoring it.
+Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing
+Discarded Symbols}).
+However, Bison currently provides no means to declare a destructor specific to
+a particular mid-rule action's semantic value.
+
+One solution is to bury the mid-rule action inside a nonterminal symbol and to
+declare a destructor for that symbol:
+
+@example
+@group
+%type <context> let
+%destructor @{ pop_context ($$); @} let
+
+%%
+
+stmt: let stmt
+ @{ $$ = $2;
+ pop_context ($1); @}
+ ;
+
+let: LET '(' var ')'
+ @{ $$ = push_context ();
+ declare_variable ($3); @}
+ ;
+
+@end group
+@end example
+
+@noindent
+Note that the action is now at the end of its rule.
+Any mid-rule action can be converted to an end-of-rule action in this way, and
+this is what Bison actually does to implement mid-rule actions.
+
Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute the
action. For example, the following two rules, without mid-rule actions,
when it has read no farther than the open-brace. In other words, it
must commit to using one rule or the other, without sufficient
information to do it correctly. (The open-brace token is what is called
-the @dfn{look-ahead} token at this time, since the parser is still
-deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.)
+the @dfn{lookahead} token at this time, since the parser is still
+deciding what to do about it. @xref{Lookahead, ,Lookahead Tokens}.)
You might think that you could correct the problem by putting identical
actions into the two rules, like this:
@noindent
Now Bison can execute the action in the rule for @code{subroutine} without
-deciding which rule for @code{compound} it will eventually use. Note that
-the action is now at the end of its rule. Any mid-rule action can be
-converted to an end-of-rule action in this way, and this is what Bison
-actually does to implement mid-rule actions.
+deciding which rule for @code{compound} it will eventually use.
@node Locations
@section Tracking Locations
Defining a data type for locations is much simpler than for semantic values,
since all tokens and groupings always use the same type.
-The type of locations is specified by defining a macro called @code{YYLTYPE}.
+You can specify the type of locations by defining a macro called
+@code{YYLTYPE}, just as you can specify the semantic value type by
+defining a @code{YYSTYPE} macro (@pxref{Value Type}).
When @code{YYLTYPE} is not defined, Bison uses a default structure type with
four members:
@} YYLTYPE;
@end example
+When parsing begins, Bison initializes all four fields of @code{yylloc}
+to 1.
+
@node Actions and Locations
@subsection Actions and Locations
@cindex location actions
@end group
@end example
+@vindex yylloc
+It is also possible to access the location of the lookahead token, if any,
+from a semantic action.
+This location is stored in @code{yylloc}.
+@xref{Action Features, ,Special Features for Use in Actions}.
+
@node Location Default Action
@subsection Default Action for Locations
@vindex YYLLOC_DEFAULT
+@cindex @acronym{GLR} parsers and @code{YYLLOC_DEFAULT}
Actually, actions are not the best place to compute locations. Since
locations are much more general than semantic values, there is room in
rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
matched, before the associated action is run. It is also invoked
while processing a syntax error, to compute the error's location.
+Before reporting an unresolvable syntactic ambiguity, a @acronym{GLR}
+parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location
+of that ambiguity.
Most of the time, this macro is general enough to suppress location
dedicated code from semantic actions.
the location of the grouping (the result of the computation). When a
rule is matched, the second parameter identifies locations of
all right hand side elements of the rule being matched, and the third
-parameter is the size of the rule's right hand side. When processing
-a syntax error, the second parameter identifies locations of
-the symbols that were discarded during error processing, and the third
+parameter is the size of the rule's right hand side.
+When a @acronym{GLR} parser reports an ambiguity, which of multiple candidate
+right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined.
+When processing a syntax error, the second parameter identifies locations
+of the symbols that were discarded during error processing, and the third
parameter is the number of discarded symbols.
By default, @code{YYLLOC_DEFAULT} is defined this way:
* Expect Decl:: Suppressing warnings about parsing conflicts.
* Start Decl:: Specifying the start symbol.
* Pure Decl:: Requesting a reentrant parser.
+* Push Decl:: Requesting a push parser.
* Decl Summary:: Table of all Bison declarations.
@end menu
@xref{Precedence, ,Operator Precedence}, for general information on
operator precedence.
-The syntax of a precedence declaration is the same as that of
+The syntax of a precedence declaration is nearly the same as that of
@code{%token}: either
@example
the one declared later has the higher precedence and is grouped first.
@end itemize
+For backward compatibility, there is a confusing difference between the
+argument lists of @code{%token} and precedence declarations.
+Only a @code{%token} can associate a literal string with a token type name.
+A precedence declaration always interprets a literal string as a reference to a
+separate token.
+For example:
+
+@example
+%left OR "<=" // Does not declare an alias.
+%left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".
+@end example
+
@node Union Decl
@subsection The Collection of Value Types
@cindex declaring value types
@cindex value types, declaring
@findex %union
-The @code{%union} declaration specifies the entire collection of possible
-data types for semantic values. The keyword @code{%union} is followed by a
-pair of braces containing the same thing that goes inside a @code{union} in
-C.
+The @code{%union} declaration specifies the entire collection of
+possible data types for semantic values. The keyword @code{%union} is
+followed by braced code containing the same thing that goes inside a
+@code{union} in C@.
For example:
@end group
@end example
+@noindent
specifies the union tag @code{value}, so the corresponding C type is
@code{union value}. If you do not specify a tag, it defaults to
@code{YYSTYPE}.
+As another extension to @acronym{POSIX}, you may specify multiple
+@code{%union} declarations; their contents are concatenated. However,
+only the first @code{%union} declaration can specify a tag.
+
Note that, unlike making a @code{union} declaration in C, you need not write
a semicolon after the closing brace.
+Instead of @code{%union}, you can define and use your own union type
+@code{YYSTYPE} if your grammar contains at least one
+@samp{<@var{type}>} tag. For example, you can put the following into
+a header file @file{parser.h}:
+
+@example
+@group
+union YYSTYPE @{
+ double val;
+ symrec *tptr;
+@};
+typedef union YYSTYPE YYSTYPE;
+@end group
+@end example
+
+@noindent
+and then your grammar can use the following
+instead of @code{%union}:
+
+@example
+@group
+%@{
+#include "parser.h"
+%@}
+%type <val> expr
+%token <tptr> ID
+@end group
+@end example
+
@node Type Decl
@subsection Nonterminal Symbols
@cindex declaring value types, nonterminals
@deffn {Directive} %initial-action @{ @var{code} @}
@findex %initial-action
-Declare that the @var{code} must be invoked before parsing each time
+Declare that the braced @var{code} must be invoked before parsing each time
@code{yyparse} is called. The @var{code} may use @code{$$} and
-@code{@@$} --- initial value and location of the look-ahead --- and the
+@code{@@$} --- initial value and location of the lookahead --- and the
@code{%parse-param}.
@end deffn
@subsection Freeing Discarded Symbols
@cindex freeing discarded symbols
@findex %destructor
-
+@findex <*>
+@findex <>
During error recovery (@pxref{Error Recovery}), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
until the parser falls on its feet. If the parser runs out of memory,
@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
@findex %destructor
-Invoke @var{code} whenever the parser discards one of the @var{symbols}.
+Invoke the braced @var{code} whenever the parser discards one of the
+@var{symbols}.
Within @var{code}, @code{$$} designates the semantic value associated
-with the discarded symbol. The additional parser parameters are also
-available (@pxref{Parser Function, , The Parser Function
-@code{yyparse}}).
+with the discarded symbol, and @code{@@$} designates its location.
+The additional parser parameters are also available (@pxref{Parser Function, ,
+The Parser Function @code{yyparse}}).
+
+When a symbol is listed among @var{symbols}, its @code{%destructor} is called a
+per-symbol @code{%destructor}.
+You may also define a per-type @code{%destructor} by listing a semantic type
+tag among @var{symbols}.
+In that case, the parser will invoke this @var{code} whenever it discards any
+grammar symbol that has that semantic type tag unless that symbol has its own
+per-symbol @code{%destructor}.
+
+Finally, you can define two different kinds of default @code{%destructor}s.
+(These default forms are experimental.
+More user feedback will help to determine whether they should become permanent
+features.)
+You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of
+exactly one @code{%destructor} declaration in your grammar file.
+The parser will invoke the @var{code} associated with one of these whenever it
+discards any user-defined grammar symbol that has no per-symbol and no per-type
+@code{%destructor}.
+The parser uses the @var{code} for @code{<*>} in the case of such a grammar
+symbol for which you have formally declared a semantic type tag (@code{%type}
+counts as such a declaration, but @code{$<tag>$} does not).
+The parser uses the @var{code} for @code{<>} in the case of such a grammar
+symbol that has no declared semantic type tag.
@end deffn
-For instance:
+@noindent
+For example:
@smallexample
-%union
-@{
- char *string;
-@}
-%token <string> STRING
-%type <string> string
-%destructor @{ free ($$); @} STRING string
+%union @{ char *string; @}
+%token <string> STRING1
+%token <string> STRING2
+%type <string> string1
+%type <string> string2
+%union @{ char character; @}
+%token <character> CHR
+%type <character> chr
+%token TAGLESS
+
+%destructor @{ @} <character>
+%destructor @{ free ($$); @} <*>
+%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
+%destructor @{ printf ("Discarding tagless symbol.\n"); @} <>
+@end smallexample
+
+@noindent
+guarantees that, when the parser discards any user-defined symbol that has a
+semantic type tag other than @code{<character>}, it passes its semantic value
+to @code{free} by default.
+However, when the parser discards a @code{STRING1} or a @code{string1}, it also
+prints its line number to @code{stdout}.
+It performs only the per-symbol @code{%destructor} in this case, so it invokes
+@code{free} only once.
+Finally, the parser merely prints a message whenever it discards any symbol,
+such as @code{TAGLESS}, that has no semantic type tag.
+
+A Bison-generated parser invokes the default @code{%destructor}s only for
+user-defined as opposed to Bison-defined symbols.
+For example, the parser will not invoke either kind of default
+@code{%destructor} for the special Bison-defined symbols @code{$accept},
+@code{$undefined}, or @code{$end} (@pxref{Table of Symbols, ,Bison Symbols}),
+none of which you can reference in your grammar.
+It also will not invoke either for the @code{error} token (@pxref{Table of
+Symbols, ,error}), which is always defined by Bison regardless of whether you
+reference it in your grammar.
+However, it may invoke one of them for the end token (token 0) if you
+redefine it from @code{$end} to, for example, @code{END}:
+
+@smallexample
+%token END 0
@end smallexample
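The order in which the parser picks a @code{%destructor} can be modeled in C
roughly as follows.
This is a sketch of the rules stated above, not code taken from Bison: the
type @code{struct symbol_desc}, the enumeration, and the function
@code{choose_destructor} are all hypothetical names introduced for
illustration.

```c
/* Which %destructor, if any, applies to a discarded symbol.
   All names here are hypothetical; they model the documented rules.  */
enum destructor_choice
{
  NO_DESTRUCTOR,
  PER_SYMBOL,       /* %destructor { ... } STRING1  */
  PER_TYPE,         /* %destructor { ... } <character>  */
  DEFAULT_TAGGED,   /* %destructor { ... } <*>  */
  DEFAULT_TAGLESS   /* %destructor { ... } <>  */
};

struct symbol_desc
{
  int user_defined;          /* 0 for Bison-defined symbols such as error.  */
  int has_tag;               /* Declared with %token <tag> or %type <tag>.  */
  int has_symbol_destructor; /* Listed by name in some %destructor.  */
  int has_type_destructor;   /* Its tag is listed in some %destructor.  */
};

enum destructor_choice
choose_destructor (const struct symbol_desc *sym,
                   int have_tagged_default, int have_tagless_default)
{
  /* A per-symbol destructor always wins.  */
  if (sym->has_symbol_destructor)
    return PER_SYMBOL;

  /* Otherwise a per-type destructor for the symbol's tag applies.  */
  if (sym->has_tag && sym->has_type_destructor)
    return PER_TYPE;

  /* The defaults are considered only for user-defined symbols.  */
  if (!sym->user_defined)
    return NO_DESTRUCTOR;

  /* <*> covers tagged symbols; <> covers symbols with no declared tag.  */
  if (sym->has_tag)
    return have_tagged_default ? DEFAULT_TAGGED : NO_DESTRUCTOR;
  return have_tagless_default ? DEFAULT_TAGLESS : NO_DESTRUCTOR;
}
```

Applied to the earlier example, this model selects the per-symbol destructor
for @code{STRING1}, the @code{<*>} destructor for @code{STRING2}, the
per-type destructor for @code{CHR}, and the @code{<>} destructor for
@code{TAGLESS}.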
+@cindex actions in mid-rule
+@cindex mid-rule actions
+Finally, Bison will never invoke a @code{%destructor} for an unreferenced
+mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}).
+That is, Bison does not consider a mid-rule action to have a semantic value if
+you do not reference @code{$$} in that action or @code{$@var{n}} (where
+@var{n} is the RHS symbol position of the mid-rule action) in any later action
+in that rule.
+However, if you do reference either, the Bison-generated parser will invoke the
+@code{<>} @code{%destructor} whenever it discards the mid-rule symbol.
+
+@ignore
@noindent
-guarantees that when a @code{STRING} or a @code{string} is discarded,
-its associated memory will be freed.
+In the future, it may be possible to redefine the @code{error} token as a
+nonterminal that captures the discarded symbols.
+In that case, the parser will invoke the default destructor for it as well.
+@end ignore
@sp 1
@item
incoming terminals during the second phase of error recovery,
@item
-the current look-ahead and the entire stack (except the current
+the current lookahead and the entire stack (except the current
right-hand side symbols) when the parser returns immediately, and
@item
the start symbol, when the parser succeeds.
@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory
exhaustion.
-Right-hand size symbols of a rule that explicitly triggers a syntax
+Right-hand side symbols of a rule that explicitly triggers a syntax
error via @code{YYERROR} are not discarded automatically. As a rule
of thumb, destructors are invoked only when user actions cannot manage
the memory.
@subsection A Pure (Reentrant) Parser
@cindex reentrant parser
@cindex pure parser
-@findex %pure-parser
+@findex %define api.pure
A @dfn{reentrant} program is one which does not alter in the course of
execution; in other words, it consists entirely of @dfn{pure} (read-only)
including @code{yylval} and @code{yylloc}.)
Alternatively, you can generate a pure, reentrant parser. The Bison
-declaration @code{%pure-parser} says that you want the parser to be
+declaration @code{%define api.pure} says that you want the parser to be
reentrant. It looks like this:
@example
-%pure-parser
+%define api.pure
@end example
The result is that the communication variables @code{yylval} and
@code{yylloc} become local variables in @code{yyparse}, and a different
calling convention is used for the lexical analyzer function
@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
-Parsers}, for the details of this. The variable @code{yynerrs} also
-becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error
+Parsers}, for the details of this. The variable @code{yynerrs}
+becomes local in @code{yyparse} in pull mode but it becomes a member
+of @code{yypstate} in push mode (@pxref{Error Reporting, ,The Error
Reporting Function @code{yyerror}}). The convention for calling
@code{yyparse} itself is unchanged.
You can generate either a pure parser or a nonreentrant parser from any
valid grammar.
+@node Push Decl
+@subsection A Push Parser
+@cindex push parser
+@findex %define api.push_pull
+
+A pull parser is called once and it takes control until all its input
+is completely parsed. A push parser, on the other hand, is called
+each time a new token is made available.
+
+A push parser is typically useful when the parser is part of a
+main event loop in the client's application. This is typically
+a requirement of a GUI, when the main event loop needs to be triggered
+within a certain time period.
+
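The difference in control flow can be illustrated with a small hand-written
push-style recognizer for sums such as @samp{1 + 2}.
This is not Bison output and does not use the Bison interface: the names
@code{sum_pstate}, @code{sum_pstate_new}, @code{sum_pstate_delete},
@code{sum_push_parse}, and the enumerators are hypothetical stand-ins that
only show how parser state moves out of the call stack so that the caller
can own the loop.

```c
#include <stdlib.h>

/* Hypothetical token kinds and status codes for this sketch.  */
enum token { TOK_END, TOK_NUM, TOK_PLUS };
enum push_status { PUSH_DONE, PUSH_ERROR, PUSH_MORE };

/* All parser state lives in this object rather than on the C stack,
   so the caller may return to its event loop between tokens.  */
struct sum_pstate
{
  int expect_num;   /* Nonzero when the next token must be a number.  */
  long total;       /* Sum of the numbers seen so far.  */
};

struct sum_pstate *
sum_pstate_new (void)
{
  struct sum_pstate *ps = calloc (1, sizeof *ps);
  if (ps)
    ps->expect_num = 1;
  return ps;
}

void
sum_pstate_delete (struct sum_pstate *ps)
{
  free (ps);
}

/* Consume exactly one token and report whether more input is needed.  */
enum push_status
sum_push_parse (struct sum_pstate *ps, enum token tok, long value)
{
  if (ps->expect_num)
    {
      if (tok != TOK_NUM)
        return PUSH_ERROR;
      ps->total += value;
      ps->expect_num = 0;
      return PUSH_MORE;
    }
  if (tok == TOK_PLUS)
    {
      ps->expect_num = 1;
      return PUSH_MORE;
    }
  return tok == TOK_END ? PUSH_DONE : PUSH_ERROR;
}
```

A caller would invoke @code{sum_push_parse} once per token, for example from
an event handler, and stop as soon as the result is no longer
@code{PUSH_MORE}.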
+Normally, Bison generates a pull parser.
+The following Bison declaration says that you want the parser to be a push
+parser (@pxref{Decl Summary,,%define api.push_pull}):
+
+@example
+%define api.push_pull "push"
+@end example
+
+In almost all cases, you want to ensure that your push parser is also
+a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The only
+time you should create an impure push parser is to have backwards
+compatibility with the impure Yacc pull mode interface. Unless you know
+what you are doing, your declarations should look like this:
+
+@example
+%define api.pure
+%define api.push_pull "push"
+@end example
+
+There is a major functional difference between the pure push parser
+and the impure push parser. It is acceptable for a pure push parser to have
+many parser instances, of the same type of parser, in memory at the same time.
+An impure push parser should only use one parser at a time.
+
+When a push parser is selected, Bison will generate some new symbols in
+the generated parser. @code{yypstate} is a structure that the generated
+parser uses to store the parser's state. @code{yypstate_new} is the
+function that will create a new parser instance. @code{yypstate_delete}
+will free the resources associated with the corresponding parser instance.
+Finally, @code{yypush_parse} is the function that should be called whenever a
+token is available to provide to the parser. A trivial example
+of using a pure push parser would look like this:
+
+@example
+int status;
+yypstate *ps = yypstate_new ();
+do @{
+ status = yypush_parse (ps, yylex (), NULL);
+@} while (status == YYPUSH_MORE);
+yypstate_delete (ps);
+@end example
+
+If you decide to use an impure push parser, a few things about
+the generated parser will change. The @code{yychar} variable becomes
+a global variable instead of a variable in the @code{yypush_parse} function.
+For this reason, the signature of the @code{yypush_parse} function is
+changed to remove the token as a parameter. A nonreentrant push parser
+example would thus look like this:
+
+@example
+extern int yychar;
+int status;
+yypstate *ps = yypstate_new ();
+do @{
+ yychar = yylex ();
+ status = yypush_parse (ps);
+@} while (status == YYPUSH_MORE);
+yypstate_delete (ps);
+@end example
+
+That's it. Notice the next token is put into the global variable @code{yychar}
+for use by the next invocation of the @code{yypush_parse} function.
+
+Bison also supports both the push parser interface and the pull parser
+interface in the same generated parser. In order to get this functionality,
+you should replace the @code{%define api.push_pull "push"} declaration with the
+@code{%define api.push_pull "both"} declaration. Doing this will create all of
+the symbols mentioned earlier along with the two extra symbols, @code{yyparse}
+and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally
+would be used. However, the user should note that it is implemented in the
+generated parser by calling @code{yypull_parse}.
+This makes the @code{yyparse} function that is generated with the
+@code{%define api.push_pull "both"} declaration slower than the normal
+@code{yyparse} function. If you call the @code{yypull_parse} function, it
+will parse the rest of the input stream. It is possible to
+@code{yypush_parse} tokens to select a subgrammar and then
+@code{yypull_parse} the rest of the input stream. If you would like to
+switch back and forth between parsing styles, you would have to write
+your own @code{yypull_parse} function that knows when to quit looking
+for input. An example of using the @code{yypull_parse} function would look
+like this:
+
+@example
+yypstate *ps = yypstate_new ();
+yypull_parse (ps); /* Will call the lexer */
+yypstate_delete (ps);
+@end example
+
+Adding the @code{%define api.pure} declaration does exactly the same thing to
+the generated parser with @code{%define api.push_pull "both"} as it did for
+@code{%define api.push_pull "push"}.
+
@node Decl Summary
@subsection Bison Declaration Summary
@cindex Bison declaration summary
In order to change the behavior of @command{bison}, use the following
directives:
-@deffn {Directive} %debug
-In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
-already defined, so that the debugging facilities are compiled.
+@deffn {Directive} %code @{@var{code}@}
+@findex %code
+This is the unqualified form of the @code{%code} directive.
+It inserts @var{code} verbatim at a language-dependent default location in the
+output@footnote{The default location is actually skeleton-dependent;
+ writers of non-standard skeletons however should choose the default location
+ consistently with the behavior of the standard Bison skeletons.}.
+
+@cindex Prologue
+For C/C++, the default location is the parser source code
+file after the usual contents of the parser header file.
+Thus, @code{%code} replaces the traditional Yacc prologue,
+@code{%@{@var{code}%@}}, for most purposes.
+For a detailed discussion, see @ref{Prologue Alternatives}.
+
+For Java, the default location is inside the parser class.
+
+(Like all the Yacc prologue alternatives, this directive is experimental.
+More user feedback will help to determine whether it should become a permanent
+feature.)
@end deffn
-@xref{Tracing, ,Tracing Your Parser}.
-@deffn {Directive} %defines
-Write a header file containing macro definitions for the token type
-names defined in the grammar as well as a few other declarations.
-If the parser output file is named @file{@var{name}.c} then this file
-is named @file{@var{name}.h}.
+@deffn {Directive} %code @var{qualifier} @{@var{code}@}
+This is the qualified form of the @code{%code} directive.
+If you need to specify location-sensitive verbatim @var{code} that does not
+belong at the default location selected by the unqualified @code{%code} form,
+use this form instead.
-Unless @code{YYSTYPE} is already defined as a macro, the output header
-declares @code{YYSTYPE}. Therefore, if you are using a @code{%union}
-(@pxref{Multiple Types, ,More Than One Value Type}) with components
-that require other definitions, or if you have defined a
-@code{YYSTYPE} macro (@pxref{Value Type, ,Data Types of Semantic
-Values}), you need to arrange for these definitions to be propagated to
-all modules, e.g., by putting them in a
-prerequisite header that is included both by your parser and by any
-other module that needs @code{YYSTYPE}.
+@var{qualifier} identifies the purpose of @var{code} and thus the location(s)
+where Bison should generate it.
+Not all values of @var{qualifier} are available for all target languages:
-Unless your parser is pure, the output header declares @code{yylval}
-as an external variable. @xref{Pure Decl, ,A Pure (Reentrant)
-Parser}.
+@itemize @bullet
+@item requires
+@findex %code requires
-If you have also used locations, the output header declares
-@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of
-@code{YYSTYPE} and @code{yylval}. @xref{Locations, ,Tracking
-Locations}.
+@itemize @bullet
+@item Language(s): C, C++
-This output file is normally essential if you wish to put the
-definition of @code{yylex} in a separate source file, because
-@code{yylex} typically needs to be able to refer to the
-above-mentioned declarations and to the token type codes.
-@xref{Token Values, ,Semantic Values of Tokens}.
-@end deffn
+@item Purpose: This is the best place to write dependency code required for
+@code{YYSTYPE} and @code{YYLTYPE}.
+In other words, it's the best place to define types referenced in @code{%union}
+directives, and it's the best place to override Bison's default @code{YYSTYPE}
+and @code{YYLTYPE} definitions.
-@deffn {Directive} %destructor
-Specify how the parser should reclaim the memory associated to
-discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
-@end deffn
+@item Location(s): The parser header file and the parser source code file
+before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE} definitions.
+@end itemize
-@deffn {Directive} %file-prefix="@var{prefix}"
-Specify a prefix to use for all Bison output file names. The names are
-chosen as if the input file were named @file{@var{prefix}.y}.
-@end deffn
+@item provides
+@findex %code provides
-@deffn {Directive} %locations
-Generate the code processing the locations (@pxref{Action Features,
-,Special Features for Use in Actions}). This mode is enabled as soon as
-the grammar uses the special @samp{@@@var{n}} tokens, but if your
+@itemize @bullet
+@item Language(s): C, C++
+
+@item Purpose: This is the best place to write additional definitions and
+declarations that should be provided to other modules.
+
+@item Location(s): The parser header file and the parser source code file after
+the Bison-generated @code{YYSTYPE}, @code{YYLTYPE}, and token definitions.
+@end itemize
+
+@item top
+@findex %code top
+
+@itemize @bullet
+@item Language(s): C, C++
+
+@item Purpose: The unqualified @code{%code} or @code{%code requires} should
+usually be more appropriate than @code{%code top}.
+However, occasionally it is necessary to insert code much nearer the top of the
+parser source code file.
+For example:
+
+@smallexample
+%code top @{
+ #define _GNU_SOURCE
+ #include <stdio.h>
+@}
+@end smallexample
+
+@item Location(s): Near the top of the parser source code file.
+@end itemize
+
+@item imports
+@findex %code imports
+
+@itemize @bullet
+@item Language(s): Java
+
+@item Purpose: This is the best place to write Java import directives.
+
+@item Location(s): The parser Java file after any Java package directive and
+before any class definitions.
+@end itemize
+@end itemize
+
+(Like all the Yacc prologue alternatives, this directive is experimental.
+More user feedback will help to determine whether it should become a permanent
+feature.)
+
+@cindex Prologue
+For a detailed discussion of how to use @code{%code} in place of the
+traditional Yacc prologue for C/C++, see @ref{Prologue Alternatives}.
+@end deffn
+
+@deffn {Directive} %debug
+In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
+already defined, so that the debugging facilities are compiled.
+@xref{Tracing, ,Tracing Your Parser}.
+@end deffn
+
+@deffn {Directive} %define @var{variable}
+@deffnx {Directive} %define @var{variable} "@var{value}"
+Define a variable to adjust Bison's behavior.
+The possible choices for @var{variable}, as well as their meanings, depend on
+the selected target language and/or the parser skeleton (@pxref{Decl
+Summary,,%language}).
+
+Bison will warn if a @var{variable} is defined multiple times.
+
+Omitting @code{"@var{value}"} is always equivalent to specifying it as
+@code{""}.
+
+Some @var{variable}s may be used as Booleans.
+In this case, Bison will complain if the variable definition does not meet one
+of the following four conditions:
+
+@enumerate
+@item @code{"@var{value}"} is @code{"true"}.
+
+@item @code{"@var{value}"} is omitted (or is @code{""}).
+This is equivalent to @code{"true"}.
+
+@item @code{"@var{value}"} is @code{"false"}.
+
+@item @var{variable} is never defined.
+In this case, Bison selects a default value, which may depend on the selected
+target language and/or parser skeleton.
+@end enumerate
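The four Boolean cases above can be sketched in C as follows. This is only an illustration, not Bison's implementation: the names @code{interpret_boolean}, @code{bool_result}, and the @code{BOOL_*} constants are hypothetical, a null pointer stands for a variable that is never defined, and the caller supplies the language-dependent default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; these names are not part of Bison. */
enum bool_result { BOOL_FALSE = 0, BOOL_TRUE = 1, BOOL_INVALID = -1 };

/* Interpret the value of a Boolean %define variable.
   VALUE == NULL models "the variable is never defined"; in that case
   DEFAULT_VALUE (chosen by the target language or skeleton) is used.  */
static enum bool_result
interpret_boolean (const char *value, int default_value)
{
  if (value == NULL)
    return default_value ? BOOL_TRUE : BOOL_FALSE;
  /* An omitted value is the empty string, which means "true".  */
  if (strcmp (value, "") == 0 || strcmp (value, "true") == 0)
    return BOOL_TRUE;
  if (strcmp (value, "false") == 0)
    return BOOL_FALSE;
  /* Any other value meets none of the four conditions.  */
  return BOOL_INVALID;
}
```

In this sketch, @code{BOOL_INVALID} marks the situation in which Bison would complain about the definition.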
+
+Some of the accepted @var{variable}s are:
+
+@itemize @bullet
+@item api.pure
+@findex %define api.pure
+
+@itemize @bullet
+@item Language(s): C
+
+@item Purpose: Request a pure (reentrant) parser program.
+@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
+
+@item Accepted Values: Boolean
+
+@item Default Value: @code{"false"}
+@end itemize
+
+@item api.push_pull
+@findex %define api.push_pull
+
+@itemize @bullet
+@item Language(s): C (LALR(1) only)
+
+@item Purpose: Requests a pull parser, a push parser, or both.
+@xref{Push Decl, ,A Push Parser}.
+
+@item Accepted Values: @code{"pull"}, @code{"push"}, @code{"both"}
+
+@item Default Value: @code{"pull"}
+@end itemize
+
+@item lr.keep_unreachable_states
+@findex %define lr.keep_unreachable_states
+
+@itemize @bullet
+@item Language(s): all
+
+@item Purpose: Requests that Bison allow unreachable parser states to remain in
+the parser tables.
+Bison considers a state to be unreachable if there exists no sequence of
+transitions from the start state to that state.
+A state can become unreachable during conflict resolution if Bison disables a
+shift action leading to it from a predecessor state.
+Keeping unreachable states is sometimes useful for analysis purposes, but they
+are useless in the generated parser.
+
+@item Accepted Values: Boolean
+
+@item Default Value: @code{"false"}
+
+@item Caveats:
+
+@itemize @bullet
+
+@item Unreachable states may contain conflicts and may use rules not used in
+any other state.
+Thus, keeping unreachable states may induce warnings that are irrelevant to
+your parser's behavior, and it may eliminate warnings that are relevant.
+Of course, the change in warnings may actually be relevant to a parser table
+analysis that wants to keep unreachable states, so this behavior will likely
+remain in future Bison releases.
+
+@item While Bison is able to remove unreachable states, it is not guaranteed to
+remove other kinds of useless states.
+Specifically, when Bison disables reduce actions during conflict resolution,
+some goto actions may become useless, and thus some additional states may
+become useless.
+If Bison were to compute which goto actions were useless and then disable those
+actions, it could identify such states as unreachable and then remove those
+states.
+However, Bison does not compute which goto actions are useless.
+@end itemize
+@end itemize
+
+@item namespace
+@findex %define namespace
+
+@itemize @bullet
+@item Language(s): C++
+
+@item Purpose: Specifies the namespace for the parser class.
+For example, if you specify:
+
+@smallexample
+%define namespace "foo::bar"
+@end smallexample
+
+Bison uses @code{foo::bar} verbatim in references such as:
+
+@smallexample
+foo::bar::parser::semantic_type
+@end smallexample
+
+However, to open a namespace, Bison removes any leading @code{::} and then
+splits on any remaining occurrences:
+
+@smallexample
+namespace foo @{ namespace bar @{
+ class position;
+ class location;
+@} @}
+@end smallexample
+
+@item Accepted Values: Any absolute or relative C++ namespace reference without
+a trailing @code{"::"}.
+For example, @code{"foo"} or @code{"::foo::bar"}.
+
+@item Default Value: The value specified by @code{%name-prefix}, which defaults
+to @code{yy}.
+This usage of @code{%name-prefix} is for backward compatibility and can be
+confusing since @code{%name-prefix} also specifies the textual prefix for the
+lexical analyzer function.
+Thus, if you specify @code{%name-prefix}, it is best to also specify
+@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the
+lexical analyzer function.
+For example, if you specify:
+
+@smallexample
+%define namespace "foo"
+%name-prefix "bar::"
+@end smallexample
+
+The parser namespace is @code{foo} and @code{yylex} is referenced as
+@code{bar::lex}.
+@end itemize
+@end itemize
+
+@end deffn
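The namespace-opening rule described under @code{%define namespace} (drop a leading @code{::}, then split on the remaining occurrences) can be sketched as a small C routine. This is a hedged illustration only: @code{open_namespaces} is a hypothetical helper, not a function Bison provides, and the exact spacing of the generated text is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Bison's implementation.  Write into OUT
   (capacity OUTSZ) the text that opens every namespace named in SPEC,
   e.g. "::foo::bar" gives "namespace foo { namespace bar {".
   Return the number of namespaces opened.  */
static int
open_namespaces (const char *spec, char *out, size_t outsz)
{
  int count = 0;
  size_t used = 0;

  if (outsz == 0)
    return 0;
  out[0] = '\0';

  /* Remove a leading "::".  */
  if (strncmp (spec, "::", 2) == 0)
    spec += 2;

  /* Split on each remaining "::".  */
  while (*spec != '\0')
    {
      const char *sep = strstr (spec, "::");
      size_t len = sep ? (size_t) (sep - spec) : strlen (spec);
      int n = snprintf (out + used, outsz - used, "%snamespace %.*s {",
                        count ? " " : "", (int) len, spec);
      if (n < 0 || (size_t) n >= outsz - used)
        break;                  /* Output buffer too small; stop early.  */
      used += (size_t) n;
      count++;
      spec += len;
      if (sep)
        spec += 2;              /* Skip the "::" separator.  */
    }
  return count;
}
```

The matching closing braces would be emitted once per opened namespace, which is why the sketch returns the count.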
+
+@deffn {Directive} %defines
+Write a header file containing macro definitions for the token type
+names defined in the grammar as well as a few other declarations.
+If the parser output file is named @file{@var{name}.c} then this file
+is named @file{@var{name}.h}.
+
+For C parsers, the output header declares @code{YYSTYPE} unless
+@code{YYSTYPE} is already defined as a macro or you have used a
+@code{<@var{type}>} tag without using @code{%union}.
+Therefore, if you are using a @code{%union}
+(@pxref{Multiple Types, ,More Than One Value Type}) with components that
+require other definitions, or if you have defined a @code{YYSTYPE} macro
+or type definition
+(@pxref{Value Type, ,Data Types of Semantic Values}), you need to
+arrange for these definitions to be propagated to all modules, e.g., by
+putting them in a prerequisite header that is included both by your
+parser and by any other module that needs @code{YYSTYPE}.
+
+Unless your parser is pure, the output header declares @code{yylval}
+as an external variable. @xref{Pure Decl, ,A Pure (Reentrant)
+Parser}.
+
+If you have also used locations, the output header declares
+@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of
+the @code{YYSTYPE} macro and @code{yylval}. @xref{Locations, ,Tracking
+Locations}.
+
+This output file is normally essential if you wish to put the definition
+of @code{yylex} in a separate source file, because @code{yylex}
+typically needs to be able to refer to the above-mentioned declarations
+and to the token type codes. @xref{Token Values, ,Semantic Values of
+Tokens}.
+
+@findex %code requires
+@findex %code provides
+If you have declared @code{%code requires} or @code{%code provides}, the output
+header also contains their code.
+@xref{Decl Summary, ,%code}.
+@end deffn
+
+@deffn {Directive} %defines @var{defines-file}
+Same as above, but save in the file @var{defines-file}.
+@end deffn
+
+@deffn {Directive} %destructor
+Specify how the parser should reclaim the memory associated with
+discarded symbols.  @xref{Destructor Decl, , Freeing Discarded Symbols}.
+@end deffn
+
+@deffn {Directive} %file-prefix "@var{prefix}"
+Specify a prefix to use for all Bison output file names. The names are
+chosen as if the input file were named @file{@var{prefix}.y}.
+@end deffn
+
+@deffn {Directive} %language "@var{language}"
+Specify the programming language for the generated parser. Currently
+supported languages include C and C++.
+@var{language} is case-insensitive.
+@end deffn
+
+@deffn {Directive} %locations
+Generate the code processing the locations (@pxref{Action Features,
+,Special Features for Use in Actions}). This mode is enabled as soon as
+the grammar uses the special @samp{@@@var{n}} tokens, but if your
grammar does not use it, using @samp{%locations} allows for more
accurate syntax error messages.
@end deffn
-@deffn {Directive} %name-prefix="@var{prefix}"
+@deffn {Directive} %name-prefix "@var{prefix}"
Rename the external symbols used in the parser so that they start with
@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
+in C parsers
is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
-@code{yylval}, @code{yylloc}, @code{yychar}, @code{yydebug}, and
-possible @code{yylloc}. For example, if you use
-@samp{%name-prefix="c_"}, the names become @code{c_parse}, @code{c_lex},
-and so on. @xref{Multiple Parsers, ,Multiple Parsers in the Same
-Program}.
+@code{yylval}, @code{yychar}, @code{yydebug}, and
+(if locations are used) @code{yylloc}. If you use a push parser,
+@code{yypush_parse}, @code{yypull_parse}, @code{yypstate},
+@code{yypstate_new} and @code{yypstate_delete} will
+also be renamed. For example, if you use @samp{%name-prefix "c_"}, the
+names become @code{c_parse}, @code{c_lex}, and so on.
+For C++ parsers, see the @code{%define namespace} documentation in this
+section.
+@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
@end deffn
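The renaming performed by @code{%name-prefix} is purely textual: the leading @samp{yy} of each listed symbol is replaced by the prefix. A minimal C sketch of that rule follows, assuming a hypothetical helper @code{rename_symbol} that is not part of Bison.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Bison.  Replace the leading "yy" of
   SYMBOL with PREFIX and store the result in OUT (capacity OUTSZ).
   Return 0 on success, or -1 if SYMBOL does not start with "yy" or the
   result does not fit in OUT.  */
static int
rename_symbol (const char *prefix, const char *symbol,
               char *out, size_t outsz)
{
  int n;

  if (strncmp (symbol, "yy", 2) != 0)
    return -1;
  n = snprintf (out, outsz, "%s%s", prefix, symbol + 2);
  return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

With the prefix @samp{c_}, this maps @code{yyparse} to @code{c_parse} and @code{yypstate_new} to @code{c_pstate_new}; names that do not begin with @samp{yy} are rejected by the sketch.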
@ifset defaultprec
@end deffn
@end ifset
-@deffn {Directive} %no-parser
-Do not include any C code in the parser file; generate tables only. The
-parser file contains just @code{#define} directives and static variable
-declarations.
-
-This option also tells Bison to write the C code for the grammar actions
-into a file named @file{@var{file}.act}, in the form of a
-brace-surrounded body fit for a @code{switch} statement.
-@end deffn
-
@deffn {Directive} %no-lines
Don't generate any @code{#line} preprocessor commands in the parser
file. Ordinarily Bison writes these commands in the parser file so that
file in its own right.
@end deffn
-@deffn {Directive} %output="@var{file}"
+@deffn {Directive} %output "@var{file}"
Specify @var{file} for the parser file.
@end deffn
@deffn {Directive} %pure-parser
-Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure
-(Reentrant) Parser}).
+Deprecated version of @code{%define api.pure} (@pxref{Decl Summary, ,%define}),
+for which Bison is more careful to warn about unreasonable usage.
@end deffn
@deffn {Directive} %require "@var{version}"
Require a Version of Bison}.
@end deffn
+@deffn {Directive} %skeleton "@var{file}"
+Specify the skeleton to use.
+
+You probably don't need this option unless you are developing Bison.
+You should use @code{%language} if you want to specify the skeleton for a
+different language, because it is clearer and because it will always choose the
+correct skeleton for non-deterministic or push parsers.
+
+If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
+file in the Bison installation directory.
+If it does, @var{file} is an absolute file name or a file name relative to the
+directory of the grammar file.
+This is similar to how most shells resolve commands.
+@end deffn
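The lookup rule for @code{%skeleton} can be sketched as follows. This is an illustration under stated assumptions, not Bison's code: @code{resolve_skeleton} is a hypothetical helper, and the installation and grammar directories are passed in as plain strings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Bison's implementation.  Compute in OUT
   (capacity OUTSZ) the file name that FILE designates:
   - no '/' in FILE: a file in INSTALL_DIR;
   - FILE starts with '/': FILE itself (absolute name);
   - otherwise: FILE relative to GRAMMAR_DIR.
   Return 0 on success, -1 if the result does not fit in OUT.  */
static int
resolve_skeleton (const char *file, const char *install_dir,
                  const char *grammar_dir, char *out, size_t outsz)
{
  int n;

  if (strchr (file, '/') == NULL)
    n = snprintf (out, outsz, "%s/%s", install_dir, file);
  else if (file[0] == '/')
    n = snprintf (out, outsz, "%s", file);
  else
    n = snprintf (out, outsz, "%s/%s", grammar_dir, file);
  return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

As with shell command lookup, a bare name is searched for in a fixed place, while any name containing a slash is taken as a path.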
+
@deffn {Directive} %token-table
Generate an array of token names in the parser file. The name of the
array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
@deffn {Directive} %verbose
Write an extra output file containing verbose descriptions of the
-parser states and what is done for each type of look-ahead token in
+parser states and what is done for each type of lookahead token in
that state. @xref{Understanding, , Understanding Your Parser}, for more
information.
@end deffn
The precise list of symbols renamed is @code{yyparse}, @code{yylex},
@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yylloc},
-@code{yychar} and @code{yydebug}. For example, if you use @samp{-p c},
-the names become @code{cparse}, @code{clex}, and so on.
+@code{yychar} and @code{yydebug}. If you use a push parser,
+@code{yypush_parse}, @code{yypull_parse}, @code{yypstate},
+@code{yypstate_new} and @code{yypstate_delete} will also be renamed.
+For example, if you use @samp{-p c}, the names become @code{cparse},
+@code{clex}, and so on.
@strong{All the other variables and macros associated with Bison are not
renamed.} These others are not global; there is no conflict if the same
@menu
* Parser Function:: How to call @code{yyparse} and what it returns.
+* Push Parser Function:: How to call @code{yypush_parse} and what it returns.
+* Pull Parser Function:: How to call @code{yypull_parse} and what it returns.
+* Parser Create Function:: How to call @code{yypstate_new} and what it
+ returns.
+* Parser Delete Function:: How to call @code{yypstate_delete} and what it
+ returns.
* Lexical:: You must supply a function @code{yylex}
which reads tokens.
* Error Reporting:: You must supply a function @code{yyerror}.
@deffn {Directive} %parse-param @{@var{argument-declaration}@}
@findex %parse-param
-Declare that an argument declared by @code{argument-declaration} is an
-additional @code{yyparse} argument.
+Declare that an argument declared by the braced-code
+@var{argument-declaration} is an additional @code{yyparse} argument.
The @var{argument-declaration} is used when declaring
functions or prototypes. The last identifier in
@var{argument-declaration} must be the argument name.
exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
@end example
+@node Push Parser Function
+@section The Push Parser Function @code{yypush_parse}
+@findex yypush_parse
+
+You call the function @code{yypush_parse} to parse a single token. This
+function is available if either the @code{%define api.push_pull "push"} or
+@code{%define api.push_pull "both"} declaration is used.
+@xref{Push Decl, ,A Push Parser}.
+
+@deftypefun int yypush_parse (yypstate *yyps)
+The value returned by @code{yypush_parse} is the same as for @code{yyparse},
+with the following exception: @code{yypush_parse} returns
+@code{YYPUSH_MORE} if more input is required to finish parsing the grammar.
+@end deftypefun
+
+@node Pull Parser Function
+@section The Pull Parser Function @code{yypull_parse}
+@findex yypull_parse
+
+You call the function @code{yypull_parse} to parse the rest of the input
+stream. This function is available if the @code{%define api.push_pull "both"}
+declaration is used.
+@xref{Push Decl, ,A Push Parser}.
+
+@deftypefun int yypull_parse (yypstate *yyps)
+The value returned by @code{yypull_parse} is the same as for @code{yyparse}.
+@end deftypefun
+
+@node Parser Create Function
+@section The Parser Create Function @code{yypstate_new}
+@findex yypstate_new
+
+You call the function @code{yypstate_new} to create a new parser instance.
+This function is available if either the @code{%define api.push_pull "push"} or
+@code{%define api.push_pull "both"} declaration is used.
+@xref{Push Decl, ,A Push Parser}.
+
+@deftypefun yypstate *yypstate_new (void)
+The function returns a valid parser instance if memory was available,
+or @code{NULL} if no memory was available.
+@end deftypefun
+
+@node Parser Delete Function
+@section The Parser Delete Function @code{yypstate_delete}
+@findex yypstate_delete
+
+You call the function @code{yypstate_delete} to delete a parser instance.
+This function is available if either the @code{%define api.push_pull "push"} or
+@code{%define api.push_pull "both"} declaration is used.
+@xref{Push Decl, ,A Push Parser}.
+
+@deftypefun void yypstate_delete (yypstate *yyps)
+This function will reclaim the memory associated with a parser instance.
+After this call, you should no longer attempt to use the parser instance.
+@end deftypefun
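The create, push, and delete functions are meant to be used together in a driver loop. The following self-contained C sketch imitates that protocol with stand-ins so that it can be compiled without a generated parser. Everything prefixed with @code{demo_} or @code{DEMO_} is hypothetical, including the numeric values and the extra token argument of @code{demo_push_parse}; the toy "grammar" simply accepts one or more number tokens followed by end of input.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token codes and return values for this sketch only.  */
enum { DEMO_EOF = 0, DEMO_NUM = 258 };
enum { DEMO_ACCEPT = 0, DEMO_ERROR = 1, DEMO_NOMEM = 2, DEMO_PUSH_MORE = 4 };

/* Stand-in for yypstate: all parser state lives in this object.  */
typedef struct demo_pstate { int tokens_seen; } demo_pstate;

/* Stand-in for yypstate_new: return NULL if no memory is available.  */
static demo_pstate *
demo_pstate_new (void)
{
  demo_pstate *ps = malloc (sizeof *ps);
  if (ps != NULL)
    ps->tokens_seen = 0;
  return ps;
}

/* Stand-in for yypstate_delete.  */
static void
demo_pstate_delete (demo_pstate *ps)
{
  free (ps);
}

/* Stand-in for yypush_parse: consume one token and report whether
   more input is needed, the input was accepted, or an error occurred.  */
static int
demo_push_parse (demo_pstate *ps, int token)
{
  if (token == DEMO_NUM)
    {
      ps->tokens_seen++;
      return DEMO_PUSH_MORE;
    }
  if (token == DEMO_EOF && ps->tokens_seen > 0)
    return DEMO_ACCEPT;
  return DEMO_ERROR;
}

/* Driver: push tokens one at a time until the parser stops asking
   for more, then release the parser instance.  */
static int
demo_parse_all (const int *tokens, size_t n)
{
  demo_pstate *ps = demo_pstate_new ();
  int status = DEMO_PUSH_MORE;
  size_t i;

  if (ps == NULL)
    return DEMO_NOMEM;
  for (i = 0; i < n && status == DEMO_PUSH_MORE; i++)
    status = demo_push_parse (ps, tokens[i]);
  demo_pstate_delete (ps);
  return status;
}
```

The point of the design is that the caller, not the parser, owns the loop: a status equivalent to @code{YYPUSH_MORE} means "call again with the next token", and any other status ends the parse.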
@node Lexical
@section The Lexical Analyzer Function @code{yylex}
@vindex yylloc
If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
-Tracking Locations}) in actions to keep track of the
-textual locations of tokens and groupings, then you must provide this
-information in @code{yylex}. The function @code{yyparse} expects to
-find the textual location of a token just parsed in the global variable
-@code{yylloc}. So @code{yylex} must store the proper data in that
-variable.
+Tracking Locations}) in actions to keep track of the textual locations
+of tokens and groupings, then you must provide this information in
+@code{yylex}. The function @code{yyparse} expects to find the textual
+location of a token just parsed in the global variable @code{yylloc}.
+So @code{yylex} must store the proper data in that variable.
By default, the value of @code{yylloc} is a structure and you need only
initialize the members that are going to be used by the actions. The
@node Pure Calling
@subsection Calling Conventions for Pure Parsers
-When you use the Bison declaration @code{%pure-parser} to request a
+When you use the Bison declaration @code{%define api.pure} to request a
pure, reentrant parser, the global communication variables @code{yylval}
and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
Parser}.) In such parsers the two global variables are replaced by
@deffn {Directive} lex-param @{@var{argument-declaration}@}
@findex %lex-param
-Declare that @code{argument-declaration} is an additional @code{yylex}
-argument declaration.
+Declare that the braced-code @var{argument-declaration} is an
+additional @code{yylex} argument declaration.
@end deffn
For instance:
int yyparse (int *nastiness, int *randomness);
@end example
-If @code{%pure-parser} is added:
+If @code{%define api.pure} is added:
@example
int yylex (YYSTYPE *lvalp, int *nastiness);
@end example
@noindent
-and finally, if both @code{%pure-parser} and @code{%locations} are used:
+and finally, if both @code{%define api.pure} and @code{%locations} are used:
@example
int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
an access to the current location.
This is indeed the case for the @acronym{GLR}
parsers, but not for the Yacc parser, for historical reasons. I.e., if
-@samp{%locations %pure-parser} is passed then the prototypes for
+@samp{%locations %define api.pure} is passed then the prototypes for
@code{yyerror} are:
@example
Finally, @acronym{GLR} and Yacc parsers share the same @code{yyerror} calling
convention for absolutely pure parsers, i.e., when the calling
convention of @code{yylex} @emph{and} the calling convention of
-@code{%pure-parser} are pure. I.e.:
+@code{%define api.pure} are pure.
+I.e.:
@example
/* Location tracking. */
%locations
/* Pure yylex. */
-%pure-parser
+%define api.pure
%lex-param @{int *nastiness@}
/* Pure yyparse. */
%parse-param @{int *nastiness@}
@deffn {Macro} YYBACKUP (@var{token}, @var{value});
@findex YYBACKUP
Unshift a token. This macro is allowed only for rules that reduce
-a single value, and only when there is no look-ahead token.
+a single value, and only when there is no lookahead token.
It is also disallowed in @acronym{GLR} parsers.
-It installs a look-ahead token with token type @var{token} and
+It installs a lookahead token with token type @var{token} and
semantic value @var{value}; then it discards the value that was
going to be reduced by this rule.
If the macro is used when it is not valid, such as when there is
-a look-ahead token already, then it reports a syntax error with
+a lookahead token already, then it reports a syntax error with
a message @samp{cannot back up} and performs ordinary error
recovery.
@deffn {Macro} YYEMPTY
@vindex YYEMPTY
-Value stored in @code{yychar} when there is no look-ahead token.
+Value stored in @code{yychar} when there is no lookahead token.
+@end deffn
+
+@deffn {Macro} YYEOF
+@vindex YYEOF
+Value stored in @code{yychar} when the lookahead is the end of the input
+stream.
@end deffn
@deffn {Macro} YYERROR;
@end deffn
@deffn {Macro} YYRECOVERING
-This macro stands for an expression that has the value 1 when the parser
-is recovering from a syntax error, and 0 the rest of the time.
+@findex YYRECOVERING
+The expression @code{YYRECOVERING ()} yields 1 when the parser
+is recovering from a syntax error, and 0 otherwise.
@xref{Error Recovery}.
@end deffn
@deffn {Variable} yychar
-Variable containing the current look-ahead token. (In a pure parser,
-this is actually a local variable within @code{yyparse}.) When there is
-no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
-@xref{Look-Ahead, ,Look-Ahead Tokens}.
+Variable containing either the lookahead token, or @code{YYEOF} when the
+lookahead is the end of the input stream, or @code{YYEMPTY} when no lookahead
+has been performed so the next token is not yet known.
+Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Lookahead, ,Lookahead Tokens}.
@end deffn
@deffn {Macro} yyclearin;
-Discard the current look-ahead token. This is useful primarily in
-error rules. @xref{Error Recovery}.
+Discard the current lookahead token. This is useful primarily in
+error rules.
+Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
+Semantic Actions}).
+@xref{Error Recovery}.
@end deffn
@deffn {Macro} yyerrok;
@xref{Error Recovery}.
@end deffn
+@deffn {Variable} yylloc
+Variable containing the lookahead token location when @code{yychar} is not set
+to @code{YYEMPTY} or @code{YYEOF}.
+Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Actions and Locations, ,Actions and Locations}.
+@end deffn
+
+@deffn {Variable} yylval
+Variable containing the lookahead token semantic value when @code{yychar} is
+not set to @code{YYEMPTY} or @code{YYEOF}.
+Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Actions, ,Actions}.
+@end deffn
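The relationship between @code{yychar}, @code{yylval}, and @code{yylloc} described above can be summarized in a short C sketch. The constants @code{DEMO_YYEMPTY} and @code{DEMO_YYEOF} and both helper functions are hypothetical placeholders for illustration; real code inside a semantic action would compare @code{yychar} against @code{YYEMPTY} and @code{YYEOF} directly.

```c
/* Placeholder values for this sketch only; use YYEMPTY and YYEOF
   in a real semantic action.  */
enum { DEMO_YYEMPTY = -2, DEMO_YYEOF = 0 };

/* Return nonzero when the lookahead's semantic value and location
   are meaningful, i.e. when a real token has been read.  */
static int
lookahead_has_value (int yychar_value)
{
  return yychar_value != DEMO_YYEMPTY && yychar_value != DEMO_YYEOF;
}

/* Describe the three possible states of the lookahead.  */
static const char *
describe_lookahead (int yychar_value)
{
  if (yychar_value == DEMO_YYEMPTY)
    return "no lookahead has been performed";
  if (yychar_value == DEMO_YYEOF)
    return "lookahead is the end of the input stream";
  return "lookahead is a token";
}
```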
+
@deffn {Value} @@$
@findex @@$
Acts like a structure variable containing information on the textual location
A Bison-generated parser can print diagnostics, including error and
tracing messages. By default, they appear in English. However, Bison
-also supports outputting diagnostics in the user's native language.
-To make this work, the user should set the usual environment
-variables. @xref{Users, , The User's View, gettext, GNU
-@code{gettext} utilities}. For
-example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might set
-the user's locale to French Canadian using the @acronym{UTF}-8
+also supports outputting diagnostics in the user's native language. To
+make this work, the user should set the usual environment variables.
+@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
+For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
+set the user's locale to French Canadian using the @acronym{UTF}-8
encoding. The exact set of available locales depends on the user's
installation.
This kind of parser is known in the literature as a bottom-up parser.
@menu
-* Look-Ahead:: Parser looks one token ahead when deciding what to do.
+* Lookahead:: Parser looks one token ahead when deciding what to do.
* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
* Precedence:: Operator precedence works by resolving conflicts.
* Contextual Precedence:: When an operator's precedence depends on context.
* Memory Management:: What happens when memory is exhausted. How to avoid it.
@end menu
-@node Look-Ahead
-@section Look-Ahead Tokens
-@cindex look-ahead token
+@node Lookahead
+@section Lookahead Tokens
+@cindex lookahead token
The Bison parser does @emph{not} always reduce immediately as soon as the
last @var{n} tokens and groupings match a rule. This is because such a
token in order to decide what to do.
When a token is read, it is not immediately shifted; first it becomes the
-@dfn{look-ahead token}, which is not on the stack. Now the parser can
+@dfn{lookahead token}, which is not on the stack. Now the parser can
perform one or more reductions of tokens and groupings on the stack, while
-the look-ahead token remains off to the side. When no more reductions
-should take place, the look-ahead token is shifted onto the stack. This
+the lookahead token remains off to the side. When no more reductions
+should take place, the lookahead token is shifted onto the stack. This
does not mean that all possible reductions have been done; depending on the
-token type of the look-ahead token, some rules may choose to delay their
+token type of the lookahead token, some rules may choose to delay their
application.
-Here is a simple case where look-ahead is needed. These three rules define
+Here is a simple case where lookahead is needed. These three rules define
expressions which contain binary addition operators and postfix unary
factorial operators (@samp{!}), and allow parentheses for grouping.
'!'}. No rule allows that sequence.
@vindex yychar
-The current look-ahead token is stored in the variable @code{yychar}.
+@vindex yylval
+@vindex yylloc
+The lookahead token is stored in the variable @code{yychar}.
+Its semantic value and location, if any, are stored in the variables
+@code{yylval} and @code{yylloc}.
@xref{Action Features, ,Special Features for Use in Actions}.
@node Shift/Reduce
Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
terminal symbols for specific keyword tokens.
-When the @code{ELSE} token is read and becomes the look-ahead token, the
+When the @code{ELSE} token is read and becomes the lookahead token, the
contents of the stack (assuming the input is valid) are just right for
reduction by the first rule. But it is also legitimate to shift the
@code{ELSE}, because that would lead to eventual reduction by the second
The latter alternative, @dfn{right association}, is desirable for
assignment operators. The choice of left or right association is a
matter of whether the parser chooses to shift or reduce when the stack
-contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting
+contains @w{@samp{1 - 2}} and the lookahead token is @samp{-}: shifting
makes right-associativity.
@node Using Precedence
Precedence, ,Context-Dependent Precedence}.)
Finally, the resolution of conflicts works by comparing the precedence
-of the rule being considered with that of the look-ahead token. If the
+of the rule being considered with that of the lookahead token. If the
token's precedence is higher, the choice is to shift. If the rule's
precedence is higher, the choice is to reduce. If they have equal
precedence, the choice is made based on the associativity of that
resolved.
Not all rules and not all tokens have precedence. If either the rule or
-the look-ahead token has no precedence, then the default is to shift.
+the lookahead token has no precedence, then the default is to shift.
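The resolution rule just described can be written as a small decision function. This is a sketch rather than Bison's code: the enumerations, the @code{NO_PREC} marker, and the function @code{resolve_conflict} are hypothetical names, and precedence levels are modeled as positive integers where a larger number binds more tightly.

```c
/* Hypothetical types for this sketch only.  */
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACTION_SHIFT, ACTION_REDUCE, ACTION_ERROR };

/* Marker for "no precedence assigned".  */
#define NO_PREC 0

/* Decide a shift/reduce conflict between a rule and the lookahead
   token.  TOKEN_ASSOC is the associativity of the shared level and
   only matters when both precedences are equal.  */
static enum action
resolve_conflict (int rule_prec, int token_prec, enum assoc token_assoc)
{
  /* Without a precedence on either side, the default is to shift.  */
  if (rule_prec == NO_PREC || token_prec == NO_PREC)
    return ACTION_SHIFT;
  if (token_prec > rule_prec)
    return ACTION_SHIFT;
  if (rule_prec > token_prec)
    return ACTION_REDUCE;
  /* Equal precedence: associativity decides.  */
  switch (token_assoc)
    {
    case ASSOC_LEFT:
      return ACTION_REDUCE;
    case ASSOC_RIGHT:
      return ACTION_SHIFT;
    default:
      return ACTION_ERROR;      /* Nonassociative: syntax error.  */
    }
}
```

For the stack @samp{1 - 2} with lookahead @samp{-}, both precedences are equal and the level is left-associative, so the sketch chooses to reduce, grouping the input as @samp{(1 - 2) - 5}-style left association.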
@node Contextual Precedence
@section Context-Dependent Precedence
near the top of the stack. The current state collects all the information
about previous input which is relevant to deciding what to do next.
-Each time a look-ahead token is read, the current parser state together
-with the type of look-ahead token are looked up in a table. This table
-entry can say, ``Shift the look-ahead token.'' In this case, it also
+Each time a lookahead token is read, the current parser state together
+with the type of lookahead token are looked up in a table. This table
+entry can say, ``Shift the lookahead token.'' In this case, it also
specifies the new parser state, which is pushed onto the top of the
parser stack. Or it can say, ``Reduce using rule number @var{n}.''
This means that a certain number of tokens or groupings are taken off
that number of states are popped from the stack, and one new state is
pushed.
-There is one other alternative: the table can say that the look-ahead token
+There is one other alternative: the table can say that the lookahead token
is erroneous in the current state. This causes error processing to begin
(@pxref{Error Recovery}).
@end example
It would seem that this grammar can be parsed with only a single token
-of look-ahead: when a @code{param_spec} is being read, an @code{ID} is
+of lookahead: when a @code{param_spec} is being read, an @code{ID} is
a @code{name} if a comma or colon follows, or a @code{type} if another
@code{ID} follows. In other words, this grammar is @acronym{LR}(1).
same. They appear similar because the same set of rules would be
active---the rule for reducing to a @code{name} and that for reducing to
a @code{type}. Bison is unable to determine at that stage of processing
-that the rules would require different look-ahead tokens in the two
+that the rules would require different lookahead tokens in the two
contexts, so it makes a single parser state for them both. Combining
the two contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not @acronym{LALR}(1).
Bison produces @emph{deterministic} parsers that choose uniquely
when to reduce and which reduction to apply
-based on a summary of the preceding input and on one extra token of look-ahead.
+based on a summary of the preceding input and on one extra token of lookahead.
As a result, normal Bison handles a proper subset of the family of
context-free languages.
Ambiguous grammars, since they have strings with more than one possible
sequence of reductions cannot have deterministic parsers in this sense.
The same is true of languages that require more than one symbol of
-look-ahead, since the parser lacks the information necessary to make a
+lookahead, since the parser lacks the information necessary to make a
decision at the point it must be made in a shift-reduce parser.
Finally, as previously mentioned (@pxref{Mystery Conflicts}),
there are languages where Bison's particular choice of how to
@code{error} token is acceptable. (This means that the subexpressions
already parsed are discarded, back to the last complete @code{stmnts}.)
At this point the @code{error} token can be shifted. Then, if the old
-look-ahead token is not acceptable to be shifted next, the parser reads
+lookahead token is not acceptable to be shifted next, the parser reads
tokens and discards them until it finds a token which is acceptable. In
this example, Bison reads and discards input until the next newline so
that the fourth rule can apply. Note that discarded symbols are
@samp{yyerrok;} is a valid C statement.
@findex yyclearin
-The previous look-ahead token is reanalyzed immediately after an error. If
+The previous lookahead token is reanalyzed immediately after an error. If
this is unacceptable, then the macro @code{yyclearin} may be used to clear
this token. Write the statement @samp{yyclearin;} in the error rule's
action.
+@xref{Action Features, ,Special Features for Use in Actions}.
For example, suppose that on a syntax error, an error handling routine is
called that advances the input stream to some point where parsing should
once again commence. The next symbol returned by the lexical scanner is
-probably correct. The previous look-ahead token ought to be discarded
+probably correct. The previous lookahead token ought to be discarded
with @samp{yyclearin;}.
@vindex YYRECOVERING
-The macro @code{YYRECOVERING} stands for an expression that has the
-value 1 when the parser is recovering from a syntax error, and 0 the
-rest of the time. A value of 1 indicates that error messages are
-currently suppressed for new syntax errors.
+The expression @code{YYRECOVERING ()} yields 1 when the parser
+is recovering from a syntax error, and 0 otherwise.
+Syntax error diagnostics are suppressed while recovering from a syntax
+error.
@node Context Dependency
@chapter Handling Context Dependencies
Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
frequent than one would hope), looking at this automaton is required to
tune or simply fix a parser. Bison provides two different
-representation of it, either textually or graphically (as a @acronym{VCG}
-file).
+representations of it, either textually or graphically (as a DOT file).
The textual file is generated when the options @option{--report} or
@option{--verbose} are specified, see @xref{Invocation, , Invoking
@command{bison} reports:
@example
-calc.y: warning: 1 useless nonterminal and 1 useless rule
-calc.y:11.1-7: warning: useless nonterminal: useless
-calc.y:11.10-12: warning: useless rule: useless: STR
+calc.y: warning: 1 nonterminal and 1 rule useless in grammar
+calc.y:11.1-7: warning: nonterminal useless in grammar: useless
+calc.y:11.10-12: warning: rule useless in grammar: useless: STR
calc.y: conflicts: 7 shift/reduce
@end example
The next section reports useless tokens, nonterminal and rules. Useless
nonterminals and rules are removed in order to produce a smaller parser,
but useless tokens are preserved, since they might be used by the
-scanner (note the difference between ``useless'' and ``not used''
+scanner (note the difference between ``useless'' and ``unused''
below):
@example
-Useless nonterminals:
+Nonterminals useless in grammar:
useless
-Terminals which are not used:
+Terminals unused in grammar:
STR
-Useless rules:
+Rules useless in grammar:
#6 useless: STR;
@end example
symbol (here, @code{exp}). When the parser returns to this state right
after having reduced a rule that produced an @code{exp}, the control
flow jumps to state 2. If there is no such transition on a nonterminal
-symbol, and the look-ahead is a @code{NUM}, then this token is shifted on
+symbol, and the lookahead is a @code{NUM}, then this token is shifted on
the parse stack, and the control flow jumps to state 1. Any other
-look-ahead triggers a syntax error.''
+lookahead triggers a syntax error.''
@cindex core, item set
@cindex item set core
@cindex kernel, item set
@cindex item set core
Even though the only active rule in state 0 seems to be rule 0, the
-report lists @code{NUM} as a look-ahead token because @code{NUM} can be
+report lists @code{NUM} as a lookahead token because @code{NUM} can be
at the beginning of any rule deriving an @code{exp}. By default Bison
reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
you want to see more detail you can invoke @command{bison} with
@end example
@noindent
-the rule 5, @samp{exp: NUM;}, is completed. Whatever the look-ahead token
+the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token
(@samp{$default}), the parser will reduce it. If it was coming from
state 0, then, after this reduction it will return to state 0, and will
jump to state 2 (@samp{exp: go to state 2}).
@noindent
In state 2, the automaton can only shift a symbol. For instance,
-because of the item @samp{exp -> exp . '+' exp}, if the look-ahead if
+because of the item @samp{exp -> exp . '+' exp}, if the lookahead is
@samp{+}, it will be shifted on the parse stack, and the automaton
control will jump to state 4, corresponding to the item @samp{exp -> exp
'+' . exp}. Since there is no default action, any other token than
$default reduce using rule 1 (exp)
@end example
-Indeed, there are two actions associated to the look-ahead @samp{/}:
+Indeed, there are two actions associated to the lookahead @samp{/}:
either shifting (and going to state 7), or reducing rule 1. The
conflict means that either the grammar is ambiguous, or the parser lacks
information to make the right decision. Indeed the grammar is
shifting the next token and going to the corresponding state, or
reducing a single rule. In the other cases, i.e., when shifting
@emph{and} reducing is possible or when @emph{several} reductions are
-possible, the look-ahead is required to select the action. State 8 is
-one such state: if the look-ahead is @samp{*} or @samp{/} then the action
+possible, the lookahead is required to select the action. State 8 is
+one such state: if the lookahead is @samp{*} or @samp{/} then the action
is shifting, otherwise the action is reducing rule 1. In other words,
the first two items, corresponding to rule 1, are not eligible when the
-look-ahead token is @samp{*}, since we specified that @samp{*} has higher
+lookahead token is @samp{*}, since we specified that @samp{*} has higher
precedence than @samp{+}. More generally, some items are eligible only
-with some set of possible look-ahead tokens. When run with
-@option{--report=look-ahead}, Bison specifies these look-ahead tokens:
+with some set of possible lookahead tokens. When run with
+@option{--report=lookahead}, Bison specifies these lookahead tokens:
@example
state 8
- exp -> exp . '+' exp [$, '+', '-', '/'] (rule 1)
+ exp -> exp . '+' exp (rule 1)
exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp . '*' exp (rule 3)
The trace facility outputs messages with macro calls of the form
@code{YYFPRINTF (stderr, @var{format}, @var{args})} where
-@var{format} and @var{args} are the usual @code{printf} format and
+@var{format} and @var{args} are the usual @code{printf} format and variadic
arguments. If you define @code{YYDEBUG} to a nonzero value but do not
define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included
-and @code{YYPRINTF} is defined to @code{fprintf}.
+and @code{YYFPRINTF} is defined to @code{fprintf}.
Once you have compiled the program with trace facilities, the way to
request a trace is to store a nonzero value in the variable @code{yydebug}.
@item --print-localedir
Print the name of the directory containing locale-dependent data.
-@need 1750
+@item --print-datadir
+Print the name of the directory containing skeletons and XSLT.
+
@item -y
@itemx --yacc
-Equivalent to @samp{-o y.tab.c}; the parser output file is called
+Act more like the traditional Yacc command. This can cause
+different diagnostics to be generated, and may change behavior in
+other minor ways. Most importantly, imitate Yacc's output
+file name conventions, so that the parser output file is called
@file{y.tab.c}, and the other outputs are called @file{y.output} and
-@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
-file name conventions. Thus, the following shell script can substitute
-for Yacc, and the Bison distribution contains such a script for
-compatibility with @acronym{POSIX}:
+@file{y.tab.h}.
+Also, if generating an @acronym{LALR}(1) parser in C, generate @code{#define}
+statements in addition to an @code{enum} to associate token numbers with token
+names.
+Thus, the following shell script can substitute for Yacc, and the Bison
+distribution contains such a script for compatibility with @acronym{POSIX}:
@example
#! /bin/sh
bison -y "$@@"
@end example
+
+The @option{-y}/@option{--yacc} option is intended for use with
+traditional Yacc grammars. If your grammar uses a Bison extension
+like @samp{%glr-parser}, Bison might not be Yacc-compatible even if
+this option is specified.
+
@end table
@noindent
Tuning the parser:
@table @option
-@item -S @var{file}
-@itemx --skeleton=@var{file}
-Specify the skeleton to use. You probably don't need this option unless
-you are developing Bison.
-
@item -t
@itemx --debug
In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
already defined, so that the debugging facilities are compiled.
@xref{Tracing, ,Tracing Your Parser}.
+@item -L @var{language}
+@itemx --language=@var{language}
+Specify the programming language for the generated parser, as if
+@code{%language} was specified (@pxref{Decl Summary, , Bison Declaration
+Summary}). Currently supported languages include C and C++.
+@var{language} is case-insensitive.
+
@item --locations
Pretend that @code{%locations} was specified. @xref{Decl Summary}.
@item -p @var{prefix}
@itemx --name-prefix=@var{prefix}
-Pretend that @code{%name-prefix="@var{prefix}"} was specified.
+Pretend that @code{%name-prefix "@var{prefix}"} was specified.
@xref{Decl Summary}.
@item -l
grammar file. This option causes them to associate errors with the
parser file, treating it as an independent source file in its own right.
-@item -n
-@itemx --no-parser
-Pretend that @code{%no-parser} was specified. @xref{Decl Summary}.
+@item -S @var{file}
+@itemx --skeleton=@var{file}
+Specify the skeleton to use, similar to @code{%skeleton}
+(@pxref{Decl Summary, , Bison Declaration Summary}).
+
+You probably don't need this option unless you are developing Bison.
+You should use @option{--language} if you want to specify the skeleton for a
+different language, because it is clearer and because it will always
+choose the correct skeleton for non-deterministic or push parsers.
+
+If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
+file in the Bison installation directory.
+If it does, @var{file} is an absolute file name or a file name relative to the
+current working directory.
+This is similar to how most shells resolve commands.
@item -k
@itemx --token-table
@item -b @var{file-prefix}
@itemx --file-prefix=@var{prefix}
-Pretend that @code{%verbose} was specified, i.e, specify prefix to use
+Pretend that @code{%file-prefix} was specified, i.e., specify prefix to use
for all Bison output file names. @xref{Decl Summary}.
@item -r @var{things}
Description of the grammar, conflicts (resolved and unresolved), and
@acronym{LALR} automaton.
-@item look-ahead
+@item lookahead
Implies @code{state} and augments the description of the automaton with
-each rule's look-ahead set.
+each rule's lookahead set.
@item itemset
Implies @code{state} and augments the description of the automaton with
the full set of items for each state, instead of its core only.
@end table
-For instance, on the following grammar
+@item --report-file=@var{file}
+Specify the @var{file} for the verbose description.
@item -v
@itemx --verbose
-Pretend that @code{%verbose} was specified, i.e, write an extra output
+Pretend that @code{%verbose} was specified, i.e., write an extra output
file containing verbose descriptions of the grammar and
parser. @xref{Decl Summary}.
described under the @samp{-v} and @samp{-d} options.
@item -g
-Output a @acronym{VCG} definition of the @acronym{LALR}(1) grammar
-automaton computed by Bison. If the grammar file is @file{foo.y}, the
-@acronym{VCG} output file will
-be @file{foo.vcg}.
+Output a graphical representation of the @acronym{LALR}(1) grammar
+automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz}
+@uref{http://www.graphviz.org/doc/info/lang.html, @acronym{DOT}} format.
+If the grammar file is @file{foo.y}, the output file will
+be @file{foo.dot}.
@item --graph=@var{graph-file}
The behavior of @var{--graph} is the same than @samp{-g}. The only
@node Option Cross Key
@section Option Cross Key
+@c FIXME: How about putting the directives too?
Here is a list of options, alphabetized by long option, to help you find
the corresponding short option.
-@tex
-\def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill}
-
-{\tt
-\line{ --debug \leaderfill -t}
-\line{ --defines \leaderfill -d}
-\line{ --file-prefix \leaderfill -b}
-\line{ --graph \leaderfill -g}
-\line{ --help \leaderfill -h}
-\line{ --name-prefix \leaderfill -p}
-\line{ --no-lines \leaderfill -l}
-\line{ --no-parser \leaderfill -n}
-\line{ --output \leaderfill -o}
-\line{ --print-localedir}
-\line{ --token-table \leaderfill -k}
-\line{ --verbose \leaderfill -v}
-\line{ --version \leaderfill -V}
-\line{ --yacc \leaderfill -y}
-}
-@end tex
-
-@ifinfo
-@example
---debug -t
---defines=@var{defines-file} -d
---file-prefix=@var{prefix} -b @var{file-prefix}
---graph=@var{graph-file} -d
---help -h
---name-prefix=@var{prefix} -p @var{name-prefix}
---no-lines -l
---no-parser -n
---output=@var{outfile} -o @var{outfile}
---print-localedir
---token-table -k
---verbose -v
---version -V
---yacc -y
-@end example
-@end ifinfo
+@multitable {@option{--defines=@var{defines-file}}} {@option{-b @var{file-prefix}XXX}}
+@headitem Long Option @tab Short Option
+@include cross-options.texi
+@end multitable
@node Yacc Library
@section Yacc Library
@c ================================================= C++ Bison
-@node C++ Language Interface
-@chapter C++ Language Interface
+@node Other Languages
+@chapter Parsers Written In Other Languages
@menu
* C++ Parsers:: The interface to generate C++ parser classes
-* A Complete C++ Example:: Demonstrating their use
+* Java Parsers:: The interface to generate Java parser classes
@end menu
@node C++ Parsers
* C++ Location Values:: The position and location classes
* C++ Parser Interface:: Instantiating and running the parser
* C++ Scanner Interface:: Exchanges between yylex and parse
+* A Complete C++ Example:: Demonstrating their use
@end menu
@node C++ Bison Interface
@subsection C++ Bison Interface
-@c - %skeleton "lalr1.cc"
+@c - %language "C++"
@c - Always pure
@c - initial action
-The C++ parser @acronym{LALR}(1) skeleton is named @file{lalr1.cc}. To select
-it, you may either pass the option @option{--skeleton=lalr1.cc} to
-Bison, or include the directive @samp{%skeleton "lalr1.cc"} in the
-grammar preamble. When run, @command{bison} will create several
-files:
+The C++ @acronym{LALR}(1) parser is selected using the language directive,
+@samp{%language "C++"}, or the synonymous command-line option
+@option{--language=c++}.
+@xref{Decl Summary}.
+
+When run, @command{bison} will create several entities in the @samp{yy}
+namespace.
+@findex %define namespace
+Use the @samp{%define namespace} directive to change the namespace name, see
+@ref{Decl Summary}.
+The various classes are generated in the following files:
+
@table @file
@item position.hh
@itemx location.hh
@item @var{file}.hh
@itemx @var{file}.cc
-The declaration and implementation of the C++ parser class.
-@var{file} is the name of the output file. It follows the same
-rules as with regular C parsers.
+(Assuming the extension of the input file was @samp{.yy}.) The
+declaration and implementation of the C++ parser class. The basename
+and extension of these two files follow the same rules as with regular C
+parsers (@pxref{Invocation}).
-Note that @file{@var{file}.hh} is @emph{mandatory}, the C++ cannot
-work without the parser class declaration. Therefore, you must either
-pass @option{-d}/@option{--defines} to @command{bison}, or use the
+The header is @emph{mandatory}; you must either pass
+@option{-d}/@option{--defines} to @command{bison}, or use the
@samp{%defines} directive.
@end table
@node C++ Semantic Values
@subsection C++ Semantic Values
@c - No objects in unions
-@c - YSTYPE
+@c - YYSTYPE
@c - Printer and destructor
The @code{%union} directive works as for C, see @ref{Union Decl, ,The
@c - %locations
@c - class Position
@c - class Location
-@c - %define "filename_type" "const symbol::Symbol"
+@c - %define filename_type "const symbol::Symbol"
When the directive @code{%locations} is used, the C++ parser supports
location tracking, see @ref{Locations, , Locations Overview}. Two
The name of the file. It will always be handled as a pointer, the
parser will never duplicate nor deallocate it. As an experimental
feature you may change it to @samp{@var{type}*} using @samp{%define
-"filename_type" "@var{type}"}.
+filename_type "@var{type}"}.
@end deftypemethod
@deftypemethod {position} {unsigned int} line
The output files @file{@var{output}.hh} and @file{@var{output}.cc}
declare and define the parser class in the namespace @code{yy}. The
class name defaults to @code{parser}, but may be changed using
-@samp{%define "parser_class_name" "@var{name}"}. The interface of
+@samp{%define parser_class_name "@var{name}"}. The interface of
this class is detailed below. It can be extended using the
@code{%parse-param} feature: its semantics is slightly changed since
it describes an additional member of the parser class, and an
The parser invokes the scanner by calling @code{yylex}. Contrary to C
parsers, C++ parsers are always pure: there is no point in using the
-@code{%pure-parser} directive. Therefore the interface is as follows.
+@code{%define api.pure} directive. Therefore the interface is as follows.
@deftypemethod {parser} {int} yylex (semantic_value_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...)
Return the next token. Its type is the return value, its semantic
@node A Complete C++ Example
-@section A Complete C++ Example
+@subsection A Complete C++ Example
This section demonstrates the use of a C++ parser with a simple but
complete example. This example should be available on your system,
@end menu
@node Calc++ --- C++ Calculator
-@subsection Calc++ --- C++ Calculator
+@subsubsection Calc++ --- C++ Calculator
Of course the grammar is dedicated to arithmetics, a single
expression, possibly preceded by variable assignments. An
@end example
@node Calc++ Parsing Driver
-@subsection Calc++ Parsing Driver
+@subsubsection Calc++ Parsing Driver
@c - An env
@c - A place to store error messages
@c - A place for the result
@comment file: calc++-driver.hh
@example
-// Announce to Flex the prototype we want for lexing function, ...
-# define YY_DECL \
- int yylex (yy::calcxx_parser::semantic_type* yylval, \
- yy::calcxx_parser::location_type* yylloc, \
- calcxx_driver& driver)
+// Tell Flex the lexer's prototype ...
+# define YY_DECL \
+ yy::calcxx_parser::token_type \
+ yylex (yy::calcxx_parser::semantic_type* yylval, \
+ yy::calcxx_parser::location_type* yylloc, \
+ calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
@end example
@noindent
To encapsulate the coordination with the Flex scanner, it is useful to
have two members function to open and close the scanning phase.
-members.
@comment file: calc++-driver.hh
@example
@comment file: calc++-driver.hh
@example
- // Handling the parser.
- void parse (const std::string& f);
+ // Run the parser. Return 0 on success.
+ int parse (const std::string& f);
std::string file;
bool trace_parsing;
@end example
@{
@}
-void
+int
calcxx_driver::parse (const std::string &f)
@{
file = f;
scan_begin ();
yy::calcxx_parser parser (*this);
parser.set_debug_level (trace_parsing);
- parser.parse ();
+ int res = parser.parse ();
scan_end ();
+ return res;
@}
void
@end example
@node Calc++ Parser
-@subsection Calc++ Parser
+@subsubsection Calc++ Parser
The parser definition file @file{calc++-parser.yy} starts by asking for
the C++ LALR(1) skeleton, the creation of the parser header file, and
@comment file: calc++-parser.yy
@example
-%skeleton "lalr1.cc" /* -*- C++ -*- */
-%require "2.1a"
+%language "C++" /* -*- C++ -*- */
+%require "@value{VERSION}"
%defines
-%define "parser_class_name" "calcxx_parser"
+%define parser_class_name "calcxx_parser"
@end example
@noindent
+@findex %code requires
Then come the declarations/inclusions needed to define the
@code{%union}. Because the parser uses the parsing driver and
reciprocally, both cannot include the header of the other. Because the
driver's header needs detailed knowledge about the parser class (in
particular its inner types), it is the parser's header which will simply
use a forward declaration of the driver.
+@xref{Decl Summary, ,%code}.
@comment file: calc++-parser.yy
@example
-%@{
+%code requires @{
# include <string>
class calcxx_driver;
-%@}
+@}
@end example
@noindent
@end example
@noindent
-The code between @samp{%@{} and @samp{%@}} after the introduction of the
-@samp{%union} is output in the @file{*.cc} file; it needs detailed
-knowledge about the driver.
+@findex %code
+The code between @samp{%code @{} and @samp{@}} is output in the
+@file{*.cc} file; it needs detailed knowledge about the driver.
@comment file: calc++-parser.yy
@example
-%@{
+%code @{
# include "calc++-driver.hh"
-%@}
+@}
@end example
%token ASSIGN ":="
%token <sval> IDENTIFIER "identifier"
%token <ival> NUMBER "number"
-%type <ival> exp "expression"
+%type <ival> exp
@end example
@noindent
To enable memory deallocation during error recovery, use
@code{%destructor}.
+@c FIXME: Document %printer, and mention that it takes a braced-code operand.
@comment file: calc++-parser.yy
@example
%printer @{ debug_stream () << *$$; @} "identifier"
%destructor @{ delete $$; @} "identifier"
-%printer @{ debug_stream () << $$; @} "number" "expression"
+%printer @{ debug_stream () << $$; @} <ival>
@end example
@noindent
assignments: assignments assignment @{@}
| /* Nothing. */ @{@};
-assignment: "identifier" ":=" exp @{ driver.variables[*$1] = $3; @};
+assignment:
+ "identifier" ":=" exp
+ @{ driver.variables[*$1] = $3; delete $1; @};
%left '+' '-';
%left '*' '/';
| exp '-' exp @{ $$ = $1 - $3; @}
| exp '*' exp @{ $$ = $1 * $3; @}
| exp '/' exp @{ $$ = $1 / $3; @}
- | "identifier" @{ $$ = driver.variables[*$1]; @}
+ | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @}
| "number" @{ $$ = $1; @};
%%
@end example
@end example
@node Calc++ Scanner
-@subsection Calc++ Scanner
+@subsubsection Calc++ Scanner
The Flex scanner first includes the driver declaration, then the
parser's to get the set of defined tokens.
# include <string>
# include "calc++-driver.hh"
# include "calc++-parser.hh"
+
+/* Work around an incompatibility in flex (at least versions
+ 2.5.31 through 2.5.33): it generates code that does
+ not conform to C89. See Debian bug 333231
+ <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>. */
+# undef yywrap
+# define yywrap() 1
+
+/* By default yylex returns int, we use token_type.
+ Unfortunately yyterminate by default returns 0, which is
+ not of token_type. */
+#define yyterminate() return token::END
%@}
@end example
%@{
typedef yy::calcxx_parser::token token;
%@}
-
-[-+*/] return yytext[0];
+ /* Convert ints to the actual type of tokens. */
+[-+*/] return yy::calcxx_parser::token_type (yytext[0]);
":=" return token::ASSIGN;
@{int@} @{
errno = 0;
calcxx_driver::scan_begin ()
@{
yy_flex_debug = trace_scanning;
- if (!(yyin = fopen (file.c_str (), "r")))
- error (std::string ("cannot open ") + file);
+ if (file == "-")
+ yyin = stdin;
+ else if (!(yyin = fopen (file.c_str (), "r")))
+ @{
+ error (std::string ("cannot open ") + file);
+ exit (1);
+ @}
@}
void
@end example
@node Calc++ Top Level
-@subsection Calc++ Top Level
+@subsubsection Calc++ Top Level
The top level file, @file{calc++.cc}, poses no problem.
driver.trace_parsing = true;
else if (*argv == std::string ("-s"))
driver.trace_scanning = true;
- else
- @{
- driver.parse (*argv);
- std::cout << driver.result << std::endl;
- @}
+ else if (!driver.parse (*argv))
+ std::cout << driver.result << std::endl;
@}
@end example
+@node Java Parsers
+@section Java Parsers
+
+@menu
+* Java Bison Interface:: Asking for Java parser generation
+* Java Semantic Values:: %type and %token vs. Java
+* Java Location Values:: The position and location classes
+* Java Parser Interface:: Instantiating and running the parser
+* Java Scanner Interface:: Java scanners, and pure parsers
+* Java Differences:: Differences between C/C++ and Java Grammars
+@end menu
+
+@node Java Bison Interface
+@subsection Java Bison Interface
+@c - %language "Java"
+@c - initial action
+
+The Java parser skeletons are selected using a language directive,
+@samp{%language "Java"}, or the synonymous command-line option
+@option{--language=java}.
+
+When run, @command{bison} will create several entities whose names
+start with @samp{YY}. Use the @samp{%name-prefix} directive to
+change the prefix, see @ref{Decl Summary}; classes can be placed
+in an arbitrary Java package using a @samp{%define package} section.
+
+The parser class defines an inner class, @code{Location}, that is used
+for location tracking. If the parser is pure, it also defines an
+inner interface, @code{Lexer}; see @ref{Java Scanner Interface} for the
+meaning of pure parsers when the Java language is chosen. Other than
+this inner class and interface, and the members described in @ref{Java
+Parser Interface}, all the other members and fields are preceded
+with a @code{yy} prefix to avoid clashes with user code.
+
+No header file can be generated for Java parsers; you must not pass
+@option{-d}/@option{--defines} to @command{bison}, nor use the
+@samp{%defines} directive.
+
+By default, the @samp{YYParser} class has package visibility. A
+declaration @samp{%define "public"} will change to public visibility.
+Remember that, according to the Java language specification, the name
+of the @file{.java} file should match the name of the class in this
+case.
+
+Similarly, a declaration @samp{%define "abstract"} will make your
+class abstract.
+
+You can create documentation for generated parsers using Javadoc.
+
+@node Java Semantic Values
+@subsection Java Semantic Values
+@c - No %union, specify type in %type/%token.
+@c - YYSTYPE
+@c - Printer and destructor
+
+There is no @code{%union} directive in Java parsers. Instead, the
+semantic values' types (class names) should be specified in the
+@code{%type} or @code{%token} directive:
+
+@example
+%type <Expression> expr assignment_expr term factor
+%type <Integer> number
+@end example
+
+By default, the semantic stack is declared to have @code{Object} members,
+which means that the class types you specify can be of any class.
+To improve the type safety of the parser, you can declare the common
+superclass of all the semantic values using the @samp{%define} directive.
+For example, after the following declaration:
+
+@example
+%define "stype" "ASTNode"
+@end example
+
+@noindent
+any @code{%type} or @code{%token} specifying a semantic type that
+is not a subclass of @code{ASTNode} will cause a compile-time error.
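
The effect of a common superclass can be sketched in plain Java. This is only an illustration, not the code Bison generates: the classes @code{Num} and @code{Add} and the helper @code{reduceAdd} are hypothetical, and the value stack is modelled with a @code{Deque<ASTNode>}. Because the stack is typed, pushing an unrelated object such as a @code{String} would be rejected by the Java compiler.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical common superclass, as named by %define "stype" "ASTNode".
abstract class ASTNode {
  abstract int eval();
}

// Hypothetical leaf node holding an integer literal.
class Num extends ASTNode {
  final int value;
  Num(int value) { this.value = value; }
  @Override int eval() { return value; }
}

// Hypothetical node for a binary addition.
class Add extends ASTNode {
  final ASTNode left, right;
  Add(ASTNode left, ASTNode right) { this.left = left; this.right = right; }
  @Override int eval() { return left.eval() + right.eval(); }
}

class SemanticStackDemo {
  // Mimics reducing "expr: expr '+' expr" on a typed value stack:
  // pop the two operands and push the combined node.
  static void reduceAdd(Deque<ASTNode> stack) {
    ASTNode right = stack.pop();
    ASTNode left = stack.pop();
    stack.push(new Add(left, right));
  }

  public static void main(String[] args) {
    Deque<ASTNode> stack = new ArrayDeque<>();
    stack.push(new Num(2));
    stack.push(new Num(40));
    reduceAdd(stack);
    System.out.println(stack.peek().eval());
  }
}
```

With an @code{Object}-typed stack the same code would need casts, and a wrong type would only be detected at run time.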
+
+Types used in the directives may be qualified with a package name.
+Primitive data types are accepted for Java version 1.5 or later. Note
+that in this case the autoboxing feature of Java 1.5 will be used.
+
+Java parsers do not support @code{%destructor}, since the language
+adopts garbage collection. The parser will try to hold references
+to semantic values for as little time as needed.
+
+Java parsers do not support @code{%printer}, as @code{toString()}
+can be used to print the semantic values. This however may change
+(in a backwards-compatible way) in future versions of Bison.
+
+
+@node Java Location Values
+@subsection Java Location Values
+@c - %locations
+@c - class Position
+@c - class Location
+
+When the directive @code{%locations} is used, the Java parser
+supports location tracking, see @ref{Locations, , Locations Overview}.
+An auxiliary user-defined class defines a @dfn{position}, a single point
+in a file; Bison itself defines a class representing a @dfn{location},
+a range composed of a pair of positions (possibly spanning several
+files). The location class is an inner class of the parser; its name
+is @code{Location} by default, and may be changed using @code{%define
+"location_type" "@var{class-name}"}.
+
+The location class treats the position as a completely opaque value.
+By default, the class name is @code{Position}, but this can be changed
+with @code{%define "position_type" "@var{class-name}"}.
+
+
+@deftypemethod {Location} {Position} begin
+@deftypemethodx {Location} {Position} end
+The first, inclusive, position of the range, and the first beyond.
+@end deftypemethod
+
+@deftypemethod {Location} {String} toString ()
+Return a printable form of the range represented by the location. For this to work
+properly, the position class should override the @code{equals} and
+@code{toString} methods appropriately.
+@end deftypemethod
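
A minimal sketch of a user-defined position class follows. The fields @code{line} and @code{column} are assumptions made for illustration, and the @code{Location} class shown here is a hand-written stand-in for the generated inner class. It assumes that a location whose two positions compare equal is shown as a single position, which is why @code{equals} matters.

```java
// Hypothetical user-defined position: a line/column pair.
final class Position {
  final int line;
  final int column;

  Position(int line, int column) {
    this.line = line;
    this.column = column;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof Position))
      return false;
    Position p = (Position) o;
    return line == p.line && column == p.column;
  }

  // Kept consistent with equals, as the Object contract requires.
  @Override public int hashCode() {
    return 31 * line + column;
  }

  @Override public String toString() {
    return line + "." + column;
  }
}

// Stand-in for the generated Location class (an assumption, not Bison output).
class Location {
  final Position begin;
  final Position end;

  Location(Position begin, Position end) {
    this.begin = begin;
    this.end = end;
  }

  // Collapse an empty range to a single position; otherwise show both ends.
  @Override public String toString() {
    return begin.equals(end) ? begin.toString() : begin + "-" + end;
  }
}
```

Without the @code{equals} override, two distinct @code{Position} objects denoting the same point would never compare equal, and the collapsed form would not be produced.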
+
+
+@node Java Parser Interface
+@subsection Java Parser Interface
+@c - define parser_class_name
+@c - Ctor
+@c - parse, error, set_debug_level, debug_level, set_debug_stream,
+@c debug_stream.
+@c - Reporting errors
+
+The output file defines the parser class in the package optionally
+indicated in the @code{%define package} section. The class name defaults
+to @code{YYParser}. The @code{YY} prefix may be changed using
+@samp{%name-prefix}; alternatively, you can use @samp{%define
+"parser_class_name" "@var{name}"} to give a custom name to the class.
+The interface of this class is detailed below. It can be extended using
+the @code{%parse-param} directive; each occurrence of the directive will
+add a field to the parser class, and an argument to its constructor.
+
+@deftypemethod {YYParser} {} YYParser (@var{type1} @var{arg1}, ...)
+Build a new parser object. There are no arguments by default, unless
+@samp{%parse-param @{@var{type1} @var{arg1}@}} was used.
+@end deftypemethod
+
+@deftypemethod {YYParser} {boolean} parse ()
+Run the syntactic analysis, and return @code{true} on success,
+@code{false} otherwise.
+@end deftypemethod
+
+@deftypemethod {YYParser} {boolean} recovering ()
+During the syntactic analysis, return @code{true} if recovering
+from a syntax error. @xref{Error Recovery}.
+@end deftypemethod
+
+@deftypemethod {YYParser} {java.io.PrintStream} getDebugStream ()
+@deftypemethodx {YYParser} {void} setDebugStream (java.io.PrintStream @var{o})
+Get or set the stream used for tracing the parsing. It defaults to
+@code{System.err}.
+@end deftypemethod
+
+@deftypemethod {YYParser} {int} getDebugLevel ()
+@deftypemethodx {YYParser} {void} setDebugLevel (int @var{l})
+Get or set the tracing level. Currently its value is either 0, no trace,
+or nonzero, full tracing.
+@end deftypemethod
+
+@deftypemethod {YYParser} {void} error (Location @var{l}, String @var{m})
+The definition for this member function must be supplied by the user
+in the same way as the scanner interface (@pxref{Java Scanner
+Interface}); the parser uses it to report a parser error occurring at
+@var{l}, described by @var{m}.
+@end deftypemethod
+
+
+@node Java Scanner Interface
+@subsection Java Scanner Interface
+@c - %code lexer
+@c - %lex-param
+@c - Lexer interface
+
+Contrary to C parsers, Java parsers do not use global variables; the
+state of the parser is always local to an instance of the parser class.
+Therefore, all Java parsers are ``pure'', and the @code{%pure-parser}
+directive does not do anything when used in Java.
+
+The scanner always resides in a separate class from the parser.
+Still, there are two possible ways to interface a Bison-generated Java
+parser with a scanner: the scanner may reside in the same file as the
+Bison grammar, or in a separate file. The interface
+to the scanner is similar in the two cases.
+
+In the first case, where the scanner is in the same file as the grammar, the
+scanner code has to be placed in @code{%code lexer} blocks. If you want
+to pass parameters from the parser constructor to the scanner constructor,
+specify them with @code{%lex-param}; they are passed before
+@code{%parse-param}s to the constructor.
+
+In the second case, the scanner has to implement interface @code{Lexer},
+which is defined within the parser class (e.g., @code{YYParser.Lexer}).
+The constructor of the parser object will then accept an object
+implementing the interface; @code{%lex-param} is not used in this
+case.
+
+In both cases, the scanner has to implement the following methods.
+
+@deftypemethod {Lexer} {void} yyerror (Location @var{l}, String @var{m})
+As explained in @ref{Java Parser Interface}, this method is defined
+by the user to emit an error message. The first parameter is omitted
+if location tracking is not active. Its type can be changed using
+@samp{%define "location_type" "@var{class-name}".}
+@end deftypemethod
+
+@deftypemethod {Lexer} {int} yylex (@var{type1} @var{arg1}, ...)
+Return the next token. Its type is the return value; its semantic
+value and location are saved and returned by the other methods in the
+interface. Invocations of @samp{%lex-param @{@var{type1}
+@var{arg1}@}} yield additional arguments.
+@end deftypemethod
+
+@deftypemethod {Lexer} {Position} getStartPos ()
+@deftypemethodx {Lexer} {Position} getEndPos ()
+Return respectively the first position of the last token that
+@code{yylex} returned, and the first position beyond it. These
+methods are not needed unless location tracking is active.
+
+The return type can be changed using @samp{%define "position_type"
+"@var{class-name}".}
+@end deftypemethod
+
+@deftypemethod {Lexer} {Object} getLVal ()
+Return the semantic value of the last token that @code{yylex}
+returned.
+
+The return type can be changed using @samp{%define "stype"
+"@var{class-name}".}
+@end deftypemethod
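+
+As a rough illustration of the methods above, here is a minimal sketch of
+a hand-written scanner for the case where location tracking is not
+active, so that @code{yyerror} takes only the message and the position
+accessors are left out. The interface @code{SketchLexer}, the class
+@code{IntScanner}, and the token codes @code{EOF} and @code{NUM} are
+assumptions made for this example only; a real scanner would implement
+the @code{Lexer} interface nested in the generated parser class and use
+the token codes that Bison generates.

```java
// Hypothetical stand-in for the Lexer interface that Bison nests inside the
// generated parser class; declared here only so the sketch compiles alone.
interface SketchLexer {
    int EOF = 0;    // assumed token code for end of input
    int NUM = 258;  // assumed token code for an integer literal

    int yylex();               // return the type of the next token
    Object getLVal();          // semantic value of the last token
    void yyerror(String msg);  // no Location: location tracking is off
}

// Hypothetical scanner: integers become NUM tokens, any other
// non-blank character is returned as its own character code.
final class IntScanner implements SketchLexer {
    private final String text;
    private int pos = 0;
    private Object lval;
    private final StringBuilder errors = new StringBuilder();

    IntScanner(String text) {
        this.text = text;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    @Override
    public int yylex() {
        // Skip blanks between tokens.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        lval = null;
        if (pos == text.length()) {
            return EOF;
        }
        char c = text.charAt(pos);
        if (isAsciiDigit(c)) {
            int value = 0;
            while (pos < text.length() && isAsciiDigit(text.charAt(pos))) {
                value = value * 10 + (text.charAt(pos) - '0');
                pos++;
            }
            lval = Integer.valueOf(value);  // saved for getLVal
            return NUM;
        }
        pos++;
        return c;
    }

    @Override
    public Object getLVal() {
        return lval;
    }

    @Override
    public void yyerror(String msg) {
        errors.append(msg).append('\n');  // collected instead of printed
    }

    String errors() {
        return errors.toString();
    }
}
```

+In this sketch the caller invokes @code{yylex} repeatedly and, after
+each token, fetches its semantic value with @code{getLVal}; the value is
+@code{null} for tokens that carry none.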
+
+
+The lexer interface resides in the same class (@code{YYParser}) as the
+Bison-generated parser.
+The fields and methods that are provided to this end are as follows.
+
+@deftypemethod {YYParser} {void} error (Location @var{l}, String @var{m})
+As explained in @ref{Java Parser Interface}, this method is defined
+by the user to emit an error message. The first parameter is not used
+unless location tracking is active. Its type can be changed using
+@samp{%define "location_type" "@var{class-name}".}
+@end deftypemethod
+
+@deftypemethod {YYParser} {int} yylex (@var{type1} @var{arg1}, ...)
+Return the next token. Its type is the return value, its semantic
+value and location are saved into @code{yylval}, @code{yystartpos},
+@code{yyendpos}. Invocations of @samp{%lex-param @{@var{type1}
+@var{arg1}@}} yield additional arguments.
+@end deftypemethod
+
+@deftypecv {Field} {YYParser} Position yystartpos
+@deftypecvx {Field} {YYParser} Position yyendpos
+Contain respectively the first position of the last token that @code{yylex}
+returned, and the first position beyond it. These fields are not
+needed unless location tracking is active.
+
+The field's type can be changed using @samp{%define "position_type"
+"@var{class-name}".}
+@end deftypecv
+
+@deftypecv {Field} {YYParser} Object yylval
+Contains the semantic value of the last token that @code{yylex}
+returned.
+
+The field's type can be changed using @samp{%define "stype"
+"@var{class-name}".}
+@end deftypecv
+
+@node Java Differences
+@subsection Differences between C/C++ and Java Grammars
+
+The different structure of the Java language forces several differences
+between C/C++ grammars, and grammars designed for Java parsers. This
+section summarizes these differences.
+
+@itemize
+@item
+Java lacks a preprocessor, so the @code{YYERROR}, @code{YYACCEPT},
+@code{YYABORT} symbols (@pxref{Table of Symbols}) obviously cannot be
+macros. Instead, they should be preceded by @code{return} when they
+appear in an action. The actual definition of these symbols is
+opaque to the Bison grammar, and it might change in the future. The
+only meaningful operation that you can do is to return them.
+
+Note that of these three symbols, only @code{YYACCEPT} and
+@code{YYABORT} will cause a return from the @code{yyparse}
+method@footnote{Java parsers include the actions in a separate
+method than @code{yyparse} in order to have an intuitive syntax that
+corresponds to these C macros.}.
+
+@item
+The prolog declarations have a different meaning than in C/C++ code.
+@table @asis
+@item @code{%code imports}
+blocks are placed at the beginning of the Java source code. They may
+include copyright notices. For @code{package} declarations, it is
+suggested to use @code{%define package} instead.
+
+@item unqualified @code{%code}
+blocks are placed inside the parser class.
+
+@item @code{%code lexer}
+blocks, if specified, should include the implementation of the
+scanner. If there is no such block, the scanner can be any class
+that implements the appropriate interface (@pxref{Java Scanner
+Interface}).
+@end table
+
+Other @code{%code} blocks are not supported in Java parsers.
+The epilogue has the same meaning as in C/C++ code and it can
+be used to define other classes used by the parser.
+@end itemize
+
@c ================================================= FAQ
@node FAQ
are addressed.
@menu
-* Memory Exhausted:: Breaking the Stack Limits
-* How Can I Reset the Parser:: @code{yyparse} Keeps some State
-* Strings are Destroyed:: @code{yylval} Loses Track of Strings
-* Implementing Gotos/Loops:: Control Flow in the Calculator
+* Memory Exhausted:: Breaking the Stack Limits
+* How Can I Reset the Parser:: @code{yyparse} Keeps some State
+* Strings are Destroyed:: @code{yylval} Loses Track of Strings
+* Implementing Gotos/Loops:: Control Flow in the Calculator
+* Multiple start-symbols:: Factoring closely related grammars
+* Secure? Conform?:: Is Bison @acronym{POSIX} safe?
+* I can't build Bison:: Troubleshooting
+* Where can I find help?:: Troubleshooting
+* Bug Reports:: Troublereporting
+* More Languages:: Parsers in C++, Java, and so on
+* Beta Testing:: Experimenting with development versions
+* Mailing Lists:: Meeting other Bison users
@end menu
@node Memory Exhausted
@display
My parser includes support for an @samp{#include}-like feature, in
which case I run @code{yyparse} from @code{yyparse}. This fails
-although I did specify I needed a @code{%pure-parser}.
+although I did specify @code{%define api.pure}.
@end display
These problems typically come not from Bison itself, but from
This error is probably the single most frequent ``bug report'' sent to
Bison lists, but is only concerned with a misunderstanding of the role
-of scanner. Consider the following Lex code:
+of the scanner. Consider the following Lex code:
@verbatim
%{
invited to consult the dedicated literature.
+@node Multiple start-symbols
+@section Multiple start-symbols
+
+@display
+I have several closely related grammars, and I would like to share their
+implementations. In fact, I could use a single grammar but with
+multiple entry points.
+@end display
+
+Bison does not support multiple start-symbols, but there is a very
+simple means to simulate them. If @code{foo} and @code{bar} are the two
+pseudo start-symbols, then introduce two new tokens, say
+@code{START_FOO} and @code{START_BAR}, and use them as switches from the
+real start-symbol:
+
+@example
+%token START_FOO START_BAR;
+%start start;
+start: START_FOO foo
+ | START_BAR bar;
+@end example
+
+These tokens prevent the introduction of new conflicts. As far as the
+parser goes, that is all that is needed.
+
+Now the difficult part is ensuring that the scanner will send these
+tokens first. If your scanner is hand-written, that should be
+straightforward. If your scanner is generated by Lex, then there is a
+simple means to do it: recall that anything between @samp{%@{ ... %@}}
+after the first @code{%%} is copied verbatim at the top of the generated
+@code{yylex} function. Make sure a variable @code{start_token} is
+available in the scanner (e.g., a global variable or using
+@code{%lex-param} etc.), and use the following:
+
+@example
+ /* @r{Prologue.} */
+%%
+%@{
+ if (start_token)
+ @{
+ int t = start_token;
+ start_token = 0;
+ return t;
+ @}
+%@}
+ /* @r{The rules.} */
+@end example
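+
+For a hand-written scanner, the same idea can be sketched as follows.
+This is only an illustration, written in Java to match the Java parser
+sections of this manual; the class @code{StartTokenScanner}, the token
+codes, and the whitespace-separated @code{WORD} tokens are all
+assumptions made for the example, not something Bison generates. The
+pending start token is handed out once, before any real token.

```java
// Hypothetical hand-written scanner that delivers a pseudo start token
// before the real input, mirroring the start_token logic shown above.
final class StartTokenScanner {
    // Assumed token codes; a real scanner would use the generated ones.
    static final int EOF = 0;
    static final int START_FOO = 258;
    static final int START_BAR = 259;
    static final int WORD = 260;

    private int startToken;        // pending start token, 0 once delivered
    private final String[] words;  // the "real" input, split on blanks
    private int next = 0;

    StartTokenScanner(int startToken, String input) {
        this.startToken = startToken;
        String trimmed = input.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    int yylex() {
        // First call: return the start token and clear it.
        if (startToken != 0) {
            int t = startToken;
            startToken = 0;
            return t;
        }
        // Later calls: ordinary scanning.
        if (next < words.length) {
            next++;
            return WORD;
        }
        return EOF;
    }
}
```

+Constructing the scanner with @code{START_FOO} or @code{START_BAR}
+selects which pseudo start-symbol the parser will follow.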
+
+
+@node Secure? Conform?
+@section Secure? Conform?
+
+@display
+Is Bison secure? Does it conform to POSIX?
+@end display
+
+If you're looking for a guarantee or certification, we don't provide it.
+However, Bison is intended to be a reliable program that conforms to the
+@acronym{POSIX} specification for Yacc. If you run into problems,
+please send us a bug report.
+
+@node I can't build Bison
+@section I can't build Bison
+
+@display
+I can't build Bison because @command{make} complains that
+@code{msgfmt} is not found.
+What should I do?
+@end display
+
+Like most GNU packages with internationalization support, Bison has that
+feature turned on by default. If you have problems building in the @file{po}
+subdirectory, it indicates that your system's internationalization
+support is lacking. You can re-configure Bison with
+@option{--disable-nls} to turn off this support, or you can install GNU
+gettext from @url{ftp://ftp.gnu.org/gnu/gettext/} and re-configure
+Bison. See the file @file{ABOUT-NLS} for more information.
+
+
+@node Where can I find help?
+@section Where can I find help?
+
+@display
+I'm having trouble using Bison. Where can I find help?
+@end display
+
+First, read this fine manual. Beyond that, you can send mail to
+@email{help-bison@@gnu.org}. This mailing list is intended to be
+populated with people who are willing to answer questions about using
+and installing Bison. Please keep in mind that (most of) the people on
+the list have aspects of their lives which are not related to Bison (!),
+so you may not receive an answer to your question right away. This can
+be frustrating, but please try not to honk them off; remember that any
+help they provide is purely voluntary and out of the kindness of their
+hearts.
+
+@node Bug Reports
+@section Bug Reports
+
+@display
+I found a bug. What should I include in the bug report?
+@end display
+
+Before you send a bug report, make sure you are using the latest
+version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its
+mirrors. Be sure to include the version number in your bug report. If
+the bug is present in the latest version but not in a previous version,
+try to determine the most recent version which did not contain the bug.
+
+If the bug is parser-related, you should include the smallest grammar
+you can which demonstrates the bug. The grammar file should also be
+complete (i.e., we should be able to run it through Bison without having
+to edit or add anything). The smaller and simpler the grammar, the
+easier it will be to fix the bug.
+
+Include information about your compilation environment, including your
+operating system's name and version and your compiler's name and
+version. If you have trouble compiling, you should also include a
+transcript of the build session, starting with the invocation of
+@command{configure}. Depending on the nature of the bug, you may be asked to
+send additional files as well (such as @file{config.h} or @file{config.cache}).
+
+Patches are most welcome, but not required. That is, do not hesitate to
+send a bug report just because you cannot provide a fix.
+
+Send bug reports to @email{bug-bison@@gnu.org}.
+
+@node More Languages
+@section More Languages
+
+@display
+Will Bison ever have C++ and Java support? How about @var{insert your
+favorite language here}?
+@end display
+
+C++ and Java support is there now, and is documented. We'd love to add other
+languages; contributions are welcome.
+
+@node Beta Testing
+@section Beta Testing
+
+@display
+What is involved in being a beta tester?
+@end display
+
+It's not terribly involved. Basically, you would download a test
+release, compile it, and use it to build and run a parser or two. After
+that, you would submit either a bug report or a message saying that
+everything is okay. It is important to report successes as well as
+failures because test releases eventually become mainstream releases,
+but only if they are adequately tested. If no one tests, development is
+essentially halted.
+
+Beta testers are particularly needed for operating systems to which the
+developers do not have easy access. They currently have easy access to
+recent GNU/Linux and Solaris versions. Reports about other operating
+systems are especially welcome.
+
+@node Mailing Lists
+@section Mailing Lists
+
+@display
+How do I join the help-bison and bug-bison mailing lists?
+@end display
+
+See @url{http://lists.gnu.org/}.
@c ================================================= Table of Symbols
@xref{Rules, ,Syntax of Grammar Rules}.
@end deffn
+@deffn {Directive} <*>
+Used to define a default tagged @code{%destructor} or default tagged
+@code{%printer}.
+
+This feature is experimental.
+More user feedback will help to determine whether it should become a permanent
+feature.
+
+@xref{Destructor Decl, , Freeing Discarded Symbols}.
+@end deffn
+
+@deffn {Directive} <>
+Used to define a default tagless @code{%destructor} or default tagless
+@code{%printer}.
+
+This feature is experimental.
+More user feedback will help to determine whether it should become a permanent
+feature.
+
+@xref{Destructor Decl, , Freeing Discarded Symbols}.
+@end deffn
+
@deffn {Symbol} $accept
The predefined nonterminal whose only rule is @samp{$accept: @var{start}
$end}, where @var{start} is the start symbol. @xref{Start Decl, , The
Start-Symbol}. It cannot be used in the grammar.
@end deffn
+@deffn {Directive} %code @{@var{code}@}
+@deffnx {Directive} %code @var{qualifier} @{@var{code}@}
+Insert @var{code} verbatim into output parser source.
+@xref{Decl Summary,,%code}.
+@end deffn
+
@deffn {Directive} %debug
Equip the parser for debugging. @xref{Decl Summary}.
@end deffn
@end deffn
@end ifset
+@deffn {Directive} %define @var{define-variable}
+@deffnx {Directive} %define @var{define-variable} @var{value}
+Define a variable to adjust Bison's behavior.
+@xref{Decl Summary,,%define}.
+@end deffn
+
@deffn {Directive} %defines
Bison declaration to create a header file meant for the scanner.
@xref{Decl Summary}.
@end deffn
+@deffn {Directive} %defines @var{defines-file}
+Same as above, but save in the file @var{defines-file}.
+@xref{Decl Summary}.
+@end deffn
+
@deffn {Directive} %destructor
Specify how the parser should reclaim the memory associated to
discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
grammar rules so as to allow the Bison parser to recognize an error in
the grammar without halting the process. In effect, a sentence
containing an error may be recognized as valid. On a syntax error, the
-token @code{error} becomes the current look-ahead token. Actions
-corresponding to @code{error} are then executed, and the look-ahead
+token @code{error} becomes the current lookahead token. Actions
+corresponding to @code{error} are then executed, and the lookahead
token is reset to the token that originally caused the violation.
@xref{Error Recovery}.
@end deffn
when @code{yyerror} is called.
@end deffn
-@deffn {Directive} %file-prefix="@var{prefix}"
+@deffn {Directive} %file-prefix "@var{prefix}"
Bison declaration to set the prefix of the output files. @xref{Decl
Summary}.
@end deffn
Run user code before parsing. @xref{Initial Action Decl, , Performing Actions before Parsing}.
@end deffn
+@deffn {Directive} %language
+Specify the programming language for the generated parser.
+@xref{Decl Summary}.
+@end deffn
+
@deffn {Directive} %left
Bison declaration to assign left associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@xref{GLR Parsers, ,Writing @acronym{GLR} Parsers}.
@end deffn
-@deffn {Directive} %name-prefix="@var{prefix}"
+@deffn {Directive} %name-prefix "@var{prefix}"
Bison declaration to rename the external symbols. @xref{Decl Summary}.
@end deffn
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn
-@deffn {Directive} %output="@var{file}"
+@deffn {Directive} %output "@var{file}"
Bison declaration to set the name of the parser file. @xref{Decl
Summary}.
@end deffn
@end deffn
@deffn {Directive} %pure-parser
-Bison declaration to request a pure (reentrant) parser.
-@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
+Deprecated version of @code{%define api.pure} (@pxref{Decl Summary, ,%define}),
+for which Bison is more careful to warn about unreasonable usage.
@end deffn
@deffn {Directive} %require "@var{version}"
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn
+@deffn {Directive} %skeleton
+Specify the skeleton to use; usually for development.
+@xref{Decl Summary}.
+@end deffn
+
@deffn {Directive} %start
Bison declaration to specify the start symbol. @xref{Start Decl, ,The
Start-Symbol}.
making @code{yyparse} return 1 immediately. The error reporting
function @code{yyerror} is not called. @xref{Parser Function, ,The
Parser Function @code{yyparse}}.
+
+For Java parsers, this functionality is invoked using @code{return YYABORT;}
+instead.
@end deffn
@deffn {Macro} YYACCEPT
Macro to pretend that a complete utterance of the language has been
read, by making @code{yyparse} return 0 immediately.
@xref{Parser Function, ,The Parser Function @code{yyparse}}.
+
+For Java parsers, this functionality is invoked using @code{return YYACCEPT;}
+instead.
@end deffn
@deffn {Macro} YYBACKUP
-Macro to discard a value from the parser stack and fake a look-ahead
+Macro to discard a value from the parser stack and fake a lookahead
token. @xref{Action Features, ,Special Features for Use in Actions}.
@end deffn
@deffn {Variable} yychar
-External integer variable that contains the integer value of the current
-look-ahead token. (In a pure parser, it is a local variable within
+External integer variable that contains the integer value of the
+lookahead token. (In a pure parser, it is a local variable within
@code{yyparse}.) Error-recovery rule actions may examine this variable.
@xref{Action Features, ,Special Features for Use in Actions}.
@end deffn
@deffn {Variable} yyclearin
Macro used in error-recovery rule actions. It clears the previous
-look-ahead token. @xref{Error Recovery}.
+lookahead token. @xref{Error Recovery}.
@end deffn
@deffn {Macro} YYDEBUG
@code{yyerror} and then perform normal error recovery if possible
(@pxref{Error Recovery}), or (if recovery is impossible) make
@code{yyparse} return 1. @xref{Error Recovery}.
+
+For Java parsers, this functionality is invoked using @code{return YYERROR;}
+instead.
@end deffn
@deffn {Function} yyerror
@deffn {Macro} YYLEX_PARAM
An obsolete macro for specifying an extra argument (or list of extra
-arguments) for @code{yyparse} to pass to @code{yylex}. he use of this
+arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
macro is deprecated, and is supported only for Yacc like parsers.
@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
@end deffn
External variable in which @code{yylex} should place the line and column
numbers associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
-@code{yylex}.) You can ignore this variable if you don't use the
-@samp{@@} feature in the grammar actions. @xref{Token Locations,
-,Textual Locations of Tokens}.
+@code{yylex}.)
+You can ignore this variable if you don't use the @samp{@@} feature in the
+grammar actions.
+@xref{Token Locations, ,Textual Locations of Tokens}.
+In semantic actions, it stores the location of the lookahead token.
+@xref{Actions and Locations, ,Actions and Locations}.
@end deffn
@deffn {Type} YYLTYPE
External variable in which @code{yylex} should place the semantic
value associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
-@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
+@code{yylex}.)
+@xref{Token Values, ,Semantic Values of Tokens}.
+In semantic actions, it stores the semantic value of the lookahead token.
+@xref{Actions, ,Actions}.
@end deffn
@deffn {Macro} YYMAXDEPTH
@deffn {Variable} yynerrs
Global variable which Bison increments each time it reports a syntax error.
-(In a pure parser, it is a local variable within @code{yyparse}.)
+(In a pure parser, it is a local variable within @code{yyparse}. In a
+pure push parser, it is a member of @code{yypstate}.)
@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
@end deffn
parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
@end deffn
+@deffn {Function} yypstate_delete
+The function to delete a parser instance, produced by Bison in push mode;
+call this function to delete the memory associated with a parser.
+@xref{Parser Delete Function, ,The Parser Delete Function
+@code{yypstate_delete}}.
+@end deffn
+
+@deffn {Function} yypstate_new
+The function to create a parser instance, produced by Bison in push mode;
+call this function to create a new parser.
+@xref{Parser Create Function, ,The Parser Create Function
+@code{yypstate_new}}.
+@end deffn
+
+@deffn {Function} yypull_parse
+The parser function produced by Bison in push mode; call this function to
+parse the rest of the input stream.
+@xref{Pull Parser Function, ,The Pull Parser Function
+@code{yypull_parse}}.
+@end deffn
+
+@deffn {Function} yypush_parse
+The parser function produced by Bison in push mode; call this function to
+parse a single token. @xref{Push Parser Function, ,The Push Parser Function
+@code{yypush_parse}}.
+@end deffn
+
@deffn {Macro} YYPARSE_PARAM
An obsolete macro for specifying the name of a parameter that
@code{yyparse} should accept. The use of this macro is deprecated, and
@end deffn
@deffn {Macro} YYRECOVERING
-Macro whose value indicates whether the parser is recovering from a
-syntax error. @xref{Action Features, ,Special Features for Use in Actions}.
+The expression @code{YYRECOVERING ()} yields 1 when the parser
+is recovering from a syntax error, and 0 otherwise.
+@xref{Action Features, ,Special Features for Use in Actions}.
@end deffn
@deffn {Macro} YYSTACK_USE_ALLOCA
@item Literal string token
A token which consists of two or more fixed characters. @xref{Symbols}.
-@item Look-ahead token
-A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead
+@item Lookahead token
+A token already read but not yet shifted. @xref{Lookahead, ,Lookahead
Tokens}.
@item @acronym{LALR}(1)
@item @acronym{LR}(1)
The class of context-free grammars in which at most one token of
-look-ahead is needed to disambiguate the parsing of any piece of input.
+lookahead is needed to disambiguate the parsing of any piece of input.
@item Nonterminal symbol
A grammar symbol standing for a grammatical construct that can
@node Copying This Manual
@appendix Copying This Manual
-
-@menu
-* GNU Free Documentation License:: License for copying this manual.
-@end menu
-
@include fdl.texi
@node Index
@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG
@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit
@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok
-@c LocalWords: longjmp fprintf stderr preg yylloc YYLTYPE cos ln
+@c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln
@c LocalWords: smallexample symrec val tptr FNCT fnctptr func struct sym
@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof
@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum
@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype
-@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless typefull yynerrs
+@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless yynerrs
@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES
@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param
@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP
-@c LocalWords: YYEMPTY YYRECOVERING yyclearin GE def UMINUS maybeword
+@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword
@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH
-@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm VCG notype
+@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm notype
@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args
-@c LocalWords: YYPRINTF infile ypp yxx outfile itemx vcg tex leaderfill
+@c LocalWords: infile ypp yxx outfile itemx tex leaderfill
@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll
-@c LocalWords: yyrestart nbar yytext fst snd osplit ntwo strdup AST
+@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST
@c LocalWords: YYSTACK DVI fdl printindex