Writing @acronym{GLR} Parsers
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
Examples
Bison Declarations
+* Require Decl:: Requiring a Bison version.
* Token Decl:: Declaring terminal symbols.
* Precedence Decl:: Declaring terminals with precedence and associativity.
* Union Decl:: Declaring the set of all semantic value types.
@cindex @acronym{GLR} parsing
@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
@cindex ambiguous grammars
-@cindex non-deterministic parsing
+@cindex nondeterministic parsing
Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning
roughly that the next grammar rule to apply at any point in the input is
(called a @dfn{look-ahead}) of the remaining input. A context-free
grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
apply the grammar rules to get the same inputs. Even unambiguous
-grammars can be @dfn{non-deterministic}, meaning that no fixed
+grammars can be @dfn{nondeterministic}, meaning that no fixed
look-ahead always suffices to determine the next grammar rule to apply.
With the proper declarations, Bison is also able to parse these more
general context-free grammars, using a technique known as @acronym{GLR}
merged result.
@menu
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
@end menu
@node Simple GLR Parsers
This also includes numerous identifiers used for internal purposes.
Therefore, you should avoid using C identifiers starting with @samp{yy}
or @samp{YY} in the Bison grammar file except for the ones defined in
-this manual.
+this manual. Also, you should avoid using the C identifiers
+@samp{malloc} and @samp{free} for anything other than their usual
+meanings.
In some cases the Bison parser file includes system headers, and in
those cases your code should respect the identifiers reserved by those
-headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>},
+headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>}, @code{<malloc.h>},
@code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to
declare memory allocators and related types. @code{<libintl.h>} is
included if message translation is in use
It is easy to add new operators to the infix calculator as long as they are
only single-character literals. The lexical analyzer @code{yylex} passes
-back all nonnumber characters as tokens, so new grammar rules suffice for
+back all nonnumeric characters as tokens, so new grammar rules suffice for
adding a new operator. But we want something more flexible: built-in
functions whose syntax has this form:
The function @code{yylex} must now recognize variables, numeric values, and
the single-character arithmetic operators. Strings of alphanumeric
-characters with a leading non-digit are recognized as either variables or
+characters with a leading letter are recognized as either variables or
functions depending on what the symbol table says about them.
The string is passed to @code{getsym} for look up in the symbol table. If
in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
If you want to write a grammar that is portable to any Standard C
-host, you must use only non-null character tokens taken from the basic
+host, you must use only nonnull character tokens taken from the basic
execution character set of Standard C@. This set consists of the ten
digits, the 52 lower- and upper-case English letters, and the
characters in the following C-language string:
Grammars}).
@menu
+* Require Decl:: Requiring a Bison version.
* Token Decl:: Declaring terminal symbols.
* Precedence Decl:: Declaring terminals with precedence and associativity.
* Union Decl:: Declaring the set of all semantic value types.
* Decl Summary:: Table of all Bison declarations.
@end menu
+@node Require Decl
+@subsection Require a Version of Bison
+@cindex version requirement
+@cindex requiring a version of Bison
+@findex %require
+
+You may require the minimum version of Bison to process the grammar. If
+the requirement is not met, @command{bison} exits with an error (exit
+status 63).
+
+@example
+%require "@var{version}"
+@end example
+
@node Token Decl
@subsection Token Type Names
@cindex declaring token type names
%parse-param @{ char const *file_name @};
%initial-action
@{
- @@$.begin.filename = @@$.end.filename = file_name;
+ @@$.initialize (file_name);
@};
@end example
@cindex freeing discarded symbols
@findex %destructor
-Some symbols can be discarded by the parser. During error
-recovery (@pxref{Error Recovery}), symbols already pushed
-on the stack and tokens coming from the rest of the file
-are discarded until the parser falls on its feet. If the parser
-runs out of memory, all the symbols on the stack must be discarded.
-Even if the parser succeeds, it must discard the start symbol.
+During error recovery (@pxref{Error Recovery}), symbols already pushed
+on the stack and tokens coming from the rest of the file are discarded
+until the parser falls on its feet. If the parser runs out of memory,
+or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the
+symbols on the stack must be discarded. Even if the parser succeeds, it
+must discard the start symbol.
When discarded symbols convey heap based information, this memory is
lost. While this behavior can be tolerable for batch parsers, such as
-in traditional compilers, it is unacceptable for programs like shells
-or protocol implementations that may parse and execute indefinitely.
+in traditional compilers, it is unacceptable for programs like shells or
+protocol implementations that may parse and execute indefinitely.
-The @code{%destructor} directive defines code that
-is called when a symbol is discarded.
+The @code{%destructor} directive defines code that is called when a
+symbol is automatically discarded.
@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
@findex %destructor
-Invoke @var{code} whenever the parser discards one of the
-@var{symbols}. Within @var{code}, @code{$$} designates the semantic
-value associated with the discarded symbol. The additional
-parser parameters are also available
-(@pxref{Parser Function, , The Parser Function @code{yyparse}}).
-
-@strong{Warning:} as of Bison 2.1, this feature is still
-experimental, as there has not been enough user feedback. In particular,
-the syntax might still change.
+Invoke @var{code} whenever the parser discards one of the @var{symbols}.
+Within @var{code}, @code{$$} designates the semantic value associated
+with the discarded symbol. The additional parser parameters are also
+available (@pxref{Parser Function, , The Parser Function
+@code{yyparse}}).
@end deffn
For instance:
guarantees that when a @code{STRING} or a @code{string} is discarded,
its associated memory will be freed.
-Note that in the future, Bison might also consider that right hand side
-members that are not mentioned in the action can be destroyed. For
-instance, in:
-
-@smallexample
-comment: "/*" STRING "*/";
-@end smallexample
-
-@noindent
-the parser is entitled to destroy the semantic value of the
-@code{string}. Of course, this will not apply to the default action;
-compare:
-
-@smallexample
-typeless: string; // $$ = $1 does not apply; $1 is destroyed.
-typefull: string; // $$ = $1 applies, $1 is not destroyed.
-@end smallexample
-
@sp 1
@cindex discarded symbols
@item
incoming terminals during the second phase of error recovery,
@item
-the current look-ahead and the entire stack when the parser aborts
-(either via an explicit call to @code{YYABORT}, or as a consequence of
-a failed error recovery or of memory exhaustion), and
+the current look-ahead and the entire stack (except the current
+right-hand side symbols) when the parser returns immediately, and
@item
the start symbol, when the parser succeeds.
@end itemize
+The parser can @dfn{return immediately} because of an explicit call to
+@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory
+exhaustion.
+
+Right-hand size symbols of a rule that explicitly triggers a syntax
+error via @code{YYERROR} are not discarded automatically. As a rule
+of thumb, destructors are invoked only when user actions cannot manage
+the memory.
@node Expect Decl
@subsection Suppressing Conflict Warnings
%expect @var{n}
@end example
-Here @var{n} is a decimal integer. The declaration says there should be
-no warning if there are @var{n} shift/reduce conflicts and no
-reduce/reduce conflicts. The usual warning is
-given if there are either more or fewer conflicts, or if there are any
-reduce/reduce conflicts.
+Here @var{n} is a decimal integer. The declaration says there should
+be @var{n} shift/reduce conflicts and no reduce/reduce conflicts.
+Bison reports an error if the number of shift/reduce conflicts differs
+from @var{n}, or if there are any reduce/reduce conflicts.
-For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more serious,
-and should be eliminated entirely. Bison will always report
-reduce/reduce conflicts for these parsers. With @acronym{GLR} parsers, however,
-both shift/reduce and reduce/reduce are routine (otherwise, there
-would be no need to use @acronym{GLR} parsing). Therefore, it is also possible
-to specify an expected number of reduce/reduce conflicts in @acronym{GLR}
-parsers, using the declaration:
+For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more
+serious, and should be eliminated entirely. Bison will always report
+reduce/reduce conflicts for these parsers. With @acronym{GLR}
+parsers, however, both kinds of conflicts are routine; otherwise,
+there would be no need to use @acronym{GLR} parsing. Therefore, it is
+also possible to specify an expected number of reduce/reduce conflicts
+in @acronym{GLR} parsers, using the declaration:
@example
%expect-rr @var{n}
@item
Add an @code{%expect} declaration, copying the number @var{n} from the
-number which Bison printed.
+number which Bison printed. With @acronym{GLR} parsers, add an
+@code{%expect-rr} declaration as well.
@end itemize
-Now Bison will stop annoying you if you do not change the number of
-conflicts, but it will warn you again if changes in the grammar result
-in more or fewer conflicts.
+Now Bison will warn you if you introduce an unexpected conflict, but
+will keep silent otherwise.
@node Start Decl
@subsection The Start-Symbol
A @dfn{reentrant} program is one which does not alter in the course of
execution; in other words, it consists entirely of @dfn{pure} (read-only)
code. Reentrancy is important whenever asynchronous execution is possible;
-for example, a non-reentrant program may not be safe to call from a signal
-handler. In systems with multiple threads of control, a non-reentrant
+for example, a nonreentrant program may not be safe to call from a signal
+handler. In systems with multiple threads of control, a nonreentrant
program must be called only within interlocks.
Normally, Bison generates a parser which is not reentrant. This is
(Reentrant) Parser}).
@end deffn
+@deffn {Directive} %require "@var{version}"
+Require version @var{version} or higher of Bison. @xref{Require Decl, ,
+Require a Version of Bison}.
+@end deffn
+
@deffn {Directive} %token-table
Generate an array of token names in the parser file. The name of the
array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
@subsection Semantic Values of Tokens
@vindex yylval
-In an ordinary (non-reentrant) parser, the semantic value of the token must
+In an ordinary (nonreentrant) parser, the semantic value of the token must
be stored into the global variable @code{yylval}. When you are using
just one data type for semantic values, @code{yylval} has that type.
Thus, if the type is @code{int} (the default), you might write this in
@vindex yynerrs
The variable @code{yynerrs} contains the number of syntax errors
-encountered so far. Normally this variable is global; but if you
+reported so far. Normally this variable is global; but if you
request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser})
then it is a local variable which only the actions can access.
@cindex @acronym{GLR} parsing
@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
@cindex ambiguous grammars
-@cindex non-deterministic parsing
+@cindex nondeterministic parsing
Bison produces @emph{deterministic} parsers that choose uniquely
when to reduce and which reduction to apply
context-free grammar in cubic worst-case time. However, Bison currently
uses a simpler data structure that requires time proportional to the
length of the input times the maximum number of stacks required for any
-prefix of the input. Thus, really ambiguous or non-deterministic
+prefix of the input. Thus, really ambiguous or nondeterministic
grammars can require exponential time and space to process. Such badly
behaving examples, however, are not generally of practical interest.
-Usually, non-determinism in a grammar is local---the parser is ``in
+Usually, nondeterminism in a grammar is local---the parser is ``in
doubt'' only for a few tokens at a time. Therefore, the current data
structure should generally be adequate. On @acronym{LALR}(1) portions of a
grammar, in particular, it is only slightly slower than with the default
The @code{%union} directive works as for C, see @ref{Union Decl, ,The
Collection of Value Types}. In particular it produces a genuine
@code{union}@footnote{In the future techniques to allow complex types
-within pseudo-unions (variants) might be implemented to alleviate
-these issues.}, which have a few specific features in C++.
+within pseudo-unions (similar to Boost variants) might be implemented to
+alleviate these issues.}, which have a few specific features in C++.
@itemize @minus
@item
-The name @code{YYSTYPE} also denotes @samp{union YYSTYPE}. You may
-forward declare it just with @samp{union YYSTYPE;}.
+The type @code{YYSTYPE} is defined but its use is discouraged: rather
+you should refer to the parser's encapsulated type
+@code{yy::parser::semantic_type}.
@item
Non POD (Plain Old Data) types cannot be used. C++ forbids any
instance of classes with constructors in unions: only @emph{pointers}
declare and define the parser class in the namespace @code{yy}. The
class name defaults to @code{parser}, but may be changed using
@samp{%define "parser_class_name" "@var{name}"}. The interface of
-this class is detailled below. It can be extended using the
+this class is detailed below. It can be extended using the
@code{%parse-param} feature: its semantics is slightly changed since
it describes an additional member of the parser class, and an
additional argument for its constructor.
@deftypemethod {parser} {debug_level_type} debug_level ()
@deftypemethodx {parser} {void} set_debug_level (debug_level @var{l})
Get or set the tracing level. Currently its value is either 0, no trace,
-or non-zero, full tracing.
+or nonzero, full tracing.
@end deftypemethod
@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m})
@subsection Calc++ --- C++ Calculator
Of course the grammar is dedicated to arithmetics, a single
-expression, possibily preceded by variable assignments. An
+expression, possibly preceded by variable assignments. An
environment containing possibly predefined variables such as
@code{one} and @code{two}, is exchanged with the parser. An example
of valid input follows.
The declaration of this driver class, @file{calc++-driver.hh}, is as
follows. The first part includes the CPP guard and imports the
-required standard library components.
+required standard library components, and the declaration of the parser
+class.
@comment file: calc++-driver.hh
@example
# define CALCXX_DRIVER_HH
# include <string>
# include <map>
+# include "calc++-parser.hh"
@end example
-@noindent
-Then come forward declarations. Because the parser uses the parsing
-driver and reciprocally, simple inclusions of header files will not
-do. Because the driver's declaration is the one that will be imported
-by the rest of the project, it is saner to forward declare the
-parser's information here.
-
-@comment file: calc++-driver.hh
-@example
-// Forward declarations.
-union YYSTYPE;
-namespace yy
-@{
- class location;
- class calcxx_parser;
-@}
-class calcxx_driver;
-@end example
@noindent
Then comes the declaration of the scanning function. Flex expects
@example
// Announce to Flex the prototype we want for lexing function, ...
# define YY_DECL \
- int yylex (YYSTYPE* yylval, yy::location* yylloc, calcxx_driver& driver)
+ int yylex (yy::calcxx_parser::semantic_type* yylval, \
+ yy::calcxx_parser::location_type* yylloc, \
+ calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
@end example
@node Calc++ Parser
@subsection Calc++ Parser
-The parser definition file @file{calc++-parser.yy} starts by asking
-for the C++ skeleton, the creation of the parser header file, and
-specifies the name of the parser class. It then includes the required
-headers.
+The parser definition file @file{calc++-parser.yy} starts by asking for
+the C++ LALR(1) skeleton, the creation of the parser header file, and
+specifies the name of the parser class. Because the C++ skeleton
+changed several times, it is safer to require the version you designed
+the grammar for.
@comment file: calc++-parser.yy
@example
%skeleton "lalr1.cc" /* -*- C++ -*- */
-%define "parser_class_name" "calcxx_parser"
+%require "2.1a"
%defines
+%define "parser_class_name" "calcxx_parser"
+@end example
+
+@noindent
+Then come the declarations/inclusions needed to define the
+@code{%union}. Because the parser uses the parsing driver and
+reciprocally, both cannot include the header of the other. Because the
+driver's header needs detailed knowledge about the parser class (in
+particular its inner types), it is the parser's header which will simply
+use a forward declaration of the driver.
+
+@comment file: calc++-parser.yy
+@example
%@{
# include <string>
-# include "calc++-driver.hh"
+class calcxx_driver;
%@}
@end example
@};
@end example
+@noindent
+The code between @samp{%@{} and @samp{%@}} after the introduction of the
+@samp{%union} is output in the @file{*.cc} file; it needs detailed
+knowledge about the driver.
+
+@comment file: calc++-parser.yy
+@example
+%@{
+# include "calc++-driver.hh"
+%@}
+@end example
+
+
@noindent
The token numbered as 0 corresponds to end of file; the following line
allows for nicer error messages referring to ``end of file'' instead
@comment file: calc++-parser.yy
@example
-%token TOKEN_EOF 0 "end of file"
-%token TOKEN_ASSIGN ":="
-%token <sval> TOKEN_IDENTIFIER "identifier"
-%token <ival> TOKEN_NUMBER "number"
-%type <ival> exp "expression"
+%token END 0 "end of file"
+%token ASSIGN ":="
+%token <sval> IDENTIFIER "identifier"
+%token <ival> NUMBER "number"
+%type <ival> exp "expression"
@end example
@noindent
unit: assignments exp @{ driver.result = $2; @};
assignments: assignments assignment @{@}
- | /* Nothing. */ @{@};
+ | /* Nothing. */ @{@};
-assignment: TOKEN_IDENTIFIER ":=" exp @{ driver.variables[*$1] = $3; @};
+assignment: "identifier" ":=" exp @{ driver.variables[*$1] = $3; @};
%left '+' '-';
%left '*' '/';
| exp '-' exp @{ $$ = $1 - $3; @}
| exp '*' exp @{ $$ = $1 * $3; @}
| exp '/' exp @{ $$ = $1 / $3; @}
- | TOKEN_IDENTIFIER @{ $$ = driver.variables[*$1]; @}
- | TOKEN_NUMBER @{ $$ = $1; @};
+ | "identifier" @{ $$ = driver.variables[*$1]; @}
+ | "number" @{ $$ = $1; @};
%%
@end example
@end example
@noindent
-The following paragraph suffices to track locations acurately. Each
+The following paragraph suffices to track locations accurately. Each
time @code{yylex} is invoked, the begin position is moved onto the end
position. Then when a pattern is matched, the end position is
advanced of its width. In case it matched ends of lines, the end
@end example
@noindent
-The rules are simple, just note the use of the driver to report
-errors.
+The rules are simple, just note the use of the driver to report errors.
+It is convenient to use a typedef to shorten
+@code{yy::calcxx_parser::token::identifier} into
+@code{token::identifier} for instance.
@comment file: calc++-scanner.ll
@example
+%@{
+ typedef yy::calcxx_parser::token token;
+%@}
+
[-+*/] return yytext[0];
-":=" return TOKEN_ASSIGN;
+":=" return token::ASSIGN;
@{int@} @{
errno = 0;
long n = strtol (yytext, NULL, 10);
if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
driver.error (*yylloc, "integer is out of range");
yylval->ival = n;
- return TOKEN_NUMBER;
+ return token::NUMBER;
@}
-@{id@} yylval->sval = new std::string (yytext); return TOKEN_IDENTIFIER;
+@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER;
. driver.error (*yylloc, "invalid character");
%%
@end example
@end deffn
@deffn {Directive} %nonassoc
-Bison declaration to assign non-associativity to token(s).
+Bison declaration to assign nonassociativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn
@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
@end deffn
+@deffn {Directive} %require "@var{version}"
+Require version @var{version} or higher of Bison. @xref{Require Decl, ,
+Require a Version of Bison}.
+@end deffn
+
@deffn {Directive} %right
Bison declaration to assign right associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn
@deffn {Variable} yynerrs
-Global variable which Bison increments each time there is a syntax error.
+Global variable which Bison increments each time it reports a syntax error.
(In a pure parser, it is a local variable within @code{yyparse}.)
@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
@end deffn
reserved for future Bison extensions. If not defined,
@code{YYSTACK_USE_ALLOCA} defaults to 0.
-If you define @code{YYSTACK_USE_ALLOCA} to 1, it is your
-responsibility to make sure that @code{alloca} is visible, e.g., by
-using @acronym{GCC} or by including @code{<stdlib.h>}. Furthermore,
-in the all-too-common case where your code may run on a host with a
+In the all-too-common case where your code may run on a host with a
limited stack and with unreliable stack-overflow checking, you should
set @code{YYMAXDEPTH} to a value that cannot possibly result in
unchecked stack overflow on any of your target hosts when