which reads tokens.
* Error Reporting:: You must supply a function @code{yyerror}.
* Action Features:: Special features for use in actions.
+* Internationalization:: How to let the parser speak in the user's
+ native language.
The Lexical Analyzer Function @code{yylex}
* Reduce/Reduce:: When two rules are applicable in the same situation.
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
-* Stack Overflow:: What happens when stack gets full. How to avoid it.
+* Memory Management:: What happens when memory is exhausted. How to avoid it.
Operator Precedence
Frequently Asked Questions
-* Parser Stack Overflow:: Breaking the Stack Limits
+* Memory Exhausted:: Breaking the Stack Limits
* How Can I Reset the Parser:: @code{yyparse} Keeps some State
* Strings are Destroyed:: @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops:: Control Flow in the Calculator
arrange for it to call @code{yyparse} or the parser will never run.
@xref{Interface, ,Parser C-Language Interface}.
-If your code defines a C preprocessor macro @code{_} (a single
-underscore), Bison assumes that it can be used to translate
-English-language strings to the user's preferred language using a
-function-like syntax, e.g., @code{_("syntax error")}. Otherwise,
-Bison defines a no-op macro by that name that merely returns its
-argument, so strings are not translated.
-
-Aside from @code{_} and the token type names and the symbols in the actions you
+Aside from the token type names and the symbols in the actions you
write, all symbols defined in the Bison parser file itself
begin with @samp{yy} or @samp{YY}. This includes interface functions
such as the lexical analyzer function @code{yylex}, the error reporting
those cases your code should respect the identifiers reserved by those
headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>},
@code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to
-declare memory allocators and related types. Other system headers may
+declare memory allocators and related types. @code{<libintl.h>} is
+included if message translation is in use
+(@pxref{Internationalization}). Other system headers may
be included if you define @code{YYDEBUG} to a nonzero value
(@pxref{Tracing, ,Tracing Your Parser}).
@cindex freeing discarded symbols
@findex %destructor
-Some symbols can be discarded by the parser. For instance, during error
-recovery (@pxref{Error Recovery}), embarrassing symbols already pushed
-on the stack, and embarrassing tokens coming from the rest of the file
-are thrown away until the parser falls on its feet. If these symbols
-convey heap based information, this memory is lost. While this behavior
-can be tolerable for batch parsers, such as in compilers, it is not for
-possibly ``never ending'' parsers such as shells, or implementations of
-communication protocols.
+Some symbols can be discarded by the parser. During error
+recovery (@pxref{Error Recovery}), symbols already pushed
+on the stack and tokens coming from the rest of the file
+are discarded until the parser falls on its feet. If the parser
+runs out of memory, all the symbols on the stack must be discarded.
+Even if the parser succeeds, it must discard the start symbol.
+
+When discarded symbols convey heap-based information, this memory is
+lost. While this behavior can be tolerable for batch parsers, such as
+in traditional compilers, it is unacceptable for programs like shells
+or protocol implementations that may parse and execute indefinitely.
-The @code{%destructor} directive allows for the definition of code that
-is called when a symbol is thrown away.
+The @code{%destructor} directive defines code that
+is called when a symbol is discarded.
@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
@findex %destructor
-Declare that the @var{code} must be invoked for each of the
-@var{symbols} that will be discarded by the parser. The @var{code}
-should use @code{$$} to designate the semantic value associated to the
-@var{symbols}. The additional parser parameters are also available
+Invoke @var{code} whenever the parser discards one of the
+@var{symbols}. Within @var{code}, @code{$$} designates the semantic
+value associated with the discarded symbol. The additional
+parser parameters are also available
(@pxref{Parser Function, , The Parser Function @code{yyparse}}).
-@strong{Warning:} as of Bison 1.875, this feature is still considered as
-experimental, as there was not enough user feedback. In particular,
+@strong{Warning:} as of Bison 2.1, this feature is still
+experimental, as there has not been enough user feedback. In particular,
the syntax might still change.
@end deffn
@end smallexample
@noindent
-guarantees that when a @code{STRING} or a @code{string} will be discarded,
+guarantees that when a @code{STRING} or a @code{string} is discarded,
its associated memory will be freed.
Note that in the future, Bison might also consider that right hand side
@item
incoming terminals during the second phase of error recovery,
@item
-the current look-ahead when the parser aborts (either via an explicit
-call to @code{YYABORT}, or as a consequence of a failed error recovery).
+the current look-ahead and the entire stack when the parser aborts
+(either via an explicit call to @code{YYABORT}, or as a consequence of
+a failed error recovery or of memory exhaustion), and
+@item
+the start symbol, when the parser succeeds.
@end itemize
@end deffn
@deffn {Directive} %destructor
-Specifying how the parser should reclaim the memory associated to
+Specify how the parser should reclaim the memory associated with
discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn
@code{"error"}, and @code{"$undefined"}; after these come the symbols
defined in the grammar file.
-For single-character literal tokens and literal string tokens, the name
-in the table includes the single-quote or double-quote characters: for
-example, @code{"'+'"} is a single-character literal and @code{"\"<=\""}
-is a literal string token. All the characters of the literal string
-token appear verbatim in the string found in the table; even
-double-quote characters are not escaped. For example, if the token
-consists of three characters @samp{*"*}, its string in @code{yytname}
-contains @samp{"*"*"}. (In C, that would be written as
-@code{"\"*\"*\""}).
+The name in the table includes all the characters needed to represent
+the token in Bison. For single-character literals and literal
+strings, this includes the surrounding quoting characters and any
+escape sequences. For example, the Bison single-character literal
+@code{'+'} corresponds to a three-character name, represented in C as
+@code{"'+'"}; and the Bison two-character literal string @code{"\\/"}
+corresponds to a five-character name, represented in C as
+@code{"\"\\\\/\""}.
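The naming rule above can be checked with a small stand-alone sketch. The table @code{example_tname} and the helper @code{lookup_name} are hypothetical, written by hand for illustration; a real @code{yytname} is generated by Bison and its contents depend on the grammar.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, hand-written excerpt in the style of yytname.
   A real table is generated by Bison from the grammar.  */
static const char *const example_tname[] = {
  "$end", "error", "$undefined",
  "'+'",        /* Bison literal '+': quotes included, 3 characters.  */
  "\"\\\\/\"",  /* Bison string "\\/": quotes and escape kept, 5 characters.  */
  0
};

/* Return the index of NAME in example_tname, or -1 if it is absent.  */
static int
lookup_name (const char *name)
{
  int i;
  for (i = 0; example_tname[i] != 0; i++)
    if (strcmp (example_tname[i], name) == 0)
      return i;
  return -1;
}
```

Note that a lookup has to use the name exactly as Bison would spell the token, including the surrounding quote characters and any escape sequences.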
When you specify @code{%token-table}, Bison also generates macro
definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, and
which reads tokens.
* Error Reporting:: You must supply a function @code{yyerror}.
* Action Features:: Special features for use in actions.
+* Internationalization:: How to let the parser speak in the user's
+ native language.
@end menu
@node Parser Function
table. The index of the token in the table is the token type's code.
The name of a multicharacter token is recorded in @code{yytname} with a
double-quote, the token's characters, and another double-quote. The
-token's characters are not escaped in any way; they appear verbatim in
-the contents of the string in the table.
+token's characters are escaped as necessary to be suitable as input
+to Bison.
-Here's code for looking up a token in @code{yytname}, assuming that the
-characters of the token are stored in @code{token_buffer}.
+Here's code for looking up a multicharacter token in @code{yytname},
+assuming that the characters of the token are stored in
+@code{token_buffer}, and assuming that the token does not contain any
+characters like @samp{"} that require escaping.
@smallexample
for (i = 0; i < YYNTOKENS; i++)
Section}), then Bison provides a more verbose and specific error message
string instead of just plain @w{@code{"syntax error"}}.
-The parser can detect one other kind of error: stack overflow. This
-happens when the input contains constructions that are very deeply
+The parser can detect one other kind of error: memory exhaustion. This
+can happen when the input contains constructions that are very deeply
nested. It isn't likely you will encounter this, since the Bison
-parser extends its stack automatically up to a very large limit. But
-if overflow happens, @code{yyparse} calls @code{yyerror} in the usual
-fashion, except that the argument string is @w{@code{"parser stack
-overflow"}}.
+parser normally extends its stack automatically up to a very large limit. But
+if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual
+fashion, except that the argument string is @w{@code{"memory exhausted"}}.
+
+In some cases diagnostics like @w{@code{"syntax error"}} are
+translated automatically from English to some other language before
+they are passed to @code{yyerror}. @xref{Internationalization}.
The following definition suffices in simple programs:
Tracking Locations}.
@end deffn
+@node Internationalization
+@section Parser Internationalization
+@cindex internationalization
+@cindex i18n
+@cindex NLS
+@cindex gettext
+@cindex bison-po
+
+A Bison-generated parser can print diagnostics, including error and
+tracing messages. By default, they appear in English. However, Bison
+also supports outputting diagnostics in the user's native language.
+To make this work, the user should set the usual environment
+variables. @xref{Users, , The User's View, gettext, GNU
+@code{gettext} utilities}. For
+example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might set
+the user's locale to French Canadian using the @acronym{UTF}-8
+encoding. The exact set of available locales depends on the user's
+installation.
+
+The maintainer of a package that uses a Bison-generated parser enables
+the internationalization of the parser's output through the following
+steps. Here we assume a package that uses @acronym{GNU} Autoconf and
+@acronym{GNU} Automake.
+
+@enumerate
+@item
+@cindex bison-i18n.m4
+Into the directory containing the @acronym{GNU} Autoconf macros used
+by the package---often called @file{m4}---copy the
+@file{bison-i18n.m4} file installed by Bison under
+@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory.
+For example:
+
+@example
+cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
+@end example
+
+@item
+@findex BISON_I18N
+@vindex BISON_LOCALEDIR
+@vindex YYENABLE_NLS
+In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT}
+invocation, add an invocation of @code{BISON_I18N}. This macro is
+defined in the file @file{bison-i18n.m4} that you copied earlier. It
+causes @samp{configure} to find the value of the
+@code{BISON_LOCALEDIR} variable, and it defines the source-language
+symbol @code{YYENABLE_NLS} to enable translations in the
+Bison-generated parser.
+
+@item
+In the @code{main} function of your program, designate the directory
+containing Bison's runtime message catalog, through a call to
+@samp{bindtextdomain} with domain name @samp{bison-runtime}.
+For example:
+
+@example
+bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
+@end example
+
+Typically this appears after any other call @code{bindtextdomain
+(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on
+@samp{BISON_LOCALEDIR} being defined as a string through the
+@file{Makefile}.
+
+@item
+In the @file{Makefile.am} that controls the compilation of the @code{main}
+function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro,
+either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example:
+
+@example
+DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
+@end example
+
+or:
+
+@example
+AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
+@end example
+
+@item
+Finally, invoke the command @command{autoreconf} to generate the build
+infrastructure.
+@end enumerate
+
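The program-side part of the steps above might be gathered into one start-up helper, sketched below. The function name @code{init_parser_messages} and the fallback value of @code{BISON_LOCALEDIR} are assumptions made for illustration; in a real package the macro comes from the @file{Makefile} as described above, and some hosts need an extra library such as @samp{-lintl} at link time.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Fallback used only so that this sketch compiles on its own;
   normally the build system defines BISON_LOCALEDIR.  */
#ifndef BISON_LOCALEDIR
# define BISON_LOCALEDIR "/usr/local/share/locale"
#endif

/* Hypothetical helper: select the user's locale and tell gettext
   where Bison's runtime message catalog lives.
   Return 1 on success, 0 if bindtextdomain reports failure.  */
static int
init_parser_messages (void)
{
  setlocale (LC_ALL, "");
  return bindtextdomain ("bison-runtime", BISON_LOCALEDIR) != NULL;
}
```

Calling such a helper early in @code{main}, before @code{yyparse} runs, lets the parser's diagnostics follow the user's locale settings when a matching catalog is installed.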
@node Algorithm
@chapter The Bison Parser Algorithm
* Reduce/Reduce:: When two rules are applicable in the same situation.
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
-* Stack Overflow:: What happens when stack gets full. How to avoid it.
+* Memory Management:: What happens when memory is exhausted. How to avoid it.
@end menu
@node Look-Ahead
;
@end example
+For a more detailed exposition of @acronym{LALR}(1) parsers and parser
+generators, please see:
+Frank DeRemer and Thomas Pennello, Efficient Computation of
+@acronym{LALR}(1) Look-Ahead Sets, @cite{@acronym{ACM} Transactions on
+Programming Languages and Systems}, Vol.@: 4, No.@: 4 (October 1982),
+pp.@: 615--649 @uref{http://doi.acm.org/10.1145/69622.357187}.
+
@node Generalized LR Parsing
@section Generalized @acronym{LR} (@acronym{GLR}) Parsing
@cindex @acronym{GLR} parsing
@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps},
(2000-12-24).
-@node Stack Overflow
-@section Stack Overflow, and How to Avoid It
+@node Memory Management
+@section Memory Management, and How to Avoid Memory Exhaustion
+@cindex memory exhaustion
+@cindex memory management
@cindex stack overflow
@cindex parser stack overflow
@cindex overflow of parser stack
-The Bison parser stack can overflow if too many tokens are shifted and
+The Bison parser stack can run out of memory if too many tokens are shifted and
not reduced. When this happens, the parser function @code{yyparse}
-returns a nonzero value, pausing only to call @code{yyerror} to report
-the overflow.
+calls @code{yyerror} and then returns 2.
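A caller can use the return value to tell memory exhaustion apart from other failures. The following is a minimal sketch; the helper name @code{parse_status_message} is hypothetical, and the meanings of 0 and 1 are the usual @code{yyparse} conventions (success, and failure on invalid input or @code{YYABORT}), stated here as an assumption.

```c
#include <stdio.h>

/* Hypothetical helper: map a yyparse return value to a description.
   The value 2 is the memory-exhaustion status described above.  */
static const char *
parse_status_message (int status)
{
  switch (status)
    {
    case 0:
      return "parse succeeded";
    case 1:
      return "parse failed (invalid input or YYABORT)";
    case 2:
      return "parse failed (memory exhausted)";
    default:
      return "unexpected yyparse status";
    }
}
```

A program might then write @code{fprintf (stderr, "%s\n", parse_status_message (yyparse ()))} to report the outcome.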
Because Bison parsers have growing stacks, hitting the upper limit
usually results from using a right recursion instead of a left
@vindex YYMAXDEPTH
By defining the macro @code{YYMAXDEPTH}, you can control how deep the
-parser stack can become before a stack overflow occurs. Define the
+parser stack can become before memory is exhausted. Define the
macro with a value that is an integer. This value is the maximum number
of tokens that can be shifted (and not reduced) before overflow.
The stack space allowed is not necessarily allocated. If you specify a
-large value for @code{YYMAXDEPTH}, the parser actually allocates a small
+large value for @code{YYMAXDEPTH}, the parser normally allocates a small
stack at first, and then makes it bigger by stages as needed. This
increasing allocation happens automatically and silently. Therefore,
you do not need to make @code{YYMAXDEPTH} painfully small merely to save
unless you are assuming C99 or some other target language or compiler
that allows variable-length arrays. The default is 200.
-Do not allow @code{YYINITDEPTH} to be a value so large that arithmetic
-overflow would occur when calculating the size of the stack space.
-Also, do not allow @code{YYINITDEPTH} to be greater than
-@code{YYMAXDEPTH}.
+Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
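One way to guard against that mistake is a compile-time check, sketched below as it might appear in the prologue of a grammar file. The two sizes are arbitrary values chosen for illustration, not recommendations.

```c
/* Hypothetical sizes, chosen only for illustration.  */
#define YYINITDEPTH 500
#define YYMAXDEPTH 20000

/* Refuse to compile if the initial stack size exceeds the maximum.  */
#if YYMAXDEPTH < YYINITDEPTH
# error "YYINITDEPTH must not be greater than YYMAXDEPTH"
#endif
```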
@c FIXME: C++ output.
Because of semantical differences between C and C++, the
-@acronym{LALR}(1) parsers in C produced by Bison by compiled as C++
-cannot grow. In this precise case (compiling a C parser as C++) you are
-suggested to grow @code{YYINITDEPTH}. In the near future, a C++ output
-output will be provided which addresses this issue.
+@acronym{LALR}(1) parsers in C produced by Bison cannot grow when compiled
+by C++ compilers. In this precise case (compiling a C parser as C++) you
+should increase @code{YYINITDEPTH}. The Bison maintainers hope to fix
+this deficiency in a future release.
@node Error Recovery
@chapter Error Recovery
@itemx --version
Print the version number of Bison and exit.
+@item --print-localedir
+Print the name of the directory containing locale-dependent data.
+
@need 1750
@item -y
@itemx --yacc
\line{ --no-lines \leaderfill -l}
\line{ --no-parser \leaderfill -n}
\line{ --output \leaderfill -o}
+\line{ --print-localedir}
\line{ --token-table \leaderfill -k}
\line{ --verbose \leaderfill -v}
\line{ --version \leaderfill -V}
--no-lines -l
--no-parser -n
--output=@var{outfile} -o @var{outfile}
+--print-localedir
--token-table -k
--verbose -v
--version -V
@c - Always pure
@c - initial action
-The C++ parser LALR(1) skeleton is named @file{lalr1.cc}. To select
+The C++ parser @acronym{LALR}(1) skeleton is named @file{lalr1.cc}. To select
it, you may either pass the option @option{--skeleton=lalr1.cc} to
Bison, or include the directive @samp{%skeleton "lalr1.cc"} in the
grammar preamble. When run, @command{bison} will create several
it describes an additional member of the parser class, and an
additional argument for its constructor.
-@deftypemethod {parser} {semantic_value_type}
-@deftypemethodx {parser} {location_value_type}
+@defcv {Type} {parser} {semantic_value_type}
+@defcvx {Type} {parser} {location_value_type}
The types for semantics value and locations.
-@c FIXME: deftypemethod pour des types ???
-@end deftypemethod
+@end defcv
@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...)
Build a new parser object. There are no arguments by default, unless
follows. The first part includes the CPP guard and imports the
required standard library components.
+@comment file: calc++-driver.hh
@example
#ifndef CALCXX_DRIVER_HH
# define CALCXX_DRIVER_HH
by the rest of the project, it is saner to forward declare the
parser's information here.
+@comment file: calc++-driver.hh
@example
// Forward declarations.
union YYSTYPE;
-namespace yy @{ class calcxx_parser; @}
+namespace yy
+@{
+ class location;
+ class calcxx_parser;
+@}
class calcxx_driver;
@end example
the signature of @code{yylex} to be defined in the macro
@code{YY_DECL}, and the C++ parser expects it to be declared. We can
factor both as follows.
+
+@comment file: calc++-driver.hh
@example
// Announce to Flex the prototype we want for lexing function, ...
-# define YY_DECL \
+# define YY_DECL \
int yylex (YYSTYPE* yylval, yy::location* yylloc, calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
The @code{calcxx_driver} class is then declared with its most obvious
members.
+@comment file: calc++-driver.hh
@example
// Conducting the whole scanning and parsing of Calc++.
class calcxx_driver
have two members function to open and close the scanning phase.
members.
+@comment file: calc++-driver.hh
@example
// Handling the scanner.
void scan_begin ();
@noindent
Similarly for the parser itself.
+@comment file: calc++-driver.hh
@example
// Handling the parser.
void parse (const std::string& f);
compiler driver using the following two member functions. Finally, we
close the class declaration and CPP guard.
+@comment file: calc++-driver.hh
@example
// Error handling.
void error (const yy::location& l, const std::string& m);
are simple stubs, they should actually register the located error
messages and set error state.
+@comment file: calc++-driver.cc
@example
#include "calc++-driver.hh"
#include "calc++-parser.hh"
for the C++ skeleton, the creation of the parser header file, and
specifies the name of the parser class. It then includes the required
headers.
+
+@comment file: calc++-parser.yy
@example
%skeleton "lalr1.cc" /* -*- C++ -*- */
%define "parser_class_name" "calcxx_parser"
This provides a simple but effective pure interface, not relying on
global variables.
+@comment file: calc++-parser.yy
@example
// The parsing context.
%parse-param @{ calcxx_driver& driver @}
relatively to the previous locations: the file name will be
automatically propagated.
+@comment file: calc++-parser.yy
@example
%locations
%initial-action
Use the two following directives to enable parser tracing and verbose
error messages.
+@comment file: calc++-parser.yy
@example
%debug
%error-verbose
Semantic values cannot use ``real'' objects, but only pointers to
them.
+@comment file: calc++-parser.yy
@example
// Symbols.
%union
symbol. Note that the tokens names are prefixed by @code{TOKEN_} to
avoid name clashes.
+@comment file: calc++-parser.yy
@example
%token YYEOF 0 "end of file"
%token TOKEN_ASSIGN ":="
To enable memory deallocation during error recovery, use
@code{%destructor}.
+@comment file: calc++-parser.yy
@example
%printer @{ debug_stream () << *$$; @} "identifier"
%destructor @{ delete $$; @} "identifier"
@noindent
The grammar itself is straightforward.
+@comment file: calc++-parser.yy
@example
%%
%start unit;
Finally the @code{error} member function registers the errors to the
driver.
+@comment file: calc++-parser.yy
@example
void
-yy::calcxx_parser::error (const location_type& l, const std::string& m)
+yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
+ const std::string& m)
@{
driver.error (l, m);
@}
The Flex scanner first includes the driver declaration, then the
parser's to get the set of defined tokens.
+@comment file: calc++-scanner.ll
@example
%@{ /* -*- C++ -*- */
+# include <cstdlib>
+# include <errno.h>
+# include <limits.h>
# include <string>
# include "calc++-driver.hh"
# include "calc++-parser.hh"
actual file, this is not an interactive session with the user.
Finally we enable the scanner tracing features.
+@comment file: calc++-scanner.ll
@example
%option noyywrap nounput batch debug
@end example
@noindent
Abbreviations allow for more readable rules.
+@comment file: calc++-scanner.ll
@example
id [a-zA-Z][a-zA-Z_0-9]*
int [0-9]+
is moved onto the end cursor to effectively ignore the blanks
preceding tokens. Comments would be treated equally.
+@comment file: calc++-scanner.ll
@example
+%@{
+# define YY_USER_ACTION yylloc->columns (yyleng);
+%@}
%%
%@{
yylloc->step ();
-# define YY_USER_ACTION yylloc->columns (yyleng);
%@}
@{blank@}+ yylloc->step ();
[\n]+ yylloc->lines (yyleng); yylloc->step ();
The rules are simple, just note the use of the driver to report
errors.
+@comment file: calc++-scanner.ll
@example
[-+*/] return yytext[0];
":=" return TOKEN_ASSIGN;
-@{int@} yylval->ival = atoi (yytext); return TOKEN_NUMBER;
+@{int@} @{
+ errno = 0;
+ long n = strtol (yytext, NULL, 10);
+ if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
+ driver.error (*yylloc, "integer is out of range");
+ yylval->ival = n;
+ return TOKEN_NUMBER;
+@}
@{id@} yylval->sval = new std::string (yytext); return TOKEN_IDENTIFIER;
. driver.error (*yylloc, "invalid character");
%%
Finally, because the scanner related driver's member function depend
on the scanner's data, it is simpler to implement them in this file.
+@comment file: calc++-scanner.ll
@example
void
calcxx_driver::scan_begin ()
The top level file, @file{calc++.cc}, poses no problem.
+@comment file: calc++.cc
@example
#include <iostream>
#include "calc++-driver.hh"
are addressed.
@menu
-* Parser Stack Overflow:: Breaking the Stack Limits
+* Memory Exhausted:: Breaking the Stack Limits
* How Can I Reset the Parser:: @code{yyparse} Keeps some State
* Strings are Destroyed:: @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops:: Control Flow in the Calculator
@end menu
-@node Parser Stack Overflow
-@section Parser Stack Overflow
+@node Memory Exhausted
+@section Memory Exhausted
@display
-My parser returns with error with a @samp{parser stack overflow}
+My parser returns with an error and a @samp{memory exhausted}
message. What can I do?
@end display
@end deffn
@deffn {Directive} %destructor
-Specifying how the parser should reclaim the memory associated to
+Specify how the parser should reclaim the memory associated with
discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn
@deffn {Macro} YYINITDEPTH
Macro for specifying the initial size of the parser stack.
-@xref{Stack Overflow}.
+@xref{Memory Management}.
@end deffn
@deffn {Function} yylex
@end deffn
@deffn {Macro} YYMAXDEPTH
-Macro for specifying the maximum size of the parser stack. @xref{Stack
-Overflow}.
+Macro for specifying the maximum size of the parser stack. @xref{Memory
+Management}.
@end deffn
@deffn {Variable} yynerrs