This manual (@value{UPDATED}) is for GNU Bison (version
@value{VERSION}), the GNU parser generator.
-Copyright @copyright{} 1988-1993, 1995, 1998-2011 Free Software
+Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software
Foundation, Inc.
@quotation
* Table of Symbols:: All the keywords of the Bison language are explained.
* Glossary:: Basic concepts are explained.
* Copying This Manual:: License for copying this manual.
+* Bibliography:: Publications cited in this manual.
* Index:: Cross-references to the text.
@detailmenu
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
* GLR Parsers:: Writing parsers for general context-free languages.
-* Locations Overview:: Tracking Locations.
+* Locations:: Overview of location tracking.
* Bison Parser:: What are Bison's input and output,
how is the output used?
* Stages:: Stages in writing and running Bison grammars.
Grammar Rules for @code{rpcalc}
-* Rpcalc Input::
-* Rpcalc Line::
-* Rpcalc Expr::
+* Rpcalc Input:: Explanation of the @code{input} nonterminal
+* Rpcalc Line:: Explanation of the @code{line} nonterminal
+* Rpcalc Expr:: Explanation of the @code{exp} nonterminal
Location Tracking Calculator: @code{ltcalc}
* Mfcalc Declarations:: Bison declarations for multi-function calculator.
* Mfcalc Rules:: Grammar rules for the calculator.
* Mfcalc Symbol Table:: Symbol table management subroutines.
+* Mfcalc Lexer:: The lexical analyzer.
+* Mfcalc Main:: The controlling function.
Bison Grammar Files
-* Grammar Outline:: Overall layout of the grammar file.
-* Symbols:: Terminal and nonterminal symbols.
-* Rules:: How to write grammar rules.
-* Recursion:: Writing recursive rules.
-* Semantics:: Semantic values and actions.
-* Locations:: Locations and actions.
-* Declarations:: All kinds of Bison declarations are described here.
-* Multiple Parsers:: Putting more than one Bison parser in one program.
+* Grammar Outline:: Overall layout of the grammar file.
+* Symbols:: Terminal and nonterminal symbols.
+* Rules:: How to write grammar rules.
+* Recursion:: Writing recursive rules.
+* Semantics:: Semantic values and actions.
+* Tracking Locations:: Locations and actions.
+* Named References:: Using named references in actions.
+* Declarations:: All kinds of Bison declarations are described here.
+* Multiple Parsers:: Putting more than one Bison parser in one program.
Outline of a Bison Grammar
* Mid-Rule Actions:: Most actions go at the end of a rule.
This says when, why and how to use the exceptional
action in the middle of a rule.
-* Named References:: Using named references in actions.
Tracking Locations
* Type Decl:: Declaring the choice of type for a nonterminal symbol.
* Initial Action Decl:: Code run before parsing starts.
* Destructor Decl:: Declaring how symbols are freed.
+* Printer Decl:: Declaring how symbol values are displayed.
* Expect Decl:: Suppressing warnings about parsing conflicts.
* Start Decl:: Specifying the start symbol.
* Pure Decl:: Requesting a reentrant parser.
* Contextual Precedence:: When an operator's precedence depends on context.
* Parser States:: The parser is a finite-state-machine with stack.
* Reduce/Reduce:: When two rules are applicable in the same situation.
-* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Mysterious Conflicts:: Conflicts that look unjustified.
+* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
* Memory Management:: What happens when memory is exhausted. How to avoid it.
* Precedence Examples:: How these features are used in the previous example.
* How Precedence:: How they work.
+Tuning LR
+
+* LR Table Construction:: Choose a different construction algorithm.
+* Default Reductions:: Disable default reductions.
+* LAC:: Correct lookahead sets in the parser states.
+* Unreachable States:: Keep unreachable parser states for debugging.
+
Handling Context Dependencies
* Semantic Tokens:: Token parsing can depend on the semantic context.
* Understanding:: Understanding the structure of your parser.
* Tracing:: Tracing the execution of your parser.
+Tracing Your Parser
+
+* Enabling Traces:: Activating run-time trace support
+* Mfcalc Traces:: Extending @code{mfcalc} to support traces
+* The YYPRINT Macro:: Obsolete interface for semantic value reports
+
Invoking Bison
* Bison Options:: All the options described in detail,
* C++ Scanner Interface:: Exchanges between yylex and parse
* A Complete C++ Example:: Demonstrating their use
+C++ Location Values
+
+* C++ position:: One point in the source file
+* C++ location:: Two points in the source file
+
A Complete C++ Example
* Calc++ --- C++ Calculator:: The specifications
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
* GLR Parsers:: Writing parsers for general context-free languages.
-* Locations Overview:: Tracking Locations.
+* Locations:: Overview of location tracking.
* Bison Parser:: What are Bison's input and output,
how is the output used?
* Stages:: Stages in writing and running Bison grammars.
BNF is a context-free grammar. The input to Bison is
essentially machine-readable BNF.
-@cindex LALR(1) grammars
-@cindex IELR(1) grammars
-@cindex LR(1) grammars
-There are various important subclasses of context-free grammars.
-Although it can handle almost all context-free grammars, Bison is
-optimized for what are called LR(1) grammars.
-In brief, in these grammars, it must be possible to tell how to parse
-any portion of an input string with just a single token of lookahead.
-For historical reasons, Bison by default is limited by the additional
-restrictions of LALR(1), which is hard to explain simply.
-@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for
-more information on this.
-As an experimental feature, you can escape these additional restrictions by
-requesting IELR(1) or canonical LR(1) parser tables.
-@xref{%define Summary,,lr.type}, to learn how.
+@cindex LALR grammars
+@cindex IELR grammars
+@cindex LR grammars
+There are various important subclasses of context-free grammars. Although
+it can handle almost all context-free grammars, Bison is optimized for what
+are called LR(1) grammars. In brief, in these grammars, it must be possible
+to tell how to parse any portion of an input string with just a single token
+of lookahead. For historical reasons, Bison by default is limited by the
+additional restrictions of LALR(1), which is hard to explain simply.
+@xref{Mysterious Conflicts}, for more information on this. As an
+experimental feature, you can escape these additional restrictions by
+requesting IELR(1) or canonical LR(1) parser tables. @xref{LR Table
+Construction}, to learn how.
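+
+For example, a grammar file can request IELR(1) parser tables with a
+@code{%define} directive such as the one below (a minimal sketch; the
+accepted values are listed in the node referenced above):
+
+@example
+%define lr.type ielr
+@end example
+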
@cindex GLR parsing
@cindex generalized LR (GLR) parsing
Here is a simple C function subdivided into tokens:
-@ifinfo
@example
int /* @r{keyword `int'} */
square (int x) /* @r{identifier, open-paren, keyword `int',}
@r{identifier, semicolon} */
@} /* @r{close-brace} */
@end example
-@end ifinfo
-@ifnotinfo
-@example
-int /* @r{keyword `int'} */
-square (int x) /* @r{identifier, open-paren, keyword `int', identifier, close-paren} */
-@{ /* @r{open-brace} */
- return x * x; /* @r{keyword `return', identifier, asterisk, identifier, semicolon} */
-@} /* @r{close-brace} */
-@end example
-@end ifnotinfo
The syntactic groupings of C include the expression, the statement, the
declaration, and the function definition. These are represented in the
used in every rule.
@example
-stmt: RETURN expr ';'
- ;
+stmt: RETURN expr ';' ;
@end example
@noindent
two subexpressions:
@example
-expr: expr '+' expr @{ $$ = $1 + $3; @}
- ;
+expr: expr '+' expr @{ $$ = $1 + $3; @} ;
@end example
@noindent
%%
@group
-type_decl : TYPE ID '=' type ';'
- ;
+type_decl: TYPE ID '=' type ';' ;
@end group
@group
-type : '(' id_list ')'
- | expr DOTDOT expr
- ;
+type:
+ '(' id_list ')'
+| expr DOTDOT expr
+;
@end group
@group
-id_list : ID
- | id_list ',' ID
- ;
+id_list:
+ ID
+| id_list ',' ID
+;
@end group
@group
-expr : '(' expr ')'
- | expr '+' expr
- | expr '-' expr
- | expr '*' expr
- | expr '/' expr
- | ID
- ;
+expr:
+ '(' expr ')'
+| expr '+' expr
+| expr '-' expr
+| expr '*' expr
+| expr '/' expr
+| ID
+;
@end group
@end example
%%
-prog :
- | prog stmt @{ printf ("\n"); @}
- ;
+prog:
+ /* Nothing. */
+| prog stmt @{ printf ("\n"); @}
+;
-stmt : expr ';' %dprec 1
- | decl %dprec 2
- ;
+stmt:
+ expr ';' %dprec 1
+| decl %dprec 2
+;
-expr : ID @{ printf ("%s ", $$); @}
- | TYPENAME '(' expr ')'
- @{ printf ("%s <cast> ", $1); @}
- | expr '+' expr @{ printf ("+ "); @}
- | expr '=' expr @{ printf ("= "); @}
- ;
+expr:
+ ID @{ printf ("%s ", $$); @}
+| TYPENAME '(' expr ')'
+ @{ printf ("%s <cast> ", $1); @}
+| expr '+' expr @{ printf ("+ "); @}
+| expr '=' expr @{ printf ("= "); @}
+;
-decl : TYPENAME declarator ';'
- @{ printf ("%s <declare> ", $1); @}
- | TYPENAME declarator '=' expr ';'
- @{ printf ("%s <init-declare> ", $1); @}
- ;
+decl:
+ TYPENAME declarator ';'
+ @{ printf ("%s <declare> ", $1); @}
+| TYPENAME declarator '=' expr ';'
+ @{ printf ("%s <init-declare> ", $1); @}
+;
-declarator : ID @{ printf ("\"%s\" ", $1); @}
- | '(' declarator ')'
- ;
+declarator:
+ ID @{ printf ("\"%s\" ", $1); @}
+| '(' declarator ')'
+;
@end example
@noindent
follows:
@example
-stmt : expr ';' %merge <stmtMerge>
- | decl %merge <stmtMerge>
- ;
+stmt:
+ expr ';' %merge <stmtMerge>
+| decl %merge <stmtMerge>
+;
@end example
@noindent
initiate error recovery.
During deterministic GLR operation, the effect of @code{YYERROR} is
the same as its effect in a deterministic parser.
-The effect in a deferred action is similar, but the precise point of the
-error is undefined; instead, the parser reverts to deterministic operation,
+The effect in a deferred action is similar, but the precise point of the
+error is undefined; instead, the parser reverts to deterministic operation,
selecting an unspecified stack on which to continue with a syntax error.
In a semantic predicate (see @ref{Semantic Predicates}) during nondeterministic
parsing, @code{YYERROR} silently prunes
if there are alternative parses. (This feature is experimental and may
evolve. We welcome user feedback.) For example,
-@smallexample
-widget :
- %?@{ new_syntax @} "widget" id new_args @{ $$ = f($3, $4); @}
- | %?@{ !new_syntax @} "widget" id old_args @{ $$ = f($3, $4); @}
- ;
-@end smallexample
+@example
+widget:
+ %?@{ new_syntax @} "widget" id new_args @{ $$ = f($3, $4); @}
+| %?@{ !new_syntax @} "widget" id old_args @{ $$ = f($3, $4); @}
+;
+@end example
@noindent
-is one way to allow the same parser to handle two different syntaxes for
+is one way to allow the same parser to handle two different syntaxes for
widgets. The clause preceded by @code{%?} is treated like an ordinary
action, except that its text is treated as an expression and is always
-evaluated immediately (even when in nondeterministic mode). If the
+evaluated immediately (even when in nondeterministic mode). If the
expression yields 0 (false), the clause is treated as a syntax error,
-which, in a nondeterministic parser, causes the stack in which it is reduced
+which, in a nondeterministic parser, causes the stack in which it is reduced
to die. In a deterministic parser, it acts like YYERROR.
As the example shows, predicates otherwise look like semantic actions, and
There is a subtle difference between semantic predicates and ordinary
actions in nondeterministic mode, since the latter are deferred.
-For example, we could try to rewrite the previous example as
+For example, we could try to rewrite the previous example as
-@smallexample
-widget :
- @{ if (!new_syntax) YYERROR; @} "widget" id new_args @{ $$ = f($3, $4); @}
- | @{ if (new_syntax) YYERROR; @} "widget" id old_args @{ $$ = f($3, $4); @}
- ;
-@end smallexample
+@example
+widget:
+ @{ if (!new_syntax) YYERROR; @}
+ "widget" id new_args @{ $$ = f($3, $4); @}
+| @{ if (new_syntax) YYERROR; @}
+ "widget" id old_args @{ $$ = f($3, $4); @}
+;
+@end example
@noindent
(reversing the sense of the predicate tests to cause an error when they are
false). However, this
does @emph{not} have the same effect if @code{new_args} and @code{old_args}
have overlapping syntax.
-Since the mid-rule actions testing @code{new_syntax} are deferred,
+Since the mid-rule actions testing @code{new_syntax} are deferred,
a GLR parser first encounters the unresolved ambiguous reduction
for cases where @code{new_args} and @code{old_args} recognize the same string
@emph{before} performing the tests of @code{new_syntax}. It therefore
@example
%@{
- #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
- #define inline
+ #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \
+ && ! defined inline)
+ # define inline
#endif
%@}
@end example
-@node Locations Overview
+@node Locations
@section Locations
@cindex location
@cindex textual location
Bison provides a mechanism for handling these locations.
Each token has a semantic value. In a similar fashion, each token has an
-associated location, but the type of locations is the same for all tokens and
-groupings. Moreover, the output parser is equipped with a default data
-structure for storing locations (@pxref{Locations}, for more details).
+associated location, but the type of locations is the same for all tokens
+and groupings. Moreover, the output parser is equipped with a default data
+structure for storing locations (@pxref{Tracking Locations}, for more
+details).
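+
+By default, this structure is a plain C @code{struct} holding four
+integers (shown here as a sketch; see the node cited above for details):
+
+@example
+typedef struct YYLTYPE
+@{
+  int first_line;
+  int first_column;
+  int last_line;
+  int last_column;
+@} YYLTYPE;
+@end example
+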
Like semantic values, locations can be reached in actions using a dedicated
set of constructs. In the example above, the location of the whole grouping
@cindex simple examples
@cindex examples, simple
-Now we show and explain three sample programs written using Bison: a
+Now we show and explain several sample programs written using Bison: a
reverse polish notation calculator, an algebraic (infix) notation
-calculator, and a multi-function calculator. All three have been tested
-under BSD Unix 4.3; each produces a usable, though limited, interactive
-desk-top calculator.
+calculator --- later extended to track ``locations'' ---
+and a multi-function calculator. All
+produce usable, though limited, interactive desk-top calculators.
These examples are simple, but Bison grammars for real programming
languages are written the same way. You can copy these examples into a
Here are the C and Bison declarations for the reverse polish notation
calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
+@comment file: rpcalc.y
@example
/* Reverse polish notation calculator. */
%@{
#define YYSTYPE double
+ #include <stdio.h>
#include <math.h>
int yylex (void);
void yyerror (char const *);
Here are the grammar rules for the reverse polish notation calculator.
+@comment file: rpcalc.y
@example
-input: /* empty */
- | input line
+@group
+input:
+ /* empty */
+| input line
;
+@end group
-line: '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
+@group
+line:
+ '\n'
+| exp '\n' @{ printf ("%.10g\n", $1); @}
;
+@end group
-exp: NUM @{ $$ = $1; @}
- | exp exp '+' @{ $$ = $1 + $2; @}
- | exp exp '-' @{ $$ = $1 - $2; @}
- | exp exp '*' @{ $$ = $1 * $2; @}
- | exp exp '/' @{ $$ = $1 / $2; @}
- /* Exponentiation */
- | exp exp '^' @{ $$ = pow ($1, $2); @}
- /* Unary minus */
- | exp 'n' @{ $$ = -$1; @}
+@group
+exp:
+ NUM @{ $$ = $1; @}
+| exp exp '+' @{ $$ = $1 + $2; @}
+| exp exp '-' @{ $$ = $1 - $2; @}
+| exp exp '*' @{ $$ = $1 * $2; @}
+| exp exp '/' @{ $$ = $1 / $2; @}
+| exp exp '^' @{ $$ = pow ($1, $2); @} /* Exponentiation */
+| exp 'n' @{ $$ = -$1; @} /* Unary minus */
;
+@end group
%%
@end example
rule are referred to as @code{$1}, @code{$2}, and so on.
@menu
-* Rpcalc Input::
-* Rpcalc Line::
-* Rpcalc Expr::
+* Rpcalc Input:: Explanation of the @code{input} nonterminal
+* Rpcalc Line:: Explanation of the @code{line} nonterminal
+* Rpcalc Expr:: Explanation of the @code{exp} nonterminal
@end menu
@node Rpcalc Input
Consider the definition of @code{input}:
@example
-input: /* empty */
- | input line
+input:
+ /* empty */
+| input line
;
@end example
Now consider the definition of @code{line}:
@example
-line: '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
+line:
+ '\n'
+| exp '\n' @{ printf ("%.10g\n", $1); @}
;
@end example
followed by a plus-sign. The third handles subtraction, and so on.
@example
-exp: NUM
- | exp exp '+' @{ $$ = $1 + $2; @}
- | exp exp '-' @{ $$ = $1 - $2; @}
- @dots{}
- ;
+exp:
+ NUM
+| exp exp '+' @{ $$ = $1 + $2; @}
+| exp exp '-' @{ $$ = $1 - $2; @}
+@dots{}
+;
@end example
We have used @samp{|} to join all the rules for @code{exp}, but we could
equally well have written them separately:
@example
-exp: NUM ;
-exp: exp exp '+' @{ $$ = $1 + $2; @} ;
-exp: exp exp '-' @{ $$ = $1 - $2; @} ;
- @dots{}
+exp: NUM ;
+exp: exp exp '+' @{ $$ = $1 + $2; @};
+exp: exp exp '-' @{ $$ = $1 - $2; @};
+@dots{}
@end example
Most of the rules have actions that compute the value of the expression in
For example, this:
@example
-exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;
+exp: NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;
@end example
@noindent
means the same thing as this:
@example
-exp: NUM
- | exp exp '+' @{ $$ = $1 + $2; @}
- | @dots{}
+exp:
+ NUM
+| exp exp '+' @{ $$ = $1 + $2; @}
+| @dots{}
;
@end example
Here is the code for the lexical analyzer:
+@comment file: rpcalc.y
@example
@group
/* The lexical analyzer returns a double floating point
/* Skip white space. */
while ((c = getchar ()) == ' ' || c == '\t')
- ;
+ continue;
@end group
@group
/* Process numbers. */
kept to the bare minimum. The only requirement is that it call
@code{yyparse} to start the process of parsing.
+@comment file: rpcalc.y
@example
@group
int
@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
here is the definition we will use:
+@comment file: rpcalc.y
@example
@group
#include <stdio.h>
+@end group
+@group
/* Called by yyparse on error. */
void
yyerror (char const *s)
@example
$ @kbd{rpcalc}
@kbd{4 9 +}
-13
+@result{} 13
@kbd{3 7 + 3 4 5 *+-}
--13
+@result{} -13
@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}}
-13
+@result{} 13
@kbd{5 6 / 4 n +}
--3.166666667
+@result{} -3.166666667
@kbd{3 4 ^} @r{Exponentiation}
-81
+@result{} 81
@kbd{^D} @r{End-of-file indicator}
$
@end example
@example
/* Infix notation calculator. */
+@group
%@{
#define YYSTYPE double
#include <math.h>
int yylex (void);
void yyerror (char const *);
%@}
+@end group
+@group
/* Bison declarations. */
%token NUM
%left '-' '+'
%left '*' '/'
%precedence NEG /* negation--unary minus */
%right '^' /* exponentiation */
+@end group
%% /* The grammar follows. */
-input: /* empty */
- | input line
+@group
+input:
+ /* empty */
+| input line
;
+@end group
-line: '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
+@group
+line:
+ '\n'
+| exp '\n' @{ printf ("\t%.10g\n", $1); @}
;
+@end group
-exp: NUM @{ $$ = $1; @}
- | exp '+' exp @{ $$ = $1 + $3; @}
- | exp '-' exp @{ $$ = $1 - $3; @}
- | exp '*' exp @{ $$ = $1 * $3; @}
- | exp '/' exp @{ $$ = $1 / $3; @}
- | '-' exp %prec NEG @{ $$ = -$2; @}
- | exp '^' exp @{ $$ = pow ($1, $3); @}
- | '(' exp ')' @{ $$ = $2; @}
+@group
+exp:
+ NUM @{ $$ = $1; @}
+| exp '+' exp @{ $$ = $1 + $3; @}
+| exp '-' exp @{ $$ = $1 - $3; @}
+| exp '*' exp @{ $$ = $1 * $3; @}
+| exp '/' exp @{ $$ = $1 / $3; @}
+| '-' exp %prec NEG @{ $$ = -$2; @}
+| exp '^' exp @{ $$ = pow ($1, $3); @}
+| '(' exp ')' @{ $$ = $2; @}
;
+@end group
%%
@end example
@example
@group
-line: '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
- | error '\n' @{ yyerrok; @}
+line:
+ '\n'
+| exp '\n' @{ printf ("\t%.10g\n", $1); @}
+| error '\n' @{ yyerrok; @}
;
@end group
@end example
@example
@group
-input : /* empty */
- | input line
+input:
+ /* empty */
+| input line
;
@end group
@group
-line : '\n'
- | exp '\n' @{ printf ("%d\n", $1); @}
+line:
+ '\n'
+| exp '\n' @{ printf ("%d\n", $1); @}
;
@end group
@group
-exp : NUM @{ $$ = $1; @}
- | exp '+' exp @{ $$ = $1 + $3; @}
- | exp '-' exp @{ $$ = $1 - $3; @}
- | exp '*' exp @{ $$ = $1 * $3; @}
+exp:
+ NUM @{ $$ = $1; @}
+| exp '+' exp @{ $$ = $1 + $3; @}
+| exp '-' exp @{ $$ = $1 - $3; @}
+| exp '*' exp @{ $$ = $1 * $3; @}
@end group
@group
- | exp '/' exp
- @{
- if ($3)
- $$ = $1 / $3;
- else
- @{
- $$ = 1;
- fprintf (stderr, "%d.%d-%d.%d: division by zero",
- @@3.first_line, @@3.first_column,
- @@3.last_line, @@3.last_column);
- @}
- @}
+| exp '/' exp
+ @{
+ if ($3)
+ $$ = $1 / $3;
+ else
+ @{
+ $$ = 1;
+ fprintf (stderr, "%d.%d-%d.%d: division by zero",
+ @@3.first_line, @@3.first_column,
+ @@3.last_line, @@3.last_column);
+ @}
+ @}
@end group
@group
- | '-' exp %prec NEG @{ $$ = -$2; @}
- | exp '^' exp @{ $$ = pow ($1, $3); @}
- | '(' exp ')' @{ $$ = $2; @}
+| '-' exp %prec NEG @{ $$ = -$2; @}
+| exp '^' exp @{ $$ = pow ($1, $3); @}
+| '(' exp ')' @{ $$ = $2; @}
@end group
@end example
if (c == EOF)
return 0;
+@group
/* Return a single char, and update location. */
if (c == '\n')
@{
++yylloc.last_column;
return c;
@}
+@end group
@end example
Basically, the lexical analyzer performs the same processing as before:
Here is a sample session with the multi-function calculator:
@example
+@group
$ @kbd{mfcalc}
@kbd{pi = 3.141592653589}
-3.1415926536
+@result{} 3.1415926536
+@end group
+@group
@kbd{sin(pi)}
-0.0000000000
+@result{} 0.0000000000
+@end group
@kbd{alpha = beta1 = 2.3}
-2.3000000000
+@result{} 2.3000000000
@kbd{alpha}
-2.3000000000
+@result{} 2.3000000000
@kbd{ln(alpha)}
-0.8329091229
+@result{} 0.8329091229
@kbd{exp(ln(beta1))}
-2.3000000000
+@result{} 2.3000000000
$
@end example
* Mfcalc Declarations:: Bison declarations for multi-function calculator.
* Mfcalc Rules:: Grammar rules for the calculator.
* Mfcalc Symbol Table:: Symbol table management subroutines.
+* Mfcalc Lexer:: The lexical analyzer.
+* Mfcalc Main:: The controlling function.
@end menu
@node Mfcalc Declarations
Here are the C and Bison declarations for the multi-function calculator.
-@smallexample
+@comment file: mfcalc.y: 1
+@example
@group
%@{
- #include <math.h> /* For math functions, cos(), sin(), etc. */
- #include "calc.h" /* Contains definition of `symrec'. */
+ #include <stdio.h> /* For printf, etc. */
+ #include <math.h> /* For pow, used in the grammar. */
+ #include "calc.h" /* Contains definition of `symrec'. */
int yylex (void);
void yyerror (char const *);
%@}
@end group
+
@group
%union @{
double val; /* For returning numbers. */
@}
@end group
%token <val> NUM /* Simple double precision number. */
-%token <tptr> VAR FNCT /* Variable and Function. */
+%token <tptr> VAR FNCT /* Variable and function. */
%type <val> exp
@group
%precedence NEG /* negation--unary minus */
%right '^' /* exponentiation */
@end group
-%% /* The grammar follows. */
-@end smallexample
+@end example
The above grammar introduces only two new features of the Bison language.
These features allow semantic values to have various data types
Most of them are copied directly from @code{calc}; three rules,
those which mention @code{VAR} or @code{FNCT}, are new.
-@smallexample
+@comment file: mfcalc.y: 3
+@example
+%% /* The grammar follows. */
@group
-input: /* empty */
- | input line
+input:
+ /* empty */
+| input line
;
@end group
@group
line:
- '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
- | error '\n' @{ yyerrok; @}
+ '\n'
+| exp '\n' @{ printf ("%.10g\n", $1); @}
+| error '\n' @{ yyerrok; @}
;
@end group
@group
-exp: NUM @{ $$ = $1; @}
- | VAR @{ $$ = $1->value.var; @}
- | VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
- | FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
- | exp '+' exp @{ $$ = $1 + $3; @}
- | exp '-' exp @{ $$ = $1 - $3; @}
- | exp '*' exp @{ $$ = $1 * $3; @}
- | exp '/' exp @{ $$ = $1 / $3; @}
- | '-' exp %prec NEG @{ $$ = -$2; @}
- | exp '^' exp @{ $$ = pow ($1, $3); @}
- | '(' exp ')' @{ $$ = $2; @}
+exp:
+ NUM @{ $$ = $1; @}
+| VAR @{ $$ = $1->value.var; @}
+| VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
+| FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
+| exp '+' exp @{ $$ = $1 + $3; @}
+| exp '-' exp @{ $$ = $1 - $3; @}
+| exp '*' exp @{ $$ = $1 * $3; @}
+| exp '/' exp @{ $$ = $1 / $3; @}
+| '-' exp %prec NEG @{ $$ = -$2; @}
+| exp '^' exp @{ $$ = pow ($1, $3); @}
+| '(' exp ')' @{ $$ = $2; @}
;
@end group
/* End of grammar. */
%%
-@end smallexample
+@end example
@node Mfcalc Symbol Table
@subsection The @code{mfcalc} Symbol Table
definition, which is kept in the header @file{calc.h}, is as follows. It
provides for either functions or variables to be placed in the table.
-@smallexample
+@comment file: calc.h
+@example
@group
/* Function type. */
typedef double (*func_t) (double);
symrec *putsym (char const *, int);
symrec *getsym (char const *);
@end group
-@end smallexample
-
-The new version of @code{main} includes a call to @code{init_table}, a
-function that initializes the symbol table. Here it is, and
-@code{init_table} as well:
-
-@smallexample
-#include <stdio.h>
+@end example
-@group
-/* Called by yyparse on error. */
-void
-yyerror (char const *s)
-@{
- printf ("%s\n", s);
-@}
-@end group
+The new version of @code{main} will call @code{init_table} to initialize
+the symbol table:
+@comment file: mfcalc.y: 3
+@example
@group
struct init
@{
@group
struct init const arith_fncts[] =
@{
- "sin", sin,
- "cos", cos,
- "atan", atan,
- "ln", log,
- "exp", exp,
- "sqrt", sqrt,
- 0, 0
+ @{ "atan", atan @},
+ @{ "cos", cos @},
+ @{ "exp", exp @},
+ @{ "ln", log @},
+ @{ "sin", sin @},
+ @{ "sqrt", sqrt @},
+ @{ 0, 0 @},
@};
@end group
@group
/* Put arithmetic functions in table. */
+static
void
init_table (void)
@{
int i;
- symrec *ptr;
for (i = 0; arith_fncts[i].fname != 0; i++)
@{
- ptr = putsym (arith_fncts[i].fname, FNCT);
+ symrec *ptr = putsym (arith_fncts[i].fname, FNCT);
ptr->value.fnctptr = arith_fncts[i].fnct;
@}
@}
@end group
-
-@group
-int
-main (void)
-@{
- init_table ();
- return yyparse ();
-@}
-@end group
-@end smallexample
+@end example
By simply editing the initialization list and adding the necessary include
files, you can add additional functions to the calculator.
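+
+For example, making the C library's @code{tan} function available (it is
+declared by @file{math.h}, which the grammar file already includes) takes
+just one more entry in @code{arith_fncts}:
+
+@example
+  @{ "tan",  tan  @},
+@end example
+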
The function @code{getsym} is passed the name of the symbol to look up. If
found, a pointer to that symbol is returned; otherwise zero is returned.
-@smallexample
+@comment file: mfcalc.y: 3
+@example
+#include <stdlib.h> /* malloc. */
+#include <string.h> /* strlen. */
+
+@group
symrec *
putsym (char const *sym_name, int sym_type)
@{
- symrec *ptr;
- ptr = (symrec *) malloc (sizeof (symrec));
+ symrec *ptr = (symrec *) malloc (sizeof (symrec));
ptr->name = (char *) malloc (strlen (sym_name) + 1);
strcpy (ptr->name,sym_name);
ptr->type = sym_type;
sym_table = ptr;
return ptr;
@}
+@end group
+@group
symrec *
getsym (char const *sym_name)
@{
symrec *ptr;
for (ptr = sym_table; ptr != (symrec *) 0;
ptr = (symrec *)ptr->next)
- if (strcmp (ptr->name,sym_name) == 0)
+ if (strcmp (ptr->name, sym_name) == 0)
return ptr;
return 0;
@}
-@end smallexample
+@end group
+@end example
+
+@node Mfcalc Lexer
+@subsection The @code{mfcalc} Lexer
The function @code{yylex} must now recognize variables, numeric values, and
the single-character arithmetic operators. Strings of alphanumeric
No change is needed in the handling of numeric values and arithmetic
operators in @code{yylex}.
-@smallexample
+@comment file: mfcalc.y: 3
+@example
@group
#include <ctype.h>
@end group
int c;
/* Ignore white space, get first nonwhite character. */
- while ((c = getchar ()) == ' ' || c == '\t');
+ while ((c = getchar ()) == ' ' || c == '\t')
+ continue;
if (c == EOF)
return 0;
/* Char starts an identifier => read the name. */
if (isalpha (c))
@{
- symrec *s;
+ /* Initially make the buffer long enough
+ for a 40-character symbol name. */
+ static size_t length = 40;
static char *symbuf = 0;
- static int length = 0;
+ symrec *s;
int i;
@end group
-
-@group
- /* Initially make the buffer long enough
- for a 40-character symbol name. */
- if (length == 0)
- length = 40, symbuf = (char *)malloc (length + 1);
+ if (!symbuf)
+ symbuf = (char *) malloc (length + 1);
i = 0;
do
-@end group
@group
@{
/* If buffer is full, make it bigger. */
return c;
@}
@end group
-@end smallexample
+@end example
+
+@node Mfcalc Main
+@subsection The @code{mfcalc} Main
+
+The error reporting function is unchanged, and the new version of
+@code{main} includes a call to @code{init_table}; it also enables parser
+traces, @code{yydebug}, on user demand (@pxref{Tracing, , Tracing Your
+Parser}, for details):
+
+@comment file: mfcalc.y: 3
+@example
+@group
+/* Called by yyparse on error. */
+void
+yyerror (char const *s)
+@{
+ fprintf (stderr, "%s\n", s);
+@}
+@end group
+
+@group
+int
+main (int argc, char const *argv[])
+@{
+  int i;
+  /* Enable parse traces on option -p. */
+  for (i = 1; i < argc; ++i)
+    if (!strcmp (argv[i], "-p"))
+ yydebug = 1;
+ init_table ();
+ return yyparse ();
+@}
+@end group
+@end example
This program is both powerful and flexible. You may easily add new
functions, and it is a simple job to modify this code to install
@xref{Invocation, ,Invoking Bison}.
@menu
-* Grammar Outline:: Overall layout of the grammar file.
-* Symbols:: Terminal and nonterminal symbols.
-* Rules:: How to write grammar rules.
-* Recursion:: Writing recursive rules.
-* Semantics:: Semantic values and actions.
-* Locations:: Locations and actions.
-* Declarations:: All kinds of Bison declarations are described here.
-* Multiple Parsers:: Putting more than one Bison parser in one program.
+* Grammar Outline:: Overall layout of the grammar file.
+* Symbols:: Terminal and nonterminal symbols.
+* Rules:: How to write grammar rules.
+* Recursion:: Writing recursive rules.
+* Semantics:: Semantic values and actions.
+* Tracking Locations:: Locations and actions.
+* Named References:: Using named references in actions.
+* Declarations:: All kinds of Bison declarations are described here.
+* Multiple Parsers:: Putting more than one Bison parser in one program.
@end menu
@node Grammar Outline
can be done with two @var{Prologue} blocks, one before and one after the
@code{%union} declaration.
-@smallexample
+@example
%@{
#define _GNU_SOURCE
#include <stdio.h>
%@}
@dots{}
-@end smallexample
+@end example
When in doubt, it is usually safer to put prologue code before all
Bison declarations, rather than after. For example, any definitions
Look again at the example of the previous section:
-@smallexample
+@example
%@{
#define _GNU_SOURCE
#include <stdio.h>
%@}
@dots{}
-@end smallexample
+@end example
@noindent
Notice that there are two @var{Prologue} sections here, but there's a
Let's go ahead and add the new @code{YYLTYPE} definition and the
@code{trace_token} prototype at the same time:
-@smallexample
+@example
%code top @{
#define _GNU_SOURCE
#include <stdio.h>
@}
@dots{}
-@end smallexample
+@end example
@noindent
In this way, @code{%code top} and the unqualified @code{%code} achieve the same
definitions.
Thus, they belong in one or more @code{%code requires}:
-@smallexample
+@example
+@group
%code top @{
#define _GNU_SOURCE
#include <stdio.h>
@}
+@end group
+@group
%code requires @{
#include "ptypes.h"
@}
+@end group
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%code requires @{
#define YYLTYPE YYLTYPE
typedef struct YYLTYPE
char *filename;
@} YYLTYPE;
@}
+@end group
+@group
%code @{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
static void trace_token (enum yytokentype token, YYLTYPE loc);
@}
+@end group
@dots{}
-@end smallexample
+@end example
@noindent
Now Bison will insert @code{#include "ptypes.h"} and the new
sufficient. Instead, move its prototype from the unqualified
@code{%code} to a @code{%code provides}:
-@smallexample
+@example
+@group
%code top @{
#define _GNU_SOURCE
#include <stdio.h>
@}
+@end group
+@group
%code requires @{
#include "ptypes.h"
@}
+@end group
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%code requires @{
#define YYLTYPE YYLTYPE
typedef struct YYLTYPE
char *filename;
@} YYLTYPE;
@}
+@end group
+@group
%code provides @{
void trace_token (enum yytokentype token, YYLTYPE loc);
@}
+@end group
+@group
%code @{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
@}
+@end group
@dots{}
-@end smallexample
+@end example
@noindent
Bison will insert the @code{trace_token} prototype into both the
For example, you may organize semantic-type-related directives by semantic
type:
-@smallexample
+@example
+@group
%code requires @{ #include "type1.h" @}
%union @{ type1 field1; @}
%destructor @{ type1_free ($$); @} <field1>
%printer @{ type1_print ($$); @} <field1>
+@end group
+@group
%code requires @{ #include "type2.h" @}
%union @{ type2 field2; @}
%destructor @{ type2_free ($$); @} <field2>
%printer @{ type2_print ($$); @} <field2>
-@end smallexample
+@end group
+@end example
@noindent
You could even place each of the above directive groups in the rules section of
@example
@group
-@var{result}: @var{components}@dots{}
- ;
+@var{result}: @var{components}@dots{};
@end group
@end example
@example
@group
-exp: exp '+' exp
- ;
+exp: exp '+' exp;
@end group
@end example
@example
@group
-@var{result}: @var{rule1-components}@dots{}
- | @var{rule2-components}@dots{}
- @dots{}
- ;
+@var{result}:
+ @var{rule1-components}@dots{}
+| @var{rule2-components}@dots{}
+@dots{}
+;
@end group
@end example
@example
@group
-expseq: /* empty */
- | expseq1
- ;
+expseq:
+ /* empty */
+| expseq1
+;
@end group
@group
-expseq1: exp
- | expseq1 ',' exp
- ;
+expseq1:
+ exp
+| expseq1 ',' exp
+;
@end group
@end example
@example
@group
-expseq1: exp
- | expseq1 ',' exp
- ;
+expseq1:
+ exp
+| expseq1 ',' exp
+;
@end group
@end example
@example
@group
-expseq1: exp
- | exp ',' expseq1
- ;
+expseq1:
+ exp
+| exp ',' expseq1
+;
@end group
@end example
@example
@group
-expr: primary
- | primary '+' primary
- ;
+expr:
+ primary
+| primary '+' primary
+;
@end group
@group
-primary: constant
- | '(' expr ')'
- ;
+primary:
+ constant
+| '(' expr ')'
+;
@end group
@end example
* Mid-Rule Actions:: Most actions go at the end of a rule.
This says when, why and how to use the exceptional
action in the middle of a rule.
-* Named References:: Using named references in actions.
@end menu
@node Value Type
@example
@group
-exp: @dots{}
- | exp '+' exp
- @{ $$ = $1 + $3; @}
+exp:
+@dots{}
+| exp '+' exp @{ $$ = $1 + $3; @}
@end group
@end example
@example
@group
-exp[result]: @dots{}
- | exp[left] '+' exp[right]
- @{ $result = $left + $right; @}
+exp[result]:
+@dots{}
+| exp[left] '+' exp[right] @{ $result = $left + $right; @}
@end group
@end example
useful semantic value associated with the @samp{+} token, it could be
referred to as @code{$2}.
-@xref{Named References,,Using Named References}, for more information
-about using the named references construct.
+@xref{Named References}, for more information about using the named
+references construct.
Note that the vertical-bar character @samp{|} is really a rule
separator, and actions are attached to a single rule. This is a
@example
@group
-foo: expr bar '+' expr @{ @dots{} @}
- | expr bar '-' expr @{ @dots{} @}
- ;
+foo:
+ expr bar '+' expr @{ @dots{} @}
+| expr bar '-' expr @{ @dots{} @}
+;
@end group
@group
-bar: /* empty */
- @{ previous_expr = $0; @}
- ;
+bar:
+ /* empty */ @{ previous_expr = $0; @}
+;
@end group
@end example
@example
@group
-exp: @dots{}
- | exp '+' exp
- @{ $$ = $1 + $3; @}
+exp:
+ @dots{}
+| exp '+' exp @{ $$ = $1 + $3; @}
@end group
@end example
@example
@group
-stmt: LET '(' var ')'
- @{ $<context>$ = push_context ();
- declare_variable ($3); @}
- stmt @{ $$ = $6;
- pop_context ($<context>5); @}
+stmt:
+ LET '(' var ')'
+ @{ $<context>$ = push_context (); declare_variable ($3); @}
+ stmt
+ @{ $$ = $6; pop_context ($<context>5); @}
@end group
@end example
%%
-stmt: let stmt
- @{ $$ = $2;
- pop_context ($1); @}
- ;
+stmt:
+ let stmt
+ @{
+ $$ = $2;
+ pop_context ($1);
+ @};
-let: LET '(' var ')'
- @{ $$ = push_context ();
- declare_variable ($3); @}
- ;
+let:
+ LET '(' var ')'
+ @{
+ $$ = push_context ();
+ declare_variable ($3);
+ @};
@end group
@end example
@example
@group
-compound: '@{' declarations statements '@}'
- | '@{' statements '@}'
- ;
+compound:
+ '@{' declarations statements '@}'
+| '@{' statements '@}'
+;
@end group
@end example
@example
@group
-compound: @{ prepare_for_local_variables (); @}
- '@{' declarations statements '@}'
+compound:
+ @{ prepare_for_local_variables (); @}
+ '@{' declarations statements '@}'
@end group
@group
- | '@{' statements '@}'
- ;
+| '@{' statements '@}'
+;
@end group
@end example
@example
@group
-compound: @{ prepare_for_local_variables (); @}
- '@{' declarations statements '@}'
- | @{ prepare_for_local_variables (); @}
- '@{' statements '@}'
- ;
+compound:
+ @{ prepare_for_local_variables (); @}
+ '@{' declarations statements '@}'
+| @{ prepare_for_local_variables (); @}
+ '@{' statements '@}'
+;
@end group
@end example
@example
@group
-compound: '@{' @{ prepare_for_local_variables (); @}
- declarations statements '@}'
- | '@{' statements '@}'
- ;
+compound:
+ '@{' @{ prepare_for_local_variables (); @}
+ declarations statements '@}'
+| '@{' statements '@}'
+;
@end group
@end example
@example
@group
-subroutine: /* empty */
- @{ prepare_for_local_variables (); @}
- ;
-
+subroutine:
+ /* empty */ @{ prepare_for_local_variables (); @}
+;
@end group
@group
-compound: subroutine
- '@{' declarations statements '@}'
- | subroutine
- '@{' statements '@}'
- ;
+compound:
+ subroutine '@{' declarations statements '@}'
+| subroutine '@{' statements '@}'
+;
@end group
@end example
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.
-@node Named References
-@subsection Using Named References
-@cindex named references
-
-While every semantic value can be accessed with positional references
-@code{$@var{n}} and @code{$$}, it's often much more convenient to refer to
-them by name. First of all, original symbol names may be used as named
-references. For example:
+@node Tracking Locations
+@section Tracking Locations
+@cindex location
+@cindex textual location
+@cindex location, textual
-@example
-@group
-invocation: op '(' args ')'
- @{ $invocation = new_invocation ($op, $args, @@invocation); @}
-@end group
-@end example
+Though grammar rules and semantic actions are enough to write a fully
+functional parser, it can be useful to process some additional information,
+especially symbol locations.
-@noindent
-The positional @code{$$}, @code{@@$}, @code{$n}, and @code{@@n} can be
-mixed with @code{$name} and @code{@@name} arbitrarily. For example:
+The way locations are handled is defined by providing a data type, and
+actions to take when rules are matched.
-@example
-@group
-invocation: op '(' args ')'
- @{ $$ = new_invocation ($op, $args, @@$); @}
-@end group
-@end example
+@menu
+* Location Type:: Specifying a data type for locations.
+* Actions and Locations:: Using locations in actions.
+* Location Default Action:: Defining a general way to compute locations.
+@end menu
-@noindent
-However, sometimes regular symbol names are not sufficient due to
-ambiguities:
+@node Location Type
+@subsection Data Type of Locations
+@cindex data type of locations
+@cindex default location type
-@example
-@group
-exp: exp '/' exp
- @{ $exp = $exp / $exp; @} // $exp is ambiguous.
-
-exp: exp '/' exp
- @{ $$ = $1 / $exp; @} // One usage is ambiguous.
-
-exp: exp '/' exp
- @{ $$ = $1 / $3; @} // No error.
-@end group
-@end example
-
-@noindent
-When ambiguity occurs, explicitly declared names may be used for values and
-locations. Explicit names are declared as a bracketed name after a symbol
-appearance in rule definitions. For example:
-@example
-@group
-exp[result]: exp[left] '/' exp[right]
- @{ $result = $left / $right; @}
-@end group
-@end example
-
-@noindent
-Explicit names may be declared for RHS and for LHS symbols as well. In order
-to access a semantic value generated by a mid-rule action, an explicit name
-may also be declared by putting a bracketed name after the closing brace of
-the mid-rule action code:
-@example
-@group
-exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
- @{ $res = $left + $right; @}
-@end group
-@end example
-
-@noindent
-
-In references, in order to specify names containing dots and dashes, an explicit
-bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
-@example
-@group
-if-stmt: IF '(' expr ')' THEN then.stmt ';'
- @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
-@end group
-@end example
-
-It often happens that named references are followed by a dot, dash or other
-C punctuation marks and operators. By default, Bison will read
-@code{$name.suffix} as a reference to symbol value @code{$name} followed by
-@samp{.suffix}, i.e., an access to the @samp{suffix} field of the semantic
-value. In order to force Bison to recognize @code{name.suffix} in its entirety
-as the name of a semantic value, bracketed syntax @code{$[name.suffix]}
-must be used.
-
-
-@node Locations
-@section Tracking Locations
-@cindex location
-@cindex textual location
-@cindex location, textual
-
-Though grammar rules and semantic actions are enough to write a fully
-functional parser, it can be useful to process some additional information,
-especially symbol locations.
-
-The way locations are handled is defined by providing a data type, and
-actions to take when rules are matched.
-
-@menu
-* Location Type:: Specifying a data type for locations.
-* Actions and Locations:: Using locations in actions.
-* Location Default Action:: Defining a general way to compute locations.
-@end menu
-
-@node Location Type
-@subsection Data Type of Locations
-@cindex data type of locations
-@cindex default location type
-
-Defining a data type for locations is much simpler than for semantic values,
-since all tokens and groupings always use the same type.
+Defining a data type for locations is much simpler than for semantic values,
+since all tokens and groupings always use the same type.
You can specify the type of locations by defining a macro called
@code{YYLTYPE}, just as you can specify the semantic value type by
In addition, the named references construct @code{@@@var{name}} and
@code{@@[@var{name}]} may also be used to address the symbol locations.
-@xref{Named References,,Using Named References}, for more information
-about using the named references construct.
+@xref{Named References}, for more information about using the named
+references construct.
Here is a basic example using the default data type for locations:
@example
@group
-exp: @dots{}
- | exp '/' exp
- @{
- @@$.first_column = @@1.first_column;
- @@$.first_line = @@1.first_line;
- @@$.last_column = @@3.last_column;
- @@$.last_line = @@3.last_line;
- if ($3)
- $$ = $1 / $3;
- else
- @{
- $$ = 1;
- fprintf (stderr,
- "Division by zero, l%d,c%d-l%d,c%d",
- @@3.first_line, @@3.first_column,
- @@3.last_line, @@3.last_column);
- @}
- @}
+exp:
+ @dots{}
+| exp '/' exp
+ @{
+ @@$.first_column = @@1.first_column;
+ @@$.first_line = @@1.first_line;
+ @@$.last_column = @@3.last_column;
+ @@$.last_line = @@3.last_line;
+ if ($3)
+ $$ = $1 / $3;
+ else
+ @{
+ $$ = 1;
+ fprintf (stderr,
+ "Division by zero, l%d,c%d-l%d,c%d",
+ @@3.first_line, @@3.first_column,
+ @@3.last_line, @@3.last_column);
+ @}
+ @}
@end group
@end example
@example
@group
-exp: @dots{}
- | exp '/' exp
- @{
- if ($3)
- $$ = $1 / $3;
- else
- @{
- $$ = 1;
- fprintf (stderr,
- "Division by zero, l%d,c%d-l%d,c%d",
- @@3.first_line, @@3.first_column,
- @@3.last_line, @@3.last_column);
- @}
- @}
+exp:
+ @dots{}
+| exp '/' exp
+ @{
+ if ($3)
+ $$ = $1 / $3;
+ else
+ @{
+ $$ = 1;
+ fprintf (stderr,
+ "Division by zero, l%d,c%d-l%d,c%d",
+ @@3.first_line, @@3.first_column,
+ @@3.last_line, @@3.last_column);
+ @}
+ @}
@end group
@end example
By default, @code{YYLLOC_DEFAULT} is defined this way:
-@smallexample
-@group
-# define YYLLOC_DEFAULT(Current, Rhs, N) \
- do \
- if (N) \
- @{ \
- (Current).first_line = YYRHSLOC(Rhs, 1).first_line; \
- (Current).first_column = YYRHSLOC(Rhs, 1).first_column; \
- (Current).last_line = YYRHSLOC(Rhs, N).last_line; \
- (Current).last_column = YYRHSLOC(Rhs, N).last_column; \
- @} \
- else \
- @{ \
- (Current).first_line = (Current).last_line = \
- YYRHSLOC(Rhs, 0).last_line; \
- (Current).first_column = (Current).last_column = \
- YYRHSLOC(Rhs, 0).last_column; \
- @} \
- while (0)
-@end group
-@end smallexample
+@example
+@group
+# define YYLLOC_DEFAULT(Cur, Rhs, N) \
+do \
+ if (N) \
+ @{ \
+ (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \
+ (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \
+ (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \
+ (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \
+ @} \
+ else \
+ @{ \
+ (Cur).first_line = (Cur).last_line = \
+ YYRHSLOC(Rhs, 0).last_line; \
+ (Cur).first_column = (Cur).last_column = \
+ YYRHSLOC(Rhs, 0).last_column; \
+ @} \
+while (0)
+@end group
+@end example
+@noindent
where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol
in @var{rhs} when @var{k} is positive, and the location of the symbol
just before the reduction when @var{k} and @var{n} are both zero.
statement when it is followed by a semicolon.
@end itemize
+@node Named References
+@section Named References
+@cindex named references
+
+As described in the preceding sections, the traditional way to refer to any
+semantic value or location is a @dfn{positional reference}, which takes the
+form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}. However,
+such a reference is not very descriptive. Moreover, if you later decide to
+insert or remove symbols in the right-hand side of a grammar rule, the need
+to renumber such references can be tedious and error-prone.
+
+To avoid these issues, you can also refer to a semantic value or location
+using a @dfn{named reference}. First of all, original symbol names may be
+used as named references. For example:
+
+@example
+@group
+invocation: op '(' args ')'
+ @{ $invocation = new_invocation ($op, $args, @@invocation); @}
+@end group
+@end example
+
+@noindent
+Positional and named references can be mixed arbitrarily. For example:
+
+@example
+@group
+invocation: op '(' args ')'
+ @{ $$ = new_invocation ($op, $args, @@$); @}
+@end group
+@end example
+
+@noindent
+However, sometimes regular symbol names are not sufficient due to
+ambiguities:
+
+@example
+@group
+exp: exp '/' exp
+ @{ $exp = $exp / $exp; @} // $exp is ambiguous.
+
+exp: exp '/' exp
+ @{ $$ = $1 / $exp; @} // One usage is ambiguous.
+
+exp: exp '/' exp
+ @{ $$ = $1 / $3; @} // No error.
+@end group
+@end example
+
+@noindent
+When ambiguity occurs, explicitly declared names may be used for values and
+locations. Explicit names are declared as a bracketed name after a symbol
+appearance in rule definitions. For example:
+@example
+@group
+exp[result]: exp[left] '/' exp[right]
+ @{ $result = $left / $right; @}
+@end group
+@end example
+
+@noindent
+In order to access a semantic value generated by a mid-rule action, an
+explicit name may also be declared by putting a bracketed name after the
+closing brace of the mid-rule action code:
+@example
+@group
+exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
+ @{ $res = $left + $right; @}
+@end group
+@end example
+
+@noindent
+In references, in order to specify names containing dots and dashes, an explicit
+bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
+@example
+@group
+if-stmt: "if" '(' expr ')' "then" then.stmt ';'
+ @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
+@end group
+@end example
+
+It often happens that named references are followed by a dot, dash or other
+C punctuation marks and operators. By default, Bison will read
+@samp{$name.suffix} as a reference to symbol value @code{$name} followed by
+@samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic
+value. In order to force Bison to recognize @samp{name.suffix} in its
+entirety as the name of a semantic value, the bracketed syntax
+@samp{$[name.suffix]} must be used.
+
+The named references feature is experimental. More user feedback will help
+to stabilize it.
+
@node Declarations
@section Bison Declarations
@cindex declarations, Bison
* Type Decl:: Declaring the choice of type for a nonterminal symbol.
* Initial Action Decl:: Code run before parsing starts.
* Destructor Decl:: Declaring how symbols are freed.
+* Printer Decl:: Declaring how symbol values are displayed.
* Expect Decl:: Suppressing warnings about parsing conflicts.
* Start Decl:: Specifying the start symbol.
* Pure Decl:: Requesting a reentrant parser.
@noindent
For example:
-@smallexample
+@example
%union @{ char *string; @}
%token <string> STRING1
%token <string> STRING2
%destructor @{ free ($$); @} <*>
%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
%destructor @{ printf ("Discarding tagless symbol.\n"); @} <>
-@end smallexample
+@end example
@noindent
guarantees that, when the parser discards any user-defined symbol that has a
However, it may invoke one of them for the end token (token 0) if you
redefine it from @code{$end} to, for example, @code{END}:
-@smallexample
+@example
%token END 0
-@end smallexample
+@end example
@cindex actions in mid-rule
@cindex mid-rule actions
Finally, Bison will never invoke a @code{%destructor} for an unreferenced
mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}).
-That is, Bison does not consider a mid-rule to have a semantic value if you do
-not reference @code{$$} in the mid-rule's action or @code{$@var{n}} (where
-@var{n} is the RHS symbol position of the mid-rule) in any later action in that
-rule.
-However, if you do reference either, the Bison-generated parser will invoke the
-@code{<>} @code{%destructor} whenever it discards the mid-rule symbol.
+That is, Bison does not consider a mid-rule to have a semantic value if you
+do not reference @code{$$} in the mid-rule's action or @code{$@var{n}}
+(where @var{n} is the right-hand side symbol position of the mid-rule) in
+any later action in that rule. However, if you do reference either, the
+Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever
+it discards the mid-rule symbol.
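+
+For example, in the following sketch (@code{push_scope} and
+@code{pop_scope} are hypothetical helpers, not part of Bison), the
+mid-rule action sets @code{$$} and a later action reads @code{$2}, so the
+mid-rule has a semantic value and the @code{<>} @code{%destructor} applies
+to it if it is discarded:
+
+@example
+block: decls @{ $$ = push_scope (); @} stmts @{ pop_scope ($2); @};
+@end example
+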
@ignore
@noindent
of thumb, destructors are invoked only when user actions cannot manage
the memory.
+@node Printer Decl
+@subsection Printing Semantic Values
+@cindex printing semantic values
+@findex %printer
+@findex <*>
+@findex <>
+When run-time traces are enabled (@pxref{Tracing, ,Tracing Your Parser}),
+the parser reports its actions, such as reductions. When a symbol involved
+in an action is reported, only its kind is displayed, as the parser cannot
+know how semantic values should be formatted.
+
+The @code{%printer} directive defines code that is called when a symbol is
+reported. Its syntax is the same as @code{%destructor} (@pxref{Destructor
+Decl, , Freeing Discarded Symbols}).
+
+@deffn {Directive} %printer @{ @var{code} @} @var{symbols}
+@findex %printer
+@vindex yyoutput
+@c This is the same text as for %destructor.
+Invoke the braced @var{code} whenever the parser displays one of the
+@var{symbols}. Within @var{code}, @code{yyoutput} denotes the output stream
+(a @code{FILE*} in C, and an @code{std::ostream&} in C++),
+@code{$$} designates the semantic value associated with the symbol, and
+@code{@@$} its location. The additional parser parameters are also
+available (@pxref{Parser Function, , The Parser Function @code{yyparse}}).
+
+The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor
+Decl, , Freeing Discarded Symbols}): they can be per-type (e.g.,
+@samp{<ival>}), per-symbol (e.g., @samp{exp}, @samp{NUM}, @samp{"float"}),
+typed per-default (i.e., @samp{<*>}), or untyped per-default (i.e.,
+@samp{<>}).
+@end deffn
+
+@noindent
+For example:
+
+@example
+%union @{ char *string; @}
+%token <string> STRING1
+%token <string> STRING2
+%type <string> string1
+%type <string> string2
+%union @{ char character; @}
+%token <character> CHR
+%type <character> chr
+%token TAGLESS
+
+%printer @{ fprintf (yyoutput, "'%c'", $$); @} <character>
+%printer @{ fprintf (yyoutput, "&%p", $$); @} <*>
+%printer @{ fprintf (yyoutput, "\"%s\"", $$); @} STRING1 string1
+%printer @{ fprintf (yyoutput, "<>"); @} <>
+@end example
+
+@noindent
+guarantees that, when the parser prints any symbol that has a semantic type
+tag other than @code{<character>}, it displays the address of the semantic
+value by default. However, when the parser displays a @code{STRING1} or a
+@code{string1}, it formats it as a string in double quotes. It performs
+only the second @code{%printer} in this case, so it prints only once.
+Finally, the parser prints @samp{<>} for any symbol, such as @code{TAGLESS},
+that has no semantic type tag. See also @ref{Mfcalc Traces}, for a complete
+example.
+
+
@node Expect Decl
@subsection Suppressing Conflict Warnings
@cindex suppressing conflict warnings
(Reentrant) Parser}.
If you have also used locations, the parser header file declares
-@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of
-the @code{YYSTYPE} macro and @code{yylval}. @xref{Locations,
-,Tracking Locations}.
+@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the
+@code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}.
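+
+For a parser that is not reentrant, this amounts to declarations along the
+following lines (a sketch, not the literal generated text):
+
+@example
+extern YYSTYPE yylval;
+extern YYLTYPE yylloc;
+@end example
+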
This parser header file is normally essential if you wish to put the
definition of @code{yylex} in a separate source file, because
@item Purpose: Specify the namespace for the parser class.
For example, if you specify:
-@smallexample
+@example
%define api.namespace "foo::bar"
-@end smallexample
+@end example
Bison uses @code{foo::bar} verbatim in references such as:
-@smallexample
+@example
foo::bar::parser::semantic_type
-@end smallexample
+@end example
However, to open a namespace, Bison removes any leading @code{::} and then
splits on any remaining occurrences:
-@smallexample
+@example
namespace foo @{ namespace bar @{
class position;
class location;
@} @}
-@end smallexample
+@end example
@item Accepted Values:
Any absolute or relative C++ namespace reference without a trailing
api.namespace} so that @code{%name-prefix} @emph{only} affects the
lexical analyzer function. For example, if you specify:
-@smallexample
+@example
%define api.namespace "foo"
%name-prefix "bar::"
-@end smallexample
+@end example
The parser namespace is @code{foo} and @code{yylex} is referenced as
@code{bar::lex}.
@c ================================================== lex_symbol
-@item variant
+@item lex_symbol
@findex %define lex_symbol
@itemize @bullet
@c ================================================== lr.default-reductions
@item lr.default-reductions
-@cindex default reductions
@findex %define lr.default-reductions
-@cindex delayed syntax errors
-@cindex syntax errors delayed
-@cindex LAC
-@findex %nonassoc
@itemize @bullet
@item Language(s): all
@item Purpose: Specify the kind of states that are permitted to
-contain default reductions.
-That is, in such a state, Bison selects the reduction with the largest
-lookahead set to be the default parser action and then removes that
-lookahead set.
-(The ability to specify where default reductions should be used is
-experimental.
-More user feedback will help to stabilize it.)
-
-@item Accepted Values:
-@itemize
-@item @code{all}.
-This is the traditional Bison behavior. The main advantage is a
-significant decrease in the size of the parser tables. The
-disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-Such delayed syntax error detection is usually inherent in LALR and
-IELR parser tables anyway due to LR state merging (@pxref{%define
-Summary,,lr.type}). Furthermore, the use of @code{%nonassoc} can
-contribute to delayed syntax error detection even in the case of
-canonical LR. As an experimental feature, delayed syntax error
-detection can be overcome in all cases by enabling LAC (@pxref{%define
-Summary,,parse.lac}, for details, including a discussion of the
-effects of delayed syntax error detection).
-
-@item @code{consistent}.
-@cindex consistent states
-A consistent state is a state that has only one possible action.
-If that action is a reduction, then the parser does not need to request
-a lookahead token from the scanner before performing that action.
-However, the parser recognizes the ability to ignore the lookahead token
-in this way only when such a reduction is encoded as a default
-reduction.
-Thus, if default reductions are permitted only in consistent states,
-then a canonical LR parser that does not employ
-@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
-syntactically unacceptable token from the scanner.
-
-@item @code{accepting}.
-@cindex accepting state
-In the accepting state, the default reduction is actually the accept
-action.
-In this case, a canonical LR parser that does not employ
-@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
-syntactically unacceptable token in the input.
-That is, it does not perform any extra reductions.
-@end itemize
+contain default reductions. @xref{Default Reductions}. (The ability to
+specify where default reductions should be used is experimental. More user
+feedback will help to stabilize it.)
+@item Accepted Values: @code{most}, @code{consistent}, @code{accepting}
@item Default Value:
@itemize
@item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
-@item @code{all} otherwise.
+@item @code{most} otherwise.
@end itemize
@end itemize
@itemize @bullet
@item Language(s): all
-
@item Purpose: Request that Bison allow unreachable parser states to
-remain in the parser tables.
-Bison considers a state to be unreachable if there exists no sequence of
-transitions from the start state to that state.
-A state can become unreachable during conflict resolution if Bison disables a
-shift action leading to it from a predecessor state.
-Keeping unreachable states is sometimes useful for analysis purposes, but they
-are useless in the generated parser.
-
+remain in the parser tables. @xref{Unreachable States}.
@item Accepted Values: Boolean
-
@item Default Value: @code{false}
-
-@item Caveats:
-
-@itemize @bullet
-
-@item Unreachable states may contain conflicts and may use rules not used in
-any other state.
-Thus, keeping unreachable states may induce warnings that are irrelevant to
-your parser's behavior, and it may eliminate warnings that are relevant.
-Of course, the change in warnings may actually be relevant to a parser table
-analysis that wants to keep unreachable states, so this behavior will likely
-remain in future Bison releases.
-
-@item While Bison is able to remove unreachable states, it is not guaranteed to
-remove other kinds of useless states.
-Specifically, when Bison disables reduce actions during conflict resolution,
-some goto actions may become useless, and thus some additional states may
-become useless.
-If Bison were to compute which goto actions were useless and then disable those
-actions, it could identify such states as unreachable and then remove those
-states.
-However, Bison does not compute which goto actions are useless.
-@end itemize
@end itemize
@c lr.keep-unreachable-states
@item lr.type
@findex %define lr.type
-@cindex LALR
-@cindex IELR
-@cindex LR
@itemize @bullet
@item Language(s): all
@item Purpose: Specify the type of parser tables within the
-LR(1) family.
-(This feature is experimental.
+LR(1) family. @xref{LR Table Construction}. (This feature is experimental.
More user feedback will help to stabilize it.)
-@item Accepted Values:
-@itemize
-@item @code{lalr}.
-While Bison generates LALR parser tables by default for
-historical reasons, IELR or canonical LR is almost
-always preferable for deterministic parsers.
-The trouble is that LALR parser tables can suffer from
-mysterious conflicts and thus may not accept the full set of sentences
-that IELR and canonical LR accept.
-@xref{Mystery Conflicts}, for details.
-However, there are at least two scenarios where LALR may be
-worthwhile:
-@itemize
-@cindex GLR with LALR
-@item When employing GLR parsers (@pxref{GLR Parsers}), if you
-do not resolve any conflicts statically (for example, with @code{%left}
-or @code{%prec}), then the parser explores all potential parses of any
-given input.
-In this case, the use of LALR parser tables is guaranteed not
-to alter the language accepted by the parser.
-LALR parser tables are the smallest parser tables Bison can
-currently generate, so they may be preferable.
-Nevertheless, once you begin to resolve conflicts statically,
-GLR begins to behave more like a deterministic parser, and so
-IELR and canonical LR can be helpful to avoid
-LALR's mysterious behavior.
-
-@item Occasionally during development, an especially malformed grammar
-with a major recurring flaw may severely impede the IELR or
-canonical LR parser table generation algorithm.
-LALR can be a quick way to generate parser tables in order to
-investigate such problems while ignoring the more subtle differences
-from IELR and canonical LR.
-@end itemize
-
-@item @code{ielr}.
-IELR is a minimal LR algorithm.
-That is, given any grammar (LR or non-LR),
-IELR and canonical LR always accept exactly the same
-set of sentences.
-However, as for LALR, the number of parser states is often an
-order of magnitude less for IELR than for canonical
-LR.
-More importantly, because canonical LR's extra parser states
-may contain duplicate conflicts in the case of non-LR
-grammars, the number of conflicts for IELR is often an order
-of magnitude less as well.
-This can significantly reduce the complexity of developing of a grammar.
-
-@item @code{canonical-lr}.
-@cindex delayed syntax errors
-@cindex syntax errors delayed
-@cindex LAC
-@findex %nonassoc
-While inefficient, canonical LR parser tables can be an interesting
-means to explore a grammar because they have a property that IELR and
-LALR tables do not. That is, if @code{%nonassoc} is not used and
-default reductions are left disabled (@pxref{%define
-Summary,,lr.default-reductions}), then, for every left context of
-every canonical LR state, the set of tokens accepted by that state is
-guaranteed to be the exact set of tokens that is syntactically
-acceptable in that left context. It might then seem that an advantage
-of canonical LR parsers in production is that, under the above
-constraints, they are guaranteed to detect a syntax error as soon as
-possible without performing any unnecessary reductions. However, IELR
-parsers using LAC (@pxref{%define Summary,,parse.lac}) are also able
-to achieve this behavior without sacrificing @code{%nonassoc} or
-default reductions.
-@end itemize
+@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}
@item Default Value: @code{lalr}
@end itemize
Error messages passed to @code{yyerror} are simply @w{@code{"syntax
error"}}.
@item @code{verbose}
-Error messages report the unexpected token, and possibly the expected
-ones.
+Error messages report the unexpected token, and possibly the expected ones.
+However, this report can often be incorrect when LAC is not enabled
+(@pxref{LAC}).
@end itemize
@item Default Value:
@c ================================================== parse.lac
@item parse.lac
@findex %define parse.lac
-@cindex LAC
-@cindex lookahead correction
@itemize
-@item Languages(s): C
+@item Language(s): C (deterministic parsers only)
@item Purpose: Enable LAC (lookahead correction) to improve
-syntax error handling.
-
-Canonical LR, IELR, and LALR can suffer
-from a couple of problems upon encountering a syntax error. First, the
-parser might perform additional parser stack reductions before
-discovering the syntax error. Such reductions perform user semantic
-actions that are unexpected because they are based on an invalid token,
-and they cause error recovery to begin in a different syntactic context
-than the one in which the invalid token was encountered. Second, when
-verbose error messages are enabled (with @code{%error-verbose} or
-@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
-error message can both contain invalid tokens and omit valid tokens.
-
-The culprits for the above problems are @code{%nonassoc}, default
-reductions in inconsistent states, and parser state merging. Thus,
-IELR and LALR suffer the most. Canonical
-LR can suffer only if @code{%nonassoc} is used or if default
-reductions are enabled for inconsistent states.
-
-LAC is a new mechanism within the parsing algorithm that
-completely solves these problems for canonical LR,
-IELR, and LALR without sacrificing @code{%nonassoc},
-default reductions, or state mering. Conceptually, the mechanism is
-straight-forward. Whenever the parser fetches a new token from the
-scanner so that it can determine the next parser action, it immediately
-suspends normal parsing and performs an exploratory parse using a
-temporary copy of the normal parser state stack. During this
-exploratory parse, the parser does not perform user semantic actions.
-If the exploratory parse reaches a shift action, normal parsing then
-resumes on the normal parser stacks. If the exploratory parse reaches
-an error instead, the parser reports a syntax error. If verbose syntax
-error messages are enabled, the parser must then discover the list of
-expected tokens, so it performs a separate exploratory parse for each
-token in the grammar.
-
-There is one subtlety about the use of LAC. That is, when in a
-consistent parser state with a default reduction, the parser will not
-attempt to fetch a token from the scanner because no lookahead is
-needed to determine the next parser action. Thus, whether default
-reductions are enabled in consistent states (@pxref{%define
-Summary,,lr.default-reductions}) affects how soon the parser detects a
-syntax error: when it @emph{reaches} an erroneous token or when it
-eventually @emph{needs} that token as a lookahead. The latter
-behavior is probably more intuitive, so Bison currently provides no
-way to achieve the former behavior while default reductions are fully
-enabled.
-
-Thus, when LAC is in use, for some fixed decision of whether
-to enable default reductions in consistent states, canonical
-LR and IELR behave exactly the same for both
-syntactically acceptable and syntactically unacceptable input. While
-LALR still does not support the full language-recognition
-power of canonical LR and IELR, LAC at
-least enables LALR's syntax error handling to correctly
-reflect LALR's language-recognition power.
-
-Because LAC requires many parse actions to be performed twice,
-it can have a performance penalty. However, not all parse actions must
-be performed twice. Specifically, during a series of default reductions
-in consistent states and shift actions, the parser never has to initiate
-an exploratory parse. Moreover, the most time-consuming tasks in a
-parse are often the file I/O, the lexical analysis performed by the
-scanner, and the user's semantic actions, but none of these are
-performed during the exploratory parse. Finally, the base of the
-temporary stack used during an exploratory parse is a pointer into the
-normal parser state stack so that the stack is never physically copied.
-In our experience, the performance penalty of LAC has proven
-insignificant for practical grammars.
-
+syntax error handling. @xref{LAC}.
@item Accepted Values: @code{none}, @code{full}
-
@item Default Value: @code{none}
@end itemize
@c parse.lac
Not all qualifiers are accepted for all target languages. Unaccepted
qualifiers produce an error. Some of the accepted qualifiers are:
-@itemize @bullet
+@table @code
@item requires
@findex %code requires
occasionally it is necessary to insert code much nearer the top of the
parser implementation file. For example:
-@smallexample
+@example
%code top @{
#define _GNU_SOURCE
#include <stdio.h>
@}
-@end smallexample
+@end example
@item Location(s): Near the top of the parser implementation file.
@end itemize
@item Location(s): The parser Java file after any Java package directive and
before any class definitions.
@end itemize
-@end itemize
+@end table
Though we say the insertion locations are language-dependent, they are
technically skeleton-dependent. Writers of non-standard skeletons
@code{token_buffer}, and assuming that the token does not contain any
characters like @samp{"} that require escaping.
-@smallexample
+@example
for (i = 0; i < YYNTOKENS; i++)
@{
if (yytname[i] != 0
&& yytname[i][strlen (token_buffer) + 2] == 0)
break;
@}
-@end smallexample
+@end example
The @code{yytname} table is generated only if you use the
@code{%token-table} declaration. @xref{Decl Summary}.
@subsection Textual Locations of Tokens
@vindex yylloc
-If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
-Tracking Locations}) in actions to keep track of the textual locations
-of tokens and groupings, then you must provide this information in
-@code{yylex}. The function @code{yyparse} expects to find the textual
-location of a token just parsed in the global variable @code{yylloc}.
-So @code{yylex} must store the proper data in that variable.
+If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations})
+in actions to keep track of the textual locations of tokens and groupings,
+then you must provide this information in @code{yylex}. The function
+@code{yyparse} expects to find the textual location of a token just parsed
+in the global variable @code{yylloc}. So @code{yylex} must store the proper
+data in that variable.
By default, the value of @code{yylloc} is a structure and you need only
initialize the members that are going to be used by the actions. The
@w{@code{"syntax error"}}.
@findex %define parse.error
-If you invoke @samp{%define parse.error verbose} in the Bison
-declarations section (@pxref{Bison Declarations, ,The Bison Declarations
-Section}), then Bison provides a more verbose and specific error message
-string instead of just plain @w{@code{"syntax error"}}.
+If you invoke @samp{%define parse.error verbose} in the Bison declarations
+section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then
+Bison provides a more verbose and specific error message string instead of
+just plain @w{@code{"syntax error"}}. However, that message sometimes
+contains incorrect information if LAC is not enabled (@pxref{LAC}).
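+
+For example, a grammar that wants verbose messages without the
+inaccuracies just described might combine this directive with LAC (a
+minimal sketch; LAC is described in @ref{LAC}):
+
+@example
+%define parse.error verbose  /* Report the unexpected and expected tokens.  */
+%define parse.lac full       /* Make the expected token list reliable.  */
+@end example
+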
The parser can detect one other kind of error: memory exhaustion. This
can happen when the input contains constructions that are very deeply
@deffn {Value} @@$
@findex @@$
-Acts like a structure variable containing information on the textual location
-of the grouping made by the current rule. @xref{Locations, ,
-Tracking Locations}.
+Acts like a structure variable containing information on the textual
+location of the grouping made by the current rule. @xref{Tracking
+Locations}.
@c Check if those paragraphs are still useful or not.
@deffn {Value} @@@var{n}
@findex @@@var{n}
-Acts like a structure variable containing information on the textual location
-of the @var{n}th component of the current rule. @xref{Locations, ,
-Tracking Locations}.
+Acts like a structure variable containing information on the textual
+location of the @var{n}th component of the current rule. @xref{Tracking
+Locations}.
@end deffn
@node Internationalization
* Contextual Precedence:: When an operator's precedence depends on context.
* Parser States:: The parser is a finite-state-machine with stack.
* Reduce/Reduce:: When two rules are applicable in the same situation.
-* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Mysterious Conflicts:: Conflicts that look unjustified.
+* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
* Memory Management:: What happens when memory is exhausted. How to avoid it.
@end menu
@example
@group
-expr: term '+' expr
- | term
- ;
+expr:
+ term '+' expr
+| term
+;
@end group
@group
-term: '(' expr ')'
- | term '!'
- | NUMBER
- ;
+term:
+ '(' expr ')'
+| term '!'
+| NUMBER
+;
@end group
@end example
@example
@group
if_stmt:
- IF expr THEN stmt
- | IF expr THEN stmt ELSE stmt
- ;
+ IF expr THEN stmt
+| IF expr THEN stmt ELSE stmt
+;
@end group
@end example
%%
@end group
@group
-stmt: expr
- | if_stmt
- ;
+stmt:
+ expr
+| if_stmt
+;
@end group
@group
if_stmt:
- IF expr THEN stmt
- | IF expr THEN stmt ELSE stmt
- ;
+ IF expr THEN stmt
+| IF expr THEN stmt ELSE stmt
+;
@end group
-expr: variable
- ;
+expr:
+ variable
+;
@end example
@node Precedence
@example
@group
-expr: expr '-' expr
- | expr '*' expr
- | expr '<' expr
- | '(' expr ')'
- @dots{}
- ;
+expr:
+ expr '-' expr
+| expr '*' expr
+| expr '<' expr
+| '(' expr ')'
+@dots{}
+;
@end group
@end example
@example
@group
-exp: @dots{}
- | exp '-' exp
- @dots{}
- | '-' exp %prec UMINUS
+exp:
+ @dots{}
+| exp '-' exp
+ @dots{}
+| '-' exp %prec UMINUS
@end group
@end example
of zero or more @code{word} groupings.
@example
-sequence: /* empty */
- @{ printf ("empty sequence\n"); @}
- | maybeword
- | sequence word
- @{ printf ("added word %s\n", $2); @}
- ;
+@group
+sequence:
+ /* empty */ @{ printf ("empty sequence\n"); @}
+| maybeword
+| sequence word @{ printf ("added word %s\n", $2); @}
+;
+@end group
-maybeword: /* empty */
- @{ printf ("empty maybeword\n"); @}
- | word
- @{ printf ("single word %s\n", $1); @}
- ;
+@group
+maybeword:
+ /* empty */ @{ printf ("empty maybeword\n"); @}
+| word @{ printf ("single word %s\n", $1); @}
+;
+@end group
@end example
@noindent
proper way to define @code{sequence}:
@example
-sequence: /* empty */
- @{ printf ("empty sequence\n"); @}
- | sequence word
- @{ printf ("added word %s\n", $2); @}
- ;
+sequence:
+ /* empty */ @{ printf ("empty sequence\n"); @}
+| sequence word @{ printf ("added word %s\n", $2); @}
+;
@end example
Here is another common error that yields a reduce/reduce conflict:
@example
-sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
+sequence:
+ /* empty */
+| sequence words
+| sequence redirects
+;
-words: /* empty */
- | words word
- ;
+words:
+ /* empty */
+| words word
+;
-redirects:/* empty */
- | redirects redirect
- ;
+redirects:
+ /* empty */
+| redirects redirect
+;
@end example
@noindent
of sequence:
@example
-sequence: /* empty */
- | sequence word
- | sequence redirect
- ;
+sequence:
+ /* empty */
+| sequence word
+| sequence redirect
+;
@end example
Second, to prevent either a @code{words} or a @code{redirects}
from being empty:
@example
-sequence: /* empty */
- | sequence words
- | sequence redirects
- ;
+@group
+sequence:
+ /* empty */
+| sequence words
+| sequence redirects
+;
+@end group
-words: word
- | words word
- ;
+@group
+words:
+ word
+| words word
+;
+@end group
-redirects:redirect
- | redirects redirect
- ;
+@group
+redirects:
+ redirect
+| redirects redirect
+;
+@end group
@end example
-@node Mystery Conflicts
-@section Mysterious Reduce/Reduce Conflicts
+@node Mysterious Conflicts
+@section Mysterious Conflicts
+@cindex Mysterious Conflicts
Sometimes reduce/reduce conflicts can occur that don't look warranted.
Here is an example:
%token ID
%%
-def: param_spec return_spec ','
- ;
+def: param_spec return_spec ',';
param_spec:
- type
- | name_list ':' type
- ;
+ type
+| name_list ':' type
+;
@end group
@group
return_spec:
- type
- | name ':' type
- ;
+ type
+| name ':' type
+;
@end group
@group
-type: ID
- ;
+type: ID;
@end group
@group
-name: ID
- ;
+name: ID;
name_list:
- name
- | name ',' name_list
- ;
+ name
+| name ',' name_list
+;
@end group
@end example
a @code{name} if a comma or colon follows, or a @code{type} if another
@code{ID} follows. In other words, this grammar is LR(1).
-@cindex LR(1)
-@cindex LALR(1)
+@cindex LR
+@cindex LALR
However, for historical reasons, Bison cannot by default handle all
LR(1) grammars.
In this grammar, two contexts, that after an @code{ID} at the beginning
the two contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not LALR(1).
-For many practical grammars (specifically those that fall into the
-non-LR(1) class), the limitations of LALR(1) result in difficulties
-beyond just mysterious reduce/reduce conflicts. The best way to fix
-all these problems is to select a different parser table generation
-algorithm. Either IELR(1) or canonical LR(1) would suffice, but the
-former is more efficient and easier to debug during development.
-@xref{%define Summary,,lr.type}, for details. (Bison's IELR(1) and
-canonical LR(1) implementations are experimental. More user feedback
-will help to stabilize them.)
+@cindex IELR
+@cindex canonical LR
+For many practical grammars (specifically those that fall into the non-LR(1)
+class), the limitations of LALR(1) result in difficulties beyond just
+mysterious reduce/reduce conflicts. The best way to fix all these problems
+is to select a different parser table construction algorithm. Either
+IELR(1) or canonical LR(1) would suffice, but the former is more efficient
+and easier to debug during development. @xref{LR Table Construction}, for
+details. (Bison's IELR(1) and canonical LR(1) implementations are
+experimental. More user feedback will help to stabilize them.)
If you instead wish to work around LALR(1)'s limitations, you
can often fix a mysterious conflict by identifying the two parser states
%%
@dots{}
return_spec:
- type
- | name ':' type
- /* This rule is never used. */
- | ID BOGUS
- ;
+ type
+| name ':' type
+| ID BOGUS /* This rule is never used. */
+;
@end group
@end example
As long as the token @code{BOGUS} is never generated by @code{yylex},
the added rule cannot alter the way actual input is parsed.
-In this particular example, there is another way to solve the problem:
-rewrite the rule for @code{return_spec} to use @code{ID} directly
-instead of via @code{name}. This also causes the two confusing
-contexts to have different sets of active rules, because the one for
-@code{return_spec} activates the altered rule for @code{return_spec}
-rather than the one for @code{name}.
+In this particular example, there is another way to solve the problem:
+rewrite the rule for @code{return_spec} to use @code{ID} directly
+instead of via @code{name}. This also causes the two confusing
+contexts to have different sets of active rules, because the one for
+@code{return_spec} activates the altered rule for @code{return_spec}
+rather than the one for @code{name}.
+
+@example
+param_spec:
+ type
+| name_list ':' type
+;
+return_spec:
+ type
+| ID ':' type
+;
+@end example
+
+For a more detailed exposition of LALR(1) parsers and parser
+generators, @pxref{Bibliography,,DeRemer 1982}.
+
+@node Tuning LR
+@section Tuning LR
+
+The default behavior of Bison's LR-based parsers is chosen mostly for
+historical reasons, but that behavior is often not robust. For example, in
+the previous section, we discussed the mysterious conflicts that can be
+produced by LALR(1), Bison's default parser table construction algorithm.
+Another example is Bison's @code{%define parse.error verbose} directive,
+which instructs the generated parser to produce verbose syntax error
+messages, which can sometimes contain incorrect information.
+
+In this section, we explore several modern features of Bison that allow you
+to tune fundamental aspects of the generated LR-based parsers. Some of
+these features easily eliminate shortcomings like those mentioned above.
+Others can be helpful purely for understanding your parser.
+
+Most of the features discussed in this section are still experimental. More
+user feedback will help to stabilize them.
+
+@menu
+* LR Table Construction:: Choose a different construction algorithm.
+* Default Reductions:: Disable default reductions.
+* LAC:: Correct lookahead sets in the parser states.
+* Unreachable States:: Keep unreachable parser states for debugging.
+@end menu
+
+@node LR Table Construction
+@subsection LR Table Construction
+@cindex Mysterious Conflict
+@cindex LALR
+@cindex IELR
+@cindex canonical LR
+@findex %define lr.type
+
+For historical reasons, Bison constructs LALR(1) parser tables by default.
+However, LALR does not possess the full language-recognition power of LR.
+As a result, the behavior of parsers employing LALR parser tables is often
+mysterious. We presented a simple example of this effect in @ref{Mysterious
+Conflicts}.
+
+As we also demonstrated in that example, the traditional approach to
+eliminating such mysterious behavior is to restructure the grammar.
+Unfortunately, doing so correctly is often difficult. Moreover, merely
+discovering that LALR causes mysterious behavior in your parser can be
+difficult as well.
+
+Fortunately, Bison provides an easy way to eliminate the possibility of such
+mysterious behavior altogether. You simply need to activate a more powerful
+parser table construction algorithm by using the @code{%define lr.type}
+directive.
+
+@deffn {Directive} {%define lr.type @var{TYPE}}
+Specify the type of parser tables within the LR(1) family. The accepted
+values for @var{TYPE} are:
+
+@itemize
+@item @code{lalr} (default)
+@item @code{ielr}
+@item @code{canonical-lr}
+@end itemize
+
+(This feature is experimental. More user feedback will help to stabilize
+it.)
+@end deffn
+
+For example, to activate IELR, you might add the following directive to your
+grammar file:
+
+@example
+%define lr.type ielr
+@end example
+
+@noindent For the example in @ref{Mysterious Conflicts}, the mysterious
+conflict is then eliminated, so there is no need to invest time in
+comprehending the conflict or restructuring the grammar to fix it. If,
+during future development, the grammar evolves such that all mysterious
+behavior would have disappeared using just LALR, you need not fear that
+continuing to use IELR will result in unnecessarily large parser tables.
+That is, IELR generates LALR tables when LALR (using a deterministic parsing
+algorithm) is sufficient to support the full language-recognition power of
+LR. Thus, by enabling IELR at the start of grammar development, you can
+safely and completely eliminate the need to consider LALR's shortcomings.
+
+While IELR is almost always preferable, there are circumstances where LALR
+or the canonical LR parser tables described by Knuth
+(@pxref{Bibliography,,Knuth 1965}) can be useful. Here we summarize the
+relative advantages of each parser table construction algorithm within
+Bison:
+
+@itemize
+@item LALR
+
+There are at least two scenarios where LALR can be worthwhile:
+
+@itemize
+@item GLR without static conflict resolution.
+
+@cindex GLR with LALR
+When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
+conflicts statically (for example, with @code{%left} or @code{%prec}), then
+the parser explores all potential parses of any given input. In this case,
+the choice of parser table construction algorithm is guaranteed not to alter
+the language accepted by the parser. LALR parser tables are the smallest
+parser tables Bison can currently construct, so they may then be preferable.
+Nevertheless, once you begin to resolve conflicts statically, GLR behaves
+more like a deterministic parser in the syntactic contexts where those
+conflicts appear, and so either IELR or canonical LR can then be helpful to
+avoid LALR's mysterious behavior.
+
+@item Malformed grammars.
+
+Occasionally during development, an especially malformed grammar with a
+major recurring flaw may severely impede the IELR or canonical LR parser
+table construction algorithm. LALR can be a quick way to construct parser
+tables in order to investigate such problems while ignoring the more subtle
+differences from IELR and canonical LR.
+@end itemize
+
+@item IELR
+
+IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given
+any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
+always accept exactly the same set of sentences. However, like LALR, IELR
+merges parser states during parser table construction so that the number of
+parser states is often an order of magnitude less than for canonical LR.
+More importantly, because canonical LR's extra parser states may contain
+duplicate conflicts in the case of non-LR grammars, the number of conflicts
+for IELR is often an order of magnitude less as well. This effect can
+significantly reduce the complexity of developing a grammar.
+
+@item Canonical LR
+
+@cindex delayed syntax error detection
+@cindex LAC
+@findex %nonassoc
+While inefficient, canonical LR parser tables can be an interesting means to
+explore a grammar because they possess a property that IELR and LALR tables
+do not. That is, if @code{%nonassoc} is not used and default reductions are
+left disabled (@pxref{Default Reductions}), then, for every left context of
+every canonical LR state, the set of tokens accepted by that state is
+guaranteed to be the exact set of tokens that is syntactically acceptable in
+that left context. It might then seem that an advantage of canonical LR
+parsers in production is that, under the above constraints, they are
+guaranteed to detect a syntax error as soon as possible without performing
+any unnecessary reductions. However, IELR parsers that use LAC are also
+able to achieve this behavior without sacrificing @code{%nonassoc} or
+default reductions. For details and a few caveats of LAC, @pxref{LAC}.
+@end itemize
+
+For a more detailed exposition of the mysterious behavior in LALR parsers
+and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and
+@ref{Bibliography,,Denny 2010 November}.
+
+@node Default Reductions
+@subsection Default Reductions
+@cindex default reductions
+@findex %define lr.default-reductions
+@findex %nonassoc
+
+After parser table construction, Bison identifies the reduction with the
+largest lookahead set in each parser state. To reduce the size of the
+parser state, traditional Bison behavior is to remove that lookahead set and
+to assign that reduction to be the default parser action. Such a reduction
+is known as a @dfn{default reduction}.
+
+Default reductions affect more than the size of the parser tables. They
+also affect the behavior of the parser:
+
+@itemize
+@item Delayed @code{yylex} invocations.
+
+@cindex delayed yylex invocations
+@cindex consistent states
+@cindex defaulted states
+A @dfn{consistent state} is a state that has only one possible parser
+action. If that action is a reduction and is encoded as a default
+reduction, then that consistent state is called a @dfn{defaulted state}.
+Upon reaching a defaulted state, a Bison-generated parser does not bother to
+invoke @code{yylex} to fetch the next token before performing the reduction.
+In other words, whether default reductions are enabled in consistent states
+determines how soon a Bison-generated parser invokes @code{yylex} for a
+token: immediately when it @emph{reaches} that token in the input or when it
+eventually @emph{needs} that token as a lookahead to determine the next
+parser action. Traditionally, default reductions are enabled, and so the
+parser exhibits the latter behavior.
+
+The presence of defaulted states is an important consideration when
+designing @code{yylex} and the grammar file. That is, if the behavior of
+@code{yylex} can influence or be influenced by the semantic actions
+associated with the reductions in defaulted states, then the delay of the
+next @code{yylex} invocation until after those reductions is significant.
+For example, the semantic actions might pop a scope stack that @code{yylex}
+uses to determine what token to return. Thus, the delay might be necessary
+to ensure that @code{yylex} does not look up the next token in a scope that
+should already be considered closed.
+
+@item Delayed syntax error detection.
+
+@cindex delayed syntax error detection
+When the parser fetches a new token by invoking @code{yylex}, it checks
+whether there is an action for that token in the current parser state. The
+parser detects a syntax error if and only if either (1) there is no action
+for that token or (2) the action for that token is the error action (due to
+the use of @code{%nonassoc}). However, if there is a default reduction in
+that state (which might or might not be a defaulted state), then it is
+impossible for condition 1 to exist. That is, all tokens have an action.
+Thus, the parser sometimes fails to detect the syntax error until it reaches
+a later state.
+
+@cindex LAC
+@c If there's an infinite loop, default reductions can prevent an incorrect
+@c sentence from being rejected.
+While default reductions never cause the parser to accept syntactically
+incorrect sentences, the delay of syntax error detection can have unexpected
+effects on the behavior of the parser. However, the delay can be caused
+anyway by parser state merging and the use of @code{%nonassoc}, and it can
+be fixed by another Bison feature, LAC. We discuss the effects of delayed
+syntax error detection and LAC more in the next section (@pxref{LAC}).
+@end itemize
+
+For canonical LR, the only default reduction that Bison enables by default
+is the accept action, which appears only in the accepting state, which has
+no other action and is thus a defaulted state. However, the default accept
+action does not delay any @code{yylex} invocation or syntax error detection
+because the accept action ends the parse.
+
+For LALR and IELR, Bison enables default reductions in nearly all states by
+default. There are only two exceptions. First, states that have a shift
+action on the @code{error} token do not have default reductions because
+delayed syntax error detection could then prevent the @code{error} token
+from ever being shifted in that state. However, parser state merging can
+cause the same effect anyway, and LAC fixes it in both cases, so future
+versions of Bison might drop this exception when LAC is activated. Second,
+GLR parsers do not record the default reduction as the action on a lookahead
+token for which there is a conflict. The correct action in this case is to
+split the parse instead.
+
+To adjust which states have default reductions enabled, use the
+@code{%define lr.default-reductions} directive.
+
+@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
+Specify the kind of states that are permitted to contain default reductions.
+The accepted values of @var{WHERE} are:
+@itemize
+@item @code{most} (default for LALR and IELR)
+@item @code{consistent}
+@item @code{accepting} (default for canonical LR)
+@end itemize
+
+(The ability to specify where default reductions are permitted is
+experimental. More user feedback will help to stabilize it.)
+@end deffn
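+
+For example, to permit default reductions only in consistent states, you
+might write:
+
+@example
+%define lr.default-reductions consistent
+@end example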
+
+@node LAC
+@subsection LAC
+@findex %define parse.lac
+@cindex LAC
+@cindex lookahead correction
+
+Canonical LR, IELR, and LALR can suffer from a couple of problems upon
+encountering a syntax error. First, the parser might perform additional
+parser stack reductions before discovering the syntax error. Such
+reductions can perform user semantic actions that are unexpected because
+they are based on an invalid token, and they cause error recovery to begin
+in a different syntactic context than the one in which the invalid token was
+encountered. Second, when verbose error messages are enabled (@pxref{Error
+Reporting}), the expected token list in the syntax error message can both
+contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are @code{%nonassoc}, default reductions
+in inconsistent states (@pxref{Default Reductions}), and parser state
+merging. Because IELR and LALR merge parser states, they suffer the most.
+Canonical LR can suffer only if @code{%nonassoc} is used or if default
+reductions are enabled for inconsistent states.
+
+LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
+that solves these problems for canonical LR, IELR, and LALR without
+sacrificing @code{%nonassoc}, default reductions, or state merging. You can
+enable LAC with the @code{%define parse.lac} directive.
+
+@deffn {Directive} {%define parse.lac @var{VALUE}}
+Enable LAC to improve syntax error handling.
+@itemize
+@item @code{none} (default)
+@item @code{full}
+@end itemize
+(This feature is experimental. More user feedback will help to stabilize
+it. Moreover, it is currently only available for deterministic parsers in
+C.)
+@end deffn
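+
+For example, to enable LAC in a deterministic C parser, you might write:
+
+@example
+%define parse.lac full
+@end example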
+
+Conceptually, the LAC mechanism is straightforward. Whenever the parser
+fetches a new token from the scanner so that it can determine the next
+parser action, it immediately suspends normal parsing and performs an
+exploratory parse using a temporary copy of the normal parser state stack.
+During this exploratory parse, the parser does not perform user semantic
+actions. If the exploratory parse reaches a shift action, normal parsing
+then resumes on the normal parser stacks. If the exploratory parse reaches
+an error instead, the parser reports a syntax error. If verbose syntax
+error messages are enabled, the parser must then discover the list of
+expected tokens, so it performs a separate exploratory parse for each token
+in the grammar.
+
+There is one subtlety about the use of LAC. That is, when in a consistent
+parser state with a default reduction, the parser will not attempt to fetch
+a token from the scanner because no lookahead is needed to determine the
+next parser action. Thus, whether default reductions are enabled in
+consistent states (@pxref{Default Reductions}) affects how soon the parser
+detects a syntax error: immediately when it @emph{reaches} an erroneous
+token or when it eventually @emph{needs} that token as a lookahead to
+determine the next parser action. The latter behavior is probably more
+intuitive, so Bison currently provides no way to achieve the former behavior
+while default reductions are enabled in consistent states.
+
+Thus, when LAC is in use, for some fixed decision of whether to enable
+default reductions in consistent states, canonical LR and IELR behave almost
+exactly the same for both syntactically acceptable and syntactically
+unacceptable input. While LALR still does not support the full
+language-recognition power of canonical LR and IELR, LAC at least enables
+LALR's syntax error handling to correctly reflect LALR's
+language-recognition power.
+
+There are a few caveats to consider when using LAC:
+
+@itemize
+@item Infinite parsing loops.
+
+IELR plus LAC does have one shortcoming relative to canonical LR. Some
+parsers generated by Bison can loop infinitely. LAC does not fix infinite
+parsing loops that occur between encountering a syntax error and detecting
+it, but enabling canonical LR or disabling default reductions sometimes
+does.
+
+@item Verbose error message limitations.
+
+Because of internationalization considerations, Bison-generated parsers
+limit the size of the expected token list they are willing to report in a
+verbose syntax error message. If the number of expected tokens exceeds that
+limit, the list is simply dropped from the message. Enabling LAC can
+increase the size of the list and thus cause the parser to drop it. Of
+course, dropping the list is better than reporting an incorrect list.
+
+@item Performance.
+
+Because LAC requires many parse actions to be performed twice, it can have a
+performance penalty. However, not all parse actions must be performed
+twice. Specifically, during a series of default reductions in consistent
+states and shift actions, the parser never has to initiate an exploratory
+parse. Moreover, the most time-consuming tasks in a parse are often the
+file I/O, the lexical analysis performed by the scanner, and the user's
+semantic actions, but none of these are performed during the exploratory
+parse. Finally, the base of the temporary stack used during an exploratory
+parse is a pointer into the normal parser state stack so that the stack is
+never physically copied. In our experience, the performance penalty of LAC
+has proved insignificant for practical grammars.
+@end itemize
+
+While the LAC algorithm shares techniques that have been recognized in the
+parser community for years, for the publication that introduces LAC,
+@pxref{Bibliography,,Denny 2010 May}.
+
+@node Unreachable States
+@subsection Unreachable States
+@findex %define lr.keep-unreachable-states
+@cindex unreachable states
+
+If there exists no sequence of transitions from the parser's start state to
+some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
+state}. A state can become unreachable during conflict resolution if Bison
+disables a shift action leading to it from a predecessor state.
-@example
-param_spec:
- type
- | name_list ':' type
- ;
-return_spec:
- type
- | ID ':' type
- ;
-@end example
+By default, Bison removes unreachable states from the parser after conflict
+resolution because they are useless in the generated parser. However,
+keeping unreachable states is sometimes useful when trying to understand the
+relationship between the parser and the grammar.
-For a more detailed exposition of LALR(1) parsers and parser
-generators, please see:
-Frank DeRemer and Thomas Pennello, Efficient Computation of
-LALR(1) Look-Ahead Sets, @cite{ACM Transactions on
-Programming Languages and Systems}, Vol.@: 4, No.@: 4 (October 1982),
-pp.@: 615--649 @uref{http://doi.acm.org/10.1145/69622.357187}.
+@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
+Request that Bison allow unreachable states to remain in the parser tables.
+@var{VALUE} must be a Boolean. The default is @code{false}.
+@end deffn
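+
+For example, to keep unreachable states for inspection in the automaton
+report (@pxref{Understanding}), you might write:
+
+@example
+%define lr.keep-unreachable-states true
+@end example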
+
+There are a few caveats to consider:
+
+@itemize @bullet
+@item Missing or extraneous warnings.
+
+Unreachable states may contain conflicts and may use rules not used in any
+other state. Thus, keeping unreachable states may induce warnings that are
+irrelevant to your parser's behavior, and it may eliminate warnings that are
+relevant. Of course, the change in warnings may actually be relevant to a
+parser table analysis that wants to keep unreachable states, so this
+behavior will likely remain in future Bison releases.
+
+@item Other useless states.
+
+While Bison is able to remove unreachable states, it is not guaranteed to
+remove other kinds of useless states. Specifically, when Bison disables
+reduce actions during conflict resolution, some goto actions may become
+useless, and thus some additional states may become useless. If Bison were
+to compute which goto actions were useless and then disable those actions,
+it could identify such states as unreachable and then remove those states.
+However, Bison does not compute which goto actions are useless.
+@end itemize
@node Generalized LR Parsing
@section Generalized LR (GLR) Parsing
The same is true of languages that require more than one symbol of
lookahead, since the parser lacks the information necessary to make a
decision at the point it must be made in a shift-reduce parser.
-Finally, as previously mentioned (@pxref{Mystery Conflicts}),
+Finally, as previously mentioned (@pxref{Mysterious Conflicts}),
there are languages where Bison's default choice of how to
summarize the input seen so far loses necessary information.
grammar, in particular, it is only slightly slower than with the
deterministic LR(1) Bison parser.
-For a more detailed exposition of GLR parsers, please see: Elizabeth
-Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style
-Generalised LR Parsers, Royal Holloway, University of
-London, Department of Computer Science, TR-00-12,
-@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps},
-(2000-12-24).
+For a more detailed exposition of GLR parsers, @pxref{Bibliography,,Scott
+2000}.
@node Memory Management
@section Memory Management, and How to Avoid Memory Exhaustion
Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
You can generate a deterministic parser containing C++ user code from
-the default (C) skeleton, as well as from the C++ skeleton
+the default (C) skeleton, as well as from the C++ skeleton
(@pxref{C++ Parsers}). However, if you do use the default skeleton
and want to allow the parsing stack to grow,
be careful not to use semantic types or location types that require
For example:
@example
-stmnts: /* empty string */
- | stmnts '\n'
- | stmnts exp '\n'
- | stmnts error '\n'
+stmts:
+ /* empty string */
+| stmts '\n'
+| stmts exp '\n'
+| stmts error '\n'
@end example
The fourth rule in this example says that an error followed by a newline
-makes a valid addition to any @code{stmnts}.
+makes a valid addition to any @code{stmts}.
What happens if a syntax error occurs in the middle of an @code{exp}? The
error recovery rule, interpreted strictly, applies to the precise sequence
-of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
+of a @code{stmts}, an @code{error} and a newline. If an error occurs in
the middle of an @code{exp}, there will probably be some additional tokens
-and subexpressions on the stack after the last @code{stmnts}, and there
+and subexpressions on the stack after the last @code{stmts}, and there
will be tokens to read before the next newline. So the rule is not
applicable in the ordinary way.
the semantic context and part of the input. First it discards states
and objects from the stack until it gets back to a state in which the
@code{error} token is acceptable. (This means that the subexpressions
-already parsed are discarded, back to the last complete @code{stmnts}.)
+already parsed are discarded, back to the last complete @code{stmts}.)
At this point the @code{error} token can be shifted. Then, if the old
lookahead token is not acceptable to be shifted next, the parser reads
tokens and discards them until it finds a token which is acceptable. In
the current input line or current statement if an error is detected:
@example
-stmnt: error ';' /* On error, skip until ';' is read. */
+stmt: error ';' /* On error, skip until ';' is read. */
@end example
It is also useful to recover to the matching close-delimiter of an
spurious error message:
@example
-primary: '(' expr ')'
- | '(' error ')'
- @dots{}
- ;
+primary:
+ '(' expr ')'
+| '(' error ')'
+@dots{}
+;
@end example
Error recovery strategies are necessarily guesses. When they guess wrong,
one syntax error often leads to another. In the above example, the error
recovery rule guesses that an error is due to bad input within one
-@code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
-middle of a valid @code{stmnt}. After the error recovery rule recovers
+@code{stmt}. Suppose that instead a spurious semicolon is inserted in the
+middle of a valid @code{stmt}. After the error recovery rule recovers
from the first error, another syntax error will be found straightaway,
since the text following the spurious semicolon is also an invalid
-@code{stmnt}.
+@code{stmt}.
To prevent an outpouring of error messages, the parser will output no error
message for another syntax error that happens shortly after the first; only
@example
typedef int foo, bar;
int baz (void)
+@group
@{
static bar (bar); /* @r{redeclare @code{bar} as static variable} */
extern foo foo (foo); /* @r{redeclare @code{foo} as function} */
return foo (bar);
@}
+@end group
@end example
Unfortunately, the name being declared is separated from the declaration
duplication, with actions omitted for brevity:
@example
+@group
initdcl:
- declarator maybeasm '='
- init
- | declarator maybeasm
- ;
+ declarator maybeasm '=' init
+| declarator maybeasm
+;
+@end group
+@group
notype_initdcl:
- notype_declarator maybeasm '='
- init
- | notype_declarator maybeasm
- ;
+ notype_declarator maybeasm '=' init
+| notype_declarator maybeasm
+;
+@end group
@end example
@noindent
@dots{}
@end group
@group
-expr: IDENTIFIER
- | constant
- | HEX '('
- @{ hexflag = 1; @}
- expr ')'
- @{ hexflag = 0;
- $$ = $4; @}
- | expr '+' expr
- @{ $$ = make_sum ($1, $3); @}
- @dots{}
- ;
+expr:
+ IDENTIFIER
+| constant
+| HEX '(' @{ hexflag = 1; @}
+ expr ')' @{ hexflag = 0; $$ = $4; @}
+| expr '+' expr @{ $$ = make_sum ($1, $3); @}
+@dots{}
+;
@end group
@group
constant:
- INTEGER
- | STRING
- ;
+ INTEGER
+| STRING
+;
@end group
@end example
tokens until the next semicolon, and then start a new statement, like this:
@example
-stmt: expr ';'
- | IF '(' expr ')' stmt @{ @dots{} @}
- @dots{}
- error ';'
- @{ hexflag = 0; @}
- ;
+stmt:
+ expr ';'
+| IF '(' expr ')' stmt @{ @dots{} @}
+@dots{}
+| error ';' @{ hexflag = 0; @}
+;
@end example
If there is a syntax error in the middle of a @samp{hex (@var{expr})}
@example
@group
-expr: @dots{}
- | '(' expr ')'
- @{ $$ = $2; @}
- | '(' error ')'
- @dots{}
+expr:
+ @dots{}
+| '(' expr ')' @{ $$ = $2; @}
+| '(' error ')'
+@dots{}
@end group
@end example
@node Debugging
@chapter Debugging Your Parser
-Developing a parser can be a challenge, especially if you don't
-understand the algorithm (@pxref{Algorithm, ,The Bison Parser
-Algorithm}). Even so, sometimes a detailed description of the automaton
-can help (@pxref{Understanding, , Understanding Your Parser}), or
-tracing the execution of the parser can give some insight on why it
-behaves improperly (@pxref{Tracing, , Tracing Your Parser}).
+Developing a parser can be a challenge, especially if you don't understand
+the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This
+chapter explains how to generate and read the detailed description of the
+automaton, and how to enable and understand the parser run-time traces.
@menu
* Understanding:: Understanding the structure of your parser.
representation of it, either textually or graphically (as a DOT file).
The textual file is generated when the options @option{--report} or
-@option{--verbose} are specified, see @xref{Invocation, , Invoking
+@option{--verbose} are specified, see @ref{Invocation, , Invoking
Bison}. Its name is made by removing @samp{.tab.c} or @samp{.c} from
the parser implementation file name, and adding @samp{.output}
instead. Therefore, if the grammar file is @file{foo.y}, then the
%left '+' '-'
%left '*'
%%
-exp: exp '+' exp
- | exp '-' exp
- | exp '*' exp
- | exp '/' exp
- | NUM
- ;
+exp:
+ exp '+' exp
+| exp '-' exp
+| exp '*' exp
+| exp '/' exp
+| NUM
+;
useless: STR;
%%
@end example
order of the output and the exact presentation might vary, but the
interpretation is the same.
-The first section includes details on conflicts that were solved thanks
-to precedence and/or associativity:
-
-@example
-Conflict in state 8 between rule 2 and token '+' resolved as reduce.
-Conflict in state 8 between rule 2 and token '-' resolved as reduce.
-Conflict in state 8 between rule 2 and token '*' resolved as shift.
-@exdent @dots{}
-@end example
-
-@noindent
-The next section lists states that still have conflicts.
-
-@example
-State 8 conflicts: 1 shift/reduce
-State 9 conflicts: 1 shift/reduce
-State 10 conflicts: 1 shift/reduce
-State 11 conflicts: 4 shift/reduce
-@end example
-
@noindent
@cindex token, useless
@cindex useless token
@cindex useless nonterminal
@cindex rule, useless
@cindex useless rule
-The next section reports useless tokens, nonterminal and rules. Useless
-nonterminals and rules are removed in order to produce a smaller parser,
-but useless tokens are preserved, since they might be used by the
-scanner (note the difference between ``useless'' and ``unused''
-below):
+The first section reports useless tokens, nonterminals and rules. Useless
+nonterminals and rules are removed in order to produce a smaller parser, but
+useless tokens are preserved, since they might be used by the scanner (note
+the difference between ``useless'' and ``unused'' below):
@example
-Nonterminals useless in grammar:
+Nonterminals useless in grammar
useless
-Terminals unused in grammar:
+Terminals unused in grammar
STR
-Rules useless in grammar:
-#6 useless: STR;
+Rules useless in grammar
+ 6 useless: STR
+@end example
+
+@noindent
+The next section lists states that still have conflicts.
+
+@example
+State 8 conflicts: 1 shift/reduce
+State 9 conflicts: 1 shift/reduce
+State 10 conflicts: 1 shift/reduce
+State 11 conflicts: 4 shift/reduce
@end example
@noindent
-The next section reproduces the exact grammar that Bison used:
+Then Bison reproduces the exact grammar it used:
@example
Grammar
- Number, Line, Rule
- 0 5 $accept -> exp $end
- 1 5 exp -> exp '+' exp
- 2 6 exp -> exp '-' exp
- 3 7 exp -> exp '*' exp
- 4 8 exp -> exp '/' exp
- 5 9 exp -> NUM
+ 0 $accept: exp $end
+
+ 1 exp: exp '+' exp
+ 2 | exp '-' exp
+ 3 | exp '*' exp
+ 4 | exp '/' exp
+ 5 | NUM
@end example
@noindent
and reports the uses of the symbols:
@example
+@group
Terminals, with rules where they appear
$end (0) 0
'/' (47) 4
error (256)
NUM (258) 5
+STR (259)
+@end group
+@group
Nonterminals, with rules where they appear
-$accept (8)
+$accept (9)
on left: 0
-exp (9)
+exp (10)
on left: 1 2 3 4 5, on right: 0 1 2 3 4
+@end group
@end example
@noindent
@cindex pointed rule
@cindex rule, pointed
Bison then proceeds onto the automaton itself, describing each state
-with it set of @dfn{items}, also known as @dfn{pointed rules}. Each
-item is a production rule together with a point (marked by @samp{.})
-that the input cursor.
+with its set of @dfn{items}, also known as @dfn{pointed rules}. Each
+item is a production rule together with a point (@samp{.}) marking
+the location of the input cursor.
@example
state 0
- $accept -> . exp $ (rule 0)
+ 0 $accept: . exp $end
- NUM shift, and go to state 1
+ NUM shift, and go to state 1
- exp go to state 2
+ exp go to state 2
@end example
This reads as follows: ``state 0 corresponds to being at the very
symbol (here, @code{exp}). When the parser returns to this state right
after having reduced a rule that produced an @code{exp}, the control
flow jumps to state 2. If there is no such transition on a nonterminal
-symbol, and the lookahead is a @code{NUM}, then this token is shifted on
+symbol, and the lookahead is a @code{NUM}, then this token is shifted onto
the parse stack, and the control flow jumps to state 1. Any other
lookahead triggers a syntax error.''
at the beginning of any rule deriving an @code{exp}. By default Bison
reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
you want to see more detail you can invoke @command{bison} with
-@option{--report=itemset} to list all the items, include those that can
-be derived:
+@option{--report=itemset} to list the derived items as well:
@example
state 0
- $accept -> . exp $ (rule 0)
- exp -> . exp '+' exp (rule 1)
- exp -> . exp '-' exp (rule 2)
- exp -> . exp '*' exp (rule 3)
- exp -> . exp '/' exp (rule 4)
- exp -> . NUM (rule 5)
+ 0 $accept: . exp $end
+ 1 exp: . exp '+' exp
+ 2 | . exp '-' exp
+ 3 | . exp '*' exp
+ 4 | . exp '/' exp
+ 5 | . NUM
- NUM shift, and go to state 1
+ NUM shift, and go to state 1
- exp go to state 2
+ exp go to state 2
@end example
@noindent
-In the state 1...
+In state 1@dots{}
@example
state 1
- exp -> NUM . (rule 5)
+ 5 exp: NUM .
- $default reduce using rule 5 (exp)
+ $default reduce using rule 5 (exp)
@end example
@noindent
@example
state 2
- $accept -> exp . $ (rule 0)
- exp -> exp . '+' exp (rule 1)
- exp -> exp . '-' exp (rule 2)
- exp -> exp . '*' exp (rule 3)
- exp -> exp . '/' exp (rule 4)
+ 0 $accept: exp . $end
+ 1 exp: exp . '+' exp
+ 2 | exp . '-' exp
+ 3 | exp . '*' exp
+ 4 | exp . '/' exp
- $ shift, and go to state 3
- '+' shift, and go to state 4
- '-' shift, and go to state 5
- '*' shift, and go to state 6
- '/' shift, and go to state 7
+ $end shift, and go to state 3
+ '+' shift, and go to state 4
+ '-' shift, and go to state 5
+ '*' shift, and go to state 6
+ '/' shift, and go to state 7
@end example
@noindent
In state 2, the automaton can only shift a symbol. For instance,
-because of the item @samp{exp -> exp . '+' exp}, if the lookahead if
-@samp{+}, it will be shifted on the parse stack, and the automaton
-control will jump to state 4, corresponding to the item @samp{exp -> exp
-'+' . exp}. Since there is no default action, any other token than
-those listed above will trigger a syntax error.
+because of the item @samp{exp: exp . '+' exp}, if the lookahead is
+@samp{+} it is shifted onto the parse stack, and the automaton
+jumps to state 4, corresponding to the item @samp{exp: exp '+' . exp}.
+Since there is no default action, any lookahead not listed triggers a syntax
+error.
@cindex accepting state
State 3 is named the @dfn{final state}, or the @dfn{accepting
@example
state 3
- $accept -> exp $ . (rule 0)
+ 0 $accept: exp $end .
- $default accept
+ $default accept
@end example
@noindent
-the initial rule is completed (the start symbol and the end
-of input were read), the parsing exits successfully.
+the initial rule is completed (the start symbol and the end-of-input were
+read), the parsing exits successfully.
The interpretation of states 4 to 7 is straightforward, and is left to
the reader.
@example
state 4
- exp -> exp '+' . exp (rule 1)
+ 1 exp: exp '+' . exp
- NUM shift, and go to state 1
+ NUM shift, and go to state 1
+
+ exp go to state 8
- exp go to state 8
state 5
- exp -> exp '-' . exp (rule 2)
+ 2 exp: exp '-' . exp
+
+ NUM shift, and go to state 1
- NUM shift, and go to state 1
+ exp go to state 9
- exp go to state 9
state 6
- exp -> exp '*' . exp (rule 3)
+ 3 exp: exp '*' . exp
- NUM shift, and go to state 1
+ NUM shift, and go to state 1
+
+ exp go to state 10
- exp go to state 10
state 7
- exp -> exp '/' . exp (rule 4)
+ 4 exp: exp '/' . exp
- NUM shift, and go to state 1
+ NUM shift, and go to state 1
- exp go to state 11
+ exp go to state 11
@end example
As was announced in the beginning of the report, @samp{State 8 conflicts:
@example
state 8
- exp -> exp . '+' exp (rule 1)
- exp -> exp '+' exp . (rule 1)
- exp -> exp . '-' exp (rule 2)
- exp -> exp . '*' exp (rule 3)
- exp -> exp . '/' exp (rule 4)
+ 1 exp: exp . '+' exp
+ 1 | exp '+' exp .
+ 2 | exp . '-' exp
+ 3 | exp . '*' exp
+ 4 | exp . '/' exp
- '*' shift, and go to state 6
- '/' shift, and go to state 7
+ '*' shift, and go to state 6
+ '/' shift, and go to state 7
- '/' [reduce using rule 1 (exp)]
- $default reduce using rule 1 (exp)
+ '/' [reduce using rule 1 (exp)]
+ $default reduce using rule 1 (exp)
@end example
Indeed, there are two actions associated with the lookahead @samp{/}:
Because in deterministic parsing a single decision can be made, Bison
arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, ,
-Shift/Reduce Conflicts}. Discarded actions are reported in between
+Shift/Reduce Conflicts}. Discarded actions are reported between
square brackets.
Note that all the previous states had a single possible action: either
@example
state 8
- exp -> exp . '+' exp (rule 1)
- exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1)
- exp -> exp . '-' exp (rule 2)
- exp -> exp . '*' exp (rule 3)
- exp -> exp . '/' exp (rule 4)
+ 1 exp: exp . '+' exp
+ 1 | exp '+' exp . [$end, '+', '-', '/']
+ 2 | exp . '-' exp
+ 3 | exp . '*' exp
+ 4 | exp . '/' exp
+
+ '*' shift, and go to state 6
+ '/' shift, and go to state 7
+
+ '/' [reduce using rule 1 (exp)]
+ $default reduce using rule 1 (exp)
+@end example
- '*' shift, and go to state 6
- '/' shift, and go to state 7
+Note however that while @samp{NUM + NUM / NUM} is ambiguous (which results in
+the conflicts on @samp{/}), @samp{NUM + NUM * NUM} is not: the conflict was
+solved thanks to associativity and precedence directives. If invoked with
+@option{--report=solved}, Bison includes information about the solved
+conflicts in the report:
- '/' [reduce using rule 1 (exp)]
- $default reduce using rule 1 (exp)
+@example
+Conflict between rule 1 and token '+' resolved as reduce (%left '+').
+Conflict between rule 1 and token '-' resolved as reduce (%left '-').
+Conflict between rule 1 and token '*' resolved as shift ('+' < '*').
@end example
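+
+These resolutions stem from precedence declarations of the following shape
+in the example grammar (a reminder sketch; @samp{/} is deliberately given
+no precedence here, which is why its conflicts remain unresolved):
+
+@example
+%left '+' '-'
+%left '*'
+@end example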
+
The remaining states are similar:
@example
+@group
state 9
- exp -> exp . '+' exp (rule 1)
- exp -> exp . '-' exp (rule 2)
- exp -> exp '-' exp . (rule 2)
- exp -> exp . '*' exp (rule 3)
- exp -> exp . '/' exp (rule 4)
+ 1 exp: exp . '+' exp
+ 2 | exp . '-' exp
+ 2 | exp '-' exp .
+ 3 | exp . '*' exp
+ 4 | exp . '/' exp
- '*' shift, and go to state 6
- '/' shift, and go to state 7
+ '*' shift, and go to state 6
+ '/' shift, and go to state 7
- '/' [reduce using rule 2 (exp)]
- $default reduce using rule 2 (exp)
+ '/' [reduce using rule 2 (exp)]
+ $default reduce using rule 2 (exp)
+@end group
+@group
state 10
- exp -> exp . '+' exp (rule 1)
- exp -> exp . '-' exp (rule 2)
- exp -> exp . '*' exp (rule 3)
- exp -> exp '*' exp . (rule 3)
- exp -> exp . '/' exp (rule 4)
+ 1 exp: exp . '+' exp
+ 2 | exp . '-' exp
+ 3 | exp . '*' exp
+ 3 | exp '*' exp .
+ 4 | exp . '/' exp
- '/' shift, and go to state 7
+ '/' shift, and go to state 7
- '/' [reduce using rule 3 (exp)]
- $default reduce using rule 3 (exp)
+ '/' [reduce using rule 3 (exp)]
+ $default reduce using rule 3 (exp)
+@end group
+@group
state 11
- exp -> exp . '+' exp (rule 1)
- exp -> exp . '-' exp (rule 2)
- exp -> exp . '*' exp (rule 3)
- exp -> exp . '/' exp (rule 4)
- exp -> exp '/' exp . (rule 4)
-
- '+' shift, and go to state 4
- '-' shift, and go to state 5
- '*' shift, and go to state 6
- '/' shift, and go to state 7
-
- '+' [reduce using rule 4 (exp)]
- '-' [reduce using rule 4 (exp)]
- '*' [reduce using rule 4 (exp)]
- '/' [reduce using rule 4 (exp)]
- $default reduce using rule 4 (exp)
+ 1 exp: exp . '+' exp
+ 2 | exp . '-' exp
+ 3 | exp . '*' exp
+ 4 | exp . '/' exp
+ 4 | exp '/' exp .
+
+ '+' shift, and go to state 4
+ '-' shift, and go to state 5
+ '*' shift, and go to state 6
+ '/' shift, and go to state 7
+
+ '+' [reduce using rule 4 (exp)]
+ '-' [reduce using rule 4 (exp)]
+ '*' [reduce using rule 4 (exp)]
+ '/' [reduce using rule 4 (exp)]
+ $default reduce using rule 4 (exp)
+@end group
@end example
@noindent
@cindex debugging
@cindex tracing the parser
-If a Bison grammar compiles properly but doesn't do what you want when it
-runs, the @code{yydebug} parser-trace feature can help you figure out why.
+When a Bison grammar compiles properly but parses ``incorrectly'', the
+@code{yydebug} parser-trace feature helps you figure out why.
+
+@menu
+* Enabling Traces:: Activating run-time trace support
+* Mfcalc Traces:: Extending @code{mfcalc} to support traces
+* The YYPRINT Macro:: Obsolete interface for semantic value reports
+@end menu
+@node Enabling Traces
+@subsection Enabling Traces
There are several means to enable compilation of trace facilities:
@table @asis
We suggest that you always enable the trace option so that debugging is
always possible.
+@findex YYFPRINTF
The trace facility outputs messages with macro calls of the form
@code{YYFPRINTF (stderr, @var{format}, @var{args})} where
@var{format} and @var{args} are the usual @code{printf} format and variadic
of the state stack afterward.
@end itemize
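+
+With the @file{yacc.c} skeleton, @code{YYFPRINTF} receives its default
+definition (@code{fprintf}) only when it is not already defined, so the
+traces can be redirected by defining it in the prologue. A minimal sketch,
+assuming C99 variadic macros; @code{yytrace_file} is an illustrative global,
+not part of Bison's API, and must be opened before @code{yyparse} is called:
+
+@example
+%@{
+#include <stdio.h>
+static FILE *yytrace_file;
+/* Ignore the stream Bison passes and send the traces to our own file.  */
+#define YYFPRINTF(Stream, ...) fprintf (yytrace_file, __VA_ARGS__)
+%@}
+@end example
+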
-To make sense of this information, it helps to refer to the listing file
-produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking
-Bison}). This file shows the meaning of each state in terms of
+To make sense of this information, it helps to refer to the automaton
+description file (@pxref{Understanding, ,Understanding Your Parser}).
+This file shows the meaning of each state in terms of
positions in various rules, and also what each state will do with each
possible input token. As you read the successive trace messages, you
can see that the parser is functioning according to its specification in
something undesirable happens, and you will see which parts of the
grammar are to blame.
-The parser implementation file is a C program and you can use C
+The parser implementation file is a C/C++/Java program and you can use
debuggers on it, but it's not easy to interpret what it is doing. The
parser function is a finite-state machine interpreter, and aside from
the actions it executes the same code over and over. Only the values
of variables show where in the grammar it is working.
+@node Mfcalc Traces
+@subsection Enabling Debug Traces for @code{mfcalc}
+
+The debugging information normally gives the token type of each token read,
+but not its semantic value. The @code{%printer} directive lets you specify
+how semantic values are reported; see @ref{Printer Decl, , Printing
+Semantic Values}. For backward compatibility, Yacc-like C parsers may also
+use the @code{YYPRINT} macro (@pxref{The YYPRINT Macro, , The @code{YYPRINT}
+Macro}), but its use is discouraged.
+
+As a demonstration of @code{%printer}, consider the multi-function
+calculator, @code{mfcalc} (@pxref{Multi-function Calc}). To enable run-time
+traces and semantic value reports, insert the following directives in its
+prologue:
+
+@comment file: mfcalc.y: 2
+@example
+/* Generate the parser description file. */
+%verbose
+/* Enable run-time traces (yydebug). */
+%define parse.trace
+
+/* Formatting semantic values. */
+%printer @{ fprintf (yyoutput, "%s", $$->name); @} VAR;
+%printer @{ fprintf (yyoutput, "%s()", $$->name); @} FNCT;
+%printer @{ fprintf (yyoutput, "%g", $$); @} <val>;
+@end example
+
+The @code{%define} directive instructs Bison to generate run-time trace
+support. Then, activation of these traces is controlled at run-time by the
+@code{yydebug} variable, which is disabled by default. Because these traces
+will refer to the ``states'' of the parser, it is helpful to ask for the
+creation of a description of that parser; this is the purpose of the
+(admittedly ill-named) @code{%verbose} directive.
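+
+For instance, the traces can be turned on when the user passes @option{-p}
+on the command line. The following is a sketch only: the actual controlling
+function of @code{mfcalc} is given in @ref{Mfcalc Main}, and the
+calculator's own initialization is omitted here.
+
+@example
+/* In the epilogue of the grammar file; assumes <string.h> for strcmp.  */
+int
+main (int argc, char const *argv[])
+@{
+  int i;
+  /* Enable parse traces on option -p.  */
+  for (i = 1; i < argc; ++i)
+    if (!strcmp (argv[i], "-p"))
+      yydebug = 1;
+  return yyparse ();
+@}
+@end example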
+
+The set of @code{%printer} directives demonstrates how to format the
+semantic value in the traces. Note that the specification can be done
+either on the symbol type (e.g., @code{VAR} or @code{FNCT}), or on the type
+tag: since @code{<val>} is the type for both @code{NUM} and @code{exp}, this
+printer will be used for them.
+
+Here is a sample of the information provided by run-time traces. The traces
+are sent to standard error.
+
+@example
+$ @kbd{echo 'sin(1-1)' | ./mfcalc -p}
+Starting parse
+Entering state 0
+Reducing stack by rule 1 (line 34):
+-> $$ = nterm input ()
+Stack now 0
+Entering state 1
+@end example
+
+@noindent
+This first batch shows a specific feature of this grammar: the first rule
+(which is in line 34 of @file{mfcalc.y}) can be reduced without even having
+to look for the first token. The resulting left-hand symbol (@code{$$}) is
+a valueless (@samp{()}) @code{input} nonterminal (@code{nterm}).
+
+Then the parser calls the scanner.
+@example
+Reading a token: Next token is token FNCT (sin())
+Shifting token FNCT (sin())
+Entering state 6
+@end example
+
+@noindent
+That token (@code{token}) is a function (@code{FNCT}) whose value is
+@samp{sin} as formatted per our @code{%printer} specification: @samp{sin()}.
+The parser stores (@code{Shifting}) that token, and others, until it can do
+something about it.
+
+@example
+Reading a token: Next token is token '(' ()
+Shifting token '(' ()
+Entering state 14
+Reading a token: Next token is token NUM (1.000000)
+Shifting token NUM (1.000000)
+Entering state 4
+Reducing stack by rule 6 (line 44):
+ $1 = token NUM (1.000000)
+-> $$ = nterm exp (1.000000)
+Stack now 0 1 6 14
+Entering state 24
+@end example
+
+@noindent
+The previous reduction demonstrates the @code{%printer} directive for
+@code{<val>}: both the token @code{NUM} and the resulting nonterminal
+@code{exp} have @samp{1} as value.
+
+@example
+Reading a token: Next token is token '-' ()
+Shifting token '-' ()
+Entering state 17
+Reading a token: Next token is token NUM (1.000000)
+Shifting token NUM (1.000000)
+Entering state 4
+Reducing stack by rule 6 (line 44):
+ $1 = token NUM (1.000000)
+-> $$ = nterm exp (1.000000)
+Stack now 0 1 6 14 24 17
+Entering state 26
+Reading a token: Next token is token ')' ()
+Reducing stack by rule 11 (line 49):
+ $1 = nterm exp (1.000000)
+ $2 = token '-' ()
+ $3 = nterm exp (1.000000)
+-> $$ = nterm exp (0.000000)
+Stack now 0 1 6 14
+Entering state 24
+@end example
+
+@noindent
+The rule for the subtraction was just reduced. The parser is about to
+discover the end of the call to @code{sin}.
+
+@example
+Next token is token ')' ()
+Shifting token ')' ()
+Entering state 31
+Reducing stack by rule 9 (line 47):
+ $1 = token FNCT (sin())
+ $2 = token '(' ()
+ $3 = nterm exp (0.000000)
+ $4 = token ')' ()
+-> $$ = nterm exp (0.000000)
+Stack now 0 1
+Entering state 11
+@end example
+
+@noindent
+Finally, the end-of-line allows the parser to complete the computation and
+display its result.
+
+@example
+Reading a token: Next token is token '\n' ()
+Shifting token '\n' ()
+Entering state 22
+Reducing stack by rule 4 (line 40):
+ $1 = nterm exp (0.000000)
+ $2 = token '\n' ()
+@result{} 0
+-> $$ = nterm line ()
+Stack now 0 1
+Entering state 10
+Reducing stack by rule 2 (line 35):
+ $1 = nterm input ()
+ $2 = nterm line ()
+-> $$ = nterm input ()
+Stack now 0
+Entering state 1
+@end example
+
+The parser has returned to state 1, in which it is waiting for the next
+expression to evaluate, or for the end-of-file token, which completes the
+parsing.
+
+@example
+Reading a token: Now at end of input.
+Shifting token $end ()
+Entering state 2
+Stack now 0 1 2
+Cleanup: popping token $end ()
+Cleanup: popping nterm input ()
+@end example
+
+
+@node The YYPRINT Macro
+@subsection The @code{YYPRINT} Macro
+
@findex YYPRINT
-The debugging information normally gives the token type of each token
-read, but not its semantic value. You can optionally define a macro
-named @code{YYPRINT} to provide a way to print the value. If you define
-@code{YYPRINT}, it should take three arguments. The parser will pass a
-standard I/O stream, the numeric code for the token type, and the token
-value (from @code{yylval}).
+Before @code{%printer} support, semantic values could be displayed using the
+@code{YYPRINT} macro, which works only for terminal symbols and only with
+the @file{yacc.c} skeleton.
+
+@deffn {Macro} YYPRINT (@var{stream}, @var{token}, @var{value});
+@findex YYPRINT
+If you define @code{YYPRINT}, it should take three arguments. The parser
+will pass a standard I/O stream, the numeric code for the token type, and
+the token value (from @code{yylval}).
+
+For @file{yacc.c} only. Obsoleted by @code{%printer}.
+@end deffn
Here is an example of @code{YYPRINT} suitable for the multi-function
calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}):
-@smallexample
+@example
%@{
static void print_token_value (FILE *, int, YYSTYPE);
- #define YYPRINT(file, type, value) print_token_value (file, type, value)
+ #define YYPRINT(File, Type, Value) \
+ print_token_value (File, Type, Value)
%@}
@dots{} %% @dots{} %% @dots{}
else if (type == NUM)
fprintf (file, "%d", value.val);
@}
-@end smallexample
+@end example
@c ================================================= Invoking Bison
For example, warn about unset @code{$$} in the mid-rule action in:
@example
- exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @};
+exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @};
@end example
These warnings are not enabled by default since they sometimes prove to
be false alarms in existing grammars employing the Yacc constructs
@code{$0} or @code{$-@var{n}} (where @var{n} is some positive integer).
-
@item yacc
Incompatibilities with POSIX Yacc.
+@item conflicts-sr
+@itemx conflicts-rr
+S/R and R/R conflicts. These warnings are enabled by default. However, if
+the @code{%expect} or @code{%expect-rr} directive is specified, an
+unexpected number of conflicts is an error, and an expected number of
+conflicts is not reported, so @option{-W} and @option{--warning} then have
+no effect on the conflict report.
+
+@item other
+All warnings not categorized above. These warnings are enabled by default.
+
+This category is provided merely for the sake of completeness. Future
+releases of Bison may move warnings from this category to new, more specific
+categories.
+
@item all
All the warnings.
@item none
@c - %define filename_type "const symbol::Symbol"
When the directive @code{%locations} is used, the C++ parser supports
-location tracking, see @ref{Locations, , Locations Overview}. Two
-auxiliary classes define a @code{position}, a single point in a file,
-and a @code{location}, a range composed of a pair of
-@code{position}s (possibly spanning several files).
+location tracking, see @ref{Tracking Locations}. Two auxiliary classes
+define a @code{position}, a single point in a file, and a @code{location}, a
+range composed of a pair of @code{position}s (possibly spanning several
+files).
+
+@tindex uint
+In this section @code{uint} is an abbreviation for @code{unsigned int}: only
+the latter is used in the actual code.
+
+@menu
+* C++ position:: One point in the source file
+* C++ location:: Two points in the source file
+@end menu
+
+@node C++ position
+@subsubsection C++ @code{position}
+
+@deftypeop {Constructor} {position} {} position (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1)
+Create a @code{position} denoting a given point. Note that @code{file} is
+not reclaimed when the @code{position} is destroyed: memory management must
+be handled elsewhere.
+@end deftypeop
+
+@deftypemethod {position} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1)
+Reset the position to the given values.
+@end deftypemethod
-@deftypemethod {position} {std::string*} file
+@deftypeivar {position} {std::string*} file
The name of the file. It will always be handled as a pointer, the
parser will never duplicate nor deallocate it. As an experimental
feature you may change it to @samp{@var{type}*} using @samp{%define
filename_type "@var{type}"}.
-@end deftypemethod
+@end deftypeivar
-@deftypemethod {position} {unsigned int} line
+@deftypeivar {position} {uint} line
The line, starting at 1.
-@end deftypemethod
+@end deftypeivar
-@deftypemethod {position} {unsigned int} lines (int @var{height} = 1)
+@deftypemethod {position} {uint} lines (int @var{height} = 1)
Advance by @var{height} lines, resetting the column number.
@end deftypemethod
-@deftypemethod {position} {unsigned int} column
-The column, starting at 0.
-@end deftypemethod
+@deftypeivar {position} {uint} column
+The column, starting at 1.
+@end deftypeivar
-@deftypemethod {position} {unsigned int} columns (int @var{width} = 1)
+@deftypemethod {position} {uint} columns (int @var{width} = 1)
Advance by @var{width} columns, without changing the line number.
@end deftypemethod
-@deftypemethod {position} {position&} operator+= (position& @var{pos}, int @var{width})
-@deftypemethodx {position} {position} operator+ (const position& @var{pos}, int @var{width})
-@deftypemethodx {position} {position&} operator-= (const position& @var{pos}, int @var{width})
-@deftypemethodx {position} {position} operator- (position& @var{pos}, int @var{width})
+@deftypemethod {position} {position&} operator+= (int @var{width})
+@deftypemethodx {position} {position} operator+ (int @var{width})
+@deftypemethodx {position} {position&} operator-= (int @var{width})
+@deftypemethodx {position} {position} operator- (int @var{width})
Various forms of syntactic sugar for @code{columns}.
@end deftypemethod
-@deftypemethod {position} {position} operator<< (std::ostream @var{o}, const position& @var{p})
+@deftypemethod {position} {bool} operator== (const position& @var{that})
+@deftypemethodx {position} {bool} operator!= (const position& @var{that})
+Whether @code{*this} and @code{that} denote equal/different positions.
+@end deftypemethod
+
+@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p})
Report @var{p} on @var{o} like this:
@samp{@var{file}:@var{line}.@var{column}}, or
@samp{@var{line}.@var{column}} if @var{file} is null.
+@end deftypefun
+
+@node C++ location
+@subsubsection C++ @code{location}
+
+@deftypeop {Constructor} {location} {} location (const position& @var{begin}, const position& @var{end})
+Create a @code{location} from the endpoints of the range.
+@end deftypeop
+
+@deftypeop {Constructor} {location} {} location (const position& @var{pos} = position())
+@deftypeopx {Constructor} {location} {} location (std::string* @var{file}, uint @var{line}, uint @var{col})
+Create a @code{location} denoting an empty range located at a given point.
+@end deftypeop
+
+@deftypemethod {location} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1)
+Reset the location to an empty range at the given values.
@end deftypemethod
-@deftypemethod {location} {position} begin
-@deftypemethodx {location} {position} end
+@deftypeivar {location} {position} begin
+@deftypeivarx {location} {position} end
The first position of the range (inclusive), and the first position beyond it.
-@end deftypemethod
+@end deftypeivar
-@deftypemethod {location} {unsigned int} columns (int @var{width} = 1)
-@deftypemethodx {location} {unsigned int} lines (int @var{height} = 1)
+@deftypemethod {location} {uint} columns (int @var{width} = 1)
+@deftypemethodx {location} {uint} lines (int @var{height} = 1)
Advance the @code{end} position.
@end deftypemethod
-@deftypemethod {location} {location} operator+ (const location& @var{begin}, const location& @var{end})
-@deftypemethodx {location} {location} operator+ (const location& @var{begin}, int @var{width})
-@deftypemethodx {location} {location} operator+= (const location& @var{loc}, int @var{width})
+@deftypemethod {location} {location} operator+ (const location& @var{end})
+@deftypemethodx {location} {location} operator+ (int @var{width})
+@deftypemethodx {location} {location} operator+= (int @var{width})
Various forms of syntactic sugar.
@end deftypemethod
Move @code{begin} onto @code{end}.
@end deftypemethod
+@deftypemethod {location} {bool} operator== (const location& @var{that})
+@deftypemethodx {location} {bool} operator!= (const location& @var{that})
+Whether @code{*this} and @code{that} denote equal/different ranges of
+positions.
+@end deftypemethod
+
+@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const location& @var{p})
+Report @var{p} on @var{o}, handling special cases such as no file name
+being defined, or the two positions sharing the same file, line, or column.
+@end deftypefun
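+
+As an illustration of these classes, here is a minimal sketch (assuming the
+default @code{yy} namespace, and that @file{location.hh}, @code{<iostream>}
+and @code{<string>} are included; the file name and width are arbitrary):
+
+@example
+std::string file = "input.txt";
+yy::position begin (&file, 1, 1);
+yy::location loc (begin);
+// Extend the range over a three-character token: end is now at column 4.
+loc.columns (3);
+std::cerr << loc << ": unexpected token" << std::endl;
+@end example
+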
@node C++ Parser Interface
@subsection C++ Parser Interface
@end defcv
@defcv {Type} {parser} {token}
-A structure that contains (only) the definition of the tokens as the
-@code{yytokentype} enumeration. To refer to the token @code{FOO}, the
-scanner should use @code{yy::parser::token::FOO}. The scanner can use
+A structure that contains (only) the @code{yytokentype} enumeration, which
+defines the tokens. To refer to the token @code{FOO},
+use @code{yy::parser::token::FOO}. The scanner can use
@samp{typedef yy::parser::token token;} to ``import'' the token enumeration
(@pxref{Calc++ Scanner}).
@end defcv
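+
+For instance, a hand-written scanner that does not use complete symbols
+might contain the following (a sketch; @code{NUMBER} is an illustrative
+token name):
+
+@example
+typedef yy::parser::token token;
+@dots{}
+// In yylex, once a numeric literal has been recognized:
+return token::NUMBER;
+@end example
+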
@defcv {Type} {parser} {syntax_error}
This class derives from @code{std::runtime_error}. Throw instances of it
-from user actions to raise parse errors. This is equivalent with first
+from the scanner or from the user actions to raise parse errors. This is
+equivalent to first
invoking @code{error} to report the location and message of the syntax
error, and then to invoke @code{YYERROR} to enter the error-recovery mode.
But contrary to @code{YYERROR} which can only be invoked from user actions
@comment file: calc++-parser.yy
@example
-%skeleton "lalr1.cc" /* -*- C++ -*- */
+%skeleton "lalr1.cc" /* -*- C++ -*- */
%require "@value{VERSION}"
%defines
%define parser_class_name "calcxx_parser"
@end example
@noindent
-Use the following two directives to enable parser tracing and verbose
-error messages.
+Use the following two directives to enable parser tracing and verbose error
+messages. However, verbose error messages can contain incorrect information
+(@pxref{LAC}).
@comment file: calc++-parser.yy
@example
unit: assignments exp @{ driver.result = $2; @};
assignments:
- assignments assignment @{@}
-| /* Nothing. */ @{@};
+ /* Nothing. */ @{@}
+| assignments assignment @{@};
assignment:
"identifier" ":=" exp @{ driver.variables[$1] = $3; @};
@comment file: calc++-scanner.ll
@example
-%@{ /* -*- C++ -*- */
+%@{ /* -*- C++ -*- */
# include <cerrno>
# include <climits>
# include <cstdlib>
@comment file: calc++-scanner.ll
@example
+@group
%@{
// Code run each time a pattern is matched.
# define YY_USER_ACTION loc.columns (yyleng);
%@}
+@end group
%%
+@group
%@{
// Code run each time yylex is called.
loc.step ();
%@}
+@end group
@{blank@}+ loc.step ();
[\n]+ loc.lines (yyleng); loc.step ();
@end example
")" return yy::calcxx_parser::make_RPAREN(loc);
":=" return yy::calcxx_parser::make_ASSIGN(loc);
+@group
@{int@} @{
errno = 0;
long n = strtol (yytext, NULL, 10);
driver.error (loc, "integer is out of range");
return yy::calcxx_parser::make_NUMBER(n, loc);
@}
+@end group
@{id@} return yy::calcxx_parser::make_IDENTIFIER(yytext, loc);
. driver.error (loc, "invalid character");
<<EOF>> return yy::calcxx_parser::make_END(loc);
@comment file: calc++-scanner.ll
@example
+@group
void
calcxx_driver::scan_begin ()
@{
yy_flex_debug = trace_scanning;
- if (file == "-")
+ if (file.empty () || file == "-")
yyin = stdin;
else if (!(yyin = fopen (file.c_str (), "r")))
@{
- error (std::string ("cannot open ") + file + ": " + strerror(errno));
- exit (1);
+ error ("cannot open " + file + ": " + strerror(errno));
+ exit (EXIT_FAILURE);
@}
@}
+@end group
+@group
void
calcxx_driver::scan_end ()
@{
fclose (yyin);
@}
+@end group
@end example
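+
+For context, these two hooks bracket the actual parser run. A sketch of
+what the driver's @code{parse} member typically does (member names follow
+the driver class used throughout this example):
+
+@example
+int
+calcxx_driver::parse (const std::string &f)
+@{
+  file = f;
+  scan_begin ();
+  yy::calcxx_parser parser (*this);
+  parser.set_debug_level (trace_parsing);
+  int res = parser.parse ();
+  scan_end ();
+  return res;
+@}
+@end example
+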
@node Calc++ Top Level
#include <iostream>
#include "calc++-driver.hh"
+@group
int
main (int argc, char *argv[])
@{
int res = 0;
calcxx_driver driver;
- for (++argv; argv[0]; ++argv)
- if (*argv == std::string ("-p"))
+ for (int i = 1; i < argc; ++i)
+ if (argv[i] == std::string ("-p"))
driver.trace_parsing = true;
- else if (*argv == std::string ("-s"))
+ else if (argv[i] == std::string ("-s"))
driver.trace_scanning = true;
- else if (!driver.parse (*argv))
+ else if (!driver.parse (argv[i]))
std::cout << driver.result << std::endl;
else
res = 1;
return res;
@}
+@end group
@end example
@node Java Parsers
@c - class Position
@c - class Location
-When the directive @code{%locations} is used, the Java parser
-supports location tracking, see @ref{Locations, , Locations Overview}.
-An auxiliary user-defined class defines a @dfn{position}, a single point
-in a file; Bison itself defines a class representing a @dfn{location},
-a range composed of a pair of positions (possibly spanning several
-files). The location class is an inner class of the parser; the name
-is @code{Location} by default, and may also be renamed using
-@samp{%define location_type "@var{class-name}"}.
+When the directive @code{%locations} is used, the Java parser supports
+location tracking, see @ref{Tracking Locations}. An auxiliary user-defined
+class defines a @dfn{position}, a single point in a file; Bison itself
+defines a class representing a @dfn{location}, a range composed of a pair of
+positions (possibly spanning several files). The location class is an inner
+class of the parser; the name is @code{Location} by default, and may also be
+renamed using @samp{%define location_type "@var{class-name}"}.
The location class treats the position as a completely opaque value.
By default, the class name is @code{Position}, but this can be changed
Use @code{%code init} for code added to the start of the constructor
body. This is especially useful to initialize superclasses. Use
-@samp{%define init_throws} to specify any uncatch exceptions.
+@samp{%define init_throws} to specify any uncaught exceptions.
@end deftypeop
@deftypemethod {YYParser} {boolean} parse ()
appear in an action. The actual definition of these symbols is
opaque to the Bison grammar, and it might change in the future. The
only meaningful operation that you can do, is to return them.
-See @pxref{Java Action Features}.
+@xref{Java Action Features}.
Note that of these three symbols, only @code{YYACCEPT} and
@code{YYABORT} will cause a return from the @code{yyparse}
a union. The type of @code{$$}, even with angle brackets, is the base
type since Java casts are not allowed on the left-hand side of assignments.
Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the
-left-hand side of assignments. See @pxref{Java Semantic Values} and
-@pxref{Java Action Features}.
+left-hand side of assignments. @xref{Java Semantic Values} and
+@ref{Java Action Features}.
@item
The prologue declarations have a different meaning than in C/C++ code.
@item @code{%code lexer}
blocks, if specified, should include the implementation of the
scanner. If there is no such block, the scanner can be any class
-that implements the appropriate interface (see @pxref{Java Scanner
+that implements the appropriate interface (@pxref{Java Scanner
Interface}).
@end table
@node Memory Exhausted
@section Memory Exhausted
-@display
+@quotation
My parser returns with error with a @samp{memory exhausted}
message. What can I do?
-@end display
+@end quotation
This question is already addressed elsewhere, @xref{Recursion,
,Recursive Rules}.
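+
+In short, the usual culprit is right recursion used for long sequences: left
+recursion keeps the parser stack bounded, whereas right recursion makes it
+grow with the length of the sequence. A sketch (@code{seq} and @code{item}
+are illustrative):
+
+@example
+/* Left recursive: bounded stack usage.  */
+seq: /* Nothing.  */ | seq item;
+
+/* Right recursive: the stack grows with the input.  */
+seq: /* Nothing.  */ | item seq;
+@end example
+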
The following phenomenon has several symptoms, resulting in the
following typical questions:
-@display
+@quotation
I invoke @code{yyparse} several times, and on correct input it works
properly; but when a parse error is found, all the other calls fail
too. How can I reset the error flag of @code{yyparse}?
-@end display
+@end quotation
@noindent
or
-@display
+@quotation
My parser includes support for an @samp{#include}-like feature, in
which case I run @code{yyparse} from @code{yyparse}. This fails
although I did specify @samp{%define api.pure}.
-@end display
+@end quotation
These problems typically come not from Bison itself, but from
Lex-generated scanners. Because these scanners use large buffers for
demonstration, consider the following source file,
@file{first-line.l}:
-@verbatim
-%{
+@example
+@group
+%@{
#include <stdio.h>
#include <stdlib.h>
-%}
+%@}
+@end group
%%
.*\n ECHO; return 1;
%%
+@group
int
yyparse (char const *file)
-{
+@{
yyin = fopen (file, "r");
if (!yyin)
- exit (2);
+ @{
+ perror ("fopen");
+ exit (EXIT_FAILURE);
+ @}
+@end group
+@group
/* One token only. */
yylex ();
if (fclose (yyin) != 0)
- exit (3);
+ @{
+ perror ("fclose");
+ exit (EXIT_FAILURE);
+ @}
return 0;
-}
+@}
+@end group
+@group
int
main (void)
-{
+@{
yyparse ("input");
yyparse ("input");
return 0;
-}
-@end verbatim
+@}
+@end group
+@end example
@noindent
If the file @file{input} contains
-@verbatim
+@example
input:1: Hello,
input:2: World!
-@end verbatim
+@end example
@noindent
then instead of getting the first line twice, you get:
@node Strings are Destroyed
@section Strings are Destroyed
-@display
+@quotation
My parser seems to destroy old strings, or maybe it loses track of
them. Instead of reporting @samp{"foo", "bar"}, it reports
@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
-@end display
+@end quotation
This error is probably the single most frequent ``bug report'' sent to
Bison lists, but is only concerned with a misunderstanding of the role
of the scanner. Consider the following Lex code:
-@verbatim
-%{
+@example
+@group
+%@{
#include <stdio.h>
char *yylval = NULL;
-%}
+%@}
+@end group
+@group
%%
.* yylval = yytext; return 1;
\n /* IGNORE */
%%
+@end group
+@group
int
main ()
-{
+@{
/* Similar to using $1, $2 in a Bison action. */
char *fst = (yylex (), yylval);
char *snd = (yylex (), yylval);
printf ("\"%s\", \"%s\"\n", fst, snd);
return 0;
-}
-@end verbatim
+@}
+@end group
+@end example
If you compile and run this code, you get:
@node Implementing Gotos/Loops
@section Implementing Gotos/Loops
-@display
+@quotation
My simple calculator supports variables, assignments, and functions,
but how can I implement gotos, or loops?
-@end display
+@end quotation
Although very pedagogical, the examples included in the document blur
the distinction to make between the parser---whose job is to recover
@node Multiple start-symbols
@section Multiple start-symbols
-@display
+@quotation
I have several closely related grammars, and I would like to share their
implementations. In fact, I could use a single grammar but with
multiple entry points.
-@end display
+@end quotation
Bison does not support multiple start-symbols, but there is a very
simple means to simulate them. If @code{foo} and @code{bar} are the two
@example
%token START_FOO START_BAR;
%start start;
-start: START_FOO foo
- | START_BAR bar;
+start:
+ START_FOO foo
+| START_BAR bar;
@end example
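+
+On the scanner side, the selected token must be emitted once, before
+ordinary scanning resumes. A minimal hand-written sketch (the variable and
+function names are illustrative):
+
+@example
+static int start_token;   /* Set to START_FOO or START_BAR beforehand.  */
+
+int
+yylex (void)
+@{
+  if (start_token)
+    @{
+      int t = start_token;
+      start_token = 0;
+      return t;
+    @}
+  @dots{}
+@}
+@end example
+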
These tokens prevent the introduction of new conflicts. As far as the
@node Secure? Conform?
@section Secure? Conform?
-@display
+@quotation
Is Bison secure? Does it conform to POSIX?
-@end display
+@end quotation
If you're looking for a guarantee or certification, we don't provide it.
However, Bison is intended to be a reliable program that conforms to the
@node I can't build Bison
@section I can't build Bison
-@display
+@quotation
I can't build Bison because @command{make} complains that
@code{msgfmt} is not found.
What should I do?
-@end display
+@end quotation
Like most GNU packages with internationalization support, that feature
is turned on by default. If you have problems building in the @file{po}
@node Where can I find help?
@section Where can I find help?
-@display
+@quotation
I'm having trouble using Bison. Where can I find help?
-@end display
+@end quotation
First, read this fine manual. Beyond that, you can send mail to
@email{help-bison@@gnu.org}. This mailing list is intended to be
@node Bug Reports
@section Bug Reports
-@display
+@quotation
I found a bug. What should I include in the bug report?
-@end display
+@end quotation
Before you send a bug report, make sure you are using the latest
version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its
send additional files as well (such as `config.h' or `config.cache').
Patches are most welcome, but not required. That is, do not hesitate to
-send a bug report just because you can not provide a fix.
+send a bug report just because you cannot provide a fix.
Send bug reports to @email{bug-bison@@gnu.org}.
@node More Languages
@section More Languages
-@display
+@quotation
Will Bison ever have C++ and Java support? How about @var{insert your
favorite language here}?
-@end display
+@end quotation
C++ and Java support is there now, and is documented. We'd love to add other
languages; contributions are welcome.
@node Beta Testing
@section Beta Testing
-@display
+@quotation
What is involved in being a beta tester?
-@end display
+@end quotation
It's not terribly involved. Basically, you would download a test
release, compile it, and use it to build and run a parser or two. After
@node Mailing Lists
@section Mailing Lists
-@display
+@quotation
How do I join the help-bison and bug-bison mailing lists?
-@end display
+@end quotation
See @url{http://lists.gnu.org/}.
@deffn {Variable} @@$
In an action, the location of the left-hand side of the rule.
-@xref{Locations, , Locations Overview}.
+@xref{Tracking Locations}.
@end deffn
@deffn {Variable} @@@var{n}
-In an action, the location of the @var{n}-th symbol of the right-hand
-side of the rule. @xref{Locations, , Locations Overview}.
+In an action, the location of the @var{n}-th symbol of the right-hand side
+of the rule. @xref{Tracking Locations}.
@end deffn
@deffn {Variable} @@@var{name}
-In an action, the location of a symbol addressed by name.
-@xref{Locations, , Locations Overview}.
+In an action, the location of a symbol addressed by name. @xref{Tracking
+Locations}.
@end deffn
@deffn {Variable} @@[@var{name}]
-In an action, the location of a symbol addressed by name.
-@xref{Locations, , Locations Overview}.
+In an action, the location of a symbol addressed by name. @xref{Tracking
+Locations}.
@end deffn
@deffn {Variable} $$
@end deffn
@end ifset
-@deffn {Directive} %define @var{define-variable}
-@deffnx {Directive} %define @var{define-variable} @var{value}
-@deffnx {Directive} %define @var{define-variable} "@var{value}"
+@deffn {Directive} %define @var{variable}
+@deffnx {Directive} %define @var{variable} @var{value}
+@deffnx {Directive} %define @var{variable} "@var{value}"
Define a variable to adjust Bison's behavior. @xref{%define Summary}.
@end deffn
@end deffn
@deffn {Directive} %error-verbose
-An obsolete directive standing for @samp{%define parse.error verbose}.
+An obsolete directive standing for @samp{%define parse.error verbose}
+(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).
@end deffn
@deffn {Directive} %file-prefix "@var{prefix}"
(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).
@end deffn
+@deffn {Macro} YYFPRINTF
+Macro used to output run-time traces.
+@xref{Enabling Traces}.
+@end deffn
+
@deffn {Macro} YYINITDEPTH
Macro for specifying the initial size of the parser stack.
@xref{Memory Management}.
parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
@end deffn
+@deffn {Macro} YYPRINT
+Macro used to output token semantic values. For @file{yacc.c} only.
+Obsoleted by @code{%printer}.
+@xref{The YYPRINT Macro, , The @code{YYPRINT} Macro}.
+@end deffn
+
@deffn {Function} yypstate_delete
The function to delete a parser instance, produced by Bison in push mode;
call this function to delete the memory associated with a parser.
@cindex glossary
@table @asis
-@item Accepting State
+@item Accepting state
A state whose only action is the accept action.
The accepting state is thus a consistent state.
@xref{Understanding,,}.
committee document contributing to what became the Algol 60 report.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
-@item Consistent State
-A state containing only one possible action. @xref{%define
-Summary,,lr.default-reductions}.
+@item Consistent state
+A state containing only one possible action. @xref{Default Reductions}.
@item Context-free grammars
Grammars specified as rules that can be applied regardless of context.
permitted. @xref{Language and Grammar, ,Languages and Context-Free
Grammars}.
-@item Default Reduction
+@item Default reduction
The reduction that a parser should perform if the current parser state
contains no other action for the lookahead token. In permitted parser
-states, Bison declares the reduction with the largest lookahead set to
-be the default reduction and removes that lookahead set.
-@xref{%define Summary,,lr.default-reductions}.
+states, Bison declares the reduction with the largest lookahead set to be
+the default reduction and removes that lookahead set. @xref{Default
+Reductions}.
+
+@item Defaulted state
+A consistent state with a default reduction. @xref{Default Reductions}.
@item Dynamic allocation
Allocation of memory that occurs during execution, rather than at
for example, `expression' or `declaration' in C@.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
-@item IELR(1)
-A minimal LR(1) parser table generation algorithm. That is, given any
+@item IELR(1) (Inadequacy Elimination LR(1))
+A minimal LR(1) parser table construction algorithm. That is, given any
context-free grammar, IELR(1) generates parser tables with the full
-language recognition power of canonical LR(1) but with nearly the same
-number of parser states as LALR(1). This reduction in parser states
-is often an order of magnitude. More importantly, because canonical
-LR(1)'s extra parser states may contain duplicate conflicts in the
-case of non-LR(1) grammars, the number of conflicts for IELR(1) is
-often an order of magnitude less as well. This can significantly
-reduce the complexity of developing of a grammar. @xref{%define
-Summary,,lr.type}.
+language-recognition power of canonical LR(1) but with nearly the same
+number of parser states as LALR(1). This reduction in parser states is
+often an order of magnitude. More importantly, because canonical LR(1)'s
+extra parser states may contain duplicate conflicts in the case of non-LR(1)
+grammars, the number of conflicts for IELR(1) is often an order of magnitude
+less as well. This can significantly reduce the complexity of developing a
+grammar. @xref{LR Table Construction}.
@item Infix operator
An arithmetic operator that is placed between the operands on which it
@item LAC (Lookahead Correction)
A parsing mechanism that fixes the problem of delayed syntax error
-detection, which is caused by LR state merging, default reductions,
-and the use of @code{%nonassoc}. Delayed syntax error detection
-results in unexpected semantic actions, initiation of error recovery
-in the wrong syntactic context, and an incorrect list of expected
-tokens in a verbose syntax error message. @xref{%define
-Summary,,parse.lac}.
+detection, which is caused by LR state merging, default reductions, and the
+use of @code{%nonassoc}. Delayed syntax error detection results in
+unexpected semantic actions, initiation of error recovery in the wrong
+syntactic context, and an incorrect list of expected tokens in a verbose
+syntax error message. @xref{LAC}.
@item Language construct
One of the typical usage schemas of the language. For example, one of
@item LALR(1)
The class of context-free grammars that Bison (like most other parser
generators) can handle by default; a subset of LR(1).
-@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}.
+@xref{Mysterious Conflicts}.
@item LR(1)
The class of context-free grammars in which at most one token of
A grammar symbol that has no rules in the grammar and therefore is
grammatically indivisible. The piece of text it represents is a token.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
+
+@item Unreachable state
+A parser state to which there does not exist a sequence of transitions from
+the parser's start state. A state can become unreachable during conflict
+resolution. @xref{Unreachable States}.
@end table
@node Copying This Manual
@appendix Copying This Manual
@include fdl.texi
+@node Bibliography
+@unnumbered Bibliography
+
+@table @asis
+@item [Denny 2008]
+Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser Tables
+for Non-LR(1) Grammars with Conflict Resolution, in @cite{Proceedings of the
+2008 ACM Symposium on Applied Computing} (SAC'08), ACM, New York, NY, USA,
+pp.@: 240--245. @uref{http://dx.doi.org/10.1145/1363686.1363747}
+
+@item [Denny 2010 May]
+Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the
+Deterministic Parsing of Composite Languages, Ph.D. Dissertation, Clemson
+University, Clemson, SC, USA (May 2010).
+@uref{http://proquest.umi.com/pqdlink?did=2041473591&Fmt=7&clientId=79356&RQT=309&VName=PQD}
+
+@item [Denny 2010 November]
+Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for Generating
+Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution,
+in @cite{Science of Computer Programming}, Vol.@: 75, Issue 11 (November
+2010), pp.@: 943--979. @uref{http://dx.doi.org/10.1016/j.scico.2009.08.001}
+
+@item [DeRemer 1982]
+Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1)
+Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and
+Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@:
+615--649. @uref{http://dx.doi.org/10.1145/69622.357187}
+
+@item [Knuth 1965]
+Donald E. Knuth, On the Translation of Languages from Left to Right, in
+@cite{Information and Control}, Vol.@: 8, Issue 6 (December 1965), pp.@:
+607--639. @uref{http://dx.doi.org/10.1016/S0019-9958(65)90426-2}
+
+@item [Scott 2000]
+Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain,
+@cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of
+London, Department of Computer Science, TR-00-12 (December 2000).
+@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}
+@end table
+
@node Index
@unnumbered Index
@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit nonfree
@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok rr
@c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln Stallman Destructor
-@c LocalWords: smallexample symrec val tptr FNCT fnctptr func struct sym enum
+@c LocalWords: symrec val tptr FNCT fnctptr func struct sym enum IEC syntaxes
@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof Lex
@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum DOTDOT
@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype Unary
@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP subrange
@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword loc
@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH inline
-@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm notype Lookahead
+@c LocalWords: YYINITDEPTH stmts ref initdcl maybeasm notype Lookahead yyoutput
@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args Autoconf
@c LocalWords: infile ypp yxx outfile itemx tex leaderfill Troubleshouting sqrt
@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
@c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
-@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs
-@c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC
+@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs sr
+@c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC nterm LR's
@c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
-@c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative
+@c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative Ph
@c LocalWords: deffnx namespace xml goto lalr ielr runtime lex yacc yyps env
@c LocalWords: yystate variadic Unshift NLS gettext po UTF Automake LOCALEDIR
@c LocalWords: YYENABLE bindtextdomain Makefile DEFS CPPFLAGS DBISON DeRemer
-@c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz
+@c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz ACM
@c LocalWords: redeclare automata Dparse localedir datadir XSLT midrule Wno
-@c LocalWords: Graphviz multitable headitem hh basename Doxygen fno
+@c LocalWords: Graphviz multitable headitem hh basename Doxygen fno filename
@c LocalWords: doxygen ival sval deftypemethod deallocate pos deftypemethodx
@c LocalWords: Ctor defcv defcvx arg accessors arithmetics CPP ifndef CALCXX
@c LocalWords: lexer's calcxx bool LPAREN RPAREN deallocation cerrno climits
@c LocalWords: cstdlib Debian undef yywrap unput noyywrap nounput zA yyleng
-@c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc
+@c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc PSLR
@c LocalWords: bytecode initializers superclass stype ASTNode autoboxing nls
@c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp
@c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv
@c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url
@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos
-@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett
-@c LocalWords: subdirectory Solaris nonassociativity
+@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's
+@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy
+@c LocalWords: Scannerless ispell american
@c Local Variables:
@c ispell-dictionary: "american"