This manual (@value{UPDATED}) is for GNU Bison (version
@value{VERSION}), the GNU parser generator.
-Copyright @copyright{} 1988-1993, 1995, 1998-2011 Free Software
+Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software
Foundation, Inc.
@quotation
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
* GLR Parsers:: Writing parsers for general context-free languages.
-* Locations Overview:: Tracking Locations.
+* Locations:: Overview of location tracking.
* Bison Parser:: What are Bison's input and output,
how is the output used?
* Stages:: Stages in writing and running Bison grammars.
Grammar Rules for @code{rpcalc}
-* Rpcalc Input::
-* Rpcalc Line::
-* Rpcalc Expr::
+* Rpcalc Input:: Explanation of the @code{input} nonterminal
+* Rpcalc Line:: Explanation of the @code{line} nonterminal
+* Rpcalc Expr:: Explanation of the @code{expr} nonterminal
Location Tracking Calculator: @code{ltcalc}
* Mfcalc Declarations:: Bison declarations for multi-function calculator.
* Mfcalc Rules:: Grammar rules for the calculator.
* Mfcalc Symbol Table:: Symbol table management subroutines.
+* Mfcalc Lexer:: The lexical analyzer.
+* Mfcalc Main:: The controlling function.
Bison Grammar Files
-* Grammar Outline:: Overall layout of the grammar file.
-* Symbols:: Terminal and nonterminal symbols.
-* Rules:: How to write grammar rules.
-* Recursion:: Writing recursive rules.
-* Semantics:: Semantic values and actions.
-* Locations:: Locations and actions.
-* Declarations:: All kinds of Bison declarations are described here.
-* Multiple Parsers:: Putting more than one Bison parser in one program.
+* Grammar Outline:: Overall layout of the grammar file.
+* Symbols:: Terminal and nonterminal symbols.
+* Rules:: How to write grammar rules.
+* Recursion:: Writing recursive rules.
+* Semantics:: Semantic values and actions.
+* Tracking Locations:: Locations and actions.
+* Named References:: Using named references in actions.
+* Declarations:: All kinds of Bison declarations are described here.
+* Multiple Parsers:: Putting more than one Bison parser in one program.
Outline of a Bison Grammar
* Mid-Rule Actions:: Most actions go at the end of a rule.
This says when, why and how to use the exceptional
action in the middle of a rule.
-* Named References:: Using named references in actions.
Tracking Locations
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
* GLR Parsers:: Writing parsers for general context-free languages.
-* Locations Overview:: Tracking Locations.
+* Locations:: Overview of location tracking.
* Bison Parser:: What are Bison's input and output,
how is the output used?
* Stages:: Stages in writing and running Bison grammars.
initiate error recovery.
During deterministic GLR operation, the effect of @code{YYERROR} is
the same as its effect in a deterministic parser.
-The effect in a deferred action is similar, but the precise point of the
-error is undefined; instead, the parser reverts to deterministic operation,
+The effect in a deferred action is similar, but the precise point of the
+error is undefined; instead, the parser reverts to deterministic operation,
selecting an unspecified stack on which to continue with a syntax error.
In a semantic predicate (see @ref{Semantic Predicates}) during nondeterministic
parsing, @code{YYERROR} silently prunes
@end smallexample
@noindent
-is one way to allow the same parser to handle two different syntaxes for
+is one way to allow the same parser to handle two different syntaxes for
widgets. The clause preceded by @code{%?} is treated like an ordinary
action, except that its text is treated as an expression and is always
-evaluated immediately (even when in nondeterministic mode). If the
+evaluated immediately (even when in nondeterministic mode). If the
expression yields 0 (false), the clause is treated as a syntax error,
-which, in a nondeterministic parser, causes the stack in which it is reduced
+which, in a nondeterministic parser, causes the stack in which it is reduced
to die. In a deterministic parser, it acts like YYERROR.
As the example shows, predicates otherwise look like semantic actions, and
There is a subtle difference between semantic predicates and ordinary
actions in nondeterministic mode, since the latter are deferred.
-For example, we could try to rewrite the previous example as
+For example, we could try to rewrite the previous example as
@smallexample
widget :
false). However, this
does @emph{not} have the same effect if @code{new_args} and @code{old_args}
have overlapping syntax.
-Since the mid-rule actions testing @code{new_syntax} are deferred,
+Since the mid-rule actions testing @code{new_syntax} are deferred,
a GLR parser first encounters the unresolved ambiguous reduction
for cases where @code{new_args} and @code{old_args} recognize the same string
@emph{before} performing the tests of @code{new_syntax}. It therefore
@example
%@{
- #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
- #define inline
+ #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \
+ && ! defined inline)
+ # define inline
#endif
%@}
@end example
-@node Locations Overview
+@node Locations
@section Locations
@cindex location
@cindex textual location
Bison provides a mechanism for handling these locations.
Each token has a semantic value. In a similar fashion, each token has an
-associated location, but the type of locations is the same for all tokens and
-groupings. Moreover, the output parser is equipped with a default data
-structure for storing locations (@pxref{Locations}, for more details).
+associated location, but the type of locations is the same for all tokens
+and groupings. Moreover, the output parser is equipped with a default data
+structure for storing locations (@pxref{Tracking Locations}, for more
+details).
Like semantic values, locations can be reached in actions using a dedicated
set of constructs. In the example above, the location of the whole grouping
@cindex simple examples
@cindex examples, simple
-Now we show and explain three sample programs written using Bison: a
+Now we show and explain several sample programs written using Bison: a
reverse polish notation calculator, an algebraic (infix) notation
-calculator, and a multi-function calculator. All three have been tested
-under BSD Unix 4.3; each produces a usable, though limited, interactive
-desk-top calculator.
+calculator --- later extended to track ``locations'' ---
+and a multi-function calculator. All
+produce usable, though limited, interactive desk-top calculators.
These examples are simple, but Bison grammars for real programming
languages are written the same way. You can copy these examples into a
Here are the C and Bison declarations for the reverse polish notation
calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
+@comment file: rpcalc.y
@example
/* Reverse polish notation calculator. */
%@{
#define YYSTYPE double
+ #include <stdio.h>
#include <math.h>
int yylex (void);
void yyerror (char const *);
Here are the grammar rules for the reverse polish notation calculator.
+@comment file: rpcalc.y
@example
+@group
input: /* empty */
| input line
;
+@end group
+@group
line: '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
+ | exp '\n' @{ printf ("%.10g\n", $1); @}
;
+@end group
+@group
exp: NUM @{ $$ = $1; @}
| exp exp '+' @{ $$ = $1 + $2; @}
| exp exp '-' @{ $$ = $1 - $2; @}
| exp exp '*' @{ $$ = $1 * $2; @}
| exp exp '/' @{ $$ = $1 / $2; @}
- /* Exponentiation */
- | exp exp '^' @{ $$ = pow ($1, $2); @}
- /* Unary minus */
- | exp 'n' @{ $$ = -$1; @}
+ | exp exp '^' @{ $$ = pow ($1, $2); @} /* Exponentiation */
+ | exp 'n' @{ $$ = -$1; @} /* Unary minus */
;
+@end group
%%
@end example
rule are referred to as @code{$1}, @code{$2}, and so on.
@menu
-* Rpcalc Input::
-* Rpcalc Line::
-* Rpcalc Expr::
+* Rpcalc Input:: Explanation of the @code{input} nonterminal
+* Rpcalc Line:: Explanation of the @code{line} nonterminal
+* Rpcalc Expr:: Explanation of the @code{expr} nonterminal
@end menu
@node Rpcalc Input
@example
line: '\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
+ | exp '\n' @{ printf ("%.10g\n", $1); @}
;
@end example
Here is the code for the lexical analyzer:
+@comment file: rpcalc.y
@example
@group
/* The lexical analyzer returns a double floating point
/* Skip white space. */
while ((c = getchar ()) == ' ' || c == '\t')
- ;
+ continue;
@end group
@group
/* Process numbers. */
kept to the bare minimum. The only requirement is that it call
@code{yyparse} to start the process of parsing.
+@comment file: rpcalc.y
@example
@group
int
@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
here is the definition we will use:
+@comment file: rpcalc.y
@example
@group
#include <stdio.h>
+@end group
+@group
/* Called by yyparse on error. */
void
yyerror (char const *s)
@example
$ @kbd{rpcalc}
@kbd{4 9 +}
-13
+@result{} 13
@kbd{3 7 + 3 4 5 *+-}
--13
+@result{} -13
@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}}
-13
+@result{} 13
@kbd{5 6 / 4 n +}
--3.166666667
+@result{} -3.166666667
@kbd{3 4 ^} @r{Exponentiation}
-81
+@result{} 81
@kbd{^D} @r{End-of-file indicator}
$
@end example
@example
/* Infix notation calculator. */
+@group
%@{
#define YYSTYPE double
#include <math.h>
int yylex (void);
void yyerror (char const *);
%@}
+@end group
+@group
/* Bison declarations. */
%token NUM
%left '-' '+'
%left '*' '/'
%precedence NEG /* negation--unary minus */
%right '^' /* exponentiation */
+@end group
%% /* The grammar follows. */
+@group
input: /* empty */
| input line
;
+@end group
+@group
line: '\n'
| exp '\n' @{ printf ("\t%.10g\n", $1); @}
;
+@end group
-exp: NUM @{ $$ = $1; @}
- | exp '+' exp @{ $$ = $1 + $3; @}
- | exp '-' exp @{ $$ = $1 - $3; @}
- | exp '*' exp @{ $$ = $1 * $3; @}
- | exp '/' exp @{ $$ = $1 / $3; @}
- | '-' exp %prec NEG @{ $$ = -$2; @}
+@group
+exp: NUM @{ $$ = $1; @}
+ | exp '+' exp @{ $$ = $1 + $3; @}
+ | exp '-' exp @{ $$ = $1 - $3; @}
+ | exp '*' exp @{ $$ = $1 * $3; @}
+ | exp '/' exp @{ $$ = $1 / $3; @}
+ | '-' exp %prec NEG @{ $$ = -$2; @}
| exp '^' exp @{ $$ = pow ($1, $3); @}
- | '(' exp ')' @{ $$ = $2; @}
+ | '(' exp ')' @{ $$ = $2; @}
;
+@end group
%%
@end example
if (c == EOF)
return 0;
+@group
/* Return a single char, and update location. */
if (c == '\n')
@{
++yylloc.last_column;
return c;
@}
+@end group
@end example
Basically, the lexical analyzer performs the same processing as before:
Here is a sample session with the multi-function calculator:
@example
+@group
$ @kbd{mfcalc}
@kbd{pi = 3.141592653589}
-3.1415926536
+@result{} 3.1415926536
+@end group
+@group
@kbd{sin(pi)}
-0.0000000000
+@result{} 0.0000000000
+@end group
@kbd{alpha = beta1 = 2.3}
-2.3000000000
+@result{} 2.3000000000
@kbd{alpha}
-2.3000000000
+@result{} 2.3000000000
@kbd{ln(alpha)}
-0.8329091229
+@result{} 0.8329091229
@kbd{exp(ln(beta1))}
-2.3000000000
+@result{} 2.3000000000
$
@end example
* Mfcalc Declarations:: Bison declarations for multi-function calculator.
* Mfcalc Rules:: Grammar rules for the calculator.
* Mfcalc Symbol Table:: Symbol table management subroutines.
+* Mfcalc Lexer:: The lexical analyzer.
+* Mfcalc Main:: The controlling function.
@end menu
@node Mfcalc Declarations
Here are the C and Bison declarations for the multi-function calculator.
+@comment file: mfcalc.y
@smallexample
@group
%@{
- #include <math.h> /* For math functions, cos(), sin(), etc. */
- #include "calc.h" /* Contains definition of `symrec'. */
+ #include <stdio.h> /* For printf, etc. */
+ #include <math.h> /* For pow, used in the grammar. */
+ #include "calc.h" /* Contains definition of `symrec'. */
int yylex (void);
void yyerror (char const *);
%@}
Most of them are copied directly from @code{calc}; three rules,
those which mention @code{VAR} or @code{FNCT}, are new.
+@comment file: mfcalc.y
@smallexample
@group
input: /* empty */
@group
line:
'\n'
- | exp '\n' @{ printf ("\t%.10g\n", $1); @}
- | error '\n' @{ yyerrok; @}
+ | exp '\n' @{ printf ("%.10g\n", $1); @}
+ | error '\n' @{ yyerrok; @}
;
@end group
definition, which is kept in the header @file{calc.h}, is as follows. It
provides for either functions or variables to be placed in the table.
+@comment file: calc.h
@smallexample
@group
/* Function type. */
@end group
@end smallexample
-The new version of @code{main} includes a call to @code{init_table}, a
-function that initializes the symbol table. Here it is, and
-@code{init_table} as well:
+The new version of @code{main} will call @code{init_table} to initialize
+the symbol table:
+@comment file: mfcalc.y
@smallexample
-#include <stdio.h>
-
-@group
-/* Called by yyparse on error. */
-void
-yyerror (char const *s)
-@{
- printf ("%s\n", s);
-@}
-@end group
-
@group
struct init
@{
@group
struct init const arith_fncts[] =
@{
- "sin", sin,
- "cos", cos,
- "atan", atan,
- "ln", log,
- "exp", exp,
- "sqrt", sqrt,
- 0, 0
+ @{ "atan", atan @},
+ @{ "cos", cos @},
+ @{ "exp", exp @},
+ @{ "ln", log @},
+ @{ "sin", sin @},
+ @{ "sqrt", sqrt @},
+ @{ 0, 0 @},
@};
@end group
@group
/* Put arithmetic functions in table. */
+static
void
init_table (void)
@{
int i;
- symrec *ptr;
for (i = 0; arith_fncts[i].fname != 0; i++)
@{
- ptr = putsym (arith_fncts[i].fname, FNCT);
+ symrec *ptr = putsym (arith_fncts[i].fname, FNCT);
ptr->value.fnctptr = arith_fncts[i].fnct;
@}
@}
@end group
-
-@group
-int
-main (void)
-@{
- init_table ();
- return yyparse ();
-@}
-@end group
@end smallexample
By simply editing the initialization list and adding the necessary include
The function @code{getsym} is passed the name of the symbol to look up. If
found, a pointer to that symbol is returned; otherwise zero is returned.
+@comment file: mfcalc.y
@smallexample
+#include <stdlib.h> /* malloc. */
+#include <string.h> /* strlen. */
+
+@group
symrec *
putsym (char const *sym_name, int sym_type)
@{
- symrec *ptr;
- ptr = (symrec *) malloc (sizeof (symrec));
+ symrec *ptr = (symrec *) malloc (sizeof (symrec));
ptr->name = (char *) malloc (strlen (sym_name) + 1);
strcpy (ptr->name,sym_name);
ptr->type = sym_type;
sym_table = ptr;
return ptr;
@}
+@end group
+@group
symrec *
getsym (char const *sym_name)
@{
symrec *ptr;
for (ptr = sym_table; ptr != (symrec *) 0;
ptr = (symrec *)ptr->next)
- if (strcmp (ptr->name,sym_name) == 0)
+ if (strcmp (ptr->name, sym_name) == 0)
return ptr;
return 0;
@}
+@end group
@end smallexample
+@node Mfcalc Lexer
+@subsection The @code{mfcalc} Lexer
+
The function @code{yylex} must now recognize variables, numeric values, and
the single-character arithmetic operators. Strings of alphanumeric
characters with a leading letter are recognized as either variables or
No change is needed in the handling of numeric values and arithmetic
operators in @code{yylex}.
+@comment file: mfcalc.y
@smallexample
@group
#include <ctype.h>
int c;
/* Ignore white space, get first nonwhite character. */
- while ((c = getchar ()) == ' ' || c == '\t');
+ while ((c = getchar ()) == ' ' || c == '\t')
+ continue;
if (c == EOF)
return 0;
/* Char starts an identifier => read the name. */
if (isalpha (c))
@{
- symrec *s;
+ /* Initially make the buffer long enough
+ for a 40-character symbol name. */
+ static size_t length = 40;
static char *symbuf = 0;
- static int length = 0;
+ symrec *s;
int i;
@end group
-
-@group
- /* Initially make the buffer long enough
- for a 40-character symbol name. */
- if (length == 0)
- length = 40, symbuf = (char *)malloc (length + 1);
+ if (!symbuf)
+ symbuf = (char *) malloc (length + 1);
i = 0;
do
-@end group
@group
@{
/* If buffer is full, make it bigger. */
@end group
@end smallexample
+@node Mfcalc Main
+@subsection The @code{mfcalc} Main
+
+The error reporting function is unchanged, and the new version of
+@code{main} includes a call to @code{init_table}:
+
+@comment file: mfcalc.y
+@smallexample
+@group
+/* Called by yyparse on error. */
+void
+yyerror (char const *s)
+@{
+ fprintf (stderr, "%s\n", s);
+@}
+@end group
+
+@group
+int
+main (int argc, char const* argv[])
+@{
+ init_table ();
+ return yyparse ();
+@}
+@end group
+@end smallexample
+
This program is both powerful and flexible. You may easily add new
functions, and it is a simple job to modify this code to install
predefined variables such as @code{pi} or @code{e} as well.
@xref{Invocation, ,Invoking Bison}.
@menu
-* Grammar Outline:: Overall layout of the grammar file.
-* Symbols:: Terminal and nonterminal symbols.
-* Rules:: How to write grammar rules.
-* Recursion:: Writing recursive rules.
-* Semantics:: Semantic values and actions.
-* Locations:: Locations and actions.
-* Declarations:: All kinds of Bison declarations are described here.
-* Multiple Parsers:: Putting more than one Bison parser in one program.
+* Grammar Outline:: Overall layout of the grammar file.
+* Symbols:: Terminal and nonterminal symbols.
+* Rules:: How to write grammar rules.
+* Recursion:: Writing recursive rules.
+* Semantics:: Semantic values and actions.
+* Tracking Locations:: Locations and actions.
+* Named References:: Using named references in actions.
+* Declarations:: All kinds of Bison declarations are described here.
+* Multiple Parsers:: Putting more than one Bison parser in one program.
@end menu
@node Grammar Outline
Thus, they belong in one or more @code{%code requires}:
@smallexample
+@group
%code top @{
#define _GNU_SOURCE
#include <stdio.h>
@}
+@end group
+@group
%code requires @{
#include "ptypes.h"
@}
+@end group
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%code requires @{
#define YYLTYPE YYLTYPE
typedef struct YYLTYPE
char *filename;
@} YYLTYPE;
@}
+@end group
+@group
%code @{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
static void trace_token (enum yytokentype token, YYLTYPE loc);
@}
+@end group
@dots{}
@end smallexample
@code{%code} to a @code{%code provides}:
@smallexample
+@group
%code top @{
#define _GNU_SOURCE
#include <stdio.h>
@}
+@end group
+@group
%code requires @{
#include "ptypes.h"
@}
+@end group
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%code requires @{
#define YYLTYPE YYLTYPE
typedef struct YYLTYPE
char *filename;
@} YYLTYPE;
@}
+@end group
+@group
%code provides @{
void trace_token (enum yytokentype token, YYLTYPE loc);
@}
+@end group
+@group
%code @{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
@}
+@end group
@dots{}
@end smallexample
type:
@smallexample
+@group
%code requires @{ #include "type1.h" @}
%union @{ type1 field1; @}
%destructor @{ type1_free ($$); @} <field1>
%printer @{ type1_print ($$); @} <field1>
+@end group
+@group
%code requires @{ #include "type2.h" @}
%union @{ type2 field2; @}
%destructor @{ type2_free ($$); @} <field2>
%printer @{ type2_print ($$); @} <field2>
+@end group
@end smallexample
@noindent
* Mid-Rule Actions:: Most actions go at the end of a rule.
This says when, why and how to use the exceptional
action in the middle of a rule.
-* Named References:: Using named references in actions.
@end menu
@node Value Type
useful semantic value associated with the @samp{+} token, it could be
referred to as @code{$2}.
-@xref{Named References,,Using Named References}, for more information
-about using the named references construct.
+@xref{Named References}, for more information about using the named
+references construct.
Note that the vertical-bar character @samp{|} is really a rule
separator, and actions are attached to a single rule. This is a
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.
-@node Named References
-@subsection Using Named References
-@cindex named references
-
-While every semantic value can be accessed with positional references
-@code{$@var{n}} and @code{$$}, it's often much more convenient to refer to
-them by name. First of all, original symbol names may be used as named
-references. For example:
-
-@example
-@group
-invocation: op '(' args ')'
- @{ $invocation = new_invocation ($op, $args, @@invocation); @}
-@end group
-@end example
-
-@noindent
-The positional @code{$$}, @code{@@$}, @code{$n}, and @code{@@n} can be
-mixed with @code{$name} and @code{@@name} arbitrarily. For example:
-
-@example
-@group
-invocation: op '(' args ')'
- @{ $$ = new_invocation ($op, $args, @@$); @}
-@end group
-@end example
-
-@noindent
-However, sometimes regular symbol names are not sufficient due to
-ambiguities:
-
-@example
-@group
-exp: exp '/' exp
- @{ $exp = $exp / $exp; @} // $exp is ambiguous.
-
-exp: exp '/' exp
- @{ $$ = $1 / $exp; @} // One usage is ambiguous.
-
-exp: exp '/' exp
- @{ $$ = $1 / $3; @} // No error.
-@end group
-@end example
-
-@noindent
-When ambiguity occurs, explicitly declared names may be used for values and
-locations. Explicit names are declared as a bracketed name after a symbol
-appearance in rule definitions. For example:
-@example
-@group
-exp[result]: exp[left] '/' exp[right]
- @{ $result = $left / $right; @}
-@end group
-@end example
-
-@noindent
-Explicit names may be declared for RHS and for LHS symbols as well. In order
-to access a semantic value generated by a mid-rule action, an explicit name
-may also be declared by putting a bracketed name after the closing brace of
-the mid-rule action code:
-@example
-@group
-exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
- @{ $res = $left + $right; @}
-@end group
-@end example
-
-@noindent
-
-In references, in order to specify names containing dots and dashes, an explicit
-bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
-@example
-@group
-if-stmt: IF '(' expr ')' THEN then.stmt ';'
- @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
-@end group
-@end example
-
-It often happens that named references are followed by a dot, dash or other
-C punctuation marks and operators. By default, Bison will read
-@code{$name.suffix} as a reference to symbol value @code{$name} followed by
-@samp{.suffix}, i.e., an access to the @samp{suffix} field of the semantic
-value. In order to force Bison to recognize @code{name.suffix} in its entirety
-as the name of a semantic value, bracketed syntax @code{$[name.suffix]}
-must be used.
-
-
-@node Locations
+@node Tracking Locations
@section Tracking Locations
@cindex location
@cindex textual location
In addition, the named references construct @code{@@@var{name}} and
@code{@@[@var{name}]} may also be used to address the symbol locations.
-@xref{Named References,,Using Named References}, for more information
-about using the named references construct.
+@xref{Named References}, for more information about using the named
+references construct.
Here is a basic example using the default data type for locations:
@end group
@end smallexample
+@noindent
where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol
in @var{rhs} when @var{k} is positive, and the location of the symbol
just before the reduction when @var{k} and @var{n} are both zero.
statement when it is followed by a semicolon.
@end itemize
+@node Named References
+@section Named References
+@cindex named references
+
+As described in the preceding sections, the traditional way to refer to any
+semantic value or location is a @dfn{positional reference}, which takes the
+form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}. However,
+such a reference is not very descriptive. Moreover, if you later decide to
+insert or remove symbols in the right-hand side of a grammar rule, the need
+to renumber such references can be tedious and error-prone.
+
+To avoid these issues, you can also refer to a semantic value or location
+using a @dfn{named reference}. First of all, original symbol names may be
+used as named references. For example:
+
+@example
+@group
+invocation: op '(' args ')'
+ @{ $invocation = new_invocation ($op, $args, @@invocation); @}
+@end group
+@end example
+
+@noindent
+Positional and named references can be mixed arbitrarily. For example:
+
+@example
+@group
+invocation: op '(' args ')'
+ @{ $$ = new_invocation ($op, $args, @@$); @}
+@end group
+@end example
+
+@noindent
+However, sometimes regular symbol names are not sufficient due to
+ambiguities:
+
+@example
+@group
+exp: exp '/' exp
+ @{ $exp = $exp / $exp; @} // $exp is ambiguous.
+
+exp: exp '/' exp
+ @{ $$ = $1 / $exp; @} // One usage is ambiguous.
+
+exp: exp '/' exp
+ @{ $$ = $1 / $3; @} // No error.
+@end group
+@end example
+
+@noindent
+When ambiguity occurs, explicitly declared names may be used for values and
+locations. Explicit names are declared as a bracketed name after a symbol
+appearance in rule definitions. For example:
+@example
+@group
+exp[result]: exp[left] '/' exp[right]
+ @{ $result = $left / $right; @}
+@end group
+@end example
+
+@noindent
+In order to access a semantic value generated by a mid-rule action, an
+explicit name may also be declared by putting a bracketed name after the
+closing brace of the mid-rule action code:
+@example
+@group
+exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
+ @{ $res = $left + $right; @}
+@end group
+@end example
+
+@noindent
+
+In references, in order to specify names containing dots and dashes, an explicit
+bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
+@example
+@group
+if-stmt: "if" '(' expr ')' "then" then.stmt ';'
+ @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
+@end group
+@end example
+
+It often happens that named references are followed by a dot, dash or other
+C punctuation marks and operators. By default, Bison will read
+@samp{$name.suffix} as a reference to symbol value @code{$name} followed by
+@samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic
+value. In order to force Bison to recognize @samp{name.suffix} in its
+entirety as the name of a semantic value, the bracketed syntax
+@samp{$[name.suffix]} must be used.
+
+The named references feature is experimental. More user feedback will help
+to stabilize it.
+
@node Declarations
@section Bison Declarations
@cindex declarations, Bison
@cindex mid-rule actions
Finally, Bison will never invoke a @code{%destructor} for an unreferenced
mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}).
-That is, Bison does not consider a mid-rule to have a semantic value if you do
-not reference @code{$$} in the mid-rule's action or @code{$@var{n}} (where
-@var{n} is the RHS symbol position of the mid-rule) in any later action in that
-rule.
-However, if you do reference either, the Bison-generated parser will invoke the
-@code{<>} @code{%destructor} whenever it discards the mid-rule symbol.
+That is, Bison does not consider a mid-rule to have a semantic value if you
+do not reference @code{$$} in the mid-rule's action or @code{$@var{n}}
+(where @var{n} is the right-hand side symbol position of the mid-rule) in
+any later action in that rule. However, if you do reference either, the
+Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever
+it discards the mid-rule symbol.
@ignore
@noindent
(Reentrant) Parser}.
If you have also used locations, the parser header file declares
-@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of
-the @code{YYSTYPE} macro and @code{yylval}. @xref{Locations,
-,Tracking Locations}.
+@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the
+@code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}.
This parser header file is normally essential if you wish to put the
definition of @code{yylex} in a separate source file, because
@subsection Textual Locations of Tokens
@vindex yylloc
-If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
-Tracking Locations}) in actions to keep track of the textual locations
-of tokens and groupings, then you must provide this information in
-@code{yylex}. The function @code{yyparse} expects to find the textual
-location of a token just parsed in the global variable @code{yylloc}.
-So @code{yylex} must store the proper data in that variable.
+If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations})
+in actions to keep track of the textual locations of tokens and groupings,
+then you must provide this information in @code{yylex}. The function
+@code{yyparse} expects to find the textual location of a token just parsed
+in the global variable @code{yylloc}. So @code{yylex} must store the proper
+data in that variable.
By default, the value of @code{yylloc} is a structure and you need only
initialize the members that are going to be used by the actions. The
@deffn {Value} @@$
@findex @@$
-Acts like a structure variable containing information on the textual location
-of the grouping made by the current rule. @xref{Locations, ,
-Tracking Locations}.
+Acts like a structure variable containing information on the textual
+location of the grouping made by the current rule. @xref{Tracking
+Locations}.
@c Check if those paragraphs are still useful or not.
@deffn {Value} @@@var{n}
@findex @@@var{n}
-Acts like a structure variable containing information on the textual location
-of the @var{n}th component of the current rule. @xref{Locations, ,
-Tracking Locations}.
+Acts like a structure variable containing information on the textual
+location of the @var{n}th component of the current rule. @xref{Tracking
+Locations}.
@end deffn
@node Internationalization
of zero or more @code{word} groupings.
@example
+@group
sequence: /* empty */
@{ printf ("empty sequence\n"); @}
| maybeword
| sequence word
@{ printf ("added word %s\n", $2); @}
;
+@end group
+@group
maybeword: /* empty */
@{ printf ("empty maybeword\n"); @}
| word
@{ printf ("single word %s\n", $1); @}
;
+@end group
@end example
@noindent
from being empty:
@example
+@group
sequence: /* empty */
| sequence words
| sequence redirects
;
+@end group
+@group
words: word
| words word
;
+@end group
+@group
redirects:redirect
| redirects redirect
;
+@end group
@end example
@node Mysterious Conflicts
Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
You can generate a deterministic parser containing C++ user code from
-the default (C) skeleton, as well as from the C++ skeleton
+the default (C) skeleton, as well as from the C++ skeleton
(@pxref{C++ Parsers}). However, if you do use the default skeleton
and want to allow the parsing stack to grow,
be careful not to use semantic types or location types that require
@example
typedef int foo, bar;
int baz (void)
+@group
@{
static bar (bar); /* @r{redeclare @code{bar} as static variable} */
extern foo foo (foo); /* @r{redeclare @code{foo} as function} */
return foo (bar);
@}
+@end group
@end example
Unfortunately, the name being declared is separated from the declaration
duplication, with actions omitted for brevity:
@example
+@group
initdcl:
declarator maybeasm '='
init
| declarator maybeasm
;
+@end group
+@group
notype_initdcl:
notype_declarator maybeasm '='
init
| notype_declarator maybeasm
;
+@end group
@end example
@noindent
and reports the uses of the symbols:
@example
+@group
Terminals, with rules where they appear
$end (0) 0
'/' (47) 4
error (256)
NUM (258) 5
+@end group
+@group
Nonterminals, with rules where they appear
$accept (8)
on left: 0
exp (9)
on left: 1 2 3 4 5, on right: 0 1 2 3 4
+@end group
@end example
@noindent
@cindex pointed rule
@cindex rule, pointed
Bison then proceeds onto the automaton itself, describing each state
-with it set of @dfn{items}, also known as @dfn{pointed rules}. Each
-item is a production rule together with a point (marked by @samp{.})
-that the input cursor.
+with its set of @dfn{items}, also known as @dfn{pointed rules}. Each
+item is a production rule together with a point (@samp{.}) marking
+the location of the input cursor.
@example
state 0
symbol (here, @code{exp}). When the parser returns to this state right
after having reduced a rule that produced an @code{exp}, the control
flow jumps to state 2. If there is no such transition on a nonterminal
-symbol, and the lookahead is a @code{NUM}, then this token is shifted on
+symbol, and the lookahead is a @code{NUM}, then this token is shifted onto
the parse stack, and the control flow jumps to state 1. Any other
lookahead triggers a syntax error.''
at the beginning of any rule deriving an @code{exp}. By default Bison
reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
you want to see more detail you can invoke @command{bison} with
-@option{--report=itemset} to list all the items, include those that can
-be derived:
+@option{--report=itemset} to list the derived items as well:
@example
state 0
@noindent
In state 2, the automaton can only shift a symbol. For instance,
-because of the item @samp{exp -> exp . '+' exp}, if the lookahead if
-@samp{+}, it will be shifted on the parse stack, and the automaton
-control will jump to state 4, corresponding to the item @samp{exp -> exp
-'+' . exp}. Since there is no default action, any other token than
-those listed above will trigger a syntax error.
+because of the item @samp{exp -> exp . '+' exp}, if the lookahead is
+@samp{+} it is shifted onto the parse stack, and the automaton
+jumps to state 4, corresponding to the item @samp{exp -> exp '+' . exp}.
+Since there is no default action, any lookahead not listed triggers a syntax
+error.
@cindex accepting state
The state 3 is named the @dfn{final state}, or the @dfn{accepting
The remaining states are similar:
@example
+@group
state 9
exp -> exp . '+' exp (rule 1)
'/' [reduce using rule 2 (exp)]
$default reduce using rule 2 (exp)
+@end group
+@group
state 10
exp -> exp . '+' exp (rule 1)
'/' [reduce using rule 3 (exp)]
$default reduce using rule 3 (exp)
+@end group
+@group
state 11
exp -> exp . '+' exp (rule 1)
'*' [reduce using rule 4 (exp)]
'/' [reduce using rule 4 (exp)]
$default reduce using rule 4 (exp)
+@end group
@end example
@noindent
@item yacc
Incompatibilities with POSIX Yacc.
+@item conflicts-sr
+@itemx conflicts-rr
+S/R and R/R conflicts. These warnings are enabled by default. However, if
+the @code{%expect} or @code{%expect-rr} directive is specified, an
+unexpected number of conflicts is an error, and an expected number of
+conflicts is not reported, so @option{-W} and @option{--warning} then have
+no effect on the conflict report.
+
@item other
All warnings not categorized above. These warnings are enabled by default.
@c - %define filename_type "const symbol::Symbol"
When the directive @code{%locations} is used, the C++ parser supports
-location tracking, see @ref{Locations, , Locations Overview}. Two
-auxiliary classes define a @code{position}, a single point in a file,
-and a @code{location}, a range composed of a pair of
-@code{position}s (possibly spanning several files).
+location tracking, see @ref{Tracking Locations}. Two auxiliary classes
+define a @code{position}, a single point in a file, and a @code{location}, a
+range composed of a pair of @code{position}s (possibly spanning several
+files).
@deftypemethod {position} {std::string*} file
The name of the file. It will always be handled as a pointer, the
@end defcv
@defcv {Type} {parser} {token}
-A structure that contains (only) the definition of the tokens as the
-@code{yytokentype} enumeration. To refer to the token @code{FOO}, the
-scanner should use @code{yy::parser::token::FOO}. The scanner can use
+A structure that contains (only) the @code{yytokentype} enumeration, which
+defines the tokens. To refer to the token @code{FOO},
+use @code{yy::parser::token::FOO}. The scanner can use
@samp{typedef yy::parser::token token;} to ``import'' the token enumeration
(@pxref{Calc++ Scanner}).
@end defcv
@defcv {Type} {parser} {syntax_error}
This class derives from @code{std::runtime_error}. Throw instances of it
-from user actions to raise parse errors. This is equivalent with first
+from the scanner or from the user actions to raise parse errors. This is
+equivalent with first
invoking @code{error} to report the location and message of the syntax
error, and then to invoke @code{YYERROR} to enter the error-recovery mode.
But contrary to @code{YYERROR} which can only be invoked from user actions
@comment file: calc++-scanner.ll
@example
+@group
%@{
// Code run each time a pattern is matched.
# define YY_USER_ACTION loc.columns (yyleng);
%@}
+@end group
%%
+@group
%@{
// Code run each time yylex is called.
loc.step ();
%@}
+@end group
@{blank@}+ loc.step ();
[\n]+ loc.lines (yyleng); loc.step ();
@end example
")" return yy::calcxx_parser::make_RPAREN(loc);
":=" return yy::calcxx_parser::make_ASSIGN(loc);
+@group
@{int@} @{
errno = 0;
long n = strtol (yytext, NULL, 10);
driver.error (loc, "integer is out of range");
return yy::calcxx_parser::make_NUMBER(n, loc);
@}
+@end group
@{id@} return yy::calcxx_parser::make_IDENTIFIER(yytext, loc);
. driver.error (loc, "invalid character");
<<EOF>> return yy::calcxx_parser::make_END(loc);
@comment file: calc++-scanner.ll
@example
+@group
void
calcxx_driver::scan_begin ()
@{
yyin = stdin;
else if (!(yyin = fopen (file.c_str (), "r")))
@{
- error (std::string ("cannot open ") + file + ": " + strerror(errno));
- exit (1);
+ error ("cannot open " + file + ": " + strerror(errno));
+ exit (EXIT_FAILURE);
@}
@}
+@end group
+@group
void
calcxx_driver::scan_end ()
@{
fclose (yyin);
@}
+@end group
@end example
@node Calc++ Top Level
#include <iostream>
#include "calc++-driver.hh"
+@group
int
main (int argc, char *argv[])
@{
res = 1;
return res;
@}
+@end group
@end example
@node Java Parsers
@c - class Position
@c - class Location
-When the directive @code{%locations} is used, the Java parser
-supports location tracking, see @ref{Locations, , Locations Overview}.
-An auxiliary user-defined class defines a @dfn{position}, a single point
-in a file; Bison itself defines a class representing a @dfn{location},
-a range composed of a pair of positions (possibly spanning several
-files). The location class is an inner class of the parser; the name
-is @code{Location} by default, and may also be renamed using
-@samp{%define location_type "@var{class-name}"}.
+When the directive @code{%locations} is used, the Java parser supports
+location tracking, see @ref{Tracking Locations}. An auxiliary user-defined
+class defines a @dfn{position}, a single point in a file; Bison itself
+defines a class representing a @dfn{location}, a range composed of a pair of
+positions (possibly spanning several files). The location class is an inner
+class of the parser; the name is @code{Location} by default, and may also be
+renamed using @samp{%define location_type "@var{class-name}"}.
The location class treats the position as a completely opaque value.
By default, the class name is @code{Position}, but this can be changed
@node Memory Exhausted
@section Memory Exhausted
-@display
+@quotation
My parser returns with error with a @samp{memory exhausted}
message. What can I do?
-@end display
+@end quotation
This question is already addressed elsewhere, @xref{Recursion,
,Recursive Rules}.
The following phenomenon has several symptoms, resulting in the
following typical questions:
-@display
+@quotation
I invoke @code{yyparse} several times, and on correct input it works
properly; but when a parse error is found, all the other calls fail
too. How can I reset the error flag of @code{yyparse}?
-@end display
+@end quotation
@noindent
or
-@display
+@quotation
My parser includes support for an @samp{#include}-like feature, in
which case I run @code{yyparse} from @code{yyparse}. This fails
although I did specify @samp{%define api.pure}.
-@end display
+@end quotation
These problems typically come not from Bison itself, but from
Lex-generated scanners. Because these scanners use large buffers for
demonstration, consider the following source file,
@file{first-line.l}:
-@verbatim
-%{
+@example
+@group
+%@{
#include <stdio.h>
#include <stdlib.h>
-%}
+%@}
+@end group
%%
.*\n ECHO; return 1;
%%
+@group
int
yyparse (char const *file)
-{
+@{
yyin = fopen (file, "r");
if (!yyin)
- exit (2);
+ @{
+ perror ("fopen");
+ exit (EXIT_FAILURE);
+ @}
+@end group
+@group
/* One token only. */
yylex ();
if (fclose (yyin) != 0)
- exit (3);
+ @{
+ perror ("fclose");
+ exit (EXIT_FAILURE);
+ @}
return 0;
-}
+@}
+@end group
+@group
int
main (void)
-{
+@{
yyparse ("input");
yyparse ("input");
return 0;
-}
-@end verbatim
+@}
+@end group
+@end example
@noindent
If the file @file{input} contains
-@verbatim
+@example
input:1: Hello,
input:2: World!
-@end verbatim
+@end example
@noindent
then instead of getting the first line twice, you get:
@node Strings are Destroyed
@section Strings are Destroyed
-@display
+@quotation
My parser seems to destroy old strings, or maybe it loses track of
them. Instead of reporting @samp{"foo", "bar"}, it reports
@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
-@end display
+@end quotation
This error is probably the single most frequent ``bug report'' sent to
Bison lists, but is only concerned with a misunderstanding of the role
of the scanner. Consider the following Lex code:
-@verbatim
-%{
+@example
+@group
+%@{
#include <stdio.h>
char *yylval = NULL;
-%}
+%@}
+@end group
+@group
%%
.* yylval = yytext; return 1;
\n /* IGNORE */
%%
+@end group
+@group
int
main ()
-{
+@{
/* Similar to using $1, $2 in a Bison action. */
char *fst = (yylex (), yylval);
char *snd = (yylex (), yylval);
printf ("\"%s\", \"%s\"\n", fst, snd);
return 0;
-}
-@end verbatim
+@}
+@end group
+@end example
If you compile and run this code, you get:
@node Implementing Gotos/Loops
@section Implementing Gotos/Loops
-@display
+@quotation
My simple calculator supports variables, assignments, and functions,
but how can I implement gotos, or loops?
-@end display
+@end quotation
Although very pedagogical, the examples included in the document blur
the distinction to make between the parser---whose job is to recover
@node Multiple start-symbols
@section Multiple start-symbols
-@display
+@quotation
I have several closely related grammars, and I would like to share their
implementations. In fact, I could use a single grammar but with
multiple entry points.
-@end display
+@end quotation
Bison does not support multiple start-symbols, but there is a very
simple means to simulate them. If @code{foo} and @code{bar} are the two
@node Secure? Conform?
@section Secure? Conform?
-@display
+@quotation
Is Bison secure? Does it conform to POSIX?
-@end display
+@end quotation
If you're looking for a guarantee or certification, we don't provide it.
However, Bison is intended to be a reliable program that conforms to the
@node I can't build Bison
@section I can't build Bison
-@display
+@quotation
I can't build Bison because @command{make} complains that
@code{msgfmt} is not found.
What should I do?
-@end display
+@end quotation
Like most GNU packages with internationalization support, that feature
is turned on by default. If you have problems building in the @file{po}
@node Where can I find help?
@section Where can I find help?
-@display
+@quotation
I'm having trouble using Bison. Where can I find help?
-@end display
+@end quotation
First, read this fine manual. Beyond that, you can send mail to
@email{help-bison@@gnu.org}. This mailing list is intended to be
@node Bug Reports
@section Bug Reports
-@display
+@quotation
I found a bug. What should I include in the bug report?
-@end display
+@end quotation
Before you send a bug report, make sure you are using the latest
version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its
send additional files as well (such as `config.h' or `config.cache').
Patches are most welcome, but not required. That is, do not hesitate to
-send a bug report just because you can not provide a fix.
+send a bug report just because you cannot provide a fix.
Send bug reports to @email{bug-bison@@gnu.org}.
@node More Languages
@section More Languages
-@display
+@quotation
Will Bison ever have C++ and Java support? How about @var{insert your
favorite language here}?
-@end display
+@end quotation
C++ and Java support is there now, and is documented. We'd love to add other
languages; contributions are welcome.
@node Beta Testing
@section Beta Testing
-@display
+@quotation
What is involved in being a beta tester?
-@end display
+@end quotation
It's not terribly involved. Basically, you would download a test
release, compile it, and use it to build and run a parser or two. After
@node Mailing Lists
@section Mailing Lists
-@display
+@quotation
How do I join the help-bison and bug-bison mailing lists?
-@end display
+@end quotation
See @url{http://lists.gnu.org/}.
@deffn {Variable} @@$
In an action, the location of the left-hand side of the rule.
-@xref{Locations, , Locations Overview}.
+@xref{Tracking Locations}.
@end deffn
@deffn {Variable} @@@var{n}
-In an action, the location of the @var{n}-th symbol of the right-hand
-side of the rule. @xref{Locations, , Locations Overview}.
+In an action, the location of the @var{n}-th symbol of the right-hand side
+of the rule. @xref{Tracking Locations}.
@end deffn
@deffn {Variable} @@@var{name}
-In an action, the location of a symbol addressed by name.
-@xref{Locations, , Locations Overview}.
+In an action, the location of a symbol addressed by name. @xref{Tracking
+Locations}.
@end deffn
@deffn {Variable} @@[@var{name}]
-In an action, the location of a symbol addressed by name.
-@xref{Locations, , Locations Overview}.
+In an action, the location of a symbol addressed by name. @xref{Tracking
+Locations}.
@end deffn
@deffn {Variable} $$