This manual (@value{UPDATED}) is for GNU Bison (version
@value{VERSION}), the GNU parser generator.
-Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software
+Copyright @copyright{} 1988-1993, 1995, 1998-2013 Free Software
Foundation, Inc.
@quotation
* Grammar Outline:: Overall layout of the grammar file.
* Symbols:: Terminal and nonterminal symbols.
* Rules:: How to write grammar rules.
-* Recursion:: Writing recursive rules.
* Semantics:: Semantic values and actions.
* Tracking Locations:: Locations and actions.
* Named References:: Using named references in actions.
* Grammar Rules:: Syntax and usage of the grammar rules section.
* Epilogue:: Syntax and usage of the epilogue.
+Grammar Rules
+
+* Rules Syntax:: Syntax of the rules.
+* Empty Rules:: Symbols that can match the empty string.
+* Recursion:: Writing recursive rules.
+
+
Defining Language Semantics
* Value Type:: Specifying one data type for all semantic values.
This says when, why and how to use the exceptional
action in the middle of a rule.
+Actions in Mid-Rule
+
+* Using Mid-Rule Actions:: Putting an action in the middle of a rule.
+* Mid-Rule Action Translation:: How mid-rule actions are actually processed.
+* Mid-Rule Conflicts:: Mid-rule actions can cause conflicts.
+
Tracking Locations
* Location Type:: Specifying a data type for locations.
* Precedence Only:: How to specify precedence only.
* Precedence Examples:: How these features are used in the previous example.
* How Precedence:: How they work.
+* Non Operators:: Using precedence for general conflicts.
Tuning LR
* Understanding:: Understanding the structure of your parser.
* Graphviz:: Getting a visual representation of the parser.
+* Xml:: Getting a markup representation of the parser.
* Tracing:: Tracing the execution of your parser.
Tracing Your Parser
@end group
%%
-
-@group
type_decl: TYPE ID '=' type ';' ;
-@end group
@group
type:
%%
prog:
- /* Nothing. */
+ %empty
| prog stmt @{ printf ("\n"); @}
;
@example
/* Reverse polish notation calculator. */
+@group
%@{
#define YYSTYPE double
#include <stdio.h>
int yylex (void);
void yyerror (char const *);
%@}
+@end group
%token NUM
@example
@group
input:
- /* empty */
+ %empty
| input line
;
@end group
@example
input:
- /* empty */
+ %empty
| input line
;
@end example
colon and the first @samp{|}; this means that @code{input} can match an
empty string of input (no tokens). We write the rules this way because it
is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
-It's conventional to put an empty alternative first and write the comment
-@samp{/* empty */} in it.
+It's conventional to put an empty alternative first and to use the
+(optional) @code{%empty} directive, or to write the comment @samp{/* empty
+*/} in it (@pxref{Empty Rules}).
The second alternate rule (@code{input line}) handles all nontrivial input.
It means, ``After reading any number of lines, read one more line if
@comment file: rpcalc.y
@example
-@group
#include <stdio.h>
-@end group
@group
/* Called by yyparse on error. */
%% /* The grammar follows. */
@group
input:
- /* empty */
+ %empty
| input line
;
@end group
@example
@group
input:
- /* empty */
+ %empty
| input line
;
@end group
%@{
#include <stdio.h> /* For printf, etc. */
#include <math.h> /* For pow, used in the grammar. */
- #include "calc.h" /* Contains definition of `symrec'. */
+ #include "calc.h" /* Contains definition of 'symrec'. */
int yylex (void);
void yyerror (char const *);
%@}
%type <val> exp
@group
-%right '='
+%precedence '='
%left '-' '+'
%left '*' '/'
%precedence NEG /* negation--unary minus */
%% /* The grammar follows. */
@group
input:
- /* empty */
+ %empty
| input line
;
@end group
@group
typedef struct symrec symrec;
-/* The symbol table: a chain of `struct symrec'. */
+/* The symbol table: a chain of 'struct symrec'. */
extern symrec *sym_table;
symrec *putsym (char const *, int);
@end group
@group
-/* The symbol table: a chain of `struct symrec'. */
+/* The symbol table: a chain of 'struct symrec'. */
symrec *sym_table;
@end group
@comment file: mfcalc.y: 3
@example
-@group
#include <ctype.h>
-@end group
@group
int
* Grammar Outline:: Overall layout of the grammar file.
* Symbols:: Terminal and nonterminal symbols.
* Rules:: How to write grammar rules.
-* Recursion:: Writing recursive rules.
* Semantics:: Semantic values and actions.
* Tracking Locations:: Locations and actions.
* Named References:: Using named references in actions.
@node Grammar Outline
@section Outline of a Bison Grammar
+@cindex comment
+@findex // @dots{}
+@findex /* @dots{} */
A Bison grammar file has four main sections, shown here with the
appropriate delimiters:
@end example
Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
-As a GNU extension, @samp{//} introduces a comment that
-continues until end of line.
+As a GNU extension, @samp{//} introduces a comment that continues until end
+of line.
@menu
* Prologue:: Syntax and usage of the prologue.
@code{%union} declaration.
@example
+@group
%@{
#define _GNU_SOURCE
#include <stdio.h>
#include "ptypes.h"
%@}
+@end group
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%@{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
%@}
+@end group
@dots{}
@end example
Look again at the example of the previous section:
@example
+@group
%@{
#define _GNU_SOURCE
#include <stdio.h>
#include "ptypes.h"
%@}
+@end group
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%@{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
%@}
+@end group
@dots{}
@end example
#include <stdio.h>
/* WARNING: The following code really belongs
- * in a `%code requires'; see below. */
+ * in a '%code requires'; see below. */
#include "ptypes.h"
#define YYLTYPE YYLTYPE
@} YYLTYPE;
@}
+@group
%union @{
long int n;
tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
+@end group
+@group
%code @{
static void print_token_value (FILE *, int, YYSTYPE);
#define YYPRINT(F, N, L) print_token_value (F, N, L)
static void trace_token (enum yytokentype token, YYLTYPE loc);
@}
+@end group
@dots{}
@end example
one of your tokens with a @code{%token} declaration.
@node Rules
-@section Syntax of Grammar Rules
+@section Grammar Rules
+
+A Bison grammar is a list of rules.
+
+@menu
+* Rules Syntax:: Syntax of the rules.
+* Empty Rules:: Symbols that can match the empty string.
+* Recursion:: Writing recursive rules.
+@end menu
+
+@node Rules Syntax
+@subsection Syntax of Grammar Rules
@cindex rule syntax
@cindex grammar rule syntax
@cindex syntax of grammar rules
A Bison grammar rule has the following general form:
@example
-@group
@var{result}: @var{components}@dots{};
-@end group
@end example
@noindent
For example,
@example
-@group
exp: exp '+' exp;
-@end group
@end example
@noindent
@noindent
They are still considered distinct rules even when joined in this way.
-If @var{components} in a rule is empty, it means that @var{result} can
-match the empty string. For example, here is how to define a
-comma-separated sequence of zero or more @code{exp} groupings:
+@node Empty Rules
+@subsection Empty Rules
+@cindex empty rule
+@cindex rule, empty
+@findex %empty
+
+A rule is said to be @dfn{empty} if its right-hand side (@var{components})
+is empty. It means that @var{result} can match the empty string. For
+example, here is how to define an optional semicolon:
+
+@example
+semicolon.opt: | ";";
+@end example
+
+@noindent
+It is easy not to see an empty rule, especially when @code{|} is used. The
+@code{%empty} directive allows to make explicit that a rule is empty on
+purpose:
@example
@group
-expseq:
- /* empty */
-| expseq1
+semicolon.opt:
+ %empty
+| ";"
;
@end group
+@end example
+
+Flagging a non-empty rule with @code{%empty} is an error. If run with
+@option{-Wempty-rule}, @command{bison} will report empty rules without
+@code{%empty}. Using @code{%empty} enables this warning, unless
+@option{-Wno-empty-rule} was specified.
+The @code{%empty} directive is a Bison extension, it does not work with
+Yacc. To remain compatible with POSIX Yacc, it is customary to write a
+comment @samp{/* empty */} in each rule with no components:
+
+@example
@group
-expseq1:
- exp
-| expseq1 ',' exp
+semicolon.opt:
+ /* empty */
+| ";"
;
@end group
@end example
-@noindent
-It is customary to write a comment @samp{/* empty */} in each rule
-with no components.
@node Recursion
-@section Recursive Rules
+@subsection Recursive Rules
@cindex recursive rule
+@cindex rule, recursive
A rule is called @dfn{recursive} when its @var{result} nonterminal
appears also on its right hand side. Nearly all Bison grammars need to
following example, the action is triggered only when @samp{b} is found:
@example
-@group
a-or-b: 'a'|'b' @{ a_or_b_found = 1; @};
-@end group
@end example
@cindex default action
@group
bar:
- /* empty */ @{ previous_expr = $0; @}
+ %empty @{ previous_expr = $0; @}
;
@end group
@end example
These actions are written just like usual end-of-rule actions, but they
are executed before the parser even recognizes the following components.
+@menu
+* Using Mid-Rule Actions:: Putting an action in the middle of a rule.
+* Mid-Rule Action Translation:: How mid-rule actions are actually processed.
+* Mid-Rule Conflicts:: Mid-rule actions can cause conflicts.
+@end menu
+
+@node Using Mid-Rule Actions
+@subsubsection Using Mid-Rule Actions
+
A mid-rule action may refer to the components preceding it using
@code{$@var{n}}, but it may not refer to subsequent components because
it is run before they are parsed.
@example
@group
stmt:
- LET '(' var ')'
- @{ $<context>$ = push_context (); declare_variable ($3); @}
+ "let" '(' var ')'
+ @{
+ $<context>$ = push_context ();
+ declare_variable ($3);
+ @}
stmt
- @{ $$ = $6; pop_context ($<context>5); @}
+ @{
+ $$ = $6;
+ pop_context ($<context>5);
+ @}
@end group
@end example
@code{context} in the data-type union. Then it calls
@code{declare_variable} to add the new variable to that list. Once the
first action is finished, the embedded statement @code{stmt} can be
-parsed. Note that the mid-rule action is component number 5, so the
-@samp{stmt} is component number 6.
+parsed.
+
+Note that the mid-rule action is component number 5, so the @samp{stmt} is
+component number 6. Named references can be used to improve the readability
+and maintainability (@pxref{Named References}):
+
+@example
+@group
+stmt:
+ "let" '(' var ')'
+ @{
+ $<context>let = push_context ();
+ declare_variable ($3);
+ @}[let]
+ stmt
+ @{
+ $$ = $6;
+ pop_context ($<context>let);
+ @}
+@end group
+@end example
After the embedded statement is parsed, its semantic value becomes the
value of the entire @code{let}-statement. Then the semantic value from the
@group
%type <context> let
%destructor @{ pop_context ($$); @} let
+@end group
%%
+@group
stmt:
let stmt
@{
$$ = $2;
- pop_context ($1);
+ pop_context ($let);
@};
+@end group
+@group
let:
- LET '(' var ')'
+ "let" '(' var ')'
@{
- $$ = push_context ();
+ $let = push_context ();
declare_variable ($3);
@};
Any mid-rule action can be converted to an end-of-rule action in this way, and
this is what Bison actually does to implement mid-rule actions.
+@node Mid-Rule Action Translation
+@subsubsection Mid-Rule Action Translation
+@vindex $@@@var{n}
+@vindex @@@var{n}
+
+As hinted earlier, mid-rule actions are actually transformed into regular
+rules and actions. The various reports generated by Bison (textual,
+graphical, etc., see @ref{Understanding, , Understanding Your Parser})
+reveal this translation, best explained by means of an example. The
+following rule:
+
+@example
+exp: @{ a(); @} "b" @{ c(); @} @{ d(); @} "e" @{ f(); @};
+@end example
+
+@noindent
+is translated into:
+
+@example
+$@@1: %empty @{ a(); @};
+$@@2: %empty @{ c(); @};
+$@@3: %empty @{ d(); @};
+exp: $@@1 "b" $@@2 $@@3 "e" @{ f(); @};
+@end example
+
+@noindent
+with new nonterminal symbols @code{$@@@var{n}}, where @var{n} is a number.
+
+A mid-rule action is expected to generate a value if it uses @code{$$}, or
+the (final) action uses @code{$@var{n}} where @var{n} denote the mid-rule
+action. In that case its nonterminal is rather named @code{@@@var{n}}:
+
+@example
+exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
+@end example
+
+@noindent
+is translated into
+
+@example
+@@1: %empty @{ a(); @};
+@@2: %empty @{ $$ = c(); @};
+$@@3: %empty @{ d(); @};
+exp: @@1 "b" @@2 $@@3 "e" @{ f = $1; @}
+@end example
+
+There are probably two errors in the above example: the first mid-rule
+action does not generate a value (it does not use @code{$$} although the
+final action uses it), and the value of the second one is not used (the
+final action does not use @code{$3}). Bison reports these errors when the
+@code{midrule-value} warnings are enabled (@pxref{Invocation, ,Invoking
+Bison}):
+
+@example
+$ bison -fcaret -Wmidrule-value mid.y
+@group
+mid.y:2.6-13: warning: unset value: $$
+ exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
+ ^^^^^^^^
+@end group
+@group
+mid.y:2.19-31: warning: unused value: $3
+ exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
+ ^^^^^^^^^^^^^
+@end group
+@end example
+
+
+@node Mid-Rule Conflicts
+@subsubsection Conflicts due to Mid-Rule Actions
Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute the
action. For example, the following two rules, without mid-rule actions,
@example
@group
subroutine:
- /* empty */ @{ prepare_for_local_variables (); @}
+ %empty @{ prepare_for_local_variables (); @}
;
@end group
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.
+
@node Tracking Locations
@section Tracking Locations
@cindex location
reentrant. It looks like this:
@example
-%define api.pure
+%define api.pure full
@end example
The result is that the communication variables @code{yylval} and
@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
Parsers}, for the details of this. The variable @code{yynerrs}
becomes local in @code{yyparse} in pull mode but it becomes a member
-of yypstate in push mode. (@pxref{Error Reporting, ,The Error
+of @code{yypstate} in push mode. (@pxref{Error Reporting, ,The Error
Reporting Function @code{yyerror}}). The convention for calling
@code{yyparse} itself is unchanged.
what you are doing, your declarations should look like this:
@example
-%define api.pure
+%define api.pure full
%define api.push-pull push
@end example
@end deffn
@deffn {Directive} %defines @var{defines-file}
-Same as above, but save in the file @var{defines-file}.
+Same as above, but save in the file @file{@var{defines-file}}.
@end deffn
@deffn {Directive} %destructor
supported languages include C, C++, and Java.
@var{language} is case-insensitive.
-This directive is experimental and its effect may be modified in future
-releases.
@end deffn
@deffn {Directive} %locations
@end deffn
@deffn {Directive} %output "@var{file}"
-Specify @var{file} for the parser implementation file.
+Generate the parser implementation in @file{@var{file}}.
@end deffn
@deffn {Directive} %pure-parser
skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl
Summary,,%skeleton}).
Unaccepted @var{variable}s produce an error.
-Some of the accepted @var{variable}s are:
+Some of the accepted @var{variable}s are described below.
-@table @code
-@c ================================================== api.namespace
-@item api.namespace
-@findex %define api.namespace
+@deffn Directive {%define api.namespace} @{@var{namespace}@}
@itemize
@item Languages(s): C++
For example, if you specify:
@example
-%define api.namespace "foo::bar"
+%define api.namespace @{foo::bar@}
@end example
Bison uses @code{foo::bar} verbatim in references such as:
lexical analyzer function. For example, if you specify:
@example
-%define api.namespace "foo"
+%define api.namespace @{foo@}
%name-prefix "bar::"
@end example
The parser namespace is @code{foo} and @code{yylex} is referenced as
@code{bar::lex}.
@end itemize
-@c namespace
+@end deffn
+@c api.namespace
@c ================================================== api.location.type
-@item @code{api.location.type}
-@findex %define api.location.type
+@deffn {Directive} {%define api.location.type} @var{type}
@itemize @bullet
@item Language(s): C++, Java
@item Default Value: none
-@item History: introduced in Bison 2.7
+@item History:
+Introduced in Bison 2.7 for C, C++ and Java. Introduced under the name
+@code{location_type} for C++ in Bison 2.5 and for Java in Bison 2.4.
@end itemize
+@end deffn
@c ================================================== api.prefix
-@item api.prefix
-@findex %define api.prefix
+@deffn {Directive} {%define api.prefix} @var{prefix}
@itemize @bullet
@item Language(s): All
@item History: introduced in Bison 2.6
@end itemize
+@end deffn
@c ================================================== api.pure
-@item api.pure
-@findex %define api.pure
+@deffn Directive {%define api.pure}
@itemize @bullet
@item Language(s): C
@item Purpose: Request a pure (reentrant) parser program.
@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
-@item Accepted Values: Boolean
+@item Accepted Values: @code{true}, @code{false}, @code{full}
+
+The value may be omitted: this is equivalent to specifying @code{true}, as is
+the case for Boolean values.
+
+When @code{%define api.pure full} is used, the parser is made reentrant. This
+changes the signature for @code{yylex} (@pxref{Pure Calling}), and also that of
+@code{yyerror} when the tracking of locations has been activated, as shown
+below.
+
+The @code{true} value is very similar to the @code{full} value, the only
+difference is in the signature of @code{yyerror} on Yacc parsers without
+@code{%parse-param}, for historical reasons.
+
+I.e., if @samp{%locations %define api.pure} is passed then the prototypes for
+@code{yyerror} are:
+
+@example
+void yyerror (char const *msg); // Yacc parsers.
+void yyerror (YYLTYPE *locp, char const *msg); // GLR parsers.
+@end example
+
+But if @samp{%locations %define api.pure %parse-param @{int *nastiness@}} is
+used, then both parsers have the same signature:
+
+@example
+void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg);
+@end example
+
+(@pxref{Error Reporting, ,The Error
+Reporting Function @code{yyerror}})
@item Default Value: @code{false}
+
+@item History:
+the @code{full} value was introduced in Bison 2.7
@end itemize
+@end deffn
@c api.pure
@c ================================================== api.push-pull
-@item api.push-pull
-@findex %define api.push-pull
+@deffn Directive {%define api.push-pull} @var{kind}
@itemize @bullet
@item Language(s): C (deterministic parsers only)
@item Default Value: @code{pull}
@end itemize
+@end deffn
@c api.push-pull
@c ================================================== api.token.constructor
-@item api.token.constructor
-@findex %define api.token.constructor
+@deffn Directive {%define api.token.constructor}
@itemize @bullet
@item Language(s):
@item History:
introduced in Bison 2.8
@end itemize
+@end deffn
@c api.token.constructor
@c ================================================== api.token.prefix
-@item api.token.prefix
-@findex %define api.token.prefix
+@deffn Directive {%define api.token.prefix} @var{prefix}
@itemize
@item Languages(s): all
@item History:
introduced in Bison 2.8
@end itemize
+@end deffn
@c api.token.prefix
+@c ================================================== api.value.type
+@deffn Directive {%define api.value.type} @var{type}
+@itemize @bullet
+@item Language(s):
+C++
+
+@item Purpose:
+Request variant-based semantic values.
+@xref{C++ Variants}.
+
+@item Default Value:
+FIXME:
+@item History:
+introduced in Bison 2.8. Was introduced for Java only in 2.3b as
+@code{stype}.
+@end itemize
+@end deffn
+@c api.value.type
+
+
+@c ================================================== location_type
+@deffn Directive {%define location_type}
+Obsoleted by @code{api.location.type} since Bison 2.7.
+@end deffn
+
+
@c ================================================== lr.default-reduction
-@item lr.default-reduction
-@findex %define lr.default-reduction
+@deffn Directive {%define lr.default-reduction} @var{when}
@itemize @bullet
@item Language(s): all
introduced as @code{lr.default-reduction} in 2.5, renamed as
@code{lr.default-reduction} in 2.8.
@end itemize
+@end deffn
@c ============================================ lr.keep-unreachable-state
-@item lr.keep-unreachable-state
-@findex %define lr.keep-unreachable-state
+@deffn Directive {%define lr.keep-unreachable-state}
@itemize @bullet
@item Language(s): all
remain in the parser tables. @xref{Unreachable States}.
@item Accepted Values: Boolean
@item Default Value: @code{false}
-@end itemize
+@item History:
introduced as @code{lr.keep_unreachable_states} in 2.3b, renamed as
-@code{lr.keep-unreachable-state} in 2.5, and as
+@code{lr.keep-unreachable-states} in 2.5, and as
@code{lr.keep-unreachable-state} in 2.8.
+@end itemize
+@end deffn
@c lr.keep-unreachable-state
@c ================================================== lr.type
-@item lr.type
-@findex %define lr.type
+@deffn Directive {%define lr.type} @var{type}
@itemize @bullet
@item Language(s): all
@item Default Value: @code{lalr}
@end itemize
-
+@end deffn
@c ================================================== namespace
-@item namespace
-@findex %define namespace
+@deffn Directive %define namespace @{@var{namespace}@}
Obsoleted by @code{api.namespace}
@c namespace
-
+@end deffn
@c ================================================== parse.assert
-@item parse.assert
-@findex %define parse.assert
+@deffn Directive {%define parse.assert}
@itemize
@item Languages(s): C++
@item Default Value: @code{false}
@end itemize
+@end deffn
@c parse.assert
@c ================================================== parse.error
-@item parse.error
-@findex %define parse.error
+@deffn Directive {%define parse.error}
@itemize
@item Languages(s):
all
@item Default Value:
@code{simple}
@end itemize
+@end deffn
@c parse.error
@c ================================================== parse.lac
-@item parse.lac
-@findex %define parse.lac
+@deffn Directive {%define parse.lac}
@itemize
@item Languages(s): C (deterministic parsers only)
@item Accepted Values: @code{none}, @code{full}
@item Default Value: @code{none}
@end itemize
+@end deffn
@c parse.lac
@c ================================================== parse.trace
-@item parse.trace
-@findex %define parse.trace
+@deffn Directive {%define parse.trace}
@itemize
@item Languages(s): C, C++, Java
@item Default Value: @code{false}
@end itemize
+@end deffn
@c parse.trace
-@c ================================================== variant
-@item variant
-@findex %define variant
-
-@itemize @bullet
-@item Language(s):
-C++
-
-@item Purpose:
-Request variant-based semantic values.
-@xref{C++ Variants}.
-
-@item Accepted Values:
-Boolean.
-
-@item Default Value:
-@code{false}
-@end itemize
-@c variant
-@end table
-
-
@node %code Summary
@subsection %code Summary
@findex %code
@code{YYDEBUG} (not renamed) is used as a default value:
@example
-/* Enabling traces. */
+/* Debug traces. */
#ifndef CDEBUG
# if defined YYDEBUG
# if YYDEBUG
exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
@end example
+@noindent
+Using the following:
+@example
+%parse-param @{int *randomness@}
+@end example
+
+Results in these signatures:
+@example
+void yyerror (int *randomness, const char *msg);
+int yyparse (int *randomness);
+@end example
+
+@noindent
+Or, if both @code{%define api.pure full} (or just @code{%define api.pure})
+and @code{%locations} are used:
+
+@example
+void yyerror (YYLTYPE *llocp, int *randomness, const char *msg);
+int yyparse (int *randomness);
+@end example
+
@node Push Parser Function
@section The Push Parser Function @code{yypush_parse}
@findex yypush_parse
@samp{%define api.push-pull both} declaration is used.
@xref{Push Decl, ,A Push Parser}.
-@deftypefun int yypush_parse (yypstate *yyps)
+@deftypefun int yypush_parse (yypstate *@var{yyps})
The value returned by @code{yypush_parse} is the same as for yyparse with
the following exception: it returns @code{YYPUSH_MORE} if more input is
required to finish parsing the grammar.
declaration is used.
@xref{Push Decl, ,A Push Parser}.
-@deftypefun int yypull_parse (yypstate *yyps)
+@deftypefun int yypull_parse (yypstate *@var{yyps})
The value returned by @code{yypull_parse} is the same as for @code{yyparse}.
@end deftypefun
@samp{%define api.push-pull both} declaration is used.
@xref{Push Decl, ,A Push Parser}.
-@deftypefun void yypstate_delete (yypstate *yyps)
+@deftypefun void yypstate_delete (yypstate *@var{yyps})
This function will reclaim the memory associated with a parser instance.
After this call, you should no longer attempt to use the parser instance.
@end deftypefun
return 0;
@dots{}
if (c == '+' || c == '-')
- return c; /* Assume token type for `+' is '+'. */
+ return c; /* Assume token type for '+' is '+'. */
@dots{}
return INT; /* Return the type of the token. */
@dots{}
@node Pure Calling
@subsection Calling Conventions for Pure Parsers
-When you use the Bison declaration @samp{%define api.pure} to request a
+When you use the Bison declaration @code{%define api.pure full} to request a
pure, reentrant parser, the global communication variables @code{yylval}
and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
Parser}.) In such parsers the two global variables are replaced by
declarations, which is equivalent to repeating @code{%param}.
@end deffn
+@noindent
For instance:
@example
int yyparse (parser_mode *mode, environment_type *env);
@end example
-If @samp{%define api.pure} is added:
+If @samp{%define api.pure full} is added:
@example
int yylex (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
@end example
@noindent
-and finally, if both @samp{%define api.pure} and @code{%locations} are used:
+and finally, if both @samp{%define api.pure full} and @code{%locations} are
+used:
@example
int yylex (YYSTYPE *lvalp, YYLTYPE *llocp,
immediately return 1.
Obviously, in location tracking pure parsers, @code{yyerror} should have
-an access to the current location.
-This is indeed the case for the GLR
-parsers, but not for the Yacc parser, for historical reasons. I.e., if
-@samp{%locations %define api.pure} is passed then the prototypes for
-@code{yyerror} are:
-
-@example
-void yyerror (char const *msg); /* Yacc parsers. */
-void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */
-@end example
-
-If @samp{%parse-param @{int *nastiness@}} is used, then:
-
-@example
-void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */
-void yyerror (int *nastiness, char const *msg); /* GLR parsers. */
-@end example
-
-Finally, GLR and Yacc parsers share the same @code{yyerror} calling
-convention for absolutely pure parsers, i.e., when the calling
-convention of @code{yylex} @emph{and} the calling convention of
-@samp{%define api.pure} are pure.
-I.e.:
-
-@example
-/* Location tracking. */
-%locations
-/* Pure yylex. */
-%define api.pure
-%lex-param @{int *nastiness@}
-/* Pure yyparse. */
-%parse-param @{int *nastiness@}
-%parse-param @{int *randomness@}
-@end example
+an access to the current location. With @code{%define api.pure}, this is
+indeed the case for the GLR parsers, but not for the Yacc parser, for
+historical reasons, and this is the why @code{%define api.pure full} should be
+prefered over @code{%define api.pure}.
-@noindent
-results in the following signatures for all the parser kinds:
+When @code{%locations %define api.pure full} is used, @code{yyerror} has the
+following signature:
@example
-int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
-int yyparse (int *nastiness, int *randomness);
-void yyerror (YYLTYPE *locp,
- int *nastiness, int *randomness,
- char const *msg);
+void yyerror (YYLTYPE *locp, char const *msg);
@end example
@noindent
@end deffn
@deffn {Value} @@$
-@findex @@$
Acts like a structure variable containing information on the textual
location of the grouping made by the current rule. @xref{Tracking
Locations}.
@item
@cindex bison-i18n.m4
Into the directory containing the GNU Autoconf macros used
-by the package---often called @file{m4}---copy the
+by the package ---often called @file{m4}--- copy the
@file{bison-i18n.m4} file installed by Bison under
@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory.
For example:
term:
'(' expr ')'
| term '!'
-| NUMBER
+| "number"
;
@end group
@end example
@example
@group
if_stmt:
- IF expr THEN stmt
-| IF expr THEN stmt ELSE stmt
+ "if" expr "then" stmt
+| "if" expr "then" stmt "else" stmt
;
@end group
@end example
@noindent
-Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
-terminal symbols for specific keyword tokens.
+Here @code{"if"}, @code{"then"} and @code{"else"} are terminal symbols for
+specific keyword tokens.
-When the @code{ELSE} token is read and becomes the lookahead token, the
+When the @code{"else"} token is read and becomes the lookahead token, the
contents of the stack (assuming the input is valid) are just right for
reduction by the first rule. But it is also legitimate to shift the
-@code{ELSE}, because that would lead to eventual reduction by the second
+@code{"else"}, because that would lead to eventual reduction by the second
rule.
This situation, where either a shift or a reduction would be valid, is
operator precedence declarations. To see the reason for this, let's
contrast it with the other alternative.
-Since the parser prefers to shift the @code{ELSE}, the result is to attach
+Since the parser prefers to shift the @code{"else"}, the result is to attach
the else-clause to the innermost if-statement, making these two inputs
equivalent:
@example
-if x then if y then win (); else lose;
+if x then if y then win; else lose;
-if x then do; if y then win (); else lose; end;
+if x then do; if y then win; else lose; end;
@end example
But if the parser chose to reduce when possible rather than shift, the
making these two inputs equivalent:
@example
-if x then if y then win (); else lose;
+if x then if y then win; else lose;
-if x then do; if y then win (); end; else lose;
+if x then do; if y then win; end; else lose;
@end example
The conflict exists because the grammar as written is ambiguous: either
Algol 60 and is called the ``dangling @code{else}'' ambiguity.
To avoid warnings from Bison about predictable, legitimate shift/reduce
-conflicts, use the @code{%expect @var{n}} declaration.
+conflicts, you can use the @code{%expect @var{n}} declaration.
There will be no warning as long as the number of shift/reduce conflicts
is exactly @var{n}, and Bison will report an error if there is a
different number.
-@xref{Expect Decl, ,Suppressing Conflict Warnings}.
+@xref{Expect Decl, ,Suppressing Conflict Warnings}. However, we don't
+recommend the use of @code{%expect} (except @samp{%expect 0}!), as an equal
+number of conflicts does not mean that they are the @emph{same}. When
+possible, you should rather use precedence directives to @emph{fix} the
+conflicts explicitly (@pxref{Non Operators,, Using Precedence For Non
+Operators}).
The definition of @code{if_stmt} above is solely to blame for the
conflict, but the conflict does not actually appear without additional
the conflict:
@example
-@group
-%token IF THEN ELSE variable
%%
-@end group
@group
stmt:
expr
@group
if_stmt:
- IF expr THEN stmt
-| IF expr THEN stmt ELSE stmt
+ "if" expr "then" stmt
+| "if" expr "then" stmt "else" stmt
;
@end group
expr:
- variable
+ "identifier"
;
@end example
* Precedence Only:: How to specify precedence only.
* Precedence Examples:: How these features are used in the previous example.
* How Precedence:: How they work.
+* Non Operators:: Using precedence for general conflicts.
@end menu
@node Why Precedence
declared with @code{'-'}:
@example
-%left '<' '>' '=' NE LE GE
+%left '<' '>' '=' "!=" "<=" ">="
%left '+' '-'
%left '*' '/'
@end example
-@noindent
-(Here @code{NE} and so on stand for the operators for ``not equal''
-and so on. We assume that these tokens are more than one character long
-and therefore are represented by names, not character literals.)
-
@node How Precedence
@subsection How Precedence Works
Not all rules and not all tokens have precedence. If either the rule or
the lookahead token has no precedence, then the default is to shift.
+@node Non Operators
+@subsection Using Precedence For Non Operators
+
+Using properly precedence and associativity directives can help fixing
+shift/reduce conflicts that do not involve arithmetics-like operators. For
+instance, the ``dangling @code{else}'' problem (@pxref{Shift/Reduce, ,
+Shift/Reduce Conflicts}) can be solved elegantly in two different ways.
+
+In the present case, the conflict is between the token @code{"else"} willing
+to be shifted, and the rule @samp{if_stmt: "if" expr "then" stmt}, asking
+for reduction. By default, the precedence of a rule is that of its last
+token, here @code{"then"}, so the conflict will be solved appropriately
+by giving @code{"else"} a precedence higher than that of @code{"then"}, for
+instance as follows:
+
+@example
+@group
+%precedence "then"
+%precedence "else"
+@end group
+@end example
+
+Alternatively, you may give both tokens the same precedence, in which case
+associativity is used to solve the conflict. To preserve the shift action,
+use right associativity:
+
+@example
+%right "then" "else"
+@end example
+
+Neither solution is perfect however. Since Bison does not provide, so far,
+``scoped'' precedence, both force you to declare the precedence
+of these keywords with respect to the other operators your grammar.
+Therefore, instead of being warned about new conflicts you would be unaware
+of (e.g., a shift/reduce conflict due to @samp{if test then 1 else 2 + 3}
+being ambiguous: @samp{if test then 1 else (2 + 3)} or @samp{(if test then 1
+else 2) + 3}?), the conflict will be already ``fixed''.
+
@node Contextual Precedence
@section Context-Dependent Precedence
@cindex context-dependent precedence
@example
@group
sequence:
- /* empty */ @{ printf ("empty sequence\n"); @}
+ %empty @{ printf ("empty sequence\n"); @}
| maybeword
| sequence word @{ printf ("added word %s\n", $2); @}
;
@group
maybeword:
- /* empty */ @{ printf ("empty maybeword\n"); @}
-| word @{ printf ("single word %s\n", $1); @}
+ %empty @{ printf ("empty maybeword\n"); @}
+| word @{ printf ("single word %s\n", $1); @}
;
@end group
@end example
proper way to define @code{sequence}:
@example
+@group
sequence:
- /* empty */ @{ printf ("empty sequence\n"); @}
+ %empty @{ printf ("empty sequence\n"); @}
| sequence word @{ printf ("added word %s\n", $2); @}
;
+@end group
@end example
Here is another common error that yields a reduce/reduce conflict:
@example
+@group
sequence:
- /* empty */
+ %empty
| sequence words
| sequence redirects
;
+@end group
+@group
words:
- /* empty */
+ %empty
| words word
;
+@end group
+@group
redirects:
- /* empty */
+ %empty
| redirects redirect
;
+@end group
@end example
@noindent
@example
sequence:
- /* empty */
+ %empty
| sequence word
| sequence redirect
;
@example
@group
sequence:
- /* empty */
+ %empty
| sequence words
| sequence redirects
;
@end group
@end example
+Yet this proposal introduces another kind of ambiguity! The input
+@samp{word word} can be parsed as a single @code{words} composed of two
+@samp{word}s, or as two one-@code{word} @code{words} (and likewise for
+@code{redirect}/@code{redirects}). However this ambiguity is now a
+shift/reduce conflict, and therefore it can now be addressed with precedence
+directives.
+
+To simplify the matter, we will proceed with @code{word} and @code{redirect}
+being tokens: @code{"word"} and @code{"redirect"}.
+
+To prefer the longest @code{words}, the conflict between the token
+@code{"word"} and the rule @samp{sequence: sequence words} must be resolved
+as a shift. To this end, we use the same techniques as exposed above, see
+@ref{Non Operators,, Using Precedence For Non Operators}. One solution
+relies on precedences: use @code{%prec} to give a lower precedence to the
+rule:
+
+@example
+%precedence "word"
+%precedence "sequence"
+%%
+@group
+sequence:
+ %empty
+| sequence word %prec "sequence"
+| sequence redirect %prec "sequence"
+;
+@end group
+
+@group
+words:
+ word
+| words "word"
+;
+@end group
+@end example
+
+Another solution relies on associativity: provide both the token and the
+rule with the same precedence, but make them right-associative:
+
+@example
+%right "word" "redirect"
+%%
+@group
+sequence:
+ %empty
+| sequence word %prec "word"
+| sequence redirect %prec "redirect"
+;
+@end group
+@end example
+
@node Mysterious Conflicts
@section Mysterious Conflicts
@cindex Mysterious Conflicts
@example
@group
-%token ID
-
%%
def: param_spec return_spec ',';
param_spec:
| name_list ':' type
;
@end group
+
@group
return_spec:
type
| name ':' type
;
@end group
+
+type: "id";
+
@group
-type: ID;
-@end group
-@group
-name: ID;
+name: "id";
name_list:
name
| name ',' name_list
@end group
@end example
-It would seem that this grammar can be parsed with only a single token
-of lookahead: when a @code{param_spec} is being read, an @code{ID} is
-a @code{name} if a comma or colon follows, or a @code{type} if another
-@code{ID} follows. In other words, this grammar is LR(1).
+It would seem that this grammar can be parsed with only a single token of
+lookahead: when a @code{param_spec} is being read, an @code{"id"} is a
+@code{name} if a comma or colon follows, or a @code{type} if another
+@code{"id"} follows. In other words, this grammar is LR(1).
@cindex LR
@cindex LALR
However, for historical reasons, Bison cannot by default handle all
LR(1) grammars.
-In this grammar, two contexts, that after an @code{ID} at the beginning
+In this grammar, two contexts, that after an @code{"id"} at the beginning
of a @code{param_spec} and likewise at the beginning of a
@code{return_spec}, are similar enough that Bison assumes they are the
same.
@example
@group
-%token BOGUS
-@dots{}
-%%
@dots{}
return_spec:
type
| name ':' type
-| ID BOGUS /* This rule is never used. */
+| "id" "bogus" /* This rule is never used. */
;
@end group
@end example
This corrects the problem because it introduces the possibility of an
-additional active rule in the context after the @code{ID} at the beginning of
+additional active rule in the context after the @code{"id"} at the beginning of
@code{return_spec}. This rule is not active in the corresponding context
in a @code{param_spec}, so the two contexts receive distinct parser states.
-As long as the token @code{BOGUS} is never generated by @code{yylex},
+As long as the token @code{"bogus"} is never generated by @code{yylex},
the added rule cannot alter the way actual input is parsed.
In this particular example, there is another way to solve the problem:
-rewrite the rule for @code{return_spec} to use @code{ID} directly
+rewrite the rule for @code{return_spec} to use @code{"id"} directly
instead of via @code{name}. This also causes the two confusing
contexts to have different sets of active rules, because the one for
@code{return_spec} activates the altered rule for @code{return_spec}
rather than the one for @code{name}.
@example
+@group
param_spec:
type
| name_list ':' type
;
+@end group
+
+@group
return_spec:
type
-| ID ':' type
+| "id" ':' type
;
+@end group
@end example
For a more detailed exposition of LALR(1) parsers and parser
parser table construction algorithm by using the @code{%define lr.type}
directive.
-@deffn {Directive} {%define lr.type @var{TYPE}}
+@deffn {Directive} {%define lr.type} @var{type}
Specify the type of parser tables within the LR(1) family. The accepted
-values for @var{TYPE} are:
+values for @var{type} are:
@itemize
@item @code{lalr} (default)
@cindex GLR with LALR
When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
-conflicts statically (for example, with @code{%left} or @code{%prec}), then
+conflicts statically (for example, with @code{%left} or @code{%precedence}),
+then
the parser explores all potential parses of any given input. In this case,
the choice of parser table construction algorithm is guaranteed not to alter
the language accepted by the parser. LALR parser tables are the smallest
To adjust which states have default reductions enabled, use the
@code{%define lr.default-reduction} directive.
-@deffn {Directive} {%define lr.default-reduction @var{WHERE}}
+@deffn {Directive} {%define lr.default-reduction} @var{where}
Specify the kind of states that are permitted to contain default reductions.
-The accepted values of @var{WHERE} are:
+The accepted values of @var{where} are:
@itemize
@item @code{most} (default for LALR and IELR)
@item @code{consistent}
sacrificing @code{%nonassoc}, default reductions, or state merging. You can
enable LAC with the @code{%define parse.lac} directive.
-@deffn {Directive} {%define parse.lac @var{VALUE}}
+@deffn {Directive} {%define parse.lac} @var{value}
Enable LAC to improve syntax error handling.
@itemize
@item @code{none} (default)
keeping unreachable states is sometimes useful when trying to understand the
relationship between the parser and the grammar.
-@deffn {Directive} {%define lr.keep-unreachable-state @var{VALUE}}
+@deffn {Directive} {%define lr.keep-unreachable-state} @var{value}
Request that Bison allow unreachable states to remain in the parser tables.
-@var{VALUE} must be a Boolean. The default is @code{false}.
+@var{value} must be a Boolean. The default is @code{false}.
@end deffn
There are a few caveats to consider:
@example
stmts:
- /* empty string */
+ %empty
| stmts '\n'
| stmts exp '\n'
| stmts error '\n'
Developing a parser can be a challenge, especially if you don't understand
the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This
-chapter explains how to generate and read the detailed description of the
-automaton, and how to enable and understand the parser run-time traces.
+chapter explains how understand and debug a parser.
+
+The first sections focus on the static part of the parser: its structure.
+They explain how to generate and read the detailed description of the
+automaton. There are several formats available:
+@itemize @minus
+@item
+as text, see @ref{Understanding, , Understanding Your Parser};
+
+@item
+as a graph, see @ref{Graphviz,, Visualizing Your Parser};
+
+@item
+or as a markup report that can be turned, for instance, into HTML, see
+@ref{Xml,, Visualizing your parser in multiple formats}.
+@end itemize
+
+The last section focuses on the dynamic part of the parser: how to enable
+and understand the parser run-time traces (@pxref{Tracing, ,Tracing Your
+Parser}).
@menu
* Understanding:: Understanding the structure of your parser.
* Graphviz:: Getting a visual representation of the parser.
+* Xml:: Getting a markup representation of the parser.
* Tracing:: Tracing the execution of your parser.
@end menu
As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm})
Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
frequent than one would hope), looking at this automaton is required to
-tune or simply fix a parser. Bison provides two different
-representation of it, either textually or graphically (as a DOT file).
+tune or simply fix a parser.
The textual file is generated when the options @option{--report} or
@option{--verbose} are specified, see @ref{Invocation, , Invoking
@example
%token NUM STR
+@group
%left '+' '-'
%left '*'
+@end group
%%
+@group
exp:
exp '+' exp
| exp '-' exp
| exp '/' exp
| NUM
;
+@end group
useless: STR;
%%
@end example
@example
calc.y: warning: 1 nonterminal useless in grammar
calc.y: warning: 1 rule useless in grammar
-calc.y:11.1-7: warning: nonterminal useless in grammar: useless
-calc.y:11.10-12: warning: rule useless in grammar: useless: STR
+calc.y:12.1-7: warning: nonterminal useless in grammar: useless
+calc.y:12.10-12: warning: rule useless in grammar: useless: STR
calc.y: conflicts: 7 shift/reduce
@end example
the location of the input cursor.
@example
-state 0
+State 0
0 $accept: . exp $end
@option{--report=itemset} to list the derived items as well:
@example
-state 0
+State 0
0 $accept: . exp $end
1 exp: . exp '+' exp
In the state 1@dots{}
@example
-state 1
+State 1
5 exp: NUM .
@noindent
the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token
(@samp{$default}), the parser will reduce it. If it was coming from
-state 0, then, after this reduction it will return to state 0, and will
+State 0, then, after this reduction it will return to state 0, and will
jump to state 2 (@samp{exp: go to state 2}).
@example
-state 2
+State 2
0 $accept: exp . $end
1 exp: exp . '+' exp
state}:
@example
-state 3
+State 3
0 $accept: exp $end .
the reader.
@example
-state 4
+State 4
1 exp: exp '+' . exp
exp go to state 8
-state 5
+State 5
2 exp: exp '-' . exp
exp go to state 9
-state 6
+State 6
3 exp: exp '*' . exp
exp go to state 10
-state 7
+State 7
4 exp: exp '/' . exp
1 shift/reduce}:
@example
-state 8
+State 8
1 exp: exp . '+' exp
1 | exp '+' exp .
@option{--report=lookahead}, Bison specifies these lookahead tokens:
@example
-state 8
+State 8
1 exp: exp . '+' exp
1 | exp '+' exp . [$end, '+', '-', '/']
@example
@group
-state 9
+State 9
1 exp: exp . '+' exp
2 | exp . '-' exp
@end group
@group
-state 10
+State 10
1 exp: exp . '+' exp
2 | exp . '-' exp
@end group
@group
-state 11
+State 11
1 exp: exp . '+' exp
2 | exp . '-' exp
@noindent
Observe that state 11 contains conflicts not only due to the lack of
-precedence of @samp{/} with respect to @samp{+}, @samp{-}, and
-@samp{*}, but also because the
-associativity of @samp{/} is not specified.
+precedence of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but
+also because the associativity of @samp{/} is not specified.
+
+Bison may also produce an HTML version of this output, via an XML file and
+XSLT processing (@pxref{Xml,,Visualizing your parser in multiple formats}).
@c ================================================= Graphical Representation
fail due to memory exhaustion). This option was rather designed for beginners,
to help them understand LR parsers.
-This file is generated when the @option{--graph} option is specified (see
-@pxref{Invocation, , Invoking Bison}). Its name is made by removing
+This file is generated when the @option{--graph} option is specified
+(@pxref{Invocation, , Invoking Bison}). Its name is made by removing
@samp{.tab.c} or @samp{.c} from the parser implementation file name, and
adding @samp{.dot} instead. If the grammar file is @file{foo.y}, the
-Graphviz output file is called @file{foo.dot}.
+Graphviz output file is called @file{foo.dot}. A DOT file may also be
+produced via an XML file and XSLT processing (@pxref{Xml,,Visualizing your
+parser in multiple formats}).
+
The following grammar file, @file{rr.y}, will be used in the sequel:
@end group
@end example
-The graphical output is very similar to the textual one, and as such it is
-easier understood by making direct comparisons between them. See
-@ref{Debugging, , Debugging Your Parser} for a detailled analysis of the
-textual report.
+The graphical output
+@ifnotinfo
+(see @ref{fig:graph})
+@end ifnotinfo
+is very similar to the textual one, and as such it is easier understood by
+making direct comparisons between them. @xref{Debugging, , Debugging Your
+Parser}, for a detailled analysis of the textual report.
+
+@ifnotinfo
+@float Figure,fig:graph
+@image{figs/example, 430pt}
+@caption{A graphical rendering of the parser.}
+@end float
+@end ifnotinfo
@subheading Graphical Representation of States
@example
@group
-state 3
+State 3
1 exp: a . ";"
This is how reductions are represented in the verbose file @file{rr.output}:
@example
-state 1
+State 1
3 a: "0" . [";"]
4 b: "0" . ["."]
are distinguished by a red filling color on these nodes, just like how they are
reported between square brackets in the verbose file.
-The reduction corresponding to the rule number 0 is the acceptation state. It
-is shown as a blue diamond, labelled "Acc".
+The reduction corresponding to the rule number 0 is the acceptation
+state. It is shown as a blue diamond, labelled ``Acc''.
@subheading Graphical representation of go tos
The @samp{go to} jump transitions are represented as dotted lines bearing
the name of the rule being jumped to.
+@c ================================================= XML
+
+@node Xml
+@section Visualizing your parser in multiple formats
+@cindex xml
+
+Bison supports two major report formats: textual output
+(@pxref{Understanding, ,Understanding Your Parser}) when invoked
+with option @option{--verbose}, and DOT
+(@pxref{Graphviz,, Visualizing Your Parser}) when invoked with
+option @option{--graph}. However,
+another alternative is to output an XML file that may then be, with
+@command{xsltproc}, rendered as either a raw text format equivalent to the
+verbose file, or as an HTML version of the same file, with clickable
+transitions, or even as a DOT. The @file{.output} and DOT files obtained via
+XSLT have no difference whatsoever with those obtained by invoking
+@command{bison} with options @option{--verbose} or @option{--graph}.
+
+The XML file is generated when the options @option{-x} or
+@option{--xml[=FILE]} are specified, see @ref{Invocation,,Invoking Bison}.
+If not specified, its name is made by removing @samp{.tab.c} or @samp{.c}
+from the parser implementation file name, and adding @samp{.xml} instead.
+For instance, if the grammar file is @file{foo.y}, the default XML output
+file is @file{foo.xml}.
+
+Bison ships with a @file{data/xslt} directory, containing XSL Transformation
+files to apply to the XML file. Their names are non-ambiguous:
+
+@table @file
+@item xml2dot.xsl
+Used to output a copy of the DOT visualization of the automaton.
+@item xml2text.xsl
+Used to output a copy of the @samp{.output} file.
+@item xml2xhtml.xsl
+Used to output an xhtml enhancement of the @samp{.output} file.
+@end table
+
+Sample usage (requires @command{xsltproc}):
+@example
+$ bison -x gr.y
+@group
+$ bison --print-datadir
+/usr/local/share/bison
+@end group
+$ xsltproc /usr/local/share/bison/xslt/xml2xhtml.xsl gr.xml >gr.html
+@end example
+
@c ================================================= Tracing
@node Tracing
@noindent
The previous reduction demonstrates the @code{%printer} directive for
-@code{<val>}: both the token @code{NUM} and the resulting non-terminal
+@code{<val>}: both the token @code{NUM} and the resulting nonterminal
@code{exp} have @samp{1} as value.
@example
short option. It is followed by a cross key alphabetized by long
option.
-@c Please, keep this ordered as in `bison --help'.
+@c Please, keep this ordered as in 'bison --help'.
@noindent
Operations modes:
@table @option
Deprecated constructs whose support will be removed in future versions of
Bison.
+@item empty-rule
+Empty rules without @code{%empty}. @xref{Empty Rules}. Disabled by
+default, but enabled by uses of @code{%empty}, unless
+@option{-Wno-empty-rule} was specified.
+
+@item precedence
+Useless precedence and associativity directives. Disabled by default.
+
+Consider for instance the following grammar:
+
+@example
+@group
+%nonassoc "="
+%left "+"
+%left "*"
+%precedence "("
+@end group
+%%
+@group
+stmt:
+ exp
+| "var" "=" exp
+;
+@end group
+
+@group
+exp:
+ exp "+" exp
+| exp "*" "num"
+| "(" exp ")"
+| "num"
+;
+@end group
+@end example
+
+Bison reports:
+
+@c cannot leave the location and the [-Wprecedence] for lack of
+@c width in PDF.
+@example
+@group
+warning: useless precedence and associativity for "="
+ %nonassoc "="
+ ^^^
+@end group
+@group
+warning: useless associativity for "*", use %precedence
+ %left "*"
+ ^^^
+@end group
+@group
+warning: useless precedence for "("
+ %precedence "("
+ ^^^
+@end group
+@end example
+
+One would get the exact same parser with the following directives instead:
+
+@example
+@group
+%left "+"
+%precedence "*"
+@end group
+@end example
+
@item other
All warnings not categorized above. These warnings are enabled by default.
categories.
@item all
-All the warnings.
+All the warnings except @code{yacc}.
+
@item none
Turn off all the warnings.
+
@item error
See @option{-Werror}, below.
@end table
$ bison -Werror=yacc,conflicts-sr input.y
$ bison -Werror=yacc,error=conflicts-sr input.y
@end example
+
+@item -f [@var{feature}]
+@itemx --feature[=@var{feature}]
+Activate miscellaneous @var{feature}. @var{feature} can be one of:
+@table @code
+@item caret
+@itemx diagnostics-show-caret
+Show caret errors, in a manner similar to GCC's
+@option{-fdiagnostics-show-caret}, or Clang's @option{-fcaret-diagnotics}. The
+location provided with the message is used to quote the corresponding line of
+the source file, underlining the important part of it with carets (^). Here is
+an example, using the following file @file{in.y}:
+
+@example
+%type <ival> exp
+%%
+exp: exp '+' exp @{ $exp = $1 + $2; @};
+@end example
+
+When invoked with @option{-fcaret} (or nothing), Bison will report:
+
+@example
+@group
+in.y:3.20-23: error: ambiguous reference: '$exp'
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^^
+@end group
+@group
+in.y:3.1-3: refers to: $exp at $$
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^
+@end group
+@group
+in.y:3.6-8: refers to: $exp at $1
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^
+@end group
+@group
+in.y:3.14-16: refers to: $exp at $3
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^
+@end group
+@group
+in.y:3.32-33: error: $2 of 'exp' has no declared type
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^
+@end group
+@end example
+
+Whereas, when invoked with @option{-fno-caret}, Bison will only report:
+
+@example
+@group
+in.y:3.20-23: error: ambiguous reference: ‘$exp’
+in.y:3.1-3: refers to: $exp at $$
+in.y:3.6-8: refers to: $exp at $1
+in.y:3.14-16: refers to: $exp at $3
+in.y:3.32-33: error: $2 of ‘exp’ has no declared type
+@end group
+@end example
+
+This option is activated by default.
+
+@end table
@end table
@noindent
Summary}). Currently supported languages include C, C++, and Java.
@var{language} is case-insensitive.
-This option is experimental and its effect may be modified in future
-releases.
-
@item --locations
Pretend that @code{%locations} was specified. @xref{Decl Summary}.
@node C++ Variants
@subsubsection C++ Variants
-Starting with version 2.6, Bison provides a @emph{variant} based
-implementation of semantic values for C++. This alleviates all the
-limitations reported in the previous section, and in particular, object
-types can be used without pointers.
+Bison provides a @emph{variant} based implementation of semantic values for
+C++. This alleviates all the limitations reported in the previous section,
+and in particular, object types can be used without pointers.
To enable variant-based semantic values, set @code{%define} variable
@code{variant} (@pxref{%define Summary,, variant}). Once this defined,
type on all platforms, alignments are enforced for @code{double} whatever
types are actually used. This may waste space in some cases.
-@item
-Our implementation is not conforming with strict aliasing rules. Alias
-analysis is a technique used in optimizing compilers to detect when two
-pointers are disjoint (they cannot ``meet''). Our implementation breaks
-some of the rules that G++ 4.4 uses in its alias analysis, so @emph{strict
-alias analysis must be disabled}. Use the option
-@option{-fno-strict-aliasing} to compile the generated parser.
-
@item
There might be portability issues we are not aware of.
@end itemize
@node Split Symbols
@subsubsection Split Symbols
-Therefore the interface is as follows.
+The interface is as follows.
@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...)
@deftypemethodx {parser} {int} yylex (semantic_type* @var{yylval}, @var{type1} @var{arg1}, ...)
@node Complete Symbols
@subsubsection Complete Symbols
-If you specified both @code{%define variant} and
+If you specified both @code{%define api.value.type variant} and
@code{%define api.token.constructor},
the @code{parser} class also defines the class @code{parser::symbol_type}
which defines a @emph{complete} symbol, aggregating its type (i.e., the
@noindent
@findex %define api.token.constructor
-@findex %define variant
+@findex %define api.value.type variant
This example will use genuine C++ objects as semantic values, therefore, we
require the variant-based interface. To make sure we properly use it, we
enable assertions. To fully benefit from type-safety and more natural
@comment file: calc++-parser.yy
@example
%define api.token.constructor
+%define api.value.type variant
%define parse.assert
-%define variant
@end example
@noindent
unit: assignments exp @{ driver.result = $2; @};
assignments:
- /* Nothing. */ @{@}
+ %empty @{@}
| assignments assignment @{@};
assignment:
@comment file: calc++-scanner.ll
@example
-%option noyywrap nounput batch debug
+%option noyywrap nounput batch debug noinput
@end example
@noindent
Contrary to C parsers, Java parsers do not use global variables; the
state of the parser is always local to an instance of the parser class.
Therefore, all Java parsers are ``pure'', and the @code{%pure-parser}
-and @samp{%define api.pure} directives does not do anything when used in
-Java.
+and @code{%define api.pure} directives do nothing when used in Java.
Push parsers are currently unsupported in Java and @code{%define
api.push-pull} have no effect.
By default, the semantic stack is declared to have @code{Object} members,
which means that the class types you specify can be of any class.
To improve the type safety of the parser, you can declare the common
-superclass of all the semantic values using the @samp{%define stype}
+superclass of all the semantic values using the @samp{%define api.value.type}
directive. For example, after the following declaration:
@example
-%define stype "ASTNode"
+%define api.value.type "ASTNode"
@end example
@noindent
@deftypemethod {Lexer} {Object} getLVal ()
Return the semantic value of the last token that yylex returned.
-The return type can be changed using @samp{%define stype
+The return type can be changed using @samp{%define api.value.type
"@var{class-name}".}
@end deftypemethod
@defvar $$
The semantic value for the grouping made by the current rule. As a
value, this is in the base type (@code{Object} or as specified by
-@samp{%define stype}) as in not cast to the declared subtype because
+@samp{%define api.value.type}) as in not cast to the declared subtype because
casts are not allowed on the left-hand side of Java assignments.
Use an explicit Java cast if the correct subtype is needed.
@xref{Java Semantic Values}.
@item
Java lacks unions, so @code{%union} has no effect. Instead, semantic
values have a common base type: @code{Object} or as specified by
-@samp{%define stype}. Angle brackets on @code{%token}, @code{type},
+@samp{%define api.value.type}. Angle brackets on @code{%token}, @code{type},
@code{$@var{n}} and @code{$$} specify subtypes rather than fields of
an union. The type of @code{$$}, even with angle brackets, is the base
type since Java casts are not allow on the left-hand side of assignments.
@xref{Java Bison Interface}.
@end deffn
-@deffn {Directive} {%define stype} "@var{class}"
+@deffn {Directive} {%define api.value.type} "@var{class}"
The base type of semantic values. Default is @code{Object}.
@xref{Java Semantic Values}.
@end deffn
@quotation
My parser includes support for an @samp{#include}-like feature, in
which case I run @code{yyparse} from @code{yyparse}. This fails
-although I did specify @samp{%define api.pure}.
+although I did specify @samp{%define api.pure full}.
@end quotation
These problems typically come not from Bison itself, but from
version. If you have trouble compiling, you should also include a
transcript of the build session, starting with the invocation of
`configure'. Depending on the nature of the bug, you may be asked to
-send additional files as well (such as `config.h' or `config.cache').
+send additional files as well (such as @file{config.h} or @file{config.cache}).
Patches are most welcome, but not required. That is, do not hesitate to
send a bug report just because you cannot provide a fix.
@end deffn
@deffn {Variable} @@@var{n}
+@deffnx {Symbol} @@@var{n}
In an action, the location of the @var{n}-th symbol of the right-hand side
of the rule. @xref{Tracking Locations}.
+
+In a grammar, the Bison-generated nonterminal symbol for a mid-rule action
+with a semantical value. @xref{Mid-Rule Action Translation}.
@end deffn
@deffn {Variable} @@@var{name}
-In an action, the location of a symbol addressed by name. @xref{Tracking
-Locations}.
+@deffnx {Variable} @@[@var{name}]
+In an action, the location of a symbol addressed by @var{name}.
+@xref{Tracking Locations}.
@end deffn
-@deffn {Variable} @@[@var{name}]
-In an action, the location of a symbol addressed by name. @xref{Tracking
-Locations}.
+@deffn {Symbol} $@@@var{n}
+In a grammar, the Bison-generated nonterminal symbol for a mid-rule action
+with no semantical value. @xref{Mid-Rule Action Translation}.
@end deffn
@deffn {Variable} $$
@end deffn
@deffn {Variable} $@var{name}
-In an action, the semantic value of a symbol addressed by name.
-@xref{Actions}.
-@end deffn
-
-@deffn {Variable} $[@var{name}]
-In an action, the semantic value of a symbol addressed by name.
+@deffnx {Variable} $[@var{name}]
+In an action, the semantic value of a symbol addressed by @var{name}.
@xref{Actions}.
@end deffn
feature.
@end deffn
-@deffn {Construct} /*@dots{}*/
-Comment delimiters, as in C.
+@deffn {Construct} /* @dots{} */
+@deffnx {Construct} // @dots{}
+Comments, as in C/C++.
@end deffn
@deffn {Delimiter} :
GLR Parsers}.
@end deffn
+@deffn {Directive} %empty
+Bison declaration to declare make explicit that a rule has an empty
+right-hand side. @xref{Empty Rules}.
+@end deffn
+
@deffn {Symbol} $end
The predefined token marking the end of the token stream. It cannot be
used in the grammar.
@code{yylex}}.
@end deffn
-@deffn {Macro} YYLEX_PARAM
-An obsolete macro for specifying an extra argument (or list of extra
-arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
-macro is deprecated, and is supported only for Yacc like parsers.
-@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
-@end deffn
-
@deffn {Variable} yylloc
External variable in which @code{yylex} should place the line and column
numbers associated with a token. (In a pure parser, it is a local
@deffn {Variable} yynerrs
Global variable which Bison increments each time it reports a syntax error.
(In a pure parser, it is a local variable within @code{yyparse}. In a
-pure push parser, it is a member of yypstate.)
+pure push parser, it is a member of @code{yypstate}.)
@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
@end deffn
@item Accepting state
A state whose only action is the accept action.
The accepting state is thus a consistent state.
-@xref{Understanding,,}.
+@xref{Understanding, ,Understanding Your Parser}.
@item Backus-Naur Form (BNF; also called ``Backus Normal Form'')
Formal method of specifying context-free grammars originally proposed
@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy ints
@c LocalWords: Scannerless ispell american ChangeLog smallexample CSTYPE CLTYPE
@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate LocationType
-@c LocalWords: errorVerbose
+@c LocalWords: parsers parser's
+@c LocalWords: associativity subclasses precedences unresolvable runnable
+@c LocalWords: allocators subunit initializations unreferenced untyped
+@c LocalWords: errorVerbose subtype subtypes
@c Local Variables:
@c ispell-dictionary: "american"