This says when, why and how to use the exceptional
action in the middle of a rule.
+Actions in Mid-Rule
+
+* Using Mid-Rule Actions:: Putting an action in the middle of a rule.
+* Mid-Rule Action Translation:: How mid-rule actions are actually processed.
+* Mid-Rule Conflicts:: Mid-rule actions can cause conflicts.
+
Tracking Locations
* Location Type:: Specifying a data type for locations.
Debugging Your Parser
* Understanding:: Understanding the structure of your parser.
+* Graphviz:: Getting a visual representation of the parser.
+* Xml:: Getting a markup representation of the parser.
* Tracing:: Tracing the execution of your parser.
Tracing Your Parser
* C++ position:: One point in the source file
* C++ location:: Two points in the source file
+* User Defined Location Type:: Required interface for locations
A Complete C++ Example
void
yyerror (char const *s)
@{
- printf ("%s\n", s);
+ fprintf (stderr, "%s\n", s);
@}
@end group
@node Grammar Outline
@section Outline of a Bison Grammar
+@cindex comment
+@findex // @dots{}
+@findex /* @dots{} */
A Bison grammar file has four main sections, shown here with the
appropriate delimiters:
@end example
Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
-As a GNU extension, @samp{//} introduces a comment that
-continues until end of line.
+As a GNU extension, @samp{//} introduces a comment that continues until end
+of line.
@menu
* Prologue:: Syntax and usage of the prologue.
These actions are written just like usual end-of-rule actions, but they
are executed before the parser even recognizes the following components.
+@menu
+* Using Mid-Rule Actions:: Putting an action in the middle of a rule.
+* Mid-Rule Action Translation:: How mid-rule actions are actually processed.
+* Mid-Rule Conflicts:: Mid-rule actions can cause conflicts.
+@end menu
+
+@node Using Mid-Rule Actions
+@subsubsection Using Mid-Rule Actions
+
A mid-rule action may refer to the components preceding it using
@code{$@var{n}}, but it may not refer to subsequent components because
it is run before they are parsed.
@example
@group
stmt:
- LET '(' var ')'
- @{ $<context>$ = push_context (); declare_variable ($3); @}
+ "let" '(' var ')'
+ @{
+ $<context>$ = push_context ();
+ declare_variable ($3);
+ @}
stmt
- @{ $$ = $6; pop_context ($<context>5); @}
+ @{
+ $$ = $6;
+ pop_context ($<context>5);
+ @}
@end group
@end example
@code{context} in the data-type union. Then it calls
@code{declare_variable} to add the new variable to that list. Once the
first action is finished, the embedded statement @code{stmt} can be
-parsed. Note that the mid-rule action is component number 5, so the
-@samp{stmt} is component number 6.
+parsed.
+
+Note that the mid-rule action is component number 5, so the @samp{stmt} is
+component number 6. Named references can be used to improve the readability
+and maintainability (@pxref{Named References}):
+
+@example
+@group
+stmt:
+ "let" '(' var ')'
+ @{
+ $<context>let = push_context ();
+ declare_variable ($3);
+ @}[let]
+ stmt
+ @{
+ $$ = $6;
+ pop_context ($<context>let);
+ @}
+@end group
+@end example
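The helpers called from the rule above are not defined in this text. As a minimal sketch, assuming that a context is simply the depth of a scope stack and that each scope is a set of variable names (both are assumptions made for illustration, as is the extra helper `is_declared`), they could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical scope stack: one set of variable names per open scope.
// It starts with a single global scope.
static std::vector<std::set<std::string>> scopes(1);

// Open a new scope and return a handle: the depth to restore later.
std::size_t push_context() {
  std::size_t saved = scopes.size();
  scopes.emplace_back();
  return saved;
}

// Add a variable to the innermost scope.
void declare_variable(const std::string& name) {
  scopes.back().insert(name);
}

// Discard every scope opened since the matching push_context call.
void pop_context(std::size_t saved) {
  scopes.resize(saved);
}

// Hypothetical query helper: true if the name is visible in any open scope.
bool is_declared(const std::string& name) {
  for (const auto& scope : scopes)
    if (scope.count(name))
      return true;
  return false;
}
```

With these definitions, the mid-rule action saves the handle returned by `push_context`, and the end-of-rule action hands it back to `pop_context`, so the variable is visible only while the embedded statement is parsed.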
After the embedded statement is parsed, its semantic value becomes the
value of the entire @code{let}-statement. Then the semantic value from the
let stmt
@{
$$ = $2;
- pop_context ($1);
+ pop_context ($let);
@};
let:
- LET '(' var ')'
+ "let" '(' var ')'
@{
- $$ = push_context ();
+ $let = push_context ();
declare_variable ($3);
@};
Any mid-rule action can be converted to an end-of-rule action in this way, and
this is what Bison actually does to implement mid-rule actions.
+@node Mid-Rule Action Translation
+@subsubsection Mid-Rule Action Translation
+@vindex $@@@var{n}
+@vindex @@@var{n}
+
+As hinted earlier, mid-rule actions are actually transformed into regular
+rules and actions. The various reports generated by Bison (textual,
+graphical, etc., see @ref{Understanding, , Understanding Your Parser})
+reveal this translation, best explained by means of an example. The
+following rule:
+
+@example
+exp: @{ a(); @} "b" @{ c(); @} @{ d(); @} "e" @{ f(); @};
+@end example
+
+@noindent
+is translated into:
+
+@example
+$@@1: /* empty */ @{ a(); @};
+$@@2: /* empty */ @{ c(); @};
+$@@3: /* empty */ @{ d(); @};
+exp: $@@1 "b" $@@2 $@@3 "e" @{ f(); @};
+@end example
+
+@noindent
+with new nonterminal symbols @code{$@@@var{n}}, where @var{n} is a number.
+
+A mid-rule action is expected to generate a value if it uses @code{$$}, or
+if the (final) action uses @code{$@var{n}} where @var{n} denotes the mid-rule
+action. In that case its nonterminal is named @code{@@@var{n}} instead:
+
+@example
+exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
+@end example
+
+@noindent
+is translated into:
+
+@example
+@@1: /* empty */ @{ a(); @};
+@@2: /* empty */ @{ $$ = c(); @};
+$@@3: /* empty */ @{ d(); @};
+exp: @@1 "b" @@2 $@@3 "e" @{ f = $1; @};
+@end example
+
+There are probably two errors in the above example: the first mid-rule
+action does not generate a value (it does not use @code{$$} although the
+final action uses it), and the value of the second one is not used (the
+final action does not use @code{$3}). Bison reports these errors when the
+@code{midrule-value} warnings are enabled (@pxref{Invocation, ,Invoking
+Bison}):
+
+@example
+$ bison -fcaret -Wmidrule-value mid.y
+@group
+mid.y:2.6-13: warning: unset value: $$
+ exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
+ ^^^^^^^^
+@end group
+@group
+mid.y:2.19-31: warning: unused value: $3
+ exp: @{ a(); @} "b" @{ $$ = c(); @} @{ d(); @} "e" @{ f = $1; @};
+ ^^^^^^^^^^^^^
+@end group
+@end example
+
+
+@node Mid-Rule Conflicts
+@subsubsection Conflicts due to Mid-Rule Actions
Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute the
action. For example, the following two rules, without mid-rule actions,
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.
+
@node Tracking Locations
@section Tracking Locations
@cindex location
the current lookahead and the entire stack (except the current
right-hand side symbols) when the parser returns immediately, and
@item
+the current lookahead and the entire stack (including the current right-hand
+side symbols) when the C++ parser (@file{lalr1.cc}) catches an exception in
+@code{parse},
+@item
the start symbol, when the parser succeeds.
@end itemize
reentrant. It looks like this:
@example
-%define api.pure
+%define api.pure full
@end example
The result is that the communication variables @code{yylval} and
what you are doing, your declarations should look like this:
@example
-%define api.pure
+%define api.pure full
%define api.push-pull push
@end example
yypstate_delete (ps);
@end example
-Adding the @code{%define api.pure} declaration does exactly the same thing to
-the generated parser with @code{%define api.push-pull both} as it did for
+Adding the @code{%define api.pure full} declaration does exactly the same thing
+to the generated parser with @code{%define api.push-pull both} as it did for
@code{%define api.push-pull push}.
@node Decl Summary
supported languages include C, C++, and Java.
@var{language} is case-insensitive.
-This directive is experimental and its effect may be modified in future
-releases.
@end deffn
@deffn {Directive} %locations
Some of the accepted @var{variable}s are:
@itemize @bullet
+@c ================================================== api.location.type
+@item @code{api.location.type}
+@findex %define api.location.type
+
+@itemize @bullet
+@item Language(s): C++, Java
+
+@item Purpose: Define the location type.
+@xref{User Defined Location Type}.
+
+@item Accepted Values: String
+
+@item Default Value: none
+
+@item History: introduced in Bison 2.7
+@end itemize
+
@c ================================================== api.prefix
@item @code{api.prefix}
@findex %define api.prefix
@itemize @bullet
@item Language(s): All
-@item Purpose: Rename exported symbols
+@item Purpose: Rename exported symbols.
@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
@item Accepted Values: String
@item Purpose: Request a pure (reentrant) parser program.
@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
-@item Accepted Values: Boolean
+@item Accepted Values: @code{true}, @code{false}, @code{full}
+
+The value may be omitted: this is equivalent to specifying @code{true}, as is
+the case for Boolean values.
+
+When @code{%define api.pure full} is used, the parser is made reentrant. This
+changes the signature for @code{yylex} (@pxref{Pure Calling}), and also that of
+@code{yyerror} when the tracking of locations has been activated, as shown
+below.
+
+The @code{true} value is very similar to the @code{full} value; the only
+difference is in the signature of @code{yyerror} on Yacc parsers without
+@code{%parse-param}, for historical reasons.
+
+I.e., if @samp{%locations %define api.pure} is passed then the prototypes for
+@code{yyerror} are:
+
+@example
+void yyerror (char const *msg); // Yacc parsers.
+void yyerror (YYLTYPE *locp, char const *msg); // GLR parsers.
+@end example
+
+But if @samp{%locations %define api.pure %parse-param @{int *nastiness@}} is
+used, then both parsers have the same signature:
+
+@example
+void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg);
+@end example
+
+(@pxref{Error Reporting, ,The Error
+Reporting Function @code{yyerror}})
@item Default Value: @code{false}
+
+@item History: the @code{full} value was introduced in Bison 2.7
@end itemize
@c ================================================== api.push-pull
exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
@end example
+@noindent
+Using the following:
+@example
+%parse-param @{int *randomness@}
+@end example
+
+Results in these signatures:
+@example
+void yyerror (int *randomness, const char *msg);
+int yyparse (int *randomness);
+@end example
+
+@noindent
+Or, if both @code{%define api.pure full} (or just @code{%define api.pure})
+and @code{%locations} are used:
+
+@example
+void yyerror (YYLTYPE *llocp, int *randomness, const char *msg);
+int yyparse (int *randomness);
+@end example
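As an illustration of the last signature, here is a minimal sketch of an error reporter. The `YYLTYPE` struct is a stand-in written out by hand so that the block compiles on its own (in a real parser the generated code provides it), and `format_error` is a hypothetical helper, not part of Bison:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for the location type; assumed to carry first/last line and column.
struct YYLTYPE {
  int first_line;
  int first_column;
  int last_line;
  int last_column;
};

// Hypothetical helper: build "LINE.COL-LINE.COL: MESSAGE".
std::string format_error(const YYLTYPE* llocp, const char* msg) {
  char buf[64];
  std::snprintf(buf, sizeof buf, "%d.%d-%d.%d: ",
                llocp->first_line, llocp->first_column,
                llocp->last_line, llocp->last_column);
  return std::string(buf) + msg;
}

// Matches the signature shown above for %locations with a pure parser
// and %parse-param {int *randomness}.
void yyerror(YYLTYPE* llocp, int* randomness, const char* msg) {
  (void) randomness;  // Unused in this sketch.
  std::fprintf(stderr, "%s\n", format_error(llocp, msg).c_str());
}
```

Keeping the formatting in a separate function makes the message easy to check without capturing the standard error stream.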
+
@node Push Parser Function
@section The Push Parser Function @code{yypush_parse}
@findex yypush_parse
@node Pure Calling
@subsection Calling Conventions for Pure Parsers
-When you use the Bison declaration @code{%define api.pure} to request a
+When you use the Bison declaration @code{%define api.pure full} to request a
pure, reentrant parser, the global communication variables @code{yylval}
and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
Parser}.) In such parsers the two global variables are replaced by
additional @code{yylex} argument declaration.
@end deffn
+@noindent
For instance:
@example
-%parse-param @{int *nastiness@}
%lex-param @{int *nastiness@}
-%parse-param @{int *randomness@}
@end example
@noindent
-results in the following signatures:
+results in the following signature:
@example
-int yylex (int *nastiness);
-int yyparse (int *nastiness, int *randomness);
-@end example
-
-If @code{%define api.pure} is added:
-
-@example
-int yylex (YYSTYPE *lvalp, int *nastiness);
-int yyparse (int *nastiness, int *randomness);
+int yylex (int *nastiness);
@end example
@noindent
-and finally, if both @code{%define api.pure} and @code{%locations} are used:
+If @code{%define api.pure full} (or just @code{%define api.pure}) is added:
@example
-int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
-int yyparse (int *nastiness, int *randomness);
+int yylex (YYSTYPE *lvalp, int *nastiness);
@end example
@node Error Reporting
immediately return 1.
Obviously, in location tracking pure parsers, @code{yyerror} should have
-an access to the current location.
-This is indeed the case for the GLR
-parsers, but not for the Yacc parser, for historical reasons. I.e., if
-@samp{%locations %define api.pure} is passed then the prototypes for
-@code{yyerror} are:
-
-@example
-void yyerror (char const *msg); /* Yacc parsers. */
-void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */
-@end example
+access to the current location. With @code{%define api.pure}, this is
+indeed the case for the GLR parsers, but not for the Yacc parser, for
+historical reasons, which is why @code{%define api.pure full} should be
+preferred over @code{%define api.pure}.
-If @samp{%parse-param @{int *nastiness@}} is used, then:
+When @code{%locations %define api.pure full} is used, @code{yyerror} has the
+following signature:
@example
-void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */
-void yyerror (int *nastiness, char const *msg); /* GLR parsers. */
-@end example
-
-Finally, GLR and Yacc parsers share the same @code{yyerror} calling
-convention for absolutely pure parsers, i.e., when the calling
-convention of @code{yylex} @emph{and} the calling convention of
-@code{%define api.pure} are pure.
-I.e.:
-
-@example
-/* Location tracking. */
-%locations
-/* Pure yylex. */
-%define api.pure
-%lex-param @{int *nastiness@}
-/* Pure yyparse. */
-%parse-param @{int *nastiness@}
-%parse-param @{int *randomness@}
-@end example
-
-@noindent
-results in the following signatures for all the parser kinds:
-
-@example
-int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
-int yyparse (int *nastiness, int *randomness);
-void yyerror (YYLTYPE *locp,
- int *nastiness, int *randomness,
- char const *msg);
+void yyerror (YYLTYPE *locp, char const *msg);
@end example
@noindent
@end deffn
@deffn {Value} @@$
-@findex @@$
Acts like a structure variable containing information on the textual
location of the grouping made by the current rule. @xref{Tracking
Locations}.
@item
@cindex bison-i18n.m4
Into the directory containing the GNU Autoconf macros used
-by the package---often called @file{m4}---copy the
+by the package ---often called @file{m4}--- copy the
@file{bison-i18n.m4} file installed by Bison under
@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory.
For example:
parser table construction algorithm by using the @code{%define lr.type}
directive.
-@deffn {Directive} {%define lr.type @var{TYPE}}
+@deffn {Directive} {%define lr.type} @var{type}
Specify the type of parser tables within the LR(1) family. The accepted
-values for @var{TYPE} are:
+values for @var{type} are:
@itemize
@item @code{lalr} (default)
To adjust which states have default reductions enabled, use the
@code{%define lr.default-reductions} directive.
-@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
+@deffn {Directive} {%define lr.default-reductions} @var{where}
Specify the kind of states that are permitted to contain default reductions.
-The accepted values of @var{WHERE} are:
+The accepted values of @var{where} are:
@itemize
@item @code{most} (default for LALR and IELR)
@item @code{consistent}
sacrificing @code{%nonassoc}, default reductions, or state merging. You can
enable LAC with the @code{%define parse.lac} directive.
-@deffn {Directive} {%define parse.lac @var{VALUE}}
+@deffn {Directive} {%define parse.lac} @var{value}
Enable LAC to improve syntax error handling.
@itemize
@item @code{none} (default)
keeping unreachable states is sometimes useful when trying to understand the
relationship between the parser and the grammar.
-@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
+@deffn {Directive} {%define lr.keep-unreachable-states} @var{value}
Request that Bison allow unreachable states to remain in the parser tables.
-@var{VALUE} must be a Boolean. The default is @code{false}.
+@var{value} must be a Boolean. The default is @code{false}.
@end deffn
There are a few caveats to consider:
Developing a parser can be a challenge, especially if you don't understand
the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This
-chapter explains how to generate and read the detailed description of the
-automaton, and how to enable and understand the parser run-time traces.
+chapter explains how to understand and debug a parser.
+
+The first sections focus on the static part of the parser: its structure.
+They explain how to generate and read the detailed description of the
+automaton. There are several formats available:
+@itemize @minus
+@item
+as text, see @ref{Understanding, , Understanding Your Parser};
+
+@item
+as a graph, see @ref{Graphviz,, Visualizing Your Parser};
+
+@item
+or as a markup report that can be turned, for instance, into HTML, see
+@ref{Xml,, Visualizing your parser in multiple formats}.
+@end itemize
+
+The last section focuses on the dynamic part of the parser: how to enable
+and understand the parser run-time traces (@pxref{Tracing, ,Tracing Your
+Parser}).
@menu
* Understanding:: Understanding the structure of your parser.
+* Graphviz:: Getting a visual representation of the parser.
+* Xml:: Getting a markup representation of the parser.
* Tracing:: Tracing the execution of your parser.
@end menu
As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm})
Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
frequent than one would hope), looking at this automaton is required to
-tune or simply fix a parser. Bison provides two different
-representation of it, either textually or graphically (as a DOT file).
+tune or simply fix a parser.
The textual file is generated when the options @option{--report} or
@option{--verbose} are specified, see @ref{Invocation, , Invoking
@example
%token NUM STR
+@group
%left '+' '-'
%left '*'
+@end group
%%
+@group
exp:
exp '+' exp
| exp '-' exp
| exp '/' exp
| NUM
;
+@end group
useless: STR;
%%
@end example
@example
calc.y: warning: 1 nonterminal useless in grammar
calc.y: warning: 1 rule useless in grammar
-calc.y:11.1-7: warning: nonterminal useless in grammar: useless
-calc.y:11.10-12: warning: rule useless in grammar: useless: STR
+calc.y:12.1-7: warning: nonterminal useless in grammar: useless
+calc.y:12.10-12: warning: rule useless in grammar: useless: STR
calc.y: conflicts: 7 shift/reduce
@end example
the location of the input cursor.
@example
-state 0
+State 0
0 $accept: . exp $end
@option{--report=itemset} to list the derived items as well:
@example
-state 0
+State 0
0 $accept: . exp $end
1 exp: . exp '+' exp
In the state 1@dots{}
@example
-state 1
+State 1
5 exp: NUM .
@noindent
the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token
(@samp{$default}), the parser will reduce it. If it was coming from
-state 0, then, after this reduction it will return to state 0, and will
+State 0, then, after this reduction it will return to state 0, and will
jump to state 2 (@samp{exp: go to state 2}).
@example
-state 2
+State 2
0 $accept: exp . $end
1 exp: exp . '+' exp
state}:
@example
-state 3
+State 3
0 $accept: exp $end .
the reader.
@example
-state 4
+State 4
1 exp: exp '+' . exp
exp go to state 8
-state 5
+State 5
2 exp: exp '-' . exp
exp go to state 9
-state 6
+State 6
3 exp: exp '*' . exp
exp go to state 10
-state 7
+State 7
4 exp: exp '/' . exp
1 shift/reduce}:
@example
-state 8
+State 8
1 exp: exp . '+' exp
1 | exp '+' exp .
@option{--report=lookahead}, Bison specifies these lookahead tokens:
@example
-state 8
+State 8
1 exp: exp . '+' exp
1 | exp '+' exp . [$end, '+', '-', '/']
@example
@group
-state 9
+State 9
1 exp: exp . '+' exp
2 | exp . '-' exp
@end group
@group
-state 10
+State 10
1 exp: exp . '+' exp
2 | exp . '-' exp
@end group
@group
-state 11
+State 11
1 exp: exp . '+' exp
2 | exp . '-' exp
@noindent
Observe that state 11 contains conflicts not only due to the lack of
-precedence of @samp{/} with respect to @samp{+}, @samp{-}, and
-@samp{*}, but also because the
-associativity of @samp{/} is not specified.
+precedence of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but
+also because the associativity of @samp{/} is not specified.
+
+Bison may also produce an HTML version of this output, via an XML file and
+XSLT processing (@pxref{Xml,,Visualizing your parser in multiple formats}).
+
+@c ================================================= Graphical Representation
+@node Graphviz
+@section Visualizing Your Parser
+@cindex dot
+
+As another means to gain better understanding of the shift/reduce
+automaton corresponding to the Bison parser, a DOT file can be generated. Note
+that debugging a real grammar with this is tedious at best, and impractical
+most of the time, because the generated files are huge (generating a PDF or
+PNG file from one takes a very long time, and more often than not fails due
+to memory exhaustion). This option was designed rather for beginners,
+to help them understand LR parsers.
+
+This file is generated when the @option{--graph} option is specified
+(@pxref{Invocation, , Invoking Bison}). Its name is made by removing
+@samp{.tab.c} or @samp{.c} from the parser implementation file name, and
+adding @samp{.dot} instead. If the grammar file is @file{foo.y}, the
+Graphviz output file is called @file{foo.dot}. A DOT file may also be
+produced via an XML file and XSLT processing (@pxref{Xml,,Visualizing your
+parser in multiple formats}).
+
+
+The following grammar file, @file{rr.y}, will be used in the sequel:
+
+@example
+%%
+@group
+exp: a ";" | b ".";
+a: "0";
+b: "0";
+@end group
+@end example
+
+The graphical output
+@ifnotinfo
+(see @ref{fig:graph})
+@end ifnotinfo
+is very similar to the textual one, and as such it is more easily understood by
+making direct comparisons between them. @xref{Debugging, , Debugging Your
+Parser}, for a detailed analysis of the textual report.
+
+@ifnotinfo
+@float Figure,fig:graph
+@image{figs/example, 430pt}
+@caption{A graphical rendering of the parser.}
+@end float
+@end ifnotinfo
+
+@subheading Graphical Representation of States
+
+The items (pointed rules) for each state are grouped together in graph nodes.
+Their numbering is the same as in the verbose file. See the following points,
+about transitions, for examples.
+
+When invoked with @option{--report=lookaheads}, the lookahead tokens, when
+needed, are shown next to the relevant rule between square brackets as a
+comma-separated list. This is the case in the figure for the representation of
+reductions, below.
+
+@sp 1
+
+The transitions are represented as directed edges between the current and
+the target states.
+
+@subheading Graphical Representation of Shifts
+
+Shifts are shown as solid arrows, labelled with the lookahead token for that
+shift. The following describes a shift in the @file{rr.output} file:
+
+@example
+@group
+State 3
+
+ 1 exp: a . ";"
+
+ ";" shift, and go to state 6
+@end group
+@end example
+
+A Graphviz rendering of this portion of the graph could be:
+
+@center @image{figs/example-shift, 100pt}
+
+@subheading Graphical Representation of Reductions
+
+Reductions are shown as solid arrows, leading to a diamond-shaped node
+bearing the number of the reduction rule. The arrow is labelled with the
+appropriate comma-separated lookahead tokens. If the reduction is the default
+action for the given state, there is no such label.
+
+This is how reductions are represented in the verbose file @file{rr.output}:
+@example
+State 1
+
+ 3 a: "0" . [";"]
+ 4 b: "0" . ["."]
+
+ "." reduce using rule 4 (b)
+ $default reduce using rule 3 (a)
+@end example
+
+A Graphviz rendering of this portion of the graph could be:
+
+@center @image{figs/example-reduce, 120pt}
+
+When unresolved conflicts are present, because in deterministic parsing
+only a single decision can be made, Bison can arbitrarily choose to disable a
+reduction, see @ref{Shift/Reduce, , Shift/Reduce Conflicts}. Discarded actions
+are distinguished by a red filling color on these nodes, just as they are
+reported between square brackets in the verbose file.
+
+The reduction corresponding to the rule number 0 is the accepting
+state. It is shown as a blue diamond, labelled ``Acc''.
+
+@subheading Graphical Representation of Go Tos
+
+The @samp{go to} jump transitions are represented as dotted lines bearing
+the name of the rule being jumped to.
+
+@c ================================================= XML
+
+@node Xml
+@section Visualizing your parser in multiple formats
+@cindex xml
+
+Bison supports two major report formats: textual output
+(@pxref{Understanding, ,Understanding Your Parser}) when invoked
+with option @option{--verbose}, and DOT
+(@pxref{Graphviz,, Visualizing Your Parser}) when invoked with
+option @option{--graph}. Another alternative is to output an XML file
+that may then be rendered with @command{xsltproc} as a raw text format
+equivalent to the verbose file, as an HTML version of the same file with
+clickable transitions, or even as a DOT file. The @file{.output} and DOT
+files obtained via XSLT are identical to those obtained by invoking
+@command{bison} with options @option{--verbose} or @option{--graph}.
+
+The XML file is generated when the options @option{-x} or
+@option{--xml[=FILE]} are specified, see @ref{Invocation,,Invoking Bison}.
+If not specified, its name is made by removing @samp{.tab.c} or @samp{.c}
+from the parser implementation file name, and adding @samp{.xml} instead.
+For instance, if the grammar file is @file{foo.y}, the default XML output
+file is @file{foo.xml}.
+
+Bison ships with a @file{data/xslt} directory, containing XSL Transformation
+files to apply to the XML file. Their names are unambiguous:
+
+@table @file
+@item xml2dot.xsl
+Used to output a copy of the DOT visualization of the automaton.
+@item xml2text.xsl
+Used to output a copy of the @samp{.output} file.
+@item xml2xhtml.xsl
+Used to output an XHTML enhancement of the @samp{.output} file.
+@end table
+
+Sample usage (requires @command{xsltproc}):
+@example
+$ bison -x gr.y
+@group
+$ bison --print-datadir
+/usr/local/share/bison
+@end group
+$ xsltproc /usr/local/share/bison/xslt/xml2xhtml.xsl gr.xml >gr.html
+@end example
+
+@c ================================================= Tracing
@node Tracing
@section Tracing Your Parser
@noindent
The previous reduction demonstrates the @code{%printer} directive for
-@code{<val>}: both the token @code{NUM} and the resulting non-terminal
+@code{<val>}: both the token @code{NUM} and the resulting nonterminal
@code{exp} have @samp{1} as value.
@example
A category can be turned off by prefixing its name with @samp{no-}. For
instance, @option{-Wno-yacc} will hide the warnings about
POSIX Yacc incompatibilities.
+
+@item -f [@var{feature}]
+@itemx --feature[=@var{feature}]
+Activate miscellaneous @var{feature}. @var{feature} can be one of:
+@table @code
+@item caret
+@itemx diagnostics-show-caret
+Show caret errors, in a manner similar to GCC's
+@option{-fdiagnostics-show-caret}, or Clang's @option{-fcaret-diagnostics}. The
+location provided with the message is used to quote the corresponding line of
+the source file, underlining the important part of it with carets (^). Here is
+an example, using the following file @file{in.y}:
+
+@example
+%type <ival> exp
+%%
+exp: exp '+' exp @{ $exp = $1 + $2; @};
+@end example
+
+When invoked with @option{-fcaret}, Bison will report:
+
+@example
+@group
+in.y:3.20-23: error: ambiguous reference: '$exp'
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^^
+@end group
+@group
+in.y:3.1-3: refers to: $exp at $$
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^
+@end group
+@group
+in.y:3.6-8: refers to: $exp at $1
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^
+@end group
+@group
+in.y:3.14-16: refers to: $exp at $3
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^^
+@end group
+@group
+in.y:3.32-33: error: $2 of 'exp' has no declared type
+ exp: exp '+' exp @{ $exp = $1 + $2; @};
+ ^^
+@end group
+@end example
+
+@end table
@end table
@noindent
Summary}). Currently supported languages include C, C++, and Java.
@var{language} is case-insensitive.
-This option is experimental and its effect may be modified in future
-releases.
-
@item --locations
Pretend that @code{%locations} was specified. @xref{Decl Summary}.
@table @file
@item position.hh
@itemx location.hh
-The definition of the classes @code{position} and @code{location},
-used for location tracking. @xref{C++ Location Values}.
+The definition of the classes @code{position} and @code{location}, used for
+location tracking. These files are not generated if the @code{%define}
+variable @code{api.location.type} is defined. @xref{C++ Location Values}.
@item stack.hh
An auxiliary class @code{stack} used by the parser.
@c - %define filename_type "const symbol::Symbol"
When the directive @code{%locations} is used, the C++ parser supports
-location tracking, see @ref{Tracking Locations}. Two auxiliary classes
-define a @code{position}, a single point in a file, and a @code{location}, a
-range composed of a pair of @code{position}s (possibly spanning several
-files).
+location tracking, see @ref{Tracking Locations}.
+
+By default, two auxiliary classes define a @code{position}, a single point
+in a file, and a @code{location}, a range composed of a pair of
+@code{position}s (possibly spanning several files). But if the
+@code{%define} variable @code{api.location.type} is defined, then these
+classes will not be generated, and the user defined type will be used.
@tindex uint
In this section @code{uint} is an abbreviation for @code{unsigned int}: in
genuine code only the latter is used.
@menu
-* C++ position:: One point in the source file
-* C++ location:: Two points in the source file
+* C++ position:: One point in the source file
+* C++ location:: Two points in the source file
+* User Defined Location Type:: Required interface for locations
@end menu
@node C++ position
@code{filename} defined, or equal filename/line or column.
@end deftypefun
+@node User Defined Location Type
+@subsubsection User Defined Location Type
+@findex %define api.location.type
+
+Instead of using the built-in types you may use the @code{%define} variable
+@code{api.location.type} to specify your own type:
+
+@example
+%define api.location.type @var{LocationType}
+@end example
+
+The requirements on your @var{LocationType} are:
+@itemize
+@item
+it must be copyable;
+
+@item
+in order to compute the (default) value of @code{@@$} in a reduction, the
+parser basically runs
+@example
+@@$.begin = @@$1.begin;
+@@$.end = @@$@var{N}.end; // The location of last right-hand side symbol.
+@end example
+@noindent
+so there must be copyable @code{begin} and @code{end} members;
+
+@item
+alternatively you may redefine the computation of the default location, in
+which case these members are not required (@pxref{Location Default Action});
+
+@item
+if traces are enabled, then there must exist an @samp{std::ostream&
+ operator<< (std::ostream& o, const @var{LocationType}& s)} function.
+@end itemize
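The requirements listed above can be sketched as follows. The names `my_position`, `my_location` and `span` are hypothetical, chosen only for this illustration, and the output format of `operator<<` is an arbitrary choice:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical position type: a line/column pair, trivially copyable.
struct my_position {
  int line;
  int column;
};

// Hypothetical location type: copyable, with copyable begin and end members.
struct my_location {
  my_position begin;
  my_position end;
};

// Needed only when traces are enabled.
std::ostream& operator<<(std::ostream& o, const my_location& l) {
  return o << l.begin.line << '.' << l.begin.column << '-'
           << l.end.line << '.' << l.end.column;
}

// Mirrors the default computation of @$: begin of the first right-hand
// side symbol, end of the last one.
my_location span(const my_location& first, const my_location& last) {
  my_location res = {first.begin, last.end};
  return res;
}
```

A grammar would then select this type with `%define api.location.type "my_location"` and make its definition visible to the parser, for instance through a `%code requires` block.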
+
+@sp 1
+
+In programs with several C++ parsers, you may also use the @code{%define}
+variable @code{api.location.type} to share a common set of built-in
+definitions for @code{position} and @code{location}. For instance, one
+parser @file{master/parser.yy} might use:
+
+@example
+%defines
+%locations
+%define namespace "master::"
+@end example
+
+@noindent
+to generate the @file{master/position.hh} and @file{master/location.hh}
+files, reused by other parsers as follows:
+
+@example
+%define api.location.type "master::location"
+%code requires @{ #include <master/location.hh> @}
+@end example
+
@node C++ Parser Interface
@subsection C++ Parser Interface
@c - define parser_class_name
@deftypemethod {parser} {int} parse ()
Run the syntactic analysis, and return 0 on success, 1 otherwise.
+
+@cindex exceptions
+The whole function is wrapped in a @code{try}/@code{catch} block, so that
+when an exception is thrown, the @code{%destructor}s are called to release
+the lookahead symbol, and the symbols pushed on the stack.
@end deftypemethod
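The cleanup described above can be pictured with a toy model. This is not the generated code: `symbol`, `destroy`, `released` and `parse_sketch` are hypothetical stand-ins, and rethrowing after the cleanup is an assumption of this sketch rather than something stated here:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical grammar symbol carrying only a name.
struct symbol {
  std::string name;
};

// Records which symbols were released, in order.
static std::vector<std::string> released;

// Stand-in for the code of a %destructor.
void destroy(const symbol& s) {
  released.push_back(s.name);
}

// Toy parse loop: on an exception, release the lookahead, then unwind
// the stack from top to bottom, and let the exception propagate.
int parse_sketch(bool fail) {
  std::vector<symbol> stack;
  symbol lookahead = {"NUM"};
  try {
    stack.push_back(symbol{"exp"});
    stack.push_back(symbol{"'+'"});
    if (fail)
      throw std::runtime_error("scanner failed");
    return 0;
  } catch (...) {
    destroy(lookahead);
    while (!stack.empty()) {
      destroy(stack.back());
      stack.pop_back();
    }
    throw;
  }
}
```

Calling `parse_sketch (true)` releases the lookahead first and then the stacked symbols from the top down, before the caller sees the exception.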
@deftypemethod {parser} {std::ostream&} debug_stream ()
The parser invokes the scanner by calling @code{yylex}. Contrary to C
parsers, C++ parsers are always pure: there is no point in using the
-@code{%define api.pure} directive. Therefore the interface is as follows.
+@code{%define api.pure full} directive. Therefore the interface is as follows.
@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...)
Return the next token. Its type is the return value, its semantic
// Tell Flex the lexer's prototype ...
# define YY_DECL \
yy::calcxx_parser::token_type \
- yylex (yy::calcxx_parser::semantic_type *yylval, \
- yy::calcxx_parser::location_type *yylloc, \
+ yylex (yy::calcxx_parser::semantic_type* yylval, \
+ yy::calcxx_parser::location_type* yylloc, \
calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
%@{
typedef yy::calcxx_parser::token token;
%@}
- /* Convert ints to the actual type of tokens. */
-[-+*/] return yy::calcxx_parser::token_type (yytext[0]);
-":=" return token::ASSIGN;
-@{int@} @{
- errno = 0;
- long n = strtol (yytext, NULL, 10);
- if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
- driver.error (*yylloc, "integer is out of range");
- yylval->ival = n;
- return token::NUMBER;
-@}
-@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER;
-. driver.error (*yylloc, "invalid character");
+ /* Convert ints to the actual type of tokens. */
+[-+*/] return yy::calcxx_parser::token_type (yytext[0]);
+
+":=" return token::ASSIGN;
+
+@group
+@{int@} @{
+ errno = 0;
+ long n = strtol (yytext, NULL, 10);
+ if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
+ driver.error (*yylloc, "integer is out of range");
+ yylval->ival = n;
+ return token::NUMBER;
+ @}
+@end group
+
+@group
+@{id@} @{
+ yylval->sval = new std::string (yytext);
+ return token::IDENTIFIER;
+ @}
+@end group
+
+. driver.error (*yylloc, "invalid character");
%%
@end example
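
The range check performed in the @code{int} rule above can be tried outside of Flex. The following is a sketch only: @code{parse_int} is a hypothetical helper, not part of the calc++ example, and unlike the scanner rule it reports failure to its caller instead of calling @code{driver.error} and still returning a token.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: convert TEXT to an int using the same test as
// the scanner rule.  Return false if the value does not fit in an int.
bool
parse_int (const char* text, int& out)
{
  errno = 0;                       // strtol sets errno only on failure
  long n = std::strtol (text, nullptr, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    return false;                  // out of range for int (or for long)
  out = static_cast<int> (n);
  return true;
}
```

Resetting @code{errno} before the call matters, because @code{strtol} leaves it untouched on success and a stale @code{ERANGE} would otherwise be mistaken for an overflow.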
Contrary to C parsers, Java parsers do not use global variables; the
state of the parser is always local to an instance of the parser class.
Therefore, all Java parsers are ``pure'', and the @code{%pure-parser}
-and @code{%define api.pure} directives does not do anything when used in
+and @code{%define api.pure full} directives do not do anything when used in
Java.
Push parsers are currently unsupported in Java and @code{%define
defines a class representing a @dfn{location}, a range composed of a pair of
positions (possibly spanning several files). The location class is an inner
class of the parser; the name is @code{Location} by default, and may also be
-renamed using @code{%define location_type "@var{class-name}"}.
+renamed using @code{%define api.location.type "@var{class-name}"}.
The location class treats the position as a completely opaque value.
By default, the class name is @code{Position}, but this can be changed
-with @code{%define position_type "@var{class-name}"}. This class must
+with @code{%define api.position.type "@var{class-name}"}. This class must
be supplied by the user.
@deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg})
This method is defined by the user to emit an error message. The first
parameter is omitted if location tracking is not active. Its type can be
-changed using @code{%define location_type "@var{class-name}".}
+changed using @code{%define api.location.type "@var{class-name}".}
@end deftypemethod
@deftypemethod {Lexer} {int} yylex ()
@code{yylex} returned, and the first position beyond it. These
methods are not needed unless location tracking is active.
-The return type can be changed using @code{%define position_type
+The return type can be changed using @code{%define api.position.type
"@var{class-name}".}
@end deftypemethod
@xref{Java Scanner Interface}.
@end deffn
-@deffn {Directive} {%define location_type} "@var{class}"
+@deffn {Directive} {%define api.location.type} "@var{class}"
The name of the class used for locations (a range between two
positions). This class is generated as an inner class of the parser
class by @command{bison}. Default is @code{Location}.
+Formerly named @code{location_type}.
@xref{Java Location Values}.
@end deffn
@xref{Java Bison Interface}.
@end deffn
-@deffn {Directive} {%define position_type} "@var{class}"
+@deffn {Directive} {%define api.position.type} "@var{class}"
The name of the class used for positions. This class must be supplied by
the user. Default is @code{Position}.
+Formerly named @code{position_type}.
@xref{Java Location Values}.
@end deffn
@quotation
My parser includes support for an @samp{#include}-like feature, in
which case I run @code{yyparse} from @code{yyparse}. This fails
-although I did specify @samp{%define api.pure}.
+although I did specify @samp{%define api.pure full}.
@end quotation
These problems typically come not from Bison itself, but from
@end deffn
@deffn {Variable} @@@var{n}
+@deffnx {Symbol} @@@var{n}
In an action, the location of the @var{n}-th symbol of the right-hand side
of the rule. @xref{Tracking Locations}.
+
+In a grammar, the Bison-generated nonterminal symbol for a mid-rule action
+with a semantic value.  @xref{Mid-Rule Action Translation}.
@end deffn
@deffn {Variable} @@@var{name}
-In an action, the location of a symbol addressed by name. @xref{Tracking
-Locations}.
+@deffnx {Variable} @@[@var{name}]
+In an action, the location of a symbol addressed by @var{name}.
+@xref{Tracking Locations}.
@end deffn
-@deffn {Variable} @@[@var{name}]
-In an action, the location of a symbol addressed by name. @xref{Tracking
-Locations}.
+@deffn {Symbol} $@@@var{n}
+In a grammar, the Bison-generated nonterminal symbol for a mid-rule action
+with no semantic value.  @xref{Mid-Rule Action Translation}.
@end deffn
@deffn {Variable} $$
@end deffn
@deffn {Variable} $@var{name}
-In an action, the semantic value of a symbol addressed by name.
-@xref{Actions}.
-@end deffn
-
-@deffn {Variable} $[@var{name}]
-In an action, the semantic value of a symbol addressed by name.
+@deffnx {Variable} $[@var{name}]
+In an action, the semantic value of a symbol addressed by @var{name}.
@xref{Actions}.
@end deffn
Grammar}.
@end deffn
-@deffn {Construct} /*@dots{}*/
-Comment delimiters, as in C.
+@deffn {Construct} /* @dots{} */
+@deffnx {Construct} // @dots{}
+Comments, as in C/C++.
@end deffn
@deffn {Delimiter} :
@item Accepting state
A state whose only action is the accept action.
The accepting state is thus a consistent state.
-@xref{Understanding,,}.
+@xref{Understanding, ,Understanding Your Parser}.
@item Backus-Naur Form (BNF; also called ``Backus Normal Form'')
Formal method of specifying context-free grammars originally proposed
@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's
@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy ints
@c LocalWords: Scannerless ispell american ChangeLog smallexample CSTYPE CLTYPE
-@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate
+@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate LocationType
@c LocalWords: parsers parser's
@c LocalWords: associativity subclasses precedences unresolvable runnable
@c LocalWords: allocators subunit initializations unreferenced untyped