This manual (@value{UPDATED}) is for @acronym{GNU} Bison (version
@value{VERSION}), the @acronym{GNU} parser generator.
-Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free
+Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
+2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free
Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the @acronym{GNU} Free Documentation License,
-Version 1.2 or any later version published by the Free Software
+Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with the Front-Cover texts
being ``A @acronym{GNU} Manual,'' and with the Back-Cover Texts as in
(a) below. A copy of the license is included in the section entitled
* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Semantic Predicates:: Controlling a parse with arbitrary computations.
* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
Examples
* Mid-Rule Actions:: Most actions go at the end of a rule.
This says when, why and how to use the exceptional
action in the middle of a rule.
+* Named References:: Using named references in actions.
Tracking Locations
@cindex introduction
@dfn{Bison} is a general-purpose parser generator that converts an
-annotated context-free grammar into a deterministic or @acronym{GLR}
-parser employing @acronym{LALR}(1), @acronym{IELR}(1), or canonical
-@acronym{LR}(1) parser tables.
+annotated context-free grammar into a deterministic @acronym{LR} or
+generalized @acronym{LR} (@acronym{GLR}) parser employing
+@acronym{LALR}(1), @acronym{IELR}(1), or canonical @acronym{LR}(1)
+parser tables.
Once you are proficient with Bison, you can use it to develop a wide
range of language parsers, from those used in simple desk calculators to
complex programming languages.
* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Semantic Predicates:: Controlling a parse with arbitrary computations.
* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
@end menu
Also, see @ref{Location Default Action, ,Default Action for Locations}, which
describes a special usage of @code{YYLLOC_DEFAULT} in @acronym{GLR} parsers.
+@node Semantic Predicates
+@subsection Controlling a Parse with Arbitrary Predicates
+@findex %?
+@cindex Semantic predicates in @acronym{GLR} parsers
+
+In addition to the @code{%dprec} and @code{%merge} directives,
+@acronym{GLR} parsers
+allow you to reject parses on the basis of arbitrary computations executed
+in user code, without having Bison treat this rejection as an error
+if there are alternative parses. (This feature is experimental and may
+evolve. We welcome user feedback.) For example,
+
+@smallexample
+widget :
+ %?@{ new_syntax @} "widget" id new_args @{ $$ = f($3, $4); @}
+ | %?@{ !new_syntax @} "widget" id old_args @{ $$ = f($3, $4); @}
+ ;
+@end smallexample
+
+@noindent
+is one way to allow the same parser to handle two different syntaxes for
+widgets. The clause preceded by @code{%?} is treated like an ordinary
+action, except that its text is treated as an expression and is always
+evaluated immediately (even when in nondeterministic mode). If the
+expression yields 0 (false), the clause is treated as a syntax error,
+which, in a nondeterministic parser, causes the stack in which it is reduced
+to die. In a deterministic parser, it acts like YYERROR.
+
+As the example shows, predicates otherwise look like semantic actions, and
+therefore you must be take them into account when determining the numbers
+to use for denoting the semantic values of right-hand side symbols.
+Predicate actions, however, have no defined value, and may not be given
+labels.
+
+There is a subtle difference between semantic predicates and ordinary
+actions in nondeterministic mode, since the latter are deferred.
+For example, we could try to rewrite the previous example as
+
+@smallexample
+widget :
+ @{ if (!new_syntax) YYERROR; @} "widget" id new_args @{ $$ = f($3, $4); @}
+ | @{ if (new_syntax) YYERROR; @} "widget" id old_args @{ $$ = f($3, $4); @}
+ ;
+@end smallexample
+
+@noindent
+(reversing the sense of the predicate tests to cause an error when they are
+false). However, this
+does @emph{not} have the same effect if @code{new_args} and @code{old_args}
+have overlapping syntax.
+Since the mid-rule actions testing @code{new_syntax} are deferred,
+a @acronym{GLR} parser first encounters the unresolved ambiguous reduction
+for cases where @code{new_args} and @code{old_args} recognize the same string
+@emph{before} performing the tests of @code{new_syntax}. It therefore
+reports an error.
+
+Finally, be careful in writing predicates: deferred actions have not been
+evaluated, so that using them in a predicate will have undefined effects.
+
@node Compiler Requirements
@subsection Considerations when Compiling @acronym{GLR} Parsers
@cindex @code{inline}
* Mid-Rule Actions:: Most actions go at the end of a rule.
This says when, why and how to use the exceptional
action in the middle of a rule.
+* Named References:: Using named references in actions.
@end menu
@node Value Type
@cindex action
@vindex $$
@vindex $@var{n}
+@vindex $@var{name}
+@vindex $[@var{name}]
An action accompanies a syntactic rule and contains C code to be executed
each time an instance of that rule is recognized. The task of most actions
The C code in an action can refer to the semantic values of the components
matched by the rule with the construct @code{$@var{n}}, which stands for
the value of the @var{n}th component. The semantic value for the grouping
-being constructed is @code{$$}. Bison translates both of these
+being constructed is @code{$$}. In addition, the semantic values of
+symbols can be accessed with the named references construct
+@code{$@var{name}} or @code{$[@var{name}]}. Bison translates both of these
constructs into expressions of the appropriate type when it copies the
-actions into the parser file. @code{$$} is translated to a modifiable
+actions into the parser file. @code{$$} (or @code{$@var{name}}, when it
+stands for the current grouping) is translated to a modifiable
lvalue, so it can be assigned to.
Here is a typical example:
@end group
@end example
+Or, in terms of named references:
+
+@example
+@group
+exp[result]: @dots{}
+ | exp[left] '+' exp[right]
+ @{ $result = $left + $right; @}
+@end group
+@end example
+
@noindent
This rule constructs an @code{exp} from two smaller @code{exp} groupings
connected by a plus-sign token. In the action, @code{$1} and @code{$3}
+(@code{$left} and @code{$right})
refer to the semantic values of the two component @code{exp} groupings,
which are the first and third symbols on the right hand side of the rule.
-The sum is stored into @code{$$} so that it becomes the semantic value of
+The sum is stored into @code{$$} (@code{$result}) so that it becomes the
+semantic value of
the addition-expression just recognized by the rule. If there were a
useful semantic value associated with the @samp{+} token, it could be
referred to as @code{$2}.
+@xref{Named References,,Using Named References}, for more information
+about using the named references construct.
+
Note that the vertical-bar character @samp{|} is really a rule
separator, and actions are attached to a single rule. This is a
difference with tools like Flex, for which @samp{|} stands for either
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.
+@node Named References
+@subsection Using Named References
+@cindex named references
+
+While every semantic value can be accessed with positional references
+@code{$@var{n}} and @code{$$}, it's often much more convenient to refer to
+them by name. First of all, original symbol names may be used as named
+references. For example:
+
+@example
+@group
+invocation: op '(' args ')'
+ @{ $invocation = new_invocation ($op, $args, @@invocation); @}
+@end group
+@end example
+
+@noindent
+The positional @code{$$}, @code{@@$}, @code{$n}, and @code{@@n} can be
+mixed with @code{$name} and @code{@@name} arbitrarily. For example:
+
+@example
+@group
+invocation: op '(' args ')'
+ @{ $$ = new_invocation ($op, $args, @@$); @}
+@end group
+@end example
+
+@noindent
+However, sometimes regular symbol names are not sufficient due to
+ambiguities:
+
+@example
+@group
+exp: exp '/' exp
+ @{ $exp = $exp / $exp; @} // $exp is ambiguous.
+
+exp: exp '/' exp
+ @{ $$ = $1 / $exp; @} // One usage is ambiguous.
+
+exp: exp '/' exp
+ @{ $$ = $1 / $3; @} // No error.
+@end group
+@end example
+
+@noindent
+When ambiguity occurs, explicitly declared names may be used for values and
+locations. Explicit names are declared as a bracketed name after a symbol
+appearance in rule definitions. For example:
+@example
+@group
+exp[result]: exp[left] '/' exp[right]
+ @{ $result = $left / $right; @}
+@end group
+@end example
+
+@noindent
+Explicit names may be declared for RHS and for LHS symbols as well. In order
+to access a semantic value generated by a mid-rule action, an explicit name
+may also be declared by putting a bracketed name after the closing brace of
+the mid-rule action code:
+@example
+@group
+exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
+ @{ $res = $left + $right; @}
+@end group
+@end example
+
+@noindent
+
+In references, in order to specify names containing dots and dashes, an explicit
+bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
+@example
+@group
+if-stmt: IF '(' expr ')' THEN then.stmt ';'
+ @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
+@end group
+@end example
+
+It often happens that named references are followed by a dot, dash or other
+C punctuation marks and operators. By default, Bison will read
+@code{$name.suffix} as a reference to symbol value @code{$name} followed by
+@samp{.suffix}, i.e., an access to the @samp{suffix} field of the semantic
+value. In order to force Bison to recognize @code{name.suffix} in its entirety
+as the name of a semantic value, bracketed syntax @code{$[name.suffix]}
+must be used.
+
+
@node Locations
@section Tracking Locations
@cindex location
@cindex actions, location
@vindex @@$
@vindex @@@var{n}
+@vindex @@@var{name}
+@vindex @@[@var{name}]
Actions are not only useful for defining language semantics, but also for
describing the behavior of the output parser with locations.
@code{@@@var{n}}, while the location of the left hand side grouping is
@code{@@$}.
+In addition, the named references construct @code{@@@var{name}} and
+@code{@@[@var{name}]} may also be used to address the symbol locations.
+@xref{Named References,,Using Named References}, for more information
+about using the named references construct.
+
Here is a basic example using the default data type for locations:
@example
@c - initial action
The C++ deterministic parser is selected using the skeleton directive,
-@samp{%skeleton "lalr1.c"}, or the synonymous command-line option
-@option{--skeleton=lalr1.c}.
+@samp{%skeleton "lalr1.cc"}, or the synonymous command-line option
+@option{--skeleton=lalr1.cc}.
@xref{Decl Summary}.
When run, @command{bison} will create several entities in the @samp{yy}
The types for semantic values and locations (if enabled).
@end defcv
+@defcv {Type} {parser} {token}
+A structure that contains (only) the definition of the tokens as the
+@code{yytokentype} enumeration. To refer to the token @code{FOO}, the
+scanner should use @code{yy::parser::token::FOO}. The scanner can use
+@samp{typedef yy::parser::token token;} to ``import'' the token enumeration
+(@pxref{Calc++ Scanner}).
+@end defcv
+
@defcv {Type} {parser} {syntax_error}
This class derives from @code{std::runtime_error}. Throw instances of it
from user actions to raise parse errors. This is equivalent with first
Therefore the interface is as follows.
-@deftypemethod {parser} {int} yylex (semantic_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...)
-@deftypemethodx {parser} {int} yylex (semantic_type& @var{yylval}, @var{type1} @var{arg1}, ...)
+@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...)
+@deftypemethodx {parser} {int} yylex (semantic_type* @var{yylval}, @var{type1} @var{arg1}, ...)
Return the next token. Its type is the return value, its semantic value and
location (if enabled) being @var{yylval} and @var{yylloc}. Invocations of
@samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments.
| exp "-" exp @{ $$ = $1 - $3; @}
| exp "*" exp @{ $$ = $1 * $3; @}
| exp "/" exp @{ $$ = $1 / $3; @}
-| "(" exp ")" @{ std::swap($$, $2); @}
+| "(" exp ")" @{ std::swap ($$, $2); @}
| "identifier" @{ $$ = driver.variables[$1]; @}
-| "number" @{ std::swap($$, $1); @};
+| "number" @{ std::swap ($$, $1); @};
%%
@end example
@xref{Error Recovery}.
@end deffn
-@deffn {Statement} {return YYFAIL;}
-Print an error message and start error recovery.
-@xref{Error Recovery}.
-@end deffn
-
@deftypefn {Function} {boolean} recovering ()
Return whether error recovery is being done. In this state, the parser
reads token until it reaches a known state, and then restarts normal
side of the rule. @xref{Locations, , Locations Overview}.
@end deffn
+@deffn {Variable} @@@var{name}
+In an action, the location of a symbol addressed by name.
+@xref{Locations, , Locations Overview}.
+@end deffn
+
+@deffn {Variable} @@[@var{name}]
+In an action, the location of a symbol addressed by name.
+@xref{Locations, , Locations Overview}.
+@end deffn
+
@deffn {Variable} $$
In an action, the semantic value of the left-hand side of the rule.
@xref{Actions}.
right-hand side of the rule. @xref{Actions}.
@end deffn
+@deffn {Variable} $@var{name}
+In an action, the semantic value of a symbol addressed by name.
+@xref{Actions}.
+@end deffn
+
+@deffn {Variable} $[@var{name}]
+In an action, the semantic value of a symbol addressed by name.
+@xref{Actions}.
+@end deffn
+
@deffn {Delimiter} %%
Delimiter used to separate the grammar rule section from the
Bison declarations section or the epilogue.
Grammar}.
@end deffn
+@deffn {Directive} %?@{@var{expression}@}
+Predicate actions. This is a type of action clause that may appear in
+rules. The expression is evaluated, and if false, causes a syntax error. In
+@acronym{GLR} parsers during nondeterministic operation,
+this silently causes an alternative parse to die. During deterministic
+operation, it is the same as the effect of YYERROR.
+@xref{Semantic Predicates}.
+
+This feature is experimental.
+More user feedback will help to determine whether it should become a permanent
+feature.
+@end deffn
+
@deffn {Construct} /*@dots{}*/
Comment delimiters, as in C.
@end deffn
@bye
-@c Local Variables:
-@c fill-column: 76
-@c End:
-
@c LocalWords: texinfo setfilename settitle setchapternewpage finalout texi FSF
@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex FSF's
@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry Naur
@c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv
@c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url
@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos
-@c LocalWords: getLVal defvar YYFAIL deftypefn deftypefnx gotos msgfmt
+@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt
@c LocalWords: subdirectory Solaris nonassociativity
+
+@c Local Variables:
+@c ispell-dictionary: "american"
+@c fill-column: 76
+@c End: