@cindex @acronym{GLR} parsing
@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
@cindex ambiguous grammars
-@cindex non-deterministic parsing
+@cindex nondeterministic parsing
Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning
roughly that the next grammar rule to apply at any point in the input is
uniquely determined by the preceding input and a fixed, finite portion
(called a @dfn{look-ahead}) of the remaining input. A context-free
grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
apply the grammar rules to get the same inputs. Even unambiguous
-grammars can be @dfn{non-deterministic}, meaning that no fixed
+grammars can be @dfn{nondeterministic}, meaning that no fixed
look-ahead always suffices to determine the next grammar rule to apply.
With the proper declarations, Bison is also able to parse these more
general context-free grammars, using a technique known as @acronym{GLR}
It is easy to add new operators to the infix calculator as long as they are
only single-character literals. The lexical analyzer @code{yylex} passes
-back all nonnumber characters as tokens, so new grammar rules suffice for
+back all nonnumeric characters as tokens, so new grammar rules suffice for
adding a new operator. But we want something more flexible: built-in
functions whose syntax has this form:
The function @code{yylex} must now recognize variables, numeric values, and
the single-character arithmetic operators. Strings of alphanumeric
-characters with a leading non-digit are recognized as either variables or
+characters with a leading letter are recognized as either variables or
functions depending on what the symbol table says about them.
The string is passed to @code{getsym} for look up in the symbol table. If
in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
If you want to write a grammar that is portable to any Standard C
-host, you must use only non-null character tokens taken from the basic
+host, you must use only nonnull character tokens taken from the basic
execution character set of Standard C@. This set consists of the ten
digits, the 52 lower- and upper-case English letters, and the
characters in the following C-language string:
%parse-param @{ char const *file_name @};
%initial-action
@{
- @@$.begin.filename = @@$.end.filename = file_name;
+ @@$.initialize (file_name);
@};
@end example
@cindex freeing discarded symbols
@findex %destructor
-Some symbols can be discarded by the parser. During error recovery
-(@pxref{Error Recovery}), symbols already pushed on the stack and tokens
-coming from the rest of the file are discarded until the parser falls on
-its feet. If the parser runs out of memory, all the symbols on the
-stack must be discarded. Even if the parser succeeds, it must discard
-the start symbol.
+During error recovery (@pxref{Error Recovery}), symbols already pushed
+on the stack and tokens coming from the rest of the file are discarded
+until the parser falls on its feet. If the parser runs out of memory,
+or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the
+symbols on the stack must be discarded. Even if the parser succeeds, it
+must discard the start symbol.
When discarded symbols convey heap-based information, this memory is
lost. While this behavior can be tolerable for batch parsers, such as
in traditional compilers, it is unacceptable for programs like shells or
protocol implementations that may parse and execute indefinitely.
-The @code{%destructor} directive defines code that
-is called when a symbol is discarded.
+The @code{%destructor} directive defines code that is called when a
+symbol is automatically discarded.
@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
@findex %destructor
with the discarded symbol. The additional parser parameters are also
available (@pxref{Parser Function, , The Parser Function
@code{yyparse}}).
-
-@strong{Warning:} as of Bison 2.1, this feature is still
-experimental, as there has not been enough user feedback. In particular,
-the syntax might still change.
@end deffn
For instance:
guarantees that when a @code{STRING} or a @code{string} is discarded,
its associated memory will be freed.
-Note that in the future, Bison might also consider that right hand side
-members that are not mentioned in the action can be destroyed. For
-instance, in:
-
-@smallexample
-comment: "/*" STRING "*/";
-@end smallexample
-
-@noindent
-the parser is entitled to destroy the semantic value of the
-@code{string}. Of course, this will not apply to the default action;
-compare:
-
-@smallexample
-typeless: string; // $$ = $1 does not apply; $1 is destroyed.
-typefull: string; // $$ = $1 applies, $1 is not destroyed.
-@end smallexample
-
@sp 1
@cindex discarded symbols
@item
incoming terminals during the second phase of error recovery,
@item
-the current look-ahead and the entire stack when the parser aborts
-(either via an explicit call to @code{YYABORT}, or as a consequence of
-a failed error recovery or of memory exhaustion), and
+the current look-ahead and the entire stack (except the current
+right-hand side symbols) when the parser returns immediately, and
@item
the start symbol, when the parser succeeds.
@end itemize
+The parser can @dfn{return immediately} because of an explicit call to
+@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory
+exhaustion.
+
+Right-hand side symbols of a rule that explicitly triggers a syntax
+error via @code{YYERROR} are not discarded automatically. As a rule
+of thumb, destructors are invoked only when user actions cannot manage
+the memory.
@node Expect Decl
@subsection Suppressing Conflict Warnings
%expect @var{n}
@end example
-Here @var{n} is a decimal integer. The declaration says there should be
-no warning if there are @var{n} shift/reduce conflicts and no
-reduce/reduce conflicts. The usual warning is
-given if there are either more or fewer conflicts, or if there are any
-reduce/reduce conflicts.
+Here @var{n} is a decimal integer. The declaration says there should
+be @var{n} shift/reduce conflicts and no reduce/reduce conflicts.
+Bison reports an error if the number of shift/reduce conflicts differs
+from @var{n}, or if there are any reduce/reduce conflicts.
-For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more serious,
-and should be eliminated entirely. Bison will always report
-reduce/reduce conflicts for these parsers. With @acronym{GLR} parsers, however,
-both shift/reduce and reduce/reduce are routine (otherwise, there
-would be no need to use @acronym{GLR} parsing). Therefore, it is also possible
-to specify an expected number of reduce/reduce conflicts in @acronym{GLR}
-parsers, using the declaration:
+For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more
+serious, and should be eliminated entirely. Bison will always report
+reduce/reduce conflicts for these parsers. With @acronym{GLR}
+parsers, however, both kinds of conflicts are routine; otherwise,
+there would be no need to use @acronym{GLR} parsing. Therefore, it is
+also possible to specify an expected number of reduce/reduce conflicts
+in @acronym{GLR} parsers, using the declaration:
@example
%expect-rr @var{n}
@item
Add an @code{%expect} declaration, copying the number @var{n} from the
-number which Bison printed.
+number which Bison printed. With @acronym{GLR} parsers, add an
+@code{%expect-rr} declaration as well.
@end itemize
-Now Bison will stop annoying you if you do not change the number of
-conflicts, but it will warn you again if changes in the grammar result
-in more or fewer conflicts.
+Now Bison will warn you if you introduce an unexpected conflict, but
+will keep silent otherwise.
@node Start Decl
@subsection The Start-Symbol
A @dfn{reentrant} program is one which does not alter in the course of
execution; in other words, it consists entirely of @dfn{pure} (read-only)
code. Reentrancy is important whenever asynchronous execution is possible;
-for example, a non-reentrant program may not be safe to call from a signal
-handler. In systems with multiple threads of control, a non-reentrant
+for example, a nonreentrant program may not be safe to call from a signal
+handler. In systems with multiple threads of control, a nonreentrant
program must be called only within interlocks.
Normally, Bison generates a parser which is not reentrant. This is
@subsection Semantic Values of Tokens
@vindex yylval
-In an ordinary (non-reentrant) parser, the semantic value of the token must
+In an ordinary (nonreentrant) parser, the semantic value of the token must
be stored into the global variable @code{yylval}. When you are using
just one data type for semantic values, @code{yylval} has that type.
Thus, if the type is @code{int} (the default), you might write this in
@cindex @acronym{GLR} parsing
@cindex generalized @acronym{LR} (@acronym{GLR}) parsing
@cindex ambiguous grammars
-@cindex non-deterministic parsing
+@cindex nondeterministic parsing
Bison produces @emph{deterministic} parsers that choose uniquely
when to reduce and which reduction to apply
context-free grammar in cubic worst-case time. However, Bison currently
uses a simpler data structure that requires time proportional to the
length of the input times the maximum number of stacks required for any
-prefix of the input. Thus, really ambiguous or non-deterministic
+prefix of the input. Thus, really ambiguous or nondeterministic
grammars can require exponential time and space to process. Such badly
behaving examples, however, are not generally of practical interest.
-Usually, non-determinism in a grammar is local---the parser is ``in
+Usually, nondeterminism in a grammar is local---the parser is ``in
doubt'' only for a few tokens at a time. Therefore, the current data
structure should generally be adequate. On @acronym{LALR}(1) portions of a
grammar, in particular, it is only slightly slower than with the default
declare and define the parser class in the namespace @code{yy}. The
class name defaults to @code{parser}, but may be changed using
@samp{%define "parser_class_name" "@var{name}"}. The interface of
-this class is detailled below. It can be extended using the
+this class is detailed below. It can be extended using the
@code{%parse-param} feature: its semantics is slightly changed since
it describes an additional member of the parser class, and an
additional argument for its constructor.
@deftypemethod {parser} {debug_level_type} debug_level ()
@deftypemethodx {parser} {void} set_debug_level (debug_level @var{l})
Get or set the tracing level. Currently its value is either 0, no trace,
-or non-zero, full tracing.
+or nonzero, full tracing.
@end deftypemethod
@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m})
@subsection Calc++ --- C++ Calculator
Of course the grammar is dedicated to arithmetics, a single
-expression, possibily preceded by variable assignments. An
+expression, possibly preceded by variable assignments. An
environment containing possibly predefined variables such as
@code{one} and @code{two}, is exchanged with the parser. An example
of valid input follows.
unit: assignments exp @{ driver.result = $2; @};
assignments: assignments assignment @{@}
- | /* Nothing. */ @{@};
+ | /* Nothing. */ @{@};
assignment: "identifier" ":=" exp @{ driver.variables[*$1] = $3; @};
@end example
@noindent
-The following paragraph suffices to track locations acurately. Each
+The following paragraph suffices to track locations accurately. Each
time @code{yylex} is invoked, the begin position is moved onto the end
position. Then when a pattern is matched, the end position is
advanced of its width. In case it matched ends of lines, the end
The rules are simple, just note the use of the driver to report errors.
It is convenient to use a typedef to shorten
@code{yy::calcxx_parser::token::identifier} into
-@code{token::identifier} for isntance.
+@code{token::identifier} for instance.
@comment file: calc++-scanner.ll
@example
@end deffn
@deffn {Directive} %nonassoc
-Bison declaration to assign non-associativity to token(s).
+Bison declaration to assign nonassociativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn