This manual (@value{UPDATED}) is for @acronym{GNU} Bison (version
@value{VERSION}), the @acronym{GNU} parser generator.
-Copyright @copyright{} 1988-1993, 1995, 1998-2010 Free Software
-Foundation, Inc.
+Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
+2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free
+Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the @acronym{GNU} Free Documentation License,
-Version 1.2 or any later version published by the Free Software
+Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with the Front-Cover texts
being ``A @acronym{GNU} Manual,'' and with the Back-Cover Texts as in
(a) below. A copy of the license is included in the section entitled
restrictions of @acronym{LALR}(1), which is hard to explain simply.
@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for
more information on this.
-To escape these additional restrictions, you can request
-@acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables.
+As an experimental feature, you can escape these additional restrictions by
+requesting @acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables.
@xref{Decl Summary,,lr.type}, to learn how.
@cindex @acronym{GLR} parsing
@code{%expect-rr} declaration as well.
@end itemize
-Now Bison will warn you if you introduce an unexpected conflict, but
-will keep silent otherwise.
+Now Bison will report an error if you introduce an unexpected conflict,
+but will keep silent otherwise.
@node Start Decl
@subsection The Start-Symbol
@itemize @bullet
@item Language(s): C (deterministic parsers only)
-@item Purpose: Requests a pull parser, a push parser, or both.
+@item Purpose: Request a pull parser, a push parser, or both.
@xref{Push Decl, ,A Push Parser}.
(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)
@findex %define lr.default-reductions
@cindex delayed syntax errors
@cindex syntax errors delayed
+@cindex @acronym{LAC}
+@findex %nonassoc
@itemize @bullet
@item Language(s): all
-@item Purpose: Specifies the kind of states that are permitted to
+@item Purpose: Specify the kind of states that are permitted to
contain default reductions.
-That is, in such a state, Bison declares the reduction with the largest
-lookahead set to be the default reduction and then removes that
+That is, in such a state, Bison selects the reduction with the largest
+lookahead set to be the default parser action and then removes that
lookahead set.
-The advantages of default reductions are discussed below.
-The disadvantage is that, when the generated parser encounters a
-syntactically unacceptable token, the parser might then perform
-unnecessary default reductions before it can detect the syntax error.
-
-(This feature is experimental.
+(The ability to specify where default reductions should be used is
+experimental.
More user feedback will help to stabilize it.)
@item Accepted Values:
@itemize
@item @code{all}.
-For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl
-Summary,,lr.type}) by default, all states are permitted to contain
-default reductions.
-The advantage is that parser table sizes can be significantly reduced.
-The reason Bison does not by default attempt to address the disadvantage
-of delayed syntax error detection is that this disadvantage is already
-inherent in @acronym{LALR} and @acronym{IELR} parser tables.
-That is, unlike in a canonical @acronym{LR} state, the lookahead sets of
-reductions in an @acronym{LALR} or @acronym{IELR} state can contain
-tokens that are syntactically incorrect for some left contexts.
+This is the traditional Bison behavior.
+The main advantage is a significant decrease in the size of the parser
+tables.
+The disadvantage is that, when the generated parser encounters a
+syntactically unacceptable token, the parser might then perform
+unnecessary default reductions before it can detect the syntax error.
+Such delayed syntax error detection is usually inherent in
+@acronym{LALR} and @acronym{IELR} parser tables anyway due to
+@acronym{LR} state merging (@pxref{Decl Summary,,lr.type}).
+Furthermore, the use of @code{%nonassoc} can contribute to delayed
+syntax error detection even in the case of canonical @acronym{LR}.
+As an experimental feature, delayed syntax error detection can be
+overcome in all cases by enabling @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}, for details, including a discussion of the effects
+of delayed syntax error detection).
@item @code{consistent}.
@cindex consistent states
A consistent state is a state that has only one possible action.
If that action is a reduction, then the parser does not need to request
a lookahead token from the scanner before performing that action.
-However, the parser only recognizes the ability to ignore the lookahead
-token when such a reduction is encoded as a default reduction.
-Thus, if default reductions are permitted in and only in consistent
-states, then a canonical @acronym{LR} parser reports a syntax error as
-soon as it @emph{needs} the syntactically unacceptable token from the
-scanner.
+However, the parser recognizes the ability to ignore the lookahead token
+in this way only when such a reduction is encoded as a default
+reduction.
+Thus, if default reductions are permitted only in consistent states,
+then a canonical @acronym{LR} parser that does not employ
+@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the
+syntactically unacceptable token from the scanner.
@item @code{accepting}.
@cindex accepting state
-By default, the only default reduction permitted in a canonical
-@acronym{LR} parser is the accept action in the accepting state, which
-the parser reaches only after reading all tokens from the input.
-Thus, the default canonical @acronym{LR} parser reports a syntax error
-as soon as it @emph{reaches} the syntactically unacceptable token
-without performing any extra reductions.
+In the accepting state, the default reduction is actually the accept
+action.
+In this case, a canonical @acronym{LR} parser that does not employ
+@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the
+syntactically unacceptable token in the input.
+That is, it does not perform any extra reductions.
@end itemize
@item Default Value:
@itemize @bullet
@item Language(s): all
-@item Purpose: Requests that Bison allow unreachable parser states to remain in
-the parser tables.
+@item Purpose: Request that Bison allow unreachable parser states to
+remain in the parser tables.
Bison considers a state to be unreachable if there exists no sequence of
transitions from the start state to that state.
A state can become unreachable during conflict resolution if Bison disables a
@itemize @bullet
@item Language(s): all
-@item Purpose: Specifies the type of parser tables within the
+@item Purpose: Specify the type of parser tables within the
@acronym{LR}(1) family.
(This feature is experimental.
More user feedback will help to stabilize it.)
to alter the language accepted by the parser.
@acronym{LALR} parser tables are the smallest parser tables Bison can
currently generate, so they may be preferable.
+Nevertheless, once you begin to resolve conflicts statically,
+@acronym{GLR} begins to behave more like a deterministic parser, and so
+@acronym{IELR} and canonical @acronym{LR} can be helpful to avoid
+@acronym{LALR}'s mysterious behavior.
@item Occasionally during development, an especially malformed grammar
with a major recurring flaw may severely impede the @acronym{IELR} or
@item @code{canonical-lr}.
@cindex delayed syntax errors
@cindex syntax errors delayed
-The only advantage of canonical @acronym{LR} over @acronym{IELR} is
-that, for every left context of every canonical @acronym{LR} state, the
-set of tokens accepted by that state is the exact set of tokens that is
-syntactically acceptable in that left context.
-Thus, the only difference in parsing behavior is that the canonical
-@acronym{LR} parser can report a syntax error as soon as possible
-without performing any unnecessary reductions.
-@xref{Decl Summary,,lr.default-reductions}, for further details.
-Even when canonical @acronym{LR} behavior is ultimately desired,
-@acronym{IELR}'s elimination of duplicate conflicts should still
-facilitate the development of a grammar.
+@cindex @acronym{LAC}
+@findex %nonassoc
+While inefficient, canonical @acronym{LR} parser tables can be an
+interesting means to explore a grammar because they have a property that
+@acronym{IELR} and @acronym{LALR} tables do not.
+That is, if @code{%nonassoc} is not used and default reductions are left
+disabled (@pxref{Decl Summary,,lr.default-reductions}), then, for every
+left context of every canonical @acronym{LR} state, the set of tokens
+accepted by that state is guaranteed to be the exact set of tokens that
+is syntactically acceptable in that left context.
+It might then seem that an advantage of canonical @acronym{LR} parsers
+in production is that, under the above constraints, they are guaranteed
+to detect a syntax error as soon as possible without performing any
+unnecessary reductions.
+However, @acronym{IELR} parsers using @acronym{LAC} (@pxref{Decl
+Summary,,parse.lac}) are also able to achieve this behavior without
+sacrificing @code{%nonassoc} or default reductions.
@end itemize
@item Default Value: @code{lalr}
@itemize
@item Languages(s): C++
-@item Purpose: Specifies the namespace for the parser class.
+@item Purpose: Specify the namespace for the parser class.
For example, if you specify:
@smallexample
The parser namespace is @code{foo} and @code{yylex} is referenced as
@code{bar::lex}.
@end itemize
+
+@c ================================================== parse.lac
+@item parse.lac
+@findex %define parse.lac
+@cindex @acronym{LAC}
+@cindex lookahead correction
+
+@itemize
+@item Languages(s): C
+
+@item Purpose: Enable @acronym{LAC} (lookahead correction) to improve
+syntax error handling.
+
+Canonical @acronym{LR}, @acronym{IELR}, and @acronym{LALR} can suffer
+from a couple of problems upon encountering a syntax error. First, the
+parser might perform additional parser stack reductions before
+discovering the syntax error. Such reductions perform user semantic
+actions that are unexpected because they are based on an invalid token,
+and they cause error recovery to begin in a different syntactic context
+than the one in which the invalid token was encountered. Second, when
+verbose error messages are enabled (with @code{%error-verbose} or
+@code{#define YYERROR_VERBOSE}), the expected token list in the syntax
+error message can both contain invalid tokens and omit valid tokens.
+
+The culprits for the above problems are @code{%nonassoc}, default
+reductions in inconsistent states, and parser state merging. Thus,
+@acronym{IELR} and @acronym{LALR} suffer the most. Canonical
+@acronym{LR} can suffer only if @code{%nonassoc} is used or if default
+reductions are enabled for inconsistent states.
+
+@acronym{LAC} is a new mechanism within the parsing algorithm that
+completely solves these problems for canonical @acronym{LR},
+@acronym{IELR}, and @acronym{LALR} without sacrificing @code{%nonassoc},
+default reductions, or state mering. Conceptually, the mechanism is
+straight-forward. Whenever the parser fetches a new token from the
+scanner so that it can determine the next parser action, it immediately
+suspends normal parsing and performs an exploratory parse using a
+temporary copy of the normal parser state stack. During this
+exploratory parse, the parser does not perform user semantic actions.
+If the exploratory parse reaches a shift action, normal parsing then
+resumes on the normal parser stacks. If the exploratory parse reaches
+an error instead, the parser reports a syntax error. If verbose syntax
+error messages are enabled, the parser must then discover the list of
+expected tokens, so it performs a separate exploratory parse for each
+token in the grammar.
+
+There is one subtlety about the use of @acronym{LAC}. That is, when in
+a consistent parser state with a default reduction, the parser will not
+attempt to fetch a token from the scanner because no lookahead is needed
+to determine the next parser action. Thus, whether default reductions
+are enabled in consistent states (@pxref{Decl
+Summary,,lr.default-reductions}) affects how soon the parser detects a
+syntax error: when it @emph{reaches} an erroneous token or when it
+eventually @emph{needs} that token as a lookahead. The latter behavior
+is probably more intuitive, so Bison currently provides no way to
+achieve the former behavior while default reductions are fully enabled.
+
+Thus, when @acronym{LAC} is in use, for some fixed decision of whether
+to enable default reductions in consistent states, canonical
+@acronym{LR} and @acronym{IELR} behave exactly the same for both
+syntactically acceptable and syntactically unacceptable input. While
+@acronym{LALR} still does not support the full language-recognition
+power of canonical @acronym{LR} and @acronym{IELR}, @acronym{LAC} at
+least enables @acronym{LALR}'s syntax error handling to correctly
+reflect @acronym{LALR}'s language-recognition power.
+
+Because @acronym{LAC} requires many parse actions to be performed twice,
+it can have a performance penalty. However, not all parse actions must
+be performed twice. Specifically, during a series of default reductions
+in consistent states and shift actions, the parser never has to initiate
+an exploratory parse. Moreover, the most time-consuming tasks in a
+parse are often the file I/O, the lexical analysis performed by the
+scanner, and the user's semantic actions, but none of these are
+performed during the exploratory parse. Finally, the base of the
+temporary stack used during an exploratory parse is a pointer into the
+normal parser state stack so that the stack is never physically copied.
+In our experience, the performance penalty of @acronym{LAC} has proven
+insignificant for practical grammars.
+
+@item Accepted Values: @code{none}, @code{full}
+
+@item Default Value: @code{none}
+@end itemize
@end itemize
@end deffn
Algol 60 and is called the ``dangling @code{else}'' ambiguity.
To avoid warnings from Bison about predictable, legitimate shift/reduce
-conflicts, use the @code{%expect @var{n}} declaration. There will be no
-warning as long as the number of shift/reduce conflicts is exactly @var{n}.
+conflicts, use the @code{%expect @var{n}} declaration.
+There will be no warning as long as the number of shift/reduce conflicts
+is exactly @var{n}, and Bison will report an error if there is a
+different number.
@xref{Expect Decl, ,Suppressing Conflict Warnings}.
The definition of @code{if_stmt} above is solely to blame for the
@end table
A category can be turned off by prefixing its name with @samp{no-}. For
-instance, @option{-Wno-syntax} will hide the warnings about unused
-variables.
+instance, @option{-Wno-yacc} will hide the warnings about
+@acronym{POSIX} Yacc incompatibilities.
@end table
@noindent
@item Input stream
A continuous flow of data between devices or programs.
+@item @acronym{LAC} (Lookahead Correction)
+A parsing mechanism that fixes the problem of delayed syntax error
+detection, which is caused by LR state merging, default reductions, and
+the use of @code{%nonassoc}. Delayed syntax error detection results in
+unexpected semantic actions, initiation of error recovery in the wrong
+syntactic context, and an incorrect list of expected tokens in a verbose
+syntax error message. @xref{Decl Summary,,parse.lac}.
+
@item Language construct
One of the typical usage schemas of the language. For example, one of
the constructs of the C language is the @code{if} statement.
@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
@c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
-@c LocalWords: subexpressions declarator nondeferred config libintl postfix
+@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs
@c LocalWords: yytokentype filename destructor multicharacter nonnull EBCDIC
@c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK