X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/241ac701baa8f133e042a0d7dd7c896b085a7ef5..f52e1e5e44fd155cfe0fbbf0fd370e99a256c20e:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 8d4cf642..0c09e35e 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -33,8 +33,9 @@ This manual (@value{UPDATED}) is for @acronym{GNU} Bison (version @value{VERSION}), the @acronym{GNU} parser generator. -Copyright @copyright{} 1988-1993, 1995, 1998-2010 Free Software -Foundation, Inc. +Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, +2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free +Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document @@ -474,8 +475,8 @@ For historical reasons, Bison by default is limited by the additional restrictions of @acronym{LALR}(1), which is hard to explain simply. @xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more information on this. -To escape these additional restrictions, you can request -@acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables. +As an experimental feature, you can escape these additional restrictions by +requesting @acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables. @xref{Decl Summary,,lr.type}, to learn how. @cindex @acronym{GLR} parsing @@ -4614,8 +4615,8 @@ number which Bison printed. With @acronym{GLR} parsers, add an @code{%expect-rr} declaration as well. @end itemize -Now Bison will warn you if you introduce an unexpected conflict, but -will keep silent otherwise. +Now Bison will report an error if you introduce an unexpected conflict, +but will keep silent otherwise. @node Start Decl @subsection The Start-Symbol @@ -5010,7 +5011,7 @@ Some of the accepted @var{variable}s are: @itemize @bullet @item Language(s): C (deterministic parsers only) -@item Purpose: Requests a pull parser, a push parser, or both. +@item Purpose: Request a pull parser, a push parser, or both. @xref{Push Decl, ,A Push Parser}. (The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.) @@ -5027,57 +5028,61 @@ More user feedback will help to stabilize it.) @findex %define lr.default-reductions @cindex delayed syntax errors @cindex syntax errors delayed +@cindex @acronym{LAC} +@findex %nonassoc @itemize @bullet @item Language(s): all -@item Purpose: Specifies the kind of states that are permitted to +@item Purpose: Specify the kind of states that are permitted to contain default reductions. -That is, in such a state, Bison declares the reduction with the largest -lookahead set to be the default reduction and then removes that +That is, in such a state, Bison selects the reduction with the largest +lookahead set to be the default parser action and then removes that lookahead set. -The advantages of default reductions are discussed below. -The disadvantage is that, when the generated parser encounters a -syntactically unacceptable token, the parser might then perform -unnecessary default reductions before it can detect the syntax error. - -(This feature is experimental. +(The ability to specify where default reductions should be used is +experimental. More user feedback will help to stabilize it.) @item Accepted Values: @itemize @item @code{all}. -For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl -Summary,,lr.type}) by default, all states are permitted to contain -default reductions. -The advantage is that parser table sizes can be significantly reduced. -The reason Bison does not by default attempt to address the disadvantage -of delayed syntax error detection is that this disadvantage is already -inherent in @acronym{LALR} and @acronym{IELR} parser tables. -That is, unlike in a canonical @acronym{LR} state, the lookahead sets of -reductions in an @acronym{LALR} or @acronym{IELR} state can contain -tokens that are syntactically incorrect for some left contexts. +This is the traditional Bison behavior. +The main advantage is a significant decrease in the size of the parser +tables. +The disadvantage is that, when the generated parser encounters a +syntactically unacceptable token, the parser might then perform +unnecessary default reductions before it can detect the syntax error. +Such delayed syntax error detection is usually inherent in +@acronym{LALR} and @acronym{IELR} parser tables anyway due to +@acronym{LR} state merging (@pxref{Decl Summary,,lr.type}). +Furthermore, the use of @code{%nonassoc} can contribute to delayed +syntax error detection even in the case of canonical @acronym{LR}. +As an experimental feature, delayed syntax error detection can be +overcome in all cases by enabling @acronym{LAC} (@pxref{Decl +Summary,,parse.lac}, for details, including a discussion of the effects +of delayed syntax error detection). @item @code{consistent}. @cindex consistent states A consistent state is a state that has only one possible action. If that action is a reduction, then the parser does not need to request a lookahead token from the scanner before performing that action. -However, the parser only recognizes the ability to ignore the lookahead -token when such a reduction is encoded as a default reduction. -Thus, if default reductions are permitted in and only in consistent -states, then a canonical @acronym{LR} parser reports a syntax error as -soon as it @emph{needs} the syntactically unacceptable token from the -scanner. +However, the parser recognizes the ability to ignore the lookahead token +in this way only when such a reduction is encoded as a default +reduction. +Thus, if default reductions are permitted only in consistent states, +then a canonical @acronym{LR} parser that does not employ +@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the +syntactically unacceptable token from the scanner. @item @code{accepting}. @cindex accepting state -By default, the only default reduction permitted in a canonical -@acronym{LR} parser is the accept action in the accepting state, which -the parser reaches only after reading all tokens from the input. -Thus, the default canonical @acronym{LR} parser reports a syntax error -as soon as it @emph{reaches} the syntactically unacceptable token -without performing any extra reductions. +In the accepting state, the default reduction is actually the accept +action. +In this case, a canonical @acronym{LR} parser that does not employ +@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the +syntactically unacceptable token in the input. +That is, it does not perform any extra reductions. @end itemize @item Default Value: @@ -5095,8 +5100,8 @@ without performing any extra reductions. @itemize @bullet @item Language(s): all -@item Purpose: Requests that Bison allow unreachable parser states to remain in -the parser tables. +@item Purpose: Request that Bison allow unreachable parser states to +remain in the parser tables. Bison considers a state to be unreachable if there exists no sequence of transitions from the start state to that state. A state can become unreachable during conflict resolution if Bison disables a @@ -5143,7 +5148,7 @@ However, Bison does not compute which goto actions are useless. @itemize @bullet @item Language(s): all -@item Purpose: Specifies the type of parser tables within the +@item Purpose: Specify the type of parser tables within the @acronym{LR}(1) family. (This feature is experimental. More user feedback will help to stabilize it.) @@ -5170,6 +5175,10 @@ In this case, the use of @acronym{LALR} parser tables is guaranteed not to alter the language accepted by the parser. @acronym{LALR} parser tables are the smallest parser tables Bison can currently generate, so they may be preferable. +Nevertheless, once you begin to resolve conflicts statically, +@acronym{GLR} begins to behave more like a deterministic parser, and so +@acronym{IELR} and canonical @acronym{LR} can be helpful to avoid +@acronym{LALR}'s mysterious behavior. @item Occasionally during development, an especially malformed grammar with a major recurring flaw may severely impede the @acronym{IELR} or @@ -5196,17 +5205,23 @@ This can significantly reduce the complexity of developing of a grammar. @item @code{canonical-lr}. @cindex delayed syntax errors @cindex syntax errors delayed -The only advantage of canonical @acronym{LR} over @acronym{IELR} is -that, for every left context of every canonical @acronym{LR} state, the -set of tokens accepted by that state is the exact set of tokens that is -syntactically acceptable in that left context. -Thus, the only difference in parsing behavior is that the canonical -@acronym{LR} parser can report a syntax error as soon as possible -without performing any unnecessary reductions. -@xref{Decl Summary,,lr.default-reductions}, for further details. -Even when canonical @acronym{LR} behavior is ultimately desired, -@acronym{IELR}'s elimination of duplicate conflicts should still -facilitate the development of a grammar. +@cindex @acronym{LAC} +@findex %nonassoc +While inefficient, canonical @acronym{LR} parser tables can be an +interesting means to explore a grammar because they have a property that +@acronym{IELR} and @acronym{LALR} tables do not. +That is, if @code{%nonassoc} is not used and default reductions are left +disabled (@pxref{Decl Summary,,lr.default-reductions}), then, for every +left context of every canonical @acronym{LR} state, the set of tokens +accepted by that state is guaranteed to be the exact set of tokens that +is syntactically acceptable in that left context. +It might then seem that an advantage of canonical @acronym{LR} parsers +in production is that, under the above constraints, they are guaranteed +to detect a syntax error as soon as possible without performing any +unnecessary reductions. +However, @acronym{IELR} parsers using @acronym{LAC} (@pxref{Decl +Summary,,parse.lac}) are also able to achieve this behavior without +sacrificing @code{%nonassoc} or default reductions. @end itemize @item Default Value: @code{lalr} @@ -5218,7 +5233,7 @@ facilitate the development of a grammar. @itemize @item Languages(s): C++ -@item Purpose: Specifies the namespace for the parser class. +@item Purpose: Specify the namespace for the parser class. For example, if you specify: @smallexample @@ -5263,6 +5278,89 @@ For example, if you specify: The parser namespace is @code{foo} and @code{yylex} is referenced as @code{bar::lex}. @end itemize + +@c ================================================== parse.lac +@item parse.lac +@findex %define parse.lac +@cindex @acronym{LAC} +@cindex lookahead correction + +@itemize +@item Languages(s): C + +@item Purpose: Enable @acronym{LAC} (lookahead correction) to improve +syntax error handling. + +Canonical @acronym{LR}, @acronym{IELR}, and @acronym{LALR} can suffer +from a couple of problems upon encountering a syntax error. First, the +parser might perform additional parser stack reductions before +discovering the syntax error. Such reductions perform user semantic +actions that are unexpected because they are based on an invalid token, +and they cause error recovery to begin in a different syntactic context +than the one in which the invalid token was encountered. Second, when +verbose error messages are enabled (with @code{%error-verbose} or +@code{#define YYERROR_VERBOSE}), the expected token list in the syntax +error message can both contain invalid tokens and omit valid tokens. + +The culprits for the above problems are @code{%nonassoc}, default +reductions in inconsistent states, and parser state merging. Thus, +@acronym{IELR} and @acronym{LALR} suffer the most. Canonical +@acronym{LR} can suffer only if @code{%nonassoc} is used or if default +reductions are enabled for inconsistent states. + +@acronym{LAC} is a new mechanism within the parsing algorithm that +completely solves these problems for canonical @acronym{LR}, +@acronym{IELR}, and @acronym{LALR} without sacrificing @code{%nonassoc}, +default reductions, or state mering. Conceptually, the mechanism is +straight-forward. Whenever the parser fetches a new token from the +scanner so that it can determine the next parser action, it immediately +suspends normal parsing and performs an exploratory parse using a +temporary copy of the normal parser state stack. During this +exploratory parse, the parser does not perform user semantic actions. +If the exploratory parse reaches a shift action, normal parsing then +resumes on the normal parser stacks. If the exploratory parse reaches +an error instead, the parser reports a syntax error. If verbose syntax +error messages are enabled, the parser must then discover the list of +expected tokens, so it performs a separate exploratory parse for each +token in the grammar. + +There is one subtlety about the use of @acronym{LAC}. That is, when in +a consistent parser state with a default reduction, the parser will not +attempt to fetch a token from the scanner because no lookahead is needed +to determine the next parser action. Thus, whether default reductions +are enabled in consistent states (@pxref{Decl +Summary,,lr.default-reductions}) affects how soon the parser detects a +syntax error: when it @emph{reaches} an erroneous token or when it +eventually @emph{needs} that token as a lookahead. The latter behavior +is probably more intuitive, so Bison currently provides no way to +achieve the former behavior while default reductions are fully enabled. + +Thus, when @acronym{LAC} is in use, for some fixed decision of whether +to enable default reductions in consistent states, canonical +@acronym{LR} and @acronym{IELR} behave exactly the same for both +syntactically acceptable and syntactically unacceptable input. While +@acronym{LALR} still does not support the full language-recognition +power of canonical @acronym{LR} and @acronym{IELR}, @acronym{LAC} at +least enables @acronym{LALR}'s syntax error handling to correctly +reflect @acronym{LALR}'s language-recognition power. + +Because @acronym{LAC} requires many parse actions to be performed twice, +it can have a performance penalty. However, not all parse actions must +be performed twice. Specifically, during a series of default reductions +in consistent states and shift actions, the parser never has to initiate +an exploratory parse. Moreover, the most time-consuming tasks in a +parse are often the file I/O, the lexical analysis performed by the +scanner, and the user's semantic actions, but none of these are +performed during the exploratory parse. Finally, the base of the +temporary stack used during an exploratory parse is a pointer into the +normal parser state stack so that the stack is never physically copied. +In our experience, the performance penalty of @acronym{LAC} has proven +insignificant for practical grammars. + +@item Accepted Values: @code{none}, @code{full} + +@item Default Value: @code{none} +@end itemize @end itemize @end deffn @@ -6450,8 +6548,10 @@ This particular ambiguity was first encountered in the specifications of Algol 60 and is called the ``dangling @code{else}'' ambiguity. To avoid warnings from Bison about predictable, legitimate shift/reduce -conflicts, use the @code{%expect @var{n}} declaration. There will be no -warning as long as the number of shift/reduce conflicts is exactly @var{n}. +conflicts, use the @code{%expect @var{n}} declaration. +There will be no warning as long as the number of shift/reduce conflicts +is exactly @var{n}, and Bison will report an error if there is a +different number. @xref{Expect Decl, ,Suppressing Conflict Warnings}. The definition of @code{if_stmt} above is solely to blame for the @@ -8100,8 +8200,8 @@ Treat warnings as errors. @end table A category can be turned off by prefixing its name with @samp{no-}. For -instance, @option{-Wno-syntax} will hide the warnings about unused -variables. +instance, @option{-Wno-yacc} will hide the warnings about +@acronym{POSIX} Yacc incompatibilities. @end table @noindent @@ -10585,6 +10685,14 @@ performs some operation. @item Input stream A continuous flow of data between devices or programs. +@item @acronym{LAC} (Lookahead Correction) +A parsing mechanism that fixes the problem of delayed syntax error +detection, which is caused by LR state merging, default reductions, and +the use of @code{%nonassoc}. Delayed syntax error detection results in +unexpected semantic actions, initiation of error recovery in the wrong +syntactic context, and an incorrect list of expected tokens in a verbose +syntax error message. @xref{Decl Summary,,parse.lac}. + @item Language construct One of the typical usage schemas of the language. For example, one of the constructs of the C language is the @code{if} statement. @@ -10745,7 +10853,7 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th @c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps -@c LocalWords: subexpressions declarator nondeferred config libintl postfix +@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC @c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs @c LocalWords: yytokentype filename destructor multicharacter nonnull EBCDIC @c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK