X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/232be91a83217de5f5794a70b8b20b3e3e97cdd4..4c38b19e2650ca8b79b0d72a9995605ca12d9875:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 5a601fa1..8a1031ef 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -33,14 +33,14 @@ This manual (@value{UPDATED}) is for @acronym{GNU} Bison (version @value{VERSION}), the @acronym{GNU} parser generator. -Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, -1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free +Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, +2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document under the terms of the @acronym{GNU} Free Documentation License, -Version 1.2 or any later version published by the Free Software +Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover texts being ``A @acronym{GNU} Manual,'' and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled @@ -206,6 +206,7 @@ Defining Language Semantics * Mid-Rule Actions:: Most actions go at the end of a rule. This says when, why and how to use the exceptional action in the middle of a rule. +* Named References:: Using named references in actions. Tracking Locations @@ -351,9 +352,10 @@ Copying This Manual @cindex introduction @dfn{Bison} is a general-purpose parser generator that converts an -annotated context-free grammar into a deterministic or @acronym{GLR} -parser employing @acronym{LALR}(1), @acronym{IELR}(1), or canonical -@acronym{LR}(1) parser tables. 
+annotated context-free grammar into a deterministic @acronym{LR} or +generalized @acronym{LR} (@acronym{GLR}) parser employing +@acronym{LALR}(1), @acronym{IELR}(1), or canonical @acronym{LR}(1) +parser tables. Once you are proficient with Bison, you can use it to develop a wide range of language parsers, from those used in simple desk calculators to complex programming languages. @@ -3365,6 +3367,7 @@ the numbers associated with @var{x} and @var{y}. * Mid-Rule Actions:: Most actions go at the end of a rule. This says when, why and how to use the exceptional action in the middle of a rule. +* Named References:: Using named references in actions. @end menu @node Value Type @@ -3426,6 +3429,8 @@ Decl, ,Nonterminal Symbols}). @cindex action @vindex $$ @vindex $@var{n} +@vindex $@var{name} +@vindex $[@var{name}] An action accompanies a syntactic rule and contains C code to be executed each time an instance of that rule is recognized. The task of most actions @@ -3442,9 +3447,12 @@ Actions, ,Actions in Mid-Rule}). The C code in an action can refer to the semantic values of the components matched by the rule with the construct @code{$@var{n}}, which stands for the value of the @var{n}th component. The semantic value for the grouping -being constructed is @code{$$}. Bison translates both of these +being constructed is @code{$$}. In addition, the semantic values of +symbols can be accessed with the named references construct +@code{$@var{name}} or @code{$[@var{name}]}. Bison translates both of these constructs into expressions of the appropriate type when it copies the -actions into the parser file. @code{$$} is translated to a modifiable +actions into the parser file. @code{$$} (or @code{$@var{name}}, when it +stands for the current grouping) is translated to a modifiable lvalue, so it can be assigned to. 
Here is a typical example: @@ -3457,16 +3465,31 @@ exp: @dots{} @end group @end example +Or, in terms of named references: + +@example +@group +exp[result]: @dots{} + | exp[left] '+' exp[right] + @{ $result = $left + $right; @} +@end group +@end example + @noindent This rule constructs an @code{exp} from two smaller @code{exp} groupings connected by a plus-sign token. In the action, @code{$1} and @code{$3} +(@code{$left} and @code{$right}) refer to the semantic values of the two component @code{exp} groupings, which are the first and third symbols on the right hand side of the rule. -The sum is stored into @code{$$} so that it becomes the semantic value of +The sum is stored into @code{$$} (@code{$result}) so that it becomes the +semantic value of the addition-expression just recognized by the rule. If there were a useful semantic value associated with the @samp{+} token, it could be referred to as @code{$2}. +@xref{Named References,,Using Named References}, for more information +about using the named references construct. + Note that the vertical-bar character @samp{|} is really a rule separator, and actions are attached to a single rule. This is a difference with tools like Flex, for which @samp{|} stands for either @@ -3761,6 +3784,93 @@ compound: subroutine Now Bison can execute the action in the rule for @code{subroutine} without deciding which rule for @code{compound} it will eventually use. +@node Named References +@subsection Using Named References +@cindex named references + +While every semantic value can be accessed with positional references +@code{$@var{n}} and @code{$$}, it's often much more convenient to refer to +them by name. First of all, original symbol names may be used as named +references. 
For example: + +@example +@group +invocation: op '(' args ')' + @{ $invocation = new_invocation ($op, $args, @@invocation); @} +@end group +@end example + +@noindent +The positional @code{$$}, @code{@@$}, @code{$n}, and @code{@@n} can be +mixed with @code{$name} and @code{@@name} arbitrarily. For example: + +@example +@group +invocation: op '(' args ')' + @{ $$ = new_invocation ($op, $args, @@$); @} +@end group +@end example + +@noindent +However, sometimes regular symbol names are not sufficient due to +ambiguities: + +@example +@group +exp: exp '/' exp + @{ $exp = $exp / $exp; @} // $exp is ambiguous. + +exp: exp '/' exp + @{ $$ = $1 / $exp; @} // One usage is ambiguous. + +exp: exp '/' exp + @{ $$ = $1 / $3; @} // No error. +@end group +@end example + +@noindent +When ambiguity occurs, explicitly declared names may be used for values and +locations. Explicit names are declared as a bracketed name after a symbol +appearance in rule definitions. For example: +@example +@group +exp[result]: exp[left] '/' exp[right] + @{ $result = $left / $right; @} +@end group +@end example + +@noindent +Explicit names may be declared for RHS and for LHS symbols as well. In order +to access a semantic value generated by a mid-rule action, an explicit name +may also be declared by putting a bracketed name after the closing brace of +the mid-rule action code: +@example +@group +exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right] + @{ $res = $left + $right; @} +@end group +@end example + +@noindent + +In references, in order to specify names containing dots and dashes, an explicit +bracketed syntax @code{$[name]} and @code{@@[name]} must be used: +@example +@group +if-stmt: IF '(' expr ')' THEN then.stmt ';' + @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @} +@end group +@end example + +It often happens that named references are followed by a dot, dash or other +C punctuation marks and operators. 
By default, Bison will read +@code{$name.suffix} as a reference to symbol value @code{$name} followed by +@samp{.suffix}, i.e., an access to the @samp{suffix} field of the semantic +value. In order to force Bison to recognize @code{name.suffix} in its entirety +as the name of a semantic value, bracketed syntax @code{$[name.suffix]} +must be used. + + @node Locations @section Tracking Locations @cindex location @@ -3816,6 +3926,8 @@ Action Decl, , Performing Actions before Parsing}. @cindex actions, location @vindex @@$ @vindex @@@var{n} +@vindex @@@var{name} +@vindex @@[@var{name}] Actions are not only useful for defining language semantics, but also for describing the behavior of the output parser with locations. @@ -3827,6 +3939,11 @@ The location of the @var{n}th component of the right hand side is @code{@@@var{n}}, while the location of the left hand side grouping is @code{@@$}. +In addition, the named references construct @code{@@@var{name}} and +@code{@@[@var{name}]} may also be used to address the symbol locations. +@xref{Named References,,Using Named References}, for more information +about using the named references construct. + Here is a basic example using the default data type for locations: @example @@ -4498,8 +4615,8 @@ number which Bison printed. With @acronym{GLR} parsers, add an @code{%expect-rr} declaration as well. @end itemize -Now Bison will warn you if you introduce an unexpected conflict, but -will keep silent otherwise. +Now Bison will report an error if you introduce an unexpected conflict, +but will keep silent otherwise. @node Start Decl @subsection The Start-Symbol @@ -4911,57 +5028,61 @@ More user feedback will help to stabilize it.) 
@findex %define lr.default-reductions @cindex delayed syntax errors @cindex syntax errors delayed +@cindex @acronym{LAC} +@findex %nonassoc @itemize @bullet @item Language(s): all -@item Purpose: Specifies the kind of states that are permitted to +@item Purpose: Specify the kind of states that are permitted to contain default reductions. -That is, in such a state, Bison declares the reduction with the largest -lookahead set to be the default reduction and then removes that +That is, in such a state, Bison selects the reduction with the largest +lookahead set to be the default parser action and then removes that lookahead set. -The advantages of default reductions are discussed below. -The disadvantage is that, when the generated parser encounters a -syntactically unacceptable token, the parser might then perform -unnecessary default reductions before it can detect the syntax error. - -(This feature is experimental. +(The ability to specify where default reductions should be used is +experimental. More user feedback will help to stabilize it.) @item Accepted Values: @itemize @item @code{all}. -For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl -Summary,,lr.type}) by default, all states are permitted to contain -default reductions. -The advantage is that parser table sizes can be significantly reduced. -The reason Bison does not by default attempt to address the disadvantage -of delayed syntax error detection is that this disadvantage is already -inherent in @acronym{LALR} and @acronym{IELR} parser tables. -That is, unlike in a canonical @acronym{LR} state, the lookahead sets of -reductions in an @acronym{LALR} or @acronym{IELR} state can contain -tokens that are syntactically incorrect for some left contexts. +This is the traditional Bison behavior. +The main advantage is a significant decrease in the size of the parser +tables. 
+The disadvantage is that, when the generated parser encounters a +syntactically unacceptable token, the parser might then perform +unnecessary default reductions before it can detect the syntax error. +Such delayed syntax error detection is usually inherent in +@acronym{LALR} and @acronym{IELR} parser tables anyway due to +@acronym{LR} state merging (@pxref{Decl Summary,,lr.type}). +Furthermore, the use of @code{%nonassoc} can contribute to delayed +syntax error detection even in the case of canonical @acronym{LR}. +As an experimental feature, delayed syntax error detection can be +overcome in all cases by enabling @acronym{LAC} (@pxref{Decl +Summary,,parse.lac}, for details, including a discussion of the effects +of delayed syntax error detection). @item @code{consistent}. @cindex consistent states A consistent state is a state that has only one possible action. If that action is a reduction, then the parser does not need to request a lookahead token from the scanner before performing that action. -However, the parser only recognizes the ability to ignore the lookahead -token when such a reduction is encoded as a default reduction. -Thus, if default reductions are permitted in and only in consistent -states, then a canonical @acronym{LR} parser reports a syntax error as -soon as it @emph{needs} the syntactically unacceptable token from the -scanner. +However, the parser recognizes the ability to ignore the lookahead token +in this way only when such a reduction is encoded as a default +reduction. +Thus, if default reductions are permitted only in consistent states, +then a canonical @acronym{LR} parser that does not employ +@code{%nonassoc} detects a syntax error as soon as it @emph{needs} the +syntactically unacceptable token from the scanner. @item @code{accepting}. 
@cindex accepting state -By default, the only default reduction permitted in a canonical -@acronym{LR} parser is the accept action in the accepting state, which -the parser reaches only after reading all tokens from the input. -Thus, the default canonical @acronym{LR} parser reports a syntax error -as soon as it @emph{reaches} the syntactically unacceptable token -without performing any extra reductions. +In the accepting state, the default reduction is actually the accept +action. +In this case, a canonical @acronym{LR} parser that does not employ +@code{%nonassoc} detects a syntax error as soon as it @emph{reaches} the +syntactically unacceptable token in the input. +That is, it does not perform any extra reductions. @end itemize @item Default Value: @@ -5080,17 +5201,23 @@ This can significantly reduce the complexity of developing of a grammar. @item @code{canonical-lr}. @cindex delayed syntax errors @cindex syntax errors delayed -The only advantage of canonical @acronym{LR} over @acronym{IELR} is -that, for every left context of every canonical @acronym{LR} state, the -set of tokens accepted by that state is the exact set of tokens that is -syntactically acceptable in that left context. -Thus, the only difference in parsing behavior is that the canonical -@acronym{LR} parser can report a syntax error as soon as possible -without performing any unnecessary reductions. -@xref{Decl Summary,,lr.default-reductions}, for further details. -Even when canonical @acronym{LR} behavior is ultimately desired, -@acronym{IELR}'s elimination of duplicate conflicts should still -facilitate the development of a grammar. +@cindex @acronym{LAC} +@findex %nonassoc +While inefficient, canonical @acronym{LR} parser tables can be an +interesting means to explore a grammar because they have a property that +@acronym{IELR} and @acronym{LALR} tables do not. 
+That is, if @code{%nonassoc} is not used and default reductions are left +disabled (@pxref{Decl Summary,,lr.default-reductions}), then, for every +left context of every canonical @acronym{LR} state, the set of tokens +accepted by that state is guaranteed to be the exact set of tokens that +is syntactically acceptable in that left context. +It might then seem that an advantage of canonical @acronym{LR} parsers +in production is that, under the above constraints, they are guaranteed +to detect a syntax error as soon as possible without performing any +unnecessary reductions. +However, @acronym{IELR} parsers using @acronym{LAC} (@pxref{Decl +Summary,,parse.lac}) are also able to achieve this behavior without +sacrificing @code{%nonassoc} or default reductions. @end itemize @item Default Value: @code{lalr} @@ -5147,6 +5274,89 @@ For example, if you specify: The parser namespace is @code{foo} and @code{yylex} is referenced as @code{bar::lex}. @end itemize + +@c ================================================== parse.lac +@item parse.lac +@findex %define parse.lac +@cindex @acronym{LAC} +@cindex lookahead correction + +@itemize +@item Language(s): C + +@item Purpose: Enable @acronym{LAC} (lookahead correction) to improve +syntax error handling. + +Canonical @acronym{LR}, @acronym{IELR}, and @acronym{LALR} can suffer +from a couple of problems upon encountering a syntax error. First, the +parser might perform additional parser stack reductions before +discovering the syntax error. Such reductions perform user semantic +actions that are unexpected because they are based on an invalid token, +and they cause error recovery to begin in a different syntactic context +than the one in which the invalid token was encountered. Second, when +verbose error messages are enabled (with @code{%error-verbose} or +@code{#define YYERROR_VERBOSE}), the expected token list in the syntax +error message can both contain invalid tokens and omit valid tokens. 
+ +The culprits for the above problems are @code{%nonassoc}, default +reductions in inconsistent states, and parser state merging. Thus, +@acronym{IELR} and @acronym{LALR} suffer the most. Canonical +@acronym{LR} can suffer only if @code{%nonassoc} is used or if default +reductions are enabled for inconsistent states. + +@acronym{LAC} is a new mechanism within the parsing algorithm that +completely solves these problems for canonical @acronym{LR}, +@acronym{IELR}, and @acronym{LALR} without sacrificing @code{%nonassoc}, +default reductions, or state merging. Conceptually, the mechanism is +straightforward. Whenever the parser fetches a new token from the +scanner so that it can determine the next parser action, it immediately +suspends normal parsing and performs an exploratory parse using a +temporary copy of the normal parser state stack. During this +exploratory parse, the parser does not perform user semantic actions. +If the exploratory parse reaches a shift action, normal parsing then +resumes on the normal parser stacks. If the exploratory parse reaches +an error instead, the parser reports a syntax error. If verbose syntax +error messages are enabled, the parser must then discover the list of +expected tokens, so it performs a separate exploratory parse for each +token in the grammar. + +There is one subtlety about the use of @acronym{LAC}. That is, when in +a consistent parser state with a default reduction, the parser will not +attempt to fetch a token from the scanner because no lookahead is needed +to determine the next parser action. Thus, whether default reductions +are enabled in consistent states (@pxref{Decl +Summary,,lr.default-reductions}) affects how soon the parser detects a +syntax error: when it @emph{reaches} an erroneous token or when it +eventually @emph{needs} that token as a lookahead. 
The latter behavior +is probably more intuitive, so Bison currently provides no way to +achieve the former behavior while default reductions are fully enabled. + +Thus, when @acronym{LAC} is in use, for some fixed decision of whether +to enable default reductions in consistent states, canonical +@acronym{LR} and @acronym{IELR} behave exactly the same for both +syntactically acceptable and syntactically unacceptable input. While +@acronym{LALR} still does not support the full language-recognition +power of canonical @acronym{LR} and @acronym{IELR}, @acronym{LAC} at +least enables @acronym{LALR}'s syntax error handling to correctly +reflect @acronym{LALR}'s language-recognition power. + +Because @acronym{LAC} requires many parse actions to be performed twice, +it can have a performance penalty. However, not all parse actions must +be performed twice. Specifically, during a series of default reductions +in consistent states and shift actions, the parser never has to initiate +an exploratory parse. Moreover, the most time-consuming tasks in a +parse are often the file I/O, the lexical analysis performed by the +scanner, and the user's semantic actions, but none of these are +performed during the exploratory parse. Finally, the base of the +temporary stack used during an exploratory parse is a pointer into the +normal parser state stack so that the stack is never physically copied. +In our experience, the performance penalty of @acronym{LAC} has proven +insignificant for practical grammars. + +@item Accepted Values: @code{none}, @code{full} + +@item Default Value: @code{none} +@end itemize @end itemize @end deffn @@ -6334,8 +6544,10 @@ This particular ambiguity was first encountered in the specifications of Algol 60 and is called the ``dangling @code{else}'' ambiguity. To avoid warnings from Bison about predictable, legitimate shift/reduce -conflicts, use the @code{%expect @var{n}} declaration. 
There will be no -warning as long as the number of shift/reduce conflicts is exactly @var{n}. +conflicts, use the @code{%expect @var{n}} declaration. +There will be no warning as long as the number of shift/reduce conflicts +is exactly @var{n}, and Bison will report an error if there is a +different number. @xref{Expect Decl, ,Suppressing Conflict Warnings}. The definition of @code{if_stmt} above is solely to blame for the @@ -7984,8 +8196,8 @@ Treat warnings as errors. @end table A category can be turned off by prefixing its name with @samp{no-}. For -instance, @option{-Wno-syntax} will hide the warnings about unused -variables. +instance, @option{-Wno-yacc} will hide the warnings about +@acronym{POSIX} Yacc incompatibilities. @end table @noindent @@ -8213,8 +8425,8 @@ int yyparse (void); @c - initial action The C++ deterministic parser is selected using the skeleton directive, -@samp{%skeleton "lalr1.c"}, or the synonymous command-line option -@option{--skeleton=lalr1.c}. +@samp{%skeleton "lalr1.cc"}, or the synonymous command-line option +@option{--skeleton=lalr1.cc}. @xref{Decl Summary}. When run, @command{bison} will create several entities in the @samp{yy} @@ -8363,11 +8575,19 @@ this class is detailed below. It can be extended using the it describes an additional member of the parser class, and an additional argument for its constructor. -@defcv {Type} {parser} {semantic_value_type} -@defcvx {Type} {parser} {location_value_type} +@defcv {Type} {parser} {semantic_type} +@defcvx {Type} {parser} {location_type} The types for semantics value and locations. @end defcv +@defcv {Type} {parser} {token} +A structure that contains (only) the definition of the tokens as the +@code{yytokentype} enumeration. To refer to the token @code{FOO}, the +scanner should use @code{yy::parser::token::FOO}. The scanner can use +@samp{typedef yy::parser::token token;} to ``import'' the token enumeration +(@pxref{Calc++ Scanner}). 
+@end defcv + @deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...) Build a new parser object. There are no arguments by default, unless @samp{%parse-param @{@var{type1} @var{arg1}@}} was used. @@ -8406,7 +8626,7 @@ The parser invokes the scanner by calling @code{yylex}. Contrary to C parsers, C++ parsers are always pure: there is no point in using the @code{%define api.pure} directive. Therefore the interface is as follows. -@deftypemethod {parser} {int} yylex (semantic_value_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...) +@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...) Return the next token. Its type is the return value, its semantic value and location being @var{yylval} and @var{yylloc}. Invocations of @samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments. @@ -9268,11 +9488,6 @@ Start error recovery without printing an error message. @xref{Error Recovery}. @end deffn -@deffn {Statement} {return YYFAIL;} -Print an error message and start error recovery. -@xref{Error Recovery}. -@end deffn - @deftypefn {Function} {boolean} recovering () Return whether error recovery is being done. In this state, the parser reads token until it reaches a known state, and then restarts normal @@ -9889,6 +10104,16 @@ In an action, the location of the @var{n}-th symbol of the right-hand side of the rule. @xref{Locations, , Locations Overview}. @end deffn +@deffn {Variable} @@@var{name} +In an action, the location of a symbol addressed by name. +@xref{Locations, , Locations Overview}. +@end deffn + +@deffn {Variable} @@[@var{name}] +In an action, the location of a symbol addressed by name. +@xref{Locations, , Locations Overview}. +@end deffn + @deffn {Variable} $$ In an action, the semantic value of the left-hand side of the rule. @xref{Actions}. 
@@ -9899,6 +10124,16 @@ In an action, the semantic value of the @var{n}-th symbol of the right-hand side of the rule. @xref{Actions}. @end deffn +@deffn {Variable} $@var{name} +In an action, the semantic value of a symbol addressed by name. +@xref{Actions}. +@end deffn + +@deffn {Variable} $[@var{name}] +In an action, the semantic value of a symbol addressed by name. +@xref{Actions}. +@end deffn + @deffn {Delimiter} %% Delimiter used to separate the grammar rule section from the Bison declarations section or the epilogue. @@ -10446,6 +10681,14 @@ performs some operation. @item Input stream A continuous flow of data between devices or programs. +@item @acronym{LAC} (Lookahead Correction) +A parsing mechanism that fixes the problem of delayed syntax error +detection, which is caused by LR state merging, default reductions, and +the use of @code{%nonassoc}. Delayed syntax error detection results in +unexpected semantic actions, initiation of error recovery in the wrong +syntactic context, and an incorrect list of expected tokens in a verbose +syntax error message. @xref{Decl Summary,,parse.lac}. + @item Language construct One of the typical usage schemas of the language. For example, one of the constructs of the C language is the @code{if} statement. @@ -10606,7 +10849,7 @@ grammatically indivisible. The piece of text it represents is a token. 
@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th @c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps -@c LocalWords: subexpressions declarator nondeferred config libintl postfix +@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC @c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs @c LocalWords: yytokentype filename destructor multicharacter nonnull EBCDIC @c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK @@ -10627,5 +10870,5 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv @c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url @c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos -@c LocalWords: getLVal defvar YYFAIL deftypefn deftypefnx gotos msgfmt +@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt @c LocalWords: subdirectory Solaris nonassociativity
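The C++ hunks above document a `token` structure inside the parser class, the renamed `semantic_type` and `location_type`, and a `yylex` that receives pointers rather than references. The following is a minimal sketch of a hand-written scanner against that interface, not code from the Bison distribution. The `yy::parser` class here is a simplified stand-in for what the `lalr1.cc` skeleton would generate, and `Input`, the `ival` member, the token names `NUMBER` and `PLUS`, and the `position` layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the class Bison would generate with
// %skeleton "lalr1.cc". Only the members named in the manual are
// modelled, and the location type is heavily simplified.
namespace yy {
class parser {
public:
  union semantic_type { int ival; };          // assumed %union { int ival; }
  struct position { int line; int column; };  // simplified position
  struct location_type { position begin; position end; };
  struct token {
    enum yytokentype { NUMBER = 258, PLUS = 259 };  // assumed tokens
  };
};
}

// "Import" the token enumeration, as the manual suggests.
typedef yy::parser::token token;

// Hypothetical extra scanner argument, as if declared via %lex-param.
struct Input {
  std::string text;
  std::string::size_type pos;
};

// Return the next token; semantic value and location go through pointers.
int yylex(yy::parser::semantic_type* yylval,
          yy::parser::location_type* yylloc,
          Input& in)
{
  while (in.pos < in.text.size() && in.text[in.pos] == ' ')
    ++in.pos;                                  // skip blanks
  yylloc->begin.column = static_cast<int>(in.pos) + 1;
  yylloc->end.column = yylloc->begin.column;
  if (in.pos >= in.text.size())
    return 0;                                  // 0 signals end of input
  unsigned char c = static_cast<unsigned char>(in.text[in.pos]);
  if (std::isdigit(c)) {
    int value = 0;
    while (in.pos < in.text.size()
           && std::isdigit(static_cast<unsigned char>(in.text[in.pos]))) {
      value = value * 10 + (in.text[in.pos] - '0');
      ++in.pos;
    }
    yylval->ival = value;
    yylloc->end.column = static_cast<int>(in.pos) + 1;
    return token::NUMBER;
  }
  ++in.pos;                                    // single-character token
  yylloc->end.column = static_cast<int>(in.pos) + 1;
  if (c == '+')
    return token::PLUS;
  return c;                                    // literal character token
}

// Usage example: scan "12 + 7" and check the token stream.
void self_check()
{
  Input in = { "12 + 7", 0 };
  yy::parser::semantic_type val;
  yy::parser::location_type loc = { {1, 1}, {1, 1} };
  assert(yylex(&val, &loc, in) == token::NUMBER && val.ival == 12);
  assert(yylex(&val, &loc, in) == token::PLUS);
  assert(yylex(&val, &loc, in) == token::NUMBER && val.ival == 7);
  assert(yylex(&val, &loc, in) == 0);
}
```

In a real project the nested types would come from the generated parser header rather than being declared by hand, and the location class provides richer members than the two-field `position` assumed here.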