@end ifinfo
@comment %**end of header
-@ifinfo
-@format
-START-INFO-DIR-ENTRY
-* bison: (bison). GNU Project parser generator (yacc replacement).
-END-INFO-DIR-ENTRY
-@end format
-@end ifinfo
+@copying
-@ifinfo
-This file documents the Bison parser generator.
-
-Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
-2000, 2001, 2002
-Free Software Foundation, Inc.
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-@ignore
-Permission is granted to process this file through Tex and print the
-results, provided the printed document carries copying permission
-notice identical to this one except for the removal of this paragraph
-(this paragraph not being relevant to the printed manual).
-
-@end ignore
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided also that the
-sections entitled ``GNU General Public License'' and ``Conditions for
-Using Bison'' are included exactly as in the original, and provided that
-the entire resulting derived work is distributed under the terms of a
-permission notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that the sections entitled ``GNU General Public License'',
-``Conditions for Using Bison'' and this permission notice may be
-included in translations approved by the Free Software Foundation
-instead of in the original English.
-@end ifinfo
+This manual is for GNU Bison (version @value{VERSION}, @value{UPDATED}),
+the GNU parser generator.
+
+Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
+1999, 2000, 2001, 2002 Free Software Foundation, Inc.
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.1 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
+and with the Back-Cover Texts as in (a) below. A copy of the
+license is included in the section entitled ``GNU Free Documentation
+License.''
+
+(a) The FSF's Back-Cover Text is: ``You have freedom to copy and modify
+this GNU Manual, like GNU software. Copies published by the Free
+Software Foundation raise funds for GNU development.''
+@end quotation
+@end copying
+
+@dircategory GNU programming tools
+@direntry
+* bison: (bison). GNU parser generator (yacc replacement).
+@end direntry
@ifset shorttitlepage-enabled
@shorttitlepage Bison
@page
@vskip 0pt plus 1filll
-Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002
-Free Software Foundation, Inc.
-
+@insertcopying
@sp 2
Published by the Free Software Foundation @*
59 Temple Place, Suite 330 @*
Boston, MA 02111-1307 USA @*
Printed copies are available from the Free Software Foundation.@*
ISBN 1-882114-44-2
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-@ignore
-Permission is granted to process this file through TeX and print the
-results, provided the printed document carries copying permission
-notice identical to this one except for the removal of this paragraph
-(this paragraph not being relevant to the printed manual).
-
-@end ignore
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided also that the
-sections entitled ``GNU General Public License'' and ``Conditions for
-Using Bison'' are included exactly as in the original, and provided that
-the entire resulting derived work is distributed under the terms of a
-permission notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that the sections entitled ``GNU General Public License'',
-``Conditions for Using Bison'' and this permission notice may be
-included in translations approved by the Free Software Foundation
-instead of in the original English.
@sp 2
Cover art by Etienne Suvasa.
@end titlepage
@ifnottex
@node Top
@top Bison
-
-This manual documents version @value{VERSION} of Bison, updated
-@value{UPDATED}.
+@insertcopying
@end ifnottex
@menu
* Parser States:: The parser is a finite-state-machine with stack.
* Reduce/Reduce:: When two rules are applicable in the same situation.
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
* Stack Overflow:: What happens when stack gets full. How to avoid it.
Operator Precedence
a semantic value (the value of an integer,
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
+* GLR Parsers:: Writing parsers for general context-free languages.
* Locations Overview:: Tracking Locations.
* Bison Parser:: What are Bison's input and output,
how is the output used?
context-free grammar. The input to Bison is essentially machine-readable
BNF.
-Not all context-free languages can be handled by Bison, only those
-that are LALR(1). In brief, this means that it must be possible to
+@cindex LALR(1) grammars
+@cindex LR(1) grammars
+There are various important subclasses of context-free grammar. Although it
+can handle almost all context-free grammars, Bison is optimized for what
+are called LALR(1) grammars.
+In brief, in these grammars, it must be possible to
tell how to parse any portion of an input string with just a single
token of look-ahead. Strictly speaking, that is a description of an
LR(1) grammar, and LALR(1) involves additional restrictions that are
LR(1) grammar that fails to be LALR(1). @xref{Mystery Conflicts, ,
Mysterious Reduce/Reduce Conflicts}, for more information on this.
+@cindex GLR parsing
+@cindex generalized LR (GLR) parsing
+@cindex ambiguous grammars
+@cindex non-deterministic parsing
+Parsers for LALR(1) grammars are @dfn{deterministic}, meaning roughly that
+the next grammar rule to apply at any point in the input is uniquely
+determined by the preceding input and a fixed, finite portion (called
+a @dfn{look-ahead}) of the remaining input.
+A context-free grammar can be @dfn{ambiguous}, meaning that
+there are multiple ways to apply the grammar rules to get the same inputs.
+Even unambiguous grammars can be @dfn{non-deterministic}, meaning that no
+fixed look-ahead always suffices to determine the next grammar rule to apply.
+With the proper declarations, Bison is also able to parse these more general
+context-free grammars, using a technique known as GLR parsing (for
+Generalized LR). Bison's GLR parsers are able to handle any context-free
+grammar for which the number of possible parses of any given string
+is finite.
+
@cindex symbols (abstract)
@cindex token
@cindex syntactic grouping
The action says how to produce the semantic value of the sum expression
from the values of the two subexpressions.
+@node GLR Parsers
+@section Writing GLR Parsers
+@cindex GLR parsing
+@cindex generalized LR (GLR) parsing
+@findex %glr-parser
+@cindex conflicts
+@cindex shift/reduce conflicts
+
+In some grammars, there will be cases where Bison's standard LALR(1)
+parsing algorithm cannot decide whether to apply a certain grammar rule
+at a given point. That is, it may not be able to decide (on the basis
+of the input read so far) which of two possible reductions (applications
+of a grammar rule) applies, or whether to apply a reduction or read more
+of the input and apply a reduction later in the input. These are known
+respectively as @dfn{reduce/reduce} conflicts (@pxref{Reduce/Reduce}),
+and @dfn{shift/reduce} conflicts (@pxref{Shift/Reduce}).
+
+To use a grammar that is not easily modified to be LALR(1), a more
+general parsing algorithm is sometimes necessary. If you include
+@code{%glr-parser} among the Bison declarations in your file
+(@pxref{Grammar Outline}), the result will be a Generalized LR (GLR)
+parser. These parsers handle Bison grammars that contain no unresolved
+conflicts (i.e., after applying precedence declarations) identically to
+LALR(1) parsers. However, when faced with unresolved shift/reduce and
+reduce/reduce conflicts, GLR parsers use the simple expedient of doing
+both, effectively cloning the parser to follow both possibilities. Each
+of the resulting parsers can again split, so that at any given time,
+there can be any number of possible parses being explored. The parsers
+proceed in lockstep; that is, all of them consume (shift) a given input
+symbol before any of them proceed to the next. Each of the cloned
+parsers eventually meets one of two possible fates: either it runs into
+a parsing error, in which case it simply vanishes, or it merges with
+another parser, because the two of them have reduced the input to an
+identical set of symbols.
+
+During the time that there are multiple parsers, semantic actions are
+recorded, but not performed. When a parser disappears, its recorded
+semantic actions disappear as well, and are never performed. When a
+reduction makes two parsers identical, causing them to merge, Bison
+records both sets of semantic actions. Whenever the last two parsers
+merge, reverting to the single-parser case, Bison resolves all the
+outstanding actions either by precedences given to the grammar rules
+involved, or by performing both actions, and then calling a designated
+user-defined function on the resulting values to produce an arbitrary
+merged result.
+
+Let's consider an example, vastly simplified from C++.
+
+@example
+%@{
+ #define YYSTYPE const char*
+%@}
+
+%token TYPENAME ID
+
+%right '='
+%left '+'
+
+%glr-parser
+
+%%
+
+prog :
+ | prog stmt @{ printf ("\n"); @}
+ ;
+
+stmt : expr ';' %dprec 1
+ | decl %dprec 2
+ ;
+
+expr : ID @{ printf ("%s ", $1); @}
+ | TYPENAME '(' expr ')'
+ @{ printf ("%s <cast> ", $1); @}
+ | expr '+' expr @{ printf ("+ "); @}
+ | expr '=' expr @{ printf ("= "); @}
+ ;
+
+decl : TYPENAME declarator ';'
+ @{ printf ("%s <declare> ", $1); @}
+ | TYPENAME declarator '=' expr ';'
+ @{ printf ("%s <init-declare> ", $1); @}
+ ;
+
+declarator : ID @{ printf ("\"%s\" ", $1); @}
+ | '(' declarator ')'
+ ;
+@end example
+
+@noindent
+This models a problematic part of the C++ grammar---the ambiguity between
+certain declarations and statements. For example,
+
+@example
+T (x) = y+z;
+@end example
+
+@noindent
+parses as either an @code{expr} or a @code{decl}
+(assuming that @samp{T} is recognized as a @code{TYPENAME} and @samp{x} as an @code{ID}).
+Bison detects this as a reduce/reduce conflict between the rules
+@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the
+time it encounters @code{x} in the example above. The two @code{%dprec}
+declarations, however, give precedence to interpreting the example as a
+@code{decl}, which implies that @code{x} is a declarator.
+The parser therefore prints
+
+@example
+"x" y z + T <init-declare>
+@end example
+
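The result above depends on the scanner handing @samp{T} to the parser as a @code{TYPENAME} and @samp{x} as an @code{ID}. The grammar does not say how that decision is made. As a minimal sketch only, the hypothetical function @code{scan_token} below assumes that a name starting with an upper-case letter is a type name; the token codes are placeholders for the values Bison would generate from the @code{%token} declaration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; in a real parser Bison generates these
   from the %token declaration. */
enum { TYPENAME = 258, ID = 259 };

/* Scan one token starting at *cursor and advance *cursor past it.
   Returns 0 at end of input, TYPENAME or ID for a name, and the
   character itself for anything else.  For a name, *text receives a
   heap-allocated copy of its spelling; otherwise *text is NULL.
   Assumption: a name starting with an upper-case letter is a type. */
static int scan_token (const char **cursor, const char **text)
{
  const char *p;

  assert (cursor != NULL && *cursor != NULL && text != NULL);
  p = *cursor;
  while (isspace ((unsigned char) *p))
    p++;
  *text = NULL;

  if (*p == '\0')
    {
      *cursor = p;
      return 0;
    }

  if (isalpha ((unsigned char) *p))
    {
      const char *start = p;
      size_t len;
      char *name;

      while (isalnum ((unsigned char) *p) || *p == '_')
        p++;
      len = (size_t) (p - start);
      name = malloc (len + 1);
      if (name == NULL)
        abort ();
      memcpy (name, start, len);
      name[len] = '\0';
      *text = name;
      *cursor = p;
      return isupper ((unsigned char) *start) ? TYPENAME : ID;
    }

  /* Punctuation such as '(' or ';' is its own token code. */
  *cursor = p + 1;
  return (unsigned char) *p;
}
```

A @code{yylex} built on this sketch would store the returned text in @code{yylval} before returning the token code.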
+Consider a different input string for this parser:
+
+@example
+T (x) + y;
+@end example
+
+@noindent
+Here, there is no ambiguity (this cannot be parsed as a declaration).
+However, at the time the Bison parser encounters @code{x}, it does not
+have enough information to resolve the reduce/reduce conflict (again,
+between @code{x} as an @code{expr} or a @code{declarator}). In this
+case, no precedence declaration is used. Instead, the parser splits
+into two, one assuming that @code{x} is an @code{expr}, and the other
+assuming @code{x} is a @code{declarator}. The second of these parsers
+then vanishes when it sees @code{+}, and the parser prints
+
+@example
+x T <cast> y +
+@end example
+
+Suppose that instead of resolving the ambiguity, you wanted to see all
+the possibilities. For this purpose, we must @dfn{merge} the semantic
+actions of the two possible parsers, rather than choosing one over the
+other. To do so, you could change the declaration of @code{stmt} as
+follows:
+
+@example
+stmt : expr ';' %merge <stmtMerge>
+ | decl %merge <stmtMerge>
+ ;
+@end example
+
+@noindent
+and define the @code{stmtMerge} function as:
+
+@example
+static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1)
+@{
+ printf ("<OR> ");
+ return "";
+@}
+@end example
+
+@noindent
+with an accompanying forward declaration
+in the C declarations at the beginning of the file:
+
+@example
+%@{
+ #define YYSTYPE const char*
+ static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
+%@}
+@end example
+
+@noindent
+With these declarations, the resulting parser will parse the first example
+as both an @code{expr} and a @code{decl}, and print
+
+@example
+"x" y z + T <init-declare> x T <cast> y z + = <OR>
+@end example
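The @code{stmtMerge} function shown earlier only prints a marker and returns an empty string, so both semantic values are discarded. A merge function can also combine its two arguments. The following is a sketch under stated assumptions, not part of the example grammar: the name @code{stmtMergeJoin} is hypothetical, and it assumes that each alternative's semantic value is a string describing that parse, which the actions in the grammar above do not produce as written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stands in for "#define YYSTYPE const char*" in the grammar file. */
typedef const char *YYSTYPE;

/* Hypothetical merge function: joins the values of the two
   alternatives instead of discarding them.  The result is
   heap-allocated and is never freed in this sketch. */
static YYSTYPE stmtMergeJoin (YYSTYPE x0, YYSTYPE x1)
{
  static const char sep[] = " <OR> ";
  size_t size;
  char *merged;

  assert (x0 != NULL && x1 != NULL);
  size = strlen (x0) + strlen (sep) + strlen (x1) + 1;
  merged = malloc (size);
  if (merged == NULL)
    abort ();
  snprintf (merged, size, "%s%s%s", x0, sep, x1);
  return merged;
}
```

With the signature unchanged, such a function would be named in the @code{%merge} clauses in place of @code{stmtMerge}.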
+
+
@node Locations Overview
@section Locations
@cindex location
is an array holding locations of all right hand side elements of the rule
being matched. The last one is the size of the right hand side rule.
-By default, it is defined this way:
+By default, it is defined this way for simple LALR(1) parsers:
@example
@group
@end group
@end example
+@noindent
+and like this for GLR parsers:
+
+@example
+@group
+#define YYLLOC_DEFAULT(Current, Rhs, N) \
+ Current.first_line = YYRHSLOC(Rhs,1).first_line; \
+ Current.first_column = YYRHSLOC(Rhs,1).first_column; \
+ Current.last_line = YYRHSLOC(Rhs,N).last_line; \
+ Current.last_column = YYRHSLOC(Rhs,N).last_column;
+@end group
+@end example
+
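To make the effect of the GLR definition concrete, the sketch below computes the same span as an ordinary C function. It is an illustration only: @code{Location} and @code{span_of_rhs} are hypothetical names, and the array is indexed from 1 to mirror the numbering used with @code{YYRHSLOC}.

```c
#include <assert.h>
#include <stddef.h>

/* The four fields used by the macro above. */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} Location;

/* Hypothetical helper: rhs[1] .. rhs[n] hold the locations of the
   right-hand side symbols (index 0 is unused).  Returns the span
   from the start of the first symbol to the end of the last one. */
static Location span_of_rhs (const Location *rhs, int n)
{
  Location current;

  assert (rhs != NULL && n >= 1);
  current.first_line = rhs[1].first_line;
  current.first_column = rhs[1].first_column;
  current.last_line = rhs[n].last_line;
  current.last_column = rhs[n].last_column;
  return current;
}
```

Unlike this function, the macro expands to four separate statements, so a replacement written in the same style should not be relied on as the unbraced body of an @code{if} or a loop.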
When defining @code{YYLLOC_DEFAULT}, you should consider that:
@itemize @bullet
@item %token-table
Generate an array of token names in the parser file. The name of the
array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
-token whose internal Bison token code number is @var{i}. The first three
-elements of @code{yytname} are always @code{"$"}, @code{"error"}, and
-@code{"$illegal"}; after these come the symbols defined in the grammar
-file.
+token whose internal Bison token code number is @var{i}. The first
+three elements of @code{yytname} are always @code{"$end"},
+@code{"error"}, and @code{"$undefined"}; after these come the symbols
+defined in the grammar file.
For single-character literal tokens and literal string tokens, the name
in the table includes the single-quote or double-quote characters: for
@findex YYBACKUP
Unshift a token. This macro is allowed only for rules that reduce
a single value, and only when there is no look-ahead token.
+It is also disallowed in GLR parsers.
It installs a look-ahead token with token type @var{token} and
semantic value @var{value}; then it discards the value that was
going to be reduced by this rule.
* Parser States:: The parser is a finite-state-machine with stack.
* Reduce/Reduce:: When two rules are applicable in the same situation.
* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
+* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
* Stack Overflow:: What happens when stack gets full. How to avoid it.
@end menu
;
@end example
+@node Generalized LR Parsing
+@section Generalized LR (GLR) Parsing
+@cindex GLR parsing
+@cindex generalized LR (GLR) parsing
+@cindex ambiguous grammars
+@cindex non-deterministic parsing
+
+Bison produces @emph{deterministic} parsers that choose uniquely
+when to reduce and which reduction to apply
+based on a summary of the preceding input and on one extra token of look-ahead.
+As a result, normal Bison handles a proper subset of the family of
+context-free languages.
+Ambiguous grammars, since they have strings with more than one possible
+sequence of reductions, cannot have deterministic parsers in this sense.
+The same is true of languages that require more than one symbol of
+look-ahead, since the parser lacks the information necessary to make a
+decision at the point it must be made in a shift-reduce parser.
+Finally, as previously mentioned (@pxref{Mystery Conflicts}),
+there are languages where Bison's particular choice of how to
+summarize the input seen so far loses necessary information.
+
+When you use the @samp{%glr-parser} declaration in your grammar file,
+Bison generates a parser that uses a different algorithm, called
+Generalized LR (or GLR). A Bison GLR parser uses the same basic
+algorithm for parsing as an ordinary Bison parser, but behaves
+differently in cases where there is a shift-reduce conflict that has not
+been resolved by precedence rules (@pxref{Precedence}) or a
+reduce-reduce conflict. When a GLR parser encounters such a situation, it
+effectively @emph{splits} into several parsers, one for each possible
+shift or reduction. These parsers then proceed as usual, consuming
+tokens in lock-step. Some of the stacks may encounter other conflicts
+and split further, with the result that instead of a sequence of states,
+a Bison GLR parsing stack is in effect a tree of states.
+
+In effect, each stack represents a guess as to what the proper parse
+is. Additional input may indicate that a guess was wrong, in which case
+the appropriate stack silently disappears. Otherwise, the semantic
+actions generated in each stack are saved, rather than being executed
+immediately. When a stack disappears, its saved semantic actions never
+get executed. When a reduction causes two stacks to become equivalent,
+their sets of semantic actions are both saved with the state that
+results from the reduction. We say that two stacks are equivalent
+when they both represent the same sequence of states,
+and each pair of corresponding states represents a
+grammar symbol that produces the same segment of the input token
+stream.
+
+Whenever the parser makes a transition from having multiple
+stacks to having one, it reverts to the normal LALR(1) parsing
+algorithm, after resolving and executing the saved-up actions.
+At this transition, some of the states on the stack will have semantic
+values that are sets (actually multisets) of possible actions. The
+parser tries to pick one of the actions by first finding one whose rule
+has the highest dynamic precedence, as set by the @samp{%dprec}
+declaration. Otherwise, if the alternative actions are not ordered by
+precedence, but the same merging function is declared for both
+rules by the @samp{%merge} declaration,
+Bison resolves and evaluates both and then calls the merge function on
+the result. Otherwise, it reports an ambiguity.
+
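The choice among alternatives described above can be sketched as a small decision function. This illustrates the rule as stated in the text and is not Bison's internal code: the type and function names are hypothetical, a value of 0 stands for ``no declaration given,'' and the sketch assumes that two alternatives count as ordered only when both carry a @samp{%dprec} value and the values differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one saved alternative. */
typedef struct
{
  int dprec;    /* %dprec value of the rule, or 0 if none was given */
  int merger;   /* id of the rule's %merge function, or 0 if none */
} Alternative;

typedef enum
{
  PICK_FIRST,
  PICK_SECOND,
  MERGE_BOTH,
  AMBIGUOUS
} Resolution;

/* Decide how to resolve two alternatives for the same input span. */
static Resolution resolve (const Alternative *a, const Alternative *b)
{
  assert (a != NULL && b != NULL);

  /* Ordered by precedence: the higher %dprec wins. */
  if (a->dprec != 0 && b->dprec != 0 && a->dprec != b->dprec)
    return a->dprec > b->dprec ? PICK_FIRST : PICK_SECOND;

  /* Not ordered, but both rules name the same merge function. */
  if (a->merger != 0 && a->merger == b->merger)
    return MERGE_BOTH;

  /* Neither mechanism applies: the parse is reported as ambiguous. */
  return AMBIGUOUS;
}
```

For the @code{stmt} rules of the earlier example, the @samp{%dprec 1} and @samp{%dprec 2} alternatives yield @code{PICK_SECOND}, and the two @samp{%merge <stmtMerge>} alternatives yield @code{MERGE_BOTH}.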
+It is possible to use a data structure for the GLR parsing tree that
+permits the processing of any LALR(1) grammar in linear time (in the
+size of the input), any unambiguous (not necessarily LALR(1)) grammar in
+quadratic worst-case time, and any general (possibly ambiguous)
+context-free grammar in cubic worst-case time. However, Bison currently
+uses a simpler data structure that requires time proportional to the
+length of the input times the maximum number of stacks required for any
+prefix of the input. Thus, really ambiguous or non-deterministic
+grammars can require exponential time and space to process. Such badly
+behaving examples, however, are not generally of practical interest.
+Usually, non-determinism in a grammar is local---the parser is ``in
+doubt'' only for a few tokens at a time. Therefore, the current data
+structure should generally be adequate. On LALR(1) portions of a
+grammar, in particular, it is only slightly slower than with the default
+Bison parser.
+
@node Stack Overflow
@section Stack Overflow, and How to Avoid It
@cindex stack overflow
%%
@end example
-@command{bison} reports that @samp{calc.y contains 1 useless nonterminal
-and 1 useless rule} and that @samp{calc.y contains 7 shift/reduce
-conflicts}. When given @option{--report=state}, in addition to
-@file{calc.tab.c}, it creates a file @file{calc.output} with contents
-detailed below. The order of the output and the exact presentation
-might vary, but the interpretation is the same.
+@command{bison} reports:
+
+@example
+calc.y: warning: 1 useless nonterminal and 1 useless rule
+calc.y:11.1-7: warning: useless nonterminal: useless
+calc.y:11.8-12: warning: useless rule: useless: STR
+calc.y contains 7 shift/reduce conflicts.
+@end example
+
+When given @option{--report=state}, in addition to @file{calc.tab.c}, it
+creates a file @file{calc.output} with contents detailed below. The
+order of the output and the exact presentation might vary, but the
+interpretation is the same.
The first section includes details on conflicts that were solved thanks
to precedence and/or associativity:
Grammar
Number, Line, Rule
- 0 5 $axiom -> exp $
+ 0 5 $accept -> exp $end
1 5 exp -> exp '+' exp
2 6 exp -> exp '-' exp
3 7 exp -> exp '*' exp
@example
Terminals, with rules where they appear
-$ (0) 0
+$end (0) 0
'*' (42) 3
'+' (43) 1
'-' (45) 2
Nonterminals, with rules where they appear
-$axiom (8)
+$accept (8)
on left: 0
exp (9)
on left: 1 2 3 4 5, on right: 0 1 2 3 4
@example
state 0
- $axiom -> . exp $ (rule 0)
+ $accept -> . exp $end (rule 0)
NUM shift, and go to state 1
@example
state 0
- $axiom -> . exp $ (rule 0)
+ $accept -> . exp $end (rule 0)
exp -> . exp '+' exp (rule 1)
exp -> . exp '-' exp (rule 2)
exp -> . exp '*' exp (rule 3)
@example
state 2
- $axiom -> exp . $ (rule 0)
+ $accept -> exp . $end (rule 0)
exp -> exp . '+' exp (rule 1)
exp -> exp . '-' exp (rule 2)
exp -> exp . '*' exp (rule 3)
@example
state 3
- $axiom -> exp $ . (rule 0)
+ $accept -> exp $end . (rule 0)
$default accept
@end example
@table @code
@item @@$
In an action, the location of the left-hand side of the rule.
- @xref{Locations, , Locations Overview}.
+@xref{Locations, , Locations Overview}.
@item @@@var{n}
In an action, the location of the @var{n}-th symbol of the right-hand
In an action, the semantic value of the @var{n}-th symbol of the
right-hand side of the rule. @xref{Actions}.
+@item $accept
+The predefined nonterminal whose only rule is @samp{$accept: @var{start}
+$end}, where @var{start} is the start symbol. @xref{Start Decl, , The
+Start-Symbol}. It cannot be used in the grammar.
+
+@item $end
+The predefined token marking the end of the token stream. It cannot be
+used in the grammar.
+
+@item $undefined
+The predefined token onto which all undefined values returned by
+@code{yylex} are mapped. It cannot be used in the grammar; use
+@code{error} instead.
+
@item error
A token name reserved for error recovery. This token may be used in
grammar rules so as to allow the Bison parser to recognize an error in
Bison declaration to create a header file meant for the scanner.
@xref{Decl Summary}.
+@item %dprec
+Bison declaration to assign a precedence to a rule that is used at parse
+time to resolve reduce/reduce conflicts. @xref{GLR Parsers}.
+
@item %file-prefix="@var{prefix}"
-Bison declaration to set tge prefix of the output files. @xref{Decl
+Bison declaration to set the prefix of the output files. @xref{Decl
Summary}.
+@item %glr-parser
+Bison declaration to produce a GLR parser. @xref{GLR Parsers}.
+
@c @item %source-extension
@c Bison declaration to specify the generated parser output file extension.
@c @xref{Decl Summary}.
Bison declaration to assign left associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
+@item %merge
+Bison declaration to assign a merging function to a rule. If there is a
+reduce/reduce conflict with a rule having the same merging function, the
+function is applied to the two semantic values to get a single result.
+@xref{GLR Parsers}.
+
@item %name-prefix="@var{prefix}"
Bison declaration to rename the external symbols. @xref{Decl Summary}.
parsed, and the states correspond to various stages in the grammar
rules. @xref{Algorithm, ,The Bison Parser Algorithm }.
+@item Generalized LR (GLR)
+A parsing algorithm that can handle all context-free grammars, including those
+that are not LALR(1). It resolves situations that Bison's usual LALR(1)
+algorithm cannot by effectively splitting off multiple parsers, trying all
+possible parsers, and discarding those that fail in the light of additional
+right context. @xref{Generalized LR Parsing, ,Generalized LR Parsing}.
+
@item Grouping
A language construct that is (in general) grammatically divisible;
for example, `expression' or `declaration' in C.