X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/4f646c3794c45940aaf96d5409eff02a2c74978e..8d90395dd7aa01e6616cb46eddc1dbddafd8d9ee:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index a3a85a2b..6ed33668 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -352,10 +352,12 @@ Copying This Manual @cindex introduction @dfn{Bison} is a general-purpose parser generator that converts an -annotated context-free grammar into an @acronym{LALR}(1) or -@acronym{GLR} parser for that grammar. Once you are proficient with -Bison, you can use it to develop a wide range of language parsers, from those -used in simple desk calculators to complex programming languages. +annotated context-free grammar into a deterministic or @acronym{GLR} +parser employing @acronym{LALR}(1), @acronym{IELR}(1), or canonical +@acronym{LR}(1) parser tables. +Once you are proficient with Bison, you can use it to develop a wide +range of language parsers, from those used in simple desk calculators to +complex programming languages. Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc @@ -461,26 +463,27 @@ order to specify the language Algol 60. Any grammar expressed in essentially machine-readable @acronym{BNF}. @cindex @acronym{LALR}(1) grammars +@cindex @acronym{IELR}(1) grammars @cindex @acronym{LR}(1) grammars -There are various important subclasses of context-free grammar. Although it -can handle almost all context-free grammars, Bison is optimized for what -are called @acronym{LALR}(1) grammars. -In brief, in these grammars, it must be possible to -tell how to parse any portion of an input string with just a single -token of lookahead. Strictly speaking, that is a description of an -@acronym{LR}(1) grammar, and @acronym{LALR}(1) involves additional -restrictions that are -hard to explain simply; but it is rare in actual practice to find an -@acronym{LR}(1) grammar that fails to be @acronym{LALR}(1). +There are various important subclasses of context-free grammars. +Although it can handle almost all context-free grammars, Bison is +optimized for what are called @acronym{LR}(1) grammars. +In brief, in these grammars, it must be possible to tell how to parse +any portion of an input string with just a single token of lookahead. +For historical reasons, Bison by default is limited by the additional +restrictions of @acronym{LALR}(1), which is hard to explain simply. @xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more information on this. +To escape these additional restrictions, you can request +@acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables. +@xref{Decl Summary,,lr.type}, to learn how. @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars @cindex nondeterministic parsing -Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning +Parsers for @acronym{LR}(1) grammars are @dfn{deterministic}, meaning roughly that the next grammar rule to apply at any point in the input is uniquely determined by the preceding input and a fixed, finite portion (called a @dfn{lookahead}) of the remaining input. A context-free @@ -709,8 +712,8 @@ from the values of the two subexpressions. @cindex shift/reduce conflicts @cindex reduce/reduce conflicts -In some grammars, Bison's standard -@acronym{LALR}(1) parsing algorithm cannot decide whether to apply a +In some grammars, Bison's deterministic +@acronym{LR}(1) parsing algorithm cannot decide whether to apply a certain grammar rule at a given point. That is, it may not be able to decide (on the basis of the input read so far) which of two possible reductions (applications of a grammar rule) applies, or whether to apply @@ -719,13 +722,13 @@ input. These are known respectively as @dfn{reduce/reduce} conflicts (@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts (@pxref{Shift/Reduce}). -To use a grammar that is not easily modified to be @acronym{LALR}(1), a +To use a grammar that is not easily modified to be @acronym{LR}(1), a more general parsing algorithm is sometimes necessary. If you include @code{%glr-parser} among the Bison declarations in your file (@pxref{Grammar Outline}), the result is a Generalized @acronym{LR} (@acronym{GLR}) parser. These parsers handle Bison grammars that contain no unresolved conflicts (i.e., after applying precedence -declarations) identically to @acronym{LALR}(1) parsers. However, when +declarations) identically to deterministic parsers. However, when faced with unresolved shift/reduce and reduce/reduce conflicts, @acronym{GLR} parsers use the simple expedient of doing both, effectively cloning the parser to follow both possibilities. Each of @@ -767,11 +770,8 @@ merged result. @cindex shift/reduce conflicts In the simplest cases, you can use the @acronym{GLR} algorithm -to parse grammars that are unambiguous, but fail to be @acronym{LALR}(1). -Such grammars typically require more than one symbol of lookahead, -or (in rare cases) fall into the category of grammars in which the -@acronym{LALR}(1) algorithm throws away too much information (they are in -@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}). +to parse grammars that are unambiguous but fail to be @acronym{LR}(1). +Such grammars typically require more than one symbol of lookahead. Consider a problem that arises in the declaration of enumerated and subrange types in the @@ -808,7 +808,7 @@ type enum = (a); valid, and more-complicated cases can come up in practical programs.) These two declarations look identical until the @samp{..} token. -With normal @acronym{LALR}(1) one-token lookahead it is not +With normal @acronym{LR}(1) one-token lookahead it is not possible to decide between the two forms when the identifier @samp{a} is parsed. It is, however, desirable for a parser to decide this, since in the latter case @@ -847,9 +847,9 @@ reports a syntax error as usual. The effect of all this is that the parser seems to ``guess'' the correct branch to take, or in other words, it seems to use more -lookahead than the underlying @acronym{LALR}(1) algorithm actually allows -for. In this example, @acronym{LALR}(2) would suffice, but also some cases -that are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way. +lookahead than the underlying @acronym{LR}(1) algorithm actually allows +for. In this example, @acronym{LR}(2) would suffice, but also some cases +that are not @acronym{LR}(@math{k}) for any @math{k} can be handled this way. In general, a @acronym{GLR} parser can take quadratic or cubic worst-case time, and the current Bison parser even takes exponential time and space @@ -902,7 +902,7 @@ expr : '(' expr ')' @end group @end example -When used as a normal @acronym{LALR}(1) grammar, Bison correctly complains +When used as a normal @acronym{LR}(1) grammar, Bison correctly complains about one reduce/reduce conflict. In the conflicting situation the parser chooses one of the alternatives, arbitrarily the one declared first. Therefore the following correct input is not @@ -934,7 +934,7 @@ there are at least two potential problems to beware. First, always analyze the conflicts reported by Bison to make sure that @acronym{GLR} splitting is only done where it is intended. A @acronym{GLR} parser splitting inadvertently may cause problems less obvious than an -@acronym{LALR} parser statically choosing the wrong alternative in a +@acronym{LR} parser statically choosing the wrong alternative in a conflict. Second, consider interactions with the lexer (@pxref{Semantic Tokens}) with great care. Since a split parser consumes tokens without performing any actions during the split, the lexer cannot obtain @@ -1154,7 +1154,7 @@ Another Bison feature requiring special consideration is @code{YYERROR} (@pxref{Action Features}), which you can invoke in a semantic action to initiate error recovery. During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is -the same as its effect in an @acronym{LALR}(1) parser. +the same as its effect in a deterministic parser. In a deferred semantic action, its effect is undefined. @c The effect is probably a syntax error at the split point. @@ -2706,9 +2706,6 @@ feature test macros can affect the behavior of Bison-generated @findex %code requires @findex %code provides @findex %code top -(The prologue alternatives described here are experimental. -More user feedback will help to determine whether they should become permanent -features.) The functionality of @var{Prologue} sections can often be subtle and inflexible. @@ -3053,8 +3050,8 @@ A @dfn{nonterminal symbol} stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case. -Symbol names can contain letters, underscores, period, and (not at the -beginning) digits and dashes. Dashes in symbol names are a GNU +Symbol names can contain letters, underscores, periods, dashes, and (not +at the beginning) digits. Dashes in symbol names are a GNU extension, incompatible with @acronym{POSIX} Yacc. Terminal symbols that contain periods or dashes make little sense: since they are not valid symbols (in most programming languages) they are not exported as @@ -4475,7 +4472,7 @@ be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. Bison reports an error if the number of shift/reduce conflicts differs from @var{n}, or if there are any reduce/reduce conflicts. -For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more +For deterministic parsers, reduce/reduce conflicts are more serious, and should be eliminated entirely. Bison will always report reduce/reduce conflicts for these parsers. With @acronym{GLR} parsers, however, both kinds of conflicts are routine; otherwise, @@ -4544,7 +4541,7 @@ statically allocated variables for communication with @code{yylex}, including @code{yylval} and @code{yylloc}.) Alternatively, you can generate a pure, reentrant parser. The Bison -declaration @code{%define api.pure} says that you want the parser to be +declaration @samp{%define api.pure} says that you want the parser to be reentrant. It looks like this: @example @@ -4569,7 +4566,7 @@ valid grammar. @subsection A Push Parser @cindex push parser @cindex push parser -@findex %define api.push_pull +@findex %define api.push-pull (The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.) @@ -4585,10 +4582,10 @@ within a certain time period. Normally, Bison generates a pull parser. The following Bison declaration says that you want the parser to be a push -parser (@pxref{Decl Summary,,%define api.push_pull}): +parser (@pxref{Decl Summary,,%define api.push-pull}): @example -%define api.push_pull "push" +%define api.push-pull "push" @end example In almost all cases, you want to ensure that your push parser is also @@ -4599,7 +4596,7 @@ what you are doing, your declarations should look like this: @example %define api.pure -%define api.push_pull "push" +%define api.push-pull "push" @end example There is a major notable functional difference between the pure push parser @@ -4648,14 +4645,14 @@ for use by the next invocation of the @code{yypush_parse} function. Bison also supports both the push parser interface along with the pull parser interface in the same generated parser. In order to get this functionality, -you should replace the @code{%define api.push_pull "push"} declaration with the -@code{%define api.push_pull "both"} declaration. Doing this will create all of +you should replace the @samp{%define api.push-pull "push"} declaration with the +@samp{%define api.push-pull "both"} declaration. Doing this will create all of the symbols mentioned earlier along with the two extra symbols, @code{yyparse} and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally would be used. However, the user should note that it is implemented in the generated parser by calling @code{yypull_parse}. This makes the @code{yyparse} function that is generated with the -@code{%define api.push_pull "both"} declaration slower than the normal +@samp{%define api.push-pull "both"} declaration slower than the normal @code{yyparse} function. If the user calls the @code{yypull_parse} function it will parse the rest of the input stream. It is possible to @code{yypush_parse} tokens to select a subgrammar @@ -4671,9 +4668,9 @@ yypull_parse (ps); /* Will call the lexer */ yypstate_delete (ps); @end example -Adding the @code{%define api.pure} declaration does exactly the same thing to -the generated parser with @code{%define api.push_pull "both"} as it did for -@code{%define api.push_pull "push"}. +Adding the @samp{%define api.pure} declaration does exactly the same thing to +the generated parser with @samp{%define api.push-pull "both"} as it did for +@samp{%define api.push-pull "push"}. @node Decl Summary @subsection Bison Declaration Summary @@ -4753,10 +4750,6 @@ Thus, @code{%code} replaces the traditional Yacc prologue, For a detailed discussion, see @ref{Prologue Alternatives}. For Java, the default location is inside the parser class. - -(Like all the Yacc prologue alternatives, this directive is experimental. -More user feedback will help to determine whether it should become a permanent -feature.) @end deffn @deffn {Directive} %code @var{qualifier} @{@var{code}@} @@ -4834,10 +4827,6 @@ before any class definitions. @end itemize @end itemize -(Like all the Yacc prologue alternatives, this directive is experimental. -More user feedback will help to determine whether it should become a permanent -feature.) - @cindex Prologue For a detailed discussion of how to use @code{%code} in place of the traditional Yacc prologue for C/C++, see @ref{Prologue Alternatives}. @@ -4856,7 +4845,8 @@ The possible choices for @var{variable}, as well as their meanings, depend on the selected target language and/or the parser skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl Summary,,%skeleton}). -Bison will warn if a @var{variable} is defined multiple times. +It is an error if a @var{variable} is defined by @code{%define} multiple +times, but @ref{Bison Options,,-D @var{name}[=@var{value}]}. Omitting @code{"@var{value}"} is always equivalent to specifying it as @code{""}. @@ -4881,6 +4871,61 @@ target language and/or parser skeleton. Some of the accepted @var{variable}s are: @table @code +@c ================================================== namespace +@item api.namespace +@findex %define api.namespace +@itemize +@item Languages(s): C++ + +@item Purpose: Specifies the namespace for the parser class. +For example, if you specify: + +@smallexample +%define api.namespace "foo::bar" +@end smallexample + +Bison uses @code{foo::bar} verbatim in references such as: + +@smallexample +foo::bar::parser::semantic_type +@end smallexample + +However, to open a namespace, Bison removes any leading @code{::} and then +splits on any remaining occurrences: + +@smallexample +namespace foo @{ namespace bar @{ + class position; + class location; +@} @} +@end smallexample + +@item Accepted Values: +Any absolute or relative C++ namespace reference without a trailing +@code{"::"}. For example, @code{"foo"} or @code{"::foo::bar"}. + +@item Default Value: +The value specified by @code{%name-prefix}, which defaults to @code{yy}. +This usage of @code{%name-prefix} is for backward compatibility and can +be confusing since @code{%name-prefix} also specifies the textual prefix +for the lexical analyzer function. Thus, if you specify +@code{%name-prefix}, it is best to also specify @samp{%define +api.namespace} so that @code{%name-prefix} @emph{only} affects the +lexical analyzer function. For example, if you specify: + +@smallexample +%define api.namespace "foo" +%name-prefix "bar::" +@end smallexample + +The parser namespace is @code{foo} and @code{yylex} is referenced as +@code{bar::lex}. +@end itemize +@c namespace + + + +@c ================================================== api.pure @item api.pure @findex %define api.pure @@ -4894,12 +4939,16 @@ Some of the accepted @var{variable}s are: @item Default Value: @code{"false"} @end itemize +@c api.pure -@item api.push_pull -@findex %define api.push_pull + + +@c ================================================== api.push-pull +@item api.push-pull +@findex %define api.push-pull @itemize @bullet -@item Language(s): C (LALR(1) only) +@item Language(s): C (deterministic parsers only) @item Purpose: Requests a pull parser, a push parser, or both. @xref{Push Decl, ,A Push Parser}. @@ -4910,9 +4959,112 @@ More user feedback will help to stabilize it.) @item Default Value: @code{"pull"} @end itemize +@c api.push-pull + +@item api.tokens.prefix +@findex %define api.tokens.prefix + +@itemize +@item Languages(s): all + +@item Purpose: +Add a prefix to the token names when generating their definition in the +target language. For instance + +@example +%token FILE for ERROR +%define api.tokens.prefix "TOK_" +%% +start: FILE for ERROR; +@end example + +@noindent +generates the definition of the symbols @code{TOK_FILE}, @code{TOK_for}, +and @code{TOK_ERROR} in the generated source files. In particular, the +scanner must use these prefixed token names, while the grammar itself +may still use the short names (as in the sample rule given above). The +generated informational files (@file{*.output}, @file{*.xml}, +@file{*.dot}) are not modified by this prefix. See @ref{Calc++ Parser} +and @ref{Calc++ Scanner}, for a complete example. + +@item Accepted Values: +Any string. Should be a valid identifier prefix in the target language, +in other words, it should typically be an identifier itself (sequence of +letters, underscores, and ---not at the beginning--- digits). + +@item Default Value: +empty +@end itemize +@c api.tokens.prefix + + +@item lr.default-reductions +@cindex default reductions +@findex %define lr.default-reductions +@cindex delayed syntax errors +@cindex syntax errors delayed + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specifies the kind of states that are permitted to +contain default reductions. +That is, in such a state, Bison declares the reduction with the largest +lookahead set to be the default reduction and then removes that +lookahead set. +The advantages of default reductions are discussed below. +The disadvantage is that, when the generated parser encounters a +syntactically unacceptable token, the parser might then perform +unnecessary default reductions before it can detect the syntax error. + +(This feature is experimental. +More user feedback will help to stabilize it.) + +@item Accepted Values: +@itemize +@item @code{"all"}. +For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl +Summary,,lr.type}) by default, all states are permitted to contain +default reductions. +The advantage is that parser table sizes can be significantly reduced. +The reason Bison does not by default attempt to address the disadvantage +of delayed syntax error detection is that this disadvantage is already +inherent in @acronym{LALR} and @acronym{IELR} parser tables. +That is, unlike in a canonical @acronym{LR} state, the lookahead sets of +reductions in an @acronym{LALR} or @acronym{IELR} state can contain +tokens that are syntactically incorrect for some left contexts. + +@item @code{"consistent"}. +@cindex consistent states +A consistent state is a state that has only one possible action. +If that action is a reduction, then the parser does not need to request +a lookahead token from the scanner before performing that action. +However, the parser only recognizes the ability to ignore the lookahead +token when such a reduction is encoded as a default reduction. +Thus, if default reductions are permitted in and only in consistent +states, then a canonical @acronym{LR} parser reports a syntax error as +soon as it @emph{needs} the syntactically unacceptable token from the +scanner. + +@item @code{"accepting"}. +@cindex accepting state +By default, the only default reduction permitted in a canonical +@acronym{LR} parser is the accept action in the accepting state, which +the parser reaches only after reading all tokens from the input. +Thus, the default canonical @acronym{LR} parser reports a syntax error +as soon as it @emph{reaches} the syntactically unacceptable token +without performing any extra reductions. +@end itemize + +@item Default Value: +@itemize +@item @code{"accepting"} if @code{lr.type} is @code{"canonical LR"}. +@item @code{"all"} otherwise. +@end itemize +@end itemize -@item lr.keep_unreachable_states -@findex %define lr.keep_unreachable_states +@item lr.keep-unreachable-states +@findex %define lr.keep-unreachable-states @itemize @bullet @item Language(s): all @@ -4953,60 +5105,95 @@ states. However, Bison does not compute which goto actions are useless. @end itemize @end itemize +@c lr.keep-unreachable-states -@item namespace -@findex %define namespace - -@itemize -@item Languages(s): C++ - -@item Purpose: Specifies the namespace for the parser class. -For example, if you specify: - -@smallexample -%define namespace "foo::bar" -@end smallexample +@item lr.type +@findex %define lr.type +@cindex @acronym{LALR} +@cindex @acronym{IELR} +@cindex @acronym{LR} -Bison uses @code{foo::bar} verbatim in references such as: +@itemize @bullet +@item Language(s): all -@smallexample -foo::bar::parser::semantic_type -@end smallexample +@item Purpose: Specifies the type of parser tables within the +@acronym{LR}(1) family. +(This feature is experimental. +More user feedback will help to stabilize it.) -However, to open a namespace, Bison removes any leading @code{::} and then -splits on any remaining occurrences: +@item Accepted Values: +@itemize +@item @code{"LALR"}. +While Bison generates @acronym{LALR} parser tables by default for +historical reasons, @acronym{IELR} or canonical @acronym{LR} is almost +always preferable for deterministic parsers. +The trouble is that @acronym{LALR} parser tables can suffer from +mysterious conflicts and thus may not accept the full set of sentences +that @acronym{IELR} and canonical @acronym{LR} accept. +@xref{Mystery Conflicts}, for details. +However, there are at least two scenarios where @acronym{LALR} may be +worthwhile: +@itemize +@cindex @acronym{GLR} with @acronym{LALR} +@item When employing @acronym{GLR} parsers (@pxref{GLR Parsers}), if you +do not resolve any conflicts statically (for example, with @code{%left} +or @code{%prec}), then the parser explores all potential parses of any +given input. +In this case, the use of @acronym{LALR} parser tables is guaranteed not +to alter the language accepted by the parser. +@acronym{LALR} parser tables are the smallest parser tables Bison can +currently generate, so they may be preferable. + +@item Occasionally during development, an especially malformed grammar +with a major recurring flaw may severely impede the @acronym{IELR} or +canonical @acronym{LR} parser table generation algorithm. +@acronym{LALR} can be a quick way to generate parser tables in order to +investigate such problems while ignoring the more subtle differences +from @acronym{IELR} and canonical @acronym{LR}. +@end itemize -@smallexample -namespace foo @{ namespace bar @{ - class position; - class location; -@} @} -@end smallexample +@item @code{"IELR"}. +@acronym{IELR} is a minimal @acronym{LR} algorithm. +That is, given any grammar (@acronym{LR} or non-@acronym{LR}), +@acronym{IELR} and canonical @acronym{LR} always accept exactly the same +set of sentences. +However, as for @acronym{LALR}, the number of parser states is often an +order of magnitude less for @acronym{IELR} than for canonical +@acronym{LR}. +More importantly, because canonical @acronym{LR}'s extra parser states +may contain duplicate conflicts in the case of non-@acronym{LR} +grammars, the number of conflicts for @acronym{IELR} is often an order +of magnitude less as well. +This can significantly reduce the complexity of developing of a grammar. + +@item @code{"canonical LR"}. +@cindex delayed syntax errors +@cindex syntax errors delayed +The only advantage of canonical @acronym{LR} over @acronym{IELR} is +that, for every left context of every canonical @acronym{LR} state, the +set of tokens accepted by that state is the exact set of tokens that is +syntactically acceptable in that left context. +Thus, the only difference in parsing behavior is that the canonical +@acronym{LR} parser can report a syntax error as soon as possible +without performing any unnecessary reductions. +@xref{Decl Summary,,lr.default-reductions}, for further details. +Even when canonical @acronym{LR} behavior is ultimately desired, +@acronym{IELR}'s elimination of duplicate conflicts should still +facilitate the development of a grammar. +@end itemize -@item Accepted Values: Any absolute or relative C++ namespace reference without -a trailing @code{"::"}. -For example, @code{"foo"} or @code{"::foo::bar"}. - -@item Default Value: The value specified by @code{%name-prefix}, which defaults -to @code{yy}. -This usage of @code{%name-prefix} is for backward compatibility and can be -confusing since @code{%name-prefix} also specifies the textual prefix for the -lexical analyzer function. -Thus, if you specify @code{%name-prefix}, it is best to also specify -@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the -lexical analyzer function. -For example, if you specify: +@item Default Value: @code{"LALR"} +@end itemize -@smallexample -%define namespace "foo" -%name-prefix "bar::" -@end smallexample -The parser namespace is @code{foo} and @code{yylex} is referenced as -@code{bar::lex}. -@end itemize +@c ================================================== namespace +@item namespace +@findex %define namespace +Obsoleted by @code{api.namespace} @c namespace + +@c ================================================== parse.assert @item parse.assert @findex %define parse.assert @@ -5023,6 +5210,34 @@ destroyed properly. This option checks these constraints. @end itemize @c parse.assert + +@c ================================================== parse.error +@item parse.error +@findex %define parse.error +@itemize +@item Languages(s): +all. +@item Purpose: +Control the kind of error messages passed to the error reporting +function. @xref{Error Reporting, ,The Error Reporting Function +@code{yyerror}}. +@item Accepted Values: +@itemize +@item @code{"simple"} +Error messages passed to @code{yyerror} are simply @w{@code{"syntax +error"}}. +@item @code{"verbose"} +Error messages report the unexpected token, and possibly the expected +ones. +@end itemize + +@item Default Value: +@code{simple} +@end itemize +@c parse.error + + +@c ================================================== parse.trace @item parse.trace @findex %define parse.trace @@ -5038,10 +5253,11 @@ is not already defined, so that the debugging facilities are compiled. @item Default Value: @code{false} @end itemize -@end table @c parse.trace + +@end table @end deffn -@c %define +@c ---------------------------------------------------------- %define @deffn {Directive} %defines Write a header file containing macro definitions for the token type @@ -5125,7 +5341,7 @@ is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, @code{yypstate_new} and @code{yypstate_delete} will also be renamed. For example, if you use @samp{%name-prefix "c_"}, the names become @code{c_parse}, @code{c_lex}, and so on. -For C++ parsers, see the @code{%define namespace} documentation in this +For C++ parsers, see the @samp{%define api.namespace} documentation in this section. @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. @end deffn @@ -5152,7 +5368,7 @@ Specify @var{file} for the parser file. @end deffn @deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{Decl Summary, ,%define}), +Deprecated version of @samp{%define api.pure} (@pxref{Decl Summary, ,%define}), for which Bison is more careful to warn about unreasonable usage. @end deffn @@ -5366,8 +5582,8 @@ exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} More user feedback will help to stabilize it.) You call the function @code{yypush_parse} to parse a single token. This -function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +function is available if either the @samp{%define api.push-pull "push"} or +@samp{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun int yypush_parse (yypstate *yyps) @@ -5384,7 +5600,7 @@ is required to finish parsing the grammar. More user feedback will help to stabilize it.) You call the function @code{yypull_parse} to parse the rest of the input -stream. This function is available if the @code{%define api.push_pull "both"} +stream. This function is available if the @samp{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @@ -5400,8 +5616,8 @@ The value returned by @code{yypull_parse} is the same as for @code{yyparse}. More user feedback will help to stabilize it.) You call the function @code{yypstate_new} to create a new parser instance. -This function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +This function is available if either the @samp{%define api.push-pull "push"} or +@samp{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun yypstate *yypstate_new (void) @@ -5419,8 +5635,8 @@ allocated. More user feedback will help to stabilize it.) You call the function @code{yypstate_delete} to delete a parser instance. -function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +function is available if either the @samp{%define api.push-pull "push"} or +@samp{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun void yypstate_delete (yypstate *yyps) @@ -5608,7 +5824,7 @@ The data type of @code{yylloc} has the name @code{YYLTYPE}. @node Pure Calling @subsection Calling Conventions for Pure Parsers -When you use the Bison declaration @code{%define api.pure} to request a +When you use the Bison declaration @samp{%define api.pure} to request a pure, reentrant parser, the global communication variables @code{yylval} and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the two global variables are replaced by @@ -5659,7 +5875,7 @@ int yylex (int *nastiness); int yyparse (int *nastiness, int *randomness); @end example -If @code{%define api.pure} is added: +If @samp{%define api.pure} is added: @example int yylex (YYSTYPE *lvalp, int *nastiness); @@ -5667,7 +5883,7 @@ int yyparse (int *nastiness, int *randomness); @end example @noindent -and finally, if both @code{%define api.pure} and @code{%locations} are used: +and finally, if both @samp{%define api.pure} and @code{%locations} are used: @example int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); @@ -5681,7 +5897,7 @@ int yyparse (int *nastiness, int *randomness); @cindex parse error @cindex syntax error -The Bison parser detects a @dfn{syntax error} or @dfn{parse error} +The Bison parser detects a @dfn{syntax error} (or @dfn{parse error}) whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use @@ -5693,8 +5909,8 @@ called by @code{yyparse} whenever a syntax error is found, and it receives one argument. For a syntax error, the string is normally @w{@code{"syntax error"}}. -@findex %error-verbose -If you invoke the directive @code{%error-verbose} in the Bison +@findex %define parse.error +If you invoke @samp{%define parse.error "verbose"} in the Bison declarations section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose and specific error message string instead of just plain @w{@code{"syntax error"}}. @@ -5751,7 +5967,7 @@ void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ Finally, @acronym{GLR} and Yacc parsers share the same @code{yyerror} calling convention for absolutely pure parsers, i.e., when the calling convention of @code{yylex} @emph{and} the calling convention of -@code{%define api.pure} are pure. +@samp{%define api.pure} are pure. I.e.: @example @@ -6708,12 +6924,13 @@ a @code{name} if a comma or colon follows, or a @code{type} if another @cindex @acronym{LR}(1) @cindex @acronym{LALR}(1) -However, Bison, like most parser generators, cannot actually handle all -@acronym{LR}(1) grammars. In this grammar, two contexts, that after -an @code{ID} -at the beginning of a @code{param_spec} and likewise at the beginning of -a @code{return_spec}, are similar enough that Bison assumes they are the -same. They appear similar because the same set of rules would be +However, for historical reasons, Bison cannot by default handle all +@acronym{LR}(1) grammars. +In this grammar, two contexts, that after an @code{ID} at the beginning +of a @code{param_spec} and likewise at the beginning of a +@code{return_spec}, are similar enough that Bison assumes they are the +same. +They appear similar because the same set of rules would be active---the rule for reducing to a @code{name} and that for reducing to a @code{type}. Bison is unable to determine at that stage of processing that the rules would require different lookahead tokens in the two @@ -6721,16 +6938,22 @@ contexts, so it makes a single parser state for them both. Combining the two contexts causes a conflict later. In parser terminology, this occurrence means that the grammar is not @acronym{LALR}(1). -In general, it is better to fix deficiencies than to document them. But -this particular deficiency is intrinsically hard to fix; parser -generators that can handle @acronym{LR}(1) grammars are hard to write -and tend to -produce parsers that are very large. In practice, Bison is more useful -as it is now. - -When the problem arises, you can often fix it by identifying the two -parser states that are being confused, and adding something to make them -look distinct. In the above example, adding one rule to +For many practical grammars (specifically those that fall into the +non-@acronym{LR}(1) class), the limitations of @acronym{LALR}(1) result in +difficulties beyond just mysterious reduce/reduce conflicts. +The best way to fix all these problems is to select a different parser +table generation algorithm. +Either @acronym{IELR}(1) or canonical @acronym{LR}(1) would suffice, but +the former is more efficient and easier to debug during development. +@xref{Decl Summary,,lr.type}, for details. +(Bison's @acronym{IELR}(1) and canonical @acronym{LR}(1) implementations +are experimental. +More user feedback will help to stabilize them.) + +If you instead wish to work around @acronym{LALR}(1)'s limitations, you +can often fix a mysterious conflict by identifying the two parser states +that are being confused, and adding something to make them look +distinct. In the above example, adding one rule to @code{return_spec} as follows makes the problem go away: @example @@ -6798,7 +7021,7 @@ The same is true of languages that require more than one symbol of lookahead, since the parser lacks the information necessary to make a decision at the point it must be made in a shift-reduce parser. Finally, as previously mentioned (@pxref{Mystery Conflicts}), -there are languages where Bison's particular choice of how to +there are languages where Bison's default choice of how to summarize the input seen so far loses necessary information. When you use the @samp{%glr-parser} declaration in your grammar file, @@ -6830,7 +7053,7 @@ grammar symbol that produces the same segment of the input token stream. Whenever the parser makes a transition from having multiple -states to having one, it reverts to the normal @acronym{LALR}(1) parsing +states to having one, it reverts to the normal deterministic parsing algorithm, after resolving and executing the saved-up actions. At this transition, some of the states on the stack will have semantic values that are sets (actually multisets) of possible actions. The @@ -6843,9 +7066,9 @@ Bison resolves and evaluates both and then calls the merge function on the result. Otherwise, it reports an ambiguity. It is possible to use a data structure for the @acronym{GLR} parsing tree that -permits the processing of any @acronym{LALR}(1) grammar in linear time (in the +permits the processing of any @acronym{LR}(1) grammar in linear time (in the size of the input), any unambiguous (not necessarily -@acronym{LALR}(1)) grammar in +@acronym{LR}(1)) grammar in quadratic worst-case time, and any general (possibly ambiguous) context-free grammar in cubic worst-case time. However, Bison currently uses a simpler data structure that requires time proportional to the @@ -6855,9 +7078,9 @@ grammars can require exponential time and space to process. Such badly behaving examples, however, are not generally of practical interest. Usually, nondeterminism in a grammar is local---the parser is ``in doubt'' only for a few tokens at a time. Therefore, the current data -structure should generally be adequate. On @acronym{LALR}(1) portions of a -grammar, in particular, it is only slightly slower than with the default -Bison parser. +structure should generally be adequate. On @acronym{LR}(1) portions of a +grammar, in particular, it is only slightly slower than with the +deterministic @acronym{LR}(1) Bison parser. For a more detailed exposition of @acronym{GLR} parsers, please see: Elizabeth Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style @@ -6906,16 +7129,16 @@ The default value of @code{YYMAXDEPTH}, if you do not define it, is @vindex YYINITDEPTH You can control how much stack is allocated initially by defining the -macro @code{YYINITDEPTH} to a positive integer. For the C -@acronym{LALR}(1) parser, this value must be a compile-time constant +macro @code{YYINITDEPTH} to a positive integer. For the deterministic +parser in C, this value must be a compile-time constant unless you are assuming C99 or some other target language or compiler that allows variable-length arrays. The default is 200. Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. @c FIXME: C++ output. -Because of semantical differences between C and C++, the -@acronym{LALR}(1) parsers in C produced by Bison cannot grow when compiled +Because of semantical differences between C and C++, the deterministic +parsers in C produced by Bison cannot grow when compiled by C++ compilers. In this precise case (compiling a C parser as C++) you are suggested to grow @code{YYINITDEPTH}. The Bison maintainers hope to fix this deficiency in a future release. @@ -7303,7 +7526,8 @@ useless: STR; @command{bison} reports: @example -calc.y: warning: 1 nonterminal and 1 rule useless in grammar +calc.y: warning: 1 nonterminal useless in grammar +calc.y: warning: 1 rule useless in grammar calc.y:11.1-7: warning: nonterminal useless in grammar: useless calc.y:11.10-12: warning: rule useless in grammar: useless: STR calc.y: conflicts: 7 shift/reduce @@ -7491,6 +7715,7 @@ control will jump to state 4, corresponding to the item @samp{exp -> exp '+' . exp}. Since there is no default action, any other token than those listed above will trigger a syntax error. +@cindex accepting state The state 3 is named the @dfn{final state}, or the @dfn{accepting state}: @@ -7571,7 +7796,7 @@ sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM / NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) / NUM}, which corresponds to reducing rule 1. -Because in @acronym{LALR}(1) parsing a single decision can be made, Bison +Because in deterministic parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, , Shift/Reduce Conflicts}. Discarded actions are reported in between square brackets. @@ -7871,7 +8096,7 @@ other minor ways. Most importantly, imitate Yacc's output file name conventions, so that the parser output file is called @file{y.tab.c}, and the other outputs are called @file{y.output} and @file{y.tab.h}. -Also, if generating an @acronym{LALR}(1) parser in C, generate @code{#define} +Also, if generating a deterministic parser in C, generate @code{#define} statements in addition to an @code{enum} to associate token numbers with token names. Thus, the following shell script can substitute for Yacc, and the Bison @@ -7941,8 +8166,32 @@ already defined, so that the debugging facilities are compiled. @item -D @var{name}[=@var{value}] @itemx --define=@var{name}[=@var{value}] -Same as running @samp{%define @var{name} "@var{value}"} (@pxref{Decl -Summary, ,%define}). +@item -F @var{name}[=@var{value}] +@itemx --force-define=@var{name}[=@var{value}] +Each of these is equivalent to @samp{%define @var{name} "@var{value}"} +(@pxref{Decl Summary, ,%define}) except that Bison processes multiple +definitions for the same @var{name} as follows: + +@itemize +@item +Bison quietly ignores all command-line definitions for @var{name} except +the last. +@item +If that command-line definition is specified by a @code{-D} or +@code{--define}, Bison reports an error for any @code{%define} +definition for @var{name}. +@item +If that command-line definition is specified by a @code{-F} or +@code{--force-define} instead, Bison quietly ignores all @code{%define} +definitions for @var{name}. +@item +Otherwise, Bison reports an error if there are multiple @code{%define} +definitions for @var{name}. +@end itemize + +You should avoid using @code{-F} and @code{--force-define} in your +makefiles unless you are confident that it is safe to quietly ignore any +conflicting @code{%define} that may be added to the grammar file. @item -L @var{language} @itemx --language=@var{language} @@ -8018,7 +8267,7 @@ separated list of @var{things} among: @table @code @item state Description of the grammar, conflicts (resolved and unresolved), and -@acronym{LALR} automaton. +parser's automaton. @item lookahead Implies @code{state} and augments the description of the automaton with @@ -8047,7 +8296,7 @@ described under the @samp{-v} and @samp{-d} options. @item -g [@var{file}] @itemx --graph[=@var{file}] -Output a graphical representation of the @acronym{LALR}(1) grammar +Output a graphical representation of the parser's automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz} @uref{http://www.graphviz.org/doc/info/lang.html, @acronym{DOT}} format. @code{@var{file}} is optional. @@ -8056,7 +8305,7 @@ If omitted and the grammar file is @file{foo.y}, the output file will be @item -x [@var{file}] @itemx --xml[=@var{file}] -Output an XML report of the @acronym{LALR}(1) automaton computed by Bison. +Output an XML report of the parser's automaton computed by Bison. @code{@var{file}} is optional. If omitted and the grammar file is @file{foo.y}, the output file will be @file{foo.xml}. @@ -8068,9 +8317,9 @@ More user feedback will help to stabilize it.) @section Option Cross Key Here is a list of options, alphabetized by long option, to help you find -the corresponding short option. +the corresponding short option and directive. -@multitable {@option{--defines=@var{defines-file}}} {@option{-D @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}} +@multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}} @headitem Long Option @tab Short Option @tab Bison Directive @include cross-options.texi @end multitable @@ -8129,15 +8378,16 @@ int yyparse (void); @c - Always pure @c - initial action -The C++ @acronym{LALR}(1) parser is selected using the skeleton directive, +The C++ deterministic parser is selected using the skeleton directive, @samp{%skeleton "lalr1.c"}, or the synonymous command-line option @option{--skeleton=lalr1.c}. @xref{Decl Summary}. When run, @command{bison} will create several entities in the @samp{yy} namespace. -@findex %define namespace -Use the @samp{%define namespace} directive to change the namespace name, see +@findex %define api.namespace +Use the @samp{%define api.namespace} directive to change the namespace +name, see @ref{Decl Summary}. The various classes are generated in the following files: @@ -8321,7 +8571,7 @@ described by @var{m}. The parser invokes the scanner by calling @code{yylex}. Contrary to C parsers, C++ parsers are always pure: there is no point in using the -@code{%define api.pure} directive. Therefore the interface is as follows. +@samp{%define api.pure} directive. Therefore the interface is as follows. @deftypemethod {parser} {int} yylex (semantic_value_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...) Return the next token. Its type is the return value, its semantic @@ -8518,10 +8768,10 @@ calcxx_driver::error (const std::string& m) @subsubsection Calc++ Parser The parser definition file @file{calc++-parser.yy} starts by asking for -the C++ LALR(1) skeleton, the creation of the parser header file, and -specifies the name of the parser class. Because the C++ skeleton -changed several times, it is safer to require the version you designed -the grammar for. +the C++ deterministic parser skeleton, the creation of the parser header +file, and specifies the name of the parser class. +Because the C++ skeleton changed several times, it is safer to require +the version you designed the grammar for. @comment file: calc++-parser.yy @example @@ -8584,7 +8834,7 @@ error messages. @comment file: calc++-parser.yy @example %define parse.trace -%error-verbose +%define parse.error "verbose" @end example @noindent @@ -8616,13 +8866,14 @@ The code between @samp{%code @{} and @samp{@}} is output in the @noindent The token numbered as 0 corresponds to end of file; the following line -allows for nicer error messages referring to ``end of file'' instead -of ``$end''. Similarly user friendly named are provided for each -symbol. Note that the tokens names are prefixed by @code{TOKEN_} to -avoid name clashes. +allows for nicer error messages referring to ``end of file'' instead of +``$end''. Similarly user friendly names are provided for each symbol. +To avoid name clashes in the generated files (@pxref{Calc++ Scanner}), +prefix tokens with @code{TOK_} (@pxref{Decl Summary,, api.tokens.prefix}). @comment file: calc++-parser.yy @example +%define api.tokens.prefix "TOK_" %token END 0 "end of file" %token ASSIGN ":=" %token IDENTIFIER "identifier" @@ -8652,22 +8903,24 @@ The grammar itself is straightforward. %start unit; unit: assignments exp @{ driver.result = $2; @}; -assignments: assignments assignment @{@} - | /* Nothing. */ @{@}; +assignments: + assignments assignment @{@} +| /* Nothing. */ @{@}; assignment: - "identifier" ":=" exp + "identifier" ":=" exp @{ driver.variables[*$1] = $3; delete $1; @}; %left '+' '-'; %left '*' '/'; -exp: exp '+' exp @{ $$ = $1 + $3; @} - | exp '-' exp @{ $$ = $1 - $3; @} - | exp '*' exp @{ $$ = $1 * $3; @} - | exp '/' exp @{ $$ = $1 / $3; @} - | '(' exp ')' @{ $$ = $2; @} - | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} - | "number" @{ $$ = $1; @}; +exp: + exp '+' exp @{ $$ = $1 + $3; @} +| exp '-' exp @{ $$ = $1 - $3; @} +| exp '*' exp @{ $$ = $1 * $3; @} +| exp '/' exp @{ $$ = $1 / $3; @} +| '(' exp ')' @{ $$ = $2; @} +| "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} +| "number" @{ $$ = $1; @}; %% @end example @@ -8695,8 +8948,8 @@ parser's to get the set of defined tokens. @example %@{ /* -*- C++ -*- */ # include -# include -# include +# include +# include # include # include "calc++-driver.hh" # include "calc++-parser.hh" @@ -8708,10 +8961,10 @@ parser's to get the set of defined tokens. # undef yywrap # define yywrap() 1 -/* By default yylex returns int, we use token_type. - Unfortunately yyterminate by default returns 0, which is +/* By default yylex returns an int; we use token_type. + The default yyterminate implementation returns 0, which is not of token_type. */ -#define yyterminate() return token::END +#define yyterminate() return TOKEN(END) %@} @end example @@ -8759,28 +9012,32 @@ preceding tokens. Comments would be treated equally. @end example @noindent -The rules are simple, just note the use of the driver to report errors. -It is convenient to use a typedef to shorten -@code{yy::calcxx_parser::token::identifier} into -@code{token::identifier} for instance. +The rules are simple. The driver is used to report errors. It is +convenient to use a macro to shorten +@code{yy::calcxx_parser::token::TOK_@var{Name}} into +@code{TOKEN(@var{Name})}; note the token prefix, @code{TOK_}. @comment file: calc++-scanner.ll @example %@{ - typedef yy::calcxx_parser::token token; +# define TOKEN(Name) \ + yy::calcxx_parser::token::TOK_ ## Name %@} /* Convert ints to the actual type of tokens. */ [-+*/()] return yy::calcxx_parser::token_type (yytext[0]); -":=" return token::ASSIGN; +":=" return TOKEN(ASSIGN); @{int@} @{ errno = 0; long n = strtol (yytext, NULL, 10); if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) driver.error (*yylloc, "integer is out of range"); yylval->ival = n; - return token::NUMBER; + return TOKEN(NUMBER); +@} +@{id@} @{ + yylval->sval = new std::string (yytext); + return TOKEN(IDENTIFIER); @} -@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; . driver.error (*yylloc, "invalid character"); %% @end example @@ -8878,11 +9135,11 @@ You can create documentation for generated parsers using Javadoc. Contrary to C parsers, Java parsers do not use global variables; the state of the parser is always local to an instance of the parser class. Therefore, all Java parsers are ``pure'', and the @code{%pure-parser} -and @code{%define api.pure} directives does not do anything when used in +and @samp{%define api.pure} directives does not do anything when used in Java. Push parsers are currently unsupported in Java and @code{%define -api.push_pull} have no effect. +api.push-pull} have no effect. @acronym{GLR} parsers are currently unsupported in Java. Do not use the @code{glr-parser} directive. @@ -8926,7 +9183,7 @@ semantic values' types (class names) should be specified in the By default, the semantic stack is declared to have @code{Object} members, which means that the class types you specify can be of any class. To improve the type safety of the parser, you can declare the common -superclass of all the semantic values using the @code{%define stype} +superclass of all the semantic values using the @samp{%define stype} directive. For example, after the following declaration: @example @@ -8966,11 +9223,11 @@ in a file; Bison itself defines a class representing a @dfn{location}, a range composed of a pair of positions (possibly spanning several files). The location class is an inner class of the parser; the name is @code{Location} by default, and may also be renamed using -@code{%define location_type "@var{class-name}}. +@samp{%define location_type "@var{class-name}}. The location class treats the position as a completely opaque value. By default, the class name is @code{Position}, but this can be changed -with @code{%define position_type "@var{class-name}"}. This class must +with @samp{%define position_type "@var{class-name}"}. This class must be supplied by the user. @@ -9005,22 +9262,22 @@ properly, the position class should override the @code{equals} and The name of the generated parser class defaults to @code{YYParser}. The @code{YY} prefix may be changed using the @code{%name-prefix} directive or the @option{-p}/@option{--name-prefix} option. Alternatively, use -@code{%define parser_class_name "@var{name}"} to give a custom name to +@samp{%define parser_class_name "@var{name}"} to give a custom name to the class. The interface of this class is detailed below. By default, the parser class has package visibility. A declaration -@code{%define public} will change to public visibility. Remember that, +@samp{%define public} will change to public visibility. Remember that, according to the Java language specification, the name of the @file{.java} file should match the name of the class in this case. Similarly, you can use @code{abstract}, @code{final} and @code{strictfp} with the @code{%define} declaration to add other modifiers to the parser class. -A single @code{%define annotations "@var{annotations}"} directive can +A single @samp{%define annotations "@var{annotations}"} directive can be used to add any number of annotations to the parser class. The Java package name of the parser class can be specified using the -@code{%define package} directive. The superclass and the implemented +@samp{%define package} directive. The superclass and the implemented interfaces of the parser class can be specified with the @code{%define -extends} and @code{%define implements} directives. +extends} and @samp{%define implements} directives. The parser class defines an inner class, @code{Location}, that is used for location tracking (see @ref{Java Location Values}), and a inner @@ -9041,7 +9298,7 @@ used. Use @code{%code init} for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use -@code{%define init_throws} to specify any uncatch exceptions. +@samp{%define init_throws} to specify any uncatch exceptions. @end deftypeop @deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{}) @@ -9054,7 +9311,7 @@ created with the correct @code{%lex-param}s. Use @code{%code init} for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use -@code{%define init_throws} to specify any uncatch exceptions. +@samp{%define init_throws} to specify any uncatch exceptions. @end deftypeop @deftypemethod {YYParser} {boolean} parse () @@ -9065,7 +9322,7 @@ Run the syntactic analysis, and return @code{true} on success, @deftypemethod {YYParser} {boolean} getErrorVerbose () @deftypemethodx {YYParser} {void} setErrorVerbose (boolean @var{verbose}) Get or set the option to produce verbose error messages. These are only -available with the @code{%error-verbose} directive, which also turn on +available with @samp{%define parse.error "verbose"}, which also turns on verbose error messages. @end deftypemethod @@ -9131,7 +9388,7 @@ In both cases, the scanner has to implement the following methods. @deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be -changed using @code{%define location_type "@var{class-name}".} +changed using @samp{%define location_type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {int} yylex () @@ -9139,7 +9396,7 @@ Return the next token. Its type is the return value, its semantic value and location are saved and returned by the ther methods in the interface. -Use @code{%define lex_throws} to specify any uncaught exceptions. +Use @samp{%define lex_throws} to specify any uncaught exceptions. Default is @code{java.io.IOException}. @end deftypemethod @@ -9149,14 +9406,14 @@ Return respectively the first position of the last token that @code{yylex} returned, and the first position beyond it. These methods are not needed unless location tracking is active. -The return type can be changed using @code{%define position_type +The return type can be changed using @samp{%define position_type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {Object} getLVal () Return the semantical value of the last token that yylex returned. -The return type can be changed using @code{%define stype +The return type can be changed using @samp{%define stype "@var{class-name}".} @end deftypemethod @@ -9167,7 +9424,7 @@ The return type can be changed using @code{%define stype The following special constructs can be uses in Java actions. Other analogous C action features are currently unavailable for Java. -Use @code{%define throws} to specify any uncaught exceptions from parser +Use @samp{%define throws} to specify any uncaught exceptions from parser actions, and initial actions specified by @code{%initial-action}. @defvar $@var{n} @@ -9184,7 +9441,7 @@ Like @code{$@var{n}} but specifies a alternative type @var{typealt}. @defvar $$ The semantic value for the grouping made by the current rule. As a value, this is in the base type (@code{Object} or as specified by -@code{%define stype}) as in not cast to the declared subtype because +@samp{%define stype}) as in not cast to the declared subtype because casts are not allowed on the left-hand side of Java assignments. Use an explicit Java cast if the correct subtype is needed. @xref{Java Semantic Values}. @@ -9271,7 +9528,7 @@ corresponds to these C macros.}. @item Java lacks unions, so @code{%union} has no effect. Instead, semantic values have a common base type: @code{Object} or as specified by -@code{%define stype}. Angle backets on @code{%token}, @code{type}, +@samp{%define stype}. Angle backets on @code{%token}, @code{type}, @code{$@var{n}} and @code{$$} specify subtypes rather than fields of an union. The type of @code{$$}, even with angle brackets, is the base type since Java casts are not allow on the left-hand side of assignments. @@ -9285,7 +9542,7 @@ The prolog declarations have a different meaning than in C/C++ code. @item @code{%code imports} blocks are placed at the beginning of the Java source code. They may include copyright notices. For a @code{package} declarations, it is -suggested to use @code{%define package} instead. +suggested to use @samp{%define package} instead. @item unqualified @code{%code} blocks are placed inside the parser class. @@ -9326,7 +9583,7 @@ constructor that @emph{creates} a lexer. Default is none. @deffn {Directive} %name-prefix "@var{prefix}" The prefix of the parser class name @code{@var{prefix}Parser} if -@code{%define parser_class_name} is not used. Default is @code{YY}. +@samp{%define parser_class_name} is not used. Default is @code{YY}. @xref{Java Bison Interface}. @end deffn @@ -9516,7 +9773,7 @@ or @display My parser includes support for an @samp{#include}-like feature, in which case I run @code{yyparse} from @code{yyparse}. This fails -although I did specify @code{%define api.pure}. +although I did specify @samp{%define api.pure}. @end display These problems typically come not from Bison itself, but from @@ -9994,8 +10251,7 @@ token is reset to the token that originally caused the violation. @end deffn @deffn {Directive} %error-verbose -Bison declaration to request verbose, specific error message strings -when @code{yyerror} is called. +An obsolete directive standing for @samp{%define parse.error "verbose"}. @end deffn @deffn {Directive} %file-prefix "@var{prefix}" @@ -10079,7 +10335,7 @@ Bison declaration to assign precedence to token(s), but no associativity @end deffn @deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{Decl Summary, ,%define}), +Deprecated version of @samp{%define api.pure} (@pxref{Decl Summary, ,%define}), for which Bison is more careful to warn about unreasonable usage. @end deffn @@ -10193,16 +10449,16 @@ instead. @deffn {Function} yyerror User-supplied function to be called by @code{yyparse} on error. -@xref{Error Reporting, ,The Error -Reporting Function @code{yyerror}}. +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @deffn {Macro} YYERROR_VERBOSE -An obsolete macro that you define with @code{#define} in the prologue -to request verbose, specific error message strings -when @code{yyerror} is called. It doesn't matter what definition you -use for @code{YYERROR_VERBOSE}, just whether you define it. Using -@code{%error-verbose} is preferred. +An obsolete macro used in the @file{yacc.c} skeleton, that you define +with @code{#define} in the prologue to request verbose, specific error +message strings when @code{yyerror} is called. It doesn't matter what +definition you use for @code{YYERROR_VERBOSE}, just whether you define +it. Using @samp{%define parse.error "verbose"} is preferred +(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). @end deffn @deffn {Macro} YYINITDEPTH @@ -10316,8 +10572,8 @@ is recovering from a syntax error, and 0 otherwise. @end deffn @deffn {Macro} YYSTACK_USE_ALLOCA -Macro used to control the use of @code{alloca} when the C -@acronym{LALR}(1) parser needs to extend its stacks. If defined to 0, +Macro used to control the use of @code{alloca} when the +deterministic parser in C needs to extend its stacks. If defined to 0, the parser will use @code{malloc} to extend its stacks. If defined to 1, the parser will use @code{alloca}. Values other than 0 and 1 are reserved for future Bison extensions. If not defined, @@ -10342,12 +10598,21 @@ Data type of semantic values; @code{int} by default. @cindex glossary @table @asis +@item Accepting State +A state whose only action is the accept action. +The accepting state is thus a consistent state. +@xref{Understanding,,}. + @item Backus-Naur Form (@acronym{BNF}; also called ``Backus Normal Form'') Formal method of specifying context-free grammars originally proposed by John Backus, and slightly improved by Peter Naur in his 1960-01-02 committee document contributing to what became the Algol 60 report. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item Consistent State +A state containing only one possible action. +@xref{Decl Summary,,lr.default-reductions}. + @item Context-free grammars Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an @@ -10355,6 +10620,14 @@ expression, integers are allowed @emph{anywhere} an expression is permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item Default Reduction +The reduction that a parser should perform if the current parser state +contains no other action for the lookahead token. +In permitted parser states, Bison declares the reduction with the +largest lookahead set to be the default reduction and removes that +lookahead set. +@xref{Decl Summary,,lr.default-reductions}. + @item Dynamic allocation Allocation of memory that occurs during execution, rather than at compile time or on entry to a function. @@ -10373,8 +10646,8 @@ rules. @xref{Algorithm, ,The Bison Parser Algorithm}. @item Generalized @acronym{LR} (@acronym{GLR}) A parsing algorithm that can handle all context-free grammars, including those -that are not @acronym{LALR}(1). It resolves situations that Bison's -usual @acronym{LALR}(1) +that are not @acronym{LR}(1). It resolves situations that Bison's +deterministic parsing algorithm cannot by effectively splitting off multiple parsers, trying all possible parsers, and discarding those that fail in the light of additional right context. @xref{Generalized LR Parsing, ,Generalized @@ -10385,6 +10658,20 @@ A language construct that is (in general) grammatically divisible; for example, `expression' or `declaration' in C@. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item @acronym{IELR}(1) +A minimal @acronym{LR}(1) parser table generation algorithm. +That is, given any context-free grammar, @acronym{IELR}(1) generates +parser tables with the full language recognition power of canonical +@acronym{LR}(1) but with nearly the same number of parser states as +@acronym{LALR}(1). +This reduction in parser states is often an order of magnitude. +More importantly, because canonical @acronym{LR}(1)'s extra parser +states may contain duplicate conflicts in the case of +non-@acronym{LR}(1) grammars, the number of conflicts for +@acronym{IELR}(1) is often an order of magnitude less as well. +This can significantly reduce the complexity of developing of a grammar. +@xref{Decl Summary,,lr.type}. + @item Infix operator An arithmetic operator that is placed between the operands on which it performs some operation. @@ -10428,8 +10715,8 @@ Tokens}. @item @acronym{LALR}(1) The class of context-free grammars that Bison (like most other parser -generators) can handle; a subset of @acronym{LR}(1). @xref{Mystery -Conflicts, ,Mysterious Reduce/Reduce Conflicts}. +generators) can handle by default; a subset of @acronym{LR}(1). +@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}. @item @acronym{LR}(1) The class of context-free grammars in which at most one token of @@ -10547,4 +10834,4 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: infile ypp yxx outfile itemx tex leaderfill @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST -@c LocalWords: YYSTACK DVI fdl printindex +@c LocalWords: YYSTACK DVI fdl printindex IELR