X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/7aaaad6c6dc0fae412d608dcf20a3977d8902cd1..c6b1772473d0a26faa22464df98718d0d0ae2e2e:/doc/bison.texi diff --git a/doc/bison.texi b/doc/bison.texi index d1223bdc..1c4bfd4e 100644 --- a/doc/bison.texi +++ b/doc/bison.texi @@ -110,7 +110,7 @@ Reference sections: * Glossary:: Basic concepts are explained. * Copying This Manual:: License for copying this manual. * Bibliography:: Publications cited in this manual. -* Index:: Cross-references to the text. +* Index of Terms:: Cross-references to the text. @detailmenu --- The Detailed Node Listing --- @@ -280,6 +280,7 @@ Operator Precedence * Precedence Only:: How to specify precedence only. * Precedence Examples:: How these features are used in the previous example. * How Precedence:: How they work. +* Non Operators:: Using precedence for general conflicts. Tuning LR @@ -298,6 +299,8 @@ Handling Context Dependencies Debugging Your Parser * Understanding:: Understanding the structure of your parser. +* Graphviz:: Getting a visual representation of the parser. +* Xml:: Getting a markup representation of the parser. * Tracing:: Tracing the execution of your parser. Tracing Your Parser @@ -331,6 +334,7 @@ C++ Location Values * C++ position:: One point in the source file * C++ location:: Two points in the source file +* User Defined Location Type:: Required interface for locations A Complete C++ Example @@ -4634,9 +4638,9 @@ code. @deffn {Directive} %initial-action @{ @var{code} @} @findex %initial-action Declare that the braced @var{code} must be invoked before parsing each time -@code{yyparse} is called. The @var{code} may use @code{$$} and -@code{@@$} --- initial value and location of the lookahead --- and the -@code{%parse-param}. +@code{yyparse} is called. The @var{code} may use @code{$$} (or +@code{$<@var{tag}>$}) and @code{@@$} --- initial value and location of the +lookahead --- and the @code{%parse-param}. 
@end deffn For instance, if your locations use a file name, you may use @@ -4674,11 +4678,11 @@ symbol is automatically discarded. @deffn {Directive} %destructor @{ @var{code} @} @var{symbols} @findex %destructor Invoke the braced @var{code} whenever the parser discards one of the -@var{symbols}. -Within @var{code}, @code{$$} designates the semantic value associated -with the discarded symbol, and @code{@@$} designates its location. -The additional parser parameters are also available (@pxref{Parser Function, , -The Parser Function @code{yyparse}}). +@var{symbols}. Within @var{code}, @code{$$} (or @code{$<@var{tag}>$}) +designates the semantic value associated with the discarded symbol, and +@code{@@$} designates its location. The additional parser parameters are +also available (@pxref{Parser Function, , The Parser Function +@code{yyparse}}). When a symbol is listed among @var{symbols}, its @code{%destructor} is called a per-symbol @code{%destructor}. @@ -4783,6 +4787,10 @@ incoming terminals during the second phase of error recovery, the current lookahead and the entire stack (except the current right-hand side symbols) when the parser returns immediately, and @item +the current lookahead and the entire stack (including the current right-hand +side symbols) when the C++ parser (@file{lalr1.cc}) catches an exception in +@code{parse}, +@item the start symbol, when the parser succeeds. @end itemize @@ -4816,10 +4824,11 @@ Decl, , Freeing Discarded Symbols}). @c This is the same text as for %destructor. Invoke the braced @var{code} whenever the parser displays one of the @var{symbols}. Within @var{code}, @code{yyoutput} denotes the output stream -(a @code{FILE*} in C, and an @code{std::ostream&} in C++), -@code{$$} designates the semantic value associated with the symbol, and -@code{@@$} its location. The additional parser parameters are also -available (@pxref{Parser Function, , The Parser Function @code{yyparse}}). 
+(a @code{FILE*} in C, and an @code{std::ostream&} in C++), @code{$$} (or +@code{$<@var{tag}>$}) designates the semantic value associated with the +symbol, and @code{@@$} its location. The additional parser parameters are +also available (@pxref{Parser Function, , The Parser Function +@code{yyparse}}). The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor Decl, , Freeing Discarded Symbols}.): they can be per-type (e.g., @@ -5156,7 +5165,7 @@ default location or at the location specified by @var{qualifier}. @end deffn @deffn {Directive} %debug -Instrument the output parser for traces. Obsoleted by @samp{%define +Instrument the parser for traces. Obsoleted by @samp{%define parse.trace}. @xref{Tracing, ,Tracing Your Parser}. @end deffn @@ -5203,6 +5212,23 @@ Values, ,Semantic Values of Tokens}. If you have declared @code{%code requires} or @code{%code provides}, the output header also contains their code. @xref{%code Summary}. + +@cindex Header guard +The generated header is protected against multiple inclusions with a C +preprocessor guard: @samp{YY_@var{PREFIX}_@var{FILE}_INCLUDED}, where +@var{PREFIX} and @var{FILE} are the prefix (@pxref{Multiple Parsers, +,Multiple Parsers in the Same Program}) and generated file name turned +uppercase, with each series of non alphanumerical characters converted to a +single underscore. + +For instance with @samp{%define api.prefix "calc"} and @samp{%defines +"lib/parse.h"}, the header will be guarded as follows. +@example +#ifndef YY_CALC_LIB_PARSE_H_INCLUDED +# define YY_CALC_LIB_PARSE_H_INCLUDED +... +#endif /* ! 
YY_CALC_LIB_PARSE_H_INCLUDED */ +@end example @end deffn @deffn {Directive} %defines @var{defines-file} @@ -5451,7 +5477,39 @@ The parser namespace is @code{foo} and @code{yylex} is referenced as @end itemize @c namespace +@c ================================================== api.location.type +@item @code{api.location.type} +@findex %define api.location.type + +@itemize @bullet +@item Language(s): C++, Java + +@item Purpose: Define the location type. +@xref{User Defined Location Type}. +@item Accepted Values: String + +@item Default Value: none + +@item History: introduced in Bison 2.7 +@end itemize + +@c ================================================== api.prefix +@item api.prefix +@findex %define api.prefix + +@itemize @bullet +@item Language(s): All + +@item Purpose: Rename exported symbols. +@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. + +@item Accepted Values: String + +@item Default Value: @code{yy} + +@item History: introduced in Bison 2.6 +@end itemize @c ================================================== api.pure @item api.pure @@ -5491,9 +5549,33 @@ More user feedback will help to stabilize it.) -@c ================================================== api.tokens.prefix -@item api.tokens.prefix -@findex %define api.tokens.prefix +@c ================================================== api.token.constructor +@item api.token.constructor +@findex %define api.token.constructor + +@itemize @bullet +@item Language(s): +C++ + +@item Purpose: +When variant-based semantic values are enabled (@pxref{C++ Variants}), +request that symbols be handled as a whole (type, value, and possibly +location) in the scanner. @xref{Complete Symbols}, for details. + +@item Accepted Values: +Boolean. 
+ +@item Default Value: +@code{false} +@item History: +introduced in Bison 2.8 +@end itemize +@c api.token.constructor + + +@c ================================================== api.token.prefix +@item api.token.prefix +@findex %define api.token.prefix @itemize @item Languages(s): all @@ -5504,7 +5586,7 @@ target language. For instance @example %token FILE for ERROR -%define api.tokens.prefix "TOK_" +%define api.token.prefix "TOK_" %% start: FILE for ERROR; @end example @@ -5525,36 +5607,16 @@ letters, underscores, and ---not at the beginning--- digits). @item Default Value: empty +@item History: +introduced in Bison 2.8 @end itemize -@c api.tokens.prefix - +@c api.token.prefix -@c ================================================== lex_symbol -@item lex_symbol -@findex %define lex_symbol -@itemize @bullet -@item Language(s): -C++ +@c ================================================== lr.default-reduction -@item Purpose: -When variant-based semantic values are enabled (@pxref{C++ Variants}), -request that symbols be handled as a whole (type, value, and possibly -location) in the scanner. @xref{Complete Symbols}, for details. - -@item Accepted Values: -Boolean. - -@item Default Value: -@code{false} -@end itemize -@c lex_symbol - - -@c ================================================== lr.default-reductions - -@item lr.default-reductions -@findex %define lr.default-reductions +@item lr.default-reduction +@findex %define lr.default-reduction @itemize @bullet @item Language(s): all @@ -5570,12 +5632,15 @@ feedback will help to stabilize it.) @item @code{accepting} if @code{lr.type} is @code{canonical-lr}. @item @code{most} otherwise. @end itemize +@item History: +introduced as @code{lr.default-reductions} in 2.5, renamed as +@code{lr.default-reduction} in 2.8. 
@end itemize -@c ============================================ lr.keep-unreachable-states +@c ============================================ lr.keep-unreachable-state -@item lr.keep-unreachable-states -@findex %define lr.keep-unreachable-states +@item lr.keep-unreachable-state +@findex %define lr.keep-unreachable-state @itemize @bullet @item Language(s): all @@ -5584,7 +5649,10 @@ remain in the parser tables. @xref{Unreachable States}. @item Accepted Values: Boolean @item Default Value: @code{false} @end itemize -@c lr.keep-unreachable-states +introduced as @code{lr.keep_unreachable_states} in 2.3b, renamed as +@code{lr.keep-unreachable-states} in 2.5, and as +@code{lr.keep-unreachable-state} in 2.8. +@c lr.keep-unreachable-state @c ================================================== lr.type @@ -5676,12 +5744,16 @@ syntax error handling. @xref{LAC}. @findex %define parse.trace @itemize -@item Languages(s): C, C++ +@item Language(s): C, C++, Java @item Purpose: Require parser instrumentation for tracing. -In C/C++, define the macro @code{YYDEBUG} to 1 in the parser implementation +@xref{Tracing, ,Tracing Your Parser}. + +In C/C++, define the macro @code{YYDEBUG} (or @code{@var{prefix}DEBUG} with +@samp{%define api.prefix @var{prefix}}, see @ref{Multiple Parsers, +,Multiple Parsers in the Same Program}) to 1 in the parser implementation file if it is not already defined, so that the debugging facilities are -compiled. @xref{Tracing, ,Tracing Your Parser}. +compiled. @item Accepted Values: Boolean @@ -5830,34 +5902,88 @@ of the standard Bison skeletons. @section Multiple Parsers in the Same Program Most programs that use Bison parse only one language and therefore contain -only one Bison parser. But what if you want to parse more than one -language with the same program? Then you need to avoid a name conflict -between different definitions of @code{yyparse}, @code{yylval}, and so on. 
- -The easy way to do this is to use the option @samp{-p @var{prefix}} -(@pxref{Invocation, ,Invoking Bison}). This renames the interface -functions and variables of the Bison parser to start with @var{prefix} -instead of @samp{yy}. You can use this to give each parser distinct -names that do not conflict. - -The precise list of symbols renamed is @code{yyparse}, @code{yylex}, -@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yylloc}, -@code{yychar} and @code{yydebug}. If you use a push parser, -@code{yypush_parse}, @code{yypull_parse}, @code{yypstate}, -@code{yypstate_new} and @code{yypstate_delete} will also be renamed. -For example, if you use @samp{-p c}, the names become @code{cparse}, -@code{clex}, and so on. +only one Bison parser. But what if you want to parse more than one language +with the same program? Then you need to avoid name conflicts between +different definitions of functions and variables such as @code{yyparse} and +@code{yylval}. To use different parsers from the same compilation unit, you +also need to avoid conflicts on types and macros (e.g., @code{YYSTYPE}) +exported in the generated header. + +The easy way to do this is to define the @code{%define} variable +@code{api.prefix}. With different @code{api.prefix}s it is guaranteed that +headers do not conflict when included together, and that compiled objects +can be linked together too. Specifying @samp{%define api.prefix +@var{prefix}} (or passing the option @samp{-Dapi.prefix=@var{prefix}}, see +@ref{Invocation, ,Invoking Bison}) renames the interface functions and +variables of the Bison parser to start with @var{prefix} instead of +@samp{yy}, and all the macros to start with @var{PREFIX} (i.e., @var{prefix} +upper-cased) instead of @samp{YY}. + +The renamed symbols include @code{yyparse}, @code{yylex}, @code{yyerror}, +@code{yynerrs}, @code{yylval}, @code{yylloc}, @code{yychar} and +@code{yydebug}. 
If you use a push parser, @code{yypush_parse}, +@code{yypull_parse}, @code{yypstate}, @code{yypstate_new} and +@code{yypstate_delete} will also be renamed. The renamed macros include +@code{YYSTYPE}, @code{YYLTYPE}, and @code{YYDEBUG}, which is treated +specially --- more about this below. + +For example, if you use @samp{%define api.prefix c}, the names become +@code{cparse}, @code{clex}, @dots{}, @code{CSTYPE}, @code{CLTYPE}, and so +on. + +The @code{%define} variable @code{api.prefix} works in two different ways. +In the implementation file, it works by adding macro definitions to the +beginning of the parser implementation file, defining @code{yyparse} as +@code{@var{prefix}parse}, and so on: + +@example +#define YYSTYPE CSTYPE +#define yyparse cparse +#define yylval clval +... +YYSTYPE yylval; +int yyparse (void); +@end example -@strong{All the other variables and macros associated with Bison are not -renamed.} These others are not global; there is no conflict if the same -name is used in different parsers. For example, @code{YYSTYPE} is not -renamed, but defining this in different ways in different parsers causes -no trouble (@pxref{Value Type, ,Data Types of Semantic Values}). +This effectively substitutes one name for the other in the entire parser +implementation file, thus the ``original'' names (@code{yylex}, +@code{YYSTYPE}, @dots{}) are also usable in the parser implementation file. -The @samp{-p} option works by adding macro definitions to the -beginning of the parser implementation file, defining @code{yyparse} -as @code{@var{prefix}parse}, and so on. This effectively substitutes -one name for the other in the entire parser implementation file. +However, in the parser header file, the symbols are declared under their +renamed names, for instance: + +@example +extern CSTYPE clval; +int cparse (void); +@end example + +The macro @code{YYDEBUG} is commonly used to enable the tracing support in +parsers. 
To comply with this tradition, when @code{api.prefix} is used, +@code{YYDEBUG} (not renamed) is used as a default value: + +@example +/* Enabling traces. */ +#ifndef CDEBUG +# if defined YYDEBUG +# if YYDEBUG +# define CDEBUG 1 +# else +# define CDEBUG 0 +# endif +# else +# define CDEBUG 0 +# endif +#endif +#if CDEBUG +extern int cdebug; +#endif +@end example + +@sp 2 + +Prior to Bison 2.6, a feature similar to @code{api.prefix} was provided by +the obsolete directive @code{%name-prefix} (@pxref{Table of Symbols, ,Bison +Symbols}) and the option @code{--name-prefix} (@pxref{Bison Options}). @node Interface @chapter Parser C-Language Interface @@ -5973,9 +6099,9 @@ function is available if either the @samp{%define api.push-pull push} or @xref{Push Decl, ,A Push Parser}. @deftypefun int yypush_parse (yypstate *yyps) -The value returned by @code{yypush_parse} is the same as for yyparse with the -following exception. @code{yypush_parse} will return YYPUSH_MORE if more input -is required to finish parsing the grammar. +The value returned by @code{yypush_parse} is the same as for yyparse with +the following exception: it returns @code{YYPUSH_MORE} if more input is +required to finish parsing the grammar. @end deftypefun @node Pull Parser Function @@ -6750,7 +6876,7 @@ expr: term: '(' expr ')' | term '!' -| NUMBER +| "number" ; @end group @end example @@ -6789,20 +6915,20 @@ statements, with a pair of rules like this: @example @group if_stmt: - IF expr THEN stmt -| IF expr THEN stmt ELSE stmt + "if" expr "then" stmt +| "if" expr "then" stmt "else" stmt ; @end group @end example @noindent -Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are -terminal symbols for specific keyword tokens. +Here @code{"if"}, @code{"then"} and @code{"else"} are terminal symbols for +specific keyword tokens. 
-When the @code{ELSE} token is read and becomes the lookahead token, the +When the @code{"else"} token is read and becomes the lookahead token, the contents of the stack (assuming the input is valid) are just right for reduction by the first rule. But it is also legitimate to shift the -@code{ELSE}, because that would lead to eventual reduction by the second +@code{"else"}, because that would lead to eventual reduction by the second rule. This situation, where either a shift or a reduction would be valid, is @@ -6811,14 +6937,14 @@ these conflicts by choosing to shift, unless otherwise directed by operator precedence declarations. To see the reason for this, let's contrast it with the other alternative. -Since the parser prefers to shift the @code{ELSE}, the result is to attach +Since the parser prefers to shift the @code{"else"}, the result is to attach the else-clause to the innermost if-statement, making these two inputs equivalent: @example -if x then if y then win (); else lose; +if x then if y then win; else lose; -if x then do; if y then win (); else lose; end; +if x then do; if y then win; else lose; end; @end example But if the parser chose to reduce when possible rather than shift, the @@ -6826,9 +6952,9 @@ result would be to attach the else-clause to the outermost if-statement, making these two inputs equivalent: @example -if x then if y then win (); else lose; +if x then if y then win; else lose; -if x then do; if y then win (); end; else lose; +if x then do; if y then win; end; else lose; @end example The conflict exists because the grammar as written is ambiguous: either @@ -6841,11 +6967,16 @@ This particular ambiguity was first encountered in the specifications of Algol 60 and is called the ``dangling @code{else}'' ambiguity. To avoid warnings from Bison about predictable, legitimate shift/reduce -conflicts, use the @code{%expect @var{n}} declaration. +conflicts, you can use the @code{%expect @var{n}} declaration. 
There will be no warning as long as the number of shift/reduce conflicts is exactly @var{n}, and Bison will report an error if there is a different number. -@xref{Expect Decl, ,Suppressing Conflict Warnings}. +@xref{Expect Decl, ,Suppressing Conflict Warnings}. However, we don't +recommend the use of @code{%expect} (except @samp{%expect 0}!), as an equal +number of conflicts does not mean that they are the @emph{same}. When +possible, you should rather use precedence directives to @emph{fix} the +conflicts explicitly (@pxref{Non Operators,, Using Precedence For Non +Operators}). The definition of @code{if_stmt} above is solely to blame for the conflict, but the conflict does not actually appear without additional @@ -6854,7 +6985,6 @@ the conflict: @example @group -%token IF THEN ELSE variable %% @end group @group @@ -6866,13 +6996,13 @@ stmt: @group if_stmt: - IF expr THEN stmt -| IF expr THEN stmt ELSE stmt + "if" expr "then" stmt +| "if" expr "then" stmt "else" stmt ; @end group expr: - variable + "identifier" ; @end example @@ -6892,6 +7022,7 @@ shift and when to reduce. * Precedence Only:: How to specify precedence only. * Precedence Examples:: How these features are used in the previous example. * How Precedence:: How they work. +* Non Operators:: Using precedence for general conflicts. @end menu @node Why Precedence @@ -7030,16 +7161,11 @@ would declare them in groups of equal precedence. For example, @code{'+'} is declared with @code{'-'}: @example -%left '<' '>' '=' NE LE GE +%left '<' '>' '=' "!=" "<=" ">=" %left '+' '-' %left '*' '/' @end example -@noindent -(Here @code{NE} and so on stand for the operators for ``not equal'' -and so on. We assume that these tokens are more than one character long -and therefore are represented by names, not character literals.) - @node How Precedence @subsection How Precedence Works @@ -7062,6 +7188,44 @@ resolved. Not all rules and not all tokens have precedence. 
If either the rule or the lookahead token has no precedence, then the default is to shift. +@node Non Operators +@subsection Using Precedence For Non Operators + +Using precedence and associativity directives properly can help fix +shift/reduce conflicts that do not involve arithmetic-like operators. For +instance, the ``dangling @code{else}'' problem (@pxref{Shift/Reduce, , +Shift/Reduce Conflicts}) can be solved elegantly in two different ways. + +In the present case, the conflict is between the token @code{"else"}, which +could be shifted, and the rule @samp{if_stmt: "if" expr "then" stmt}, which +could be reduced. By default, the precedence of a rule is that of its last +token, here @code{"then"}, so the conflict will be solved appropriately +by giving @code{"else"} a precedence higher than that of @code{"then"}, for +instance as follows: + +@example +@group +%nonassoc "then" +%nonassoc "else" +@end group +@end example + +Alternatively, you may give both tokens the same precedence, in which case +associativity is used to solve the conflict. To preserve the shift action, +use right associativity: + +@example +%right "then" "else" +@end example + +Neither solution is perfect, however. Since Bison does not provide, so far, +support for ``scoped'' precedence, both force you to declare the precedence +of these keywords with respect to the other operators of your grammar. +Therefore, instead of being warned about new conflicts you would otherwise be +unaware of (e.g., a shift/reduce conflict due to @samp{if test then 1 else 2 + 3} +being ambiguous: @samp{if test then 1 else (2 + 3)} or @samp{(if test then 1 +else 2) + 3}?), the conflict will already be ``fixed''. @node Contextual Precedence @section Context-Dependent Precedence @cindex context-dependent precedence @@ -7222,30 +7386,38 @@ reduce/reduce conflict must be studied and usually eliminated. 
Here is the proper way to define @code{sequence}: @example +@group sequence: /* empty */ @{ printf ("empty sequence\n"); @} | sequence word @{ printf ("added word %s\n", $2); @} ; +@end group @end example Here is another common error that yields a reduce/reduce conflict: @example sequence: +@group /* empty */ | sequence words | sequence redirects ; +@end group +@group words: /* empty */ | words word ; +@end group +@group redirects: /* empty */ | redirects redirect ; +@end group @end example @noindent @@ -7298,6 +7470,58 @@ redirects: @end group @end example +Yet this proposal introduces another kind of ambiguity! The input +@samp{word word} can be parsed as a single @code{words} composed of two +@samp{word}s, or as two one-@code{word} @code{words} (and likewise for +@code{redirect}/@code{redirects}). However this ambiguity is now a +shift/reduce conflict, and therefore it can now be addressed with precedence +directives. + +To simplify the matter, we will proceed with @code{word} and @code{redirect} +being tokens: @code{"word"} and @code{"redirect"}. + +To prefer the longest @code{words}, the conflict between the token +@code{"word"} and the rule @samp{sequence: sequence words} must be resolved +as a shift. To this end, we use the same techniques as exposed above, see +@ref{Non Operators,, Using Precedence For Non Operators}. 
One solution +relies on precedences: use @code{%prec} to give a lower precedence to the +rule: + +@example +%nonassoc "word" +%nonassoc "sequence" +%% +@group +sequence: + /* empty */ +| sequence word %prec "sequence" +| sequence redirect %prec "sequence" +; +@end group + +@group +words: + word +| words "word" +; +@end group +@end example + +Another solution relies on associativity: provide both the token and the +rule with the same precedence, but make them right-associative: + +@example +%right "word" "redirect" +%% +@group +sequence: + /* empty */ +| sequence word %prec "word" +| sequence redirect %prec "redirect" +; +@end group +@end example + @node Mysterious Conflicts @section Mysterious Conflicts @cindex Mysterious Conflicts @@ -7307,8 +7531,6 @@ Here is an example: @example @group -%token ID - %% def: param_spec return_spec ','; param_spec: @@ -7323,10 +7545,10 @@ return_spec: ; @end group @group -type: ID; +type: "id"; @end group @group -name: ID; +name: "id"; name_list: name | name ',' name_list @@ -7334,16 +7556,16 @@ name_list: @end group @end example -It would seem that this grammar can be parsed with only a single token -of lookahead: when a @code{param_spec} is being read, an @code{ID} is -a @code{name} if a comma or colon follows, or a @code{type} if another -@code{ID} follows. In other words, this grammar is LR(1). +It would seem that this grammar can be parsed with only a single token of +lookahead: when a @code{param_spec} is being read, an @code{"id"} is a +@code{name} if a comma or colon follows, or a @code{type} if another +@code{"id"} follows. In other words, this grammar is LR(1). @cindex LR @cindex LALR However, for historical reasons, Bison cannot by default handle all LR(1) grammars. 
-In this grammar, two contexts, that after an @code{ID} at the beginning +In this grammar, two contexts, that after an @code{"id"} at the beginning of a @code{param_spec} and likewise at the beginning of a @code{return_spec}, are similar enough that Bison assumes they are the same. @@ -7374,27 +7596,24 @@ distinct. In the above example, adding one rule to @example @group -%token BOGUS -@dots{} -%% @dots{} return_spec: type | name ':' type -| ID BOGUS /* This rule is never used. */ +| "id" "bogus" /* This rule is never used. */ ; @end group @end example This corrects the problem because it introduces the possibility of an -additional active rule in the context after the @code{ID} at the beginning of +additional active rule in the context after the @code{"id"} at the beginning of @code{return_spec}. This rule is not active in the corresponding context in a @code{param_spec}, so the two contexts receive distinct parser states. -As long as the token @code{BOGUS} is never generated by @code{yylex}, +As long as the token @code{"bogus"} is never generated by @code{yylex}, the added rule cannot alter the way actual input is parsed. In this particular example, there is another way to solve the problem: -rewrite the rule for @code{return_spec} to use @code{ID} directly +rewrite the rule for @code{return_spec} to use @code{"id"} directly instead of via @code{name}. 
This also causes the two confusing contexts to have different sets of active rules, because the one for @code{return_spec} activates the altered rule for @code{return_spec} @@ -7407,7 +7626,7 @@ param_spec: ; return_spec: type -| ID ':' type +| "id" ':' type ; @end example @@ -7570,7 +7789,7 @@ and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and @node Default Reductions @subsection Default Reductions @cindex default reductions -@findex %define lr.default-reductions +@findex %define lr.default-reduction @findex %nonassoc After parser table construction, Bison identifies the reduction with the @@ -7652,9 +7871,9 @@ token for which there is a conflict. The correct action in this case is to split the parse instead. To adjust which states have default reductions enabled, use the -@code{%define lr.default-reductions} directive. +@code{%define lr.default-reduction} directive. -@deffn {Directive} {%define lr.default-reductions @var{WHERE}} +@deffn {Directive} {%define lr.default-reduction @var{WHERE}} Specify the kind of states that are permitted to contain default reductions. The accepted values of @var{WHERE} are: @itemize @@ -7777,7 +7996,7 @@ parser community for years, for the publication that introduces LAC, @node Unreachable States @subsection Unreachable States -@findex %define lr.keep-unreachable-states +@findex %define lr.keep-unreachable-state @cindex unreachable states If there exists no sequence of transitions from the parser's start state to @@ -7790,7 +8009,7 @@ resolution because they are useless in the generated parser. However, keeping unreachable states is sometimes useful when trying to understand the relationship between the parser and the grammar. -@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}} +@deffn {Directive} {%define lr.keep-unreachable-state @var{VALUE}} Request that Bison allow unreachable states to remain in the parser tables. @var{VALUE} must be a Boolean. The default is @code{false}. 
@end deffn @@ -8301,6 +8520,8 @@ automaton, and how to enable and understand the parser run-time traces. @menu * Understanding:: Understanding the structure of your parser. +* Graphviz:: Getting a visual representation of the parser. +* Xml:: Getting a markup representation of the parser. * Tracing:: Tracing the execution of your parser. @end menu @@ -8717,6 +8938,165 @@ precedence of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but also because the associativity of @samp{/} is not specified. +Note that Bison may also produce an HTML version of this output, via an XML +file and XSLT processing (@pxref{Xml}). + +@c ================================================= Graphical Representation + +@node Graphviz +@section Visualizing Your Parser +@cindex dot + +As another means to gain a better understanding of the shift/reduce +automaton corresponding to the Bison parser, a DOT file can be generated. Note +that debugging a real grammar with this is tedious at best, and impractical +most of the time, because the generated files are huge (generating +a PDF or PNG file from one takes a very long time, and more often than not +fails due to memory exhaustion). This option is intended mainly for beginners, +to help them understand LR parsers. + +This file is generated when the @option{--graph} option is specified +(@pxref{Invocation, , Invoking Bison}). Its name is made by removing +@samp{.tab.c} or @samp{.c} from the parser implementation file name, and +adding @samp{.dot} instead. If the grammar file is @file{foo.y}, the +Graphviz output file is called @file{foo.dot}. + +The following grammar file, @file{rr.y}, is used in the rest of this section: + +@example +%% +@group +exp: a ";" | b "."; +a: "0"; +b: "0"; +@end group +@end example + +The graphical output is very similar to the textual one, and as such it is +most easily understood by comparing the two directly. 
See +@ref{Debugging, , Debugging Your Parser} for a detailed analysis of the +textual report. + +@subheading Graphical Representation of States + +The items (pointed rules) for each state are grouped together in graph nodes. +Their numbering is the same as in the verbose file. See the following points, +about transitions, for examples. + +When invoked with @option{--report=lookaheads}, the lookahead tokens, when +needed, are shown next to the relevant rule between square brackets as a +comma-separated list. This is the case in the figure for the representation of +reductions, below. + +@sp 1 + +The transitions are represented as directed edges between the current and +the target states. + +@subheading Graphical Representation of Shifts + +Shifts are shown as solid arrows, labelled with the lookahead token for that +shift. The following describes a shift in the @file{rr.output} file: + +@example +@group +state 3 + + 1 exp: a . ";" + + ";" shift, and go to state 6 +@end group +@end example + +A Graphviz rendering of this portion of the graph could be: + +@center @image{figs/example-shift, 100pt} + +@subheading Graphical Representation of Reductions + +Reductions are shown as solid arrows, leading to a diamond-shaped node +bearing the number of the reduction rule. The arrow is labelled with the +appropriate comma-separated lookahead tokens. If the reduction is the default +action for the given state, there is no such label. + +This is how reductions are represented in the verbose file @file{rr.output}: +@example +state 1 + + 3 a: "0" . [";"] + 4 b: "0" . ["."] + + "." reduce using rule 4 (b) + $default reduce using rule 3 (a) +@end example + +A Graphviz rendering of this portion of the graph could be: + +@center @image{figs/example-reduce, 120pt} + +When unresolved conflicts are present, because deterministic parsing +can make only a single decision, Bison can arbitrarily choose to disable a +reduction, see @ref{Shift/Reduce, , Shift/Reduce Conflicts}. 
Discarded actions +are distinguished by a red filling color on these nodes, just like how they are +reported between square brackets in the verbose file. + +The reduction corresponding to the rule number 0 is the accepting state. It +is shown as a blue diamond, labelled "Acc". + +@subheading Graphical Representation of Go Tos + +The @samp{go to} jump transitions are represented as dotted lines bearing +the name of the rule being jumped to. + +Note that a DOT file may also be produced via an XML file and XSLT +processing (@pxref{Xml}). + +@c ================================================= XML + +@node Xml +@section Visualizing Your Parser in Multiple Formats +@cindex xml + +Bison supports two major report formats: textual output +(@pxref{Understanding}) when invoked with option @option{--verbose}, and DOT +(@pxref{Graphviz}) when invoked with option @option{--graph}. However, +another alternative is to output an XML file that may then be, with +@command{xsltproc}, rendered as either a raw text format equivalent to the +verbose file, or as an HTML version of the same file, with clickable +transitions, or even as a DOT file. The @file{.output} and DOT files obtained via +XSLT are identical to those obtained by invoking +@command{bison} with options @option{--verbose} or @option{--graph}. + +The XML file is generated when the options @option{-x} or +@option{--xml[=FILE]} are specified, see @ref{Invocation,,Invoking Bison}. +If no file name is specified, its name is made by removing @samp{.tab.c} or @samp{.c} +from the parser implementation file name, and adding @samp{.xml} instead. +For instance, if the grammar file is @file{foo.y}, the default XML output +file is @file{foo.xml}. + +Bison ships with a @file{data/xslt} directory, containing XSL Transformation +files to apply to the XML file. Their names are non-ambiguous: + +@table @file +@item xml2dot.xsl +Used to output a copy of the DOT visualization of the automaton.
+@item xml2text.xsl +Used to output a copy of the .output file. +@item xml2xhtml.xsl +Used to output an xhtml enhancement of the .output file. +@end table + +Sample usage (requires @code{xsltproc}): +@example +$ bison -x input.y +@group +$ bison --print-datadir +/usr/local/share/bison +@end group +$ xsltproc /usr/local/share/bison/xslt/xml2xhtml.xsl input.xml > input.html +@end example + +@c ================================================= Tracing @node Tracing @section Tracing Your Parser @@ -8746,9 +9126,17 @@ parser. This is compliant with POSIX Yacc. You could use YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The Prologue}). -@item the option @option{-t}, @option{--debug} -Use the @samp{-t} option when you run Bison (@pxref{Invocation, -,Invoking Bison}). This is POSIX compliant too. +If the @code{%define} variable @code{api.prefix} is used (@pxref{Multiple +Parsers, ,Multiple Parsers in the Same Program}), for instance @samp{%define +api.prefix x}, then if @code{CDEBUG} is defined, its value controls the +tracing feature (enabled if and only if nonzero); otherwise tracing is +enabled if and only if @code{YYDEBUG} is nonzero. + +@item the option @option{-t} (POSIX Yacc compliant) +@itemx the option @option{--debug} (Bison extension) +Use the @samp{-t} option when you run Bison (@pxref{Invocation, ,Invoking +Bison}). With @samp{%define api.prefix c}, it defines @code{CDEBUG} to 1, +otherwise it defines @code{YYDEBUG} to 1. @item the directive @samp{%debug} @findex %debug @@ -9164,6 +9552,10 @@ unexpected number of conflicts is an error, and an expected number of conflicts is not reported, so @option{-W} and @option{--warning} then have no effect on the conflict report. +@item deprecated +Deprecated constructs whose support will be removed in future versions of +Bison. + @item other All warnings not categorized above. These warnings are enabled by default. @@ -9176,12 +9568,33 @@ All the warnings. @item none Turn off all the warnings. 
@item error -Treat warnings as errors. +See @option{-Werror}, below. @end table A category can be turned off by prefixing its name with @samp{no-}. For instance, @option{-Wno-yacc} will hide the warnings about POSIX Yacc incompatibilities. + +@item -Werror[=@var{category}] +@itemx -Wno-error[=@var{category}] +Enable warnings falling in @var{category}, and treat them as errors. If no +@var{category} is given, it defaults to making all enabled warnings into errors. + +@var{category} is the same as for @option{--warnings}, with the exception that +it may not be prefixed with @samp{no-} (see above). + +Prefixed with @samp{no}, it deactivates the error treatment for this +@var{category}. However, the warning itself won't be disabled, or enabled, by +this option. + +Note that the precedence of the @samp{=} and @samp{,} operators is such that +the following commands are @emph{not} equivalent, as the first will not treat +S/R conflicts as errors. + +@example +$ bison -Werror=yacc,conflicts-sr input.y +$ bison -Werror=yacc,error=conflicts-sr input.y +@end example @end table @noindent @@ -9238,8 +9651,9 @@ Pretend that @code{%locations} was specified. @xref{Decl Summary}. @item -p @var{prefix} @itemx --name-prefix=@var{prefix} -Pretend that @code{%name-prefix "@var{prefix}"} was specified. -@xref{Decl Summary}. +Pretend that @code{%name-prefix "@var{prefix}"} was specified (@pxref{Decl +Summary}). Obsoleted by @code{-Dapi.prefix=@var{prefix}}. @xref{Multiple +Parsers, ,Multiple Parsers in the Same Program}. @item -l @itemx --no-lines @@ -9300,13 +9714,23 @@ separated list of @var{things} among: Description of the grammar, conflicts (resolved and unresolved), and parser's automaton. +@item itemset +Implies @code{state} and augments the description of the automaton with +the full set of items for each state, instead of its core only. + @item lookahead Implies @code{state} and augments the description of the automaton with each rule's lookahead set. 
-@item itemset -Implies @code{state} and augments the description of the automaton with -the full set of items for each state, instead of its core only. +@item solved +Implies @code{state}. Explain how conflicts were solved thanks to +precedence and associativity directives. + +@item all +Enable all the items. + +@item none +Do not generate the report. @end table @item --report-file=@var{file} @@ -9424,8 +9848,10 @@ in the following files: @table @file @item position.hh @itemx location.hh -The definition of the classes @code{position} and @code{location}, -used for location tracking when enabled. @xref{C++ Location Values}. +The definition of the classes @code{position} and @code{location}, used for +location tracking when enabled. These files are not generated if the +@code{%define} variable @code{api.location.type} is defined. @xref{C++ +Location Values}. @item stack.hh An auxiliary class @code{stack} used by the parser. @@ -9584,10 +10010,13 @@ is some time and/or some talented C++ hacker willing to contribute to Bison. @c - %define filename_type "const symbol::Symbol" When the directive @code{%locations} is used, the C++ parser supports -location tracking, see @ref{Tracking Locations}. Two auxiliary classes -define a @code{position}, a single point in a file, and a @code{location}, a -range composed of a pair of @code{position}s (possibly spanning several -files). +location tracking, see @ref{Tracking Locations}. + +By default, two auxiliary classes define a @code{position}, a single point +in a file, and a @code{location}, a range composed of a pair of +@code{position}s (possibly spanning several files). But if the +@code{%define} variable @code{api.location.type} is defined, then these +classes will not be generated, and the user defined type will be used. @tindex uint In this section @code{uint} is an abbreviation for @code{unsigned int}: in @@ -9596,6 +10025,7 @@ genuine code only the latter is used. 
@menu * C++ position:: One point in the source file * C++ location:: Two points in the source file +* User Defined Location Type:: Required interface for locations @end menu @node C++ position @@ -9699,6 +10129,63 @@ Report @var{p} on @var{o}, taking care of special cases such as: no @code{filename} defined, or equal filename/line or column. @end deftypefun +@node User Defined Location Type +@subsubsection User Defined Location Type +@findex %define api.location.type + +Instead of using the built-in types you may use the @code{%define} variable +@code{api.location.type} to specify your own type: + +@example +%define api.location.type @var{LocationType} +@end example + +The requirements over your @var{LocationType} are: +@itemize +@item +it must be copyable; + +@item +in order to compute the (default) value of @code{@@$} in a reduction, the +parser basically runs +@example +@@$.begin = @@$1.begin; +@@$.end = @@$@var{N}.end; // The location of last right-hand side symbol. +@end example +@noindent +so there must be copyable @code{begin} and @code{end} members; + +@item +alternatively you may redefine the computation of the default location, in +which case these members are not required (@pxref{Location Default Action}); + +@item +if traces are enabled, then there must exist an @samp{std::ostream& + operator<< (std::ostream& o, const @var{LocationType}& s)} function. +@end itemize + +@sp 1 + +In programs with several C++ parsers, you may also use the @code{%define} +variable @code{api.location.type} to share a common set of built-in +definitions for @code{position} and @code{location}. 
For instance, one +parser @file{master/parser.yy} might use: + +@example +%defines +%locations +%define namespace "master::" +@end example + +@noindent +to generate the @file{master/position.hh} and @file{master/location.hh} +files, reused by other parsers as follows: + +@example +%define api.location.type "master::location" +%code requires @{ #include @} +@end example + @node C++ Parser Interface @subsection C++ Parser Interface @c - define parser_class_name @@ -9752,6 +10239,11 @@ Instantiate a syntax-error exception. @deftypemethod {parser} {int} parse () Run the syntactic analysis, and return 0 on success, 1 otherwise. + +@cindex exceptions +The whole function is wrapped in a @code{try}/@code{catch} block, so that +when an exception is thrown, the @code{%destructor}s are called to release +the lookahead symbol, and the symbols pushed on the stack. @end deftypemethod @deftypemethod {parser} {std::ostream&} debug_stream () @@ -9851,7 +10343,8 @@ or @node Complete Symbols @subsubsection Complete Symbols -If you specified both @code{%define variant} and @code{%define lex_symbol}, +If you specified both @code{%define variant} and +@code{%define api.token.constructor}, the @code{parser} class also defines the class @code{parser::symbol_type} which defines a @emph{complete} symbol, aggregating its type (i.e., the traditional value returned by @code{yylex}), its semantic value (i.e., the @@ -9873,7 +10366,7 @@ So for each token type, Bison generates named constructors as follows. @deftypemethod {symbol_type} {} make_@var{token} (const @var{value_type}& @var{value}, const location_type& @var{location}) @deftypemethodx {symbol_type} {} make_@var{token} (const location_type& @var{location}) Build a complete terminal symbol for the token type @var{token} (not -including the @code{api.tokens.prefix}) whose possible semantic value is +including the @code{api.token.prefix}) whose possible semantic value is @var{value} of adequate @var{value_type}. 
If location tracking is enabled, also pass the @var{location}. @end deftypemethod @@ -9881,7 +10374,7 @@ also pass the @var{location}. For instance, given the following declarations: @example -%define api.tokens.prefix "TOK_" +%define api.token.prefix "TOK_" %token IDENTIFIER; %token INTEGER; %token COLON; @@ -10113,18 +10606,18 @@ the grammar for. @end example @noindent +@findex %define api.token.constructor @findex %define variant -@findex %define lex_symbol This example will use genuine C++ objects as semantic values, therefore, we require the variant-based interface. To make sure we properly use it, we enable assertions. To fully benefit from type-safety and more natural -definition of ``symbol'', we enable @code{lex_symbol}. +definition of ``symbol'', we enable @code{api.token.constructor}. @comment file: calc++-parser.yy @example -%define variant +%define api.token.constructor %define parse.assert -%define lex_symbol +%define variant @end example @noindent @@ -10203,11 +10696,11 @@ The token numbered as 0 corresponds to end of file; the following line allows for nicer error messages referring to ``end of file'' instead of ``$end''. Similarly user friendly names are provided for each symbol. To avoid name clashes in the generated files (@pxref{Calc++ Scanner}), prefix -tokens with @code{TOK_} (@pxref{%define Summary,,api.tokens.prefix}). +tokens with @code{TOK_} (@pxref{%define Summary,,api.token.prefix}). @comment file: calc++-parser.yy @example -%define api.tokens.prefix "TOK_" +%define api.token.prefix "TOK_" %token END 0 "end of file" ASSIGN ":=" @@ -10236,9 +10729,8 @@ tags. No @code{%destructor} is needed to enable memory deallocation during error recovery; the memory, for strings for instance, will be reclaimed by the regular destructors. All the values are printed using their -@code{operator<<}. +@code{operator<<} (@pxref{Printer Decl, , Printing Semantic Values}). -@c FIXME: Document %printer, and mention that it takes a braced-code operand. 
@comment file: calc++-parser.yy @example %printer @{ yyoutput << $$; @} <*>; @@ -10580,11 +11072,11 @@ class defines a @dfn{position}, a single point in a file; Bison itself defines a class representing a @dfn{location}, a range composed of a pair of positions (possibly spanning several files). The location class is an inner class of the parser; the name is @code{Location} by default, and may also be -renamed using @samp{%define location_type "@var{class-name}"}. +renamed using @code{%define api.location.type "@var{class-name}"}. The location class treats the position as a completely opaque value. By default, the class name is @code{Position}, but this can be changed -with @samp{%define position_type "@var{class-name}"}. This class must +with @code{%define api.position.type "@var{class-name}"}. This class must be supplied by the user. @@ -10746,7 +11238,7 @@ In both cases, the scanner has to implement the following methods. @deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be -changed using @samp{%define location_type "@var{class-name}".} +changed using @code{%define api.location.type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {int} yylex () @@ -10764,7 +11256,7 @@ Return respectively the first position of the last token that @code{yylex} returned, and the first position beyond it. These methods are not needed unless location tracking is active. -The return type can be changed using @samp{%define position_type +The return type can be changed using @code{%define api.position.type "@var{class-name}".} @end deftypemethod @@ -11026,10 +11518,11 @@ comma-separated list. Default is @code{java.io.IOException}. @xref{Java Scanner Interface}. 
@end deffn -@deffn {Directive} {%define location_type} "@var{class}" +@deffn {Directive} {%define api.location.type} "@var{class}" The name of the class used for locations (a range between two positions). This class is generated as an inner class of the parser class by @command{bison}. Default is @code{Location}. +Formerly named @code{location_type}. @xref{Java Location Values}. @end deffn @@ -11044,9 +11537,10 @@ The name of the parser class. Default is @code{YYParser} or @xref{Java Bison Interface}. @end deffn -@deffn {Directive} {%define position_type} "@var{class}" +@deffn {Directive} {%define api.position.type} "@var{class}" The name of the class used for positions. This class must be supplied by the user. Default is @code{Position}. +Formerly named @code{position_type}. @xref{Java Location Values}. @end deffn @@ -11701,9 +12195,24 @@ function is applied to the two semantic values to get a single result. @end deffn @deffn {Directive} %name-prefix "@var{prefix}" -Bison declaration to rename the external symbols. @xref{Decl Summary}. +Obsoleted by the @code{%define} variable @code{api.prefix} (@pxref{Multiple +Parsers, ,Multiple Parsers in the Same Program}). + +Rename the external symbols (variables and functions) used in the parser so +that they start with @var{prefix} instead of @samp{yy}. Contrary to +@code{api.prefix}, do not rename types and macros. + +The precise list of symbols renamed in C parsers is @code{yyparse}, +@code{yylex}, @code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar}, +@code{yydebug}, and (if locations are used) @code{yylloc}. If you use a +push parser, @code{yypush_parse}, @code{yypull_parse}, @code{yypstate}, +@code{yypstate_new} and @code{yypstate_delete} will also be renamed. For +example, if you use @samp{%name-prefix "c_"}, the names become +@code{c_parse}, @code{c_lex}, and so on. For C++ parsers, see the +@code{%define namespace} documentation in this section.
@end deffn + @ifset defaultprec @deffn {Directive} %no-default-prec Do not assign a precedence to rules that lack an explicit @samp{%prec} @@ -11985,13 +12494,6 @@ parse a single token. @xref{Push Parser Function, ,The Push Parser Function More user feedback will help to stabilize it.) @end deffn -@deffn {Macro} YYPARSE_PARAM -An obsolete macro for specifying the name of a parameter that -@code{yyparse} should accept. The use of this macro is deprecated, and -is supported only for Yacc like parsers. @xref{Pure Calling,, Calling -Conventions for Pure Parsers}. -@end deffn - @deffn {Macro} YYRECOVERING The expression @code{YYRECOVERING ()} yields 1 when the parser is recovering from a syntax error, and 0 otherwise. @@ -12277,8 +12779,8 @@ London, Department of Computer Science, TR-00-12 (December 2000). @uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps} @end table -@node Index -@unnumbered Index +@node Index of Terms +@unnumbered Index of Terms @printindex cp @@ -12333,10 +12835,15 @@ London, Department of Computer Science, TR-00-12 (December 2000). 
@c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp @c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv @c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url -@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos +@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos uint @c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's -@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy -@c LocalWords: Scannerless ispell american +@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy ints +@c LocalWords: Scannerless ispell american ChangeLog smallexample CSTYPE CLTYPE +@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate LocationType +@c LocalWords: parsers parser's +@c LocalWords: associativity subclasses precedences unresolvable runnable +@c LocalWords: allocators subunit initializations unreferenced untyped +@c LocalWords: errorVerbose subtype subtypes @c Local Variables: @c ispell-dictionary: "american"