X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/56da1e5288143da723662cd22c18646ee7ad9abb..6cf3716c3166d6d41cdbeea45d018b1760184421:/doc/bison.texinfo?ds=sidebyside diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 6574704a..d1223bdc 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -135,7 +135,8 @@ Writing GLR Parsers * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. * Merging GLR Parses:: Using GLR parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* GLR Semantic Actions:: Considerations for semantic values and deferred actions. +* Semantic Predicates:: Controlling a parse with arbitrary computations. * Compiler Requirements:: GLR parsers require a modern C compiler. Examples @@ -162,9 +163,9 @@ Reverse Polish Notation Calculator Grammar Rules for @code{rpcalc} -* Rpcalc Input:: -* Rpcalc Line:: -* Rpcalc Expr:: +* Rpcalc Input:: Explanation of the @code{input} nonterminal +* Rpcalc Line:: Explanation of the @code{line} nonterminal +* Rpcalc Expr:: Explanation of the @code{expr} nonterminal Location Tracking Calculator: @code{ltcalc} @@ -177,6 +178,8 @@ Multi-Function Calculator: @code{mfcalc} * Mfcalc Declarations:: Bison declarations for multi-function calculator. * Mfcalc Rules:: Grammar rules for the calculator. * Mfcalc Symbol Table:: Symbol table management subroutines. +* Mfcalc Lexer:: The lexical analyzer. +* Mfcalc Main:: The controlling function. Bison Grammar Files @@ -223,6 +226,7 @@ Bison Declarations * Type Decl:: Declaring the choice of type for a nonterminal symbol. * Initial Action Decl:: Code run before parsing starts. * Destructor Decl:: Declaring how symbols are freed. +* Printer Decl:: Declaring how symbol values are displayed. * Expect Decl:: Suppressing warnings about parsing conflicts. * Start Decl:: Specifying the start symbol. * Pure Decl:: Requesting a reentrant parser. 
@@ -272,7 +276,8 @@ The Bison Parser Algorithm Operator Precedence * Why Precedence:: An example showing why precedence is needed. -* Using Precedence:: How to specify precedence in Bison grammars. +* Using Precedence:: How to specify precedence and associativity. +* Precedence Only:: How to specify precedence only. * Precedence Examples:: How these features are used in the previous example. * How Precedence:: How they work. @@ -295,6 +300,12 @@ Debugging Your Parser * Understanding:: Understanding the structure of your parser. * Tracing:: Tracing the execution of your parser. +Tracing Your Parser + +* Enabling Traces:: Activating run-time trace support +* Mfcalc Traces:: Extending @code{mfcalc} to support traces +* The YYPRINT Macro:: Obsolete interface for semantic value reports + Invoking Bison * Bison Options:: All the options described in detail, @@ -316,6 +327,11 @@ C++ Parsers * C++ Scanner Interface:: Exchanges between yylex and parse * A Complete C++ Example:: Demonstrating their use +C++ Location Values + +* C++ position:: One point in the source file +* C++ location:: Two points in the source file + A Complete C++ Example * Calc++ --- C++ Calculator:: The specifications @@ -757,7 +773,8 @@ merged result. @menu * Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. * Merging GLR Parses:: Using GLR parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* GLR Semantic Actions:: Considerations for semantic values and deferred actions. +* Semantic Predicates:: Controlling a parse with arbitrary computations. * Compiler Requirements:: GLR parsers require a modern C compiler. @end menu @@ -1125,6 +1142,10 @@ the offending merge. @node GLR Semantic Actions @subsection GLR Semantic Actions +The nature of GLR parsing and the structure of the generated +parsers give rise to certain restrictions on semantic values and actions. 
+ +@subsubsection Deferred semantic actions @cindex deferred semantic actions By definition, a deferred semantic action is not performed at the same time as the associated reduction. @@ -1158,6 +1179,7 @@ For example, if a semantic action might be deferred, you should never write it to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free memory referenced by @code{yylval}. +@subsubsection YYERROR @findex YYERROR @cindex GLR parsers and @code{YYERROR} Another Bison feature requiring special consideration is @code{YYERROR} @@ -1165,11 +1187,78 @@ Another Bison feature requiring special consideration is @code{YYERROR} initiate error recovery. During deterministic GLR operation, the effect of @code{YYERROR} is the same as its effect in a deterministic parser. -In a deferred semantic action, its effect is undefined. -@c The effect is probably a syntax error at the split point. +The effect in a deferred action is similar, but the precise point of the +error is undefined; instead, the parser reverts to deterministic operation, +selecting an unspecified stack on which to continue with a syntax error. +In a semantic predicate (see @ref{Semantic Predicates}) during nondeterministic +parsing, @code{YYERROR} silently prunes +the parse that invoked the test. + +@subsubsection Restrictions on semantic values and locations +GLR parsers require that you use POD (Plain Old Data) types for +semantic values and location types when using the generated parsers as +C++ code. + +@node Semantic Predicates +@subsection Controlling a Parse with Arbitrary Predicates +@findex %? +@cindex Semantic predicates in GLR parsers + +In addition to the @code{%dprec} and @code{%merge} directives, +GLR parsers +allow you to reject parses on the basis of arbitrary computations executed +in user code, without having Bison treat this rejection as an error +if there are alternative parses. (This feature is experimental and may +evolve. We welcome user feedback.) 
For example, + +@example +widget: + %?@{ new_syntax @} "widget" id new_args @{ $$ = f($3, $4); @} +| %?@{ !new_syntax @} "widget" id old_args @{ $$ = f($3, $4); @} +; +@end example + +@noindent +is one way to allow the same parser to handle two different syntaxes for +widgets. The clause preceded by @code{%?} is treated like an ordinary +action, except that its text is treated as an expression and is always +evaluated immediately (even when in nondeterministic mode). If the +expression yields 0 (false), the clause is treated as a syntax error, +which, in a nondeterministic parser, causes the stack in which it is reduced +to die. In a deterministic parser, it acts like @code{YYERROR}. + +As the example shows, predicates otherwise look like semantic actions, and +therefore you must take them into account when determining the numbers +to use for denoting the semantic values of right-hand side symbols. +Predicate actions, however, have no defined value, and may not be given +labels. + +There is a subtle difference between semantic predicates and ordinary +actions in nondeterministic mode, since the latter are deferred. +For example, we could try to rewrite the previous example as + +@example +widget: + @{ if (!new_syntax) YYERROR; @} + "widget" id new_args @{ $$ = f($3, $4); @} +| @{ if (new_syntax) YYERROR; @} + "widget" id old_args @{ $$ = f($3, $4); @} +; +@end example -Also, see @ref{Location Default Action, ,Default Action for Locations}, which -describes a special usage of @code{YYLLOC_DEFAULT} in GLR parsers. +@noindent +(reversing the sense of the predicate tests to cause an error when they are +false). However, this +does @emph{not} have the same effect if @code{new_args} and @code{old_args} +have overlapping syntax. 
+Since the mid-rule actions testing @code{new_syntax} are deferred, +a GLR parser first encounters the unresolved ambiguous reduction +for cases where @code{new_args} and @code{old_args} recognize the same string +@emph{before} performing the tests of @code{new_syntax}. It therefore +reports an error. + +Finally, be careful in writing predicates: deferred actions have not been +evaluated, so that using them in a predicate will have undefined effects. @node Compiler Requirements @subsection Considerations when Compiling GLR Parsers @@ -1438,11 +1527,13 @@ The source code for this calculator is named @file{rpcalc.y}. The Here are the C and Bison declarations for the reverse polish notation calculator. As in C, comments are placed between @samp{/*@dots{}*/}. +@comment file: rpcalc.y @example /* Reverse polish notation calculator. */ %@{ #define YYSTYPE double + #include #include int yylex (void); void yyerror (char const *); @@ -1487,6 +1578,7 @@ type for numeric constants. Here are the grammar rules for the reverse polish notation calculator. +@comment file: rpcalc.y @example @group input: @@ -1535,9 +1627,9 @@ main job of most actions. The semantic values of the components of the rule are referred to as @code{$1}, @code{$2}, and so on. @menu -* Rpcalc Input:: -* Rpcalc Line:: -* Rpcalc Expr:: +* Rpcalc Input:: Explanation of the @code{input} nonterminal +* Rpcalc Line:: Explanation of the @code{line} nonterminal +* Rpcalc Expr:: Explanation of the @code{expr} nonterminal @end menu @node Rpcalc Input @@ -1701,6 +1793,7 @@ A token type code of zero is returned if the end-of-input is encountered. Here is the code for the lexical analyzer: +@comment file: rpcalc.y @example @group /* The lexical analyzer returns a double floating point @@ -1749,6 +1842,7 @@ In keeping with the spirit of this example, the controlling function is kept to the bare minimum. The only requirement is that it call @code{yyparse} to start the process of parsing. 
+@comment file: rpcalc.y @example @group int @@ -1769,6 +1863,7 @@ always @code{"syntax error"}). It is up to the programmer to supply @code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so here is the definition we will use: +@comment file: rpcalc.y @example @group #include @@ -1853,15 +1948,15 @@ example session using @code{rpcalc}. @example $ @kbd{rpcalc} @kbd{4 9 +} -13 +@result{} 13 @kbd{3 7 + 3 4 5 *+-} --13 +@result{} -13 @kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}} -13 +@result{} 13 @kbd{5 6 / 4 n +} --3.166666667 +@result{} -3.166666667 @kbd{3 4 ^} @r{Exponentiation} -81 +@result{} 81 @kbd{^D} @r{End-of-file indicator} $ @end example @@ -1895,8 +1990,8 @@ parentheses nested to arbitrary depth. Here is the Bison code for %token NUM %left '-' '+' %left '*' '/' -%left NEG /* negation--unary minus */ -%right '^' /* exponentiation */ +%precedence NEG /* negation--unary minus */ +%right '^' /* exponentiation */ @end group %% /* The grammar follows. */ @@ -1939,15 +2034,16 @@ In the second section (Bison declarations), @code{%left} declares token types and says they are left-associative operators. The declarations @code{%left} and @code{%right} (right associativity) take the place of @code{%token} which is used to declare a token type name without -associativity. (These tokens are single-character literals, which +associativity/precedence. (These tokens are single-character literals, which ordinarily don't need to be declared. We declare them here to specify -the associativity.) +the associativity/precedence.) Operator precedence is determined by the line ordering of the declarations; the higher the line number of the declaration (lower on the page or screen), the higher the precedence. Hence, exponentiation has the highest precedence, unary minus (@code{NEG}) is next, followed -by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator +by @samp{*} and @samp{/}, and so on. 
Unary minus is not associative; +only precedence matters (@code{%precedence}). @xref{Precedence, ,Operator Precedence}. The other important new feature is the @code{%prec} in the grammar @@ -2051,7 +2147,7 @@ the same as the declarations for the infix notation calculator. %left '-' '+' %left '*' '/' -%left NEG +%precedence NEG %right '^' %% /* The grammar follows. */ @@ -2252,19 +2348,23 @@ to create named variables, store values in them, and use them later. Here is a sample session with the multi-function calculator: @example +@group $ @kbd{mfcalc} @kbd{pi = 3.141592653589} -3.1415926536 +@result{} 3.1415926536 +@end group +@group @kbd{sin(pi)} -0.0000000000 +@result{} 0.0000000000 +@end group @kbd{alpha = beta1 = 2.3} -2.3000000000 +@result{} 2.3000000000 @kbd{alpha} -2.3000000000 +@result{} 2.3000000000 @kbd{ln(alpha)} -0.8329091229 +@result{} 0.8329091229 @kbd{exp(ln(beta1))} -2.3000000000 +@result{} 2.3000000000 $ @end example @@ -2274,6 +2374,8 @@ Note that multiple assignment and nested function calls are permitted. * Mfcalc Declarations:: Bison declarations for multi-function calculator. * Mfcalc Rules:: Grammar rules for the calculator. * Mfcalc Symbol Table:: Symbol table management subroutines. +* Mfcalc Lexer:: The lexical analyzer. +* Mfcalc Main:: The controlling function. @end menu @node Mfcalc Declarations @@ -2281,16 +2383,18 @@ Note that multiple assignment and nested function calls are permitted. Here are the C and Bison declarations for the multi-function calculator. -@comment file: mfcalc.y +@comment file: mfcalc.y: 1 @example @group %@{ - #include <math.h> /* For math functions, cos(), sin(), etc. */ - #include "calc.h" /* Contains definition of `symrec'. */ + #include <stdio.h> /* For printf, etc. */ + #include <math.h> /* For pow, used in the grammar. */ + #include "calc.h" /* Contains definition of `symrec'. */ int yylex (void); void yyerror (char const *); %@} @end group + @group %union @{ double val; /* For returning numbers. 
*/ @@ -2298,17 +2402,16 @@ Here are the C and Bison declarations for the multi-function calculator. @} @end group %token NUM /* Simple double precision number. */ -%token VAR FNCT /* Variable and Function. */ +%token VAR FNCT /* Variable and function. */ %type exp @group %right '=' %left '-' '+' %left '*' '/' -%left NEG /* negation--unary minus */ -%right '^' /* exponentiation */ +%precedence NEG /* negation--unary minus */ +%right '^' /* exponentiation */ @end group -%% /* The grammar follows. */ @end example The above grammar introduces only two new features of the Bison language. @@ -2340,8 +2443,9 @@ Here are the grammar rules for the multi-function calculator. Most of them are copied directly from @code{calc}; three rules, those which mention @code{VAR} or @code{FNCT}, are new. -@comment file: mfcalc.y +@comment file: mfcalc.y: 3 @example +%% /* The grammar follows. */ @group input: /* empty */ @@ -2422,22 +2526,11 @@ symrec *getsym (char const *); @end group @end example -The new version of @code{main} includes a call to @code{init_table}, a -function that initializes the symbol table. Here it is, and -@code{init_table} as well: +The new version of @code{main} will call @code{init_table} to initialize +the symbol table: +@comment file: mfcalc.y: 3 @example -#include - -@group -/* Called by yyparse on error. */ -void -yyerror (char const *s) -@{ - printf ("%s\n", s); -@} -@end group - @group struct init @{ @@ -2449,13 +2542,13 @@ struct init @group struct init const arith_fncts[] = @{ - "sin", sin, - "cos", cos, - "atan", atan, - "ln", log, - "exp", exp, - "sqrt", sqrt, - 0, 0 + @{ "atan", atan @}, + @{ "cos", cos @}, + @{ "exp", exp @}, + @{ "ln", log @}, + @{ "sin", sin @}, + @{ "sqrt", sqrt @}, + @{ 0, 0 @}, @}; @end group @@ -2466,6 +2559,7 @@ symrec *sym_table; @group /* Put arithmetic functions in table. 
*/ +static void init_table (void) @{ @@ -2477,15 +2571,6 @@ init_table (void) @} @} @end group - -@group -int -main (void) -@{ - init_table (); - return yyparse (); -@} -@end group @end example By simply editing the initialization list and adding the necessary include @@ -2498,7 +2583,7 @@ linked to the front of the list, and a pointer to the object is returned. The function @code{getsym} is passed the name of the symbol to look up. If found, a pointer to that symbol is returned; otherwise zero is returned. -@comment file: mfcalc.y +@comment file: mfcalc.y: 3 @example #include <stdlib.h> /* malloc. */ #include <string.h> /* strlen. */ @@ -2525,13 +2610,16 @@ getsym (char const *sym_name) symrec *ptr; for (ptr = sym_table; ptr != (symrec *) 0; ptr = (symrec *)ptr->next) - if (strcmp (ptr->name,sym_name) == 0) + if (strcmp (ptr->name, sym_name) == 0) return ptr; return 0; @} @end group @end example +@node Mfcalc Lexer +@subsection The @code{mfcalc} Lexer + The function @code{yylex} must now recognize variables, numeric values, and the single-character arithmetic operators. Strings of alphanumeric characters with a leading letter are recognized as either variables or @@ -2547,7 +2635,7 @@ returned to @code{yyparse}. No change is needed in the handling of numeric values and arithmetic operators in @code{yylex}. -@comment file: mfcalc.y +@comment file: mfcalc.y: 3 @example @group #include @@ -2588,7 +2676,6 @@ yylex (void) symrec *s; int i; @end group - if (!symbuf) symbuf = (char *) malloc (length + 1); @@ -2629,6 +2716,39 @@ yylex (void) @end group @end example +@node Mfcalc Main +@subsection The @code{mfcalc} Main + +The error reporting function is unchanged, and the new version of +@code{main} includes a call to @code{init_table} and sets @code{yydebug} +on user demand (@pxref{Tracing, , Tracing Your Parser}, for details): + +@comment file: mfcalc.y: 3 +@example +@group +/* Called by yyparse on error. 
*/ +void +yyerror (char const *s) +@{ + fprintf (stderr, "%s\n", s); +@} +@end group + +@group +int +main (int argc, char const* argv[]) +@{ + int i; + /* Enable parse traces on option -p. */ + for (i = 1; i < argc; ++i) + if (!strcmp(argv[i], "-p")) + yydebug = 1; + init_table (); + return yyparse (); +@} +@end group +@end example + This program is both powerful and flexible. You may easily add new functions, and it is a simple job to modify this code to install predefined variables such as @code{pi} or @code{e} as well. @@ -3035,14 +3155,14 @@ type: %code requires @{ #include "type1.h" @} %union @{ type1 field1; @} %destructor @{ type1_free ($$); @} -%printer @{ type1_print ($$); @} +%printer @{ type1_print (yyoutput, $$); @} @end group @group %code requires @{ #include "type2.h" @} %union @{ type2 field2; @} %destructor @{ type2_free ($$); @} -%printer @{ type2_print ($$); @} +%printer @{ type2_print (yyoutput, $$); @} @end group @end example @@ -4215,6 +4335,7 @@ and Context-Free Grammars}). * Type Decl:: Declaring the choice of type for a nonterminal symbol. * Initial Action Decl:: Code run before parsing starts. * Destructor Decl:: Declaring how symbols are freed. +* Printer Decl:: Declaring how symbol values are displayed. * Expect Decl:: Suppressing warnings about parsing conflicts. * Start Decl:: Specifying the start symbol. * Pure Decl:: Requesting a reentrant parser. @@ -4255,7 +4376,8 @@ Bison will convert this into a @code{#define} directive in the parser, so that the function @code{yylex} (if it is in this file) can use the name @var{name} to stand for this token type's code. -Alternatively, you can use @code{%left}, @code{%right}, or +Alternatively, you can use @code{%left}, @code{%right}, +@code{%precedence}, or @code{%nonassoc} instead of @code{%token}, if you wish to specify associativity and precedence. @xref{Precedence Decl, ,Operator Precedence}. 
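The grouping that these precedence and associativity declarations produce can be imitated by hand. The C sketch below is not Bison output: it is a small precedence-climbing evaluator whose binding levels mirror the calculator declarations used in this manual (`%left '-' '+'`, `%left '*' '/'`, `%precedence NEG`, `%right '^'`). Every name in it (`eval`, `parse_exp`, `parse_primary`, `binary_prec`, `int_pow`) is hypothetical, and as a simplifying assumption exponents are truncated to integers so that the example does not depend on the math library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hand-written illustration only; none of these names come from Bison. */
typedef struct { const char *p; } input;

/* Binding levels, lowest first, in the order of the declarations. */
enum { PREC_ADD = 1, PREC_MUL = 2, PREC_NEG = 3, PREC_POW = 4 };

static double parse_exp(input *in, int min_prec);

static void skip_blanks(input *in) {
    while (isspace((unsigned char) *in->p))
        in->p++;
}

/* Level of a binary operator, or 0 if OP is not one. */
static int binary_prec(char op) {
    switch (op) {
    case '+': case '-': return PREC_ADD;
    case '*': case '/': return PREC_MUL;
    case '^':           return PREC_POW;
    default:            return 0;
    }
}

/* Simplification: integer exponents only, to avoid needing pow(). */
static double int_pow(double base, int n) {
    double r = 1.0;
    int k = n < 0 ? -n : n;
    while (k-- > 0)
        r *= base;
    return n < 0 ? 1.0 / r : r;
}

static double parse_primary(input *in) {
    char *end;
    double v;
    skip_blanks(in);
    if (*in->p == '-') {
        /* Unary minus has a level (NEG) but no associativity: its
           operand may only contain operators binding tighter than NEG. */
        in->p++;
        return -parse_exp(in, PREC_NEG + 1);
    }
    if (*in->p == '(') {
        in->p++;
        v = parse_exp(in, PREC_ADD);
        skip_blanks(in);
        if (*in->p == ')')
            in->p++;
        return v;
    }
    v = strtod(in->p, &end);
    in->p = end;
    return v;
}

static double parse_exp(input *in, int min_prec) {
    double lhs = parse_primary(in);
    for (;;) {
        char op;
        int prec;
        double rhs;
        skip_blanks(in);
        op = *in->p;
        prec = binary_prec(op);
        if (prec == 0 || prec < min_prec)
            return lhs;
        in->p++;
        /* Right associativity ('^') lets the same level recur on the
           right; left associativity demands a strictly higher level. */
        rhs = parse_exp(in, op == '^' ? prec : prec + 1);
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        default:  lhs = int_pow(lhs, (int) rhs); break;
        }
    }
}

/* Evaluate TEXT, e.g. eval("8 - 3 - 2"). */
double eval(const char *text) {
    input in;
    assert(text != NULL);
    in.p = text;
    return parse_exp(&in, PREC_ADD);
}
```

Under these assumptions, `eval("8 - 3 - 2")` groups to the left, `eval("2 ^ 3 ^ 2")` groups to the right, and `eval("-2 ^ 2")` negates the result of the exponentiation because `'^'` sits above `NEG`.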
@@ -4331,7 +4453,8 @@ of ``$end'': @cindex declaring operator precedence @cindex operator precedence, declaring -Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to +Use the @code{%left}, @code{%right}, @code{%nonassoc}, or +@code{%precedence} declaration to declare a token and specify its precedence and associativity, all at once. These are called @dfn{precedence declarations}. @xref{Precedence, ,Operator Precedence}, for general information on @@ -4367,6 +4490,10 @@ left-associativity (grouping @var{x} with @var{y} first) and means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is considered a syntax error. +@code{%precedence} gives only precedence to the @var{symbols}, and +defines no associativity at all. Use this to define precedence only, +and leave any potential conflict due to associativity enabled. + @item The precedence of an operator determines how it nests with other operators. All the tokens declared in a single precedence declaration have equal @@ -4668,6 +4795,69 @@ error via @code{YYERROR} are not discarded automatically. As a rule of thumb, destructors are invoked only when user actions cannot manage the memory. +@node Printer Decl +@subsection Printing Semantic Values +@cindex printing semantic values +@findex %printer +@findex <*> +@findex <> +When run-time traces are enabled (@pxref{Tracing, ,Tracing Your Parser}), +the parser reports its actions, such as reductions. When a symbol involved +in an action is reported, only its kind is displayed, as the parser cannot +know how semantic values should be formatted. + +The @code{%printer} directive defines code that is called when a symbol is +reported. Its syntax is the same as @code{%destructor} (@pxref{Destructor +Decl, , Freeing Discarded Symbols}). + +@deffn {Directive} %printer @{ @var{code} @} @var{symbols} +@findex %printer +@vindex yyoutput +@c This is the same text as for %destructor. 
+Invoke the braced @var{code} whenever the parser displays one of the +@var{symbols}. Within @var{code}, @code{yyoutput} denotes the output stream +(a @code{FILE*} in C, and a @code{std::ostream&} in C++), +@code{$$} designates the semantic value associated with the symbol, and +@code{@@$} its location. The additional parser parameters are also +available (@pxref{Parser Function, , The Parser Function @code{yyparse}}). + +The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor +Decl, , Freeing Discarded Symbols}): they can be per-type (e.g., +@samp{<field>}), per-symbol (e.g., @samp{exp}, @samp{NUM}, @samp{"float"}), +typed per-default (i.e., @samp{<*>}), or untyped per-default (i.e., +@samp{<>}). +@end deffn + +@noindent +For example: + +@example +%union @{ char *string; @} +%token <string> STRING1 +%token <string> STRING2 +%type <string> string1 +%type <string> string2 +%union @{ char character; @} +%token <character> CHR +%type <character> chr +%token TAGLESS + +%printer @{ fprintf (yyoutput, "'%c'", $$); @} <character> +%printer @{ fprintf (yyoutput, "&%p", $$); @} <*> +%printer @{ fprintf (yyoutput, "\"%s\"", $$); @} STRING1 string1 +%printer @{ fprintf (yyoutput, "<>"); @} <> +@end example + +@noindent +guarantees that, when the parser prints any symbol that has a semantic type +tag other than @code{<character>}, it displays the address of the semantic +value by default. However, when the parser displays a @code{STRING1} or a +@code{string1}, it formats it as a string in double quotes. It performs +only the per-symbol @code{%printer} in this case, so it prints only once. +Finally, the parser prints @samp{<>} for any symbol, such as @code{TAGLESS}, +that has no semantic type tag. See also + + @node Expect Decl @subsection Suppressing Conflict Warnings @cindex suppressing conflict warnings @@ -4764,7 +4954,7 @@ statically allocated variables for communication with @code{yylex}, including @code{yylval} and @code{yylloc}.) Alternatively, you can generate a pure, reentrant parser. 
The Bison -declaration @code{%define api.pure} says that you want the parser to be +declaration @samp{%define api.pure} says that you want the parser to be reentrant. It looks like this: @example @@ -4868,14 +5058,14 @@ for use by the next invocation of the @code{yypush_parse} function. Bison also supports both the push parser interface along with the pull parser interface in the same generated parser. In order to get this functionality, -you should replace the @code{%define api.push-pull push} declaration with the -@code{%define api.push-pull both} declaration. Doing this will create all of +you should replace the @samp{%define api.push-pull push} declaration with the +@samp{%define api.push-pull both} declaration. Doing this will create all of the symbols mentioned earlier along with the two extra symbols, @code{yyparse} and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally would be used. However, the user should note that it is implemented in the generated parser by calling @code{yypull_parse}. This makes the @code{yyparse} function that is generated with the -@code{%define api.push-pull both} declaration slower than the normal +@samp{%define api.push-pull both} declaration slower than the normal @code{yyparse} function. If the user calls the @code{yypull_parse} function it will parse the rest of the input stream. It is possible to @code{yypush_parse} tokens to select a subgrammar @@ -4891,9 +5081,9 @@ yypull_parse (ps); /* Will call the lexer */ yypstate_delete (ps); @end example -Adding the @code{%define api.pure} declaration does exactly the same thing to -the generated parser with @code{%define api.push-pull both} as it did for -@code{%define api.push-pull push}. +Adding the @samp{%define api.pure} declaration does exactly the same thing to +the generated parser with @samp{%define api.push-pull both} as it did for +@samp{%define api.push-pull push}. 
@node Decl Summary @subsection Bison Declaration Summary @@ -4966,9 +5156,9 @@ default location or at the location specified by @var{qualifier}. @end deffn @deffn {Directive} %debug -In the parser implementation file, define the macro @code{YYDEBUG} to -1 if it is not already defined, so that the debugging facilities are -compiled. @xref{Tracing, ,Tracing Your Parser}. +Instrument the output parser for traces. Obsoleted by @samp{%define +parse.trace}. +@xref{Tracing, ,Tracing Your Parser}. @end deffn @deffn {Directive} %define @var{variable} @@ -5057,7 +5247,7 @@ is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, @code{yypstate_new} and @code{yypstate_delete} will also be renamed. For example, if you use @samp{%name-prefix "c_"}, the names become @code{c_parse}, @code{c_lex}, and so on. -For C++ parsers, see the @code{%define namespace} documentation in this +For C++ parsers, see the @samp{%define api.namespace} documentation in this section. @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. @end deffn @@ -5085,7 +5275,7 @@ Specify @var{file} for the parser implementation file. @end deffn @deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{%define +Deprecated version of @samp{%define api.pure} (@pxref{%define Summary,,api.pure}), for which Bison is more careful to warn about unreasonable usage. @end deffn @@ -5208,7 +5398,61 @@ Summary,,%skeleton}). Unaccepted @var{variable}s produce an error. Some of the accepted @var{variable}s are: -@itemize @bullet +@table @code +@c ================================================== api.namespace +@item api.namespace +@findex %define api.namespace +@itemize +@item Language(s): C++ + +@item Purpose: Specify the namespace for the parser class. 
+For example, if you specify: + +@example +%define api.namespace "foo::bar" +@end example + +Bison uses @code{foo::bar} verbatim in references such as: + +@example +foo::bar::parser::semantic_type +@end example + +However, to open a namespace, Bison removes any leading @code{::} and then +splits on any remaining occurrences: + +@example +namespace foo @{ namespace bar @{ + class position; + class location; +@} @} +@end example + +@item Accepted Values: +Any absolute or relative C++ namespace reference without a trailing +@code{"::"}. For example, @code{"foo"} or @code{"::foo::bar"}. + +@item Default Value: +The value specified by @code{%name-prefix}, which defaults to @code{yy}. +This usage of @code{%name-prefix} is for backward compatibility and can +be confusing since @code{%name-prefix} also specifies the textual prefix +for the lexical analyzer function. Thus, if you specify +@code{%name-prefix}, it is best to also specify @samp{%define +api.namespace} so that @code{%name-prefix} @emph{only} affects the +lexical analyzer function. For example, if you specify: + +@example +%define api.namespace "foo" +%name-prefix "bar::" +@end example + +The parser namespace is @code{foo} and @code{yylex} is referenced as +@code{bar::lex}. +@end itemize +@c namespace + + + @c ================================================== api.pure @item api.pure @findex %define api.pure @@ -5223,7 +5467,11 @@ Some of the accepted @var{variable}s are: @item Default Value: @code{false} @end itemize +@c api.pure + + +@c ================================================== api.push-pull @item api.push-pull @findex %define api.push-pull @@ -5239,6 +5487,69 @@ More user feedback will help to stabilize it.) 
@item Default Value: @code{pull} @end itemize +@c api.push-pull + + + +@c ================================================== api.tokens.prefix +@item api.tokens.prefix +@findex %define api.tokens.prefix + +@itemize +@item Language(s): all + +@item Purpose: +Add a prefix to the token names when generating their definition in the +target language. For instance, + +@example +%token FILE for ERROR +%define api.tokens.prefix "TOK_" +%% +start: FILE for ERROR; +@end example + +@noindent +generates the definition of the symbols @code{TOK_FILE}, @code{TOK_for}, +and @code{TOK_ERROR} in the generated source files. In particular, the +scanner must use these prefixed token names, while the grammar itself +may still use the short names (as in the sample rule given above). The +generated informational files (@file{*.output}, @file{*.xml}, +@file{*.dot}) are not modified by this prefix. See @ref{Calc++ Parser} +and @ref{Calc++ Scanner}, for a complete example. + +@item Accepted Values: +Any string. It should be a valid identifier prefix in the target language; +in other words, it should typically be an identifier itself (a sequence of +letters, underscores, and, except at the beginning, digits). + +@item Default Value: +empty +@end itemize +@c api.tokens.prefix + + +@c ================================================== lex_symbol +@item lex_symbol +@findex %define lex_symbol + +@itemize @bullet +@item Language(s): +C++ + +@item Purpose: +When variant-based semantic values are enabled (@pxref{C++ Variants}), +request that symbols be handled as a whole (type, value, and possibly +location) in the scanner. @xref{Complete Symbols}, for details. + +@item Accepted Values: +Boolean. + +@item Default Value: +@code{false} +@end itemize +@c lex_symbol + @c ================================================== lr.default-reductions @@ -5273,6 +5584,7 @@ remain in the parser tables. @xref{Unreachable States}. 
@item Accepted Values: Boolean @item Default Value: @code{false} @end itemize +@c lr.keep-unreachable-states @c ================================================== lr.type @@ -5291,57 +5603,59 @@ More user feedback will help to stabilize it.) @item Default Value: @code{lalr} @end itemize + +@c ================================================== namespace @item namespace @findex %define namespace +Obsoleted by @code{api.namespace}. +@c namespace -@itemize -@item Languages(s): C++ -@item Purpose: Specify the namespace for the parser class. -For example, if you specify: +@c ================================================== parse.assert +@item parse.assert +@findex %define parse.assert -@smallexample -%define namespace "foo::bar" -@end smallexample +@itemize +@item Language(s): C++ -Bison uses @code{foo::bar} verbatim in references such as: +@item Purpose: Issue runtime assertions to catch invalid uses. +In C++, when variants are used (@pxref{C++ Variants}), symbols must be +constructed and +destroyed properly. This option checks these constraints. -@smallexample -foo::bar::parser::semantic_type -@end smallexample +@item Accepted Values: Boolean -However, to open a namespace, Bison removes any leading @code{::} and then -splits on any remaining occurrences: +@item Default Value: @code{false} +@end itemize +@c parse.assert -@smallexample -namespace foo @{ namespace bar @{ - class position; - class location; -@} @} -@end smallexample - -@item Accepted Values: Any absolute or relative C++ namespace reference without -a trailing @code{"::"}. -For example, @code{"foo"} or @code{"::foo::bar"}. - -@item Default Value: The value specified by @code{%name-prefix}, which defaults -to @code{yy}. -This usage of @code{%name-prefix} is for backward compatibility and can be -confusing since @code{%name-prefix} also specifies the textual prefix for the -lexical analyzer function. 
-Thus, if you specify @code{%name-prefix}, it is best to also specify -@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the -lexical analyzer function. -For example, if you specify: -@smallexample -%define namespace "foo" -%name-prefix "bar::" -@end smallexample +@c ================================================== parse.error +@item parse.error +@findex %define parse.error +@itemize +@item Languages(s): +all +@item Purpose: +Control the kind of error messages passed to the error reporting +function. @xref{Error Reporting, ,The Error Reporting Function +@code{yyerror}}. +@item Accepted Values: +@itemize +@item @code{simple} +Error messages passed to @code{yyerror} are simply @w{@code{"syntax +error"}}. +@item @code{verbose} +Error messages report the unexpected token, and possibly the expected ones. +However, this report can often be incorrect when LAC is not enabled +(@pxref{LAC}). +@end itemize -The parser namespace is @code{foo} and @code{yylex} is referenced as -@code{bar::lex}. +@item Default Value: +@code{simple} @end itemize +@c parse.error + @c ================================================== parse.lac @item parse.lac @@ -5355,7 +5669,46 @@ syntax error handling. @xref{LAC}. @item Accepted Values: @code{none}, @code{full} @item Default Value: @code{none} @end itemize +@c parse.lac + +@c ================================================== parse.trace +@item parse.trace +@findex %define parse.trace + +@itemize +@item Languages(s): C, C++ + +@item Purpose: Require parser instrumentation for tracing. +In C/C++, define the macro @code{YYDEBUG} to 1 in the parser implementation +file if it is not already defined, so that the debugging facilities are +compiled. @xref{Tracing, ,Tracing Your Parser}. 
+ +@item Accepted Values: Boolean + +@item Default Value: @code{false} +@end itemize +@c parse.trace + +@c ================================================== variant +@item variant +@findex %define variant + +@itemize @bullet +@item Language(s): +C++ + +@item Purpose: +Request variant-based semantic values. +@xref{C++ Variants}. + +@item Accepted Values: +Boolean. + +@item Default Value: +@code{false} @end itemize +@c variant +@end table @node %code Summary @@ -5401,7 +5754,7 @@ file. Not all qualifiers are accepted for all target languages. Unaccepted qualifiers produce an error. Some of the accepted qualifiers are: -@itemize @bullet +@table @code @item requires @findex %code requires @@ -5465,7 +5818,7 @@ parser implementation file. For example: @item Location(s): The parser Java file after any Java package directive and before any class definitions. @end itemize -@end itemize +@end table Though we say the insertion locations are language-dependent, they are technically skeleton-dependent. Writers of non-standard skeletons @@ -5573,10 +5926,10 @@ If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. To do so, use the declaration @code{%parse-param}: -@deffn {Directive} %parse-param @{@var{argument-declaration}@} +@deffn {Directive} %parse-param @{@var{argument-declaration}@} @dots{} @findex %parse-param -Declare that an argument declared by the braced-code -@var{argument-declaration} is an additional @code{yyparse} argument. +Declare that one or more +@var{argument-declaration} are additional @code{yyparse} arguments. The @var{argument-declaration} is used when declaring functions or prototypes. The last identifier in @var{argument-declaration} must be the argument name. @@ -5585,8 +5938,7 @@ functions or prototypes. The last identifier in Here's an example. 
Write this in the parser: @example -%parse-param @{int *nastiness@} -%parse-param @{int *randomness@} +%parse-param @{int *nastiness@} @{int *randomness@} @end example @noindent @@ -5616,8 +5968,8 @@ exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} More user feedback will help to stabilize it.) You call the function @code{yypush_parse} to parse a single token. This -function is available if either the @code{%define api.push-pull push} or -@code{%define api.push-pull both} declaration is used. +function is available if either the @samp{%define api.push-pull push} or +@samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun int yypush_parse (yypstate *yyps) @@ -5634,7 +5986,7 @@ is required to finish parsing the grammar. More user feedback will help to stabilize it.) You call the function @code{yypull_parse} to parse the rest of the input -stream. This function is available if the @code{%define api.push-pull both} +stream. This function is available if the @samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @@ -5650,11 +6002,11 @@ The value returned by @code{yypull_parse} is the same as for @code{yyparse}. More user feedback will help to stabilize it.) You call the function @code{yypstate_new} to create a new parser instance. -This function is available if either the @code{%define api.push-pull push} or -@code{%define api.push-pull both} declaration is used. +This function is available if either the @samp{%define api.push-pull push} or +@samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. -@deftypefun yypstate *yypstate_new (void) +@deftypefun {yypstate*} yypstate_new (void) The function will return a valid parser instance if there was memory available or 0 if no memory was available. In impure mode, it will also return 0 if a parser instance is currently @@ -5669,8 +6021,8 @@ allocated. More user feedback will help to stabilize it.) 
You call the function @code{yypstate_delete} to delete a parser instance. -function is available if either the @code{%define api.push-pull push} or -@code{%define api.push-pull both} declaration is used. +function is available if either the @samp{%define api.push-pull push} or +@samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun void yypstate_delete (yypstate *yyps) @@ -5859,7 +6211,7 @@ The data type of @code{yylloc} has the name @code{YYLTYPE}. @node Pure Calling @subsection Calling Conventions for Pure Parsers -When you use the Bison declaration @code{%define api.pure} to request a +When you use the Bison declaration @samp{%define api.pure} to request a pure, reentrant parser, the global communication variables @code{yylval} and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the two global variables are replaced by @@ -5883,46 +6235,57 @@ textual locations, then the type @code{YYLTYPE} will not be defined. In this case, omit the second argument; @code{yylex} will be called with only one argument. - -If you wish to pass the additional parameter data to @code{yylex}, use +If you wish to pass additional arguments to @code{yylex}, use @code{%lex-param} just like @code{%parse-param} (@pxref{Parser -Function}). +Function}). To pass additional arguments to both @code{yylex} and +@code{yyparse}, use @code{%param}. -@deffn {Directive} lex-param @{@var{argument-declaration}@} +@deffn {Directive} %lex-param @{@var{argument-declaration}@} @dots{} @findex %lex-param -Declare that the braced-code @var{argument-declaration} is an -additional @code{yylex} argument declaration. +Specify that @var{argument-declaration} are additional @code{yylex} argument +declarations. You may pass one or more such declarations, which is +equivalent to repeating @code{%lex-param}. 
+@end deffn + +@deffn {Directive} %param @{@var{argument-declaration}@} @dots{} +@findex %param +Specify that @var{argument-declaration} are additional +@code{yylex}/@code{yyparse} argument declarations. This is equivalent to +@samp{%lex-param @{@var{argument-declaration}@} @dots{} %parse-param +@{@var{argument-declaration}@} @dots{}}. You may pass one or more +declarations, which is equivalent to repeating @code{%param}. @end deffn For instance: @example -%parse-param @{int *nastiness@} -%lex-param @{int *nastiness@} -%parse-param @{int *randomness@} +%lex-param @{scanner_mode *mode@} +%parse-param @{parser_mode *mode@} +%param @{environment_type *env@} @end example @noindent -results in the following signature: +results in the following signatures: @example -int yylex (int *nastiness); -int yyparse (int *nastiness, int *randomness); +int yylex (scanner_mode *mode, environment_type *env); +int yyparse (parser_mode *mode, environment_type *env); @end example -If @code{%define api.pure} is added: +If @samp{%define api.pure} is added: @example -int yylex (YYSTYPE *lvalp, int *nastiness); -int yyparse (int *nastiness, int *randomness); +int yylex (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env); +int yyparse (parser_mode *mode, environment_type *env); @end example @noindent -and finally, if both @code{%define api.pure} and @code{%locations} are used: +and finally, if both @samp{%define api.pure} and @code{%locations} are used: @example -int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); -int yyparse (int *nastiness, int *randomness); +int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, + scanner_mode *mode, environment_type *env); +int yyparse (parser_mode *mode, environment_type *env); @end example @node Error Reporting @@ -5932,7 +6295,7 @@ int yyparse (int *nastiness, int *randomness); @cindex parse error @cindex syntax error -The Bison parser detects a @dfn{syntax error} or @dfn{parse error} +The Bison parser detects a @dfn{syntax error} (or @dfn{parse
error}) whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use @@ -5944,8 +6307,8 @@ called by @code{yyparse} whenever a syntax error is found, and it receives one argument. For a syntax error, the string is normally @w{@code{"syntax error"}}. -@findex %error-verbose -If you invoke the directive @code{%error-verbose} in the Bison declarations +@findex %define parse.error +If you invoke @samp{%define parse.error verbose} in the Bison declarations section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose and specific error message string instead of just plain @w{@code{"syntax error"}}. However, that message sometimes @@ -6003,7 +6366,7 @@ void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ Finally, GLR and Yacc parsers share the same @code{yyerror} calling convention for absolutely pure parsers, i.e., when the calling convention of @code{yylex} @emph{and} the calling convention of -@code{%define api.pure} are pure. +@samp{%define api.pure} are pure. I.e.: @example @@ -6076,17 +6439,17 @@ union specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of Values in Actions}. @end deffn -@deffn {Macro} YYABORT; +@deffn {Macro} YYABORT @code{;} Return immediately from @code{yyparse}, indicating failure. @xref{Parser Function, ,The Parser Function @code{yyparse}}. @end deffn -@deffn {Macro} YYACCEPT; +@deffn {Macro} YYACCEPT @code{;} Return immediately from @code{yyparse}, indicating success. @xref{Parser Function, ,The Parser Function @code{yyparse}}. @end deffn -@deffn {Macro} YYBACKUP (@var{token}, @var{value}); +@deffn {Macro} YYBACKUP (@var{token}, @var{value})@code{;} @findex YYBACKUP Unshift a token. This macro is allowed only for rules that reduce a single value, and only when there is no lookahead token. 
@@ -6104,18 +6467,15 @@ In either case, the rest of the action is not executed. @end deffn @deffn {Macro} YYEMPTY -@vindex YYEMPTY Value stored in @code{yychar} when there is no lookahead token. @end deffn @deffn {Macro} YYEOF -@vindex YYEOF Value stored in @code{yychar} when the lookahead is the end of the input stream. @end deffn -@deffn {Macro} YYERROR; -@findex YYERROR +@deffn {Macro} YYERROR @code{;} Cause an immediate syntax error. This statement initiates error recovery just as if the parser itself had detected an error; however, it does not call @code{yyerror}, and does not print any message. If you @@ -6139,7 +6499,7 @@ Actions}). @xref{Lookahead, ,Lookahead Tokens}. @end deffn -@deffn {Macro} yyclearin; +@deffn {Macro} yyclearin @code{;} Discard the current lookahead token. This is useful primarily in error rules. Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR @@ -6147,7 +6507,7 @@ Semantic Actions}). @xref{Error Recovery}. @end deffn -@deffn {Macro} yyerrok; +@deffn {Macro} yyerrok @code{;} Resume generating error messages immediately for subsequent syntax errors. This is useful primarily in error rules. @xref{Error Recovery}. @@ -6528,7 +6888,8 @@ shift and when to reduce. @menu * Why Precedence:: An example showing why precedence is needed. -* Using Precedence:: How to specify precedence in Bison grammars. +* Using Precedence:: How to specify precedence and associativity. +* Precedence Only:: How to specify precedence only. * Precedence Examples:: How these features are used in the previous example. * How Precedence:: How they work. @end menu @@ -6584,8 +6945,9 @@ makes right-associativity. @node Using Precedence @subsection Specifying Operator Precedence @findex %left -@findex %right @findex %nonassoc +@findex %precedence +@findex %right Bison allows you to specify these choices with the operator precedence declarations @code{%left} and @code{%right}. 
Each such declaration @@ -6595,13 +6957,63 @@ those operators left-associative and the @code{%right} declaration makes them right-associative. A third alternative is @code{%nonassoc}, which declares that it is a syntax error to find the same operator twice ``in a row''. +The last alternative, @code{%precedence}, lets you define only +precedence and no associativity at all. As a result, any +associativity-related conflict that remains will be reported as a +compile-time error. The directive @code{%nonassoc} creates run-time +errors: using the operator in an associative way is a syntax error. The +directive @code{%precedence} creates compile-time errors: an operator +@emph{can} be involved in an associativity-related conflict, contrary to +what the grammar author expected. The relative precedence of different operators is controlled by the -order in which they are declared. The first @code{%left} or -@code{%right} declaration in the file declares the operators whose +order in which they are declared. The first precedence/associativity +declaration in the file declares the operators whose precedence is lowest, the next such declaration declares the operators whose precedence is a little higher, and so on. +@node Precedence Only +@subsection Specifying Precedence Only +@findex %precedence + +Since POSIX Yacc defines only @code{%left}, @code{%right}, and +@code{%nonassoc}, which all define precedence and associativity, little +attention is paid to the fact that precedence cannot be defined without +defining associativity. Yet, sometimes, when trying to solve a +conflict, precedence suffices. In such a case, using @code{%left}, +@code{%right}, or @code{%nonassoc} might hide future +(associativity-related) conflicts. + +The dangling @code{else} ambiguity (@pxref{Shift/Reduce, , Shift/Reduce +Conflicts}) can be solved explicitly.
This shift/reduce conflict occurs +in the following situation, where the period denotes the current parsing +state: + +@example +if @var{e1} then if @var{e2} then @var{s1} . else @var{s2} +@end example + +The conflict involves the reduction of the rule @samp{IF expr THEN +stmt}, whose precedence is by default that of its last token +(@code{THEN}), and the shifting of the token @code{ELSE}. With the usual +disambiguation (attach the @code{else} to the closest @code{if}), +shifting must be preferred, i.e., the precedence of @code{ELSE} must be +higher than that of @code{THEN}. But neither is expected to be involved +in an associativity-related conflict, which can be specified as follows. + +@example +%precedence THEN +%precedence ELSE +@end example + +The unary-minus is another typical example where associativity is +usually over-specified, see @ref{Infix Calc, , Infix Notation +Calculator: @code{calc}}. The @code{%left} directive is traditionally +used to declare the precedence of @code{NEG}, which is more than needed +since it also defines its associativity. While this is harmless in the +traditional example, who knows how @code{NEG} might be used in future +evolutions of the grammar@dots{} + @node Precedence Examples @subsection Precedence Examples @@ -6663,8 +7075,8 @@ outlandish at first, but it is really very common. For example, a minus sign typically has a very high precedence as a unary operator, and a somewhat lower precedence (lower than multiplication) as a binary operator. -The Bison precedence declarations, @code{%left}, @code{%right} and -@code{%nonassoc}, can only be used once for a given token; so a token has +The Bison precedence declarations +can only be used once for a given token; so a token has only one precedence declared in this way. For context-dependent precedence, you need to use an additional mechanism: the @code{%prec} modifier for rules.
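As an illustration of the two mechanisms discussed in this hunk, here is a minimal sketch of a small infix calculator in which the unary minus gets its precedence from a placeholder token via @code{%prec}, and that placeholder is declared with @code{%precedence} instead of @code{%left}. It assumes a Bison version that accepts @code{%precedence}; the file name @file{neg.y}, the scanner, and the @code{main} function are illustrative choices modeled on the manual's calculator examples, not part of this patch.

```yacc
/* neg.y: hypothetical file name, used for illustration only.  */
%{
#include <ctype.h>
#include <stdio.h>
#define YYSTYPE double          /* Every semantic value is a double.  */
int yylex (void);
void yyerror (char const *msg);
%}

%token NUM
%left '-' '+'                   /* Lowest precedence, left-associative.  */
%left '*' '/'
%precedence NEG                 /* Precedence only: no associativity.  */

%%
input:
  /* empty */
| input line
;

line:
  '\n'
| exp '\n'            { printf ("%.10g\n", $1); }
;

exp:
  NUM
| exp '+' exp         { $$ = $1 + $3; }
| exp '-' exp         { $$ = $1 - $3; }
| exp '*' exp         { $$ = $1 * $3; }
| exp '/' exp         { $$ = $1 / $3; }
| '-' exp  %prec NEG  { $$ = -$2; }   /* Uses the precedence of NEG, not '-'.  */
| '(' exp ')'         { $$ = $2; }
;
%%

/* Minimal scanner: skips blanks, reads numbers, and returns any other
   character as its own token.  */
int
yylex (void)
{
  int c;
  while ((c = getchar ()) == ' ' || c == '\t')
    continue;
  if (c == '.' || isdigit (c))
    {
      ungetc (c, stdin);
      if (scanf ("%lf", &yylval) != 1)
        return 0;
      return NUM;
    }
  if (c == EOF)
    return 0;
  return c;
}

void
yyerror (char const *msg)
{
  fprintf (stderr, "%s\n", msg);
}

int
main (void)
{
  return yyparse ();
}
```

Under these assumptions, something like @samp{bison neg.y && cc neg.tab.c} should build the sketch. Because @code{NEG} carries no associativity, a later change to the grammar that makes it take part in an associativity-related conflict would be reported when Bison runs rather than silently resolved.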
@@ -7009,9 +7421,9 @@ The default behavior of Bison's LR-based parsers is chosen mostly for historical reasons, but that behavior is often not robust. For example, in the previous section, we discussed the mysterious conflicts that can be produced by LALR(1), Bison's default parser table construction algorithm. -Another example is Bison's @code{%error-verbose} directive, which instructs -the generated parser to produce verbose syntax error messages, which can -sometimes contain incorrect information. +Another example is Bison's @code{%define parse.error verbose} directive, +which instructs the generated parser to produce verbose syntax error +messages, which can sometimes contain incorrect information. In this section, we explore several modern features of Bison that allow you to tune fundamental aspects of the generated LR-based parsers. Some of @@ -7502,7 +7914,7 @@ calls @code{yyerror} and then returns 2. Because Bison parsers have growing stacks, hitting the upper limit usually results from using a right recursion instead of a left -recursion, @xref{Recursion, ,Recursive Rules}. +recursion, see @ref{Recursion, ,Recursive Rules}. @vindex YYMAXDEPTH By defining the macro @code{YYMAXDEPTH}, you can control how deep the @@ -7535,12 +7947,14 @@ that allows variable-length arrays. The default is 200. Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. -@c FIXME: C++ output. -Because of semantic differences between C and C++, the deterministic -parsers in C produced by Bison cannot grow when compiled -by C++ compilers. In this precise case (compiling a C parser as C++) you are -suggested to grow @code{YYINITDEPTH}. The Bison maintainers hope to fix -this deficiency in a future release. +You can generate a deterministic parser containing C++ user code from +the default (C) skeleton, as well as from the C++ skeleton +(@pxref{C++ Parsers}). 
However, if you do use the default skeleton +and want to allow the parsing stack to grow, +be careful not to use semantic types or location types that require +non-trivial copy constructors. +The C skeleton bypasses these constructors when copying data to +new, larger stacks. @node Error Recovery @chapter Error Recovery @@ -7880,12 +8294,10 @@ clear the flag. @node Debugging @chapter Debugging Your Parser -Developing a parser can be a challenge, especially if you don't -understand the algorithm (@pxref{Algorithm, ,The Bison Parser -Algorithm}). Even so, sometimes a detailed description of the automaton -can help (@pxref{Understanding, , Understanding Your Parser}), or -tracing the execution of the parser can give some insight on why it -behaves improperly (@pxref{Tracing, , Tracing Your Parser}). +Developing a parser can be a challenge, especially if you don't understand +the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This +chapter explains how to generate and read the detailed description of the +automaton, and how to enable and understand the parser run-time traces. @menu * Understanding:: Understanding the structure of your parser. @@ -7902,7 +8314,7 @@ tune or simply fix a parser. Bison provides two different representation of it, either textually or graphically (as a DOT file). The textual file is generated when the options @option{--report} or -@option{--verbose} are specified, see @xref{Invocation, , Invoking +@option{--verbose} are specified, see @ref{Invocation, , Invoking Bison}. Its name is made by removing @samp{.tab.c} or @samp{.c} from the parser implementation file name, and adding @samp{.output} instead. Therefore, if the grammar file is @file{foo.y}, then the @@ -7942,14 +8354,27 @@ creates a file @file{calc.output} with contents detailed below. The order of the output and the exact presentation might vary, but the interpretation is the same. 
-The first section includes details on conflicts that were solved thanks -to precedence and/or associativity: +@noindent +@cindex token, useless +@cindex useless token +@cindex nonterminal, useless +@cindex useless nonterminal +@cindex rule, useless +@cindex useless rule +The first section reports useless tokens, nonterminals and rules. Useless +nonterminals and rules are removed in order to produce a smaller parser, but +useless tokens are preserved, since they might be used by the scanner (note +the difference between ``useless'' and ``unused'' below): @example -Conflict in state 8 between rule 2 and token '+' resolved as reduce. -Conflict in state 8 between rule 2 and token '-' resolved as reduce. -Conflict in state 8 between rule 2 and token '*' resolved as shift. -@exdent @dots{} +Nonterminals useless in grammar + useless + +Terminals unused in grammar + STR + +Rules useless in grammar + 6 useless: STR @end example @noindent @@ -7963,50 +8388,26 @@ State 11 conflicts: 4 shift/reduce @end example @noindent -@cindex token, useless -@cindex useless token -@cindex nonterminal, useless -@cindex useless nonterminal -@cindex rule, useless -@cindex useless rule -The next section reports useless tokens, nonterminal and rules. 
Useless -nonterminals and rules are removed in order to produce a smaller parser, -but useless tokens are preserved, since they might be used by the -scanner (note the difference between ``useless'' and ``unused'' -below): +Then Bison reproduces the exact grammar it used: @example -Nonterminals useless in grammar: - useless +Grammar -Terminals unused in grammar: - STR + 0 $accept: exp $end -Rules useless in grammar: -#6 useless: STR; + 1 exp: exp '+' exp + 2 | exp '-' exp + 3 | exp '*' exp + 4 | exp '/' exp + 5 | NUM @end example @noindent -The next section reproduces the exact grammar that Bison used: +and reports the uses of the symbols: @example -Grammar - - Number, Line, Rule - 0 5 $accept -> exp $end - 1 5 exp -> exp '+' exp - 2 6 exp -> exp '-' exp - 3 7 exp -> exp '*' exp - 4 8 exp -> exp '/' exp - 5 9 exp -> NUM -@end example - -@noindent -and reports the uses of the symbols: - -@example -@group -Terminals, with rules where they appear +@group +Terminals, with rules where they appear $end (0) 0 '*' (42) 3 @@ -8015,14 +8416,15 @@ $end (0) 0 '/' (47) 4 error (256) NUM (258) 5 +STR (259) @end group @group Nonterminals, with rules where they appear -$accept (8) +$accept (9) on left: 0 -exp (9) +exp (10) on left: 1 2 3 4 5, on right: 0 1 2 3 4 @end group @end example @@ -8039,11 +8441,11 @@ the location of the input cursor. @example state 0 - $accept -> . exp $ (rule 0) + 0 $accept: . exp $end - NUM shift, and go to state 1 + NUM shift, and go to state 1 - exp go to state 2 + exp go to state 2 @end example This reads as follows: ``state 0 corresponds to being at the very @@ -8069,27 +8471,27 @@ you want to see more detail you can invoke @command{bison} with @example state 0 - $accept -> . exp $ (rule 0) - exp -> . exp '+' exp (rule 1) - exp -> . exp '-' exp (rule 2) - exp -> . exp '*' exp (rule 3) - exp -> . exp '/' exp (rule 4) - exp -> . NUM (rule 5) + 0 $accept: . exp $end + 1 exp: . exp '+' exp + 2 | . exp '-' exp + 3 | . exp '*' exp + 4 | . 
exp '/' exp + 5 | . NUM - NUM shift, and go to state 1 + NUM shift, and go to state 1 - exp go to state 2 + exp go to state 2 @end example @noindent -In the state 1... +In the state 1@dots{} @example state 1 - exp -> NUM . (rule 5) + 5 exp: NUM . - $default reduce using rule 5 (exp) + $default reduce using rule 5 (exp) @end example @noindent @@ -8101,24 +8503,24 @@ jump to state 2 (@samp{exp: go to state 2}). @example state 2 - $accept -> exp . $ (rule 0) - exp -> exp . '+' exp (rule 1) - exp -> exp . '-' exp (rule 2) - exp -> exp . '*' exp (rule 3) - exp -> exp . '/' exp (rule 4) + 0 $accept: exp . $end + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp - $ shift, and go to state 3 - '+' shift, and go to state 4 - '-' shift, and go to state 5 - '*' shift, and go to state 6 - '/' shift, and go to state 7 + $end shift, and go to state 3 + '+' shift, and go to state 4 + '-' shift, and go to state 5 + '*' shift, and go to state 6 + '/' shift, and go to state 7 @end example @noindent In state 2, the automaton can only shift a symbol. For instance, -because of the item @samp{exp -> exp . '+' exp}, if the lookahead is +because of the item @samp{exp: exp . '+' exp}, if the lookahead is @samp{+} it is shifted onto the parse stack, and the automaton -jumps to state 4, corresponding to the item @samp{exp -> exp '+' . exp}. +jumps to state 4, corresponding to the item @samp{exp: exp '+' . exp}. Since there is no default action, any lookahead not listed triggers a syntax error. @@ -8129,14 +8531,14 @@ state}: @example state 3 - $accept -> exp $ . (rule 0) + 0 $accept: exp $end . - $default accept + $default accept @end example @noindent -the initial rule is completed (the start symbol and the end -of input were read), the parsing exits successfully. +the initial rule is completed (the start symbol and the end-of-input were +read), the parsing exits successfully. The interpretation of states 4 to 7 is straightforward, and is left to the reader. 
@@ -8144,35 +8546,38 @@ the reader. @example state 4 - exp -> exp '+' . exp (rule 1) + 1 exp: exp '+' . exp - NUM shift, and go to state 1 + NUM shift, and go to state 1 + + exp go to state 8 - exp go to state 8 state 5 - exp -> exp '-' . exp (rule 2) + 2 exp: exp '-' . exp + + NUM shift, and go to state 1 - NUM shift, and go to state 1 + exp go to state 9 - exp go to state 9 state 6 - exp -> exp '*' . exp (rule 3) + 3 exp: exp '*' . exp + + NUM shift, and go to state 1 - NUM shift, and go to state 1 + exp go to state 10 - exp go to state 10 state 7 - exp -> exp '/' . exp (rule 4) + 4 exp: exp '/' . exp - NUM shift, and go to state 1 + NUM shift, and go to state 1 - exp go to state 11 + exp go to state 11 @end example As was announced in beginning of the report, @samp{State 8 conflicts: @@ -8181,17 +8586,17 @@ As was announced in beginning of the report, @samp{State 8 conflicts: @example state 8 - exp -> exp . '+' exp (rule 1) - exp -> exp '+' exp . (rule 1) - exp -> exp . '-' exp (rule 2) - exp -> exp . '*' exp (rule 3) - exp -> exp . '/' exp (rule 4) + 1 exp: exp . '+' exp + 1 | exp '+' exp . + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp - '*' shift, and go to state 6 - '/' shift, and go to state 7 + '*' shift, and go to state 6 + '/' shift, and go to state 7 - '/' [reduce using rule 1 (exp)] - $default reduce using rule 1 (exp) + '/' [reduce using rule 1 (exp)] + $default reduce using rule 1 (exp) @end example Indeed, there are two actions associated to the lookahead @samp{/}: @@ -8205,7 +8610,7 @@ NUM}, which corresponds to reducing rule 1. Because in deterministic parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, , -Shift/Reduce Conflicts}. Discarded actions are reported in between +Shift/Reduce Conflicts}. Discarded actions are reported between square brackets. 
Note that all the previous states had a single possible action: either @@ -8224,72 +8629,85 @@ with some set of possible lookahead tokens. When run with @example state 8 - exp -> exp . '+' exp (rule 1) - exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1) - exp -> exp . '-' exp (rule 2) - exp -> exp . '*' exp (rule 3) - exp -> exp . '/' exp (rule 4) + 1 exp: exp . '+' exp + 1 | exp '+' exp . [$end, '+', '-', '/'] + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp - '*' shift, and go to state 6 - '/' shift, and go to state 7 + '*' shift, and go to state 6 + '/' shift, and go to state 7 - '/' [reduce using rule 1 (exp)] - $default reduce using rule 1 (exp) + '/' [reduce using rule 1 (exp)] + $default reduce using rule 1 (exp) @end example +Note however that while @samp{NUM + NUM / NUM} is ambiguous (which results in +the conflicts on @samp{/}), @samp{NUM + NUM * NUM} is not: the conflict was +solved thanks to associativity and precedence directives. If invoked with +@option{--report=solved}, Bison includes information about the solved +conflicts in the report: + +@example +Conflict between rule 1 and token '+' resolved as reduce (%left '+'). +Conflict between rule 1 and token '-' resolved as reduce (%left '-'). +Conflict between rule 1 and token '*' resolved as shift ('+' < '*'). +@end example + + The remaining states are similar: @example @group state 9 - exp -> exp . '+' exp (rule 1) - exp -> exp . '-' exp (rule 2) - exp -> exp '-' exp . (rule 2) - exp -> exp . '*' exp (rule 3) - exp -> exp . '/' exp (rule 4) + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 2 | exp '-' exp . + 3 | exp . '*' exp + 4 | exp . '/' exp - '*' shift, and go to state 6 - '/' shift, and go to state 7 + '*' shift, and go to state 6 + '/' shift, and go to state 7 - '/' [reduce using rule 2 (exp)] - $default reduce using rule 2 (exp) + '/' [reduce using rule 2 (exp)] + $default reduce using rule 2 (exp) @end group @group state 10 - exp -> exp . '+' exp (rule 1) - exp -> exp . 
'-' exp (rule 2) - exp -> exp . '*' exp (rule 3) - exp -> exp '*' exp . (rule 3) - exp -> exp . '/' exp (rule 4) + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 3 | exp . '*' exp + 3 | exp '*' exp . + 4 | exp . '/' exp - '/' shift, and go to state 7 + '/' shift, and go to state 7 - '/' [reduce using rule 3 (exp)] - $default reduce using rule 3 (exp) + '/' [reduce using rule 3 (exp)] + $default reduce using rule 3 (exp) @end group @group state 11 - exp -> exp . '+' exp (rule 1) - exp -> exp . '-' exp (rule 2) - exp -> exp . '*' exp (rule 3) - exp -> exp . '/' exp (rule 4) - exp -> exp '/' exp . (rule 4) + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp + 4 | exp '/' exp . - '+' shift, and go to state 4 - '-' shift, and go to state 5 - '*' shift, and go to state 6 - '/' shift, and go to state 7 + '+' shift, and go to state 4 + '-' shift, and go to state 5 + '*' shift, and go to state 6 + '/' shift, and go to state 7 - '+' [reduce using rule 4 (exp)] - '-' [reduce using rule 4 (exp)] - '*' [reduce using rule 4 (exp)] - '/' [reduce using rule 4 (exp)] - $default reduce using rule 4 (exp) + '+' [reduce using rule 4 (exp)] + '-' [reduce using rule 4 (exp)] + '*' [reduce using rule 4 (exp)] + '/' [reduce using rule 4 (exp)] + $default reduce using rule 4 (exp) @end group @end example @@ -8306,9 +8724,17 @@ associativity of @samp{/} is not specified. @cindex debugging @cindex tracing the parser -If a Bison grammar compiles properly but doesn't do what you want when it -runs, the @code{yydebug} parser-trace feature can help you figure out why. +When a Bison grammar compiles properly but parses ``incorrectly'', the +@code{yydebug} parser-trace feature helps figuring out why. 
+@menu +* Enabling Traces:: Activating run-time trace support +* Mfcalc Traces:: Extending @code{mfcalc} to support traces +* The YYPRINT Macro:: Obsolete interface for semantic value reports +@end menu + +@node Enabling Traces +@subsection Enabling Traces There are several means to enable compilation of trace facilities: @table @asis @@ -8326,17 +8752,23 @@ Use the @samp{-t} option when you run Bison (@pxref{Invocation, @item the directive @samp{%debug} @findex %debug -Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison -Declaration Summary}). This is a Bison extension, which will prove -useful when Bison will output parsers for languages that don't use a -preprocessor. Unless POSIX and Yacc portability matter to -you, this is -the preferred solution. +Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison Declaration +Summary}). This Bison extension is maintained for backward +compatibility with previous versions of Bison. + +@item the variable @samp{parse.trace} +@findex %define parse.trace +Add the @samp{%define parse.trace} directive (@pxref{%define +Summary,,parse.trace}), or pass the @option{-Dparse.trace} option +(@pxref{Bison Options}). This is a Bison extension, which is especially +useful for languages that don't use a preprocessor. Unless POSIX and Yacc +portability matter to you, this is the preferred solution. @end table -We suggest that you always enable the debug option so that debugging is +We suggest that you always enable the trace option so that debugging is always possible. +@findex YYFPRINTF The trace facility outputs messages with macro calls of the form @code{YYFPRINTF (stderr, @var{format}, @var{args})} where @var{format} and @var{args} are the usual @code{printf} format and variadic @@ -8366,9 +8798,9 @@ Each time a rule is reduced, which rule it is, and the complete contents of the state stack afterward. 
@end itemize -To make sense of this information, it helps to refer to the listing file -produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking -Bison}). This file shows the meaning of each state in terms of +To make sense of this information, it helps to refer to the automaton +description file (@pxref{Understanding, ,Understanding Your Parser}). +This file shows the meaning of each state in terms of positions in various rules, and also what each state will do with each possible input token. As you read the successive trace messages, you can see that the parser is functioning according to its specification in @@ -8376,19 +8808,197 @@ the listing file. Eventually you will arrive at the place where something undesirable happens, and you will see which parts of the grammar are to blame. -The parser implementation file is a C program and you can use C +The parser implementation file is a C/C++/Java program and you can use debuggers on it, but it's not easy to interpret what it is doing. The parser function is a finite-state machine interpreter, and aside from the actions it executes the same code over and over. Only the values of variables show where in the grammar it is working. +@node Mfcalc Traces +@subsection Enabling Debug Traces for @code{mfcalc} + +The debugging information normally gives the token type of each token read, +but not its semantic value. The @code{%printer} directive allows specifying +how semantic values are reported, see @ref{Printer Decl, , Printing +Semantic Values}. For backward compatibility, Yacc-like C parsers may also +use the @code{YYPRINT} macro (@pxref{The YYPRINT Macro, , The @code{YYPRINT} +Macro}), but its use is discouraged. + +As a demonstration of @code{%printer}, consider the multi-function +calculator, @code{mfcalc} (@pxref{Multi-function Calc}).
To enable run-time +traces, and semantic value reports, insert the following directives in its +prologue: + +@comment file: mfcalc.y: 2 +@example +/* Generate the parser description file. */ +%verbose +/* Enable run-time traces (yydebug). */ +%define parse.trace + +/* Formatting semantic values. */ +%printer @{ fprintf (yyoutput, "%s", $$->name); @} VAR; +%printer @{ fprintf (yyoutput, "%s()", $$->name); @} FNCT; +%printer @{ fprintf (yyoutput, "%g", $$); @} <val>; +@end example + +The @code{%define} directive instructs Bison to generate run-time trace +support. Then, activation of these traces is controlled at run-time by the +@code{yydebug} variable, which is disabled by default. Because these traces +will refer to the ``states'' of the parser, it is helpful to ask for the +creation of a description of that parser; this is the purpose of the (admittedly +ill-named) @code{%verbose} directive. + +The set of @code{%printer} directives demonstrates how to format the +semantic value in the traces. Note that the specification can be done +either on the symbol type (e.g., @code{VAR} or @code{FNCT}), or on the type +tag: since @code{<val>} is the type for both @code{NUM} and @code{exp}, this +printer will be used for them. + +Here is a sample of the information provided by run-time traces. The traces +are sent to standard error. + +@example +$ @kbd{echo 'sin(1-1)' | ./mfcalc -p} +Starting parse +Entering state 0 +Reducing stack by rule 1 (line 34): +-> $$ = nterm input () +Stack now 0 +Entering state 1 +@end example + +@noindent +This first batch shows a specific feature of this grammar: the first rule +(which is in line 34 of @file{mfcalc.y}) can be reduced without even having +to look for the first token. The resulting left-hand symbol (@code{$$}) is +a valueless (@samp{()}) @code{input} nonterminal (@code{nterm}). + +Then the parser calls the scanner.
+@example +Reading a token: Next token is token FNCT (sin()) +Shifting token FNCT (sin()) +Entering state 6 +@end example + +@noindent +That token (@code{token}) is a function (@code{FNCT}) whose value is +@samp{sin} as formatted per our @code{%printer} specification: @samp{sin()}. +The parser stores (@code{Shifting}) that token, and others, until it can do +something about it. + +@example +Reading a token: Next token is token '(' () +Shifting token '(' () +Entering state 14 +Reading a token: Next token is token NUM (1.000000) +Shifting token NUM (1.000000) +Entering state 4 +Reducing stack by rule 6 (line 44): + $1 = token NUM (1.000000) +-> $$ = nterm exp (1.000000) +Stack now 0 1 6 14 +Entering state 24 +@end example + +@noindent +The previous reduction demonstrates the @code{%printer} directive for +@code{<val>}: both the token @code{NUM} and the resulting nonterminal +@code{exp} have @samp{1} as value. + +@example +Reading a token: Next token is token '-' () +Shifting token '-' () +Entering state 17 +Reading a token: Next token is token NUM (1.000000) +Shifting token NUM (1.000000) +Entering state 4 +Reducing stack by rule 6 (line 44): + $1 = token NUM (1.000000) +-> $$ = nterm exp (1.000000) +Stack now 0 1 6 14 24 17 +Entering state 26 +Reading a token: Next token is token ')' () +Reducing stack by rule 11 (line 49): + $1 = nterm exp (1.000000) + $2 = token '-' () + $3 = nterm exp (1.000000) +-> $$ = nterm exp (0.000000) +Stack now 0 1 6 14 +Entering state 24 +@end example + +@noindent +The rule for the subtraction was just reduced. The parser is about to +discover the end of the call to @code{sin}.
+ +@example +Next token is token ')' () +Shifting token ')' () +Entering state 31 +Reducing stack by rule 9 (line 47): + $1 = token FNCT (sin()) + $2 = token '(' () + $3 = nterm exp (0.000000) + $4 = token ')' () +-> $$ = nterm exp (0.000000) +Stack now 0 1 +Entering state 11 +@end example + +@noindent +Finally, the end-of-line allows the parser to complete the computation and +display its result. + +@example +Reading a token: Next token is token '\n' () +Shifting token '\n' () +Entering state 22 +Reducing stack by rule 4 (line 40): + $1 = nterm exp (0.000000) + $2 = token '\n' () +@result{} 0 +-> $$ = nterm line () +Stack now 0 1 +Entering state 10 +Reducing stack by rule 2 (line 35): + $1 = nterm input () + $2 = nterm line () +-> $$ = nterm input () +Stack now 0 +Entering state 1 +@end example + +The parser has returned to state 1, in which it is waiting for the next +expression to evaluate, or for the end-of-file token, which causes the +completion of the parsing. + +@example +Reading a token: Now at end of input. +Shifting token $end () +Entering state 2 +Stack now 0 1 2 +Cleanup: popping token $end () +Cleanup: popping nterm input () +@end example + + +@node The YYPRINT Macro +@subsection The @code{YYPRINT} Macro + @findex YYPRINT -The debugging information normally gives the token type of each token -read, but not its semantic value. You can optionally define a macro -named @code{YYPRINT} to provide a way to print the value. If you define -@code{YYPRINT}, it should take three arguments. The parser will pass a -standard I/O stream, the numeric code for the token type, and the token -value (from @code{yylval}). +Before @code{%printer} support, semantic values could be displayed using the +@code{YYPRINT} macro, which works only for terminal symbols and only with +the @file{yacc.c} skeleton. + +@deffn {Macro} YYPRINT (@var{stream}, @var{token}, @var{value}); +@findex YYPRINT +If you define @code{YYPRINT}, it should take three arguments.
The parser +will pass a standard I/O stream, the numeric code for the token type, and +the token value (from @code{yylval}). + +For @file{yacc.c} only. Obsoleted by @code{%printer}. +@end deffn Here is an example of @code{YYPRINT} suitable for the multi-function calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}): @@ -8396,8 +9006,8 @@ calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}): @example %@{ static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(file, type, value) \ - print_token_value (file, type, value) + #define YYPRINT(File, Type, Value) \ + print_token_value (File, Type, Value) %@} @dots{} %% @dots{} %% @dots{} @@ -8806,16 +9416,16 @@ The C++ deterministic parser is selected using the skeleton directive, When run, @command{bison} will create several entities in the @samp{yy} namespace. -@findex %define namespace -Use the @samp{%define namespace} directive to change the namespace -name, see @ref{%define Summary,,namespace}. The various classes are -generated in the following files: +@findex %define api.namespace +Use the @samp{%define api.namespace} directive to change the namespace name, +see @ref{%define Summary,,api.namespace}. The various classes are generated +in the following files: @table @file @item position.hh @itemx location.hh The definition of the classes @code{position} and @code{location}, -used for location tracking. @xref{C++ Location Values}. +used for location tracking when enabled. @xref{C++ Location Values}. @item stack.hh An auxiliary class @code{stack} used by the parser. @@ -8841,11 +9451,22 @@ for a complete and accurate documentation. @c - YYSTYPE @c - Printer and destructor +Bison supports two different means to handle semantic values in C++. One is +similar to the C interface, and relies on unions (@pxref{C++ Unions}). As C++ +practitioners know, unions are inconvenient in C++, therefore another +approach is provided, based on variants (@pxref{C++ Variants}).
+ +@menu +* C++ Unions:: Semantic values cannot be objects +* C++ Variants:: Using objects as semantic values +@end menu + +@node C++ Unions +@subsubsection C++ Unions + The @code{%union} directive works as for C, see @ref{Union Decl, ,The Collection of Value Types}. In particular it produces a genuine -@code{union}@footnote{In the future techniques to allow complex types -within pseudo-unions (similar to Boost variants) might be implemented to -alleviate these issues.}, which have a few specific features in C++. +@code{union}, which has a few specific features in C++. @itemize @minus @item The type @code{YYSTYPE} is defined but its use is discouraged: rather @@ -8862,6 +9483,98 @@ reclaimed automatically: using the @code{%destructor} directive is the only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded Symbols}. +@node C++ Variants +@subsubsection C++ Variants + +Starting with version 2.6, Bison provides a @emph{variant}-based +implementation of semantic values for C++. This alleviates all the +limitations reported in the previous section, and in particular, object +types can be used without pointers. + +To enable variant-based semantic values, set the @code{%define} variable +@code{variant} (@pxref{%define Summary,, variant}). Once this is defined, +@code{%union} is ignored, and instead of using the names of the fields of the +@code{%union} to ``type'' the symbols, use genuine types. + +For instance, instead of + +@example +%union +@{ + int ival; + std::string* sval; +@} +%token <ival> NUMBER; +%token <sval> STRING; +@end example + +@noindent +write + +@example +%token <int> NUMBER; +%token <std::string> STRING; +@end example + +@code{STRING} is no longer a pointer, which should considerably simplify the user +actions in the grammar and in the scanner (in particular the memory +management). + +Since C++ features destructors, and since it is customary to overload +@code{operator<<} to support uniform printing of values, variants also +typically simplify Bison printers and destructors.
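To make the idea of object-valued semantic values more concrete, here is a minimal sketch, assuming C++11, of how a value such as a @code{std::string} can live directly in raw storage without going through a pointer. It is not the code Bison generates: the class name @code{toy_variant}, the members @code{as} and @code{destroy}, and the 64-byte capacity are hypothetical choices made for illustration only. The type of the stored value is tracked by the caller, not by the object.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical illustration only, not Bison's implementation: a buffer
// that holds one value at a time.  The caller must remember which type
// is currently stored.
class toy_variant
{
public:
  toy_variant () : built_ (false) {}

  // Default-construct a T in the buffer and return a reference to it.
  template <typename T>
  T& build ()
  {
    static_assert (sizeof (T) <= capacity, "type too large for the buffer");
    static_assert (alignof (T) <= alignof (std::max_align_t),
                   "over-aligned type");
    assert (!built_);
    built_ = true;
    return *new (buffer_) T ();
  }

  // Access the stored value; T must be the type that was built.
  template <typename T>
  T& as ()
  {
    assert (built_);
    return *reinterpret_cast<T*> (buffer_);
  }

  // Run the destructor of the stored value and mark the buffer empty.
  template <typename T>
  void destroy ()
  {
    as<T> ().~T ();
    built_ = false;
  }

private:
  static const std::size_t capacity = 64;
  alignas (std::max_align_t) unsigned char buffer_[capacity];
  bool built_;
};
```

With such a class, a scanner could write @code{v.build<std::string> () = "name";} and the string's own destructor, run through @code{destroy}, reclaims the memory; no @code{delete} appears in user code.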
+ +Variants are stricter than unions. When based on unions, you may play any +dirty game with @code{yylval}, say storing an @code{int}, reading a +@code{char*}, and then storing a @code{double} in it. This is no longer +possible with variants: they must be initialized, then assigned to, and +eventually destroyed. + +@deftypemethod {semantic_type} {T&} build<T> () +Initialize, but leave empty. Returns the address where the actual value may +be stored. Requires that the variant was not initialized yet. +@end deftypemethod + +@deftypemethod {semantic_type} {T&} build (const T& @var{t}) +Initialize, and copy-construct from @var{t}. +@end deftypemethod + + +@strong{Warning}: We do not use Boost.Variant, for two reasons. First, it +appeared unacceptable to require Boost on the user's machine (i.e., the +machine on which the generated parser will be compiled, not the machine on +which @command{bison} was run). Second, for each possible semantic value, +Boost.Variant not only stores the value, but also a tag specifying its +type. But the parser already ``knows'' the type of the semantic value, so +that would be duplicating the information. + +Therefore we developed light-weight variants whose type tag is external (so +they are really like @code{unions} for C++ actually). But our code is much +less mature than Boost.Variant. So there are a number of limitations in +(the current implementation of) variants: +@itemize +@item +Alignment must be enforced: values should be aligned in memory according to +the most demanding type. Computing the smallest alignment possible requires +meta-programming techniques that are not currently implemented in Bison, and +therefore, since, as far as we know, @code{double} is the most demanding +type on all platforms, alignments are enforced for @code{double} whatever +types are actually used. This may waste space in some cases. + +@item +Our implementation does not conform to strict aliasing rules.
Alias +analysis is a technique used in optimizing compilers to detect when two +pointers are disjoint (they cannot ``meet''). Our implementation breaks +some of the rules that G++ 4.4 uses in its alias analysis, so @emph{strict +alias analysis must be disabled}. Use the option +@option{-fno-strict-aliasing} to compile the generated parser. + +@item +There might be portability issues we are not aware of. +@end itemize + +As far as we know, these limitations @emph{can} be alleviated. All it takes +is some time and/or some talented C++ hacker willing to contribute to Bison. @node C++ Location Values @subsection C++ Location Values @@ -8876,55 +9589,98 @@ define a @code{position}, a single point in a file, and a @code{location}, a range composed of a pair of @code{position}s (possibly spanning several files). -@deftypemethod {position} {std::string*} file +@tindex uint +In this section @code{uint} is an abbreviation for @code{unsigned int}: in +genuine code only the latter is used. + +@menu +* C++ position:: One point in the source file +* C++ location:: Two points in the source file +@end menu + +@node C++ position +@subsubsection C++ @code{position} + +@deftypeop {Constructor} {position} {} position (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) +Create a @code{position} denoting a given point. Note that @code{file} is +not reclaimed when the @code{position} is destroyed: memory management must be +handled elsewhere. +@end deftypeop + +@deftypemethod {position} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) +Reset the position to the given values. +@end deftypemethod + +@deftypeivar {position} {std::string*} file The name of the file. It will always be handled as a pointer, the parser will never duplicate nor deallocate it. As an experimental feature you may change it to @samp{@var{type}*} using @samp{%define filename_type "@var{type}"}.
-@end deftypemethod +@end deftypeivar -@deftypemethod {position} {unsigned int} line +@deftypeivar {position} {uint} line The line, starting at 1. -@end deftypemethod +@end deftypeivar -@deftypemethod {position} {unsigned int} lines (int @var{height} = 1) +@deftypemethod {position} {uint} lines (int @var{height} = 1) Advance by @var{height} lines, resetting the column number. @end deftypemethod -@deftypemethod {position} {unsigned int} column -The column, starting at 0. -@end deftypemethod +@deftypeivar {position} {uint} column +The column, starting at 1. +@end deftypeivar -@deftypemethod {position} {unsigned int} columns (int @var{width} = 1) +@deftypemethod {position} {uint} columns (int @var{width} = 1) Advance by @var{width} columns, without changing the line number. @end deftypemethod -@deftypemethod {position} {position&} operator+= (position& @var{pos}, int @var{width}) -@deftypemethodx {position} {position} operator+ (const position& @var{pos}, int @var{width}) -@deftypemethodx {position} {position&} operator-= (const position& @var{pos}, int @var{width}) -@deftypemethodx {position} {position} operator- (position& @var{pos}, int @var{width}) +@deftypemethod {position} {position&} operator+= (int @var{width}) +@deftypemethodx {position} {position} operator+ (int @var{width}) +@deftypemethodx {position} {position&} operator-= (int @var{width}) +@deftypemethodx {position} {position} operator- (int @var{width}) Various forms of syntactic sugar for @code{columns}. @end deftypemethod -@deftypemethod {position} {position} operator<< (std::ostream @var{o}, const position& @var{p}) +@deftypemethod {position} {bool} operator== (const position& @var{that}) +@deftypemethodx {position} {bool} operator!= (const position& @var{that}) +Whether @code{*this} and @code{that} denote equal/different positions. 
+@end deftypemethod + +@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p}) Report @var{p} on @var{o} like this: @samp{@var{file}:@var{line}.@var{column}}, or @samp{@var{line}.@var{column}} if @var{file} is null. +@end deftypefun + +@node C++ location +@subsubsection C++ @code{location} + +@deftypeop {Constructor} {location} {} location (const position& @var{begin}, const position& @var{end}) +Create a @code{Location} from the endpoints of the range. +@end deftypeop + +@deftypeop {Constructor} {location} {} location (const position& @var{pos} = position()) +@deftypeopx {Constructor} {location} {} location (std::string* @var{file}, uint @var{line}, uint @var{col}) +Create a @code{Location} denoting an empty range located at a given point. +@end deftypeop + +@deftypemethod {location} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) +Reset the location to an empty range at the given values. @end deftypemethod -@deftypemethod {location} {position} begin -@deftypemethodx {location} {position} end +@deftypeivar {location} {position} begin +@deftypeivarx {location} {position} end The first, inclusive, position of the range, and the first beyond. -@end deftypemethod +@end deftypeivar -@deftypemethod {location} {unsigned int} columns (int @var{width} = 1) -@deftypemethodx {location} {unsigned int} lines (int @var{height} = 1) +@deftypemethod {location} {uint} columns (int @var{width} = 1) +@deftypemethodx {location} {uint} lines (int @var{height} = 1) Advance the @code{end} position. 
@end deftypemethod -@deftypemethod {location} {location} operator+ (const location& @var{begin}, const location& @var{end}) -@deftypemethodx {location} {location} operator+ (const location& @var{begin}, int @var{width}) -@deftypemethodx {location} {location} operator+= (const location& @var{loc}, int @var{width}) +@deftypemethod {location} {location} operator+ (const location& @var{end}) +@deftypemethodx {location} {location} operator+ (int @var{width}) +@deftypemethodx {location} {location} operator+= (int @var{width}) Various forms of syntactic sugar. @end deftypemethod @@ -8932,6 +9688,16 @@ Various forms of syntactic sugar. Move @code{begin} onto @code{end}. @end deftypemethod +@deftypemethod {location} {bool} operator== (const location& @var{that}) +@deftypemethodx {location} {bool} operator!= (const location& @var{that}) +Whether @code{*this} and @code{that} denote equal/different ranges of +positions. +@end deftypemethod + +@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const location& @var{p}) +Report @var{p} on @var{o}, taking care of special cases such as: no +@code{filename} defined, or equal filename/line or column. +@end deftypefun @node C++ Parser Interface @subsection C++ Parser Interface @@ -8952,7 +9718,7 @@ additional argument for its constructor. @defcv {Type} {parser} {semantic_type} @defcvx {Type} {parser} {location_type} -The types for semantics value and locations. +The types for semantic values and locations (if enabled). @end defcv @defcv {Type} {parser} {token} @@ -8963,11 +9729,27 @@ use @code{yy::parser::token::FOO}. The scanner can use (@pxref{Calc++ Scanner}). @end defcv +@defcv {Type} {parser} {syntax_error} +This class derives from @code{std::runtime_error}. Throw instances of it +from the scanner or from the user actions to raise parse errors. 
This is +equivalent to first +invoking @code{error} to report the location and message of the syntax +error, and then invoking @code{YYERROR} to enter the error-recovery mode. +But contrary to @code{YYERROR}, which can only be invoked from user actions +(i.e., written in the action itself), the exception can be thrown from +functions invoked from the user action. +@end defcv + @deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...) Build a new parser object. There are no arguments by default, unless @samp{%parse-param @{@var{type1} @var{arg1}@}} was used. @end deftypemethod +@deftypemethod {syntax_error} {} syntax_error (const location_type& @var{l}, const std::string& @var{m}) +@deftypemethodx {syntax_error} {} syntax_error (const std::string& @var{m}) +Instantiate a syntax-error exception. +@end deftypemethod + @deftypemethod {parser} {int} parse () Run the syntactic analysis, and return 0 on success, 1 otherwise. @end deftypemethod @@ -8985,9 +9767,11 @@ or nonzero, full tracing. @end deftypemethod @deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m}) +@deftypemethodx {parser} {void} error (const std::string& @var{m}) The definition for this member function must be supplied by the user: the parser uses it to report a parser error occurring at @var{l}, -described by @var{m}. +described by @var{m}. If location tracking is not enabled, the second +signature is used. @end deftypemethod @@ -8999,25 +9783,143 @@ described by @var{m}. The parser invokes the scanner by calling @code{yylex}. Contrary to C parsers, C++ parsers are always pure: there is no point in using the -@code{%define api.pure} directive. Therefore the interface is as follows. +@samp{%define api.pure} directive. The actual interface with @code{yylex} +depends on whether you use unions or variants.
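As a rough picture of what ``pure'' means for the scanner, here is a small hand-written sketch built on simplified stand-in types. The names @code{semantic_type}, @code{location_type}, @code{token_code} and @code{source} below are hypothetical placeholders, not the classes Bison generates; the only point is that the token type is the return value, while the semantic value and the location travel through the pointer arguments, and no global variable is involved.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the types a generated parser would provide.
union semantic_type { int ival; };
struct location_type { unsigned line; unsigned column; };
enum token_code { END = 0, INTEGER = 258 };

// Hypothetical extra argument, in the spirit of %lex-param.
struct source
{
  std::string text;
  std::size_t pos;
};

// Return the next token code; report its value and location through the
// pointer arguments.  The column is left just past the token.
int
yylex (semantic_type* yylval, location_type* yylloc, source& src)
{
  // Skip blanks, advancing the column.
  while (src.pos < src.text.size () && src.text[src.pos] == ' ')
    {
      ++src.pos;
      ++yylloc->column;
    }
  if (src.pos == src.text.size ())
    return END;
  if (std::isdigit (static_cast<unsigned char> (src.text[src.pos])))
    {
      int value = 0;
      while (src.pos < src.text.size ()
             && std::isdigit (static_cast<unsigned char> (src.text[src.pos])))
        {
          value = value * 10 + (src.text[src.pos] - '0');
          ++src.pos;
          ++yylloc->column;
        }
      yylval->ival = value;
      return INTEGER;
    }
  // Any other character is returned as its own code.
  ++yylloc->column;
  return static_cast<unsigned char> (src.text[src.pos++]);
}
```

Because all state is passed in explicitly, two such scanners can run side by side on different inputs, which is the property a pure interface is meant to give.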
+ +@menu +* Split Symbols:: Passing symbols as two/three components +* Complete Symbols:: Making symbols a whole +@end menu + +@node Split Symbols +@subsubsection Split Symbols + +The interface is as follows. @deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...) -Return the next token. Its type is the return value, its semantic -value and location being @var{yylval} and @var{yylloc}. Invocations of +@deftypemethodx {parser} {int} yylex (semantic_type* @var{yylval}, @var{type1} @var{arg1}, ...) +Return the next token. Its type is the return value, its semantic value and +location (if enabled) being @var{yylval} and @var{yylloc}. Invocations of @samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments. @end deftypemethod +Note that when using variants, the interface for @code{yylex} is the same, +but @code{yylval} is handled differently. + +Regular union-based code in a Lex scanner typically looks like: + +@example +[0-9]+ @{ + yylval.ival = text_to_int (yytext); + return yy::parser::INTEGER; + @} +[a-z]+ @{ + yylval.sval = new std::string (yytext); + return yy::parser::IDENTIFIER; + @} +@end example + +Using variants, @code{yylval} is already constructed, but it is not +initialized.
So the code would look like: + +@example +[0-9]+ @{ + yylval.build<int>() = text_to_int (yytext); + return yy::parser::INTEGER; + @} +[a-z]+ @{ + yylval.build<std::string>() = yytext; + return yy::parser::IDENTIFIER; + @} +@end example + +@noindent +or + +@example +[0-9]+ @{ + yylval.build(text_to_int (yytext)); + return yy::parser::INTEGER; + @} +[a-z]+ @{ + yylval.build(yytext); + return yy::parser::IDENTIFIER; + @} +@end example + + +@node Complete Symbols +@subsubsection Complete Symbols + +If you specified both @code{%define variant} and @code{%define lex_symbol}, +the @code{parser} class also defines the class @code{parser::symbol_type} +which defines a @emph{complete} symbol, aggregating its type (i.e., the +traditional value returned by @code{yylex}), its semantic value (i.e., the +value passed in @code{yylval}), and possibly its location (@code{yylloc}). + +@deftypemethod {symbol_type} {} symbol_type (token_type @var{type}, const semantic_type& @var{value}, const location_type& @var{location}) +Build a complete terminal symbol whose token type is @var{type}, and whose +semantic value is @var{value}. If location tracking is enabled, also pass +the @var{location}. +@end deftypemethod + +This interface is low-level and should not be used for two reasons. First, +it is inconvenient, as you still have to build the semantic value, which is +a variant, and second, because consistency is not enforced: as with unions, +it is still possible to give an integer as semantic value for a string. + +So for each token type, Bison generates named constructors as follows. + +@deftypemethod {symbol_type} {} make_@var{token} (const @var{value_type}& @var{value}, const location_type& @var{location}) +@deftypemethodx {symbol_type} {} make_@var{token} (const location_type& @var{location}) +Build a complete terminal symbol for the token type @var{token} (not +including the @code{api.tokens.prefix}) whose possible semantic value is +@var{value} of adequate @var{value_type}.
If location tracking is enabled, +also pass the @var{location}. +@end deftypemethod + +For instance, given the following declarations: + +@example +%define api.tokens.prefix "TOK_" +%token <std::string> IDENTIFIER; +%token <int> INTEGER; +%token COLON; +@end example + +@noindent +Bison generates the following functions: + +@example +symbol_type make_IDENTIFIER(const std::string& v, + const location_type& l); +symbol_type make_INTEGER(const int& v, + const location_type& loc); +symbol_type make_COLON(const location_type& loc); +@end example + +@noindent +which should be used in a Lex scanner as follows. + +@example +[0-9]+ return yy::parser::make_INTEGER(text_to_int (yytext), loc); +[a-z]+ return yy::parser::make_IDENTIFIER(yytext, loc); +":" return yy::parser::make_COLON(loc); +@end example + +Tokens that do not have an identifier are not accessible: you cannot simply +use characters such as @code{':'}, they must be declared with @code{%token}. @node A Complete C++ Example @subsection A Complete C++ Example This section demonstrates the use of a C++ parser with a simple but complete example. This example should be available on your system, -ready to compile, in the directory @dfn{../bison/examples/calc++}. It +ready to compile, in the directory @dfn{.../bison/examples/calc++}. It focuses on the use of Bison, therefore the design of the various C++ classes is very naive: no accessors, no encapsulation of members etc. We will use a Lex scanner, and more precisely, a Flex scanner, to -demonstrate the various interaction. A hand written scanner is +demonstrate the various interactions. A hand-written scanner is actually easier to interface with. @menu @@ -9081,11 +9983,8 @@ factor both as follows. @comment file: calc++-driver.hh @example // Tell Flex the lexer's prototype ...
-# define YY_DECL \ - yy::calcxx_parser::token_type \ - yylex (yy::calcxx_parser::semantic_type* yylval, \ - yy::calcxx_parser::location_type* yylloc, \ - calcxx_driver& driver) +# define YY_DECL \ + yy::calcxx_parser::symbol_type yylex (calcxx_driver& driver) // ... and declare it for the parser's sake. YY_DECL; @end example @@ -9109,8 +10008,8 @@ public: @end example @noindent -To encapsulate the coordination with the Flex scanner, it is useful to -have two members function to open and close the scanning phase. +To encapsulate the coordination with the Flex scanner, it is useful to have +member functions to open and close the scanning phase. @comment file: calc++-driver.hh @example @@ -9125,9 +10024,13 @@ Similarly for the parser itself. @comment file: calc++-driver.hh @example - // Run the parser. Return 0 on success. + // Run the parser on file F. + // Return 0 on success. int parse (const std::string& f); + // The name of the file being parsed. + // Used later to pass the file name to the location tracker. std::string file; + // Whether parser traces should be generated. bool trace_parsing; @end example @@ -9209,19 +10112,35 @@ the grammar for. %define parser_class_name "calcxx_parser" @end example +@noindent +@findex %define variant +@findex %define lex_symbol +This example will use genuine C++ objects as semantic values, therefore, we +require the variant-based interface. To make sure we properly use it, we +enable assertions. To fully benefit from type-safety and more natural +definition of ``symbol'', we enable @code{lex_symbol}. + +@comment file: calc++-parser.yy +@example +%define variant +%define parse.assert +%define lex_symbol +@end example + @noindent @findex %code requires -Then come the declarations/inclusions needed to define the -@code{%union}. Because the parser uses the parsing driver and -reciprocally, both cannot include the header of the other. Because the +Then come the declarations/inclusions needed by the semantic values. 
+Because the parser uses the parsing driver and vice versa, both would like +to include the header of the other, which is, of course, insane. This +mutual dependency will be broken using forward declarations. Because the driver's header needs detailed knowledge about the parser class (in -particular its inner types), it is the parser's header which will simply -use a forward declaration of the driver. -@xref{%code Summary}. +particular its inner types), it is the parser's header which will use a +forward declaration of the driver. @xref{%code Summary}. @comment file: calc++-parser.yy @example -%code requires @{ +%code requires +@{ # include <string> class calcxx_driver; @} @@ -9235,15 +10154,14 @@ global variables. @comment file: calc++-parser.yy @example // The parsing context. -%parse-param @{ calcxx_driver& driver @} -%lex-param @{ calcxx_driver& driver @} +%param @{ calcxx_driver& driver @} @end example @noindent -Then we request the location tracking feature, and initialize the +Then we request location tracking, and initialize the first location's file name. Afterward new locations are computed relatively to the previous locations: the file name will be -automatically propagated. +propagated. @comment file: calc++-parser.yy @example @@ -9256,28 +10174,14 @@ automatically propagated. @end example @noindent -Use the two following directives to enable parser tracing and verbose error +Use the following two directives to enable parser tracing and verbose error messages. However, verbose error messages can contain incorrect information (@pxref{LAC}). @comment file: calc++-parser.yy @example -%debug -%error-verbose -@end example - -@noindent -Semantic values cannot use ``real'' objects, but only pointers to -them. - -@comment file: calc++-parser.yy -@example -// Symbols.
-%union -@{ - int ival; - std::string *sval; -@}; +%define parse.trace +%define parse.error verbose @end example @noindent @@ -9287,7 +10191,8 @@ The code between @samp{%code @{} and @samp{@}} is output in the @comment file: calc++-parser.yy @example -%code @{ +%code +@{ # include "calc++-driver.hh" @} @end example @@ -9295,35 +10200,53 @@ The code between @samp{%code @{} and @samp{@}} is output in the @noindent The token numbered as 0 corresponds to end of file; the following line -allows for nicer error messages referring to ``end of file'' instead -of ``$end''. Similarly user friendly named are provided for each -symbol. Note that the tokens names are prefixed by @code{TOKEN_} to -avoid name clashes. +allows for nicer error messages referring to ``end of file'' instead of +``$end''. Similarly, user-friendly names are provided for each symbol. To +avoid name clashes in the generated files (@pxref{Calc++ Scanner}), prefix +tokens with @code{TOK_} (@pxref{%define Summary,,api.tokens.prefix}). @comment file: calc++-parser.yy @example -%token END 0 "end of file" -%token ASSIGN ":=" -%token <sval> IDENTIFIER "identifier" -%token <ival> NUMBER "number" -%type <ival> exp +%define api.tokens.prefix "TOK_" +%token + END 0 "end of file" + ASSIGN ":=" + MINUS "-" + PLUS "+" + STAR "*" + SLASH "/" + LPAREN "(" + RPAREN ")" +; @end example @noindent -To enable memory deallocation during error recovery, use -@code{%destructor}. +Since we use variant-based semantic values, @code{%union} is not used, and +both @code{%type} and @code{%token} expect genuine types, as opposed to type +tags. -@c FIXME: Document %printer, and mention that it takes a braced-code operand.
@comment file: calc++-parser.yy @example -%printer @{ debug_stream () << *$$; @} "identifier" -%destructor @{ delete $$; @} "identifier" +%token IDENTIFIER "identifier" +%token NUMBER "number" +%type exp +@end example -%printer @{ debug_stream () << $$; @} +@noindent +No @code{%destructor} is needed to enable memory deallocation during error +recovery; the memory, for strings for instance, will be reclaimed by the +regular destructors. All the values are printed using their +@code{operator<<}. + +@c FIXME: Document %printer, and mention that it takes a braced-code operand. +@comment file: calc++-parser.yy +@example +%printer @{ yyoutput << $$; @} <*>; @end example @noindent -The grammar itself is straightforward. +The grammar itself is straightforward (@pxref{Location Tracking Calc, , +Location Tracking Calculator: @code{ltcalc}}). @comment file: calc++-parser.yy @example @@ -9336,17 +10259,18 @@ assignments: | assignments assignment @{@}; assignment: - "identifier" ":=" exp - @{ driver.variables[*$1] = $3; delete $1; @}; - -%left '+' '-'; -%left '*' '/'; -exp: exp '+' exp @{ $$ = $1 + $3; @} - | exp '-' exp @{ $$ = $1 - $3; @} - | exp '*' exp @{ $$ = $1 * $3; @} - | exp '/' exp @{ $$ = $1 / $3; @} - | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} - | "number" @{ $$ = $1; @}; + "identifier" ":=" exp @{ driver.variables[$1] = $3; @}; + +%left "+" "-"; +%left "*" "/"; +exp: + exp "+" exp @{ $$ = $1 + $3; @} +| exp "-" exp @{ $$ = $1 - $3; @} +| exp "*" exp @{ $$ = $1 * $3; @} +| exp "/" exp @{ $$ = $1 / $3; @} +| "(" exp ")" @{ std::swap ($$, $2); @} +| "identifier" @{ $$ = driver.variables[$1]; @} +| "number" @{ std::swap ($$, $1); @}; %% @end example @@ -9357,7 +10281,7 @@ driver. 
@comment file: calc++-parser.yy @example void -yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l, +yy::calcxx_parser::error (const location_type& l, const std::string& m) @{ driver.error (l, m); @@ -9373,24 +10297,22 @@ parser's to get the set of defined tokens. @comment file: calc++-scanner.ll @example %@{ /* -*- C++ -*- */ -# include # include # include +# include # include # include "calc++-driver.hh" # include "calc++-parser.hh" -/* Work around an incompatibility in flex (at least versions - 2.5.31 through 2.5.33): it generates code that does - not conform to C89. See Debian bug 333231 - . */ +// Work around an incompatibility in flex (at least versions +// 2.5.31 through 2.5.33): it generates code that does +// not conform to C89. See Debian bug 333231 +// . # undef yywrap # define yywrap() 1 -/* By default yylex returns int, we use token_type. - Unfortunately yyterminate by default returns 0, which is - not of token_type. */ -#define yyterminate() return token::END +// The location of the current token. +static yy::location loc; %@} @end example @@ -9398,7 +10320,7 @@ parser's to get the set of defined tokens. Because there is no @code{#include}-like feature we don't need @code{yywrap}, we don't need @code{unput} either, and we parse an actual file, this is not an interactive session with the user. -Finally we enable the scanner tracing features. +Finally, we enable scanner tracing. @comment file: calc++-scanner.ll @example @@ -9418,8 +10340,8 @@ blank [ \t] @noindent The following paragraph suffices to track locations accurately. Each time @code{yylex} is invoked, the begin position is moved onto the end -position. Then when a pattern is matched, the end position is -advanced of its width. In case it matched ends of lines, the end +position. Then when a pattern is matched, its width is added to the end +column. 
When matching ends of lines, the end cursor is adjusted, and each time blanks are matched, the begin cursor is moved onto the end cursor to effectively ignore the blanks preceding tokens. Comments would be treated equally. @@ -9428,46 +10350,51 @@ preceding tokens. Comments would be treated equally. @example @group %@{ -# define YY_USER_ACTION yylloc->columns (yyleng); + // Code run each time a pattern is matched. + # define YY_USER_ACTION loc.columns (yyleng); %@} @end group %% +@group %@{ - yylloc->step (); + // Code run each time yylex is called. + loc.step (); %@} -@{blank@}+ yylloc->step (); -[\n]+ yylloc->lines (yyleng); yylloc->step (); +@end group +@{blank@}+ loc.step (); +[\n]+ loc.lines (yyleng); loc.step (); @end example @noindent -The rules are simple, just note the use of the driver to report errors. -It is convenient to use a typedef to shorten -@code{yy::calcxx_parser::token::identifier} into -@code{token::identifier} for instance. +The rules are simple. The driver is used to report errors. @comment file: calc++-scanner.ll @example -%@{ - typedef yy::calcxx_parser::token token; -%@} - /* Convert ints to the actual type of tokens. */ -[-+*/] return yy::calcxx_parser::token_type (yytext[0]); -":=" return token::ASSIGN; +"-" return yy::calcxx_parser::make_MINUS(loc); +"+" return yy::calcxx_parser::make_PLUS(loc); +"*" return yy::calcxx_parser::make_STAR(loc); +"/" return yy::calcxx_parser::make_SLASH(loc); +"(" return yy::calcxx_parser::make_LPAREN(loc); +")" return yy::calcxx_parser::make_RPAREN(loc); +":=" return yy::calcxx_parser::make_ASSIGN(loc); + +@group @{int@} @{ errno = 0; long n = strtol (yytext, NULL, 10); if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) - driver.error (*yylloc, "integer is out of range"); - yylval->ival = n; - return token::NUMBER; + driver.error (loc, "integer is out of range"); + return yy::calcxx_parser::make_NUMBER(n, loc); @} -@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; -. 
driver.error (*yylloc, "invalid character"); +@end group +@{id@} return yy::calcxx_parser::make_IDENTIFIER(yytext, loc); +. driver.error (loc, "invalid character"); +<<EOF>> return yy::calcxx_parser::make_END(loc); %% @end example @noindent -Finally, because the scanner related driver's member function depend +Finally, because the scanner-related driver's member-functions depend on the scanner's data, it is simpler to implement them in this file. @comment file: calc++-scanner.ll @@ -9477,7 +10404,7 @@ void calcxx_driver::scan_begin () @{ yy_flex_debug = trace_scanning; - if (file == "-") + if (file.empty () || file == "-") yyin = stdin; else if (!(yyin = fopen (file.c_str (), "r"))) @{ @@ -9510,14 +10437,18 @@ The top level file, @file{calc++.cc}, poses no problem. int main (int argc, char *argv[]) @{ + int res = 0; calcxx_driver driver; - for (++argv; argv[0]; ++argv) - if (*argv == std::string ("-p")) + for (int i = 1; i < argc; ++i) + if (argv[i] == std::string ("-p")) driver.trace_parsing = true; - else if (*argv == std::string ("-s")) + else if (argv[i] == std::string ("-s")) driver.trace_scanning = true; - else if (!driver.parse (*argv)) + else if (!driver.parse (argv[i])) std::cout << driver.result << std::endl; + else + res = 1; + return res; @} @end group @end example @@ -9562,7 +10493,7 @@ You can create documentation for generated parsers using Javadoc. Contrary to C parsers, Java parsers do not use global variables; the state of the parser is always local to an instance of the parser class. Therefore, all Java parsers are ``pure'', and the @code{%pure-parser} -and @code{%define api.pure} directives does not do anything when used in +and @samp{%define api.pure} directives do not do anything when used in Java. Push parsers are currently unsupported in Java and @code{%define @@ -9575,15 +10506,23 @@ No header file can be generated for Java parsers. Do not use the @code{%defines} directive or the @option{-d}/@option{--defines} options.
@c FIXME: Possible code change. -Currently, support for debugging and verbose errors are always compiled -in. Thus the @code{%debug} and @code{%token-table} directives and the +Currently, support for tracing is always compiled +in. Thus the @samp{%define parse.trace} and @samp{%token-table} +directives and the @option{-t}/@option{--debug} and @option{-k}/@option{--token-table} options have no effect. This may change in the future to eliminate -unused code in the generated parser, so use @code{%debug} and -@code{%verbose-error} explicitly if needed. Also, in the future the +unused code in the generated parser, so use @samp{%define parse.trace} +explicitly +if needed. Also, in the future the @code{%token-table} directive might enable a public interface to access the token names and codes. +Getting a ``code too large'' error from the Java compiler means the code +hit the 64KB bytecode per method limitation of the Java class file. +Try reducing the amount of code in actions and static initializers; +otherwise, report a bug so that the parser skeleton will be improved. + + @node Java Semantic Values @subsection Java Semantic Values @c - No %union, specify type in %type/%token. @@ -9602,7 +10541,7 @@ semantic values' types (class names) should be specified in the By default, the semantic stack is declared to have @code{Object} members, which means that the class types you specify can be of any class. To improve the type safety of the parser, you can declare the common -superclass of all the semantic values using the @code{%define stype} +superclass of all the semantic values using the @samp{%define stype} directive. For example, after the following declaration: @example @@ -9641,11 +10580,11 @@ class defines a @dfn{position}, a single point in a file; Bison itself defines a class representing a @dfn{location}, a range composed of a pair of positions (possibly spanning several files). 
The location class is an inner class of the parser; the name is @code{Location} by default, and may also be -renamed using @code{%define location_type "@var{class-name}"}. +renamed using @samp{%define location_type "@var{class-name}"}. The location class treats the position as a completely opaque value. By default, the class name is @code{Position}, but this can be changed -with @code{%define position_type "@var{class-name}"}. This class must +with @samp{%define position_type "@var{class-name}"}. This class must be supplied by the user. @@ -9680,20 +10619,22 @@ properly, the position class should override the @code{equals} and The name of the generated parser class defaults to @code{YYParser}. The @code{YY} prefix may be changed using the @code{%name-prefix} directive or the @option{-p}/@option{--name-prefix} option. Alternatively, use -@code{%define parser_class_name "@var{name}"} to give a custom name to +@samp{%define parser_class_name "@var{name}"} to give a custom name to the class. The interface of this class is detailed below. By default, the parser class has package visibility. A declaration -@code{%define public} will change to public visibility. Remember that, +@samp{%define public} will change to public visibility. Remember that, according to the Java language specification, the name of the @file{.java} file should match the name of the class in this case. Similarly, you can use @code{abstract}, @code{final} and @code{strictfp} with the @code{%define} declaration to add other modifiers to the parser class. +A single @samp{%define annotations "@var{annotations}"} directive can +be used to add any number of annotations to the parser class. The Java package name of the parser class can be specified using the -@code{%define package} directive. The superclass and the implemented +@samp{%define package} directive. 
The superclass and the implemented interfaces of the parser class can be specified with the @code{%define -extends} and @code{%define implements} directives. +extends} and @samp{%define implements} directives. The parser class defines an inner class, @code{Location}, that is used for location tracking (see @ref{Java Location Values}), and an inner @@ -9702,30 +10643,33 @@ these inner class/interface, and the members described in the interface below, all the other members and fields are preceded with a @code{yy} or @code{YY} prefix to avoid clashes with user code. -@c FIXME: The following constants and variables are still undocumented: -@c @code{bisonVersion}, @code{bisonSkeleton} and @code{errorVerbose}. - The parser class can be extended using the @code{%parse-param} directive. Each occurrence of the directive will add a @code{protected final} field to the parser class, and an argument to its constructor, which initializes them automatically. -Token names defined by @code{%token} and the predefined @code{EOF} token -name are added as constant fields to the parser class. - @deftypeop {Constructor} {YYParser} {} YYParser (@var{lex_param}, @dots{}, @var{parse_param}, @dots{}) Build a new parser object with embedded @code{%code lexer}. There are -no parameters, unless @code{%parse-param}s and/or @code{%lex-param}s are -used. +no parameters, unless @code{%param}s and/or @code{%parse-param}s and/or +@code{%lex-param}s are used. + +Use @code{%code init} for code added to the start of the constructor +body. This is especially useful to initialize superclasses. Use +@samp{%define init_throws} to specify any uncaught exceptions. @end deftypeop @deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{}) Build a new parser object using the specified scanner. There are no -additional parameters unless @code{%parse-param}s are used. +additional parameters unless @code{%param}s and/or @code{%parse-param}s are +used.
If the scanner is defined by @code{%code lexer}, this constructor is declared @code{protected} and is called automatically with a scanner -created with the correct @code{%lex-param}s. +created with the correct @code{%param}s and/or @code{%lex-param}s. + +Use @code{%code init} for code added to the start of the constructor +body. This is especially useful to initialize superclasses. Use +@samp{%define init_throws} to specify any uncaught exceptions. @end deftypeop @deftypemethod {YYParser} {boolean} parse () @@ -9733,6 +10677,21 @@ Run the syntactic analysis, and return @code{true} on success, @code{false} otherwise. @end deftypemethod +@deftypemethod {YYParser} {boolean} getErrorVerbose () +@deftypemethodx {YYParser} {void} setErrorVerbose (boolean @var{verbose}) +Get or set the option to produce verbose error messages. These are only +available with @samp{%define parse.error verbose}, which also turns on +verbose error messages. +@end deftypemethod + +@deftypemethod {YYParser} {void} yyerror (String @var{msg}) +@deftypemethodx {YYParser} {void} yyerror (Position @var{pos}, String @var{msg}) +@deftypemethodx {YYParser} {void} yyerror (Location @var{loc}, String @var{msg}) +Print an error message using the @code{yyerror} method of the scanner +instance in use. The @code{Location} and @code{Position} parameters are +available only if location tracking is active. +@end deftypemethod + @deftypemethod {YYParser} {boolean} recovering () During the syntactic analysis, return @code{true} if recovering from a syntax error. @@ -9751,6 +10710,11 @@ Get or set the tracing level. Currently its value is either 0, no trace, or nonzero, full tracing. @end deftypemethod +@deftypecv {Constant} {YYParser} {String} {bisonVersion} +@deftypecvx {Constant} {YYParser} {String} {bisonSkeleton} +Identify the Bison version and skeleton used to generate this parser. 
+@end deftypecv + @node Java Scanner Interface @subsection Java Scanner Interface @@ -9761,7 +10725,9 @@ or nonzero, full tracing. There are two possible ways to interface a Bison-generated Java parser with a scanner: the scanner may be defined by @code{%code lexer}, or defined elsewhere. In either case, the scanner has to implement the -@code{Lexer} inner interface of the parser class. +@code{Lexer} inner interface of the parser class. This interface also +contains constants for all user-defined token names and the predefined +@code{EOF} token. In the first case, the body of the scanner class is placed in @code{%code lexer} blocks. If you want to pass parameters from the @@ -9780,7 +10746,7 @@ In both cases, the scanner has to implement the following methods. @deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be -changed using @code{%define location_type "@var{class-name}".} +changed using @samp{%define location_type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {int} yylex () @@ -9788,7 +10754,7 @@ Return the next token. Its type is the return value, its semantic value and location are saved and returned by their methods in the interface. -Use @code{%define lex_throws} to specify any uncaught exceptions. +Use @samp{%define lex_throws} to specify any uncaught exceptions. Default is @code{java.io.IOException}. @end deftypemethod @@ -9798,14 +10764,14 @@ Return respectively the first position of the last token that @code{yylex} returned, and the first position beyond it. These methods are not needed unless location tracking is active.
-The return type can be changed using @code{%define position_type +The return type can be changed using @samp{%define position_type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {Object} getLVal () Return the semantic value of the last token that yylex returned. -The return type can be changed using @code{%define stype +The return type can be changed using @samp{%define stype "@var{class-name}".} @end deftypemethod @@ -9816,7 +10782,7 @@ The return type can be changed using @code{%define stype The following special constructs can be used in Java actions. Other analogous C action features are currently unavailable for Java. -Use @code{%define throws} to specify any uncaught exceptions from parser +Use @samp{%define throws} to specify any uncaught exceptions from parser actions, and initial actions specified by @code{%initial-action}. @defvar $@var{n} @@ -9833,7 +10799,7 @@ Like @code{$@var{n}} but specifies an alternative type @var{typealt}. @defvar $$ The semantic value for the grouping made by the current rule. As a value, this is in the base type (@code{Object} or as specified by -@code{%define stype}) as in not cast to the declared subtype because +@samp{%define stype}); that is, it is not cast to the declared subtype because casts are not allowed on the left-hand side of Java assignments. Use an explicit Java cast if the correct subtype is needed. @xref{Java Semantic Values}. @@ -9858,20 +10824,20 @@ The location information of the grouping made by the current rule. @xref{Java Location Values}. @end defvar -@deffn {Statement} {return YYABORT;} +@deftypefn {Statement} return YYABORT @code{;} Return immediately from the parser, indicating failure. @xref{Java Parser Interface}. -@end deffn +@end deftypefn -@deffn {Statement} {return YYACCEPT;} +@deftypefn {Statement} return YYACCEPT @code{;} Return immediately from the parser, indicating success. @xref{Java Parser Interface}.
-@end deffn +@end deftypefn -@deffn {Statement} {return YYERROR;} -Start error recovery without printing an error message. +@deftypefn {Statement} {return} YYERROR @code{;} +Start error recovery (without printing an error message). @xref{Error Recovery}. -@end deffn +@end deftypefn @deftypefn {Function} {boolean} recovering () Return whether error recovery is being done. In this state, the parser @@ -9880,11 +10846,12 @@ operation. @xref{Error Recovery}. @end deftypefn -@deftypefn {Function} {protected void} yyerror (String msg) -@deftypefnx {Function} {protected void} yyerror (Position pos, String msg) -@deftypefnx {Function} {protected void} yyerror (Location loc, String msg) +@deftypefn {Function} {void} yyerror (String @var{msg}) +@deftypefnx {Function} {void} yyerror (Position @var{pos}, String @var{msg}) +@deftypefnx {Function} {void} yyerror (Location @var{loc}, String @var{msg}) Print an error message using the @code{yyerror} method of the scanner -instance in use. +instance in use. The @code{Location} and @code{Position} parameters are +available only if location tracking is active. @end deftypefn @@ -9903,7 +10870,7 @@ macros. Instead, they should be preceded by @code{return} when they appear in an action. The actual definition of these symbols is opaque to the Bison grammar, and it might change in the future. The only meaningful operation that you can do is to return them. -See @pxref{Java Action Features}. +@xref{Java Action Features}. Note that of these three symbols, only @code{YYACCEPT} and @code{YYABORT} will cause a return from the @code{yyparse} @@ -9919,8 +10886,8 @@ values have a common base type: @code{Object} or as specified by a union. The type of @code{$$}, even with angle brackets, is the base type since Java casts are not allowed on the left-hand side of assignments. Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the -left-hand side of assignments. See @pxref{Java Semantic Values} and -@pxref{Java Action Features}.
+left-hand side of assignments. @xref{Java Semantic Values}, and +@ref{Java Action Features}. @item The prologue declarations have a different meaning than in C/C++ code. @@ -9928,7 +10895,7 @@ The prologue declarations have a different meaning than in C/C++ code. @item @code{%code imports} blocks are placed at the beginning of the Java source code. They may include copyright notices. For @code{package} declarations, it is -suggested to use @code{%define package} instead. +suggested to use @samp{%define package} instead. @item unqualified @code{%code} blocks are placed inside the parser class. @@ -9936,7 +10903,7 @@ blocks are placed inside the parser class. @item @code{%code lexer} blocks, if specified, should include the implementation of the scanner. If there is no such block, the scanner can be any class -that implements the appropriate interface (see @pxref{Java Scanner +that implements the appropriate interface (@pxref{Java Scanner Interface}). @end table @@ -9969,7 +10936,7 @@ constructor that @emph{creates} a lexer. Default is none. @deffn {Directive} %name-prefix "@var{prefix}" The prefix of the parser class name @code{@var{prefix}Parser} if -@code{%define parser_class_name} is not used. Default is @code{YY}. +@samp{%define parser_class_name} is not used. Default is @code{YY}. @xref{Java Bison Interface}. @end deffn @@ -10000,6 +10967,11 @@ Code inserted just after the @code{package} declaration. @xref{Java Differences}. @end deffn +@deffn {Directive} {%code init} @{ @var{code} @dots{} @} +Code inserted at the beginning of the parser constructor body. +@xref{Java Parser Interface}. +@end deffn + @deffn {Directive} {%code lexer} @{ @var{code} @dots{} @} Code added to the body of an inner lexer class within the parser class. @xref{Java Scanner Interface}. @@ -10012,7 +10984,7 @@ Code (after the second @code{%%}) appended to the end of the file, @end deffn @deffn {Directive} %@{ @var{code} @dots{} %@} -Not supported. Use @code{%code import} instead.
+Not supported. Use @code{%code imports} instead. @xref{Java Differences}. @end deffn @@ -10021,6 +10993,11 @@ Whether the parser class is declared @code{abstract}. Default is false. @xref{Java Bison Interface}. @end deffn +@deffn {Directive} {%define annotations} "@var{annotations}" +The Java annotations for the parser class. Default is none. +@xref{Java Bison Interface}. +@end deffn + @deffn {Directive} {%define extends} "@var{superclass}" The superclass of the parser class. Default is none. @xref{Java Bison Interface}. @@ -10037,6 +11014,12 @@ Default is none. @xref{Java Bison Interface}. @end deffn +@deffn {Directive} {%define init_throws} "@var{exceptions}" +The exceptions thrown by @code{%code init} from the parser class +constructor. Default is none. +@xref{Java Parser Interface}. +@end deffn + @deffn {Directive} {%define lex_throws} "@var{exceptions}" The exceptions thrown by the @code{yylex} method of the lexer, a comma-separated list. Default is @code{java.io.IOException}. @@ -10122,8 +11105,8 @@ My parser returns with error with a @samp{memory exhausted} message. What can I do? @end quotation -This question is already addressed elsewhere, @xref{Recursion, -,Recursive Rules}. +This question is already addressed elsewhere, see @ref{Recursion, ,Recursive +Rules}. @node How Can I Reset the Parser @section How Can I Reset the Parser @@ -10554,6 +11537,19 @@ the grammar file. @xref{Grammar Outline, ,Outline of a Bison Grammar}. @end deffn +@deffn {Directive} %?@{@var{expression}@} +Predicate actions. This is a type of action clause that may appear in +rules. The expression is evaluated, and if false, causes a syntax error. In +GLR parsers during nondeterministic operation, +this silently causes an alternative parse to die. During deterministic +operation, it is the same as the effect of YYERROR. +@xref{Semantic Predicates}. + +This feature is experimental. +More user feedback will help to determine whether it should become a permanent +feature. 
+@end deffn + @deffn {Construct} /*@dots{}*/ Comment delimiters, as in C. @end deffn @@ -10663,8 +11659,8 @@ token is reset to the token that originally caused the violation. @end deffn @deffn {Directive} %error-verbose -Bison declaration to request verbose, specific error message strings -when @code{yyerror} is called. @xref{Error Reporting}. +An obsolete directive standing for @samp{%define parse.error verbose} +(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). @end deffn @deffn {Directive} %file-prefix "@var{prefix}" @@ -10687,12 +11683,12 @@ Specify the programming language for the generated parser. @end deffn @deffn {Directive} %left -Bison declaration to assign left associativity to token(s). +Bison declaration to assign precedence and left associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn -@deffn {Directive} %lex-param @{@var{argument-declaration}@} -Bison declaration to specifying an additional parameter that +@deffn {Directive} %lex-param @{@var{argument-declaration}@} @dots{} +Bison declaration to specify additional arguments that @code{yylex} should accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}. @end deffn @@ -10722,7 +11718,7 @@ parser implementation file. @xref{Decl Summary}. @end deffn @deffn {Directive} %nonassoc -Bison declaration to assign nonassociativity to token(s). +Bison declaration to assign precedence and nonassociativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn @@ -10731,10 +11727,15 @@ Bison declaration to set the name of the parser implementation file. @xref{Decl Summary}. @end deffn -@deffn {Directive} %parse-param @{@var{argument-declaration}@} -Bison declaration to specifying an additional parameter that -@code{yyparse} should accept. @xref{Parser Function,, The Parser -Function @code{yyparse}}.
+@deffn {Directive} %param @{@var{argument-declaration}@} @dots{} +Bison declaration to specify additional arguments that both +@code{yylex} and @code{yyparse} should accept. @xref{Parser Function,, The +Parser Function @code{yyparse}}. +@end deffn + +@deffn {Directive} %parse-param @{@var{argument-declaration}@} @dots{} +Bison declaration to specify additional arguments that @code{yyparse} +should accept. @xref{Parser Function,, The Parser Function @code{yyparse}}. @end deffn @deffn {Directive} %prec @@ -10742,8 +11743,13 @@ Bison declaration to assign a precedence to a specific rule. @xref{Contextual Precedence, ,Context-Dependent Precedence}. @end deffn +@deffn {Directive} %precedence +Bison declaration to assign precedence to token(s), but no associativity. +@xref{Precedence Decl, ,Operator Precedence}. +@end deffn + @deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{%define +Deprecated version of @samp{%define api.pure} (@pxref{%define Summary,,api.pure}), for which Bison is more careful to warn about unreasonable usage. @end deffn @@ -10754,7 +11760,7 @@ Require a Version of Bison}. @end deffn @deffn {Directive} %right -Bison declaration to assign right associativity to token(s). +Bison declaration to assign precedence and right associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn @@ -10847,10 +11853,11 @@ after a syntax error. @xref{Error Recovery}. @end deffn @deffn {Macro} YYERROR -Macro to pretend that a syntax error has just been detected: call -@code{yyerror} and then perform normal error recovery if possible -(@pxref{Error Recovery}), or (if recovery is impossible) make -@code{yyparse} return 1. @xref{Error Recovery}. +Cause an immediate syntax error. This statement initiates error +recovery just as if the parser itself had detected an error; however, it +does not call @code{yyerror}, and does not print any message.
If you +want to print an error message, call @code{yyerror} explicitly before +the @samp{YYERROR;} statement. @xref{Error Recovery}. For Java parsers, this functionality is invoked using @code{return YYERROR;} instead. @@ -10858,16 +11865,21 @@ instead. @deffn {Function} yyerror User-supplied function to be called by @code{yyparse} on error. -@xref{Error Reporting, ,The Error -Reporting Function @code{yyerror}}. +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @deffn {Macro} YYERROR_VERBOSE -An obsolete macro that you define with @code{#define} in the prologue -to request verbose, specific error message strings -when @code{yyerror} is called. It doesn't matter what definition you -use for @code{YYERROR_VERBOSE}, just whether you define it. Using -@code{%error-verbose} is preferred. @xref{Error Reporting}. +An obsolete macro used in the @file{yacc.c} skeleton, that you define +with @code{#define} in the prologue to request verbose, specific error +message strings when @code{yyerror} is called. It doesn't matter what +definition you use for @code{YYERROR_VERBOSE}, just whether you define +it. Using @samp{%define parse.error verbose} is preferred +(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). +@end deffn + +@deffn {Macro} YYFPRINTF +Macro used to output run-time traces. +@xref{Enabling Traces}. @end deffn @deffn {Macro} YYINITDEPTH @@ -10932,6 +11944,12 @@ The parser function produced by Bison; call this function to start parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. @end deffn +@deffn {Macro} YYPRINT +Macro used to output token semantic values. For @file{yacc.c} only. +Obsoleted by @code{%printer}. +@xref{The YYPRINT Macro, , The @code{YYPRINT} Macro}. +@end deffn + @deffn {Function} yypstate_delete The function to delete a parser instance, produced by Bison in push mode; call this function to delete the memory associated with a parser.