X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/96b93a3da4ceac3beaa0ae5669723a7df9e2c40c..231897ad21e478b0b0821636aa49fe49ca0f3301:/doc/bison.texinfo?ds=inline diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 68409563..3ed6b7f5 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -39,8 +39,8 @@ This manual is for @acronym{GNU} Bison (version @value{VERSION}, @value{UPDATED}), the @acronym{GNU} parser generator. -Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, -1999, 2000, 2001, 2002 Free Software Foundation, Inc. +Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 2003, +1999, 2000, 2001, 2002, 2003 Free Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document @@ -237,7 +237,7 @@ The Lexical Analyzer Function @code{yylex} * Calling Convention:: How @code{yyparse} calls @code{yylex}. * Token Values:: How @code{yylex} must return the semantic value of the token it has read. -* Token Positions:: How @code{yylex} must return the text position +* Token Locations:: How @code{yylex} must return the text location (line number, etc.) of the token, if the actions want that. * Pure Calling:: How the calling convention differs @@ -284,6 +284,8 @@ Invoking Bison Frequently Asked Questions * Parser Stack Overflow:: Breaking the Stack Limits +* How Can I Reset @code{yyparse}:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings Copying This Manual @@ -855,12 +857,12 @@ will suffice. Otherwise, we suggest @node Locations Overview @section Locations @cindex location -@cindex textual position -@cindex position, textual +@cindex textual location +@cindex location, textual Many applications, like interpreters or compilers, have to produce verbose and useful error messages. To achieve this, one must be able to keep track of -the @dfn{textual position}, or @dfn{location}, of each syntactic construct. 
+the @dfn{textual location}, or @dfn{location}, of each syntactic construct. Bison provides a mechanism for handling these locations. Each token has a semantic value. In a similar fashion, each token has an @@ -3071,8 +3073,8 @@ actually does to implement mid-rule actions. @node Locations @section Tracking Locations @cindex location -@cindex textual position -@cindex position, textual +@cindex textual location +@cindex location, textual Though grammar rules and semantic actions are enough to write a fully functional parser, it can be useful to process some additional information, @@ -3558,7 +3560,7 @@ The declaration looks like this: Here @var{n} is a decimal integer. The declaration says there should be no warning if there are @var{n} shift/reduce conflicts and no -reduce/reduce conflicts. An error, instead of the usual warning, is +reduce/reduce conflicts. The usual warning is given if there are either more or fewer conflicts, or if there are any reduce/reduce conflicts. @@ -3580,9 +3582,9 @@ Add an @code{%expect} declaration, copying the number @var{n} from the number which Bison printed. @end itemize -Now Bison will stop annoying you about the conflicts you have checked, but -it will warn you again if changes in the grammar result in additional -conflicts. +Now Bison will stop annoying you if you do not change the number of +conflicts, but it will warn you again if changes in the grammar result +in more or fewer conflicts. @node Start Decl @subsection The Start-Symbol @@ -3902,12 +3904,6 @@ Return immediately with value 0 (to report success). Return immediately with value 1 (to report failure). @end defmac -@c For now, do not document %lex-param and %parse-param, since it's -@c not clear that the current behavior is stable enough. For example, -@c we may need to add %error-param. -@clear documentparam - -@ifset documentparam If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. 
To do so, use the declaration @code{%parse-param}: @@ -3946,7 +3942,6 @@ In the grammar actions, use expressions like this to refer to the data: @example exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} @end example -@end ifset @node Lexical @@ -3971,7 +3966,7 @@ that need it. @xref{Invocation, ,Invoking Bison}. * Calling Convention:: How @code{yyparse} calls @code{yylex}. * Token Values:: How @code{yylex} must return the semantic value of the token it has read. -* Token Positions:: How @code{yylex} must return the text position +* Token Locations:: How @code{yylex} must return the text location (line number, etc.) of the token, if the actions want that. * Pure Calling:: How the calling convention differs @@ -4104,8 +4099,8 @@ then the code in @code{yylex} might look like this: @end group @end example -@node Token Positions -@subsection Textual Positions of Tokens +@node Token Locations +@subsection Textual Locations of Tokens @vindex yylloc If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , @@ -4148,12 +4143,11 @@ yylex (YYSTYPE *lvalp, YYLTYPE *llocp) @end example If the grammar file does not use the @samp{@@} constructs to refer to -textual positions, then the type @code{YYLTYPE} will not be defined. In +textual locations, then the type @code{YYLTYPE} will not be defined. In this case, omit the second argument; @code{yylex} will be called with only one argument. -@ifset documentparam If you wish to pass the additional parameter data to @code{yylex}, use @code{%lex-param} just like @code{%parse-param} (@pxref{Parser Function}). @@ -4194,7 +4188,6 @@ and finally, if both @code{%pure-parser} and @code{%locations} are used: int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); int yyparse (int *nastiness, int *randomness); @end example -@end ifset @node Error Reporting @section The Error Reporting Function @code{yyerror} @@ -4259,7 +4252,6 @@ void yyerror (char const *msg); /* Yacc parsers. 
*/ void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ @end example -@ifset documentparam If @samp{%parse-param @{int *nastiness@}} is used, then: @example @@ -4293,7 +4285,6 @@ void yyerror (YYLTYPE *locp, int *nastiness, int *randomness, char const *msg); @end example -@end ifset @noindent The prototypes are only indications of how the code produced by Bison @@ -4410,7 +4401,7 @@ errors. This is useful primarily in error rules. @deffn {Value} @@$ @findex @@$ -Acts like a structure variable containing information on the textual position +Acts like a structure variable containing information on the textual location of the grouping made by the current rule. @xref{Locations, , Tracking Locations}. @@ -4436,7 +4427,7 @@ Tracking Locations}. @deffn {Value} @@@var{n} @findex @@@var{n} -Acts like a structure variable containing information on the textual position +Acts like a structure variable containing information on the textual location of the @var{n}th component of the current rule. @xref{Locations, , Tracking Locations}. @end deffn @@ -5182,6 +5173,13 @@ structure should generally be adequate. On @acronym{LALR}(1) portions of a grammar, in particular, it is only slightly slower than with the default Bison parser. +For a more detailed exposition of GLR parsers, please see: Elizabeth +Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style +Generalised @acronym{LR} Parsers, Royal Holloway, University of +London, Department of Computer Science, TR-00-12, +@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}, +(2000-12-24). + @node Stack Overflow @section Stack Overflow, and How to Avoid It @cindex stack overflow @@ -5609,8 +5607,8 @@ useless: STR; @example calc.y: warning: 1 useless nonterminal and 1 useless rule calc.y:11.1-7: warning: useless nonterminal: useless -calc.y:11.8-12: warning: useless rule: useless: STR -calc.y contains 7 shift/reduce conflicts. 
+calc.y:11.10-12: warning: useless rule: useless: STR +calc.y: conflicts: 7 shift/reduce @end example When given @option{--report=state}, in addition to @file{calc.tab.c}, it @@ -5632,10 +5630,10 @@ Conflict in state 8 between rule 2 and token '*' resolved as shift. The next section lists states that still have conflicts. @example -State 8 contains 1 shift/reduce conflict. -State 9 contains 1 shift/reduce conflict. -State 10 contains 1 shift/reduce conflict. -State 11 contains 4 shift/reduce conflicts. +State 8 conflicts: 1 shift/reduce +State 9 conflicts: 1 shift/reduce +State 10 conflicts: 1 shift/reduce +State 11 conflicts: 4 shift/reduce @end example @noindent @@ -5847,8 +5845,8 @@ state 7 exp go to state 11 @end example -As was announced in beginning of the report, @samp{State 8 contains 1 -shift/reduce conflict}: +As was announced in the beginning of the report, @samp{State 8 conflicts: +1 shift/reduce}: @example state 8 @@ -6356,6 +6354,8 @@ are addressed. @menu * Parser Stack Overflow:: Breaking the Stack Limits +* How Can I Reset @code{yyparse}:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings @end menu @node Parser Stack Overflow @@ -6369,6 +6369,148 @@ message. What can I do? This question is already addressed elsewhere, @xref{Recursion, ,Recursive Rules}. +@node How Can I Reset @code{yyparse} +@section How Can I Reset @code{yyparse} + +The following phenomenon gives rise to several incarnations, +resulting in the following typical questions: + +@display +I invoke @code{yyparse} several times, and on correct input it works +properly; but when a parse error is found, all the other calls fail +too. How can I reset @code{yyparse}'s error flag? +@end display + +@noindent +or + +@display +My parser includes support for a @samp{#include} like feature, in +which case I run @code{yyparse} from @code{yyparse}. This fails +although I did specify I needed a @code{%pure-parser}. 
+@end display + +These problems come not from Bison itself, but from the Lex +generated scanners. Because these scanners use large buffers for +speed, they might not notice a change of input file. As a +demonstration, consider the following source file, +@file{first-line.l}: + +@verbatim +%{ +#include <stdio.h> +#include <stdlib.h> +%} +%% +.*\n ECHO; return 1; +%% +int +yyparse (const char *file) +{ + yyin = fopen (file, "r"); + if (!yyin) + exit (2); + /* One token only. */ + yylex (); + if (fclose (yyin) != 0) + exit (3); + return 0; +} + +int +main () +{ + yyparse ("input"); + yyparse ("input"); + return 0; +} +@end verbatim + +@noindent +If the file @file{input} contains + +@verbatim +input:1: Hello, +input:2: World! +@end verbatim + +@noindent +then instead of getting the first line twice, you get: + +@example +$ @kbd{flex -ofirst-line.c first-line.l} +$ @kbd{gcc -ofirst-line first-line.c -ll} +$ @kbd{./first-line} +input:1: Hello, +input:2: World! +@end example + +Therefore, whenever you change @code{yyin}, you must tell the Lex +generated scanner to discard its current buffer, and to switch to the +new one. This depends upon your implementation of Lex; see its +documentation for more. For instance, in the case of Flex, a simple +call @samp{yyrestart (yyin)} suffices after each change to +@code{yyin}. + +@node Strings are Destroyed +@section Strings are Destroyed + +@display +My parser seems to destroy old strings, or maybe it loses track of +them. Instead of reporting @samp{"foo", "bar"}, it reports +@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}. +@end display + +This error is probably the single most frequent ``bug report'' sent to +Bison lists, but it only reflects a misunderstanding of the role +of the scanner. Consider the following Lex code: + +@verbatim +%{ +#include <stdio.h> +char *yylval = NULL; +%} +%% +.* yylval = yytext; return 1; +\n /* IGNORE */ +%% +int +main () +{ + /* Similar to using $1, $2 in a Bison action. 
*/ + char *fst = (yylex (), yylval); + char *snd = (yylex (), yylval); + printf ("\"%s\", \"%s\"\n", fst, snd); + return 0; +} +@end verbatim + +If you compile and run this code, you get: + +@example +$ @kbd{flex -osplit-lines.c split-lines.l} +$ @kbd{gcc -osplit-lines split-lines.c -ll} +$ @kbd{printf 'one\ntwo\n' | ./split-lines} +"one +two", "two" +@end example + +@noindent +this is because @code{yytext} is a buffer provided for @emph{reading} +in the action, but if you want to keep it, you have to duplicate it +(e.g., using @code{strdup}). Note that the output may depend on how +your implementation of Lex handles @code{yytext}. For instance, when +given the Lex compatibility option @option{-l} (which triggers the +option @samp{%array}) Flex generates a different behavior: + +@example +$ @kbd{flex -l -osplit-lines.c split-lines.l} +$ @kbd{gcc -osplit-lines split-lines.c -ll} +$ @kbd{printf 'one\ntwo\n' | ./split-lines} +"two", "two" +@end example + + @c ================================================= Table of Symbols @node Table of Symbols @@ -6555,8 +6697,8 @@ External variable in which @code{yylex} should place the line and column numbers associated with a token. (In a pure parser, it is a local variable within @code{yyparse}, and its address is passed to @code{yylex}.) You can ignore this variable if you don't use the -@samp{@@} feature in the grammar actions. @xref{Token Positions, -,Textual Positions of Tokens}. +@samp{@@} feature in the grammar actions. @xref{Token Locations, +,Textual Locations of Tokens}. @end deffn @deffn {Variable} yynerrs @@ -6610,13 +6752,11 @@ Bison declaration to assign left associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn -@ifset documentparam @deffn {Directive} %lex-param @{@var{argument-declaration}@} Bison declaration to specifying an additional parameter that @code{yylex} should accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}. 
@end deffn -@end ifset @deffn {Directive} %merge Bison declaration to assign a merging function to a rule. If there is a @@ -6644,13 +6784,11 @@ Bison declaration to set the name of the parser file. @xref{Decl Summary}. @end deffn -@ifset documentparam @deffn {Directive} %parse-param @{@var{argument-declaration}@} Bison declaration to specifying an additional parameter that @code{yyparse} should accept. @xref{Parser Function,, The Parser Function @code{yyparse}}. @end deffn -@end ifset @deffn {Directive} %prec Bison declaration to assign a precedence to a specific rule.