X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/052826fdd1832fac13c0e7dcb150154cfb22db4f..231897ad21e478b0b0821636aa49fe49ca0f3301:/doc/bison.texinfo

diff --git a/doc/bison.texinfo b/doc/bison.texinfo
index 03da6c8f..3ed6b7f5 100644
--- a/doc/bison.texinfo
+++ b/doc/bison.texinfo
@@ -284,6 +284,8 @@ Invoking Bison
 Frequently Asked Questions
 
 * Parser Stack Overflow::      Breaking the Stack Limits
+* How Can I Reset @code{yyparse}::    @code{yyparse} Keeps Some State
+* Strings are Destroyed::      @code{yylval} Loses Track of Strings
 
 Copying This Manual
 
@@ -6352,6 +6354,8 @@ are addressed.
 
 @menu
 * Parser Stack Overflow::      Breaking the Stack Limits
+* How Can I Reset @code{yyparse}::    @code{yyparse} Keeps Some State
+* Strings are Destroyed::      @code{yylval} Loses Track of Strings
 @end menu
 
 @node Parser Stack Overflow
@@ -6365,6 +6369,148 @@ message.  What can I do?
 This question is already addressed elsewhere, @xref{Recursion,
 ,Recursive Rules}.
 
+@node How Can I Reset @code{yyparse}
+@section How Can I Reset @code{yyparse}
+
+The following phenomenon shows up in several incarnations,
+resulting in the following typical questions:
+
+@display
+I invoke @code{yyparse} several times, and on correct input it works
+properly; but when a parse error is found, all the other calls fail
+too.  How can I reset @code{yyparse}'s error flag?
+@end display
+
+@noindent
+or
+
+@display
+My parser includes support for a @samp{#include}-like feature, in
+which case I run @code{yyparse} from @code{yyparse}.  This fails
+although I did specify I needed a @code{%pure-parser}.
+@end display
+
+These problems typically come not from Bison itself, but from
+Lex-generated scanners.  Because these scanners use large buffers for
+speed, they might not notice a change of input file.  As a
+demonstration, consider the following source file,
+@file{first-line.l}:
+
+@verbatim
+%{
+#include <stdio.h>
+#include <stdlib.h>
+%}
+%%
+.*\n    ECHO; return 1;
+%%
+int
+yyparse (const char *file)
+{
+  yyin = fopen (file, "r");
+  if (!yyin)
+    exit (2);
+  /* One token only. */
+  yylex ();
+  if (fclose (yyin) != 0)
+    exit (3);
+  return 0;
+}
+
+int
+main ()
+{
+  yyparse ("input");
+  yyparse ("input");
+  return 0;
+}
+@end verbatim
+
+@noindent
+If the file @file{input} contains
+
+@verbatim
+input:1: Hello,
+input:2: World!
+@end verbatim
+
+@noindent
+then instead of getting the first line twice, you get:
+
+@example
+$ @kbd{flex -ofirst-line.c first-line.l}
+$ @kbd{gcc  -ofirst-line   first-line.c -ll}
+$ @kbd{./first-line}
+input:1: Hello,
+input:2: World!
+@end example
+
+Therefore, whenever you change @code{yyin}, you must tell the
+Lex-generated scanner to discard its current buffer and switch to the
+new one.  How to do this depends on your implementation of Lex; see its
+documentation for details.  For instance, in the case of Flex, a simple
+call to @samp{yyrestart (yyin)} after each change to @code{yyin}
+suffices.
+
+@node Strings are Destroyed
+@section Strings are Destroyed
+
+@display
+My parser seems to destroy old strings, or maybe it loses track of
+them.  Instead of reporting @samp{"foo", "bar"}, it reports
+@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
+@end display
+
+This error is probably the single most frequent ``bug report'' sent to
+the Bison lists, but it actually stems from a misunderstanding of the
+role of the scanner.  Consider the following Lex code:
+
+@verbatim
+%{
+#include <stdio.h>
+char *yylval = NULL;
+%}
+%%
+.*    yylval = yytext; return 1;
+\n    /* IGNORE */
+%%
+int
+main ()
+{
+  /* Similar to using $1, $2 in a Bison action. */
+  char *fst = (yylex (), yylval);
+  char *snd = (yylex (), yylval);
+  printf ("\"%s\", \"%s\"\n", fst, snd);
+  return 0;
+}
+@end verbatim
+
+If you compile and run this code, you get:
+
+@example
+$ @kbd{flex -osplit-lines.c split-lines.l}
+$ @kbd{gcc  -osplit-lines   split-lines.c -ll}
+$ @kbd{printf 'one\ntwo\n' | ./split-lines}
+"one
+two", "two"
+@end example
+
+@noindent
+This is because @code{yytext} is a buffer provided for @emph{reading}
+in the action; if you want to keep its contents, you have to duplicate
+them (e.g., using @code{strdup}).  Note that the output may depend on how
+your implementation of Lex handles @code{yytext}.  For instance, when
+given the Lex compatibility option @option{-l} (which triggers the
+option @samp{%array}), Flex generates a different behavior:
+
+@example
+$ @kbd{flex -l -osplit-lines.c split-lines.l}
+$ @kbd{gcc     -osplit-lines   split-lines.c -ll}
+$ @kbd{printf 'one\ntwo\n' | ./split-lines}
+"two", "two"
+@end example
+
+
 @c ================================================= Table of Symbols
 
 @node Table of Symbols