X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/69363a9e4d0b8d3b51536434d3395d38519f617b..eda429346ae70445be1cffbe729c63fca7848ad6:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index a82c0783..8f35ac60 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -16,6 +16,10 @@ @c @clear shorttitlepage-enabled @c @set shorttitlepage-enabled +@c Set following if you want to document %default-prec and %no-default-prec. +@c This feature is experimental and may change in future Bison versions. +@c @set defaultprec + @c ISPELL CHECK: done, 14 Jan 1993 --bob @c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo @@ -237,7 +241,7 @@ The Lexical Analyzer Function @code{yylex} * Calling Convention:: How @code{yyparse} calls @code{yylex}. * Token Values:: How @code{yylex} must return the semantic value of the token it has read. -* Token Positions:: How @code{yylex} must return the text position +* Token Locations:: How @code{yylex} must return the text location (line number, etc.) of the token, if the actions want that. * Pure Calling:: How the calling convention differs @@ -284,6 +288,10 @@ Invoking Bison Frequently Asked Questions * Parser Stack Overflow:: Breaking the Stack Limits +* How Can I Reset the Parser:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings +* C++ Parsers:: Compiling Parsers with C++ Compilers +* Implementing Loops:: Control Flow in the Calculator Copying This Manual @@ -348,7 +356,7 @@ encourage people to make other software free. So we decided to make the practical conditions for using Bison match the practical conditions for using the other @acronym{GNU} tools. -This exception applies only when Bison is generating C code for a +This exception applies only when Bison is generating C code for an @acronym{LALR}(1) parser; otherwise, the @acronym{GPL} terms operate as usual. 
You can tell whether the exception applies to your @samp{.c} output file by @@ -855,12 +863,12 @@ will suffice. Otherwise, we suggest @node Locations Overview @section Locations @cindex location -@cindex textual position -@cindex position, textual +@cindex textual location +@cindex location, textual Many applications, like interpreters or compilers, have to produce verbose and useful error messages. To achieve this, one must be able to keep track of -the @dfn{textual position}, or @dfn{location}, of each syntactic construct. +the @dfn{textual location}, or @dfn{location}, of each syntactic construct. Bison provides a mechanism for handling these locations. Each token has a semantic value. In a similar fashion, each token has an @@ -1411,7 +1419,7 @@ here is the definition we will use: void yyerror (char const *s) @{ - printf ("%s\n", s); + fprintf (stderr, "%s\n", s); @} @end group @end example @@ -2473,7 +2481,8 @@ does not enforce this convention, but if you depart from it, people who read your program will be confused. All the escape sequences used in string literals in C can be used in -Bison as well. However, unlike Standard C, trigraphs have no special +Bison as well, except that you must not use a null character within a +string literal. Also, unlike Standard C, trigraphs have no special meaning in Bison string literals, nor is backslash-newline allowed. A literal string token must contain two or more characters; for a token containing just one character, use a character token (see above). @@ -2793,9 +2802,10 @@ Actions, ,Actions in Mid-Rule}). The C code in an action can refer to the semantic values of the components matched by the rule with the construct @code{$@var{n}}, which stands for the value of the @var{n}th component. The semantic value for the grouping -being constructed is @code{$$}. (Bison translates both of these constructs -into array element references when it copies the actions into the parser -file.) +being constructed is @code{$$}. 
Bison translates both of these +constructs into expressions of the appropriate type when it copies the +actions into the parser file. @code{$$} is translated to a modifiable +lvalue, so it can be assigned to. Here is a typical example: @@ -3071,15 +3081,13 @@ actually does to implement mid-rule actions. @node Locations @section Tracking Locations @cindex location -@cindex textual position -@cindex position, textual +@cindex textual location +@cindex location, textual Though grammar rules and semantic actions are enough to write a fully functional parser, it can be useful to process some additional information, especially symbol locations. -@c (terminal or not) ? - The way locations are handled is defined by providing a data type, and actions to take when rules are matched. @@ -3144,9 +3152,10 @@ exp: @dots{} else @{ $$ = 1; - printf("Division by zero, l%d,c%d-l%d,c%d", - @@3.first_line, @@3.first_column, - @@3.last_line, @@3.last_column); + fprintf (stderr, + "Division by zero, l%d,c%d-l%d,c%d", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); @} @} @end group @@ -3170,9 +3179,10 @@ exp: @dots{} else @{ $$ = 1; - printf("Division by zero, l%d,c%d-l%d,c%d", - @@3.first_line, @@3.first_column, - @@3.last_line, @@3.last_column); + fprintf (stderr, + "Division by zero, l%d,c%d-l%d,c%d", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); @} @} @end group @@ -3206,11 +3216,11 @@ By default, @code{YYLLOC_DEFAULT} is defined this way for simple @example @group -#define YYLLOC_DEFAULT(Current, Rhs, N) \ - Current.first_line = Rhs[1].first_line; \ - Current.first_column = Rhs[1].first_column; \ - Current.last_line = Rhs[N].last_line; \ - Current.last_column = Rhs[N].last_column; +# define YYLLOC_DEFAULT(Current, Rhs, N) \ + ((Current).first_line = (Rhs)[1].first_line, \ + (Current).first_column = (Rhs)[1].first_column, \ + (Current).last_line = (Rhs)[N].last_line, \ + (Current).last_column = (Rhs)[N].last_column) @end group @end 
example @@ -3219,11 +3229,11 @@ and like this for @acronym{GLR} parsers: @example @group -#define YYLLOC_DEFAULT(Current, Rhs, N) \ - Current.first_line = YYRHSLOC(Rhs,1).first_line; \ - Current.first_column = YYRHSLOC(Rhs,1).first_column; \ - Current.last_line = YYRHSLOC(Rhs,N).last_line; \ - Current.last_column = YYRHSLOC(Rhs,N).last_column; +# define YYLLOC_DEFAULT(yyCurrent, yyRhs, YYN) \ + ((yyCurrent).first_line = YYRHSLOC(yyRhs, 1).first_line, \ + (yyCurrent).first_column = YYRHSLOC(yyRhs, 1).first_column, \ + (yyCurrent).last_line = YYRHSLOC(yyRhs, YYN).last_line, \ + (yyCurrent).last_column = YYRHSLOC(yyRhs, YYN).last_column) @end group @end example @@ -3237,6 +3247,12 @@ result) should be modified by @code{YYLLOC_DEFAULT}. @item For consistency with semantic actions, valid indexes for the location array range from 1 to @var{n}. + +@item +Your macro should parenthesize its arguments, if need be, since the +actual arguments may not be surrounded by parentheses. Also, your +macro should expand to something that can be used as a single +statement when it is followed by a semicolon. @end itemize @node Declarations @@ -3493,7 +3509,7 @@ is called when a symbol is thrown away. Declare that the @var{code} must be invoked for each of the @var{symbols} that will be discarded by the parser. The @var{code} should use @code{$$} to designate the semantic value associated to the -@var{symbols}. The additional parser parameters are also avaible +@var{symbols}. The additional parser parameters are also available (@pxref{Parser Function, , The Parser Function @code{yyparse}}). @strong{Warning:} as of Bison 1.875, this feature is still considered as @@ -3669,9 +3685,16 @@ Declare a terminal symbol (token type name) that is left-associative @deffn {Directive} %nonassoc Declare a terminal symbol (token type name) that is nonassociative -(using it in a way that would be associative is a syntax error) -@end deffn (@pxref{Precedence Decl, ,Operator Precedence}). 
+Using it in a way that would be associative is a syntax error. +@end deffn + +@ifset defaultprec +@deffn {Directive} %default-prec +Assign a precedence to rules lacking an explicit @code{%prec} modifier +(@pxref{Contextual Precedence, ,Context-Dependent Precedence}). +@end deffn +@end ifset @deffn {Directive} %type Declare the type of semantic values for a nonterminal symbol @@ -3743,6 +3766,14 @@ and so on. @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. @end deffn +@ifset defaultprec +@deffn {Directive} %no-default-prec +Do not assign a precedence to rules lacking an explicit @code{%prec} +modifier (@pxref{Contextual Precedence, ,Context-Dependent +Precedence}). +@end deffn +@end ifset + @deffn {Directive} %no-parser Do not include any C code in the parser file; generate tables only. The parser file contains just @code{#define} directives and static variable @@ -3902,12 +3933,6 @@ Return immediately with value 0 (to report success). Return immediately with value 1 (to report failure). @end defmac -@c For now, do not document %lex-param and %parse-param, since it's -@c not clear that the current behavior is stable enough. For example, -@c we may need to add %error-param. -@clear documentparam - -@ifset documentparam If you use a reentrant parser, you can optionally pass additional parameter information to it in a reentrant way. To do so, use the declaration @code{%parse-param}: @@ -3946,7 +3971,6 @@ In the grammar actions, use expressions like this to refer to the data: @example exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} @end example -@end ifset @node Lexical @@ -3971,7 +3995,7 @@ that need it. @xref{Invocation, ,Invoking Bison}. * Calling Convention:: How @code{yyparse} calls @code{yylex}. * Token Values:: How @code{yylex} must return the semantic value of the token it has read. -* Token Positions:: How @code{yylex} must return the text position +* Token Locations:: How @code{yylex} must return the text location (line number, etc.) 
of the token, if the actions want that. * Pure Calling:: How the calling convention differs @@ -4104,8 +4128,8 @@ then the code in @code{yylex} might look like this: @end group @end example -@node Token Positions -@subsection Textual Positions of Tokens +@node Token Locations +@subsection Textual Locations of Tokens @vindex yylloc If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , @@ -4148,12 +4172,11 @@ yylex (YYSTYPE *lvalp, YYLTYPE *llocp) @end example If the grammar file does not use the @samp{@@} constructs to refer to -textual positions, then the type @code{YYLTYPE} will not be defined. In +textual locations, then the type @code{YYLTYPE} will not be defined. In this case, omit the second argument; @code{yylex} will be called with only one argument. -@ifset documentparam If you wish to pass the additional parameter data to @code{yylex}, use @code{%lex-param} just like @code{%parse-param} (@pxref{Parser Function}). @@ -4194,7 +4217,6 @@ and finally, if both @code{%pure-parser} and @code{%locations} are used: int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); int yyparse (int *nastiness, int *randomness); @end example -@end ifset @node Error Reporting @section The Error Reporting Function @code{yyerror} @@ -4259,7 +4281,6 @@ void yyerror (char const *msg); /* Yacc parsers. */ void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ @end example -@ifset documentparam If @samp{%parse-param @{int *nastiness@}} is used, then: @example @@ -4293,7 +4314,6 @@ void yyerror (YYLTYPE *locp, int *nastiness, int *randomness, char const *msg); @end example -@end ifset @noindent The prototypes are only indications of how the code produced by Bison @@ -4410,7 +4430,7 @@ errors. This is useful primarily in error rules. 
@deffn {Value} @@$ @findex @@$ -Acts like a structure variable containing information on the textual position +Acts like a structure variable containing information on the textual location of the grouping made by the current rule. @xref{Locations, , Tracking Locations}. @@ -4436,7 +4456,7 @@ Tracking Locations}. @deffn {Value} @@@var{n} @findex @@@var{n} -Acts like a structure variable containing information on the textual position +Acts like a structure variable containing information on the textual location of the @var{n}th component of the current rule. @xref{Locations, , Tracking Locations}. @end deffn @@ -4852,6 +4872,28 @@ exp: @dots{} @end group @end example +@ifset defaultprec +If you forget to append @code{%prec UMINUS} to the rule for unary +minus, Bison silently assumes that minus has its usual precedence. +This kind of problem can be tricky to debug, since one typically +discovers the mistake only by testing the code. + +The @code{%no-default-prec;} declaration makes it easier to discover +this kind of problem systematically. It causes rules that lack a +@code{%prec} modifier to have no precedence, even if the last terminal +symbol mentioned in their components has a declared precedence. + +If @code{%no-default-prec;} is in effect, you must specify @code{%prec} +for all rules that participate in precedence conflict resolution. +Then you will see any shift/reduce conflict until you tell Bison how +to resolve it, either by changing your grammar or by adding an +explicit precedence. This will probably add declarations to the +grammar, but it helps to protect against incorrect rule precedences. + +The effect of @code{%no-default-prec;} can be reversed by giving +@code{%default-prec;}, which is the default. +@end ifset + @node Parser States @section Parser States @cindex finite-state machine @@ -6363,6 +6405,10 @@ are addressed. 
@menu * Parser Stack Overflow:: Breaking the Stack Limits +* How Can I Reset the Parser:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings +* C++ Parsers:: Compiling Parsers with C++ Compilers +* Implementing Loops:: Control Flow in the Calculator @end menu @node Parser Stack Overflow @@ -6376,6 +6422,204 @@ message. What can I do? This question is already addressed elsewhere, @xref{Recursion, ,Recursive Rules}. +@node How Can I Reset the Parser +@section How Can I Reset the Parser + +The following phenomenon has several symptoms, resulting in the +following typical questions: + +@display +I invoke @code{yyparse} several times, and on correct input it works +properly; but when a parse error is found, all the other calls fail +too. How can I reset the error flag of @code{yyparse}? +@end display + +@noindent +or + +@display +My parser includes support for an @samp{#include}-like feature, in +which case I run @code{yyparse} from @code{yyparse}. This fails +although I did specify I needed a @code{%pure-parser}. +@end display + +These problems typically come not from Bison itself, but from +Lex-generated scanners. Because these scanners use large buffers for +speed, they might not notice a change of input file. As a +demonstration, consider the following source file, +@file{first-line.l}: + +@verbatim +%{ +#include <stdio.h> +#include <stdlib.h> +%} +%% +.*\n ECHO; return 1; +%% +int +yyparse (char const *file) +{ + yyin = fopen (file, "r"); + if (!yyin) + exit (2); + /* One token only. */ + yylex (); + if (fclose (yyin) != 0) + exit (3); + return 0; +} + +int +main (void) +{ + yyparse ("input"); + yyparse ("input"); + return 0; +} +@end verbatim + +@noindent +If the file @file{input} contains + +@verbatim +input:1: Hello, +input:2: World! 
+@end verbatim + +@noindent +then instead of getting the first line twice, you get: + +@example +$ @kbd{flex -ofirst-line.c first-line.l} +$ @kbd{gcc -ofirst-line first-line.c -ll} +$ @kbd{./first-line} +input:1: Hello, +input:2: World! +@end example + +Therefore, whenever you change @code{yyin}, you must tell the +Lex-generated scanner to discard its current buffer and switch to the +new one. This depends upon your implementation of Lex; see its +documentation for more. For Flex, it suffices to call +@samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your +Flex-generated scanner needs to read from several input streams to +handle features like include files, you might consider using Flex +functions like @samp{yy_switch_to_buffer} that manipulate multiple +input buffers. + +@node Strings are Destroyed +@section Strings are Destroyed + +@display +My parser seems to destroy old strings, or maybe it loses track of +them. Instead of reporting @samp{"foo", "bar"}, it reports +@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}. +@end display + +This error is probably the single most frequent ``bug report'' sent to +Bison lists, but is only concerned with a misunderstanding of the role +of scanner. Consider the following Lex code: + +@verbatim +%{ +#include <stdio.h> +char *yylval = NULL; +%} +%% +.* yylval = yytext; return 1; +\n /* IGNORE */ +%% +int +main () +{ + /* Similar to using $1, $2 in a Bison action. */ + char *fst = (yylex (), yylval); + char *snd = (yylex (), yylval); + printf ("\"%s\", \"%s\"\n", fst, snd); + return 0; +} +@end verbatim + +If you compile and run this code, you get: + +@example +$ @kbd{flex -osplit-lines.c split-lines.l} +$ @kbd{gcc -osplit-lines split-lines.c -ll} +$ @kbd{printf 'one\ntwo\n' | ./split-lines} +"one +two", "two" +@end example + +@noindent +this is because @code{yytext} is a buffer provided for @emph{reading} +in the action, but if you want to keep it, you have to duplicate it +(e.g., using @code{strdup}). 
Note that the output may depend on how +your implementation of Lex handles @code{yytext}. For instance, when +given the Lex compatibility option @option{-l} (which triggers the +option @samp{%array}) Flex generates a different behavior: + +@example +$ @kbd{flex -l -osplit-lines.c split-lines.l} +$ @kbd{gcc -osplit-lines split-lines.c -ll} +$ @kbd{printf 'one\ntwo\n' | ./split-lines} +"two", "two" +@end example + + +@node C++ Parsers +@section C++ Parsers + +@display +How can I generate parsers in C++? +@end display + +We are working on a C++ output for Bison, but unfortunately, for lack +of time, the skeleton is not finished. It is functional, but in +numerous respects, it will require additional work which @emph{might} +break backward compatibility. Since the skeleton for C++ is not +documented, we do not consider ourselves bound to this interface, +nevertheless, as much as possible we will try to keep compatibility. + +Another possibility is to use the regular C parsers, and to compile +them with a C++ compiler. This works properly, provided that you bear +some simple C++ rules in mind, such as not including ``real classes'' +(i.e., structure with constructors) in unions. Therefore, in the +@code{%union}, use pointers to classes, or better yet, a single +pointer type to the root of your lexical/syntactic hierarchy. + + +@node Implementing Loops +@section Implementing Loops + +@display +My simple calculator supports variables, assignments, and functions, +but how can I implement loops? +@end display + +Although very pedagogical, the examples included in the document blur +the distinction to make between the parser---whose job is to recover +the structure of a text and to transmit it to subsequent modules of +the program---and the processing (such as the execution) of this +structure. This works well with so called straight line programs, +i.e., precisely those that have a straightforward execution model: +execute simple instructions one after the others. 
+ +@cindex abstract syntax tree +@cindex @acronym{AST} +If you want a richer model, you will probably need to use the parser +to construct a tree that does represent the structure it has +recovered; this tree is usually called the @dfn{abstract syntax tree}, +or @dfn{@acronym{AST}} for short. Then, walking through this tree, +traversing it in various ways, will enable treatments such as its +execution or its translation, which will result in an interpreter or a +compiler. + +This topic is way beyond the scope of this manual, and the reader is +invited to consult the dedicated literature. + + + @c ================================================= Table of Symbols @node Table of Symbols @@ -6562,8 +6806,8 @@ External variable in which @code{yylex} should place the line and column numbers associated with a token. (In a pure parser, it is a local variable within @code{yyparse}, and its address is passed to @code{yylex}.) You can ignore this variable if you don't use the -@samp{@@} feature in the grammar actions. @xref{Token Positions, -,Textual Positions of Tokens}. +@samp{@@} feature in the grammar actions. @xref{Token Locations, +,Textual Locations of Tokens}. @end deffn @deffn {Variable} yynerrs @@ -6581,6 +6825,14 @@ parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. Equip the parser for debugging. @xref{Decl Summary}. @end deffn +@ifset defaultprec +@deffn {Directive} %default-prec +Assign a precedence to rules that lack an explicit @samp{%prec} +modifier. @xref{Contextual Precedence, ,Context-Dependent +Precedence}. +@end deffn +@end ifset + @deffn {Directive} %defines Bison declaration to create a header file meant for the scanner. @xref{Decl Summary}. @@ -6617,13 +6869,11 @@ Bison declaration to assign left associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. 
@end deffn -@ifset documentparam @deffn {Directive} %lex-param @{@var{argument-declaration}@} Bison declaration to specifying an additional parameter that @code{yylex} should accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}. @end deffn -@end ifset @deffn {Directive} %merge Bison declaration to assign a merging function to a rule. If there is a @@ -6636,6 +6886,14 @@ function is applied to the two semantic values to get a single result. Bison declaration to rename the external symbols. @xref{Decl Summary}. @end deffn +@ifset defaultprec +@deffn {Directive} %no-default-prec +Do not assign a precedence to rules that lack an explicit @samp{%prec} +modifier. @xref{Contextual Precedence, ,Context-Dependent +Precedence}. +@end deffn +@end ifset + @deffn {Directive} %no-lines Bison declaration to avoid generating @code{#line} directives in the parser file. @xref{Decl Summary}. @@ -6651,13 +6909,11 @@ Bison declaration to set the name of the parser file. @xref{Decl Summary}. @end deffn -@ifset documentparam @deffn {Directive} %parse-param @{@var{argument-declaration}@} Bison declaration to specifying an additional parameter that @code{yyparse} should accept. @xref{Parser Function,, The Parser Function @code{yyparse}}. @end deffn -@end ifset @deffn {Directive} %prec Bison declaration to assign a precedence to a specific rule. @@ -6921,3 +7177,33 @@ grammatically indivisible. The piece of text it represents is a token. 
@printindex cp @bye + +@c LocalWords: texinfo setfilename settitle setchapternewpage finalout +@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex +@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry +@c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa +@c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc +@c LocalWords: rpcalc Lexer Gen Comp Expr ltcalc mfcalc Decl Symtab yylex +@c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref +@c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex +@c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge +@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG +@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit +@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok +@c LocalWords: longjmp fprintf stderr preg yylloc YYLTYPE cos ln +@c LocalWords: smallexample symrec val tptr FNCT fnctptr func struct sym +@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof +@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum +@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype +@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless typefull yynerrs +@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES +@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param +@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP +@c LocalWords: YYEMPTY YYRECOVERING yyclearin GE def UMINUS maybeword +@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH +@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm VCG notype +@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args +@c LocalWords: YYPRINTF infile ypp yxx outfile itemx vcg tex leaderfill +@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc 
ll +@c LocalWords: yyrestart nbar yytext fst snd osplit ntwo strdup AST +@c LocalWords: YYSTACK DVI fdl printindex