X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/1386333308dbb0d5c2ee7b1d20dd47096ca711be..dc2825ae898c526ea3619450112b2ea44f437151:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 72c7c522..68b24c41 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -125,7 +125,8 @@ instead of in the original English. @sp 2 Cover art by Etienne Suvasa. @end titlepage -@page + +@contents @node Top, Introduction, (dir), (dir) @@ -317,7 +318,7 @@ chapters follow which describe specific aspects of Bison in detail. Bison was written primarily by Robert Corbett; Richard Stallman made it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added -multicharacter string literals and other features. +multi-character string literals and other features. This edition corresponds to version @value{VERSION} of Bison. @@ -757,6 +758,7 @@ use Bison or Yacc, we suggest you start by reading this chapter carefully. a semantic value (the value of an integer, the name of an identifier, etc.). * Semantic Actions:: Each rule can have an action containing C code. +* Locations Overview:: Tracking Locations. * Bison Parser:: What are Bison's input and output, how is the output used? * Stages:: Stages in writing and running Bison grammars. @@ -959,7 +961,7 @@ semantic value that is a number. In a compiler for a programming language, an expression typically has a semantic value that is a tree structure describing the meaning of the expression. -@node Semantic Actions, Bison Parser, Semantic Values, Concepts +@node Semantic Actions, Locations Overview, Semantic Values, Concepts @section Semantic Actions @cindex semantic actions @cindex actions, semantic @@ -990,7 +992,36 @@ expr: expr '+' expr @{ $$ = $1 + $3; @} The action says how to produce the semantic value of the sum expression from the values of the two subexpressions. -@node Bison Parser, Stages, Semantic Actions, Concepts +@node Locations Overview, Bison Parser, Semantic Actions, Concepts +@section Locations +@cindex location +@cindex textual position +@cindex position, textual + +Many applications, like interpreters or compilers, have to produce verbose +and useful error messages. To achieve this, one must be able to keep track of +the @dfn{textual position}, or @dfn{location}, of each syntactic construct. +Bison provides a mechanism for handling these locations. + +Each token has a semantic value. In a similar fashion, each token has an +associated location, but the type of locations is the same for all tokens and +groupings. Moreover, the output parser is equipped with a default data +structure for storing locations (@pxref{Locations}, for more details). + +Like semantic values, locations can be reached in actions using a dedicated +set of constructs. In the example above, the location of the whole grouping +is @code{@@$}, while the locations of the subexpressions are @code{@@1} and +@code{@@3}. + +When a rule is matched, a default action is used to compute the semantic value +of its left hand side (@pxref{Actions}). In the same way, another default +action is used for locations. However, the action for locations is general +enough for most cases, meaning there is usually no need to describe for each +rule how @code{@@$} should be formed. When building a new location for a given +grouping, the default behavior of the output parser is to take the beginning +of the first symbol, and the end of the last symbol. + +@node Bison Parser, Stages, Locations Overview, Concepts @section Bison Output: the Parser File @cindex Bison parser @cindex Bison utility @@ -1510,11 +1541,12 @@ real calculator, but it is adequate for the first example. @subsection Running Bison to Make the Parser @cindex running Bison (introduction) -Before running Bison to produce a parser, we need to decide how to arrange -all the source code in one or more source files. For such a simple example, -the easiest thing is to put everything in one file. The definitions of -@code{yylex}, @code{yyerror} and @code{main} go at the end, in the -``additional C code'' section of the file (@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}). +Before running Bison to produce a parser, we need to decide how to +arrange all the source code in one or more source files. For such a +simple example, the easiest thing is to put everything in one file. The +definitions of @code{yylex}, @code{yyerror} and @code{main} go at the +end, in the ``additional C code'' section of the file (@pxref{Grammar +Layout, ,The Overall Layout of a Bison Grammar}). For a large project, you would probably have several source files, and use @code{make} to arrange to recompile them. @@ -1628,8 +1660,8 @@ exp: NUM @{ $$ = $1; @} @end example @noindent -The functions @code{yylex}, @code{yyerror} and @code{main} can be the same -as before. +The functions @code{yylex}, @code{yyerror} and @code{main} can be the +same as before. There are two important new features shown in this code. @@ -1671,10 +1703,10 @@ Here is a sample run of @file{calc.y}: Up to this point, this manual has not addressed the issue of @dfn{error recovery}---how to continue parsing after the parser detects a syntax -error. All we have handled is error reporting with @code{yyerror}. Recall -that by default @code{yyparse} returns after calling @code{yyerror}. This -means that an erroneous input line causes the calculator program to exit. -Now we show how to rectify this deficiency. +error. All we have handled is error reporting with @code{yyerror}. +Recall that by default @code{yyparse} returns after calling +@code{yyerror}. This means that an erroneous input line causes the +calculator program to exit. Now we show how to rectify this deficiency. The Bison language itself includes the reserved word @code{error}, which may be included in the grammar rules. In the example below it has @@ -1689,14 +1721,15 @@ line: '\n' @end group @end example -This addition to the grammar allows for simple error recovery in the event -of a parse error. If an expression that cannot be evaluated is read, the -error will be recognized by the third rule for @code{line}, and parsing -will continue. (The @code{yyerror} function is still called upon to print -its message as well.) The action executes the statement @code{yyerrok}, a -macro defined automatically by Bison; its meaning is that error recovery is -complete (@pxref{Error Recovery}). Note the difference between -@code{yyerrok} and @code{yyerror}; neither one is a misprint.@refill +This addition to the grammar allows for simple error recovery in the +event of a parse error. If an expression that cannot be evaluated is +read, the error will be recognized by the third rule for @code{line}, +and parsing will continue. (The @code{yyerror} function is still called +upon to print its message as well.) The action executes the statement +@code{yyerrok}, a macro defined automatically by Bison; its meaning is +that error recovery is complete (@pxref{Error Recovery}). Note the +difference between @code{yyerrok} and @code{yyerror}; neither one is a +misprint.@refill This form of error recovery deals with syntax errors. There are other kinds of errors; for example, division by zero, which raises an exception @@ -1856,17 +1889,20 @@ The symbol table itself consists of a linked list of records. Its definition, which is kept in the header @file{calc.h}, is as follows. It provides for either functions or variables to be placed in the table. -@c FIXME: ANSIfy the prototypes for FNCTPTR etc. @smallexample @group +/* Fonctions type. */ +typedef double (*func_t) (double); + /* Data type for links in the chain of symbols. */ struct symrec @{ char *name; /* name of symbol */ int type; /* type of symbol: either VAR or FNCT */ - union @{ - double var; /* value of a VAR */ - double (*fnctptr)(); /* value of a FNCT */ + union + @{ + double var; /* value of a VAR */ + func_t fnctptr; /* value of a FNCT */ @} value; struct symrec *next; /* link field */ @}; @@ -1878,8 +1914,8 @@ typedef struct symrec symrec; /* The symbol table: a chain of `struct symrec'. */ extern symrec *sym_table; -symrec *putsym (); -symrec *getsym (); +symrec *putsym (const char *, func_t); +symrec *getsym (const char *); @end group @end smallexample @@ -1909,24 +1945,24 @@ yyerror (const char *s) /* Called by yyparse on error */ struct init @{ char *fname; - double (*fnct)(); + double (*fnct)(double); @}; @end group @group struct init arith_fncts[] = @{ - "sin", sin, - "cos", cos, + "sin", sin, + "cos", cos, "atan", atan, - "ln", log, - "exp", exp, + "ln", log, + "exp", exp, "sqrt", sqrt, 0, 0 @}; /* The symbol table: a chain of `struct symrec'. */ -symrec *sym_table = (symrec *)0; +symrec *sym_table = (symrec *) 0; @end group @group @@ -1984,7 +2020,7 @@ getsym (const char *sym_name) The function @code{yylex} must now recognize variables, numeric values, and the single-character arithmetic operators. Strings of alphanumeric -characters with a leading nondigit are recognized as either variables or +characters with a leading non-digit are recognized as either variables or functions depending on what the symbol table says about them. The string is passed to @code{getsym} for look up in the symbol table. If @@ -2106,6 +2142,7 @@ Bison takes as input a context-free grammar specification and produces a C-language function that recognizes correct instances of the grammar. The Bison grammar input file conventionally has a name ending in @samp{.y}. +@xref{Invocation, ,Invoking Bison}. @menu * Grammar Outline:: Overall layout of the grammar file. @@ -2113,6 +2150,7 @@ The Bison grammar input file conventionally has a name ending in @samp{.y}. * Rules:: How to write grammar rules. * Recursion:: Writing recursive rules. * Semantics:: Semantic values and actions. +* Locations:: Locations and actions. * Declarations:: All kinds of Bison declarations are described here. * Multiple Parsers:: Putting more than one Bison parser in one program. @end menu @@ -2186,12 +2224,13 @@ if it is the first thing in the file. @cindex additional C code section @cindex C code, section for additional -The @var{additional C code} section is copied verbatim to the end of -the parser file, just as the @var{C declarations} section is copied to -the beginning. This is the most convenient place to put anything -that you want to have in the parser file but which need not come before -the definition of @code{yyparse}. For example, the definitions of -@code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser C-Language Interface}. +The @var{additional C code} section is copied verbatim to the end of the +parser file, just as the @var{C declarations} section is copied to the +beginning. This is the most convenient place to put anything that you +want to have in the parser file but which need not come before the +definition of @code{yyparse}. For example, the definitions of +@code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser +C-Language Interface}. If the last section is empty, you may omit the @samp{%%} that separates it from the grammar rules. @@ -2266,7 +2305,7 @@ for @code{yylex}}). A @dfn{literal string token} is written like a C string constant; for example, @code{"<="} is a literal string token. A literal string token doesn't need to be declared unless you need to specify its semantic -value data type (@pxref{Value Type}), associativity, precedence +value data type (@pxref{Value Type}), associativity, or precedence (@pxref{Precedence}). You can associate the literal string token with a symbolic name as an @@ -2477,7 +2516,7 @@ primary: constant defines two mutually-recursive nonterminals, since each refers to the other. -@node Semantics, Declarations, Recursion, Grammar File +@node Semantics, Locations, Recursion, Grammar File @section Defining Language Semantics @cindex defining language semantics @cindex language semantics, defining @@ -2540,10 +2579,11 @@ Specify the entire collection of possible data types, with the @code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}). @item -Choose one of those types for each symbol (terminal or nonterminal) -for which semantic values are used. This is done for tokens with the -@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings -with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}). +Choose one of those types for each symbol (terminal or nonterminal) for +which semantic values are used. This is done for tokens with the +@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) +and for groupings with the @code{%type} Bison declaration (@pxref{Type +Decl, ,Nonterminal Symbols}). @end itemize @node Actions, Action Types, Multiple Types, Semantics @@ -2829,7 +2869,119 @@ the action is now at the end of its rule. Any mid-rule action can be converted to an end-of-rule action in this way, and this is what Bison actually does to implement mid-rule actions. -@node Declarations, Multiple Parsers, Semantics, Grammar File +@node Locations, Declarations, Semantics, Grammar File +@section Tracking Locations +@cindex location +@cindex textual position +@cindex position, textual + +Though grammar rules and semantic actions are enough to write a fully +functional parser, it can be useful to process some additionnal informations, +especially locations of tokens and groupings. + +The way locations are handled is defined by providing a data type, and actions +to take when rules are matched. + +@menu +* Location Type:: Specifying a data type for locations. +* Actions and Locations:: Using locations in actions. +* Location Default Action:: Defining a general way to compute locations. +@end menu + +@node Location Type, Actions and Locations, , Locations +@subsection Data Type of Locations +@cindex data type of locations +@cindex default location type + +Defining a data type for locations is much simpler than for semantic values, +since all tokens and groupings always use the same type. + +The type of locations is specified by defining a macro called @code{YYLTYPE}. +When @code{YYLTYPE} is not defined, Bison uses a default structure type with +four members: + +@example +struct +@{ + int first_line; + int first_column; + int last_line; + int last_column; +@} +@end example + +@node Actions and Locations, Location Default Action, Location Type, Locations +@subsection Actions and Locations +@cindex location actions +@cindex actions, location +@vindex @@$ +@vindex @@@var{n} + +Actions are not only useful for defining language semantics, but also for +describing the behavior of the output parser with locations. + +The most obvious way for building locations of syntactic groupings is very +similar to the way semantic values are computed. In a given rule, several +constructs can be used to access the locations of the elements being matched. +The location of the @var{n}th component of the right hand side is +@code{@@@var{n}}, while the location of the left hand side grouping is +@code{@@$}. + +Here is a simple example using the default data type for locations: + +@example +@group +exp: @dots{} + | exp '+' exp + @{ + @@$.last_column = @@3.last_column; + @@$.last_line = @@3.last_line; + $$ = $1 + $3; + @} +@end group +@end example + +@noindent +In the example above, there is no need to set the beginning of @code{@@$}. The +output parser always sets @code{@@$} to @code{@@1} before executing the C +code of a given action, whether you provide a processing for locations or not. + +@node Location Default Action, , Actions and Locations, Locations +@subsection Default Action for Locations +@vindex YYLLOC_DEFAULT + +Actually, actions are not the best place to compute locations. Since locations +are much more general than semantic values, there is room in the output parser +to define a default action to take for each rule. The @code{YYLLOC_DEFAULT} +macro is called each time a rule is matched, before the associated action is +run. + +@c Documentation for the old (?) YYLLOC_DEFAULT + +This macro takes two parameters, the first one being the location of the +grouping (the result of the computation), and the second one being the +location of the last element matched. Of course, before @code{YYLLOC_DEFAULT} +is run, the result is set to the location of the first component matched. + +By default, this macro computes a location that ranges from the beginning of +the first element to the end of the last element. It is defined this way: + +@example +@group +#define YYLLOC_DEFAULT(Current, Last) \ + Current.last_line = Last.last_line; \ + Current.last_column = Last.last_column; +@end group +@end example + +@c not Documentation for the old (?) YYLLOC_DEFAULT + +@noindent + +Most of the time, the default action for locations is general enough to +suppress location dedicated code from most actions. + +@node Declarations, Multiple Parsers, Locations, Grammar File @section Bison Declarations @cindex declarations, Bison @cindex Bison declarations @@ -2875,9 +3027,10 @@ Bison will convert this into a @code{#define} directive in the parser, so that the function @code{yylex} (if it is in this file) can use the name @var{name} to stand for this token type's code. -Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc} -instead of @code{%token}, if you wish to specify associativity and precedence. -@xref{Precedence Decl, ,Operator Precedence}. +Alternatively, you can use @code{%left}, @code{%right}, or +@code{%nonassoc} instead of @code{%token}, if you wish to specify +associativity and precedence. @xref{Precedence Decl, ,Operator +Precedence}. You can explicitly specify the numeric code for a token type by appending an integer value in the field immediately following the token name: @@ -3110,8 +3263,8 @@ may override this restriction with the @code{%start} declaration as follows: A @dfn{reentrant} program is one which does not alter in the course of execution; in other words, it consists entirely of @dfn{pure} (read-only) code. Reentrancy is important whenever asynchronous execution is possible; -for example, a nonreentrant program may not be safe to call from a signal -handler. In systems with multiple threads of control, a nonreentrant +for example, a non-reentrant program may not be safe to call from a signal +handler. In systems with multiple threads of control, a non-reentrant program must be called only within interlocks. Normally, Bison generates a parser which is not reentrant. This is @@ -3176,14 +3329,37 @@ Declare the type of semantic values for a nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}). @item %start -Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}). +Specify the grammar's start symbol (@pxref{Start Decl, ,The +Start-Symbol}). @item %expect Declare the expected number of shift-reduce conflicts (@pxref{Expect Decl, ,Suppressing Conflict Warnings}). +@item %yacc +@itemx %fixed_output_files +Pretend the option @option{--yacc} was given, i.e., imitate Yacc, +including its naming conventions. @xref{Bison Options}, for more. + +@item %locations +Generate the code processing the locations (@pxref{Action Features, +,Special Features for Use in Actions}). This mode is enabled as soon as +the grammar uses the special @samp{@@@var{n}} tokens, but if your +grammar does not use it, using @samp{%locations} allows for more +accurate parse error messages. + @item %pure_parser -Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure +(Reentrant) Parser}). + +@item %no_parser +Do not include any C code in the parser file; generate tables only. The +parser file contains just @code{#define} directives and static variable +declarations. + +This option also tells Bison to write the C code for the grammar actions +into a file named @file{@var{filename}.act}, in the form of a +brace-surrounded body fit for a @code{switch} statement. @item %no_lines Don't generate any @code{#line} preprocessor commands in the parser @@ -3193,12 +3369,38 @@ your source file (the grammar file). This directive causes them to associate errors with the parser file, treating it an independent source file in its own right. -@item %raw -The output file @file{@var{name}.h} normally defines the tokens with -Yacc-compatible token numbers. If this option is specified, the -internal Bison numbers are used instead. (Yacc-compatible numbers start -at 257 except for single-character tokens; Bison assigns token numbers -sequentially for all tokens starting at 3.) +@item %debug +Output a definition of the macro @code{YYDEBUG} into the parser file, so +that the debugging facilities are compiled. @xref{Debugging, ,Debugging +Your Parser}. + +@item %defines +Write an extra output file containing macro definitions for the token +type names defined in the grammar and the semantic value type +@code{YYSTYPE}, as well as a few @code{extern} variable declarations. + +If the parser output file is named @file{@var{name}.c} then this file +is named @file{@var{name}.h}.@refill + +This output file is essential if you wish to put the definition of +@code{yylex} in a separate source file, because @code{yylex} needs to +be able to refer to token type codes and the variable +@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill + +@item %verbose +Write an extra output file containing verbose descriptions of the +parser states and what is done for each type of look-ahead token in +that state. + +This file also describes all the conflicts, both those resolved by +operator precedence and the unresolved ones. + +The file's name is made by removing @samp{.tab.c} or @samp{.c} from +the parser output file name, and adding @samp{.output} instead.@refill + +Therefore, if the input file is @file{foo.y}, then the parser file is +called @file{foo.tab.c} by default. As a consequence, the verbose +output file is called @file{foo.output}.@refill @item %token_table Generate an array of token names in the parser file. The name of the @@ -3293,8 +3495,8 @@ C code in the grammar file, you are likely to run into trouble. You call the function @code{yyparse} to cause parsing to occur. This function reads tokens, executes actions, and ultimately returns when it encounters end-of-input or an unrecoverable syntax error. You can also -write an action which directs @code{yyparse} to return immediately without -reading further. +write an action which directs @code{yyparse} to return immediately +without reading further. The value returned by @code{yyparse} is 0 if parsing was successful (return is due to end-of-input). @@ -3424,7 +3626,7 @@ The @code{yytname} table is generated only if you use the @subsection Semantic Values of Tokens @vindex yylval -In an ordinary (nonreentrant) parser, the semantic value of the token must +In an ordinary (non-reentrant) parser, the semantic value of the token must be stored into the global variable @code{yylval}. When you are using just one data type for semantic values, @code{yylval} has that type. Thus, if the type is @code{int} (the default), you might write this in @@ -3470,16 +3672,19 @@ then the code in @code{yylex} might look like this: @subsection Textual Positions of Tokens @vindex yylloc -If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in -actions to keep track of the textual locations of tokens and groupings, -then you must provide this information in @code{yylex}. The function -@code{yyparse} expects to find the textual location of a token just parsed -in the global variable @code{yylloc}. So @code{yylex} must store the -proper data in that variable. The value of @code{yylloc} is a structure -and you need only initialize the members that are going to be used by the -actions. The four members are called @code{first_line}, -@code{first_column}, @code{last_line} and @code{last_column}. Note that -the use of this feature makes the parser noticeably slower. +If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , +Tracking Locations}) in actions to keep track of the +textual locations of tokens and groupings, then you must provide this +information in @code{yylex}. The function @code{yyparse} expects to +find the textual location of a token just parsed in the global variable +@code{yylloc}. So @code{yylex} must store the proper data in that +variable. + +By default, the value of @code{yylloc} is a structure and you need only +initialize the members that are going to be used by the actions. The +four members are called @code{first_line}, @code{first_column}, +@code{last_line} and @code{last_column}. Note that the use of this +feature makes the parser noticeably slower. @tindex YYLTYPE The data type of @code{yylloc} has the name @code{YYLTYPE}. @@ -3601,7 +3806,8 @@ with no arguments, as usual. The Bison parser detects a @dfn{parse error} or @dfn{syntax error} whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the -macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use in Actions}). +macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use +in Actions}). The Bison parser expects to report the error by calling an error reporting function named @code{yyerror}, which you must supply. It is @@ -3611,10 +3817,11 @@ receives one argument. For a parse error, the string is normally @findex YYERROR_VERBOSE If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations -section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose -and specific error message string instead of just plain @w{@code{"parse -error"}}. It doesn't matter what definition you use for -@code{YYERROR_VERBOSE}, just whether you define it. +section (@pxref{Bison Declarations, ,The Bison Declarations Section}), +then Bison provides a more verbose and specific error message string +instead of just plain @w{@code{"parse error"}}. It doesn't matter what +definition you use for @code{YYERROR_VERBOSE}, just whether you define +it. The parser can detect one other kind of error: stack overflow. This happens when the input contains constructions that are very deeply @@ -3730,28 +3937,37 @@ Resume generating error messages immediately for subsequent syntax errors. This is useful primarily in error rules. @xref{Error Recovery}. -@item @@@var{n} -@findex @@@var{n} -Acts like a structure variable containing information on the line -numbers and column numbers of the @var{n}th component of the current -rule. The structure has four members, like this: +@item @@$ +@findex @@$ +Acts like a structure variable containing information on the textual position +of the grouping made by the current rule. @xref{Locations, , +Tracking Locations}. -@example -struct @{ - int first_line, last_line; - int first_column, last_column; -@}; -@end example +@c Check if those paragraphs are still useful or not. + +@c @example +@c struct @{ +@c int first_line, last_line; +@c int first_column, last_column; +@c @}; +@c @end example -Thus, to get the starting line number of the third component, you would -use @samp{@@3.first_line}. +@c Thus, to get the starting line number of the third component, you would +@c use @samp{@@3.first_line}. -In order for the members of this structure to contain valid information, -you must make @code{yylex} supply this information about each token. -If you need only certain members, then @code{yylex} need only fill in -those members. +@c In order for the members of this structure to contain valid information, +@c you must make @code{yylex} supply this information about each token. +@c If you need only certain members, then @code{yylex} need only fill in +@c those members. + +@c The use of this feature makes the parser noticeably slower. + +@item @@@var{n} +@findex @@@var{n} +Acts like a structure variable containing information on the textual position +of the @var{n}th component of the current rule. @xref{Locations, , +Tracking Locations}. -The use of this feature makes the parser noticeably slower. @end table @node Algorithm, Error Recovery, Interface, Top @@ -4008,33 +4224,33 @@ expr: expr '-' expr @noindent Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2}; -should it reduce them via the rule for the subtraction operator? It depends -on the next token. Of course, if the next token is @samp{)}, we must -reduce; shifting is invalid because no single rule can reduce the token -sequence @w{@samp{- 2 )}} or anything starting with that. But if the next -token is @samp{*} or @samp{<}, we have a choice: either shifting or -reduction would allow the parse to complete, but with different -results. - -To decide which one Bison should do, we must consider the -results. If the next operator token @var{op} is shifted, then it -must be reduced first in order to permit another opportunity to -reduce the difference. The result is (in effect) @w{@samp{1 - (2 -@var{op} 3)}}. On the other hand, if the subtraction is reduced -before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op} -3}}. Clearly, then, the choice of shift or reduce should depend -on the relative precedence of the operators @samp{-} and -@var{op}: @samp{*} should be shifted first, but not @samp{<}. +should it reduce them via the rule for the subtraction operator? It +depends on the next token. Of course, if the next token is @samp{)}, we +must reduce; shifting is invalid because no single rule can reduce the +token sequence @w{@samp{- 2 )}} or anything starting with that. But if +the next token is @samp{*} or @samp{<}, we have a choice: either +shifting or reduction would allow the parse to complete, but with +different results. + +To decide which one Bison should do, we must consider the results. If +the next operator token @var{op} is shifted, then it must be reduced +first in order to permit another opportunity to reduce the difference. +The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other +hand, if the subtraction is reduced before shifting @var{op}, the result +is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or +reduce should depend on the relative precedence of the operators +@samp{-} and @var{op}: @samp{*} should be shifted first, but not +@samp{<}. @cindex associativity What about input such as @w{@samp{1 - 2 - 5}}; should this be -@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For -most operators we prefer the former, which is called @dfn{left -association}. The latter alternative, @dfn{right association}, is -desirable for assignment operators. The choice of left or right -association is a matter of whether the parser chooses to shift or -reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead -token is @samp{-}: shifting makes right-associativity. +@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most +operators we prefer the former, which is called @dfn{left association}. +The latter alternative, @dfn{right association}, is desirable for +assignment operators. The choice of left or right association is a +matter of whether the parser chooses to shift or reduce when the stack +contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting +makes right-associativity. @node Using Precedence, Precedence Examples, Why Precedence, Precedence @subsection Specifying Operator Precedence @@ -4626,10 +4842,10 @@ Unfortunately, the name being declared is separated from the declaration construct itself by a complicated syntactic structure---the ``declarator''. As a result, part of the Bison parser for C needs to be duplicated, with -all the nonterminal names changed: once for parsing a declaration in which -a typedef name can be redefined, and once for parsing a declaration in -which that can't be done. Here is a part of the duplication, with actions -omitted for brevity: +all the nonterminal names changed: once for parsing a declaration in +which a typedef name can be redefined, and once for parsing a +declaration in which that can't be done. Here is a part of the +duplication, with actions omitted for brevity: @example initdcl: @@ -4864,7 +5080,26 @@ Here @var{infile} is the grammar file name, which usually ends in @samp{.y}. The parser file's name is made by replacing the @samp{.y} with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields @file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields -@file{hack/foo.tab.c}.@refill +@file{hack/foo.tab.c}. It's is also possible, in case you are writting +C++ code instead of C in your grammar file, to name it @file{foo.ypp} +or @file{foo.y++}. Then, the output files will take an extention like +the given one as input (repectively @file{foo.tab.cpp} and @file{foo.tab.c++}). +This feature takes effect with all options that manipulate filenames like +@samp{-o} or @samp{-d}. + +For example : + +@example +bison -d @var{infile.yxx} +@end example +will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}. and + +@example +bison -d @var{infile.y} -o @var{output.c++} +@end example +will produce @file{output.c++} and @file{outfile.h++}. + +@refill @menu * Bison Options:: All the options described in detail, @@ -4888,50 +5123,50 @@ Here is a list of options that can be used with Bison, alphabetized by short option. It is followed by a cross key alphabetized by long option. -@table @samp -@item -b @var{file-prefix} -@itemx --file-prefix=@var{prefix} -Specify a prefix to use for all Bison output file names. The names are -chosen as if the input file were named @file{@var{prefix}.c}. - -@item -d -@itemx --defines -Write an extra output file containing macro definitions for the token -type names defined in the grammar and the semantic value type -@code{YYSTYPE}, as well as a few @code{extern} variable declarations. +@c Please, keep this ordered as in `bison --help'. +@noindent +Operations modes: +@table @option +@item -h +@itemx --help +Print a summary of the command-line options to Bison and exit. -If the parser output file is named @file{@var{name}.c} then this file -is named @file{@var{name}.h}.@refill +@item -V +@itemx --version +Print the version number of Bison and exit. -This output file is essential if you wish to put the definition of -@code{yylex} in a separate source file, because @code{yylex} needs to -be able to refer to token type codes and the variable -@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill +@need 1750 +@item -y +@itemx --yacc +@itemx --fixed-output-files +Equivalent to @samp{-o y.tab.c}; the parser output file is called +@file{y.tab.c}, and the other outputs are called @file{y.output} and +@file{y.tab.h}. The purpose of this option is to imitate Yacc's output +file name conventions. Thus, the following shell script can substitute +for Yacc:@refill -@item -l -@itemx --no-lines -Don't put any @code{#line} preprocessor commands in the parser file. -Ordinarily Bison puts them in the parser file so that the C compiler -and debuggers will associate errors with your source file, the -grammar file. This option causes them to associate errors with the -parser file, treating it as an independent source file in its own right. +@example +bison -y $* +@end example +@end table -@item -n -@itemx --no-parser -Do not include any C code in the parser file; generate tables only. The -parser file contains just @code{#define} directives and static variable -declarations. +@noindent +Tuning the parser: -This option also tells Bison to write the C code for the grammar actions -into a file named @file{@var{filename}.act}, in the form of a -brace-surrounded body fit for a @code{switch} statement. +@table @option +@item -S @var{file} +@itemx --skeleton=@var{file} +Specify the skeleton to use. You probably don't need this option unless +you are developing Bison. -@item -o @var{outfile} -@itemx --output-file=@var{outfile} -Specify the name @var{outfile} for the parser file. +@item -t +@itemx --debug +Output a definition of the macro @code{YYDEBUG} into the parser file, so +that the debugging facilities are compiled. @xref{Debugging, ,Debugging +Your Parser}. -The other output files' names are constructed from @var{outfile} -as described under the @samp{-v} and @samp{-d} options. +@item --locations +Pretend that @code{%locactions} was specified. @xref{Decl Summary}. @item -p @var{prefix} @itemx --name-prefix=@var{prefix} @@ -4945,52 +5180,51 @@ For example, if you use @samp{-p c}, the names become @code{cparse}, @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. -@item -r -@itemx --raw -Pretend that @code{%raw} was specified. @xref{Decl Summary}. - -@item -t -@itemx --debug -Output a definition of the macro @code{YYDEBUG} into the parser file, -so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}. +@item -l +@itemx --no-lines +Don't put any @code{#line} preprocessor commands in the parser file. +Ordinarily Bison puts them in the parser file so that the C compiler +and debuggers will associate errors with your source file, the +grammar file. This option causes them to associate errors with the +parser file, treating it as an independent source file in its own right. -@item -v -@itemx --verbose -Write an extra output file containing verbose descriptions of the -parser states and what is done for each type of look-ahead token in -that state. +@item -n +@itemx --no-parser +Pretend that @code{%no_parser} was specified. @xref{Decl Summary}. -This file also describes all the conflicts, both those resolved by -operator precedence and the unresolved ones. +@item -k +@itemx --token-table +Pretend that @code{%token_table} was specified. @xref{Decl Summary}. +@end table -The file's name is made by removing @samp{.tab.c} or @samp{.c} from -the parser output file name, and adding @samp{.output} instead.@refill +@noindent +Adjust the output: -Therefore, if the input file is @file{foo.y}, then the parser file is -called @file{foo.tab.c} by default. As a consequence, the verbose -output file is called @file{foo.output}.@refill +@table @option +@item -d +@itemx --defines +Pretend that @code{%verbose} was specified, i.e., write an extra output +file containing macro definitions for the token type names defined in +the grammar and the semantic value type @code{YYSTYPE}, as well as a few +@code{extern} variable declarations. @xref{Decl Summary}. -@item -V -@itemx --version -Print the version number of Bison and exit. +@item -b @var{file-prefix} +@itemx --file-prefix=@var{prefix} +Specify a prefix to use for all Bison output file names. The names are +chosen as if the input file were named @file{@var{prefix}.c}. -@item -h -@itemx --help -Print a summary of the command-line options to Bison and exit. +@item -v +@itemx --verbose +Pretend that @code{%verbose} was specified, i.e, write an extra output +file containing verbose descriptions of the grammar and +parser. @xref{Decl Summary}, for more. -@need 1750 -@item -y -@itemx --yacc -@itemx --fixed-output-files -Equivalent to @samp{-o y.tab.c}; the parser output file is called -@file{y.tab.c}, and the other outputs are called @file{y.output} and -@file{y.tab.h}. The purpose of this option is to imitate Yacc's output -file name conventions. Thus, the following shell script can substitute -for Yacc:@refill +@item -o @var{outfile} +@itemx --output-file=@var{outfile} +Specify the name @var{outfile} for the parser file. -@example -bison -y $* -@end example +The other output files' names are constructed from @var{outfile} +as described under the @samp{-v} and @samp{-d} options. @end table @node Environment Variables, Option Cross Key, Bison Options, Invocation @@ -5037,7 +5271,6 @@ the corresponding short option. \line{ --no-lines \leaderfill -l} \line{ --no-parser \leaderfill -n} \line{ --output-file \leaderfill -o} -\line{ --raw \leaderfill -r} \line{ --token-table \leaderfill -k} \line{ --verbose \leaderfill -v} \line{ --version \leaderfill -V} @@ -5056,7 +5289,6 @@ the corresponding short option. --no-lines -l --no-parser -n --output-file=@var{outfile} -o @var{outfile} ---raw -r --token-table -k --verbose -v --version -V @@ -5110,7 +5342,8 @@ token is reset to the token that originally caused the violation. @item YYABORT Macro to pretend that an unrecoverable syntax error has occurred, by making @code{yyparse} return 1 immediately. The error reporting -function @code{yyerror} is not called. @xref{Parser Function, ,The Parser Function @code{yyparse}}. +function @code{yyerror} is not called. @xref{Parser Function, ,The +Parser Function @code{yyparse}}. @item YYACCEPT Macro to pretend that a complete utterance of the language has been @@ -5143,7 +5376,7 @@ Conventions for Pure Parsers}. @item YYLTYPE Macro for the data type of @code{yylloc}; a structure with four -members. @xref{Token Positions, ,Textual Positions of Tokens}. +members. @xref{Location Type, , Data Types of Locations}. @item yyltype Default value for YYLTYPE. @@ -5216,6 +5449,13 @@ Global variable which Bison increments each time there is a parse error. The parser function produced by Bison; call this function to start parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. +@item %debug +Equip the parser for debugging. @xref{Decl Summary}. + +@item %defines +Bison declaration to create a header file meant for the scanner. +@xref{Decl Summary}. + @item %left Bison declaration to assign left associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @@ -5225,7 +5465,7 @@ Bison declaration to avoid generating @code{#line} directives in the parser file. @xref{Decl Summary}. @item %nonassoc -Bison declaration to assign nonassociativity to token(s). +Bison declaration to assign non-associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @item %prec @@ -5236,11 +5476,6 @@ Bison declaration to assign a precedence to a specific rule. Bison declaration to request a pure (reentrant) parser. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. -@item %raw -Bison declaration to use Bison internal token code numbers in token -tables instead of the usual Yacc-compatible token code numbers. -@xref{Decl Summary}. - @item %right Bison declaration to assign right associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @@ -5273,15 +5508,17 @@ Bison declarations section or the additional C code section. @xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}. @item %@{ %@} -All code listed between @samp{%@{} and @samp{%@}} is copied directly -to the output file uninterpreted. Such code forms the ``C -declarations'' section of the input file. @xref{Grammar Outline, ,Outline of a Bison Grammar}. +All code listed between @samp{%@{} and @samp{%@}} is copied directly to +the output file uninterpreted. Such code forms the ``C declarations'' +section of the input file. @xref{Grammar Outline, ,Outline of a Bison +Grammar}. @item /*@dots{}*/ Comment delimiters, as in C. @item : -Separates a rule's result from its components. @xref{Rules, ,Syntax of Grammar Rules}. +Separates a rule's result from its components. @xref{Rules, ,Syntax of +Grammar Rules}. @item ; Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}. @@ -5298,13 +5535,15 @@ Separates alternate rules for the same result nonterminal. @table @asis @item Backus-Naur Form (BNF) Formal method of specifying context-free grammars. BNF was first used -in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, +,Languages and Context-Free Grammars}. @item Context-free grammars Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an expression, integers are allowed @emph{anywhere} an expression is -permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +permitted. @xref{Language and Grammar, ,Languages and Context-Free +Grammars}. @item Dynamic allocation Allocation of memory that occurs during execution, rather than at @@ -5345,8 +5584,9 @@ Operators having left associativity are analyzed from left to right: @samp{c}. @xref{Precedence, ,Operator Precedence}. @item Left recursion -A rule whose result symbol is also its first component symbol; -for example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive Rules}. +A rule whose result symbol is also its first component symbol; for +example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive +Rules}. @item Left-to-right parsing Parsing a sentence of a language by analyzing it token by token from @@ -5361,11 +5601,11 @@ A flag, set by actions in the grammar rules, which alters the way tokens are parsed. @xref{Lexical Tie-ins}. @item Literal string token -A token which consists of two or more fixed characters. -@xref{Symbols}. +A token which consists of two or more fixed characters. @xref{Symbols}. @item Look-ahead token -A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}. +A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead +Tokens}. @item LALR(1) The class of context-free grammars that Bison (like most other parser @@ -5396,7 +5636,8 @@ performs some operation. @item Reduction Replacing a string of nonterminals and/or terminals with a single -nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison Parser Algorithm }. +nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison +Parser Algorithm }. @item Reentrant A reentrant subprogram is a subprogram which can be in invoked any @@ -5407,8 +5648,9 @@ invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. A language in which all operators are postfix operators. @item Right recursion -A rule whose result symbol is also its last component symbol; -for example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive Rules}. +A rule whose result symbol is also its last component symbol; for +example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive +Rules}. @item Semantics In computer languages, the semantics are specified by the actions @@ -5442,9 +5684,9 @@ The input of the Bison parser is a stream of tokens which comes from the lexical analyzer. @xref{Symbols}. @item Terminal symbol -A grammar symbol that has no rules in the grammar and therefore -is grammatically indivisible. The piece of text it represents -is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +A grammar symbol that has no rules in the grammar and therefore is +grammatically indivisible. The piece of text it represents is a token. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. @end table @node Index, , Glossary, Top @@ -5452,6 +5694,4 @@ is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. @printindex cp -@contents - @bye