X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/cd5bd6ac4f0ee29eec5efd5c4659a29c37527c23..b77b9ee0a338cdcab3b5ab774cc8418f42e6d19f:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index c3783fb0..730d4b3a 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -758,6 +758,7 @@ use Bison or Yacc, we suggest you start by reading this chapter carefully. a semantic value (the value of an integer, the name of an identifier, etc.). * Semantic Actions:: Each rule can have an action containing C code. +* Locations Overview:: Tracking Locations. * Bison Parser:: What are Bison's input and output, how is the output used? * Stages:: Stages in writing and running Bison grammars. @@ -960,7 +961,7 @@ semantic value that is a number. In a compiler for a programming language, an expression typically has a semantic value that is a tree structure describing the meaning of the expression. -@node Semantic Actions, Bison Parser, Semantic Values, Concepts +@node Semantic Actions, Locations Overview, Semantic Values, Concepts @section Semantic Actions @cindex semantic actions @cindex actions, semantic @@ -991,7 +992,36 @@ expr: expr '+' expr @{ $$ = $1 + $3; @} The action says how to produce the semantic value of the sum expression from the values of the two subexpressions. -@node Bison Parser, Stages, Semantic Actions, Concepts +@node Locations Overview, Bison Parser, Semantic Actions, Concepts +@section Locations +@cindex location +@cindex textual position +@cindex position, textual + +Many applications, like interpreters or compilers, have to produce verbose +and useful error messages. To achieve this, one must be able to keep track of +the @dfn{textual position}, or @dfn{location}, of each syntactic construct. +Bison provides a mechanism for handling these locations. + +Each token has a semantic value. In a similar fashion, each token has an +associated location, but the type of locations is the same for all tokens and +groupings. Moreover, the output parser is equipped with a default data +structure for storing locations (@pxref{Locations}, for more details). + +Like semantic values, locations can be reached in actions using a dedicated +set of constructs. In the example above, the location of the whole grouping +is @code{@@$}, while the locations of the subexpressions are @code{@@1} and +@code{@@3}. + +When a rule is matched, a default action is used to compute the semantic value +of its left hand side (@pxref{Actions}). In the same way, another default +action is used for locations. However, the action for locations is general +enough for most cases, meaning there is usually no need to describe for each +rule how @code{@@$} should be formed. When building a new location for a given +grouping, the default behavior of the output parser is to take the beginning +of the first symbol, and the end of the last symbol. + +@node Bison Parser, Stages, Locations Overview, Concepts @section Bison Output: the Parser File @cindex Bison parser @cindex Bison utility @@ -1859,17 +1889,20 @@ The symbol table itself consists of a linked list of records. Its definition, which is kept in the header @file{calc.h}, is as follows. It provides for either functions or variables to be placed in the table. -@c FIXME: ANSIfy the prototypes for FNCTPTR etc. @smallexample @group +/* Fonctions type. */ +typedef double (*func_t) (double); + /* Data type for links in the chain of symbols. */ struct symrec @{ char *name; /* name of symbol */ int type; /* type of symbol: either VAR or FNCT */ - union @{ - double var; /* value of a VAR */ - double (*fnctptr)(); /* value of a FNCT */ + union + @{ + double var; /* value of a VAR */ + func_t fnctptr; /* value of a FNCT */ @} value; struct symrec *next; /* link field */ @}; @@ -1881,8 +1914,8 @@ typedef struct symrec symrec; /* The symbol table: a chain of `struct symrec'. */ extern symrec *sym_table; -symrec *putsym (); -symrec *getsym (); +symrec *putsym (const char *, func_t); +symrec *getsym (const char *); @end group @end smallexample @@ -1912,24 +1945,24 @@ yyerror (const char *s) /* Called by yyparse on error */ struct init @{ char *fname; - double (*fnct)(); + double (*fnct)(double); @}; @end group @group struct init arith_fncts[] = @{ - "sin", sin, - "cos", cos, + "sin", sin, + "cos", cos, "atan", atan, - "ln", log, - "exp", exp, + "ln", log, + "exp", exp, "sqrt", sqrt, 0, 0 @}; /* The symbol table: a chain of `struct symrec'. */ -symrec *sym_table = (symrec *)0; +symrec *sym_table = (symrec *) 0; @end group @group @@ -2109,6 +2142,7 @@ Bison takes as input a context-free grammar specification and produces a C-language function that recognizes correct instances of the grammar. The Bison grammar input file conventionally has a name ending in @samp{.y}. +@xref{Invocation, ,Invoking Bison}. @menu * Grammar Outline:: Overall layout of the grammar file. @@ -2116,6 +2150,7 @@ The Bison grammar input file conventionally has a name ending in @samp{.y}. * Rules:: How to write grammar rules. * Recursion:: Writing recursive rules. * Semantics:: Semantic values and actions. +* Locations:: Locations and actions. * Declarations:: All kinds of Bison declarations are described here. * Multiple Parsers:: Putting more than one Bison parser in one program. @end menu @@ -2481,7 +2516,7 @@ primary: constant defines two mutually-recursive nonterminals, since each refers to the other. -@node Semantics, Declarations, Recursion, Grammar File +@node Semantics, Locations, Recursion, Grammar File @section Defining Language Semantics @cindex defining language semantics @cindex language semantics, defining @@ -2695,8 +2730,8 @@ The mid-rule action can also have a semantic value. The action can set its value with an assignment to @code{$$}, and actions later in the rule can refer to the value using @code{$@var{n}}. Since there is no symbol to name the action, there is no way to declare a data type for the value -in advance, so you must use the @samp{$<@dots{}>} construct to specify a -data type each time you refer to this value. +in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to +specify a data type each time you refer to this value. There is no way to set the value of the entire rule with a mid-rule action, because assignments to @code{$$} do not have that effect. The @@ -2834,7 +2869,161 @@ the action is now at the end of its rule. Any mid-rule action can be converted to an end-of-rule action in this way, and this is what Bison actually does to implement mid-rule actions. -@node Declarations, Multiple Parsers, Semantics, Grammar File +@node Locations, Declarations, Semantics, Grammar File +@section Tracking Locations +@cindex location +@cindex textual position +@cindex position, textual + +Though grammar rules and semantic actions are enough to write a fully +functional parser, it can be useful to process some additionnal informations, +especially symbol locations. + +@c (terminal or not) ? + +The way locations are handled is defined by providing a data type, and actions +to take when rules are matched. + +@menu +* Location Type:: Specifying a data type for locations. +* Actions and Locations:: Using locations in actions. +* Location Default Action:: Defining a general way to compute locations. +@end menu + +@node Location Type, Actions and Locations, , Locations +@subsection Data Type of Locations +@cindex data type of locations +@cindex default location type + +Defining a data type for locations is much simpler than for semantic values, +since all tokens and groupings always use the same type. + +The type of locations is specified by defining a macro called @code{YYLTYPE}. +When @code{YYLTYPE} is not defined, Bison uses a default structure type with +four members: + +@example +struct +@{ + int first_line; + int first_column; + int last_line; + int last_column; +@} +@end example + +@node Actions and Locations, Location Default Action, Location Type, Locations +@subsection Actions and Locations +@cindex location actions +@cindex actions, location +@vindex @@$ +@vindex @@@var{n} + +Actions are not only useful for defining language semantics, but also for +describing the behavior of the output parser with locations. + +The most obvious way for building locations of syntactic groupings is very +similar to the way semantic values are computed. In a given rule, several +constructs can be used to access the locations of the elements being matched. +The location of the @var{n}th component of the right hand side is +@code{@@@var{n}}, while the location of the left hand side grouping is +@code{@@$}. + +Here is a basic example using the default data type for locations: + +@example +@group +exp: @dots{} + | exp '/' exp + @{ + @@$.first_column = @@1.first_column; + @@$.first_line = @@1.first_line; + @@$.last_column = @@3.last_column; + @@$.last_line = @@3.last_line; + if ($3) + $$ = $1 / $3; + else + @{ + $$ = 1; + printf("Division by zero, l%d,c%d-l%d,c%d", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); + @} + @} +@end group +@end example + +As for semantic values, there is a default action for locations that is +run each time a rule is matched. It sets the beginning of @code{@@$} to the +beginning of the first symbol, and the end of @code{@@$} to the end of the +last symbol. + +With this default action, the location tracking can be fully automatic. The +example above simply rewrites this way: + +@example +@group +exp: @dots{} + | exp '/' exp + @{ + if ($3) + $$ = $1 / $3; + else + @{ + $$ = 1; + printf("Division by zero, l%d,c%d-l%d,c%d", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); + @} + @} +@end group +@end example + +@node Location Default Action, , Actions and Locations, Locations +@subsection Default Action for Locations +@vindex YYLLOC_DEFAULT + +Actually, actions are not the best place to compute locations. Since locations +are much more general than semantic values, there is room in the output parser +to redefine the default action to take for each rule. The +@code{YYLLOC_DEFAULT} macro is called each time a rule is matched, before the +associated action is run. + +Most of the time, this macro is general enough to suppress location +dedicated code from semantic actions. + +The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is +the location of the grouping (the result of the computation). The second one +is an array holding locations of all right hand side elements of the rule +being matched. The last one is the size of the right hand side rule. + +By default, it is defined this way: + +@example +@group +#define YYLLOC_DEFAULT(Current, Rhs, N) \ + Current.last_line = Rhs[N].last_line; \ + Current.last_column = Rhs[N].last_column; +@end group +@end example + +When defining @code{YYLLOC_DEFAULT}, you should consider that: + +@itemize @bullet +@item +All arguments are free of side-effects. However, only the first one (the +result) should be modified by @code{YYLLOC_DEFAULT}. + +@item +Before @code{YYLLOC_DEFAULT} is executed, the output parser sets @code{@@$} +to @code{@@1}. + +@item +For consistency with semantic actions, valid indexes for the location array +range from 1 to @var{n}. +@end itemize + +@node Declarations, Multiple Parsers, Locations, Grammar File @section Bison Declarations @cindex declarations, Bison @cindex Bison declarations @@ -3255,13 +3444,6 @@ Therefore, if the input file is @file{foo.y}, then the parser file is called @file{foo.tab.c} by default. As a consequence, the verbose output file is called @file{foo.output}.@refill -@item %raw -The output file @file{@var{name}.h} normally defines the tokens with -Yacc-compatible token numbers. If this option is specified, the -internal Bison numbers are used instead. (Yacc-compatible numbers start -at 257 except for single-character tokens; Bison assigns token numbers -sequentially for all tokens starting at 3.) - @item %token_table Generate an array of token names in the parser file. The name of the array is @code{yytname}; @code{yytname[@var{i}]} is the name of the @@ -3532,13 +3714,15 @@ then the code in @code{yylex} might look like this: @subsection Textual Positions of Tokens @vindex yylloc -If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, -,Special Features for Use in Actions}) in actions to keep track of the +If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , +Tracking Locations}) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in @code{yylex}. The function @code{yyparse} expects to find the textual location of a token just parsed in the global variable @code{yylloc}. So @code{yylex} must store the proper data in that -variable. The value of @code{yylloc} is a structure and you need only +variable. + +By default, the value of @code{yylloc} is a structure and you need only initialize the members that are going to be used by the actions. The four members are called @code{first_line}, @code{first_column}, @code{last_line} and @code{last_column}. Note that the use of this @@ -3795,28 +3979,37 @@ Resume generating error messages immediately for subsequent syntax errors. This is useful primarily in error rules. @xref{Error Recovery}. -@item @@@var{n} -@findex @@@var{n} -Acts like a structure variable containing information on the line -numbers and column numbers of the @var{n}th component of the current -rule. The structure has four members, like this: +@item @@$ +@findex @@$ +Acts like a structure variable containing information on the textual position +of the grouping made by the current rule. @xref{Locations, , +Tracking Locations}. -@example -struct @{ - int first_line, last_line; - int first_column, last_column; -@}; -@end example +@c Check if those paragraphs are still useful or not. + +@c @example +@c struct @{ +@c int first_line, last_line; +@c int first_column, last_column; +@c @}; +@c @end example + +@c Thus, to get the starting line number of the third component, you would +@c use @samp{@@3.first_line}. + +@c In order for the members of this structure to contain valid information, +@c you must make @code{yylex} supply this information about each token. +@c If you need only certain members, then @code{yylex} need only fill in +@c those members. -Thus, to get the starting line number of the third component, you would -use @samp{@@3.first_line}. +@c The use of this feature makes the parser noticeably slower. -In order for the members of this structure to contain valid information, -you must make @code{yylex} supply this information about each token. -If you need only certain members, then @code{yylex} need only fill in -those members. +@item @@@var{n} +@findex @@@var{n} +Acts like a structure variable containing information on the textual position +of the @var{n}th component of the current rule. @xref{Locations, , +Tracking Locations}. -The use of this feature makes the parser noticeably slower. @end table @node Algorithm, Error Recovery, Interface, Top @@ -4929,7 +5122,27 @@ Here @var{infile} is the grammar file name, which usually ends in @samp{.y}. The parser file's name is made by replacing the @samp{.y} with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields @file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields -@file{hack/foo.tab.c}.@refill +@file{hack/foo.tab.c}. It's is also possible, in case you are writting +C++ code instead of C in your grammar file, to name it @file{foo.ypp} +or @file{foo.y++}. Then, the output files will take an extention like +the given one as input (repectively @file{foo.tab.cpp} and @file{foo.tab.c++}). +This feature takes effect with all options that manipulate filenames like +@samp{-o} or @samp{-d}. + +For example : + +@example +bison -d @var{infile.yxx} +@end example +@noindent +will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}. and + +@example +bison -d @var{infile.y} -o @var{output.c++} +@end example +@noindent +will produce @file{output.c++} and @file{outfile.h++}. + @menu * Bison Options:: All the options described in detail, @@ -5022,10 +5235,6 @@ parser file, treating it as an independent source file in its own right. @itemx --no-parser Pretend that @code{%no_parser} was specified. @xref{Decl Summary}. -@item -r -@itemx --raw -Pretend that @code{%raw} was specified. @xref{Decl Summary}. - @item -k @itemx --token-table Pretend that @code{%token_table} was specified. @xref{Decl Summary}. @@ -5105,7 +5314,6 @@ the corresponding short option. \line{ --no-lines \leaderfill -l} \line{ --no-parser \leaderfill -n} \line{ --output-file \leaderfill -o} -\line{ --raw \leaderfill -r} \line{ --token-table \leaderfill -k} \line{ --verbose \leaderfill -v} \line{ --version \leaderfill -V} @@ -5124,7 +5332,6 @@ the corresponding short option. --no-lines -l --no-parser -n --output-file=@var{outfile} -o @var{outfile} ---raw -r --token-table -k --verbose -v --version -V @@ -5212,7 +5419,7 @@ Conventions for Pure Parsers}. @item YYLTYPE Macro for the data type of @code{yylloc}; a structure with four -members. @xref{Token Positions, ,Textual Positions of Tokens}. +members. @xref{Location Type, , Data Types of Locations}. @item yyltype Default value for YYLTYPE. @@ -5312,11 +5519,6 @@ Bison declaration to assign a precedence to a specific rule. Bison declaration to request a pure (reentrant) parser. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. -@item %raw -Bison declaration to use Bison internal token code numbers in token -tables instead of the usual Yacc-compatible token code numbers. -@xref{Decl Summary}. - @item %right Bison declaration to assign right associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}.