X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/d5796688b17857ba9041aa65c67f5c2c1c88c3bc..19c50364f38b72dcd7bf67a998db3c815b7b6a94:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 5ec63108..da850821 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -318,7 +318,7 @@ chapters follow which describe specific aspects of Bison in detail. Bison was written primarily by Robert Corbett; Richard Stallman made it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added -multicharacter string literals and other features. +multi-character string literals and other features. This edition corresponds to version @value{VERSION} of Bison. @@ -1987,7 +1987,7 @@ getsym (const char *sym_name) The function @code{yylex} must now recognize variables, numeric values, and the single-character arithmetic operators. Strings of alphanumeric -characters with a leading nondigit are recognized as either variables or +characters with a leading non-digit are recognized as either variables or functions depending on what the symbol table says about them. The string is passed to @code{getsym} for look up in the symbol table. If @@ -2270,7 +2270,7 @@ for @code{yylex}}). A @dfn{literal string token} is written like a C string constant; for example, @code{"<="} is a literal string token. A literal string token doesn't need to be declared unless you need to specify its semantic -value data type (@pxref{Value Type}), associativity, precedence +value data type (@pxref{Value Type}), associativity, or precedence (@pxref{Precedence}). You can associate the literal string token with a symbolic name as an @@ -2544,10 +2544,11 @@ Specify the entire collection of possible data types, with the @code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}). @item -Choose one of those types for each symbol (terminal or nonterminal) -for which semantic values are used. This is done for tokens with the -@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings -with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}). +Choose one of those types for each symbol (terminal or nonterminal) for +which semantic values are used. This is done for tokens with the +@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) +and for groupings with the @code{%type} Bison declaration (@pxref{Type +Decl, ,Nonterminal Symbols}). @end itemize @node Actions, Action Types, Multiple Types, Semantics @@ -2879,9 +2880,10 @@ Bison will convert this into a @code{#define} directive in the parser, so that the function @code{yylex} (if it is in this file) can use the name @var{name} to stand for this token type's code. -Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc} -instead of @code{%token}, if you wish to specify associativity and precedence. -@xref{Precedence Decl, ,Operator Precedence}. +Alternatively, you can use @code{%left}, @code{%right}, or +@code{%nonassoc} instead of @code{%token}, if you wish to specify +associativity and precedence. @xref{Precedence Decl, ,Operator +Precedence}. You can explicitly specify the numeric code for a token type by appending an integer value in the field immediately following the token name: @@ -3114,8 +3116,8 @@ may override this restriction with the @code{%start} declaration as follows: A @dfn{reentrant} program is one which does not alter in the course of execution; in other words, it consists entirely of @dfn{pure} (read-only) code. Reentrancy is important whenever asynchronous execution is possible; -for example, a nonreentrant program may not be safe to call from a signal -handler. In systems with multiple threads of control, a nonreentrant +for example, a non-reentrant program may not be safe to call from a signal +handler. In systems with multiple threads of control, a non-reentrant program must be called only within interlocks. Normally, Bison generates a parser which is not reentrant. This is @@ -3180,14 +3182,23 @@ Declare the type of semantic values for a nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}). @item %start -Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}). +Specify the grammar's start symbol (@pxref{Start Decl, ,The +Start-Symbol}). @item %expect Declare the expected number of shift-reduce conflicts (@pxref{Expect Decl, ,Suppressing Conflict Warnings}). +@item %locations +Generate the code processing the locations (@pxref{Action Features, +,Special Features for Use in Actions}). This mode is enabled as soon as +the grammar uses the special @samp{@@@var{n}} tokens, but if your +grammar does not use it, using @samp{%locations} allows for more +accurate parse error messages. + @item %pure_parser -Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure +(Reentrant) Parser}). @item %no_lines Don't generate any @code{#line} preprocessor commands in the parser @@ -3297,8 +3308,8 @@ C code in the grammar file, you are likely to run into trouble. You call the function @code{yyparse} to cause parsing to occur. This function reads tokens, executes actions, and ultimately returns when it encounters end-of-input or an unrecoverable syntax error. You can also -write an action which directs @code{yyparse} to return immediately without -reading further. +write an action which directs @code{yyparse} to return immediately +without reading further. The value returned by @code{yyparse} is 0 if parsing was successful (return is due to end-of-input). @@ -3428,7 +3439,7 @@ The @code{yytname} table is generated only if you use the @subsection Semantic Values of Tokens @vindex yylval -In an ordinary (nonreentrant) parser, the semantic value of the token must +In an ordinary (non-reentrant) parser, the semantic value of the token must be stored into the global variable @code{yylval}. When you are using just one data type for semantic values, @code{yylval} has that type. Thus, if the type is @code{int} (the default), you might write this in @@ -3474,16 +3485,17 @@ then the code in @code{yylex} might look like this: @subsection Textual Positions of Tokens @vindex yylloc -If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in -actions to keep track of the textual locations of tokens and groupings, -then you must provide this information in @code{yylex}. The function -@code{yyparse} expects to find the textual location of a token just parsed -in the global variable @code{yylloc}. So @code{yylex} must store the -proper data in that variable. The value of @code{yylloc} is a structure -and you need only initialize the members that are going to be used by the -actions. The four members are called @code{first_line}, -@code{first_column}, @code{last_line} and @code{last_column}. Note that -the use of this feature makes the parser noticeably slower. +If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, +,Special Features for Use in Actions}) in actions to keep track of the +textual locations of tokens and groupings, then you must provide this +information in @code{yylex}. The function @code{yyparse} expects to +find the textual location of a token just parsed in the global variable +@code{yylloc}. So @code{yylex} must store the proper data in that +variable. The value of @code{yylloc} is a structure and you need only +initialize the members that are going to be used by the actions. The +four members are called @code{first_line}, @code{first_column}, +@code{last_line} and @code{last_column}. Note that the use of this +feature makes the parser noticeably slower. @tindex YYLTYPE The data type of @code{yylloc} has the name @code{YYLTYPE}. @@ -4014,33 +4026,33 @@ expr: expr '-' expr @noindent Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2}; -should it reduce them via the rule for the subtraction operator? It depends -on the next token. Of course, if the next token is @samp{)}, we must -reduce; shifting is invalid because no single rule can reduce the token -sequence @w{@samp{- 2 )}} or anything starting with that. But if the next -token is @samp{*} or @samp{<}, we have a choice: either shifting or -reduction would allow the parse to complete, but with different -results. - -To decide which one Bison should do, we must consider the -results. If the next operator token @var{op} is shifted, then it -must be reduced first in order to permit another opportunity to -reduce the difference. The result is (in effect) @w{@samp{1 - (2 -@var{op} 3)}}. On the other hand, if the subtraction is reduced -before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op} -3}}. Clearly, then, the choice of shift or reduce should depend -on the relative precedence of the operators @samp{-} and -@var{op}: @samp{*} should be shifted first, but not @samp{<}. +should it reduce them via the rule for the subtraction operator? It +depends on the next token. Of course, if the next token is @samp{)}, we +must reduce; shifting is invalid because no single rule can reduce the +token sequence @w{@samp{- 2 )}} or anything starting with that. But if +the next token is @samp{*} or @samp{<}, we have a choice: either +shifting or reduction would allow the parse to complete, but with +different results. + +To decide which one Bison should do, we must consider the results. If +the next operator token @var{op} is shifted, then it must be reduced +first in order to permit another opportunity to reduce the difference. +The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other +hand, if the subtraction is reduced before shifting @var{op}, the result +is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or +reduce should depend on the relative precedence of the operators +@samp{-} and @var{op}: @samp{*} should be shifted first, but not +@samp{<}. @cindex associativity What about input such as @w{@samp{1 - 2 - 5}}; should this be -@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For -most operators we prefer the former, which is called @dfn{left -association}. The latter alternative, @dfn{right association}, is -desirable for assignment operators. The choice of left or right -association is a matter of whether the parser chooses to shift or -reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead -token is @samp{-}: shifting makes right-associativity. +@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most +operators we prefer the former, which is called @dfn{left association}. +The latter alternative, @dfn{right association}, is desirable for +assignment operators. The choice of left or right association is a +matter of whether the parser chooses to shift or reduce when the stack +contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting +makes right-associativity. @node Using Precedence, Precedence Examples, Why Precedence, Precedence @subsection Specifying Operator Precedence @@ -4632,10 +4644,10 @@ Unfortunately, the name being declared is separated from the declaration construct itself by a complicated syntactic structure---the ``declarator''. As a result, part of the Bison parser for C needs to be duplicated, with -all the nonterminal names changed: once for parsing a declaration in which -a typedef name can be redefined, and once for parsing a declaration in -which that can't be done. Here is a part of the duplication, with actions -omitted for brevity: +all the nonterminal names changed: once for parsing a declaration in +which a typedef name can be redefined, and once for parsing a +declaration in which that can't be done. Here is a part of the +duplication, with actions omitted for brevity: @example initdcl: @@ -4894,25 +4906,56 @@ Here is a list of options that can be used with Bison, alphabetized by short option. It is followed by a cross key alphabetized by long option. -@table @samp -@item -b @var{file-prefix} -@itemx --file-prefix=@var{prefix} -Specify a prefix to use for all Bison output file names. The names are -chosen as if the input file were named @file{@var{prefix}.c}. +@c Please, keep this ordered as in `bison --help'. +@noindent +Operations modes: +@table @option +@item -h +@itemx --help +Print a summary of the command-line options to Bison and exit. -@item -d -@itemx --defines -Write an extra output file containing macro definitions for the token -type names defined in the grammar and the semantic value type -@code{YYSTYPE}, as well as a few @code{extern} variable declarations. +@item -V +@itemx --version +Print the version number of Bison and exit. -If the parser output file is named @file{@var{name}.c} then this file -is named @file{@var{name}.h}.@refill +@need 1750 +@item -y +@itemx --yacc +@itemx --fixed-output-files +Equivalent to @samp{-o y.tab.c}; the parser output file is called +@file{y.tab.c}, and the other outputs are called @file{y.output} and +@file{y.tab.h}. The purpose of this option is to imitate Yacc's output +file name conventions. Thus, the following shell script can substitute +for Yacc:@refill -This output file is essential if you wish to put the definition of -@code{yylex} in a separate source file, because @code{yylex} needs to -be able to refer to token type codes and the variable -@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill +@example +bison -y $* +@end example +@end table + +@noindent +Tuning the parser: + +@table @option +@item -t +@itemx --debug +Output a definition of the macro @code{YYDEBUG} into the parser file, +so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}. + +@item --locations +Pretend that @code{%locactions} was specified. @xref{Decl Summary}. + +@item -p @var{prefix} +@itemx --name-prefix=@var{prefix} +Rename the external symbols used in the parser so that they start with +@var{prefix} instead of @samp{yy}. The precise list of symbols renamed +is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, +@code{yylval}, @code{yychar} and @code{yydebug}. + +For example, if you use @samp{-p c}, the names become @code{cparse}, +@code{clex}, and so on. + +@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. @item -l @itemx --no-lines @@ -4932,33 +4975,37 @@ This option also tells Bison to write the C code for the grammar actions into a file named @file{@var{filename}.act}, in the form of a brace-surrounded body fit for a @code{switch} statement. -@item -o @var{outfile} -@itemx --output-file=@var{outfile} -Specify the name @var{outfile} for the parser file. +@item -r +@itemx --raw +Pretend that @code{%raw} was specified. @xref{Decl Summary}. -The other output files' names are constructed from @var{outfile} -as described under the @samp{-v} and @samp{-d} options. +@item -k +@itemx --token-table +Pretend that @code{%token_table} was specified. @xref{Decl Summary}. +@end table -@item -p @var{prefix} -@itemx --name-prefix=@var{prefix} -Rename the external symbols used in the parser so that they start with -@var{prefix} instead of @samp{yy}. The precise list of symbols renamed -is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, -@code{yylval}, @code{yychar} and @code{yydebug}. +@noindent +Adjust the output: -For example, if you use @samp{-p c}, the names become @code{cparse}, -@code{clex}, and so on. +@table @option +@item -d +@itemx --defines +Write an extra output file containing macro definitions for the token +type names defined in the grammar and the semantic value type +@code{YYSTYPE}, as well as a few @code{extern} variable declarations. -@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. +If the parser output file is named @file{@var{name}.c} then this file +is named @file{@var{name}.h}.@refill -@item -r -@itemx --raw -Pretend that @code{%raw} was specified. @xref{Decl Summary}. +This output file is essential if you wish to put the definition of +@code{yylex} in a separate source file, because @code{yylex} needs to +be able to refer to token type codes and the variable +@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill -@item -t -@itemx --debug -Output a definition of the macro @code{YYDEBUG} into the parser file, -so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}. +@item -b @var{file-prefix} +@itemx --file-prefix=@var{prefix} +Specify a prefix to use for all Bison output file names. The names are +chosen as if the input file were named @file{@var{prefix}.c}. @item -v @itemx --verbose @@ -4976,27 +5023,12 @@ Therefore, if the input file is @file{foo.y}, then the parser file is called @file{foo.tab.c} by default. As a consequence, the verbose output file is called @file{foo.output}.@refill -@item -V -@itemx --version -Print the version number of Bison and exit. - -@item -h -@itemx --help -Print a summary of the command-line options to Bison and exit. - -@need 1750 -@item -y -@itemx --yacc -@itemx --fixed-output-files -Equivalent to @samp{-o y.tab.c}; the parser output file is called -@file{y.tab.c}, and the other outputs are called @file{y.output} and -@file{y.tab.h}. The purpose of this option is to imitate Yacc's output -file name conventions. Thus, the following shell script can substitute -for Yacc:@refill +@item -o @var{outfile} +@itemx --output-file=@var{outfile} +Specify the name @var{outfile} for the parser file. -@example -bison -y $* -@end example +The other output files' names are constructed from @var{outfile} +as described under the @samp{-v} and @samp{-d} options. @end table @node Environment Variables, Option Cross Key, Bison Options, Invocation @@ -5232,7 +5264,7 @@ Bison declaration to avoid generating @code{#line} directives in the parser file. @xref{Decl Summary}. @item %nonassoc -Bison declaration to assign nonassociativity to token(s). +Bison declaration to assign non-associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @item %prec @@ -5280,15 +5312,17 @@ Bison declarations section or the additional C code section. @xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}. @item %@{ %@} -All code listed between @samp{%@{} and @samp{%@}} is copied directly -to the output file uninterpreted. Such code forms the ``C -declarations'' section of the input file. @xref{Grammar Outline, ,Outline of a Bison Grammar}. +All code listed between @samp{%@{} and @samp{%@}} is copied directly to +the output file uninterpreted. Such code forms the ``C declarations'' +section of the input file. @xref{Grammar Outline, ,Outline of a Bison +Grammar}. @item /*@dots{}*/ Comment delimiters, as in C. @item : -Separates a rule's result from its components. @xref{Rules, ,Syntax of Grammar Rules}. +Separates a rule's result from its components. @xref{Rules, ,Syntax of +Grammar Rules}. @item ; Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}. @@ -5305,13 +5339,15 @@ Separates alternate rules for the same result nonterminal. @table @asis @item Backus-Naur Form (BNF) Formal method of specifying context-free grammars. BNF was first used -in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, +,Languages and Context-Free Grammars}. @item Context-free grammars Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an expression, integers are allowed @emph{anywhere} an expression is -permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +permitted. @xref{Language and Grammar, ,Languages and Context-Free +Grammars}. @item Dynamic allocation Allocation of memory that occurs during execution, rather than at @@ -5352,8 +5388,9 @@ Operators having left associativity are analyzed from left to right: @samp{c}. @xref{Precedence, ,Operator Precedence}. @item Left recursion -A rule whose result symbol is also its first component symbol; -for example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive Rules}. +A rule whose result symbol is also its first component symbol; for +example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive +Rules}. @item Left-to-right parsing Parsing a sentence of a language by analyzing it token by token from @@ -5368,11 +5405,11 @@ A flag, set by actions in the grammar rules, which alters the way tokens are parsed. @xref{Lexical Tie-ins}. @item Literal string token -A token which consists of two or more fixed characters. -@xref{Symbols}. +A token which consists of two or more fixed characters. @xref{Symbols}. @item Look-ahead token -A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}. +A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead +Tokens}. @item LALR(1) The class of context-free grammars that Bison (like most other parser @@ -5403,7 +5440,8 @@ performs some operation. @item Reduction Replacing a string of nonterminals and/or terminals with a single -nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison Parser Algorithm }. +nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison +Parser Algorithm }. @item Reentrant A reentrant subprogram is a subprogram which can be in invoked any @@ -5414,8 +5452,9 @@ invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. A language in which all operators are postfix operators. @item Right recursion -A rule whose result symbol is also its last component symbol; -for example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive Rules}. +A rule whose result symbol is also its last component symbol; for +example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive +Rules}. @item Semantics In computer languages, the semantics are specified by the actions @@ -5449,9 +5488,9 @@ The input of the Bison parser is a stream of tokens which comes from the lexical analyzer. @xref{Symbols}. @item Terminal symbol -A grammar symbol that has no rules in the grammar and therefore -is grammatically indivisible. The piece of text it represents -is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +A grammar symbol that has no rules in the grammar and therefore is +grammatically indivisible. The piece of text it represents is a token. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. @end table @node Index, , Glossary, Top