X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/9d9b8b70061b84abff1222350bc94f033cee6d62..c21493b89f5f5dd49e1f46d311326d5d4f49f8a4:/doc/bison.texinfo
diff --git a/doc/bison.texinfo b/doc/bison.texinfo
index 74e53812..5ff8f863 100644
--- a/doc/bison.texinfo
+++ b/doc/bison.texinfo
@@ -44,7 +44,7 @@ This manual is for @acronym{GNU} Bison (version @value{VERSION}, @value{UPDATED}), the @acronym{GNU} parser generator. Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
 @quotation Permission is granted to copy, distribute and/or modify this document
@@ -145,9 +145,10 @@ The Concepts of Bison Writing @acronym{GLR} Parsers
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
+* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
 Examples
@@ -733,9 +734,10 @@ user-defined function on the resulting values to produce an arbitrary merged result. @menu
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
+* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
 @end menu @node Simple GLR Parsers
@@ -910,29 +912,27 @@ parser recognizes all valid declarations, according to the limited syntax above, transparently. In fact, the user does not even notice when the parser splits.
-So here we have a case where we can use the benefits of @acronym{GLR}, almost
-without disadvantages. Even in simple cases like this, however, there
-are at least two potential problems to beware.
-First, always analyze the conflicts reported by
-Bison to make sure that @acronym{GLR} splitting is only done where it is
-intended. A @acronym{GLR} parser splitting inadvertently may cause
-problems less obvious than an @acronym{LALR} parser statically choosing the
-wrong alternative in a conflict.
-Second, consider interactions with the lexer (@pxref{Semantic Tokens})
-with great care. Since a split parser consumes tokens
-without performing any actions during the split, the lexer cannot
-obtain information via parser actions. Some cases of
-lexer interactions can be eliminated by using @acronym{GLR} to
-shift the complications from the lexer to the parser. You must check
-the remaining cases for correctness.
-
-In our example, it would be safe for the lexer to return tokens
-based on their current meanings in some symbol table, because no new
-symbols are defined in the middle of a type declaration. Though it
-is possible for a parser to define the enumeration
-constants as they are parsed, before the type declaration is
-completed, it actually makes no difference since they cannot be used
-within the same enumerated type declaration.
+So here we have a case where we can use the benefits of @acronym{GLR},
+almost without disadvantages. Even in simple cases like this, however,
+there are at least two potential problems to beware. First, always
+analyze the conflicts reported by Bison to make sure that @acronym{GLR}
+splitting is only done where it is intended.
A @acronym{GLR} parser
+splitting inadvertently may cause problems less obvious than an
+@acronym{LALR} parser statically choosing the wrong alternative in a
+conflict. Second, consider interactions with the lexer (@pxref{Semantic
+Tokens}) with great care. Since a split parser consumes tokens without
+performing any actions during the split, the lexer cannot obtain
+information via parser actions. Some cases of lexer interactions can be
+eliminated by using @acronym{GLR} to shift the complications from the
+lexer to the parser. You must check the remaining cases for
+correctness.
+
+In our example, it would be safe for the lexer to return tokens based on
+their current meanings in some symbol table, because no new symbols are
+defined in the middle of a type declaration. Though it is possible for
+a parser to define the enumeration constants as they are parsed, before
+the type declaration is completed, it actually makes no difference since
+they cannot be used within the same enumerated type declaration.
 @node Merging GLR Parses @subsection Using @acronym{GLR} to Resolve Ambiguities
@@ -1096,6 +1096,52 @@ productions that participate in any particular merge have identical and the parser will report an error during any parse that results in the offending merge.
+@node GLR Semantic Actions
+@subsection GLR Semantic Actions
+
+@cindex deferred semantic actions
+By definition, a deferred semantic action is not performed at the same time as
+the associated reduction.
+This raises caveats for several Bison features you might use in a semantic
+action in a @acronym{GLR} parser.
+
+@vindex yychar
+@cindex @acronym{GLR} parsers and @code{yychar}
+@vindex yylval
+@cindex @acronym{GLR} parsers and @code{yylval}
+@vindex yylloc
+@cindex @acronym{GLR} parsers and @code{yylloc}
+In any semantic action, you can examine @code{yychar} to determine the type of
+the look-ahead token present at the time of the associated reduction.
+After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF},
+you can then examine @code{yylval} and @code{yylloc} to determine the
+look-ahead token's semantic value and location, if any.
+In a nondeferred semantic action, you can also modify any of these variables to
+influence syntax analysis.
+@xref{Look-Ahead, ,Look-Ahead Tokens}.
+
+@findex yyclearin
+@cindex @acronym{GLR} parsers and @code{yyclearin}
+In a deferred semantic action, it's too late to influence syntax analysis.
+In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
+shallow copies of the values they had at the time of the associated reduction.
+For this reason alone, modifying them is dangerous.
+Moreover, the result of modifying them is undefined and subject to change with
+future versions of Bison.
+For example, if a semantic action might be deferred, you should never write it
+to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
+memory referenced by @code{yylval}.
+
+@findex YYERROR
+@cindex @acronym{GLR} parsers and @code{YYERROR}
+Another Bison feature requiring special consideration is @code{YYERROR}
+(@pxref{Action Features}), which you can invoke in any semantic action to
+initiate error recovery.
+During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is
+the same as its effect in an @acronym{LALR}(1) parser.
+In a deferred semantic action, its effect is undefined.
+@c The effect is probably a syntax error at the split point.
+
 @node Compiler Requirements @subsection Considerations when Compiling @acronym{GLR} Parsers @cindex @code{inline}
@@ -2585,13 +2631,13 @@ continues until end of line. @cindex Prologue @cindex declarations
-The @var{Prologue} section contains macro definitions and
-declarations of functions and variables that are used in the actions in the
-grammar rules. These are copied to the beginning of the parser file so
-that they precede the definition of @code{yyparse}.
You can use
-@samp{#include} to get the declarations from a header file. If you don't
-need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
-delimiters that bracket this section.
+The @var{Prologue} section contains macro definitions and declarations
+of functions and variables that are used in the actions in the grammar
+rules. These are copied to the beginning of the parser file so that
+they precede the definition of @code{yyparse}. You can use
+@samp{#include} to get the declarations from a header file. If you
+don't need any C declarations, you may omit the @samp{%@{} and
+@samp{%@}} delimiters that bracket this section.
 You may have more than one @var{Prologue} section, intermixed with the @var{Bison declarations}. This allows you to have C and Bison
@@ -2661,10 +2707,10 @@ even if you define them in the Epilogue. If the last section is empty, you may omit the @samp{%%} that separates it from the grammar rules.
-The Bison parser itself contains many macros and identifiers whose
-names start with @samp{yy} or @samp{YY}, so it is a
-good idea to avoid using any such names (except those documented in this
-manual) in the epilogue of the grammar file.
+The Bison parser itself contains many macros and identifiers whose names
+start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
+any such names (except those documented in this manual) in the epilogue
+of the grammar file.
 @node Symbols @section Symbols, Terminal and Nonterminal
@@ -2680,13 +2726,13 @@ A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a class of syntactically equivalent tokens. You use the symbol in grammar rules to mean that a token in that class is allowed. The symbol is represented in the Bison parser by a numeric code, and the @code{yylex}
-function returns a token type code to indicate what kind of token has been
-read. You don't need to know what the code value is; you can use the
-symbol to stand for it.
+function returns a token type code to indicate what kind of token has
+been read. You don't need to know what the code value is; you can use
+the symbol to stand for it.
-A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
-groupings. The symbol name is used in writing grammar rules. By convention,
-it should be all lower case.
+A @dfn{nonterminal symbol} stands for a class of syntactically
+equivalent groupings. The symbol name is used in writing grammar rules.
+By convention, it should be all lower case.
 Symbol names can contain letters, digits (not at the beginning), underscores and periods. Periods make sense only in nonterminals.
@@ -2791,17 +2837,17 @@ characters in the following C-language string: "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~" @end example
-The @code{yylex} function and Bison must use a consistent character
-set and encoding for character tokens. For example, if you run Bison in an
-@acronym{ASCII} environment, but then compile and run the resulting program
-in an environment that uses an incompatible character set like
-@acronym{EBCDIC}, the resulting program may not work because the
-tables generated by Bison will assume @acronym{ASCII} numeric values for
-character tokens. It is standard
-practice for software distributions to contain C source files that
-were generated by Bison in an @acronym{ASCII} environment, so installers on
-platforms that are incompatible with @acronym{ASCII} must rebuild those
-files before compiling them.
+The @code{yylex} function and Bison must use a consistent character set
+and encoding for character tokens. For example, if you run Bison in an
+@acronym{ASCII} environment, but then compile and run the resulting
+program in an environment that uses an incompatible character set like
+@acronym{EBCDIC}, the resulting program may not work because the tables
+generated by Bison will assume @acronym{ASCII} numeric values for
+character tokens.
It is standard practice for software distributions to
+contain C source files that were generated by Bison in an
+@acronym{ASCII} environment, so installers on platforms that are
+incompatible with @acronym{ASCII} must rebuild those files before
+compiling them.
 The symbol @code{error} is a terminal symbol reserved for error recovery (@pxref{Error Recovery}); you shouldn't use it for any other purpose.
@@ -2908,10 +2954,10 @@ with no components. @section Recursive Rules @cindex recursive rule
-A rule is called @dfn{recursive} when its @var{result} nonterminal appears
-also on its right hand side. Nearly all Bison grammars need to use
-recursion, because that is the only way to define a sequence of any number
-of a particular thing. Consider this recursive definition of a
+A rule is called @dfn{recursive} when its @var{result} nonterminal
+appears also on its right hand side. Nearly all Bison grammars need to
+use recursion, because that is the only way to define a sequence of any
+number of a particular thing. Consider this recursive definition of a
 comma-separated sequence of one or more expressions: @example
@@ -3025,8 +3071,9 @@ This macro definition must go in the prologue of the grammar file In most programs, you will need different data types for different kinds of tokens and groupings. For example, a numeric constant may need type
-@code{int} or @code{long int}, while a string constant needs type @code{char *},
-and an identifier might need a pointer to an entry in the symbol table.
+@code{int} or @code{long int}, while a string constant needs type
+@code{char *}, and an identifier might need a pointer to an entry in the
+symbol table.
 To use more than one data type for semantic values in one parser, Bison requires you to do two things:
@@ -3141,6 +3188,12 @@ As long as @code{bar} is used only in the fashion shown here, @code{$0} always refers to the @code{expr} which precedes @code{bar} in the definition of @code{foo}.
+@vindex yylval
+It is also possible to access the semantic value of the look-ahead token, if
+any, from a semantic action.
+This semantic value is stored in @code{yylval}.
+@xref{Action Features, ,Special Features for Use in Actions}.
+
 @node Action Types @subsection Data Types of Values in Actions @cindex action data types
@@ -3458,6 +3511,12 @@ exp: @dots{} @end group @end example
+@vindex yylloc
+It is also possible to access the location of the look-ahead token, if any,
+from a semantic action.
+This location is stored in @code{yylloc}.
+@xref{Action Features, ,Special Features for Use in Actions}.
+
 @node Location Default Action @subsection Default Action for Locations @vindex YYLLOC_DEFAULT
@@ -3743,10 +3802,15 @@ As an extension to @acronym{POSIX}, a tag is allowed after the @end group @end example
+@noindent
 specifies the union tag @code{value}, so the corresponding C type is @code{union value}. If you do not specify a tag, it defaults to @code{YYSTYPE}.
+As another extension to @acronym{POSIX}, you may specify multiple
+@code{%union} declarations; their contents are concatenated. However,
+only the first @code{%union} declaration can specify a tag.
+
 Note that, unlike making a @code{union} declaration in C, you need not write a semicolon after the closing brace.
@@ -4068,13 +4132,12 @@ is named @file{@var{name}.h}. Unless @code{YYSTYPE} is already defined as a macro, the output header declares @code{YYSTYPE}. Therefore, if you are using a @code{%union}
-(@pxref{Multiple Types, ,More Than One Value Type}) with components
-that require other definitions, or if you have defined a
-@code{YYSTYPE} macro (@pxref{Value Type, ,Data Types of Semantic
-Values}), you need to arrange for these definitions to be propagated to
-all modules, e.g., by putting them in a
-prerequisite header that is included both by your parser and by any
-other module that needs @code{YYSTYPE}.
+(@pxref{Multiple Types, ,More Than One Value Type}) with components that
+require other definitions, or if you have defined a @code{YYSTYPE} macro
+(@pxref{Value Type, ,Data Types of Semantic Values}), you need to
+arrange for these definitions to be propagated to all modules, e.g., by
+putting them in a prerequisite header that is included both by your
+parser and by any other module that needs @code{YYSTYPE}.
 Unless your parser is pure, the output header declares @code{yylval} as an external variable. @xref{Pure Decl, ,A Pure (Reentrant)
@@ -4085,11 +4148,11 @@ If you have also used locations, the output header declares @code{YYSTYPE} and @code{yylval}. @xref{Locations, ,Tracking Locations}.
-This output file is normally essential if you wish to put the
-definition of @code{yylex} in a separate source file, because
-@code{yylex} typically needs to be able to refer to the
-above-mentioned declarations and to the token type codes.
-@xref{Token Values, ,Semantic Values of Tokens}.
+This output file is normally essential if you wish to put the definition
+of @code{yylex} in a separate source file, because @code{yylex}
+typically needs to be able to refer to the above-mentioned declarations
+and to the token type codes. @xref{Token Values, ,Semantic Values of
+Tokens}.
 @end deffn @deffn {Directive} %destructor
@@ -4500,12 +4563,11 @@ then the code in @code{yylex} might look like this: @vindex yylloc If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
-Tracking Locations}) in actions to keep track of the
-textual locations of tokens and groupings, then you must provide this
-information in @code{yylex}. The function @code{yyparse} expects to
-find the textual location of a token just parsed in the global variable
-@code{yylloc}. So @code{yylex} must store the proper data in that
-variable.
+Tracking Locations}) in actions to keep track of the textual locations
+of tokens and groupings, then you must provide this information in
+@code{yylex}.
The function @code{yyparse} expects to find the textual
+location of a token just parsed in the global variable @code{yylloc}.
+So @code{yylex} must store the proper data in that variable.
 By default, the value of @code{yylloc} is a structure and you need only initialize the members that are going to be used by the actions. The
@@ -4766,6 +4828,12 @@ In either case, the rest of the action is not executed. Value stored in @code{yychar} when there is no look-ahead token. @end deffn
+@deffn {Macro} YYEOF
+@vindex YYEOF
+Value stored in @code{yychar} when the look-ahead is the end of the input
+stream.
+@end deffn
+
 @deffn {Macro} YYERROR; @findex YYERROR Cause an immediate syntax error. This statement initiates error
@@ -4782,15 +4850,20 @@ is recovering from a syntax error, and 0 the rest of the time. @end deffn @deffn {Variable} yychar
-Variable containing the current look-ahead token. (In a pure parser,
-this is actually a local variable within @code{yyparse}.) When there is
-no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
+Variable containing either the look-ahead token, or @code{YYEOF} when the
+look-ahead is the end of the input stream, or @code{YYEMPTY} when no look-ahead
+has been performed so the next token is not yet known.
+Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
 @xref{Look-Ahead, ,Look-Ahead Tokens}. @end deffn @deffn {Macro} yyclearin; Discard the current look-ahead token. This is useful primarily in
-error rules. @xref{Error Recovery}.
+error rules.
+Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
+Semantic Actions}).
+@xref{Error Recovery}.
 @end deffn @deffn {Macro} yyerrok;
@@ -4799,6 +4872,22 @@ errors. This is useful primarily in error rules. @xref{Error Recovery}. @end deffn
+@deffn {Variable} yylloc
+Variable containing the look-ahead token location when @code{yychar} is not set
+to @code{YYEMPTY} or @code{YYEOF}.
+Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Actions and Locations, ,Actions and Locations}.
+@end deffn
+
+@deffn {Variable} yylval
+Variable containing the look-ahead token semantic value when @code{yychar} is
+not set to @code{YYEMPTY} or @code{YYEOF}.
+Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Actions, ,Actions}.
+@end deffn
+
 @deffn {Value} @@$ @findex @@$ Acts like a structure variable containing information on the textual location
@@ -4842,12 +4931,11 @@ Tracking Locations}. A Bison-generated parser can print diagnostics, including error and tracing messages. By default, they appear in English. However, Bison
-also supports outputting diagnostics in the user's native language.
-To make this work, the user should set the usual environment
-variables. @xref{Users, , The User's View, gettext, GNU
-@code{gettext} utilities}. For
-example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might set
-the user's locale to French Canadian using the @acronym{UTF}-8
+also supports outputting diagnostics in the user's native language. To
+make this work, the user should set the usual environment variables.
+@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
+For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
+set the user's locale to French Canadian using the @acronym{UTF}-8
 encoding. The exact set of available locales depends on the user's installation.
@@ -5037,7 +5125,11 @@ doing so would produce on the stack the sequence of symbols @code{expr '!'}. No rule allows that sequence. @vindex yychar
-The current look-ahead token is stored in the variable @code{yychar}.
+@vindex yylval
+@vindex yylloc
+The look-ahead token is stored in the variable @code{yychar}.
+Its semantic value and location, if any, are stored in the variables
+@code{yylval} and @code{yylloc}.
 @xref{Action Features, ,Special Features for Use in Actions}. @node Shift/Reduce
@@ -5854,6 +5946,7 @@ The previous look-ahead token is reanalyzed immediately after an error. If this is unacceptable, then the macro @code{yyclearin} may be used to clear this token. Write the statement @samp{yyclearin;} in the error rule's action.
+@xref{Action Features, ,Special Features for Use in Actions}.
 For example, suppose that on a syntax error, an error handling routine is called that advances the input stream to some point where parsing should
@@ -6681,13 +6774,14 @@ Print the version number of Bison and exit. @item --print-localedir Print the name of the directory containing locale-dependent data.
-@need 1750
 @item -y @itemx --yacc
-Equivalent to @samp{-o y.tab.c}; the parser output file is called
+Act more like the traditional Yacc command. This can cause
+different diagnostics to be generated, and may change behavior in
+other minor ways. Most importantly, imitate Yacc's output
+file name conventions, so that the parser output file is called
 @file{y.tab.c}, and the other outputs are called @file{y.output} and
-@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
-file name conventions. Thus, the following shell script can substitute
+@file{y.tab.h}. Thus, the following shell script can substitute
 for Yacc, and the Bison distribution contains such a script for compatibility with @acronym{POSIX}:
@@ -6695,6 +6789,12 @@ compatibility with @acronym{POSIX}: #! /bin/sh bison -y "$@@" @end example
+
+The @option{-y}/@option{--yacc} option is intended for use with
+traditional Yacc grammars. If your grammar uses a Bison extension
+like @samp{%glr-parser}, Bison might not be Yacc-compatible even if
+this option is specified.
+
 @end table @noindent
@@ -8036,7 +8136,7 @@ token. @xref{Action Features, ,Special Features for Use in Actions}.
 @end deffn @deffn {Variable} yychar
-External integer variable that contains the integer value of the current
+External integer variable that contains the integer value of the
 look-ahead token. (In a pure parser, it is a local variable within @code{yyparse}.) Error-recovery rule actions may examine this variable. @xref{Action Features, ,Special Features for Use in Actions}.
@@ -8097,7 +8197,7 @@ the next token. @xref{Lexical, ,The Lexical Analyzer Function @deffn {Macro} YYLEX_PARAM An obsolete macro for specifying an extra argument (or list of extra
-arguments) for @code{yyparse} to pass to @code{yylex}. he use of this
+arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
 macro is deprecated, and is supported only for Yacc like parsers. @xref{Pure Calling,, Calling Conventions for Pure Parsers}. @end deffn
@@ -8106,9 +8206,12 @@ macro is deprecated, and is supported only for Yacc like parsers. External variable in which @code{yylex} should place the line and column numbers associated with a token. (In a pure parser, it is a local variable within @code{yyparse}, and its address is passed to
-@code{yylex}.) You can ignore this variable if you don't use the
-@samp{@@} feature in the grammar actions. @xref{Token Locations,
-,Textual Locations of Tokens}.
+@code{yylex}.)
+You can ignore this variable if you don't use the @samp{@@} feature in the
+grammar actions.
+@xref{Token Locations, ,Textual Locations of Tokens}.
+In semantic actions, it stores the location of the look-ahead token.
+@xref{Actions and Locations, ,Actions and Locations}.
 @end deffn @deffn {Type} YYLTYPE
@@ -8120,7 +8223,10 @@ members. @xref{Location Type, , Data Types of Locations}. External variable in which @code{yylex} should place the semantic value associated with a token. (In a pure parser, it is a local variable within @code{yyparse}, and its address is passed to
-@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
+@code{yylex}.)
+@xref{Token Values, ,Semantic Values of Tokens}.
+In semantic actions, it stores the semantic value of the look-ahead token.
+@xref{Actions, ,Actions}.
 @end deffn @deffn {Macro} YYMAXDEPTH
@@ -8381,7 +8487,7 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES @c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param @c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP
-@c LocalWords: YYEMPTY YYRECOVERING yyclearin GE def UMINUS maybeword
+@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword
 @c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH @c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm VCG notype @c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args