@value{UPDATED}), the @acronym{GNU} parser generator.
Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
+1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
@quotation
Permission is granted to copy, distribute and/or modify this document
Writing @acronym{GLR} Parsers
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
+* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
Examples
merged result.
@menu
-* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars
-* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities
-* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler
+* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars.
+* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities.
+* GLR Semantic Actions:: Deferred semantic actions have special concerns.
+* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler.
@end menu
@node Simple GLR Parsers
limited syntax above, transparently. In fact, the user does not even
notice when the parser splits.
-So here we have a case where we can use the benefits of @acronym{GLR}, almost
-without disadvantages. Even in simple cases like this, however, there
-are at least two potential problems to beware.
-First, always analyze the conflicts reported by
-Bison to make sure that @acronym{GLR} splitting is only done where it is
-intended. A @acronym{GLR} parser splitting inadvertently may cause
-problems less obvious than an @acronym{LALR} parser statically choosing the
-wrong alternative in a conflict.
-Second, consider interactions with the lexer (@pxref{Semantic Tokens})
-with great care. Since a split parser consumes tokens
-without performing any actions during the split, the lexer cannot
-obtain information via parser actions. Some cases of
-lexer interactions can be eliminated by using @acronym{GLR} to
-shift the complications from the lexer to the parser. You must check
-the remaining cases for correctness.
-
-In our example, it would be safe for the lexer to return tokens
-based on their current meanings in some symbol table, because no new
-symbols are defined in the middle of a type declaration. Though it
-is possible for a parser to define the enumeration
-constants as they are parsed, before the type declaration is
-completed, it actually makes no difference since they cannot be used
-within the same enumerated type declaration.
+So here we have a case where we can use the benefits of @acronym{GLR},
+almost without disadvantages. Even in simple cases like this, however,
+there are at least two potential problems to beware. First, always
+analyze the conflicts reported by Bison to make sure that @acronym{GLR}
+splitting is only done where it is intended. A @acronym{GLR} parser
+splitting inadvertently may cause problems less obvious than an
+@acronym{LALR} parser statically choosing the wrong alternative in a
+conflict. Second, consider interactions with the lexer (@pxref{Semantic
+Tokens}) with great care. Since a split parser consumes tokens without
+performing any actions during the split, the lexer cannot obtain
+information via parser actions. Some cases of lexer interactions can be
+eliminated by using @acronym{GLR} to shift the complications from the
+lexer to the parser. You must check the remaining cases for
+correctness.
+
+In our example, it would be safe for the lexer to return tokens based on
+their current meanings in some symbol table, because no new symbols are
+defined in the middle of a type declaration. Though it is possible for
+a parser to define the enumeration constants as they are parsed, before
+the type declaration is completed, it actually makes no difference since
+they cannot be used within the same enumerated type declaration.
@node Merging GLR Parses
@subsection Using @acronym{GLR} to Resolve Ambiguities
and the parser will report an error during any parse that results in
the offending merge.
+@node GLR Semantic Actions
+@subsection GLR Semantic Actions
+
+@cindex deferred semantic actions
+By definition, a deferred semantic action is not performed at the same time as
+the associated reduction.
+This raises caveats for several Bison features you might use in a semantic
+action in a @acronym{GLR} parser.
+
+@vindex yychar
+@cindex @acronym{GLR} parsers and @code{yychar}
+@vindex yylval
+@cindex @acronym{GLR} parsers and @code{yylval}
+@vindex yylloc
+@cindex @acronym{GLR} parsers and @code{yylloc}
+In any semantic action, you can examine @code{yychar} to determine the type of
+the look-ahead token present at the time of the associated reduction.
+After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF},
+you can then examine @code{yylval} and @code{yylloc} to determine the
+look-ahead token's semantic value and location, if any.
+In a nondeferred semantic action, you can also modify any of these variables to
+influence syntax analysis.
+@xref{Look-Ahead, ,Look-Ahead Tokens}.
+
+@findex yyclearin
+@cindex @acronym{GLR} parsers and @code{yyclearin}
+In a deferred semantic action, it's too late to influence syntax analysis.
+In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
+shallow copies of the values they had at the time of the associated reduction.
+For this reason alone, modifying them is dangerous.
+Moreover, the result of modifying them is undefined and subject to change with
+future versions of Bison.
+For example, if a semantic action might be deferred, you should never write it
+to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
+memory referenced by @code{yylval}.
+
+@findex YYERROR
+@cindex @acronym{GLR} parsers and @code{YYERROR}
+Another Bison feature requiring special consideration is @code{YYERROR}
+(@pxref{Action Features}), which you can invoke in any semantic action to
+initiate error recovery.
+During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is
+the same as its effect in an @acronym{LALR}(1) parser.
+In a deferred semantic action, its effect is undefined.
+@c The effect is probably a syntax error at the split point.
+
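The guard described above can be sketched in plain C.
This is a minimal illustration, not code from a generated parser: the
variables, the two types, and the values given to @code{YYEMPTY} and
@code{YYEOF} are stand-ins declared locally so that the fragment compiles on
its own, and @code{peek_lookahead} is a hypothetical helper.
It only reads the look-ahead variables, so the same pattern would remain safe
if the enclosing action were deferred.

```c
#include <assert.h>

/* Stand-ins for names a Bison-generated parser would normally supply.
   The numeric values are assumptions made for this sketch.  */
#define YYEMPTY (-2)
#define YYEOF 0

typedef struct { int first_line, first_column, last_line, last_column; } YYLTYPE;
typedef union { int ival; const char *sval; } YYSTYPE;

static int yychar = YYEMPTY;
static YYSTYPE yylval;
static YYLTYPE yylloc;

/* Hypothetical helper: copy the look-ahead's integer value and first line
   into *value and *line and return 1, or return 0 when there is no
   look-ahead token whose value and location could be examined.
   It never writes to yychar, yylval, or yylloc.  */
static int
peek_lookahead (int *value, int *line)
{
  assert (value && line);
  if (yychar == YYEMPTY || yychar == YYEOF)
    return 0;
  *value = yylval.ival;
  *line = yylloc.first_line;
  return 1;
}
```

In a real grammar the body of such a helper would sit inside a semantic
action, and the stand-in declarations would be dropped in favor of the ones
the parser file provides.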
@node Compiler Requirements
@subsection Considerations when Compiling @acronym{GLR} Parsers
@cindex @code{inline}
@cindex Prologue
@cindex declarations
-The @var{Prologue} section contains macro definitions and
-declarations of functions and variables that are used in the actions in the
-grammar rules. These are copied to the beginning of the parser file so
-that they precede the definition of @code{yyparse}. You can use
-@samp{#include} to get the declarations from a header file. If you don't
-need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
-delimiters that bracket this section.
+The @var{Prologue} section contains macro definitions and declarations
+of functions and variables that are used in the actions in the grammar
+rules. These are copied to the beginning of the parser file so that
+they precede the definition of @code{yyparse}. You can use
+@samp{#include} to get the declarations from a header file. If you
+don't need any C declarations, you may omit the @samp{%@{} and
+@samp{%@}} delimiters that bracket this section.
You may have more than one @var{Prologue} section, intermixed with the
@var{Bison declarations}. This allows you to have C and Bison
If the last section is empty, you may omit the @samp{%%} that separates it
from the grammar rules.
-The Bison parser itself contains many macros and identifiers whose
-names start with @samp{yy} or @samp{YY}, so it is a
-good idea to avoid using any such names (except those documented in this
-manual) in the epilogue of the grammar file.
+The Bison parser itself contains many macros and identifiers whose names
+start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
+any such names (except those documented in this manual) in the epilogue
+of the grammar file.
@node Symbols
@section Symbols, Terminal and Nonterminal
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the @code{yylex}
-function returns a token type code to indicate what kind of token has been
-read. You don't need to know what the code value is; you can use the
-symbol to stand for it.
+function returns a token type code to indicate what kind of token has
+been read. You don't need to know what the code value is; you can use
+the symbol to stand for it.
-A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
-groupings. The symbol name is used in writing grammar rules. By convention,
-it should be all lower case.
+A @dfn{nonterminal symbol} stands for a class of syntactically
+equivalent groupings. The symbol name is used in writing grammar rules.
+By convention, it should be all lower case.
Symbol names can contain letters, digits (not at the beginning),
underscores and periods. Periods make sense only in nonterminals.
"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~"
@end example
-The @code{yylex} function and Bison must use a consistent character
-set and encoding for character tokens. For example, if you run Bison in an
-@acronym{ASCII} environment, but then compile and run the resulting program
-in an environment that uses an incompatible character set like
-@acronym{EBCDIC}, the resulting program may not work because the
-tables generated by Bison will assume @acronym{ASCII} numeric values for
-character tokens. It is standard
-practice for software distributions to contain C source files that
-were generated by Bison in an @acronym{ASCII} environment, so installers on
-platforms that are incompatible with @acronym{ASCII} must rebuild those
-files before compiling them.
+The @code{yylex} function and Bison must use a consistent character set
+and encoding for character tokens. For example, if you run Bison in an
+@acronym{ASCII} environment, but then compile and run the resulting
+program in an environment that uses an incompatible character set like
+@acronym{EBCDIC}, the resulting program may not work because the tables
+generated by Bison will assume @acronym{ASCII} numeric values for
+character tokens. It is standard practice for software distributions to
+contain C source files that were generated by Bison in an
+@acronym{ASCII} environment, so installers on platforms that are
+incompatible with @acronym{ASCII} must rebuild those files before
+compiling them.
The symbol @code{error} is a terminal symbol reserved for error recovery
(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
@section Recursive Rules
@cindex recursive rule
-A rule is called @dfn{recursive} when its @var{result} nonterminal appears
-also on its right hand side. Nearly all Bison grammars need to use
-recursion, because that is the only way to define a sequence of any number
-of a particular thing. Consider this recursive definition of a
+A rule is called @dfn{recursive} when its @var{result} nonterminal
+appears also on its right hand side. Nearly all Bison grammars need to
+use recursion, because that is the only way to define a sequence of any
+number of a particular thing. Consider this recursive definition of a
comma-separated sequence of one or more expressions:
@example
In most programs, you will need different data types for different kinds
of tokens and groupings. For example, a numeric constant may need type
-@code{int} or @code{long int}, while a string constant needs type @code{char *},
-and an identifier might need a pointer to an entry in the symbol table.
+@code{int} or @code{long int}, while a string constant needs type
+@code{char *}, and an identifier might need a pointer to an entry in the
+symbol table.
To use more than one data type for semantic values in one parser, Bison
requires you to do two things:
always refers to the @code{expr} which precedes @code{bar} in the
definition of @code{foo}.
+@vindex yylval
+It is also possible to access the semantic value of the look-ahead token, if
+any, from a semantic action.
+This semantic value is stored in @code{yylval}.
+@xref{Action Features, ,Special Features for Use in Actions}.
+
@node Action Types
@subsection Data Types of Values in Actions
@cindex action data types
@end group
@end example
+@vindex yylloc
+It is also possible to access the location of the look-ahead token, if any,
+from a semantic action.
+This location is stored in @code{yylloc}.
+@xref{Action Features, ,Special Features for Use in Actions}.
+
@node Location Default Action
@subsection Default Action for Locations
@vindex YYLLOC_DEFAULT
@end group
@end example
+@noindent
specifies the union tag @code{value}, so the corresponding C type is
@code{union value}. If you do not specify a tag, it defaults to
@code{YYSTYPE}.
+As another extension to @acronym{POSIX}, you may specify multiple
+@code{%union} declarations; their contents are concatenated. However,
+only the first @code{%union} declaration can specify a tag.
+
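As a rough illustration of the concatenation, the sketch below shows the C
union one would expect from two @code{%union} declarations.
The grammar fragment in the comment and the member names @code{ival} and
@code{sval} are hypothetical, and the exact text Bison emits may differ; only
the idea that both sets of members end up in a single union is taken from the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grammar fragment (shown as a comment, not compiled):

     %union value { int ival; }
     %union { const char *sval; }

   Assumed outcome: one union carrying the members of both declarations,
   tagged by the first declaration.  */
union value
{
  int ival;          /* from the first %union declaration */
  const char *sval;  /* from the second %union declaration */
};

/* Store an integer through the merged union and read it back.  */
static int
roundtrip_int (int n)
{
  union value v;
  v.ival = n;
  return v.ival;
}
```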
Note that, unlike making a @code{union} declaration in C, you need not write
a semicolon after the closing brace.
Unless @code{YYSTYPE} is already defined as a macro, the output header
declares @code{YYSTYPE}. Therefore, if you are using a @code{%union}
-(@pxref{Multiple Types, ,More Than One Value Type}) with components
-that require other definitions, or if you have defined a
-@code{YYSTYPE} macro (@pxref{Value Type, ,Data Types of Semantic
-Values}), you need to arrange for these definitions to be propagated to
-all modules, e.g., by putting them in a
-prerequisite header that is included both by your parser and by any
-other module that needs @code{YYSTYPE}.
+(@pxref{Multiple Types, ,More Than One Value Type}) with components that
+require other definitions, or if you have defined a @code{YYSTYPE} macro
+(@pxref{Value Type, ,Data Types of Semantic Values}), you need to
+arrange for these definitions to be propagated to all modules, e.g., by
+putting them in a prerequisite header that is included both by your
+parser and by any other module that needs @code{YYSTYPE}.
Unless your parser is pure, the output header declares @code{yylval}
as an external variable. @xref{Pure Decl, ,A Pure (Reentrant)
@code{YYSTYPE} and @code{yylval}. @xref{Locations, ,Tracking
Locations}.
-This output file is normally essential if you wish to put the
-definition of @code{yylex} in a separate source file, because
-@code{yylex} typically needs to be able to refer to the
-above-mentioned declarations and to the token type codes.
-@xref{Token Values, ,Semantic Values of Tokens}.
+This output file is normally essential if you wish to put the definition
+of @code{yylex} in a separate source file, because @code{yylex}
+typically needs to be able to refer to the above-mentioned declarations
+and to the token type codes. @xref{Token Values, ,Semantic Values of
+Tokens}.
@end deffn
@deffn {Directive} %destructor
@vindex yylloc
If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
-Tracking Locations}) in actions to keep track of the
-textual locations of tokens and groupings, then you must provide this
-information in @code{yylex}. The function @code{yyparse} expects to
-find the textual location of a token just parsed in the global variable
-@code{yylloc}. So @code{yylex} must store the proper data in that
-variable.
+Tracking Locations}) in actions to keep track of the textual locations
+of tokens and groupings, then you must provide this information in
+@code{yylex}. The function @code{yyparse} expects to find the textual
+location of a token just parsed in the global variable @code{yylloc}.
+So @code{yylex} must store the proper data in that variable.
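A minimal sketch of that bookkeeping follows.
The structure below is a local stand-in for the default location type,
@code{set_token_location} is a hypothetical helper that a hand-written
@code{yylex} might call once it has matched a token, and treating the last
column as inclusive is a choice made for this example rather than a
requirement.

```c
#include <assert.h>

/* Local stand-in for the default location structure, declared here so
   the fragment compiles without a generated parser header.  */
typedef struct { int first_line, first_column, last_line, last_column; } YYLTYPE;

static YYLTYPE yylloc = { 1, 1, 1, 1 };

/* Hypothetical helper: record the location of a token of LEN characters
   that starts at (LINE, COLUMN) and does not span a newline.  The last
   column is stored as an inclusive position (an assumption of this
   sketch).  */
static void
set_token_location (int line, int column, int len)
{
  assert (len > 0);
  yylloc.first_line = line;
  yylloc.first_column = column;
  yylloc.last_line = line;
  yylloc.last_column = column + len - 1;
}
```

A lexer would typically call such a helper just before returning the token
type, so that the parser finds the location already stored when it consults
@code{yylloc}.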
By default, the value of @code{yylloc} is a structure and you need only
initialize the members that are going to be used by the actions. The
Value stored in @code{yychar} when there is no look-ahead token.
@end deffn
+@deffn {Macro} YYEOF
+@vindex YYEOF
+Value stored in @code{yychar} when the look-ahead is the end of the input
+stream.
+@end deffn
+
@deffn {Macro} YYERROR;
@findex YYERROR
Cause an immediate syntax error. This statement initiates error
@end deffn
@deffn {Variable} yychar
-Variable containing the current look-ahead token. (In a pure parser,
-this is actually a local variable within @code{yyparse}.) When there is
-no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
+Variable containing either the look-ahead token, or @code{YYEOF} when the
+look-ahead is the end of the input stream, or @code{YYEMPTY} when no look-ahead
+has been performed so the next token is not yet known.
+Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
@xref{Look-Ahead, ,Look-Ahead Tokens}.
@end deffn
@deffn {Macro} yyclearin;
Discard the current look-ahead token. This is useful primarily in
-error rules. @xref{Error Recovery}.
+error rules.
+Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
+Semantic Actions}).
+@xref{Error Recovery}.
@end deffn
@deffn {Macro} yyerrok;
@xref{Error Recovery}.
@end deffn
+@deffn {Variable} yylloc
+Variable containing the look-ahead token location when @code{yychar} is not set
+to @code{YYEMPTY} or @code{YYEOF}.
+Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Actions and Locations, ,Actions and Locations}.
+@end deffn
+
+@deffn {Variable} yylval
+Variable containing the look-ahead token semantic value when @code{yychar} is
+not set to @code{YYEMPTY} or @code{YYEOF}.
+Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
+Actions}).
+@xref{Actions, ,Actions}.
+@end deffn
+
@deffn {Value} @@$
@findex @@$
Acts like a structure variable containing information on the textual location
A Bison-generated parser can print diagnostics, including error and
tracing messages. By default, they appear in English. However, Bison
-also supports outputting diagnostics in the user's native language.
-To make this work, the user should set the usual environment
-variables. @xref{Users, , The User's View, gettext, GNU
-@code{gettext} utilities}. For
-example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might set
-the user's locale to French Canadian using the @acronym{UTF}-8
+also supports outputting diagnostics in the user's native language. To
+make this work, the user should set the usual environment variables.
+@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
+For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
+set the user's locale to French Canadian using the @acronym{UTF}-8
encoding. The exact set of available locales depends on the user's
installation.
'!'}. No rule allows that sequence.
@vindex yychar
-The current look-ahead token is stored in the variable @code{yychar}.
+@vindex yylval
+@vindex yylloc
+The look-ahead token is stored in the variable @code{yychar}.
+Its semantic value and location, if any, are stored in the variables
+@code{yylval} and @code{yylloc}.
@xref{Action Features, ,Special Features for Use in Actions}.
@node Shift/Reduce
this is unacceptable, then the macro @code{yyclearin} may be used to clear
this token. Write the statement @samp{yyclearin;} in the error rule's
action.
+@xref{Action Features, ,Special Features for Use in Actions}.
For example, suppose that on a syntax error, an error handling routine is
called that advances the input stream to some point where parsing should
@item --print-localedir
Print the name of the directory containing locale-dependent data.
-@need 1750
@item -y
@itemx --yacc
-Equivalent to @samp{-o y.tab.c}; the parser output file is called
+Act more like the traditional Yacc command. This can cause
+different diagnostics to be generated, and may change behavior in
+other minor ways. Most importantly, imitate Yacc's output
+file name conventions, so that the parser output file is called
@file{y.tab.c}, and the other outputs are called @file{y.output} and
-@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
-file name conventions. Thus, the following shell script can substitute
+@file{y.tab.h}. Thus, the following shell script can substitute
for Yacc, and the Bison distribution contains such a script for
compatibility with @acronym{POSIX}:
#! /bin/sh
bison -y "$@@"
@end example
+
+The @option{-y}/@option{--yacc} option is intended for use with
+traditional Yacc grammars. If your grammar uses a Bison extension
+like @samp{%glr-parser}, Bison might not be Yacc-compatible even if
+this option is specified.
+
@end table
@noindent
@end deffn
@deffn {Variable} yychar
-External integer variable that contains the integer value of the current
+External integer variable that contains the integer value of the
look-ahead token. (In a pure parser, it is a local variable within
@code{yyparse}.) Error-recovery rule actions may examine this variable.
@xref{Action Features, ,Special Features for Use in Actions}.
@deffn {Macro} YYLEX_PARAM
An obsolete macro for specifying an extra argument (or list of extra
-arguments) for @code{yyparse} to pass to @code{yylex}. he use of this
+arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
macro is deprecated, and is supported only for Yacc-like parsers.
@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
@end deffn
External variable in which @code{yylex} should place the line and column
numbers associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
-@code{yylex}.) You can ignore this variable if you don't use the
-@samp{@@} feature in the grammar actions. @xref{Token Locations,
-,Textual Locations of Tokens}.
+@code{yylex}.)
+You can ignore this variable if you don't use the @samp{@@} feature in the
+grammar actions.
+@xref{Token Locations, ,Textual Locations of Tokens}.
+In semantic actions, it stores the location of the look-ahead token.
+@xref{Actions and Locations, ,Actions and Locations}.
@end deffn
@deffn {Type} YYLTYPE
External variable in which @code{yylex} should place the semantic
value associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
-@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
+@code{yylex}.)
+@xref{Token Values, ,Semantic Values of Tokens}.
+In semantic actions, it stores the semantic value of the look-ahead token.
+@xref{Actions, ,Actions}.
@end deffn
@deffn {Macro} YYMAXDEPTH
@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES
@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param
@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP
-@c LocalWords: YYEMPTY YYRECOVERING yyclearin GE def UMINUS maybeword
+@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword
@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH
@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm VCG notype
@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args