\input texinfo @c -*-texinfo-*-
@comment %**start of header
@setfilename bison.info
-@settitle Bison 1.20
+@settitle Bison 1.25
@setchapternewpage odd
@iftex
@ifinfo
This file documents the Bison parser generator.
-Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation, Inc.
+Copyright (C) 1988, 89, 90, 91, 92, 93, 1995 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
@titlepage
@title Bison
@subtitle The YACC-compatible Parser Generator
-@subtitle December 1992, Bison Version 1.20
+@subtitle August 1995, Bison Version 1.25
@author by Charles Donnelly and Richard Stallman
@page
@vskip 0pt plus 1filll
-Copyright @copyright{} 1988, 1989, 1990, 1991, 1992 Free Software
+Copyright @copyright{} 1988, 89, 90, 91, 92, 93, 1995 Free Software
Foundation
@sp 2
Published by the Free Software Foundation @*
-675 Massachusetts Avenue @*
-Cambridge, MA 02139 USA @*
+59 Temple Place, Suite 330 @*
+Boston, MA 02111-1307 USA @*
Printed copies are available for $15 each.@*
-ISBN-1-882114-30-2
+ISBN 1-882114-45-0
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
@node Top, Introduction, (dir), (dir)
@ifinfo
-This manual documents version 1.20 of Bison.
+This manual documents version 1.25 of Bison.
@end ifinfo
@menu
don't know Bison or Yacc, start by reading these chapters. Reference
chapters follow which describe specific aspects of Bison in detail.
-Bison was written primarily by Robert Corbett; Richard Stallman made
-it Yacc-compatible. This edition corresponds to version 1.20 of Bison.
+Bison was written primarily by Robert Corbett; Richard Stallman made it
+Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
+multicharacter string literals and other features.
+
+This edition corresponds to version 1.25 of Bison.
@node Conditions, Copying, Introduction, Top
@unnumbered Conditions for Using Bison
-Bison grammars can be used only in programs that are free software. This
-is in contrast to what happens with the GNU C compiler and the other
-GNU programming tools.
-
-The reason Bison is special is that the output of the Bison utility---the
-Bison parser file---contains a verbatim copy of a sizable piece of Bison,
-which is the code for the @code{yyparse} function. (The actions from your
-grammar are inserted into this function at one point, but the rest of the
-function is not changed.)
-
-As a result, the Bison parser file is covered by the same copying
-conditions that cover Bison itself and the rest of the GNU system: any
-program containing it has to be distributed under the standard GNU copying
-conditions.
-
-Occasionally people who would like to use Bison to develop proprietary
-programs complain about this.
-
-We don't particularly sympathize with their complaints. The purpose of the
-GNU project is to promote the right to share software and the practice of
-sharing software; it is a means of changing society. The people who
-complain are planning to be uncooperative toward the rest of the world; why
-should they deserve our help in doing so?
-
-However, it's possible that a change in these conditions might encourage
-computer companies to use and distribute the GNU system. If so, then we
-might decide to change the terms on @code{yyparse} as a matter of the
-strategy of promoting the right to share. Such a change would be
-irrevocable. Since we stand by the copying permissions we have announced,
-we cannot withdraw them once given.
-
-We mustn't make an irrevocable change hastily. We have to wait until there
-is a complete GNU system and there has been time to learn how this issue
-affects its reception.
+As of Bison version 1.24, we have changed the distribution terms for
+@code{yyparse} to permit using Bison's output in non-free programs.
+Formerly, Bison parsers could be used only in programs that were free
+software.
+
+The other GNU programming tools, such as the GNU C compiler, have never
+had such a requirement. They could always be used for non-free
+software. The reason Bison was different was not due to a special
+policy decision; it resulted from applying the usual General Public
+License to all of the Bison source code.
+
+The output of the Bison utility---the Bison parser file---contains a
+verbatim copy of a sizable piece of Bison, which is the code for the
+@code{yyparse} function. (The actions from your grammar are inserted
+into this function at one point, but the rest of the function is not
+changed.) When we applied the GPL terms to the code for @code{yyparse},
+the effect was to restrict the use of Bison output to free software.
+
+We didn't change the terms because of sympathy for people who want to
+make software proprietary. @strong{Software should be free.} But we
+concluded that limiting Bison's use to free software was doing little to
+encourage people to make other software free. So we decided to make the
+practical conditions for using Bison match the practical conditions for
+using the other GNU tools.
@node Copying, Concepts, Conditions, Top
@unnumbered GNU GENERAL PUBLIC LICENSE
@node Language and Grammar, Grammar in Bison, , Concepts
@section Languages and Context-Free Grammars
-@c !!! ``An expression can be an integer'' is not a valid Bison
-@c expression---Bison cannot read English! --rjc 6 Feb 1992
@cindex context-free grammar
@cindex grammar, context-free
In order for Bison to parse a language, it must be described by a
@code{RETURN}. A terminal symbol that stands for a particular keyword in
the language should be named after that keyword converted to upper case.
The terminal symbol @code{error} is reserved for error recovery.
-@xref{Symbols}.@refill
+@xref{Symbols}.
A terminal symbol can also be represented as a character literal, just like
a C character constant. You should do this whenever a token is just a
single character (parenthesis, plus-sign, etc.): use that same character in
a literal as the terminal symbol for that token.
+A third way to represent a terminal symbol is with a C string constant
+containing several characters. @xref{Symbols}, for more information.
+
The grammar rules also have an expression in Bison syntax. For example,
here is the Bison rule for a C @code{return} statement. The semicolon in
quotes is a literal character token, representing part of the C syntax for
Symbol names can contain letters, digits (not at the beginning),
underscores and periods. Periods make sense only in nonterminals.
-There are two ways of writing terminal symbols in the grammar:
+There are three ways of writing terminal symbols in the grammar:
@itemize @bullet
@item
@cindex character token
@cindex literal token
@cindex single-character literal
-A @dfn{character token type} (or @dfn{literal token}) is written in
-the grammar using the same syntax used in C for character constants;
-for example, @code{'+'} is a character token type. A character token
-type doesn't need to be declared unless you need to specify its
-semantic value data type (@pxref{Value Type, ,Data Types of Semantic Values}), associativity, or
-precedence (@pxref{Precedence, ,Operator Precedence}).
+A @dfn{character token type} (or @dfn{literal character token}) is
+written in the grammar using the same syntax used in C for character
+constants; for example, @code{'+'} is a character token type. A
+character token type doesn't need to be declared unless you need to
+specify its semantic value data type (@pxref{Value Type, ,Data Types of
+Semantic Values}), associativity, or precedence (@pxref{Precedence,
+,Operator Precedence}).
By convention, a character token type is used only to represent a
token that consists of that particular character. Thus, the token
All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
-character literal because its ASCII code, zero, is the code
-@code{yylex} returns for end-of-input (@pxref{Calling Convention, ,Calling Convention for @code{yylex}}).
+character literal because its ASCII code, zero, is the code @code{yylex}
+returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
+for @code{yylex}}).
+
+@item
+@cindex string token
+@cindex literal string token
+@cindex multi-character literal
+A @dfn{literal string token} is written like a C string constant; for
+example, @code{"<="} is a literal string token. A literal string token
+doesn't need to be declared unless you need to specify its semantic
+value data type (@pxref{Value Type}), associativity, or precedence
+(@pxref{Precedence}).
+
+You can associate the literal string token with a symbolic name as an
+alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
+Declarations}). If you don't do that, the lexical analyzer has to
+retrieve the token number for the literal string token from the
+@code{yytname} table (@pxref{Calling Convention}).
+
+@strong{WARNING}: literal string tokens do not work in Yacc.
+
+By convention, a literal string token is used only to represent a token
+that consists of that particular string. Thus, you should use the token
+type @code{"<="} to represent the string @samp{<=} as a token. Bison
+does not enforce this convention, but if you depart from it, people who
+read your program will be confused.
+
+All the escape sequences used in string literals in C can be used in
+Bison as well. A literal string token must contain two or more
+characters; for a token containing just one character, use a character
+token (see above).
@end itemize
How you choose to write a terminal symbol has no effect on its
A Bison grammar rule has the following general form:
@example
+@group
@var{result}: @var{components}@dots{}
;
+@end group
@end example
@noindent
@subsection Token Type Names
@cindex declaring token type names
@cindex token type names, declaring
+@cindex declaring literal string tokens
@findex %token
The basic way to declare a token type name (terminal symbol) is as follows:
@end group
@end example
+You can associate a literal string token with a token type name by
+writing the literal string at the end of a @code{%token}
+declaration which declares the name. For example:
+
+@example
+%token arrow "=>"
+@end example
+
+@noindent
+Similarly, a grammar for the C language might specify these names with
+equivalent literal string tokens:
+
+@example
+%token <operator> OR "||"
+%token <operator> LE 134 "<="
+%left OR "<="
+@end example
+
+@noindent
+Once you equate the literal string and the token name, you can use them
+interchangeably in further declarations or the grammar rules. The
+@code{yylex} function can use the token name or the literal string to
+obtain the token type code number (@pxref{Calling Convention}).
+
@node Precedence Decl, Union Decl, Token Decl, Declarations
@subsection Operator Precedence
@cindex precedence declarations
the same @code{%type} declaration, if they have the same value type. Use
spaces to separate the symbol names.
+You can also declare the value type of a terminal symbol. To do this,
+use the same @code{<@var{type}>} construction in a declaration for the
+terminal symbol. All kinds of token declarations allow
+@code{<@var{type}>}.
+
@node Expect Decl, Start Decl, Type Decl, Declarations
@subsection Suppressing Conflict Warnings
@cindex suppressing conflict warnings
@end example
The effect is that the two communication variables become local
-variables in @code{yyparse}, and a different calling convention is used for
-the lexical analyzer function @code{yylex}. @xref{Pure Calling, ,Calling for Pure Parsers}, for the
-details of this. The variable @code{yynerrs} also becomes local in
-@code{yyparse} (@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). The convention for calling
-@code{yyparse} itself is unchanged.
+variables in @code{yyparse}, and a different calling convention is used
+for the lexical analyzer function @code{yylex}. @xref{Pure Calling,
+,Calling Conventions for Pure Parsers}, for the details of this. The
+variable @code{yynerrs} also becomes local in @code{yyparse}
+(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).
+The convention for calling @code{yyparse} itself is unchanged.
@node Decl Summary, , Pure Decl, Declarations
@subsection Bison Declaration Summary
@item %pure_parser
Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
+
+@item %no_lines
+Don't generate any @code{#line} preprocessor commands in the parser
+file. Ordinarily Bison writes these commands in the parser file so that
+the C compiler and debuggers will associate errors and object code with
+your source file (the grammar file). This directive causes them to
+associate errors with the parser file, treating it as an independent source
+file in its own right.
+
+@item %raw
+The output file @file{@var{name}.h} normally defines the tokens with
+Yacc-compatible token numbers. If this option is specified, the
+internal Bison numbers are used instead. (Yacc-compatible numbers start
+at 257 except for single-character tokens; Bison assigns token numbers
+sequentially for all tokens starting at 3.)
+
+@item %token_table
+Generate an array of token names in the parser file. The name of the
+array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
+token whose internal Bison token code number is @var{i}. The first three
+elements of @code{yytname} are always @code{"$"}, @code{"error"}, and
+@code{"$illegal"}; after these come the symbols defined in the grammar
+file.
+
+For single-character literal tokens and literal string tokens, the name
+in the table includes the single-quote or double-quote characters: for
+example, @code{"'+'"} is a single-character literal and @code{"\"<=\""}
+is a literal string token. All the characters of the literal string
+token appear verbatim in the string found in the table; even
+double-quote characters are not escaped. For example, if the token
+consists of three characters @samp{*"*}, its string in @code{yytname}
+contains @samp{"*"*"}. (In C, that would be written as
+@code{"\"*\"*\""}.)
+
+When you specify @code{%token_table}, Bison also generates definitions
+for the macros @code{YYNTOKENS}, @code{YYNNTS}, @code{YYNRULES}, and
+@code{YYNSTATES}:
+
+@table @code
+@item YYNTOKENS
+The highest token number, plus one.
+@item YYNNTS
+The number of non-terminal symbols.
+@item YYNRULES
+The number of grammar rules.
+@item YYNSTATES
+The number of parser states (@pxref{Parser States}).
+@end table
@end table
-@node Multiple Parsers, , Declarations, Grammar File
+@node Multiple Parsers,, Declarations, Grammar File
@section Multiple Parsers in the Same Program
Most programs that use Bison parse only one language and therefore contain
not conflict.
The precise list of symbols renamed is @code{yyparse}, @code{yylex},
-@code{yyerror}, @code{yylval}, @code{yychar} and @code{yydebug}. For
-example, if you use @samp{-p c}, the names become @code{cparse},
-@code{clex}, and so on.
+@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar} and
+@code{yydebug}. For example, if you use @samp{-p c}, the names become
+@code{cparse}, @code{clex}, and so on.
@strong{All the other variables and macros associated with Bison are not
renamed.} These others are not global; there is no conflict if the same
This interface has been designed so that the output from the @code{lex}
utility can be used without change as the definition of @code{yylex}.
+If the grammar uses literal string tokens, there are two ways that
+@code{yylex} can determine the token type codes for them:
+
+@itemize @bullet
+@item
+If the grammar defines symbolic token names as aliases for the
+literal string tokens, @code{yylex} can use these symbolic names like
+all others. In this case, the use of the literal string tokens in
+the grammar file has no effect on @code{yylex}.
+
+@item
+@code{yylex} can find the multi-character token in the @code{yytname}
+table. The index of the token in the table is the token type's code.
+The name of a multi-character token is recorded in @code{yytname} with a
+double-quote, the token's characters, and another double-quote. The
+token's characters are not escaped in any way; they appear verbatim in
+the contents of the string in the table.
+
+Here's code for looking up a token in @code{yytname}, assuming that the
+characters of the token are stored in @code{token_buffer}.
+
+@smallexample
+for (i = 0; i < YYNTOKENS; i++)
+ @{
+ if (yytname[i] != 0
+ && yytname[i][0] == '"'
+ && ! strncmp (yytname[i] + 1, token_buffer, strlen (token_buffer))
+ && yytname[i][strlen (token_buffer) + 1] == '"'
+ && yytname[i][strlen (token_buffer) + 2] == 0)
+ break;
+ @}
+@end smallexample
+
+The @code{yytname} table is generated only if you use the
+@code{%token_table} declaration. @xref{Decl Summary}.
+@end itemize
+
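The table lookup described above can be sketched as a complete function.
This is only an illustration under stated assumptions: @code{demo_yytname},
@code{DEMO_YYNTOKENS} and @code{lookup_string_token} are hypothetical names
standing in for the @code{yytname} table and @code{YYNTOKENS} macro that a
real parser file would define, and the table contents are made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the table generated under %token_table.
   Literal string tokens are stored with their surrounding double-quotes
   and with no escaping of the characters in between.  */
static const char *const demo_yytname[] = {
  "$", "error", "$illegal", "NUM",
  "\"<=\"", "\"=>\"", "\"*\"*\"", "'+'", 0
};
#define DEMO_YYNTOKENS 8

/* Return the internal token number of the literal string token whose
   characters are TEXT, or -1 if the table has no such entry.  */
static int
lookup_string_token (const char *text)
{
  size_t len = strlen (text);
  int i;

  for (i = 0; i < DEMO_YYNTOKENS; i++)
    {
      const char *name = demo_yytname[i];

      if (name != 0
          && name[0] == '"'                      /* opening quote */
          && strncmp (name + 1, text, len) == 0  /* same characters */
          && name[len + 1] == '"'                /* closing quote */
          && name[len + 2] == 0)                 /* nothing follows */
        return i;
    }
  return -1;
}
```

With this made-up table, looking up @samp{<=} yields index 4, while
looking up @samp{+} fails because single-character literals are stored
with single-quotes. Returning -1 for a missing token is a choice made for
this sketch, not something Bison prescribes.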
@node Token Values, Token Positions, Calling Convention, Lexical
@subsection Semantic Values of Tokens
The data type of @code{yylloc} has the name @code{YYLTYPE}.
@node Pure Calling, , Token Positions, Lexical
-@subsection Calling for Pure Parsers
+@subsection Calling Conventions for Pure Parsers
-When you use the Bison declaration @code{%pure_parser} to request a pure,
-reentrant parser, the global communication variables @code{yylval} and
-@code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the
-two global variables are replaced by pointers passed as arguments to
-@code{yylex}. You must declare them as shown here, and pass the
-information back by storing it through those pointers.
+When you use the Bison declaration @code{%pure_parser} to request a
+pure, reentrant parser, the global communication variables @code{yylval}
+and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
+Parser}.) In such parsers the two global variables are replaced by
+pointers passed as arguments to @code{yylex}. You must declare them as
+shown here, and pass the information back by storing it through those
+pointers.
@example
yylex (lvalp, llocp)
this case, omit the second argument; @code{yylex} will be called with
only one argument.
+@vindex YYPARSE_PARAM
+If you use a reentrant parser, you can optionally pass additional
+parameter information to it in a reentrant way. To do so, define the
+macro @code{YYPARSE_PARAM} as a variable name. This modifies the
+@code{yyparse} function to accept one argument, of type @code{void *},
+with that name.
+
+When you call @code{yyparse}, pass the address of an object, casting the
+address to @code{void *}. The grammar actions can refer to the contents
+of the object by casting the pointer value back to its proper type and
+then dereferencing it. Here's an example. Write this in the parser:
+
+@example
+%@{
+struct parser_control
+@{
+ int nastiness;
+ int randomness;
+@};
+
+#define YYPARSE_PARAM parm
+%@}
+@end example
+
+@noindent
+Then call the parser like this:
+
+@example
+struct parser_control
+@{
+ int nastiness;
+ int randomness;
+@};
+
+@dots{}
+
+@{
+ struct parser_control foo;
+ @dots{} /* @r{Store proper data in @code{foo}.} */
+ value = yyparse ((void *) &foo);
+ @dots{}
+@}
+@end example
+
+@noindent
+In the grammar actions, use expressions like this to refer to the data:
+
+@example
+((struct parser_control *) parm)->randomness
+@end example
+
+@vindex YYLEX_PARAM
+If you wish to pass the additional parameter data to @code{yylex},
+define the macro @code{YYLEX_PARAM} just like @code{YYPARSE_PARAM}, as
+shown here:
+
+@example
+%@{
+struct parser_control
+@{
+ int nastiness;
+ int randomness;
+@};
+
+#define YYPARSE_PARAM parm
+#define YYLEX_PARAM parm
+%@}
+@end example
+
+You should then define @code{yylex} to accept one additional
+argument---the value of @code{parm}. (This makes either two or three
+arguments in total, depending on whether an argument of type
+@code{YYLTYPE} is passed.) You can declare the argument as a pointer to
+the proper object type, or you can declare it as @code{void *} and
+access the contents as shown above.
+
+You can use @samp{%pure_parser} to request a reentrant parser without
+also using @code{YYPARSE_PARAM}. Then you should call @code{yyparse}
+with no arguments, as usual.
+
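The cast-and-dereference round trip that @code{YYPARSE_PARAM} relies on can
be tried in plain C without running Bison. In this minimal sketch,
@code{demo_parse} is a hypothetical stand-in for a @code{yyparse} generated
with @code{#define YYPARSE_PARAM parm}, and the arithmetic in its body is
an invented placeholder for what a grammar action might do.

```c
struct parser_control
{
  int nastiness;
  int randomness;
};

/* Hypothetical stand-in for the generated yyparse: it accepts one
   argument of type void * whose name is the value of YYPARSE_PARAM.  */
static int
demo_parse (void *parm)
{
  /* A grammar action casts the pointer back to its proper type and
     then dereferences it.  */
  struct parser_control *ctl = (struct parser_control *) parm;

  ctl->randomness += ctl->nastiness;   /* placeholder for action code */
  return 0;                            /* yyparse returns 0 on success */
}
```

A caller would fill in a @code{struct parser_control}, pass its address
cast to @code{void *}, and read the updated fields afterwards. Because all
state travels through the argument, no global variable is involved.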
@node Error Reporting, Action Features, Lexical, Interface
@section The Error Reporting Function @code{yyerror}
@cindex error reporting function
Ordinarily Bison puts them in the parser file so that the C compiler
and debuggers will associate errors with your source file, the
grammar file. This option causes them to associate errors with the
-parser file, treating it an independent source file in its own right.
+parser file, treating it as an independent source file in its own right.
+
+@item -n
+@itemx --no-parser
+Do not include any C code in the parser file; generate tables only. The
+parser file contains just @code{#define} directives and static variable
+declarations.
+
+This option also tells Bison to write the C code for the grammar actions
+into a file named @file{@var{filename}.act}, in the form of a
+brace-surrounded body fit for a @code{switch} statement.
@item -o @var{outfile}
@itemx --output-file=@var{outfile}
Specify the name @var{outfile} for the parser file.
The other output files' names are constructed from @var{outfile}
-as described under the @samp{-v} and @samp{-d} switches.
+as described under the @samp{-v} and @samp{-d} options.
@item -p @var{prefix}
@itemx --name-prefix=@var{prefix}
Rename the external symbols used in the parser so that they start with
@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
-is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yylval},
-@code{yychar} and @code{yydebug}.
+is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
+@code{yylval}, @code{yychar} and @code{yydebug}.
For example, if you use @samp{-p c}, the names become @code{cparse},
@code{clex}, and so on.
@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
+@item -r
+@itemx --raw
+Pretend that @code{%raw} was specified. @xref{Decl Summary}.
+
@item -t
@itemx --debug
Output a definition of the macro @code{YYDEBUG} into the parser file,
@itemx --fixed-output-files
Equivalent to @samp{-o y.tab.c}; the parser output file is called
@file{y.tab.c}, and the other outputs are called @file{y.output} and
-@file{y.tab.h}. The purpose of this switch is to imitate Yacc's output
+@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
file name conventions. Thus, the following shell script can substitute
for Yacc:@refill
\line{ --help \leaderfill -h}
\line{ --name-prefix \leaderfill -p}
\line{ --no-lines \leaderfill -l}
+\line{ --no-parser \leaderfill -n}
\line{ --output-file \leaderfill -o}
+\line{ --raw \leaderfill -r}
+\line{ --token-table \leaderfill -k}
\line{ --verbose \leaderfill -v}
\line{ --version \leaderfill -V}
\line{ --yacc \leaderfill -y}
--file-prefix=@var{prefix} -b @var{file-prefix}
--fixed-output-files --yacc -y
--help -h
---name-prefix -p
+--name-prefix=@var{prefix} -p @var{name-prefix}
--no-lines -l
+--no-parser -n
--output-file=@var{outfile} -o @var{outfile}
+--raw -r
+--token-table -k
--verbose -v
--version -V
@end example
Macro for specifying the initial size of the parser stack.
@xref{Stack Overflow}.
+@item YYLEX_PARAM
+Macro for specifying an extra argument (or list of extra arguments) for
+@code{yyparse} to pass to @code{yylex}. @xref{Pure Calling,, Calling
+Conventions for Pure Parsers}.
+
@item YYLTYPE
Macro for the data type of @code{yylloc}; a structure with four
members. @xref{Token Positions, ,Textual Positions of Tokens}.
+@item yyltype
+Default value for @code{YYLTYPE}.
+
@item YYMAXDEPTH
Macro for specifying the maximum size of the parser stack.
@xref{Stack Overflow}.
+@item YYPARSE_PARAM
+Macro for specifying the name of a parameter that @code{yyparse} should
+accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}.
+
@item YYRECOVERING
Macro whose value indicates whether the parser is recovering from a
syntax error. @xref{Action Features, ,Special Features for Use in Actions}.
Bison declaration to assign left associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
+@item %no_lines
+Bison declaration to avoid generating @code{#line} directives in the
+parser file. @xref{Decl Summary}.
+
@item %nonassoc
Bison declaration to assign nonassociativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
Bison declaration to request a pure (reentrant) parser.
@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
+@item %raw
+Bison declaration to use Bison internal token code numbers in token
+tables instead of the usual Yacc-compatible token code numbers.
+@xref{Decl Summary}.
+
@item %right
Bison declaration to assign right associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
Bison declaration to declare token(s) without specifying precedence.
@xref{Token Decl, ,Token Type Names}.
+@item %token_table
+Bison declaration to include a token name table in the parser file.
+@xref{Decl Summary}.
+
@item %type
Bison declaration to declare nonterminals. @xref{Type Decl, ,Nonterminal Symbols}.
A flag, set by actions in the grammar rules, which alters the way
tokens are parsed. @xref{Lexical Tie-ins}.
+@item Literal string token
+A token which consists of two or more fixed characters.
+@xref{Symbols}.
+
@item Look-ahead token
A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}.