\input texinfo @c -*-texinfo-*-
@comment %**start of header
@setfilename bison.info
-@settitle Bison 1.24
+@settitle Bison 1.25
@setchapternewpage odd
@iftex
@titlepage
@title Bison
@subtitle The YACC-compatible Parser Generator
-@subtitle May 1995, Bison Version 1.24
+@subtitle November 1995, Bison Version 1.25
@author by Charles Donnelly and Richard Stallman
@sp 2
Published by the Free Software Foundation @*
-675 Massachusetts Avenue @*
-Cambridge, MA 02139 USA @*
+59 Temple Place, Suite 330 @*
+Boston, MA 02111-1307 USA @*
Printed copies are available for $15 each.@*
-ISBN-1-882114-30-2
+ISBN 1-882114-45-0
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
@node Top, Introduction, (dir), (dir)
@ifinfo
-This manual documents version 1.24 of Bison.
+This manual documents version 1.25 of Bison.
@end ifinfo
@menu
don't know Bison or Yacc, start by reading these chapters. Reference
chapters follow which describe specific aspects of Bison in detail.
-Bison was written primarily by Robert Corbett; Richard Stallman made
-it Yacc-compatible. This edition corresponds to version 1.24 of Bison.
+Bison was written primarily by Robert Corbett; Richard Stallman made it
+Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
+multicharacter string literals and other features.
+
+This edition corresponds to version 1.25 of Bison.
@node Conditions, Copying, Introduction, Top
@unnumbered Conditions for Using Bison
@code{RETURN}. A terminal symbol that stands for a particular keyword in
the language should be named after that keyword converted to upper case.
The terminal symbol @code{error} is reserved for error recovery.
-@xref{Symbols}.@refill
+@xref{Symbols}.
A terminal symbol can also be represented as a character literal, just like
a C character constant. You should do this whenever a token is just a
single character (parenthesis, plus-sign, etc.): use that same character in
a literal as the terminal symbol for that token.
+A third way to represent a terminal symbol is with a C string constant
+containing several characters. @xref{Symbols}, for more information.
+
The grammar rules also have an expression in Bison syntax. For example,
here is the Bison rule for a C @code{return} statement. The semicolon in
quotes is a literal character token, representing part of the C syntax for
Symbol names can contain letters, digits (not at the beginning),
underscores and periods. Periods make sense only in nonterminals.
-There are two ways of writing terminal symbols in the grammar:
+There are three ways of writing terminal symbols in the grammar:
@itemize @bullet
@item
@cindex character token
@cindex literal token
@cindex single-character literal
-A @dfn{character token type} (or @dfn{literal token}) is written in
-the grammar using the same syntax used in C for character constants;
-for example, @code{'+'} is a character token type. A character token
-type doesn't need to be declared unless you need to specify its
-semantic value data type (@pxref{Value Type, ,Data Types of Semantic Values}), associativity, or
-precedence (@pxref{Precedence, ,Operator Precedence}).
+A @dfn{character token type} (or @dfn{literal character token}) is
+written in the grammar using the same syntax used in C for character
+constants; for example, @code{'+'} is a character token type. A
+character token type doesn't need to be declared unless you need to
+specify its semantic value data type (@pxref{Value Type, ,Data Types of
+Semantic Values}), associativity, or precedence (@pxref{Precedence,
+,Operator Precedence}).
By convention, a character token type is used only to represent a
token that consists of that particular character. Thus, the token
All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
-character literal because its ASCII code, zero, is the code
-@code{yylex} returns for end-of-input (@pxref{Calling Convention, ,Calling Convention for @code{yylex}}).
+character literal because its ASCII code, zero, is the code @code{yylex}
+returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
+for @code{yylex}}).
+
+@item
+@cindex string token
+@cindex literal string token
+@cindex multi-character literal
+A @dfn{literal string token} is written like a C string constant; for
+example, @code{"<="} is a literal string token. A literal string token
+doesn't need to be declared unless you need to specify its semantic
+value data type (@pxref{Value Type}), associativity, or precedence
+(@pxref{Precedence}).
+
+You can associate the literal string token with a symbolic name as an
+alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
+Declarations}). If you don't do that, the lexical analyzer has to
+retrieve the token number for the literal string token from the
+@code{yytname} table (@pxref{Calling Convention}).
+
+@strong{WARNING}: literal string tokens do not work in Yacc.
+
+By convention, a literal string token is used only to represent a token
+that consists of that particular string. Thus, you should use the token
+type @code{"<="} to represent the string @samp{<=} as a token. Bison
+does not enforce this convention, but if you depart from it, people who
+read your program will be confused.
+
+All the escape sequences used in string literals in C can be used in
+Bison as well. A literal string token must contain two or more
+characters; for a token containing just one character, use a character
+token (see above).
@end itemize
How you choose to write a terminal symbol has no effect on its
@subsection Token Type Names
@cindex declaring token type names
@cindex token type names, declaring
+@cindex declaring literal string tokens
@findex %token
The basic way to declare a token type name (terminal symbol) is as follows:
@end group
@end example
+You can associate a literal string token with a token type name by
+writing the literal string at the end of a @code{%token}
+declaration which declares the name. For example:
+
+@example
+%token arrow "=>"
+@end example
+
+@noindent
+For example, a grammar for the C language might specify these names with
+equivalent literal string tokens:
+
+@example
+%token <operator> OR "||"
+%token <operator> LE 134 "<="
+%left OR "<="
+@end example
+
+@noindent
+Once you equate the literal string and the token name, you can use them
+interchangeably in further declarations or the grammar rules. The
+@code{yylex} function can use the token name or the literal string to
+obtain the token type code number (@pxref{Calling Convention}).
+
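As a rough illustration of the first approach, here is a minimal sketch of a
@code{yylex} that returns the symbolic names declared above. It assumes the
token codes are visible to the lexer (normally through the header written by
@samp{-d}); the value given for @code{OR}, the @code{input_ptr} cursor, and
the string-based input are hypothetical stand-ins, not part of Bison.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes.  LE is 134 as in the %token declaration
   above; the value for OR is an assumption made for this sketch.  */
enum { OR = 257, LE = 134 };

/* Hypothetical input cursor standing in for the real input stream.  */
static const char *input_ptr = "";

/* Return LE or OR for the two multi-character operators, the
   character itself for any other token, and 0 at end of input.  */
int
yylex (void)
{
  assert (input_ptr != NULL);

  /* Skip blanks between tokens.  */
  while (*input_ptr == ' ' || *input_ptr == '\t')
    input_ptr++;

  if (*input_ptr == '\0')
    return 0;                   /* end-of-input code */

  if (strncmp (input_ptr, "<=", 2) == 0)
    {
      input_ptr += 2;
      return LE;
    }
  if (strncmp (input_ptr, "||", 2) == 0)
    {
      input_ptr += 2;
      return OR;
    }

  /* Single-character token: its code is the character itself.  */
  return (unsigned char) *input_ptr++;
}
```

With this arrangement the grammar can use either @code{LE} or @code{"<="} in
its rules, and the lexer is unaffected by which form is chosen.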
@node Precedence Decl, Union Decl, Token Decl, Declarations
@subsection Operator Precedence
@cindex precedence declarations
the same @code{%type} declaration, if they have the same value type. Use
spaces to separate the symbol names.
+You can also declare the value type of a terminal symbol. To do this,
+use the same @code{<@var{type}>} construction in a declaration for the
+terminal symbol. All kinds of token declarations allow
+@code{<@var{type}>}.
+
@node Expect Decl, Start Decl, Type Decl, Declarations
@subsection Suppressing Conflict Warnings
@cindex suppressing conflict warnings
handler. In systems with multiple threads of control, a nonreentrant
program must be called only within interlocks.
-The Bison parser is not normally a reentrant program, because it uses
-statically allocated variables for communication with @code{yylex}. These
-variables include @code{yylval} and @code{yylloc}.
+Normally, Bison generates a parser which is not reentrant. This is
+suitable for most uses, and it permits compatibility with YACC. (The
+standard YACC interfaces are inherently nonreentrant, because they use
+statically allocated variables for communication with @code{yylex},
+including @code{yylval} and @code{yylloc}.)
-The Bison declaration @code{%pure_parser} says that you want the parser
-to be reentrant. It looks like this:
+Alternatively, you can generate a pure, reentrant parser. The Bison
+declaration @code{%pure_parser} says that you want the parser to be
+reentrant. It looks like this:
@example
%pure_parser
@end example
-The effect is that the two communication variables become local
-variables in @code{yyparse}, and a different calling convention is used
-for the lexical analyzer function @code{yylex}. @xref{Pure Calling,
-,Calling Conventions for Pure Parsers}, for the details of this. The
-variable @code{yynerrs} also becomes local in @code{yyparse}
-(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}).
-The convention for calling @code{yyparse} itself is unchanged.
+The result is that the communication variables @code{yylval} and
+@code{yylloc} become local variables in @code{yyparse}, and a different
+calling convention is used for the lexical analyzer function
+@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
+Parsers}, for the details of this. The variable @code{yynerrs} also
+becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error
+Reporting Function @code{yyerror}}). The convention for calling
+@code{yyparse} itself is unchanged.
+
+Whether the parser is pure has nothing to do with the grammar rules.
+You can generate either a pure parser or a nonreentrant parser from any
+valid grammar.
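
To make the changed calling convention more concrete, the following sketch
shows what a @code{yylex} for a pure parser might look like: instead of
storing into global @code{yylval} and @code{yylloc}, it stores through
pointers supplied by @code{yyparse}. The type definitions, the @code{NUM}
token code, and the @code{input_start} and @code{input_ptr} variables are
stand-ins assumed for this example. Because the input cursor is still a
file-scope variable, the sketch illustrates the interface only and is not
itself fully reentrant.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the types the parser file would normally define.  */
typedef int YYSTYPE;
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

enum { NUM = 257 };             /* hypothetical token code */

/* Hypothetical input buffer and cursor (single-line input).  */
static const char *input_start = "";
static const char *input_ptr = "";

/* Pure-parser style lexer: the semantic value and the location are
   written through LVALP and LLOCP rather than into globals.  */
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  assert (lvalp != NULL && llocp != NULL);

  while (*input_ptr == ' ')
    input_ptr++;
  if (*input_ptr == '\0')
    return 0;                   /* end of input */

  llocp->first_line = llocp->last_line = 1;
  llocp->first_column = (int) (input_ptr - input_start) + 1;

  if (*input_ptr >= '0' && *input_ptr <= '9')
    {
      int value = 0;
      while (*input_ptr >= '0' && *input_ptr <= '9')
        value = value * 10 + (*input_ptr++ - '0');
      *lvalp = value;
      llocp->last_column = (int) (input_ptr - input_start);
      return NUM;
    }

  /* Any other character is a single-character token.  */
  llocp->last_column = llocp->first_column;
  return (unsigned char) *input_ptr++;
}
```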
@node Decl Summary, , Pure Decl, Declarations
@subsection Bison Declaration Summary
@item %pure_parser
Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
+
+@item %no_lines
+Don't generate any @code{#line} preprocessor commands in the parser
+file. Ordinarily Bison writes these commands in the parser file so that
+the C compiler and debuggers will associate errors and object code with
+your source file (the grammar file). This directive causes them to
+associate errors with the parser file, treating it as an independent source
+file in its own right.
+
+@item %raw
+The output file @file{@var{name}.h} normally defines the tokens with
+Yacc-compatible token numbers. If this option is specified, the
+internal Bison numbers are used instead. (Yacc-compatible numbers start
+at 257 except for single character tokens; Bison assigns token numbers
+sequentially for all tokens starting at 3.)
+
+@item %token_table
+Generate an array of token names in the parser file. The name of the
+array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
+token whose internal Bison token code number is @var{i}. The first three
+elements of @code{yytname} are always @code{"$"}, @code{"error"}, and
+@code{"$illegal"}; after these come the symbols defined in the grammar
+file.
+
+For single-character literal tokens and literal string tokens, the name
+in the table includes the single-quote or double-quote characters: for
+example, @code{"'+'"} is a single-character literal and @code{"\"<=\""}
+is a literal string token. All the characters of the literal string
+token appear verbatim in the string found in the table; even
+double-quote characters are not escaped. For example, if the token
+consists of three characters @samp{*"*}, its string in @code{yytname}
+contains @samp{"*"*"}. (In C, that would be written as
+@code{"\"*\"*\""}).
+
+When you specify @code{%token_table}, Bison also generates
+definitions for the macros @code{YYNTOKENS}, @code{YYNNTS},
+@code{YYNRULES}, and @code{YYNSTATES}:
+
+@table @code
+@item YYNTOKENS
+The highest token number, plus one.
+@item YYNNTS
+The number of non-terminal symbols.
+@item YYNRULES
+The number of grammar rules.
+@item YYNSTATES
+The number of parser states (@pxref{Parser States}).
+@end table
@end table
-@node Multiple Parsers, , Declarations, Grammar File
+@node Multiple Parsers,, Declarations, Grammar File
@section Multiple Parsers in the Same Program
Most programs that use Bison parse only one language and therefore contain
This interface has been designed so that the output from the @code{lex}
utility can be used without change as the definition of @code{yylex}.
+If the grammar uses literal string tokens, there are two ways that
+@code{yylex} can determine the token type codes for them:
+
+@itemize @bullet
+@item
+If the grammar defines symbolic token names as aliases for the
+literal string tokens, @code{yylex} can use these symbolic names like
+all others. In this case, the use of the literal string tokens in
+the grammar file has no effect on @code{yylex}.
+
+@item
+@code{yylex} can find the multi-character token in the @code{yytname}
+table. The index of the token in the table is the token type's code.
+The name of a multi-character token is recorded in @code{yytname} with a
+double-quote, the token's characters, and another double-quote. The
+token's characters are not escaped in any way; they appear verbatim in
+the contents of the string in the table.
+
+Here's code for looking up a token in @code{yytname}, assuming that the
+characters of the token are stored in @code{token_buffer}.
+
+@smallexample
+for (i = 0; i < YYNTOKENS; i++)
+ @{
+ if (yytname[i] != 0
+ && yytname[i][0] == '"'
+ && ! strncmp (yytname[i] + 1, token_buffer, strlen (token_buffer))
+ && yytname[i][strlen (token_buffer) + 1] == '"'
+ && yytname[i][strlen (token_buffer) + 2] == 0)
+ break;
+ @}
+@end smallexample
+
+The @code{yytname} table is generated only if you use the
+@code{%token_table} declaration. @xref{Decl Summary}.
+@end itemize
+
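The lookup can also be packaged as a function and tried without a generated
parser. The sketch below is one possible arrangement, not the way Bison
itself does it: the @code{yytname} contents after the first three entries,
the value of @code{YYNTOKENS}, and the name @code{lookup_string_token} are
all made up for the example. In a real parser file the table and the macro
come from @code{%token_table}.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the table Bison generates with %token_table.  The
   entries after the first three are hypothetical.  */
static const char *const yytname[] = {
  "$", "error", "$illegal", "NUM", "'+'", "\"<=\"", "\"||\"", 0
};
#define YYNTOKENS 7             /* highest token number, plus one */

/* Return the index in yytname of the literal string token whose
   characters are in TOKEN_BUFFER, or -1 if there is none.  */
int
lookup_string_token (const char *token_buffer)
{
  size_t len;
  int i;

  assert (token_buffer != NULL);
  len = strlen (token_buffer);

  for (i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        /* strncmp returns 0 when the characters match.  */
        && strncmp (yytname[i] + 1, token_buffer, len) == 0
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == '\0')
      return i;

  return -1;
}
```

For instance, with the table above, @code{lookup_string_token ("<=")}
yields 5, while a single-character token such as @code{"+"} is not found,
because its name is recorded with single quotes.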
@node Token Values, Token Positions, Calling Convention, Lexical
@subsection Semantic Values of Tokens
only one argument.
@vindex YYPARSE_PARAM
-You can pass parameter information to a reentrant parser in a reentrant
-way. Define the macro @code{YYPARSE_PARAM} as a variable name. The
-resulting @code{yyparse} function then accepts one argument, of type
-@code{void *}, with that name.
+If you use a reentrant parser, you can optionally pass additional
+parameter information to it in a reentrant way. To do so, define the
+macro @code{YYPARSE_PARAM} as a variable name. This modifies the
+@code{yyparse} function to accept one argument, of type @code{void *},
+with that name.
When you call @code{yyparse}, pass the address of an object, casting the
address to @code{void *}. The grammar actions can refer to the contents
the proper object type, or you can declare it as @code{void *} and
access the contents as shown above.
+You can use @samp{%pure_parser} to request a reentrant parser without
+also using @code{YYPARSE_PARAM}. Then you should call @code{yyparse}
+with no arguments, as usual.
+
@node Error Reporting, Action Features, Lexical, Interface
@section The Error Reporting Function @code{yyerror}
@cindex error reporting function
Ordinarily Bison puts them in the parser file so that the C compiler
and debuggers will associate errors with your source file, the
grammar file. This option causes them to associate errors with the
-parser file, treating it an independent source file in its own right.
+parser file, treating it as an independent source file in its own right.
+
+@item -n
+@itemx --no-parser
+Do not include any C code in the parser file; generate tables only. The
+parser file contains just @code{#define} directives and static variable
+declarations.
+
+This option also tells Bison to write the C code for the grammar actions
+into a file named @file{@var{filename}.act}, in the form of a
+brace-surrounded body fit for a @code{switch} statement.
@item -o @var{outfile}
@itemx --output-file=@var{outfile}
Specify the name @var{outfile} for the parser file.
The other output files' names are constructed from @var{outfile}
-as described under the @samp{-v} and @samp{-d} switches.
+as described under the @samp{-v} and @samp{-d} options.
@item -p @var{prefix}
@itemx --name-prefix=@var{prefix}
@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
+@item -r
+@itemx --raw
+Pretend that @code{%raw} was specified. @xref{Decl Summary}.
+
@item -t
@itemx --debug
Output a definition of the macro @code{YYDEBUG} into the parser file,
@itemx --fixed-output-files
Equivalent to @samp{-o y.tab.c}; the parser output file is called
@file{y.tab.c}, and the other outputs are called @file{y.output} and
-@file{y.tab.h}. The purpose of this switch is to imitate Yacc's output
+@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
file name conventions. Thus, the following shell script can substitute
for Yacc:@refill
\line{ --help \leaderfill -h}
\line{ --name-prefix \leaderfill -p}
\line{ --no-lines \leaderfill -l}
+\line{ --no-parser \leaderfill -n}
\line{ --output-file \leaderfill -o}
+\line{ --raw \leaderfill -r}
+\line{ --token-table \leaderfill -k}
\line{ --verbose \leaderfill -v}
\line{ --version \leaderfill -V}
\line{ --yacc \leaderfill -y}
--file-prefix=@var{prefix} -b @var{file-prefix}
--fixed-output-files --yacc -y
--help -h
---name-prefix -p
+--name-prefix=@var{prefix} -p @var{name-prefix}
--no-lines -l
+--no-parser -n
--output-file=@var{outfile} -o @var{outfile}
+--raw -r
+--token-table -k
--verbose -v
--version -V
@end example
Macro for the data type of @code{yylloc}; a structure with four
members. @xref{Token Positions, ,Textual Positions of Tokens}.
+@item yyltype
+Default value for @code{YYLTYPE}.
+
@item YYMAXDEPTH
Macro for specifying the maximum size of the parser stack.
@xref{Stack Overflow}.
Bison declaration to assign left associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
+@item %no_lines
+Bison declaration to avoid generating @code{#line} directives in the
+parser file. @xref{Decl Summary}.
+
@item %nonassoc
Bison declaration to assign nonassociativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
Bison declaration to request a pure (reentrant) parser.
@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
+@item %raw
+Bison declaration to use Bison internal token code numbers in token
+tables instead of the usual Yacc-compatible token code numbers.
+@xref{Decl Summary}.
+
@item %right
Bison declaration to assign right associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
Bison declaration to declare token(s) without specifying precedence.
@xref{Token Decl, ,Token Type Names}.
+@item %token_table
+Bison declaration to include a token name table in the parser file.
+@xref{Decl Summary}.
+
@item %type
Bison declaration to declare nonterminals. @xref{Type Decl, ,Nonterminal Symbols}.
A flag, set by actions in the grammar rules, which alters the way
tokens are parsed. @xref{Lexical Tie-ins}.
+@item Literal string token
+A token which consists of two or more fixed characters.
+@xref{Symbols}.
+
@item Look-ahead token
A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}.