This file documents the Bison parser generator.
Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
-2000, 2001
+2000, 2001, 2002
Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of
@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001
+1999, 2000, 2001, 2002
Free Software Foundation, Inc.
@sp 2
represents a token type. The same text used in Bison rules to stand for
this token type is also a C expression for the numeric code for the type.
This works in two ways. If the token type is a character literal, then its
-numeric code is the ASCII code for that character; you can use the same
+numeric code is that of the character; you can use the same
character literal in the lexical analyzer to express the number. If the
token type is an identifier, that identifier is defined by Bison as a C
macro whose definition is the appropriate number. In this example,
@example
@group
/* Lexical analyzer returns a double floating point
- number on the stack and the token NUM, or the ASCII
- character read if not a number. Skips all blanks
+ number on the stack and the token NUM, or the numeric code
+ of the character read if not a number. Skips all blanks
and tabs, returns 0 for EOF. */
#include <ctype.h>
All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
-character literal because its ASCII code, zero, is the code @code{yylex}
+character literal because its numeric code, zero, is the code @code{yylex}
returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
for @code{yylex}}).
The value returned by @code{yylex} is always one of the terminal symbols
(or 0 for end-of-input). Whichever way you write the token type in the
grammar rules, you write it the same way in the definition of @code{yylex}.
-The numeric code for a character token type is simply the ASCII code for
+The numeric code for a character token type is simply the numeric code of
the character, so @code{yylex} can use the identical character constant to
generate the requisite code. Each named token type becomes a C macro in
the parser file, so @code{yylex} can use the name to stand for the code.
into a separate header file @file{@var{name}.tab.h} which you can include
in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
+The @code{yylex} function must use the same character set and encoding
+that was used by Bison. For example, if you run Bison in an
+@sc{ascii} environment, but then compile and run the resulting program
+in an environment that uses an incompatible character set like
+@sc{ebcdic}, the resulting program will probably not work because the
+tables generated by Bison will assume @sc{ascii} numeric values for
+character tokens. Portable grammars should avoid non-@sc{ascii}
+character tokens, as implementations in practice often use different
+and incompatible extensions in this area. However, it is standard
+practice for software distributions to contain C source files that
+were generated by Bison in an @sc{ascii} environment, so installers on
+platforms that are incompatible with @sc{ascii} must rebuild those
+files before compiling them.
+
The symbol @code{error} is a terminal symbol reserved for error recovery
(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
In particular, @code{yylex} should never return this value.
+The default value of the error token is 256, so in the
+unlikely event that you need to use a character token with numeric
+value 256 you must reassign the error token's value with a
+@code{%token} declaration.
@node Rules
@section Syntax of Grammar Rules
@noindent
It is generally best, however, to let Bison choose the numeric codes for
all token types. Bison will automatically select codes that don't conflict
-with each other or with ASCII characters.
+with each other or with normal characters.
In the event that the stack type is a union, you must augment the
@code{%token} or other token declaration to include the data type