From e966383bf4a4bcf36643699f432d6eaf3edf370b Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 4 Apr 2002 21:34:34 +0000
Subject: [PATCH] * doc/bison.texinfo: Update copyright date.
 (Rpcalc Lexer, Symbols, Token Decl): Don't assume ASCII.
 (Symbols): Warn about running Bison in one character set,
 but compiling and/or running in an incompatible one.
 Warn about character code 256, too.

---
 ChangeLog         | 15 +++++++++++++++
 doc/bison.texinfo | 34 ++++++++++++++++++++++++++--------
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 224024cf..6ae6f0ba 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,18 @@
+2002-04-04 Paul Eggert
+
+	* doc/bison.texinfo: Update copyright date.
+	(Rpcalc Lexer, Symbols, Token Decl): Don't assume ASCII.
+	(Symbols): Warn about running Bison in one character set,
+	but compiling and/or running in an incompatible one.
+	Warn about character code 256, too.
+
+2002-04-03 Paul Eggert
+
+	* src/bison.data (YYSTACK_ALLOC): Depend on whether
+	YYERROR_VERBOSE is nonzero, not whether it is defined.
+
+	Merge changes from bison-1_29-branch.
+
 2002-03-20 Paul Eggert
 
 	Merge fixes from Debian bison_1.34-1.diff.
diff --git a/doc/bison.texinfo b/doc/bison.texinfo
index 7c0b3454..d26295c6 100644
--- a/doc/bison.texinfo
+++ b/doc/bison.texinfo
@@ -47,7 +47,7 @@ END-INFO-DIR-ENTRY
 This file documents the Bison parser generator.
 
 Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
-2000, 2001
+2000, 2001, 2002
 Free Software Foundation, Inc.
 
 Permission is granted to make and distribute verbatim copies of
@@ -89,7 +89,7 @@ instead of in the original English.
 @page
 @vskip 0pt plus 1filll
 Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
-1999, 2000, 2001
+1999, 2000, 2001, 2002
 Free Software Foundation, Inc.
 
 @sp 2
@@ -1083,7 +1083,7 @@ The return value of the lexical analyzer function is a numeric code which
 represents a token type. The same text used in Bison rules to stand for
 this token type is also a C expression for the numeric code for the type.
 This works in two ways. If the token type is a character literal, then its
-numeric code is the ASCII code for that character; you can use the same
+numeric code is that of the character; you can use the same
 character literal in the lexical analyzer to express the number. If the
 token type is an identifier, that identifier is defined by Bison as a C
 macro whose definition is the appropriate number. In this example,
@@ -1104,8 +1104,8 @@ Here is the code for the lexical analyzer:
 @example
 @group
 /* Lexical analyzer returns a double floating point
-   number on the stack and the token NUM, or the ASCII
-   character read if not a number. Skips all blanks
+   number on the stack and the token NUM, or the numeric code
+   of the character read if not a number. Skips all blanks
    and tabs, returns 0 for EOF. */
 
 #include
@@ -2148,7 +2148,7 @@ your program will confuse other readers.
 
 All the usual escape sequences used in character literals in C can be
 used in Bison as well, but you must not use the null character as a
-character literal because its ASCII code, zero, is the code @code{yylex}
+character literal because its numeric code, zero, is the code @code{yylex}
 returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
 for @code{yylex}}).
 
@@ -2189,7 +2189,7 @@ on when the parser function returns that symbol.
 The value returned by @code{yylex} is always one of the terminal symbols
 (or 0 for end-of-input). Whichever way you write the token type in the
 grammar rules, you write it the same way in the definition of @code{yylex}.
-The numeric code for a character token type is simply the ASCII code for
+The numeric code for a character token type is simply the numeric code of
 the character, so @code{yylex} can use the identical character constant to
 generate the requisite code. Each named token type becomes a C macro in
 the parser file, so @code{yylex} can use the name to stand for the code.
@@ -2202,9 +2202,27 @@ option when you run Bison, so that it will write these macro definitions
 into a separate header file @file{@var{name}.tab.h} which you can include
 in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
 
+The @code{yylex} function must use the same character set and encoding
+that was used by Bison. For example, if you run Bison in an
+@sc{ascii} environment, but then compile and run the resulting program
+in an environment that uses an incompatible character set like
+@sc{ebcdic}, the resulting program will probably not work because the
+tables generated by Bison will assume @sc{ascii} numeric values for
+character tokens. Portable grammars should avoid non-@sc{ascii}
+character tokens, as implementations in practice often use different
+and incompatible extensions in this area. However, it is standard
+practice for software distributions to contain C source files that
+were generated by Bison in an @sc{ascii} environment, so installers on
+platforms that are incompatible with @sc{ascii} must rebuild those
+files before compiling them.
+
 The symbol @code{error} is a terminal symbol reserved for error recovery
 (@pxref{Error Recovery}); you shouldn't use it for any other purpose.
 In particular, @code{yylex} should never return this value.
+The default value of the error token is 256, so in the
+unlikely event that you need to use a character token with numeric
+value 256 you must reassign the error token's value with a
+@code{%token} declaration.
 
 @node Rules
 @section Syntax of Grammar Rules
@@ -2942,7 +2960,7 @@ an integer value in the field immediately following the token name:
 @noindent
 It is generally best, however, to let Bison choose the numeric codes for
 all token types. Bison will automatically select codes that don't conflict
-with each other or with ASCII characters.
+with each other or with normal characters.
 
 In the event that the stack type is a union, you must augment the
 @code{%token} or other token declaration to include the data type
-- 
2.47.2
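The rule this patch documents, that yylex should hand back a character token as the character itself rather than as a hard-coded ASCII value, can be illustrated with a small C sketch of an rpcalc-style lexer. It is a hypothetical example, not Bison's or rpcalc's actual code: the names yylex_sketch, yylval_sketch and input, and the value 258 for NUM, are assumptions made so the sketch runs on its own (a real parser gets NUM from the Bison-generated file and reads from stdin). It is only correct if Bison ran in the same character set as the compiler, which is the caveat the new Symbols text adds.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code for numbers.  In a real parser Bison defines
   NUM as a macro in the generated parser file; 258 is a stand-in chosen
   to stay clear of single-byte character codes and of the default
   error token value, 256.  */
#define NUM 258

/* Hypothetical input cursor and semantic value, so the sketch runs
   without stdin or a generated parser.  */
static const char *input;
static double yylval_sketch;

/* rpcalc-style lexer sketch: skips blanks and tabs, returns 0 at end
   of input, NUM (with the value in yylval_sketch) for a number, and
   otherwise the numeric code of the character itself.  It never
   hard-codes ASCII values, so a grammar literal such as '+' matches
   whatever code the compiler's character set gives '+'.  */
static int
yylex_sketch (void)
{
  while (*input == ' ' || *input == '\t')
    input++;

  if (*input == '\0')
    return 0;

  if (isdigit ((unsigned char) *input))
    {
      char *end;
      yylval_sketch = strtod (input, &end);
      input = end;
      return NUM;
    }

  /* Cast through unsigned char so the code is nonnegative, like the
     value getchar would return.  */
  return (unsigned char) *input++;
}
```

Because the lexer compares against and returns character constants, the grammar and the lexer agree on the code for '+' under any character set, provided Bison, the compiler and the running program all use that same set.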