From: Akim Demaille Date: Tue, 26 Jun 2012 14:55:23 +0000 (+0200) Subject: maint: use *.texi. X-Git-Tag: v2.6~45 X-Git-Url: https://git.saurik.com/bison.git/commitdiff_plain/9bcffa0c13d1abee13433f14cdb785cbb960425c maint: use *.texi. This is more consistent with the other packages, and Automake-NG supports only *.texi. * doc/bison.texinfo: Rename as... * doc/bison.texi: this. * doc/Makefile.am, examples/calc++/Makefile.am: Adjust. --- diff --git a/doc/Makefile.am b/doc/Makefile.am index c7f27558..d87f00f0 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -16,7 +16,7 @@ # along with this program. If not, see . AM_MAKEINFOFLAGS = --no-split -info_TEXINFOS = bison.texinfo +info_TEXINFOS = bison.texi bison_TEXINFOS = $(srcdir)/cross-options.texi gpl-3.0.texi fdl.texi CLEANFILES = bison.fns diff --git a/doc/bison.texi b/doc/bison.texi new file mode 100644 index 00000000..4f2e1c62 --- /dev/null +++ b/doc/bison.texi @@ -0,0 +1,11687 @@ +\input texinfo @c -*-texinfo-*- +@comment %**start of header +@setfilename bison.info +@include version.texi +@settitle Bison @value{VERSION} +@setchapternewpage odd + +@finalout + +@c SMALL BOOK version +@c This edition has been formatted so that you can format and print it in +@c the smallbook format. +@c @smallbook + +@c Set following if you want to document %default-prec and %no-default-prec. +@c This feature is experimental and may change in future Bison versions. +@c @set defaultprec + +@ifnotinfo +@syncodeindex fn cp +@syncodeindex vr cp +@syncodeindex tp cp +@end ifnotinfo +@ifinfo +@synindex fn cp +@synindex vr cp +@synindex tp cp +@end ifinfo +@comment %**end of header + +@copying + +This manual (@value{UPDATED}) is for GNU Bison (version +@value{VERSION}), the GNU parser generator. + +Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software +Foundation, Inc. 
+ +@quotation +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, +Version 1.3 or any later version published by the Free Software +Foundation; with no Invariant Sections, with the Front-Cover texts +being ``A GNU Manual,'' and with the Back-Cover Texts as in +(a) below. A copy of the license is included in the section entitled +``GNU Free Documentation License.'' + +(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and +modify this GNU manual. Buying copies from the FSF +supports it in developing GNU and promoting software +freedom.'' +@end quotation +@end copying + +@dircategory Software development +@direntry +* bison: (bison). GNU parser generator (Yacc replacement). +@end direntry + +@titlepage +@title Bison +@subtitle The Yacc-compatible Parser Generator +@subtitle @value{UPDATED}, Bison Version @value{VERSION} + +@author by Charles Donnelly and Richard Stallman + +@page +@vskip 0pt plus 1filll +@insertcopying +@sp 2 +Published by the Free Software Foundation @* +51 Franklin Street, Fifth Floor @* +Boston, MA 02110-1301 USA @* +Printed copies are available from the Free Software Foundation.@* +ISBN 1-882114-44-2 +@sp 2 +Cover art by Etienne Suvasa. +@end titlepage + +@contents + +@ifnottex +@node Top +@top Bison +@insertcopying +@end ifnottex + +@menu +* Introduction:: +* Conditions:: +* Copying:: The GNU General Public License says + how you can copy and share Bison. + +Tutorial sections: +* Concepts:: Basic concepts for understanding Bison. +* Examples:: Three simple explained examples of using Bison. + +Reference sections: +* Grammar File:: Writing Bison declarations and rules. +* Interface:: C-language interface to the parser function @code{yyparse}. +* Algorithm:: How the Bison parser works at run-time. +* Error Recovery:: Writing rules for error recovery. 
+* Context Dependency:: What to do if your language syntax is too + messy for Bison to handle straightforwardly. +* Debugging:: Understanding or debugging Bison parsers. +* Invocation:: How to run Bison (to produce the parser implementation). +* Other Languages:: Creating C++ and Java parsers. +* FAQ:: Frequently Asked Questions +* Table of Symbols:: All the keywords of the Bison language are explained. +* Glossary:: Basic concepts are explained. +* Copying This Manual:: License for copying this manual. +* Bibliography:: Publications cited in this manual. +* Index:: Cross-references to the text. + +@detailmenu + --- The Detailed Node Listing --- + +The Concepts of Bison + +* Language and Grammar:: Languages and context-free grammars, + as mathematical ideas. +* Grammar in Bison:: How we represent grammars for Bison's sake. +* Semantic Values:: Each token or syntactic grouping can have + a semantic value (the value of an integer, + the name of an identifier, etc.). +* Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages. +* Locations:: Overview of location tracking. +* Bison Parser:: What are Bison's input and output, + how is the output used? +* Stages:: Stages in writing and running Bison grammars. +* Grammar Layout:: Overall structure of a Bison grammar file. + +Writing GLR Parsers + +* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. +* Merging GLR Parses:: Using GLR parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: GLR parsers require a modern C compiler. + +Examples + +* RPN Calc:: Reverse polish notation calculator; + a first example with no operator precedence. +* Infix Calc:: Infix (algebraic) notation calculator. + Operator precedence is introduced. +* Simple Error Recovery:: Continuing after syntax errors. +* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. 
+* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. +* Exercises:: Ideas for improving the multi-function calculator. + +Reverse Polish Notation Calculator + +* Rpcalc Declarations:: Prologue (declarations) for rpcalc. +* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. +* Rpcalc Lexer:: The lexical analyzer. +* Rpcalc Main:: The controlling function. +* Rpcalc Error:: The error reporting function. +* Rpcalc Generate:: Running Bison on the grammar file. +* Rpcalc Compile:: Run the C compiler on the output code. + +Grammar Rules for @code{rpcalc} + +* Rpcalc Input:: +* Rpcalc Line:: +* Rpcalc Expr:: + +Location Tracking Calculator: @code{ltcalc} + +* Ltcalc Declarations:: Bison and C declarations for ltcalc. +* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. +* Ltcalc Lexer:: The lexical analyzer. + +Multi-Function Calculator: @code{mfcalc} + +* Mfcalc Declarations:: Bison declarations for multi-function calculator. +* Mfcalc Rules:: Grammar rules for the calculator. +* Mfcalc Symbol Table:: Symbol table management subroutines. + +Bison Grammar Files + +* Grammar Outline:: Overall layout of the grammar file. +* Symbols:: Terminal and nonterminal symbols. +* Rules:: How to write grammar rules. +* Recursion:: Writing recursive rules. +* Semantics:: Semantic values and actions. +* Tracking Locations:: Locations and actions. +* Named References:: Using named references in actions. +* Declarations:: All kinds of Bison declarations are described here. +* Multiple Parsers:: Putting more than one Bison parser in one program. + +Outline of a Bison Grammar + +* Prologue:: Syntax and usage of the prologue. +* Prologue Alternatives:: Syntax and usage of alternatives to the prologue. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* Epilogue:: Syntax and usage of the epilogue. 
+ +Defining Language Semantics + +* Value Type:: Specifying one data type for all semantic values. +* Multiple Types:: Specifying several alternative data types. +* Actions:: An action is the semantic definition of a grammar rule. +* Action Types:: Specifying data types for actions to operate on. +* Mid-Rule Actions:: Most actions go at the end of a rule. + This says when, why and how to use the exceptional + action in the middle of a rule. + +Tracking Locations + +* Location Type:: Specifying a data type for locations. +* Actions and Locations:: Using locations in actions. +* Location Default Action:: Defining a general way to compute locations. + +Bison Declarations + +* Require Decl:: Requiring a Bison version. +* Token Decl:: Declaring terminal symbols. +* Precedence Decl:: Declaring terminals with precedence and associativity. +* Union Decl:: Declaring the set of all semantic value types. +* Type Decl:: Declaring the choice of type for a nonterminal symbol. +* Initial Action Decl:: Code run before parsing starts. +* Destructor Decl:: Declaring how symbols are freed. +* Printer Decl:: Declaring how symbol values are displayed. +* Expect Decl:: Suppressing warnings about parsing conflicts. +* Start Decl:: Specifying the start symbol. +* Pure Decl:: Requesting a reentrant parser. +* Push Decl:: Requesting a push parser. +* Decl Summary:: Table of all Bison declarations. +* %define Summary:: Defining variables to adjust Bison's behavior. +* %code Summary:: Inserting code into the parser source. + +Parser C-Language Interface + +* Parser Function:: How to call @code{yyparse} and what it returns. +* Push Parser Function:: How to call @code{yypush_parse} and what it returns. +* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. +* Parser Create Function:: How to call @code{yypstate_new} and what it returns. +* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. 
+* Lexical:: You must supply a function @code{yylex} + which reads tokens. +* Error Reporting:: You must supply a function @code{yyerror}. +* Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. + +The Lexical Analyzer Function @code{yylex} + +* Calling Convention:: How @code{yyparse} calls @code{yylex}. +* Token Values:: How @code{yylex} must return the semantic value + of the token it has read. +* Token Locations:: How @code{yylex} must return the text location + (line number, etc.) of the token, if the + actions want that. +* Pure Calling:: How the calling convention differs in a pure parser + (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). + +The Bison Parser Algorithm + +* Lookahead:: Parser looks one token ahead when deciding what to do. +* Shift/Reduce:: Conflicts: when either shifting or reduction is valid. +* Precedence:: Operator precedence works by resolving conflicts. +* Contextual Precedence:: When an operator's precedence depends on context. +* Parser States:: The parser is a finite-state-machine with stack. +* Reduce/Reduce:: When two rules are applicable in the same situation. +* Mysterious Conflicts:: Conflicts that look unjustified. +* Tuning LR:: How to tune fundamental aspects of LR-based parsing. +* Generalized LR Parsing:: Parsing arbitrary context-free grammars. +* Memory Management:: What happens when memory is exhausted. How to avoid it. + +Operator Precedence + +* Why Precedence:: An example showing why precedence is needed. +* Using Precedence:: How to specify precedence in Bison grammars. +* Precedence Examples:: How these features are used in the previous example. +* How Precedence:: How they work. + +Tuning LR + +* LR Table Construction:: Choose a different construction algorithm. +* Default Reductions:: Disable default reductions. +* LAC:: Correct lookahead sets in the parser states. 
+* Unreachable States:: Keep unreachable parser states for debugging. + +Handling Context Dependencies + +* Semantic Tokens:: Token parsing can depend on the semantic context. +* Lexical Tie-ins:: Token parsing can depend on the syntactic context. +* Tie-in Recovery:: Lexical tie-ins have implications for how + error recovery rules must be written. + +Debugging Your Parser + +* Understanding:: Understanding the structure of your parser. +* Tracing:: Tracing the execution of your parser. + +Tracing Your Parser + +* Enabling Traces:: Activating run-time trace support +* Mfcalc Traces:: Extending @code{mfcalc} to support traces +* The YYPRINT Macro:: Obsolete interface for semantic value reports + +Invoking Bison + +* Bison Options:: All the options described in detail, + in alphabetical order by short options. +* Option Cross Key:: Alphabetical list of long options. +* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}. + +Parsers Written In Other Languages + +* C++ Parsers:: The interface to generate C++ parser classes +* Java Parsers:: The interface to generate Java parser classes + +C++ Parsers + +* C++ Bison Interface:: Asking for C++ parser generation +* C++ Semantic Values:: %union vs. C++ +* C++ Location Values:: The position and location classes +* C++ Parser Interface:: Instantiating and running the parser +* C++ Scanner Interface:: Exchanges between yylex and parse +* A Complete C++ Example:: Demonstrating their use + +C++ Location Values + +* C++ position:: One point in the source file +* C++ location:: Two points in the source file + +A Complete C++ Example + +* Calc++ --- C++ Calculator:: The specifications +* Calc++ Parsing Driver:: An active parsing context +* Calc++ Parser:: A parser class +* Calc++ Scanner:: A pure C++ Flex scanner +* Calc++ Top Level:: Conducting the band + +Java Parsers + +* Java Bison Interface:: Asking for Java parser generation +* Java Semantic Values:: %type and %token vs. 
Java +* Java Location Values:: The position and location classes +* Java Parser Interface:: Instantiating and running the parser +* Java Scanner Interface:: Specifying the scanner for the parser +* Java Action Features:: Special features for use in actions +* Java Differences:: Differences between C/C++ and Java Grammars +* Java Declarations Summary:: List of Bison declarations used with Java + +Frequently Asked Questions + +* Memory Exhausted:: Breaking the Stack Limits +* How Can I Reset the Parser:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings +* Implementing Gotos/Loops:: Control Flow in the Calculator +* Multiple start-symbols:: Factoring closely related grammars +* Secure? Conform?:: Is Bison POSIX safe? +* I can't build Bison:: Troubleshooting +* Where can I find help?:: Troubleshouting +* Bug Reports:: Troublereporting +* More Languages:: Parsers in C++, Java, and so on +* Beta Testing:: Experimenting development versions +* Mailing Lists:: Meeting other Bison users + +Copying This Manual + +* Copying This Manual:: License for copying this manual. + +@end detailmenu +@end menu + +@node Introduction +@unnumbered Introduction +@cindex introduction + +@dfn{Bison} is a general-purpose parser generator that converts an +annotated context-free grammar into a deterministic LR or generalized +LR (GLR) parser employing LALR(1) parser tables. As an experimental +feature, Bison can also generate IELR(1) or canonical LR(1) parser +tables. Once you are proficient with Bison, you can use it to develop +a wide range of language parsers, from those used in simple desk +calculators to complex programming languages. + +Bison is upward compatible with Yacc: all properly-written Yacc +grammars ought to work with Bison with no change. Anyone familiar +with Yacc should be able to use Bison with little trouble. You need +to be fluent in C or C++ programming in order to use Bison or to +understand this manual. 
Java is also supported as an experimental +feature. + +We begin with tutorial chapters that explain the basic concepts of +using Bison and show three explained examples, each building on the +last. If you don't know Bison or Yacc, start by reading these +chapters. Reference chapters follow, which describe specific aspects +of Bison in detail. + +Bison was written originally by Robert Corbett. Richard Stallman made +it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University +added multi-character string literals and other features. Since then, +Bison has grown more robust and evolved many other new features thanks +to the hard work of a long list of volunteers. For details, see the +@file{THANKS} and @file{ChangeLog} files included in the Bison +distribution. + +This edition corresponds to version @value{VERSION} of Bison. + +@node Conditions +@unnumbered Conditions for Using Bison + +The distribution terms for Bison-generated parsers permit using the +parsers in nonfree programs. Before Bison version 2.2, these extra +permissions applied only when Bison was generating LALR(1) +parsers in C@. And before Bison version 1.24, Bison-generated +parsers could be used only in programs that were free software. + +The other GNU programming tools, such as the GNU C +compiler, have never +had such a requirement. They could always be used for nonfree +software. The reason Bison was different was not due to a special +policy decision; it resulted from applying the usual General Public +License to all of the Bison source code. + +The main output of the Bison utility---the Bison parser implementation +file---contains a verbatim copy of a sizable piece of Bison, which is +the code for the parser's implementation. (The actions from your +grammar are inserted into this implementation at one point, but most +of the rest of the implementation is not changed.) 
When we applied +the GPL terms to the skeleton code for the parser's implementation, +the effect was to restrict the use of Bison output to free software. + +We didn't change the terms because of sympathy for people who want to +make software proprietary. @strong{Software should be free.} But we +concluded that limiting Bison's use to free software was doing little to +encourage people to make other software free. So we decided to make the +practical conditions for using Bison match the practical conditions for +using the other GNU tools. + +This exception applies when Bison is generating code for a parser. +You can tell whether the exception applies to a Bison output file by +inspecting the file for text beginning with ``As a special +exception@dots{}''. The text spells out the exact terms of the +exception. + +@node Copying +@unnumbered GNU GENERAL PUBLIC LICENSE +@include gpl-3.0.texi + +@node Concepts +@chapter The Concepts of Bison + +This chapter introduces many of the basic concepts without which the +details of Bison will not make sense. If you do not already know how to +use Bison or Yacc, we suggest you start by reading this chapter carefully. + +@menu +* Language and Grammar:: Languages and context-free grammars, + as mathematical ideas. +* Grammar in Bison:: How we represent grammars for Bison's sake. +* Semantic Values:: Each token or syntactic grouping can have + a semantic value (the value of an integer, + the name of an identifier, etc.). +* Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages. +* Locations:: Overview of location tracking. +* Bison Parser:: What are Bison's input and output, + how is the output used? +* Stages:: Stages in writing and running Bison grammars. +* Grammar Layout:: Overall structure of a Bison grammar file. 
+@end menu + +@node Language and Grammar +@section Languages and Context-Free Grammars + +@cindex context-free grammar +@cindex grammar, context-free +In order for Bison to parse a language, it must be described by a +@dfn{context-free grammar}. This means that you specify one or more +@dfn{syntactic groupings} and give rules for constructing them from their +parts. For example, in the C language, one kind of grouping is called an +`expression'. One rule for making an expression might be, ``An expression +can be made of a minus sign and another expression''. Another would be, +``An expression can be an integer''. As you can see, rules are often +recursive, but there must be at least one rule which leads out of the +recursion. + +@cindex BNF +@cindex Backus-Naur form +The most common formal system for presenting such rules for humans to read +is @dfn{Backus-Naur Form} or ``BNF'', which was developed in +order to specify the language Algol 60. Any grammar expressed in +BNF is a context-free grammar. The input to Bison is +essentially machine-readable BNF. + +@cindex LALR grammars +@cindex IELR grammars +@cindex LR grammars +There are various important subclasses of context-free grammars. Although +it can handle almost all context-free grammars, Bison is optimized for what +are called LR(1) grammars. In brief, in these grammars, it must be possible +to tell how to parse any portion of an input string with just a single token +of lookahead. For historical reasons, Bison by default is limited by the +additional restrictions of LALR(1), which is hard to explain simply. +@xref{Mysterious Conflicts}, for more information on this. As an +experimental feature, you can escape these additional restrictions by +requesting IELR(1) or canonical LR(1) parser tables. @xref{LR Table +Construction}, to learn how. 
+ +@cindex GLR parsing +@cindex generalized LR (GLR) parsing +@cindex ambiguous grammars +@cindex nondeterministic parsing + +Parsers for LR(1) grammars are @dfn{deterministic}, meaning +roughly that the next grammar rule to apply at any point in the input is +uniquely determined by the preceding input and a fixed, finite portion +(called a @dfn{lookahead}) of the remaining input. A context-free +grammar can be @dfn{ambiguous}, meaning that there are multiple ways to +apply the grammar rules to get the same inputs. Even unambiguous +grammars can be @dfn{nondeterministic}, meaning that no fixed +lookahead always suffices to determine the next grammar rule to apply. +With the proper declarations, Bison is also able to parse these more +general context-free grammars, using a technique known as GLR +parsing (for Generalized LR). Bison's GLR parsers +are able to handle any context-free grammar for which the number of +possible parses of any given string is finite. + +@cindex symbols (abstract) +@cindex token +@cindex syntactic grouping +@cindex grouping, syntactic +In the formal grammatical rules for a language, each kind of syntactic +unit or grouping is named by a @dfn{symbol}. Those which are built by +grouping smaller constructs according to grammatical rules are called +@dfn{nonterminal symbols}; those which can't be subdivided are called +@dfn{terminal symbols} or @dfn{token types}. We call a piece of input +corresponding to a single terminal symbol a @dfn{token}, and a piece +corresponding to a single nonterminal symbol a @dfn{grouping}. + +We can use the C language as an example of what symbols, terminal and +nonterminal, mean. The tokens of C are identifiers, constants (numeric +and string), and the various keywords, arithmetic operators and +punctuation marks. 
So the terminal symbols of a grammar for C include +`identifier', `number', `string', plus one symbol for each keyword, +operator or punctuation mark: `if', `return', `const', `static', `int', +`char', `plus-sign', `open-brace', `close-brace', `comma' and many more. +(These tokens can be subdivided into characters, but that is a matter of +lexicography, not grammar.) + +Here is a simple C function subdivided into tokens: + +@example +int /* @r{keyword `int'} */ +square (int x) /* @r{identifier, open-paren, keyword `int',} + @r{identifier, close-paren} */ +@{ /* @r{open-brace} */ + return x * x; /* @r{keyword `return', identifier, asterisk,} + @r{identifier, semicolon} */ +@} /* @r{close-brace} */ +@end example + +The syntactic groupings of C include the expression, the statement, the +declaration, and the function definition. These are represented in the +grammar of C by nonterminal symbols `expression', `statement', +`declaration' and `function definition'. The full grammar uses dozens of +additional language constructs, each with its own nonterminal symbol, in +order to express the meanings of these four. The example above is a +function definition; it contains one declaration, and one statement. In +the statement, each @samp{x} is an expression and so is @samp{x * x}. + +Each nonterminal symbol must have grammatical rules showing how it is made +out of simpler constructs. For example, one kind of C statement is the +@code{return} statement; this would be described with a grammar rule which +reads informally as follows: + +@quotation +A `statement' can be made of a `return' keyword, an `expression' and a +`semicolon'. +@end quotation + +@noindent +There would be many other rules for `statement', one for each kind of +statement in C. + +@cindex start symbol +One nonterminal symbol must be distinguished as the special one which +defines a complete utterance in the language. It is called the @dfn{start +symbol}. In a compiler, this means a complete input program. 
In the C +language, the nonterminal symbol `sequence of definitions and declarations' +plays this role. + +For example, @samp{1 + 2} is a valid C expression---a valid part of a C +program---but it is not valid as an @emph{entire} C program. In the +context-free grammar of C, this follows from the fact that `expression' is +not the start symbol. + +The Bison parser reads a sequence of tokens as its input, and groups the +tokens using the grammar rules. If the input is valid, the end result is +that the entire token sequence reduces to a single grouping whose symbol is +the grammar's start symbol. If we use a grammar for C, the entire input +must be a `sequence of definitions and declarations'. If not, the parser +reports a syntax error. + +@node Grammar in Bison +@section From Formal Rules to Bison Input +@cindex Bison grammar +@cindex grammar, Bison +@cindex formal grammar + +A formal grammar is a mathematical construct. To define the language +for Bison, you must write a file expressing the grammar in Bison syntax: +a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}. + +A nonterminal symbol in the formal grammar is represented in Bison input +as an identifier, like an identifier in C@. By convention, it should be +in lower case, such as @code{expr}, @code{stmt} or @code{declaration}. + +The Bison representation for a terminal symbol is also called a @dfn{token +type}. Token types as well can be represented as C-like identifiers. By +convention, these identifiers should be upper case to distinguish them from +nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or +@code{RETURN}. A terminal symbol that stands for a particular keyword in +the language should be named after that keyword converted to upper case. +The terminal symbol @code{error} is reserved for error recovery. +@xref{Symbols}. + +A terminal symbol can also be represented as a character literal, just like +a C character constant. 
You should do this whenever a token is just a +single character (parenthesis, plus-sign, etc.): use that same character in +a literal as the terminal symbol for that token. + +A third way to represent a terminal symbol is with a C string constant +containing several characters. @xref{Symbols}, for more information. + +The grammar rules also have an expression in Bison syntax. For example, +here is the Bison rule for a C @code{return} statement. The semicolon in +quotes is a literal character token, representing part of the C syntax for +the statement; the naked semicolon, and the colon, are Bison punctuation +used in every rule. + +@example +stmt: RETURN expr ';' ; +@end example + +@noindent +@xref{Rules, ,Syntax of Grammar Rules}. + +@node Semantic Values +@section Semantic Values +@cindex semantic value +@cindex value, semantic + +A formal grammar selects tokens only by their classifications: for example, +if a rule mentions the terminal symbol `integer constant', it means that +@emph{any} integer constant is grammatically valid in that position. The +precise value of the constant is irrelevant to how to parse the input: if +@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally +grammatical. + +But the precise value is very important for what the input means once it is +parsed. A compiler is useless if it fails to distinguish between 4, 1 and +3989 as constants in the program! Therefore, each token in a Bison grammar +has both a token type and a @dfn{semantic value}. @xref{Semantics, +,Defining Language Semantics}, +for details. + +The token type is a terminal symbol defined in the grammar, such as +@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything +you need to know to decide where the token may validly appear and how to +group it with other tokens. The grammar rules know nothing about tokens +except their types. 
+ +The semantic value has all the rest of the information about the +meaning of the token, such as the value of an integer, or the name of an +identifier. (A token such as @code{','} which is just punctuation doesn't +need to have any semantic value.) + +For example, an input token might be classified as token type +@code{INTEGER} and have the semantic value 4. Another input token might +have the same token type @code{INTEGER} but value 3989. When a grammar +rule says that @code{INTEGER} is allowed, either of these tokens is +acceptable because each is an @code{INTEGER}. When the parser accepts the +token, it keeps track of the token's semantic value. + +Each grouping can also have a semantic value as well as its nonterminal +symbol. For example, in a calculator, an expression typically has a +semantic value that is a number. In a compiler for a programming +language, an expression typically has a semantic value that is a tree +structure describing the meaning of the expression. + +@node Semantic Actions +@section Semantic Actions +@cindex semantic actions +@cindex actions, semantic + +In order to be useful, a program must do more than parse input; it must +also produce some output based on the input. In a Bison grammar, a grammar +rule can have an @dfn{action} made up of C statements. Each time the +parser recognizes a match for that rule, the action is executed. +@xref{Actions}. + +Most of the time, the purpose of an action is to compute the semantic value +of the whole construct from the semantic values of its parts. For example, +suppose we have a rule which says an expression can be the sum of two +expressions. When the parser recognizes such a sum, each of the +subexpressions has a semantic value which describes how it was built up. +The action for this rule should create a similar sort of value for the +newly recognized larger expression. 
+ +For example, here is a rule that says an expression can be the sum of +two subexpressions: + +@example +expr: expr '+' expr @{ $$ = $1 + $3; @} ; +@end example + +@noindent +The action says how to produce the semantic value of the sum expression +from the values of the two subexpressions. + +@node GLR Parsers +@section Writing GLR Parsers +@cindex GLR parsing +@cindex generalized LR (GLR) parsing +@findex %glr-parser +@cindex conflicts +@cindex shift/reduce conflicts +@cindex reduce/reduce conflicts + +In some grammars, Bison's deterministic +LR(1) parsing algorithm cannot decide whether to apply a +certain grammar rule at a given point. That is, it may not be able to +decide (on the basis of the input read so far) which of two possible +reductions (applications of a grammar rule) applies, or whether to apply +a reduction or read more of the input and apply a reduction later in the +input. These are known respectively as @dfn{reduce/reduce} conflicts +(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts +(@pxref{Shift/Reduce}). + +To use a grammar that is not easily modified to be LR(1), a +more general parsing algorithm is sometimes necessary. If you include +@code{%glr-parser} among the Bison declarations in your file +(@pxref{Grammar Outline}), the result is a Generalized LR +(GLR) parser. These parsers handle Bison grammars that +contain no unresolved conflicts (i.e., after applying precedence +declarations) identically to deterministic parsers. However, when +faced with unresolved shift/reduce and reduce/reduce conflicts, +GLR parsers use the simple expedient of doing both, +effectively cloning the parser to follow both possibilities. Each of +the resulting parsers can again split, so that at any given time, there +can be any number of possible parses being explored. The parsers +proceed in lockstep; that is, all of them consume (shift) a given input +symbol before any of them proceed to the next. 
Each of the cloned +parsers eventually meets one of two possible fates: either it runs into +a parsing error, in which case it simply vanishes, or it merges with +another parser, because the two of them have reduced the input to an +identical set of symbols. + +During the time that there are multiple parsers, semantic actions are +recorded, but not performed. When a parser disappears, its recorded +semantic actions disappear as well, and are never performed. When a +reduction makes two parsers identical, causing them to merge, Bison +records both sets of semantic actions. Whenever the last two parsers +merge, reverting to the single-parser case, Bison resolves all the +outstanding actions either by precedences given to the grammar rules +involved, or by performing both actions, and then calling a designated +user-defined function on the resulting values to produce an arbitrary +merged result. + +@menu +* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. +* Merging GLR Parses:: Using GLR parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: GLR parsers require a modern C compiler. +@end menu + +@node Simple GLR Parsers +@subsection Using GLR on Unambiguous Grammars +@cindex GLR parsing, unambiguous grammars +@cindex generalized LR (GLR) parsing, unambiguous grammars +@findex %glr-parser +@findex %expect-rr +@cindex conflicts +@cindex reduce/reduce conflicts +@cindex shift/reduce conflicts + +In the simplest cases, you can use the GLR algorithm +to parse grammars that are unambiguous but fail to be LR(1). +Such grammars typically require more than one symbol of lookahead. + +Consider a problem that +arises in the declaration of enumerated and subrange types in the +programming language Pascal. Here are some examples: + +@example +type subrange = lo .. 
hi; +type enum = (a, b, c); +@end example + +@noindent +The original language standard allows only numeric +literals and constant identifiers for the subrange bounds (@samp{lo} +and @samp{hi}), but Extended Pascal (ISO/IEC +10206) and many other +Pascal implementations allow arbitrary expressions there. This gives +rise to the following situation, containing a superfluous pair of +parentheses: + +@example +type subrange = (a) .. b; +@end example + +@noindent +Compare this to the following declaration of an enumerated +type with only one value: + +@example +type enum = (a); +@end example + +@noindent +(These declarations are contrived, but they are syntactically +valid, and more-complicated cases can come up in practical programs.) + +These two declarations look identical until the @samp{..} token. +With normal LR(1) one-token lookahead it is not +possible to decide between the two forms when the identifier +@samp{a} is parsed. It is, however, desirable +for a parser to decide this, since in the latter case +@samp{a} must become a new identifier to represent the enumeration +value, while in the former case @samp{a} must be evaluated with its +current meaning, which may be a constant or even a function call. + +You could parse @samp{(a)} as an ``unspecified identifier in parentheses'', +to be resolved later, but this typically requires substantial +contortions in both semantic actions and large parts of the +grammar, where the parentheses are nested in the recursive rules for +expressions. + +You might think of using the lexer to distinguish between the two +forms by returning different tokens for currently defined and +undefined identifiers. But if these declarations occur in a local +scope, and @samp{a} is defined in an outer scope, then both forms +are possible---either locally redefining @samp{a}, or using the +value of @samp{a} from the outer scope. So this approach cannot +work. 
+ +A simple solution to this problem is to declare the parser to +use the GLR algorithm. +When the GLR parser reaches the critical state, it +merely splits into two branches and pursues both syntax rules +simultaneously. Sooner or later, one of them runs into a parsing +error. If there is a @samp{..} token before the next +@samp{;}, the rule for enumerated types fails since it cannot +accept @samp{..} anywhere; otherwise, the subrange type rule +fails since it requires a @samp{..} token. So one of the branches +fails silently, and the other one continues normally, performing +all the intermediate actions that were postponed during the split. + +If the input is syntactically incorrect, both branches fail and the parser +reports a syntax error as usual. + +The effect of all this is that the parser seems to ``guess'' the +correct branch to take, or in other words, it seems to use more +lookahead than the underlying LR(1) algorithm actually allows +for. In this example, LR(2) would suffice, but also some cases +that are not LR(@math{k}) for any @math{k} can be handled this way. + +In general, a GLR parser can take quadratic or cubic worst-case time, +and the current Bison parser even takes exponential time and space +for some grammars. In practice, this rarely happens, and for many +grammars it is possible to prove that it cannot happen. +The present example contains only one conflict between two +rules, and the type-declaration context containing the conflict +cannot be nested. So the number of +branches that can exist at any time is limited by the constant 2, +and the parsing time is still linear. + +Here is a Bison grammar corresponding to the example above. It +parses a vastly simplified form of Pascal type declarations. 
+ +@example +%token TYPE DOTDOT ID + +@group +%left '+' '-' +%left '*' '/' +@end group + +%% + +@group +type_decl: TYPE ID '=' type ';' ; +@end group + +@group +type: + '(' id_list ')' +| expr DOTDOT expr +; +@end group + +@group +id_list: + ID +| id_list ',' ID +; +@end group + +@group +expr: + '(' expr ')' +| expr '+' expr +| expr '-' expr +| expr '*' expr +| expr '/' expr +| ID +; +@end group +@end example + +When used as a normal LR(1) grammar, Bison correctly complains +about one reduce/reduce conflict. In the conflicting situation the +parser chooses one of the alternatives, arbitrarily the one +declared first. Therefore the following correct input is not +recognized: + +@example +type t = (a) .. b; +@end example + +The parser can be turned into a GLR parser, while also telling Bison +to be silent about the one known reduce/reduce conflict, by adding +these two declarations to the Bison grammar file (before the first +@samp{%%}): + +@example +%glr-parser +%expect-rr 1 +@end example + +@noindent +No change in the grammar itself is required. Now the +parser recognizes all valid declarations, according to the +limited syntax above, transparently. In fact, the user does not even +notice when the parser splits. + +So here we have a case where we can use the benefits of GLR, +almost without disadvantages. Even in simple cases like this, however, +there are at least two potential problems to beware. First, always +analyze the conflicts reported by Bison to make sure that GLR +splitting is only done where it is intended. A GLR parser +splitting inadvertently may cause problems less obvious than an +LR parser statically choosing the wrong alternative in a +conflict. Second, consider interactions with the lexer (@pxref{Semantic +Tokens}) with great care. Since a split parser consumes tokens without +performing any actions during the split, the lexer cannot obtain +information via parser actions. 
Some cases of lexer interactions can be +eliminated by using GLR to shift the complications from the +lexer to the parser. You must check the remaining cases for +correctness. + +In our example, it would be safe for the lexer to return tokens based on +their current meanings in some symbol table, because no new symbols are +defined in the middle of a type declaration. Though it is possible for +a parser to define the enumeration constants as they are parsed, before +the type declaration is completed, it actually makes no difference since +they cannot be used within the same enumerated type declaration. + +@node Merging GLR Parses +@subsection Using GLR to Resolve Ambiguities +@cindex GLR parsing, ambiguous grammars +@cindex generalized LR (GLR) parsing, ambiguous grammars +@findex %dprec +@findex %merge +@cindex conflicts +@cindex reduce/reduce conflicts + +Let's consider an example, vastly simplified from a C++ grammar. + +@example +%@{ + #include <stdio.h> + #define YYSTYPE char const * + int yylex (void); + void yyerror (char const *); +%@} + +%token TYPENAME ID + +%right '=' +%left '+' + +%glr-parser + +%% + +prog: + /* Nothing. */ +| prog stmt @{ printf ("\n"); @} +; + +stmt: + expr ';' %dprec 1 +| decl %dprec 2 +; + +expr: + ID @{ printf ("%s ", $$); @} +| TYPENAME '(' expr ')' + @{ printf ("%s ", $1); @} +| expr '+' expr @{ printf ("+ "); @} +| expr '=' expr @{ printf ("= "); @} +; + +decl: + TYPENAME declarator ';' + @{ printf ("%s ", $1); @} +| TYPENAME declarator '=' expr ';' + @{ printf ("%s ", $1); @} +; + +declarator: + ID @{ printf ("\"%s\" ", $1); @} +| '(' declarator ')' +; +@end example + +@noindent +This models a problematic part of the C++ grammar---the ambiguity between +certain declarations and statements. For example, + +@example +T (x) = y+z; +@end example + +@noindent +parses as either an @code{expr} or a @code{decl} +(assuming that @samp{T} is recognized as a @code{TYPENAME} and +@samp{x} as an @code{ID}).
+Bison detects this as a reduce/reduce conflict between the rules +@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the +time it encounters @code{x} in the example above. Since this is a +GLR parser, it therefore splits the problem into two parses, one for +each choice of resolving the reduce/reduce conflict. +Unlike the example from the previous section (@pxref{Simple GLR Parsers}), +however, neither of these parses ``dies,'' because the grammar as it stands is +ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and +the other reduces @code{stmt : decl}, after which both parsers are in an +identical state: they've seen @samp{prog stmt} and have the same unprocessed +input remaining. We say that these parses have @dfn{merged.} + +At this point, the GLR parser requires a specification in the +grammar of how to choose between the competing parses. +In the example above, the two @code{%dprec} +declarations specify that Bison is to give precedence +to the parse that interprets the example as a +@code{decl}, which implies that @code{x} is a declarator. +The parser therefore prints + +@example +"x" y z + T +@end example + +The @code{%dprec} declarations only come into play when more than one +parse survives. Consider a different input string for this parser: + +@example +T (x) + y; +@end example + +@noindent +This is another example of using GLR to parse an unambiguous +construct, as shown in the previous section (@pxref{Simple GLR Parsers}). +Here, there is no ambiguity (this cannot be parsed as a declaration). +However, at the time the Bison parser encounters @code{x}, it does not +have enough information to resolve the reduce/reduce conflict (again, +between @code{x} as an @code{expr} or a @code{declarator}). In this +case, no precedence declaration is used. Again, the parser splits +into two, one assuming that @code{x} is an @code{expr}, and the other +assuming @code{x} is a @code{declarator}. 
The second of these parsers +then vanishes when it sees @code{+}, and the parser prints + +@example +x T y + +@end example + +Suppose that instead of resolving the ambiguity, you wanted to see all +the possibilities. For this purpose, you must merge the semantic +actions of the two possible parsers, rather than choosing one over the +other. To do so, you could change the declaration of @code{stmt} as +follows: + +@example +stmt: + expr ';' %merge <stmtMerge> +| decl %merge <stmtMerge> +; +@end example + +@noindent +and define the @code{stmtMerge} function as: + +@example +static YYSTYPE +stmtMerge (YYSTYPE x0, YYSTYPE x1) +@{ + printf ("<OR> "); + return ""; +@} +@end example + +@noindent +with an accompanying forward declaration +in the C declarations at the beginning of the file: + +@example +%@{ + #define YYSTYPE char const * + static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1); +%@} +@end example + +@noindent +With these declarations, the resulting parser parses the first example +as both an @code{expr} and a @code{decl}, and prints + +@example +"x" y z + T x T y z + = <OR> +@end example + +Bison requires that all of the +productions that participate in any particular merge have identical +@samp{%merge} clauses. Otherwise, the ambiguity is unresolvable, +and the parser reports an error during any parse that results in +the offending merge. + +@node GLR Semantic Actions +@subsection GLR Semantic Actions + +@cindex deferred semantic actions +By definition, a deferred semantic action is not performed at the same time as +the associated reduction. +This raises caveats for several Bison features you might use in a semantic +action in a GLR parser. + +@vindex yychar +@cindex GLR parsers and @code{yychar} +@vindex yylval +@cindex GLR parsers and @code{yylval} +@vindex yylloc +@cindex GLR parsers and @code{yylloc} +In any semantic action, you can examine @code{yychar} to determine the type of +the lookahead token present at the time of the associated reduction.
+After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF}, +you can then examine @code{yylval} and @code{yylloc} to determine the +lookahead token's semantic value and location, if any. +In a nondeferred semantic action, you can also modify any of these variables to +influence syntax analysis. +@xref{Lookahead, ,Lookahead Tokens}. + +@findex yyclearin +@cindex GLR parsers and @code{yyclearin} +In a deferred semantic action, it's too late to influence syntax analysis. +In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to +shallow copies of the values they had at the time of the associated reduction. +For this reason alone, modifying them is dangerous. +Moreover, the result of modifying them is undefined and subject to change with +future versions of Bison. +For example, if a semantic action might be deferred, you should never write it +to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free +memory referenced by @code{yylval}. + +@findex YYERROR +@cindex GLR parsers and @code{YYERROR} +Another Bison feature requiring special consideration is @code{YYERROR} +(@pxref{Action Features}), which you can invoke in a semantic action to +initiate error recovery. +During deterministic GLR operation, the effect of @code{YYERROR} is +the same as its effect in a deterministic parser. +In a deferred semantic action, its effect is undefined. +@c The effect is probably a syntax error at the split point. + +Also, see @ref{Location Default Action, ,Default Action for Locations}, which +describes a special usage of @code{YYLLOC_DEFAULT} in GLR parsers. + +@node Compiler Requirements +@subsection Considerations when Compiling GLR Parsers +@cindex @code{inline} +@cindex GLR parsers and @code{inline} + +The GLR parsers require a compiler for ISO C89 or +later. In addition, they use the @code{inline} keyword, which is not +C89, but is C99 and is a common extension in pre-C99 compilers. 
It is +up to the user of these parsers to handle +portability issues. For instance, if using Autoconf and the Autoconf +macro @code{AC_C_INLINE}, a mere + +@example +%@{ + #include <config.h> +%@} +@end example + +@noindent +will suffice. Otherwise, we suggest + +@example +%@{ + #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \ + && ! defined inline) + # define inline + #endif +%@} +@end example + +@node Locations +@section Locations +@cindex location +@cindex textual location +@cindex location, textual + +Many applications, like interpreters or compilers, have to produce verbose +and useful error messages. To achieve this, one must be able to keep track of +the @dfn{textual location}, or @dfn{location}, of each syntactic construct. +Bison provides a mechanism for handling these locations. + +Each token has a semantic value. In a similar fashion, each token has an +associated location, but the type of locations is the same for all tokens +and groupings. Moreover, the output parser is equipped with a default data +structure for storing locations (@pxref{Tracking Locations}, for more +details). + +Like semantic values, locations can be reached in actions using a dedicated +set of constructs. In a rule such as @code{expr: expr '+' expr}, the location +of the whole grouping is @code{@@$}, while the locations of the subexpressions +are @code{@@1} and @code{@@3}. + +When a rule is matched, a default action is used to compute the semantic value +of its left hand side (@pxref{Actions}). In the same way, another default +action is used for locations. However, the action for locations is general +enough for most cases, meaning there is usually no need to describe for each +rule how @code{@@$} should be formed. When building a new location for a given +grouping, the default behavior of the output parser is to take the beginning +of the first symbol, and the end of the last symbol.
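
The default rule for locations can be pictured with a small C sketch. This is only an illustration, not the code Bison generates: the type @code{loc_span} and the function @code{span_of_grouping} are hypothetical names, and the sketch assumes a rule with at least one right-hand-side symbol.

```c
#include <assert.h>

/* Hypothetical location type with four fields: where a construct
   begins and where it ends.  The names are assumptions made for
   this sketch.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} loc_span;

/* Build the location of a grouping from the locations of its N
   right-hand-side symbols RHS[0] .. RHS[N-1]: begin where the first
   symbol begins, end where the last symbol ends.  */
static loc_span
span_of_grouping (loc_span const *rhs, int n)
{
  loc_span res;

  /* An empty rule has no symbols to take a location from; this
     sketch does not handle that case.  */
  assert (n >= 1);

  res.first_line = rhs[0].first_line;
  res.first_column = rhs[0].first_column;
  res.last_line = rhs[n - 1].last_line;
  res.last_column = rhs[n - 1].last_column;
  return res;
}
```

For a three-symbol rule, the locations of the middle symbol play no part in the result; only the first and last symbols matter.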
+ +@node Bison Parser +@section Bison Output: the Parser Implementation File +@cindex Bison parser +@cindex Bison utility +@cindex lexical analyzer, purpose +@cindex parser + +When you run Bison, you give it a Bison grammar file as input. The +most important output is a C source file that implements a parser for +the language described by the grammar. This parser is called a +@dfn{Bison parser}, and this file is called a @dfn{Bison parser +implementation file}. Keep in mind that the Bison utility and the +Bison parser are two distinct programs: the Bison utility is a program +whose output is the Bison parser implementation file that becomes part +of your program. + +The job of the Bison parser is to group tokens into groupings according to +the grammar rules---for example, to build identifiers and operators into +expressions. As it does this, it runs the actions for the grammar rules it +uses. + +The tokens come from a function called the @dfn{lexical analyzer} that +you must supply in some fashion (such as by writing it in C). The Bison +parser calls the lexical analyzer each time it wants a new token. It +doesn't know what is ``inside'' the tokens (though their semantic values +may reflect this). Typically the lexical analyzer makes the tokens by +parsing characters of text, but Bison does not depend on this. +@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. + +The Bison parser implementation file is C code which defines a +function named @code{yyparse} which implements that grammar. This +function does not make a complete C program: you must supply some +additional functions. One is the lexical analyzer. Another is an +error-reporting function which the parser calls to report an error. +In addition, a complete C program must start with a function called +@code{main}; you have to provide this, and arrange for it to call +@code{yyparse} or the parser will never run. @xref{Interface, ,Parser +C-Language Interface}. 
+ +Aside from the token type names and the symbols in the actions you +write, all symbols defined in the Bison parser implementation file +itself begin with @samp{yy} or @samp{YY}. This includes interface +functions such as the lexical analyzer function @code{yylex}, the +error reporting function @code{yyerror} and the parser function +@code{yyparse} itself. This also includes numerous identifiers used +for internal purposes. Therefore, you should avoid using C +identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar +file except for the ones defined in this manual. Also, you should +avoid using the C identifiers @samp{malloc} and @samp{free} for +anything other than their usual meanings. + +In some cases the Bison parser implementation file includes system +headers, and in those cases your code should respect the identifiers +reserved by those headers. On some non-GNU hosts, @code{<alloca.h>}, +@code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are +included as needed to declare memory allocators and related types. +@code{<libintl.h>} is included if message translation is in use +(@pxref{Internationalization}). Other system headers may be included +if you define @code{YYDEBUG} to a nonzero value (@pxref{Tracing, +,Tracing Your Parser}). + +@node Stages +@section Stages in Using Bison +@cindex stages in using Bison +@cindex using Bison + +The actual language-design process using Bison, from grammar specification +to a working compiler or interpreter, has these parts: + +@enumerate +@item +Formally specify the grammar in a form recognized by Bison +(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule +in the language, describe the action that is to be taken when an +instance of that rule is recognized. The action is described by a +sequence of C statements. + +@item +Write a lexical analyzer to process input and pass tokens to the parser. +The lexical analyzer may be written by hand in C (@pxref{Lexical, ,The +Lexical Analyzer Function @code{yylex}}).
It could also be produced +using Lex, but the use of Lex is not discussed in this manual. + +@item +Write a controlling function that calls the Bison-produced parser. + +@item +Write error-reporting routines. +@end enumerate + +To turn this source code as written into a runnable program, you +must follow these steps: + +@enumerate +@item +Run Bison on the grammar to produce the parser. + +@item +Compile the code output by Bison, as well as any other source files. + +@item +Link the object files to produce the finished product. +@end enumerate + +@node Grammar Layout +@section The Overall Layout of a Bison Grammar +@cindex grammar file +@cindex file format +@cindex format of grammar file +@cindex layout of Bison grammar + +The input file for the Bison utility is a @dfn{Bison grammar file}. The +general form of a Bison grammar file is as follows: + +@example +%@{ +@var{Prologue} +%@} + +@var{Bison declarations} + +%% +@var{Grammar rules} +%% +@var{Epilogue} +@end example + +@noindent +The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears +in every Bison grammar file to separate the sections. + +The prologue may define types and variables used in the actions. You can +also use preprocessor commands to define macros used there, and use +@code{#include} to include header files that do any of these things. +You need to declare the lexical analyzer @code{yylex} and the error +printer @code{yyerror} here, along with any other global identifiers +used by the actions in the grammar rules. + +The Bison declarations declare the names of the terminal and nonterminal +symbols, and may also describe operator precedence and the data types of +semantic values of various symbols. + +The grammar rules define how to construct each nonterminal symbol from its +parts. + +The epilogue can contain any code you want to use. Often the +definitions of functions declared in the prologue go here. In a +simple program, all the rest of the program can go here. 
+ +@node Examples +@chapter Examples +@cindex simple examples +@cindex examples, simple + +Now we show and explain several sample programs written using Bison: a +reverse polish notation calculator, an algebraic (infix) notation +calculator --- later extended to track ``locations'' --- +and a multi-function calculator. All +produce usable, though limited, interactive desk-top calculators. + +These examples are simple, but Bison grammars for real programming +languages are written the same way. You can copy these examples into a +source file to try them. + +@menu +* RPN Calc:: Reverse polish notation calculator; + a first example with no operator precedence. +* Infix Calc:: Infix (algebraic) notation calculator. + Operator precedence is introduced. +* Simple Error Recovery:: Continuing after syntax errors. +* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. +* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. +* Exercises:: Ideas for improving the multi-function calculator. +@end menu + +@node RPN Calc +@section Reverse Polish Notation Calculator +@cindex reverse polish notation +@cindex polish notation calculator +@cindex @code{rpcalc} +@cindex calculator, simple + +The first example is that of a simple double-precision @dfn{reverse polish +notation} calculator (a calculator using postfix operators). This example +provides a good starting point, since operator precedence is not an issue. +The second example will illustrate how operator precedence is handled. + +The source code for this calculator is named @file{rpcalc.y}. The +@samp{.y} extension is a convention used for Bison grammar files. + +@menu +* Rpcalc Declarations:: Prologue (declarations) for rpcalc. +* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. +* Rpcalc Lexer:: The lexical analyzer. +* Rpcalc Main:: The controlling function. +* Rpcalc Error:: The error reporting function. 
+* Rpcalc Generate:: Running Bison on the grammar file. +* Rpcalc Compile:: Run the C compiler on the output code. +@end menu + +@node Rpcalc Declarations +@subsection Declarations for @code{rpcalc} + +Here are the C and Bison declarations for the reverse polish notation +calculator. As in C, comments are placed between @samp{/*@dots{}*/}. + +@example +/* Reverse polish notation calculator. */ + +%@{ + #define YYSTYPE double + #include <math.h> + int yylex (void); + void yyerror (char const *); +%@} + +%token NUM + +%% /* Grammar rules and actions follow. */ +@end example + +The declarations section (@pxref{Prologue, , The prologue}) contains two +preprocessor directives and two forward declarations. + +The @code{#define} directive defines the macro @code{YYSTYPE}, thus +specifying the C data type for semantic values of both tokens and +groupings (@pxref{Value Type, ,Data Types of Semantic Values}). The +Bison parser will use whatever type @code{YYSTYPE} is defined as; if you +don't define it, @code{int} is the default. Because we specify +@code{double}, each token and each expression has an associated value, +which is a floating point number. + +The @code{#include} directive is used to declare the exponentiation +function @code{pow}. + +The forward declarations for @code{yylex} and @code{yyerror} are +needed because the C language requires that functions be declared +before they are used. These functions will be defined in the +epilogue, but the parser calls them so they must be declared in the +prologue. + +The second section, Bison declarations, provides information to Bison +about the token types (@pxref{Bison Declarations, ,The Bison +Declarations Section}). Each terminal symbol that is not a +single-character literal must be declared here. (Single-character +literals normally don't need to be declared.)
In this example, all the +arithmetic operators are designated by single-character literals, so the +only terminal symbol that needs to be declared is @code{NUM}, the token +type for numeric constants. + +@node Rpcalc Rules +@subsection Grammar Rules for @code{rpcalc} + +Here are the grammar rules for the reverse polish notation calculator. + +@example +@group +input: + /* empty */ +| input line +; +@end group + +@group +line: + '\n' +| exp '\n' @{ printf ("%.10g\n", $1); @} +; +@end group + +@group +exp: + NUM @{ $$ = $1; @} +| exp exp '+' @{ $$ = $1 + $2; @} +| exp exp '-' @{ $$ = $1 - $2; @} +| exp exp '*' @{ $$ = $1 * $2; @} +| exp exp '/' @{ $$ = $1 / $2; @} +| exp exp '^' @{ $$ = pow ($1, $2); @} /* Exponentiation */ +| exp 'n' @{ $$ = -$1; @} /* Unary minus */ +; +@end group +%% +@end example + +The groupings of the rpcalc ``language'' defined here are the expression +(given the name @code{exp}), the line of input (@code{line}), and the +complete input transcript (@code{input}). Each of these nonterminal +symbols has several alternate rules, joined by the vertical bar @samp{|} +which is read as ``or''. The following sections explain what these rules +mean. + +The semantics of the language is determined by the actions taken when a +grouping is recognized. The actions are the C code that appears inside +braces. @xref{Actions}. + +You must specify these actions in C, but Bison provides the means for +passing semantic values between the rules. In each action, the +pseudo-variable @code{$$} stands for the semantic value for the grouping +that the rule is going to construct. Assigning a value to @code{$$} is the +main job of most actions. The semantic values of the components of the +rule are referred to as @code{$1}, @code{$2}, and so on. 
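
To make the roles of @code{$$}, @code{$1} and @code{$2} concrete, here is a hand-written C sketch of what happens to the semantic values when a rule such as @code{exp: exp exp '+'} is reduced. It is only an illustration under stated assumptions: the names @code{vstack}, @code{push_value} and @code{reduce_binary} are hypothetical, and a real Bison parser manages its own value stack internally.

```c
#include <assert.h>

/* Hypothetical value stack; the size limit is arbitrary.  */
static double vstack[64];
static int vtop = 0;            /* Number of values on the stack.  */

/* Shifting a NUM token leaves its semantic value on the stack.  */
static void
push_value (double v)
{
  assert (vtop < 64);
  vstack[vtop++] = v;
}

/* Mimic reducing "exp: exp exp OP": take the values of the two
   component expressions ($1 and $2), compute the value of the
   whole grouping ($$), and leave it on the stack.  */
static double
reduce_binary (char op)
{
  double d1, d2, result = 0;

  assert (vtop >= 2);
  d2 = vstack[--vtop];          /* Plays the role of $2.  */
  d1 = vstack[--vtop];          /* Plays the role of $1.  */

  switch (op)
    {
    case '+': result = d1 + d2; break;
    case '-': result = d1 - d2; break;
    case '*': result = d1 * d2; break;
    case '/': result = d1 / d2; break;
    default:  assert (0);       /* Operator not covered here.  */
    }

  vstack[vtop++] = result;      /* Plays the role of $$.  */
  return result;
}
```

For the input @samp{3 4 + 2 *}, the values 3 and 4 are pushed, the first reduction replaces them with their sum, 2 is pushed, and the second reduction replaces the remaining pair with the product.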
+ +@menu +* Rpcalc Input:: +* Rpcalc Line:: +* Rpcalc Expr:: +@end menu + +@node Rpcalc Input +@subsubsection Explanation of @code{input} + +Consider the definition of @code{input}: + +@example +input: + /* empty */ +| input line +; +@end example + +This definition reads as follows: ``A complete input is either an empty +string, or a complete input followed by an input line''. Notice that +``complete input'' is defined in terms of itself. This definition is said +to be @dfn{left recursive} since @code{input} appears always as the +leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}. + +The first alternative is empty because there are no symbols between the +colon and the first @samp{|}; this means that @code{input} can match an +empty string of input (no tokens). We write the rules this way because it +is legitimate to type @kbd{Ctrl-d} right after you start the calculator. +It's conventional to put an empty alternative first and write the comment +@samp{/* empty */} in it. + +The second alternate rule (@code{input line}) handles all nontrivial input. +It means, ``After reading any number of lines, read one more line if +possible.'' The left recursion makes this rule into a loop. Since the +first alternative matches empty input, the loop can be executed zero or +more times. + +The parser function @code{yyparse} continues to process input until a +grammatical error is seen or the lexical analyzer says there are no more +input tokens; we will arrange for the latter to happen at end-of-input. + +@node Rpcalc Line +@subsubsection Explanation of @code{line} + +Now consider the definition of @code{line}: + +@example +line: + '\n' +| exp '\n' @{ printf ("%.10g\n", $1); @} +; +@end example + +The first alternative is a token which is a newline character; this means +that rpcalc accepts a blank line (and ignores it, since there is no +action). The second alternative is an expression followed by a newline. +This is the alternative that makes rpcalc useful. 
The semantic value of +the @code{exp} grouping is the value of @code{$1} because the @code{exp} in +question is the first symbol in the alternative. The action prints this +value, which is the result of the computation the user asked for. + +This action is unusual because it does not assign a value to @code{$$}. As +a consequence, the semantic value associated with the @code{line} is +uninitialized (its value will be unpredictable). This would be a bug if +that value were ever used, but we don't use it: once rpcalc has printed the +value of the user's input line, that value is no longer needed. + +@node Rpcalc Expr +@subsubsection Explanation of @code{exp} + +The @code{exp} grouping has several rules, one for each kind of expression. +The first rule handles the simplest expressions: those that are just numbers. +The second handles an addition-expression, which looks like two expressions +followed by a plus-sign. The third handles subtraction, and so on. + +@example +exp: + NUM +| exp exp '+' @{ $$ = $1 + $2; @} +| exp exp '-' @{ $$ = $1 - $2; @} +@dots{} +; +@end example + +We have used @samp{|} to join all the rules for @code{exp}, but we could +equally well have written them separately: + +@example +exp: NUM ; +exp: exp exp '+' @{ $$ = $1 + $2; @}; +exp: exp exp '-' @{ $$ = $1 - $2; @}; +@dots{} +@end example + +Most of the rules have actions that compute the value of the expression in +terms of the value of its parts. For example, in the rule for addition, +@code{$1} refers to the first component @code{exp} and @code{$2} refers to +the second one. The third component, @code{'+'}, has no meaningful +associated semantic value, but if it had one you could refer to it as +@code{$3}. When @code{yyparse} recognizes a sum expression using this +rule, the sum of the two subexpressions' values is produced as the value of +the entire expression. @xref{Actions}. + +You don't have to give an action for every rule.
When a rule has no +action, Bison by default copies the value of @code{$1} into @code{$$}. +This is what happens in the first rule (the one that uses @code{NUM}). + +The formatting shown here is the recommended convention, but Bison does +not require it. You can add or change white space as much as you wish. +For example, this: + +@example +exp: NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ; +@end example + +@noindent +means the same thing as this: + +@example +exp: + NUM +| exp exp '+' @{ $$ = $1 + $2; @} +| @dots{} +; +@end example + +@noindent +The latter, however, is much more readable. + +@node Rpcalc Lexer +@subsection The @code{rpcalc} Lexical Analyzer +@cindex writing a lexical analyzer +@cindex lexical analyzer, writing + +The lexical analyzer's job is low-level parsing: converting characters +or sequences of characters into tokens. The Bison parser gets its +tokens by calling the lexical analyzer. @xref{Lexical, ,The Lexical +Analyzer Function @code{yylex}}. + +Only a simple lexical analyzer is needed for the RPN +calculator. This +lexical analyzer skips blanks and tabs, then reads in numbers as +@code{double} and returns them as @code{NUM} tokens. Any other character +that isn't part of a number is a separate token. Note that the token-code +for such a single-character token is the character itself. + +The return value of the lexical analyzer function is a numeric code which +represents a token type. The same text used in Bison rules to stand for +this token type is also a C expression for the numeric code for the type. +This works in two ways. If the token type is a character literal, then its +numeric code is that of the character; you can use the same +character literal in the lexical analyzer to express the number. If the +token type is an identifier, that identifier is defined by Bison as a C +macro whose definition is the appropriate number. In this example, +therefore, @code{NUM} becomes a macro for @code{yylex} to use. 
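The two kinds of token code described above can be tried outside of a generated parser. The following is only a sketch under stated assumptions: it reads from a string instead of standard input, and the names @code{cursor}, @code{token_value} and @code{next_token}, as well as the value 258 chosen for @code{NUM}, are hypothetical stand-ins for what Bison and @code{yylex} would provide.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for the macro Bison would define for NUM.
   Named tokens get codes above the character range; the exact
   value 258 is an assumption made for this sketch.  */
enum { NUM = 258 };

/* Hypothetical replacements for stdin and for the variable that
   carries the semantic value of a token.  */
static char const *cursor;
static double token_value;

/* Return the code of the next token in the string at CURSOR,
   or 0 at the end of the string.  */
static int
next_token (void)
{
  assert (cursor != NULL);

  /* Skip blanks and tabs.  */
  while (*cursor == ' ' || *cursor == '\t')
    ++cursor;

  /* The end of the string plays the role of end-of-input.  */
  if (*cursor == '\0')
    return 0;

  /* A number is returned as the named token NUM.  */
  if (*cursor == '.' || isdigit ((unsigned char) *cursor))
    {
      char *end;
      double value = strtod (cursor, &end);
      if (end != cursor)
        {
          token_value = value;
          cursor = end;
          return NUM;
        }
    }

  /* Any other character is its own token code.  */
  return (unsigned char) *cursor++;
}
```

With @code{cursor} pointing at the text @samp{4 9.5 +}, successive calls return @code{NUM}, @code{NUM} and then @code{'+'}, so the caller can compare the result against the same character literal that a grammar rule would use.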
+ +The semantic value of the token (if it has one) is stored into the +global variable @code{yylval}, which is where the Bison parser will look +for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was +defined at the beginning of the grammar; @pxref{Rpcalc Declarations, +,Declarations for @code{rpcalc}}.) + +A token type code of zero is returned if the end-of-input is encountered. +(Bison recognizes any nonpositive value as indicating end-of-input.) + +Here is the code for the lexical analyzer: + +@example +@group +/* The lexical analyzer returns a double floating point + number on the stack and the token NUM, or the numeric code + of the character read if not a number. It skips all blanks + and tabs, and returns 0 for end-of-input. */ + +#include <ctype.h> +@end group + +@group +int +yylex (void) +@{ + int c; + + /* Skip white space. */ + while ((c = getchar ()) == ' ' || c == '\t') + continue; +@end group +@group + /* Process numbers. */ + if (c == '.' || isdigit (c)) + @{ + ungetc (c, stdin); + scanf ("%lf", &yylval); + return NUM; + @} +@end group +@group + /* Return end-of-input. */ + if (c == EOF) + return 0; + /* Return a single char. */ + return c; +@} +@end group +@end example + +@node Rpcalc Main +@subsection The Controlling Function +@cindex controlling function +@cindex main function in simple example + +In keeping with the spirit of this example, the controlling function is +kept to the bare minimum. The only requirement is that it call +@code{yyparse} to start the process of parsing. + +@example +@group +int +main (void) +@{ + return yyparse (); +@} +@end group +@end example + +@node Rpcalc Error +@subsection The Error Reporting Routine +@cindex error reporting routine + +When @code{yyparse} detects a syntax error, it calls the error reporting +function @code{yyerror} to print an error message (usually but not +always @code{"syntax error"}).
It is up to the programmer to supply +@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so +here is the definition we will use: + +@example +@group +#include <stdio.h> +@end group + +@group +/* Called by yyparse on error. */ +void +yyerror (char const *s) +@{ + fprintf (stderr, "%s\n", s); +@} +@end group +@end example + +After @code{yyerror} returns, the Bison parser may recover from the error +and continue parsing if the grammar contains a suitable error rule +(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We +have not written any error rules in this example, so any invalid input will +cause the calculator program to exit. This is not clean behavior for a +real calculator, but it is adequate for the first example. + +@node Rpcalc Generate +@subsection Running Bison to Make the Parser +@cindex running Bison (introduction) + +Before running Bison to produce a parser, we need to decide how to +arrange all the source code in one or more source files. For such a +simple example, the easiest thing is to put everything in one file, +the grammar file. The definitions of @code{yylex}, @code{yyerror} and +@code{main} go at the end, in the epilogue of the grammar file +(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}). + +For a large project, you would probably have several source files, and use +@code{make} to arrange to recompile them. + +With all the source in the grammar file, you use the following command +to convert it into a parser implementation file: + +@example +bison @var{file}.y +@end example + +@noindent +In this example, the grammar file is called @file{rpcalc.y} (for +``Reverse Polish @sc{calc}ulator''). Bison produces a parser +implementation file named @file{@var{file}.tab.c}, removing the +@samp{.y} from the grammar file name. The parser implementation file +contains the source code for @code{yyparse}.
The additional functions +in the grammar file (@code{yylex}, @code{yyerror} and @code{main}) are +copied verbatim to the parser implementation file. + +@node Rpcalc Compile +@subsection Compiling the Parser Implementation File +@cindex compiling the parser + +Here is how to compile and run the parser implementation file: + +@example +@group +# @r{List files in current directory.} +$ @kbd{ls} +rpcalc.tab.c rpcalc.y +@end group + +@group +# @r{Compile the Bison parser.} +# @r{@samp{-lm} tells compiler to search math library for @code{pow}.} +$ @kbd{cc -o rpcalc rpcalc.tab.c -lm} +@end group + +@group +# @r{List files again.} +$ @kbd{ls} +rpcalc rpcalc.tab.c rpcalc.y +@end group +@end example + +The file @file{rpcalc} now contains the executable code. Here is an +example session using @code{rpcalc}. + +@example +$ @kbd{rpcalc} +@kbd{4 9 +} +13 +@kbd{3 7 + 3 4 5 *+-} +-13 +@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}} +13 +@kbd{5 6 / 4 n +} +-3.166666667 +@kbd{3 4 ^} @r{Exponentiation} +81 +@kbd{^D} @r{End-of-file indicator} +$ +@end example + +@node Infix Calc +@section Infix Notation Calculator: @code{calc} +@cindex infix notation calculator +@cindex @code{calc} +@cindex calculator, infix notation + +We now modify rpcalc to handle infix operators instead of postfix. Infix +notation involves the concept of operator precedence and the need for +parentheses nested to arbitrary depth. Here is the Bison code for +@file{calc.y}, an infix desk-top calculator. + +@example +/* Infix notation calculator. */ + +@group +%@{ + #define YYSTYPE double + #include <math.h> + #include <stdio.h> + int yylex (void); + void yyerror (char const *); +%@} +@end group + +@group +/* Bison declarations. */ +%token NUM +%left '-' '+' +%left '*' '/' +%left NEG /* negation--unary minus */ +%right '^' /* exponentiation */ +@end group + +%% /* The grammar follows.
*/ +@group +input: + /* empty */ +| input line +; +@end group + +@group +line: + '\n' +| exp '\n' @{ printf ("\t%.10g\n", $1); @} +; +@end group + +@group +exp: + NUM @{ $$ = $1; @} +| exp '+' exp @{ $$ = $1 + $3; @} +| exp '-' exp @{ $$ = $1 - $3; @} +| exp '*' exp @{ $$ = $1 * $3; @} +| exp '/' exp @{ $$ = $1 / $3; @} +| '-' exp %prec NEG @{ $$ = -$2; @} +| exp '^' exp @{ $$ = pow ($1, $3); @} +| '(' exp ')' @{ $$ = $2; @} +; +@end group +%% +@end example + +@noindent +The functions @code{yylex}, @code{yyerror} and @code{main} can be the +same as before. + +There are two important new features shown in this code. + +In the second section (Bison declarations), @code{%left} declares token +types and says they are left-associative operators. The declarations +@code{%left} and @code{%right} (right associativity) take the place of +@code{%token} which is used to declare a token type name without +associativity. (These tokens are single-character literals, which +ordinarily don't need to be declared. We declare them here to specify +the associativity.) + +Operator precedence is determined by the line ordering of the +declarations; the higher the line number of the declaration (lower on +the page or screen), the higher the precedence. Hence, exponentiation +has the highest precedence, unary minus (@code{NEG}) is next, followed +by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator +Precedence}. + +The other important new feature is the @code{%prec} in the grammar +section for the unary minus operator. The @code{%prec} simply instructs +Bison that the rule @samp{| '-' exp} has the same precedence as +@code{NEG}---in this case the next-to-highest. @xref{Contextual +Precedence, ,Context-Dependent Precedence}. 
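To see what these precedence and associativity declarations imply for concrete inputs, it can help to evaluate a few expressions with a small hand-written evaluator. The sketch below is not what Bison generates: it swaps in the precedence-climbing technique and merely mirrors the declaration order of @file{calc.y}. All names in it (@code{PREC_ADD}, @code{parse_exp}, @code{evaluate} and so on) are hypothetical, and @code{int_pow} is a simplified replacement for @code{pow} that only handles whole-number exponents.

```c
#include <assert.h>
#include <stdlib.h>

/* Levels in the order of the declarations in calc.y: a later
   declaration gets a higher level.  All names are hypothetical.  */
enum { PREC_NONE, PREC_ADD, PREC_MUL, PREC_NEG, PREC_POW };

static char const *input;   /* cursor into the text being parsed */
static double parse_exp (int min_prec);

static void skip_blanks (void)
{
  while (*input == ' ' || *input == '\t')
    ++input;
}

static int binary_prec (int op)
{
  switch (op)
    {
    case '+': case '-': return PREC_ADD;
    case '*': case '/': return PREC_MUL;
    case '^':           return PREC_POW;
    default:            return PREC_NONE;
    }
}

/* Simplified stand-in for pow: whole-number exponents only.  */
static double int_pow (double base, double exponent)
{
  long n = (long) exponent;
  long count = n < 0 ? -n : n;
  double result = 1.0;
  while (count-- > 0)
    result *= base;
  return n < 0 ? 1.0 / result : result;
}

/* A number, a parenthesized expression, or a negated operand.  */
static double parse_operand (void)
{
  char *end;
  double value;
  skip_blanks ();
  if (*input == '(')
    {
      ++input;
      value = parse_exp (PREC_ADD);
      skip_blanks ();
      if (*input == ')')
        ++input;
      return value;
    }
  if (*input == '-')
    {
      /* Like %prec NEG: inside the operand, only operators at
         least as tight as NEG (here just '^') are absorbed.  */
      ++input;
      return -parse_exp (PREC_NEG);
    }
  value = strtod (input, &end);
  input = end;
  return value;
}

/* Parse a chain of operators whose level is at least MIN_PREC.  */
static double parse_exp (int min_prec)
{
  double left = parse_operand ();
  for (;;)
    {
      int op, prec;
      double right;
      skip_blanks ();
      op = *input;
      prec = binary_prec (op);
      if (prec == PREC_NONE || prec < min_prec)
        return left;
      ++input;
      /* Left-associative operators need a tighter right operand;
         the right-associative '^' accepts its own level again.  */
      right = parse_exp (op == '^' ? prec : prec + 1);
      switch (op)
        {
        case '+': left += right; break;
        case '-': left -= right; break;
        case '*': left *= right; break;
        case '/': left /= right; break;
        default:  left = int_pow (left, right); break;
        }
    }
}

static double evaluate (char const *text)
{
  assert (text != NULL);
  input = text;
  return parse_exp (PREC_ADD);
}
```

Under these assumptions, @samp{1 - 2 - 3} groups to the left, @samp{2 ^ 3 ^ 2} groups to the right, and @samp{-2 ^ 2} negates the result of the exponentiation because @code{NEG} ranks below @samp{^}.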
+ +Here is a sample run of @file{calc.y}: + +@need 500 +@example +$ @kbd{calc} +@kbd{4 + 4.5 - (34/(8*3+-3))} +6.880952381 +@kbd{-56 + 2} +-54 +@kbd{3 ^ 2} +9 +@end example + +@node Simple Error Recovery +@section Simple Error Recovery +@cindex error recovery, simple + +Up to this point, this manual has not addressed the issue of @dfn{error +recovery}---how to continue parsing after the parser detects a syntax +error. All we have handled is error reporting with @code{yyerror}. +Recall that by default @code{yyparse} returns after calling +@code{yyerror}. This means that an erroneous input line causes the +calculator program to exit. Now we show how to rectify this deficiency. + +The Bison language itself includes the reserved word @code{error}, which +may be included in the grammar rules. In the example below it has +been added to one of the alternatives for @code{line}: + +@example +@group +line: + '\n' +| exp '\n' @{ printf ("\t%.10g\n", $1); @} +| error '\n' @{ yyerrok; @} +; +@end group +@end example + +This addition to the grammar allows for simple error recovery in the +event of a syntax error. If an expression that cannot be evaluated is +read, the error will be recognized by the third rule for @code{line}, +and parsing will continue. (The @code{yyerror} function is still called +upon to print its message as well.) The action executes the statement +@code{yyerrok}, a macro defined automatically by Bison; its meaning is +that error recovery is complete (@pxref{Error Recovery}). Note the +difference between @code{yyerrok} and @code{yyerror}; neither one is a +misprint. + +This form of error recovery deals with syntax errors. There are other +kinds of errors; for example, division by zero, which raises an exception +signal that is normally fatal. A real calculator program must handle this +signal and use @code{longjmp} to return to @code{main} and resume parsing +input lines; it would also have to discard the rest of the current line of +input. 
We won't discuss this issue further because it is not specific to +Bison programs. + +@node Location Tracking Calc +@section Location Tracking Calculator: @code{ltcalc} +@cindex location tracking calculator +@cindex @code{ltcalc} +@cindex calculator, location tracking + +This example extends the infix notation calculator with location +tracking. This feature will be used to improve the error messages. For +the sake of clarity, this example is a simple integer calculator, since +most of the work needed to use locations will be done in the lexical +analyzer. + +@menu +* Ltcalc Declarations:: Bison and C declarations for ltcalc. +* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. +* Ltcalc Lexer:: The lexical analyzer. +@end menu + +@node Ltcalc Declarations +@subsection Declarations for @code{ltcalc} + +The C and Bison declarations for the location tracking calculator are +the same as the declarations for the infix notation calculator. + +@example +/* Location tracking calculator. */ + +%@{ + #define YYSTYPE int + #include <math.h> + int yylex (void); + void yyerror (char const *); +%@} + +/* Bison declarations. */ +%token NUM + +%left '-' '+' +%left '*' '/' +%left NEG +%right '^' + +%% /* The grammar follows. */ +@end example + +@noindent +Note there are no declarations specific to locations. Defining a data +type for storing locations is not needed: we will use the type provided +by default (@pxref{Location Type, ,Data Types of Locations}), which is a +four member structure with the following integer fields: +@code{first_line}, @code{first_column}, @code{last_line} and +@code{last_column}. By convention, and in accordance with the GNU +Coding Standards and common practice, the line and column counts both +start at 1. + +@node Ltcalc Rules +@subsection Grammar Rules for @code{ltcalc} + +Whether handling locations or not has no effect on the syntax of your +language.
Therefore, grammar rules for this example will be very close +to those of the previous example: we will only modify them to benefit +from the new information. + +Here, we will use locations to report divisions by zero, and locate the +wrong expressions or subexpressions. + +@example +@group +input: + /* empty */ +| input line +; +@end group + +@group +line: + '\n' +| exp '\n' @{ printf ("%d\n", $1); @} +; +@end group + +@group +exp: + NUM @{ $$ = $1; @} +| exp '+' exp @{ $$ = $1 + $3; @} +| exp '-' exp @{ $$ = $1 - $3; @} +| exp '*' exp @{ $$ = $1 * $3; @} +@end group +@group +| exp '/' exp + @{ + if ($3) + $$ = $1 / $3; + else + @{ + $$ = 1; + fprintf (stderr, "%d.%d-%d.%d: division by zero", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); + @} + @} +@end group +@group +| '-' exp %prec NEG @{ $$ = -$2; @} +| exp '^' exp @{ $$ = pow ($1, $3); @} +| '(' exp ')' @{ $$ = $2; @} +@end group +@end example + +This code shows how to reach locations inside of semantic actions, by +using the pseudo-variables @code{@@@var{n}} for rule components, and the +pseudo-variable @code{@@$} for groupings. + +We don't need to assign a value to @code{@@$}: the output parser does it +automatically. By default, before executing the C code of each action, +@code{@@$} is set to range from the beginning of @code{@@1} to the end +of @code{@@@var{n}}, for a rule with @var{n} components. This behavior +can be redefined (@pxref{Location Default Action, , Default Action for +Locations}), and for very specific rules, @code{@@$} can be computed by +hand. + +@node Ltcalc Lexer +@subsection The @code{ltcalc} Lexical Analyzer. + +Until now, we relied on Bison's defaults to enable location +tracking. The next step is to rewrite the lexical analyzer, and make it +able to feed the parser with the token locations, as it already does for +semantic values. 
+ +To this end, we must take into account every single character of the +input text, so that the computed locations are neither fuzzy nor wrong: + +@example +@group +int +yylex (void) +@{ + int c; +@end group + +@group + /* Skip white space. */ + while ((c = getchar ()) == ' ' || c == '\t') + ++yylloc.last_column; +@end group + +@group + /* Step. */ + yylloc.first_line = yylloc.last_line; + yylloc.first_column = yylloc.last_column; +@end group + +@group + /* Process numbers. */ + if (isdigit (c)) + @{ + yylval = c - '0'; + ++yylloc.last_column; + while (isdigit (c = getchar ())) + @{ + ++yylloc.last_column; + yylval = yylval * 10 + c - '0'; + @} + ungetc (c, stdin); + return NUM; + @} +@end group + + /* Return end-of-input. */ + if (c == EOF) + return 0; + +@group + /* Return a single char, and update location. */ + if (c == '\n') + @{ + ++yylloc.last_line; + yylloc.last_column = 0; + @} + else + ++yylloc.last_column; + return c; +@} +@end group +@end example + +Basically, the lexical analyzer performs the same processing as before: +it skips blanks and tabs, and reads numbers or single-character tokens. +In addition, it updates @code{yylloc}, the global variable (of type +@code{YYLTYPE}) containing the token's location. + +Now, each time this function returns a token, the parser has its number +as well as its semantic value, and its location in the text. The last +needed change is to initialize @code{yylloc}, for example in the +controlling function: + +@example +@group +int +main (void) +@{ + yylloc.first_line = yylloc.last_line = 1; + yylloc.first_column = yylloc.last_column = 0; + return yyparse (); +@} +@end group +@end example + +Remember that computing locations is not a matter of syntax. Every +character must be associated with a location update, whether it is in +valid input, in comments, in literal strings, and so on.
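The bookkeeping done by this lexical analyzer, together with the default computation of @code{@@$} described earlier, can be isolated in a few helper functions. This is a minimal sketch under stated assumptions: the type @code{span} copies the four fields of the default location type, and the names @code{span}, @code{span_step}, @code{span_advance} and @code{span_cover} are hypothetical, not part of Bison.

```c
#include <assert.h>
#include <stddef.h>

/* Same four integer fields as the default location type; the
   type name `span' is hypothetical.  */
typedef struct
{
  int first_line, first_column;
  int last_line, last_column;
} span;

/* Start a new token where the previous text ended: this is the
   "Step" in the lexical analyzer.  */
static void
span_step (span *loc)
{
  assert (loc != NULL);
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
}

/* Move the end of LOC over every character of TEXT, so that no
   character is left out of the count.  */
static void
span_advance (span *loc, char const *text)
{
  assert (loc != NULL && text != NULL);
  for (; *text != '\0'; ++text)
    if (*text == '\n')
      {
        ++loc->last_line;
        loc->last_column = 0;
      }
    else
      ++loc->last_column;
}

/* Mimic the default value of the location of a grouping: from the
   beginning of the first component to the end of the last one.  */
static span
span_cover (span first, span last)
{
  span result = first;
  result.last_line = last.last_line;
  result.last_column = last.last_column;
  return result;
}
```

Keeping the update in one place makes it harder to forget a character class, such as comments or string literals, once the language grows beyond this calculator.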
+ +@node Multi-function Calc +@section Multi-Function Calculator: @code{mfcalc} +@cindex multi-function calculator +@cindex @code{mfcalc} +@cindex calculator, multi-function + +Now that the basics of Bison have been discussed, it is time to move on to +a more advanced problem. The above calculators provided only five +functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would +be nice to have a calculator that provides other mathematical functions such +as @code{sin}, @code{cos}, etc. + +It is easy to add new operators to the infix calculator as long as they are +only single-character literals. The lexical analyzer @code{yylex} passes +back all nonnumeric characters as tokens, so new grammar rules suffice for +adding a new operator. But we want something more flexible: built-in +functions whose syntax has this form: + +@example +@var{function_name} (@var{argument}) +@end example + +@noindent +At the same time, we will add memory to the calculator, by allowing you +to create named variables, store values in them, and use them later. +Here is a sample session with the multi-function calculator: + +@example +$ @kbd{mfcalc} +@kbd{pi = 3.141592653589} +3.1415926536 +@kbd{sin(pi)} +0.0000000000 +@kbd{alpha = beta1 = 2.3} +2.3000000000 +@kbd{alpha} +2.3000000000 +@kbd{ln(alpha)} +0.8329091229 +@kbd{exp(ln(beta1))} +2.3000000000 +$ +@end example + +Note that multiple assignment and nested function calls are permitted. + +@menu +* Mfcalc Declarations:: Bison declarations for multi-function calculator. +* Mfcalc Rules:: Grammar rules for the calculator. +* Mfcalc Symbol Table:: Symbol table management subroutines. +@end menu + +@node Mfcalc Declarations +@subsection Declarations for @code{mfcalc} + +Here are the C and Bison declarations for the multi-function calculator. + +@comment file: mfcalc.y: 1 +@example +@group +%@{ + #include <math.h> /* For math functions, cos(), sin(), etc. */ + #include "calc.h" /* Contains definition of `symrec'.
*/ + int yylex (void); + void yyerror (char const *); +%@} +@end group + +@group +%union @{ + double val; /* For returning numbers. */ + symrec *tptr; /* For returning symbol-table pointers. */ +@} +@end group +%token <val> NUM /* Simple double precision number. */ +%token <tptr> VAR FNCT /* Variable and function. */ +%type <val> exp + +@group +%right '=' +%left '-' '+' +%left '*' '/' +%left NEG /* negation--unary minus */ +%right '^' /* exponentiation */ +@end group +@end example + +The above grammar introduces only two new features of the Bison language. +These features allow semantic values to have various data types +(@pxref{Multiple Types, ,More Than One Value Type}). + +The @code{%union} declaration specifies the entire list of possible types; +this is instead of defining @code{YYSTYPE}. The allowable types are now +double-floats (for @code{exp} and @code{NUM}) and pointers to entries in +the symbol table. @xref{Union Decl, ,The Collection of Value Types}. + +Since values can now have various types, it is necessary to associate a +type with each grammar symbol whose semantic value is used. These symbols +are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their +declarations are augmented with information about their data type (placed +between angle brackets). + +The Bison construct @code{%type} is used for declaring nonterminal +symbols, just as @code{%token} is used for declaring token types. We +have not used @code{%type} before because nonterminal symbols are +normally declared implicitly by the rules that define them. But +@code{exp} must be declared explicitly so we can specify its value type. +@xref{Type Decl, ,Nonterminal Symbols}. + +@node Mfcalc Rules +@subsection Grammar Rules for @code{mfcalc} + +Here are the grammar rules for the multi-function calculator. +Most of them are copied directly from @code{calc}; three rules, +those which mention @code{VAR} or @code{FNCT}, are new. + +@comment file: mfcalc.y: 3 +@example +%% /* The grammar follows.
*/ +@group +input: + /* empty */ +| input line +; +@end group + +@group +line: + '\n' +| exp '\n' @{ printf ("%.10g\n", $1); @} +| error '\n' @{ yyerrok; @} +; +@end group + +@group +exp: + NUM @{ $$ = $1; @} +| VAR @{ $$ = $1->value.var; @} +| VAR '=' exp @{ $$ = $3; $1->value.var = $3; @} +| FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @} +| exp '+' exp @{ $$ = $1 + $3; @} +| exp '-' exp @{ $$ = $1 - $3; @} +| exp '*' exp @{ $$ = $1 * $3; @} +| exp '/' exp @{ $$ = $1 / $3; @} +| '-' exp %prec NEG @{ $$ = -$2; @} +| exp '^' exp @{ $$ = pow ($1, $3); @} +| '(' exp ')' @{ $$ = $2; @} +; +@end group +/* End of grammar. */ +%% +@end example + +@node Mfcalc Symbol Table +@subsection The @code{mfcalc} Symbol Table +@cindex symbol table example + +The multi-function calculator requires a symbol table to keep track of the +names and meanings of variables and functions. This doesn't affect the +grammar rules (except for the actions) or the Bison declarations, but it +requires some additional C functions for support. + +The symbol table itself consists of a linked list of records. Its +definition, which is kept in the header @file{calc.h}, is as follows. It +provides for either functions or variables to be placed in the table. + +@comment file: calc.h +@example +@group +/* Function type. */ +typedef double (*func_t) (double); +@end group + +@group +/* Data type for links in the chain of symbols. */ +struct symrec +@{ + char *name; /* name of symbol */ + int type; /* type of symbol: either VAR or FNCT */ + union + @{ + double var; /* value of a VAR */ + func_t fnctptr; /* value of a FNCT */ + @} value; + struct symrec *next; /* link field */ +@}; +@end group + +@group +typedef struct symrec symrec; + +/* The symbol table: a chain of `struct symrec'. 
*/ +extern symrec *sym_table; + +symrec *putsym (char const *, int); +symrec *getsym (char const *); +@end group +@end example + +The new version of @code{main} includes a call to @code{init_table}, a +function that initializes the symbol table. Here it is, and +@code{init_table} as well: + +@comment file: mfcalc.y: 3 +@example +#include <stdio.h> + +@group +/* Called by yyparse on error. */ +void +yyerror (char const *s) +@{ + fprintf (stderr, "%s\n", s); +@} +@end group + +@group +struct init +@{ + char const *fname; + double (*fnct) (double); +@}; +@end group + +@group +struct init const arith_fncts[] = +@{ + "sin", sin, + "cos", cos, + "atan", atan, + "ln", log, + "exp", exp, + "sqrt", sqrt, + 0, 0 +@}; +@end group + +@group +/* The symbol table: a chain of `struct symrec'. */ +symrec *sym_table; +@end group + +@group +/* Put arithmetic functions in table. */ +void +init_table (void) +@{ + int i; + for (i = 0; arith_fncts[i].fname != 0; i++) + @{ + symrec *ptr = putsym (arith_fncts[i].fname, FNCT); + ptr->value.fnctptr = arith_fncts[i].fnct; + @} +@} +@end group + +@group +int +main (void) +@{ + init_table (); + return yyparse (); +@} +@end group +@end example + +By simply editing the initialization list and adding the necessary include +files, you can add additional functions to the calculator. + +Two important functions allow look-up and installation of symbols in the +symbol table. The function @code{putsym} is passed a name and the type +(@code{VAR} or @code{FNCT}) of the object to be installed. The object is +linked to the front of the list, and a pointer to the object is returned. +The function @code{getsym} is passed the name of the symbol to look up. If +found, a pointer to that symbol is returned; otherwise zero is returned. + +@comment file: mfcalc.y: 3 +@example +#include <stdlib.h> /* malloc. */ +#include <string.h> /* strlen.
*/ + +@group +symrec * +putsym (char const *sym_name, int sym_type) +@{ + symrec *ptr = (symrec *) malloc (sizeof (symrec)); + ptr->name = (char *) malloc (strlen (sym_name) + 1); + strcpy (ptr->name,sym_name); + ptr->type = sym_type; + ptr->value.var = 0; /* Set value to 0 even if fctn. */ + ptr->next = (struct symrec *)sym_table; + sym_table = ptr; + return ptr; +@} +@end group + +@group +symrec * +getsym (char const *sym_name) +@{ + symrec *ptr; + for (ptr = sym_table; ptr != (symrec *) 0; + ptr = (symrec *)ptr->next) + if (strcmp (ptr->name,sym_name) == 0) + return ptr; + return 0; +@} +@end group +@end example + +The function @code{yylex} must now recognize variables, numeric values, and +the single-character arithmetic operators. Strings of alphanumeric +characters with a leading letter are recognized as either variables or +functions depending on what the symbol table says about them. + +The string is passed to @code{getsym} for look up in the symbol table. If +the name appears in the table, a pointer to its location and its type +(@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not +already in the table, then it is installed as a @code{VAR} using +@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is +returned to @code{yyparse}. + +No change is needed in the handling of numeric values and arithmetic +operators in @code{yylex}. + +@comment file: mfcalc.y: 3 +@example +@group +#include <ctype.h> +@end group + +@group +int +yylex (void) +@{ + int c; + + /* Ignore white space, get first nonwhite character. */ + while ((c = getchar ()) == ' ' || c == '\t') + continue; + + if (c == EOF) + return 0; +@end group + +@group + /* Char starts a number => parse the number. */ + if (c == '.' || isdigit (c)) + @{ + ungetc (c, stdin); + scanf ("%lf", &yylval.val); + return NUM; + @} +@end group + +@group + /* Char starts an identifier => read the name.
*/ + if (isalpha (c)) + @{ + /* Initially make the buffer long enough + for a 40-character symbol name. */ + static size_t length = 40; + static char *symbuf = 0; + symrec *s; + int i; +@end group + + if (!symbuf) + symbuf = (char *) malloc (length + 1); + + i = 0; + do +@group + @{ + /* If buffer is full, make it bigger. */ + if (i == length) + @{ + length *= 2; + symbuf = (char *) realloc (symbuf, length + 1); + @} + /* Add this character to the buffer. */ + symbuf[i++] = c; + /* Get another character. */ + c = getchar (); + @} +@end group +@group + while (isalnum (c)); + + ungetc (c, stdin); + symbuf[i] = '\0'; +@end group + +@group + s = getsym (symbuf); + if (s == 0) + s = putsym (symbuf, VAR); + yylval.tptr = s; + return s->type; + @} + + /* Any other character is a token by itself. */ + return c; +@} +@end group +@end example + +The error reporting function is unchanged, and the new version of +@code{main} includes a call to @code{init_table} and sets @code{yydebug} +on user demand (@pxref{Tracing, , Tracing Your Parser}, for details): + +@comment file: mfcalc.y: 3 +@example +@group +/* Called by yyparse on error. */ +void +yyerror (char const *s) +@{ + fprintf (stderr, "%s\n", s); +@} +@end group + +@group +int +main (int argc, char const* argv[]) +@{ + int i; + /* Enable parse traces on option -p. */ + for (i = 1; i < argc; ++i) + if (!strcmp(argv[i], "-p")) + yydebug = 1; + init_table (); + return yyparse (); +@} +@end group +@end example + +This program is both powerful and flexible. You may easily add new +functions, and it is a simple job to modify this code to install +predefined variables such as @code{pi} or @code{e} as well. + +@node Exercises +@section Exercises +@cindex exercises + +@enumerate +@item +Add some new functions from @file{math.h} to the initialization list. + +@item +Add another array that contains constants and their values. Then +modify @code{init_table} to add these constants to the symbol table.
+It will be easiest to give the constants type @code{VAR}. + +@item +Make the program report an error if the user refers to an +uninitialized variable in any way except to store a value in it. +@end enumerate + +@node Grammar File +@chapter Bison Grammar Files + +Bison takes as input a context-free grammar specification and produces a +C-language function that recognizes correct instances of the grammar. + +The Bison grammar file conventionally has a name ending in @samp{.y}. +@xref{Invocation, ,Invoking Bison}. + +@menu +* Grammar Outline:: Overall layout of the grammar file. +* Symbols:: Terminal and nonterminal symbols. +* Rules:: How to write grammar rules. +* Recursion:: Writing recursive rules. +* Semantics:: Semantic values and actions. +* Tracking Locations:: Locations and actions. +* Named References:: Using named references in actions. +* Declarations:: All kinds of Bison declarations are described here. +* Multiple Parsers:: Putting more than one Bison parser in one program. +@end menu + +@node Grammar Outline +@section Outline of a Bison Grammar + +A Bison grammar file has four main sections, shown here with the +appropriate delimiters: + +@example +%@{ + @var{Prologue} +%@} + +@var{Bison declarations} + +%% +@var{Grammar rules} +%% + +@var{Epilogue} +@end example + +Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections. +As a GNU extension, @samp{//} introduces a comment that +continues until end of line. + +@menu +* Prologue:: Syntax and usage of the prologue. +* Prologue Alternatives:: Syntax and usage of alternatives to the prologue. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* Epilogue:: Syntax and usage of the epilogue. 
+@end menu + +@node Prologue +@subsection The prologue +@cindex declarations section +@cindex Prologue +@cindex declarations + +The @var{Prologue} section contains macro definitions and declarations +of functions and variables that are used in the actions in the grammar +rules. These are copied to the beginning of the parser implementation +file so that they precede the definition of @code{yyparse}. You can +use @samp{#include} to get the declarations from a header file. If +you don't need any C declarations, you may omit the @samp{%@{} and +@samp{%@}} delimiters that bracket this section. + +The @var{Prologue} section is terminated by the first occurrence +of @samp{%@}} that is outside a comment, a string literal, or a +character constant. + +You may have more than one @var{Prologue} section, intermixed with the +@var{Bison declarations}. This allows you to have C and Bison +declarations that refer to each other. For example, the @code{%union} +declaration may use types defined in a header file, and you may wish to +prototype functions that take arguments of type @code{YYSTYPE}. This +can be done with two @var{Prologue} blocks, one before and one after the +@code{%union} declaration. + +@example +%@{ + #define _GNU_SOURCE + #include <stdio.h> + #include "ptypes.h" +%@} + +%union @{ + long int n; + tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ +@} + +%@{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(F, N, L) print_token_value (F, N, L) +%@} + +@dots{} +@end example + +When in doubt, it is usually safer to put prologue code before all +Bison declarations, rather than after. For example, any definitions +of feature test macros like @code{_GNU_SOURCE} or +@code{_POSIX_C_SOURCE} should appear before all Bison declarations, as +feature test macros can affect the behavior of Bison-generated +@code{#include} directives.
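The reason for this ordering is easiest to see in plain C, where a feature test macro only has an effect if it is defined before the first system header is included. The sketch below assumes a POSIX system; the helper name @code{copy_name} is hypothetical. In strict ISO C modes that predate C23, many C libraries declare @code{strdup} in @file{string.h} only when such a macro has been defined first.

```c
/* Must precede every #include, just like the definition in the
   first Prologue block above.  */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of NAME, or a null
   pointer if memory is exhausted.  The caller frees the copy.  */
static char *
copy_name (char const *name)
{
  assert (name != NULL);
  /* strdup is a POSIX function; its declaration is requested by
     the feature test macro defined at the top of the file.  */
  return strdup (name);
}
```

If the @code{#define} were moved below the @code{#include} lines, it would come too late to influence what those headers declare, which is the same hazard the paragraph above describes for directives that Bison generates.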
+ +@node Prologue Alternatives +@subsection Prologue Alternatives +@cindex Prologue Alternatives + +@findex %code +@findex %code requires +@findex %code provides +@findex %code top + +The functionality of @var{Prologue} sections can often be subtle and +inflexible. As an alternative, Bison provides a @code{%code} +directive with an explicit qualifier field, which identifies the +purpose of the code and thus the location(s) where Bison should +generate it. For C/C++, the qualifier can be omitted for the default +location, or it can be one of @code{requires}, @code{provides}, +@code{top}. @xref{%code Summary}. + +Look again at the example of the previous section: + +@example +%@{ + #define _GNU_SOURCE + #include <stdio.h> + #include "ptypes.h" +%@} + +%union @{ + long int n; + tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ +@} + +%@{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(F, N, L) print_token_value (F, N, L) +%@} + +@dots{} +@end example + +@noindent +Notice that there are two @var{Prologue} sections here, but there's a +subtle distinction between their functionality. For example, if you +decide to override Bison's default definition for @code{YYLTYPE}, in +which @var{Prologue} section should you write your new definition? +You should write it in the first since Bison will insert that code +into the parser implementation file @emph{before} the default +@code{YYLTYPE} definition. In which @var{Prologue} section should you +prototype an internal function, @code{trace_token}, that accepts +@code{YYLTYPE} and @code{yytokentype} as arguments? You should +prototype it in the second since Bison will insert that code +@emph{after} the @code{YYLTYPE} and @code{yytokentype} definitions. + +This distinction in functionality between the two @var{Prologue} sections is +established by the appearance of the @code{%union} between them. +This behavior raises a few questions.
+First, why should the position of a @code{%union} affect definitions related to +@code{YYLTYPE} and @code{yytokentype}? +Second, what if there is no @code{%union}? +In that case, the second kind of @var{Prologue} section is not available. +This behavior is not intuitive. + +To avoid this subtle @code{%union} dependency, rewrite the example using a +@code{%code top} and an unqualified @code{%code}. +Let's go ahead and add the new @code{YYLTYPE} definition and the +@code{trace_token} prototype at the same time: + +@example +%code top @{ + #define _GNU_SOURCE + #include <stdio.h> + + /* WARNING: The following code really belongs + * in a `%code requires'; see below. */ + + #include "ptypes.h" + #define YYLTYPE YYLTYPE + typedef struct YYLTYPE + @{ + int first_line; + int first_column; + int last_line; + int last_column; + char *filename; + @} YYLTYPE; +@} + +%union @{ + long int n; + tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ +@} + +%code @{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(F, N, L) print_token_value (F, N, L) + static void trace_token (enum yytokentype token, YYLTYPE loc); +@} + +@dots{} +@end example + +@noindent +In this way, @code{%code top} and the unqualified @code{%code} achieve the same +functionality as the two kinds of @var{Prologue} sections, but it's always +explicit which kind you intend. +Moreover, both kinds are always available even in the absence of @code{%union}. + +The @code{%code top} block above logically contains two parts. The +first two lines before the warning need to appear near the top of the +parser implementation file. The first line after the warning is +required by @code{YYSTYPE} and thus also needs to appear in the parser +implementation file. However, if you've instructed Bison to generate +a parser header file (@pxref{Decl Summary, ,%defines}), you probably +want that line to appear before the @code{YYSTYPE} definition in that +header file as well.
The @code{YYLTYPE} definition should also appear +in the parser header file to override the default @code{YYLTYPE} +definition there. + +In other words, in the @code{%code top} block above, all but the first two +lines are dependency code required by the @code{YYSTYPE} and @code{YYLTYPE} +definitions. +Thus, they belong in one or more @code{%code requires}: + +@example +@group +%code top @{ + #define _GNU_SOURCE + #include <stdio.h> +@} +@end group + +@group +%code requires @{ + #include "ptypes.h" +@} +@end group +@group +%union @{ + long int n; + tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ +@} +@end group + +@group +%code requires @{ + #define YYLTYPE YYLTYPE + typedef struct YYLTYPE + @{ + int first_line; + int first_column; + int last_line; + int last_column; + char *filename; + @} YYLTYPE; +@} +@end group + +@group +%code @{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(F, N, L) print_token_value (F, N, L) + static void trace_token (enum yytokentype token, YYLTYPE loc); +@} +@end group + +@dots{} +@end example + +@noindent +Now Bison will insert @code{#include "ptypes.h"} and the new +@code{YYLTYPE} definition before the Bison-generated @code{YYSTYPE} +and @code{YYLTYPE} definitions in both the parser implementation file +and the parser header file. (By the same reasoning, @code{%code +requires} would also be the appropriate place to write your own +definition for @code{YYSTYPE}.) + +When you are writing dependency code for @code{YYSTYPE} and +@code{YYLTYPE}, you should prefer @code{%code requires} over +@code{%code top} regardless of whether you instruct Bison to generate +a parser header file. When you are writing code that you need Bison +to insert only into the parser implementation file and that has no +special need to appear at the top of that file, you should prefer the +unqualified @code{%code} over @code{%code top}.
These practices will +make the purpose of each block of your code explicit to Bison and to +other developers reading your grammar file. Following these +practices, we expect the unqualified @code{%code} and @code{%code +requires} to be the most important of the four @var{Prologue} +alternatives. + +At some point while developing your parser, you might decide to +provide @code{trace_token} to modules that are external to your +parser. Thus, you might wish for Bison to insert the prototype into +both the parser header file and the parser implementation file. Since +this function is not a dependency required by @code{YYSTYPE} or +@code{YYLTYPE}, it doesn't make sense to move its prototype to a +@code{%code requires}. More importantly, since it depends upon +@code{YYLTYPE} and @code{yytokentype}, @code{%code requires} is not +sufficient. Instead, move its prototype from the unqualified +@code{%code} to a @code{%code provides}: + +@example +@group +%code top @{ + #define _GNU_SOURCE + #include <stdio.h> +@} +@end group + +@group +%code requires @{ + #include "ptypes.h" +@} +@end group +@group +%union @{ + long int n; + tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ +@} +@end group + +@group +%code requires @{ + #define YYLTYPE YYLTYPE + typedef struct YYLTYPE + @{ + int first_line; + int first_column; + int last_line; + int last_column; + char *filename; + @} YYLTYPE; +@} +@end group + +@group +%code provides @{ + void trace_token (enum yytokentype token, YYLTYPE loc); +@} +@end group + +@group +%code @{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(F, N, L) print_token_value (F, N, L) +@} +@end group + +@dots{} +@end example + +@noindent +Bison will insert the @code{trace_token} prototype into both the +parser header file and the parser implementation file after the +definitions for @code{yytokentype}, @code{YYLTYPE}, and +@code{YYSTYPE}.
+ +The above examples are careful to write directives in an order that +reflects the layout of the generated parser implementation and header +files: @code{%code top}, @code{%code requires}, @code{%code provides}, +and then @code{%code}. While your grammar files may generally be +easier to read if you also follow this order, Bison does not require +it. Instead, Bison lets you choose an organization that makes sense +to you. + +You may declare any of these directives multiple times in the grammar file. +In that case, Bison concatenates the contained code in declaration order. +This is the only way in which the position of one of these directives within +the grammar file affects its functionality. + +The result of the previous two properties is greater flexibility in how you may +organize your grammar file. +For example, you may organize semantic-type-related directives by semantic +type: + +@example +@group +%code requires @{ #include "type1.h" @} +%union @{ type1 field1; @} +%destructor @{ type1_free ($$); @} +%printer @{ type1_print (yyoutput, $$); @} +@end group + +@group +%code requires @{ #include "type2.h" @} +%union @{ type2 field2; @} +%destructor @{ type2_free ($$); @} +%printer @{ type2_print (yyoutput, $$); @} +@end group +@end example + +@noindent +You could even place each of the above directive groups in the rules section of +the grammar file next to the set of rules that uses the associated semantic +type. +(In the rules section, you must terminate each of those directives with a +semicolon.) +And you don't have to worry that some directive (like a @code{%union}) in the +definitions section is going to adversely affect their functionality in some +counter-intuitive manner just because it comes first. +Such an organization is not possible using @var{Prologue} sections. + +This section has been concerned with explaining the advantages of the four +@var{Prologue} alternatives over the original Yacc @var{Prologue}. 
+However, in most cases when using these directives, you shouldn't need to +think about all the low-level ordering issues discussed here. +Instead, you should simply use these directives to label each block of your +code according to its purpose and let Bison handle the ordering. +@code{%code} is the most generic label. +Move code to @code{%code requires}, @code{%code provides}, or @code{%code top} +as needed. + +@node Bison Declarations +@subsection The Bison Declarations Section +@cindex Bison declarations (introduction) +@cindex declarations, Bison (introduction) + +The @var{Bison declarations} section contains declarations that define +terminal and nonterminal symbols, specify precedence, and so on. +In some simple grammars you may not need any declarations. +@xref{Declarations, ,Bison Declarations}. + +@node Grammar Rules +@subsection The Grammar Rules Section +@cindex grammar rules section +@cindex rules section for grammar + +The @dfn{grammar rules} section contains one or more Bison grammar +rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}. + +There must always be at least one grammar rule, and the first +@samp{%%} (which precedes the grammar rules) may never be omitted even +if it is the first thing in the file. + +@node Epilogue +@subsection The epilogue +@cindex additional C code section +@cindex epilogue +@cindex C code, section for additional + +The @var{Epilogue} is copied verbatim to the end of the parser +implementation file, just as the @var{Prologue} is copied to the +beginning. This is the most convenient place to put anything that you +want to have in the parser implementation file but which need not come +before the definition of @code{yyparse}. For example, the definitions +of @code{yylex} and @code{yyerror} often go here. Because C requires +functions to be declared before being used, you often need to declare +functions like @code{yylex} and @code{yyerror} in the Prologue, even +if you define them in the Epilogue. 
@xref{Interface, ,Parser +C-Language Interface}. + +If the last section is empty, you may omit the @samp{%%} that separates it +from the grammar rules. + +The Bison parser itself contains many macros and identifiers whose names +start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using +any such names (except those documented in this manual) in the epilogue +of the grammar file. + +@node Symbols +@section Symbols, Terminal and Nonterminal +@cindex nonterminal symbol +@cindex terminal symbol +@cindex token type +@cindex symbol + +@dfn{Symbols} in Bison grammars represent the grammatical classifications +of the language. + +A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a +class of syntactically equivalent tokens. You use the symbol in grammar +rules to mean that a token in that class is allowed. The symbol is +represented in the Bison parser by a numeric code, and the @code{yylex} +function returns a token type code to indicate what kind of token has +been read. You don't need to know what the code value is; you can use +the symbol to stand for it. + +A @dfn{nonterminal symbol} stands for a class of syntactically +equivalent groupings. The symbol name is used in writing grammar rules. +By convention, it should be all lower case. + +Symbol names can contain letters, underscores, periods, and non-initial +digits and dashes. Dashes in symbol names are a GNU extension, incompatible +with POSIX Yacc. Periods and dashes make symbol names less convenient to +use with named references, which require brackets around such names +(@pxref{Named References}). Terminal symbols that contain periods or dashes +make little sense: since they are not valid symbols (in most programming +languages) they are not exported as token names. + +There are three ways of writing terminal symbols in the grammar: + +@itemize @bullet +@item +A @dfn{named token type} is written with an identifier, like an +identifier in C@. 
By convention, it should be all upper case. Each +such name must be defined with a Bison declaration such as +@code{%token}. @xref{Token Decl, ,Token Type Names}. + +@item +@cindex character token +@cindex literal token +@cindex single-character literal +A @dfn{character token type} (or @dfn{literal character token}) is +written in the grammar using the same syntax used in C for character +constants; for example, @code{'+'} is a character token type. A +character token type doesn't need to be declared unless you need to +specify its semantic value data type (@pxref{Value Type, ,Data Types of +Semantic Values}), associativity, or precedence (@pxref{Precedence, +,Operator Precedence}). + +By convention, a character token type is used only to represent a +token that consists of that particular character. Thus, the token +type @code{'+'} is used to represent the character @samp{+} as a +token. Nothing enforces this convention, but if you depart from it, +your program will confuse other readers. + +All the usual escape sequences used in character literals in C can be +used in Bison as well, but you must not use the null character as a +character literal because its numeric code, zero, signifies +end-of-input (@pxref{Calling Convention, ,Calling Convention +for @code{yylex}}). Also, unlike standard C, trigraphs have no +special meaning in Bison character literals, nor is backslash-newline +allowed. + +@item +@cindex string token +@cindex literal string token +@cindex multicharacter literal +A @dfn{literal string token} is written like a C string constant; for +example, @code{"<="} is a literal string token. A literal string token +doesn't need to be declared unless you need to specify its semantic +value data type (@pxref{Value Type}), associativity, or precedence +(@pxref{Precedence}). + +You can associate the literal string token with a symbolic name as an +alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token +Declarations}). 
If you don't do that, the lexical analyzer has to +retrieve the token number for the literal string token from the +@code{yytname} table (@pxref{Calling Convention}). + +@strong{Warning}: literal string tokens do not work in Yacc. + +By convention, a literal string token is used only to represent a token +that consists of that particular string. Thus, you should use the token +type @code{"<="} to represent the string @samp{<=} as a token. Bison +does not enforce this convention, but if you depart from it, people who +read your program will be confused. + +All the escape sequences used in string literals in C can be used in +Bison as well, except that you must not use a null character within a +string literal. Also, unlike Standard C, trigraphs have no special +meaning in Bison string literals, nor is backslash-newline allowed. A +literal string token must contain two or more characters; for a token +containing just one character, use a character token (see above). +@end itemize + +How you choose to write a terminal symbol has no effect on its +grammatical meaning. That depends only on where it appears in rules and +on when the parser function returns that symbol. + +The value returned by @code{yylex} is always one of the terminal +symbols, except that a zero or negative value signifies end-of-input. +Whichever way you write the token type in the grammar rules, you write +it the same way in the definition of @code{yylex}. The numeric code +for a character token type is simply the positive numeric code of the +character, so @code{yylex} can use the identical value to generate the +requisite code, though you may need to convert it to @code{unsigned +char} to avoid sign-extension on hosts where @code{char} is signed. +Each named token type becomes a C macro in the parser implementation +file, so @code{yylex} can use the name to stand for the code. (This +is why periods don't make sense in terminal symbols.) 
@xref{Calling +Convention, ,Calling Convention for @code{yylex}}. + +If @code{yylex} is defined in a separate file, you need to arrange for the +token-type macro definitions to be available there. Use the @samp{-d} +option when you run Bison, so that it will write these macro definitions +into a separate header file @file{@var{name}.tab.h} which you can include +in the other source files that need it. @xref{Invocation, ,Invoking Bison}. + +If you want to write a grammar that is portable to any Standard C +host, you must use only nonnull character tokens taken from the basic +execution character set of Standard C@. This set consists of the ten +digits, the 52 lower- and upper-case English letters, and the +characters in the following C-language string: + +@example +"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~" +@end example + +The @code{yylex} function and Bison must use a consistent character set +and encoding for character tokens. For example, if you run Bison in an +ASCII environment, but then compile and run the resulting +program in an environment that uses an incompatible character set like +EBCDIC, the resulting program may not work because the tables +generated by Bison will assume ASCII numeric values for +character tokens. It is standard practice for software distributions to +contain C source files that were generated by Bison in an +ASCII environment, so installers on platforms that are +incompatible with ASCII must rebuild those files before +compiling them. + +The symbol @code{error} is a terminal symbol reserved for error recovery +(@pxref{Error Recovery}); you shouldn't use it for any other purpose. +In particular, @code{yylex} should never return this value. The default +value of the error token is 256, unless you explicitly assigned 256 to +one of your tokens with a @code{%token} declaration. 
+ +@node Rules +@section Syntax of Grammar Rules +@cindex rule syntax +@cindex grammar rule syntax +@cindex syntax of grammar rules + +A Bison grammar rule has the following general form: + +@example +@group +@var{result}: @var{components}@dots{}; +@end group +@end example + +@noindent +where @var{result} is the nonterminal symbol that this rule describes, +and @var{components} are various terminal and nonterminal symbols that +are put together by this rule (@pxref{Symbols}). + +For example, + +@example +@group +exp: exp '+' exp; +@end group +@end example + +@noindent +says that two groupings of type @code{exp}, with a @samp{+} token in between, +can be combined into a larger grouping of type @code{exp}. + +White space in rules is significant only to separate symbols. You can add +extra white space as you wish. + +Scattered among the components can be @var{actions} that determine +the semantics of the rule. An action looks like this: + +@example +@{@var{C statements}@} +@end example + +@noindent +@cindex braced code +This is an example of @dfn{braced code}, that is, C code surrounded by +braces, much like a compound statement in C@. Braced code can contain +any sequence of C tokens, so long as its braces are balanced. Bison +does not check the braced code for correctness directly; it merely +copies the code to the parser implementation file, where the C +compiler can check it. + +Within braced code, the balanced-brace count is not affected by braces +within comments, string literals, or character constants, but it is +affected by the C digraphs @samp{<%} and @samp{%>} that represent +braces. At the top level braced code must be terminated by @samp{@}} +and not by a digraph. Bison does not look for trigraphs, so if braced +code uses trigraphs you should ensure that they do not affect the +nesting of braces or the boundaries of comments, string literals, or +character constants. + +Usually there is only one action and it follows the components. +@xref{Actions}. 
+ +@findex | +Multiple rules for the same @var{result} can be written separately or can +be joined with the vertical-bar character @samp{|} as follows: + +@example +@group +@var{result}: + @var{rule1-components}@dots{} +| @var{rule2-components}@dots{} +@dots{} +; +@end group +@end example + +@noindent +They are still considered distinct rules even when joined in this way. + +If @var{components} in a rule is empty, it means that @var{result} can +match the empty string. For example, here is how to define a +comma-separated sequence of zero or more @code{exp} groupings: + +@example +@group +expseq: + /* empty */ +| expseq1 +; +@end group + +@group +expseq1: + exp +| expseq1 ',' exp +; +@end group +@end example + +@noindent +It is customary to write a comment @samp{/* empty */} in each rule +with no components. + +@node Recursion +@section Recursive Rules +@cindex recursive rule + +A rule is called @dfn{recursive} when its @var{result} nonterminal +appears also on its right hand side. Nearly all Bison grammars need to +use recursion, because that is the only way to define a sequence of any +number of a particular thing. Consider this recursive definition of a +comma-separated sequence of one or more expressions: + +@example +@group +expseq1: + exp +| expseq1 ',' exp +; +@end group +@end example + +@cindex left recursion +@cindex right recursion +@noindent +Since the recursive use of @code{expseq1} is the leftmost symbol in the +right hand side, we call this @dfn{left recursion}. By contrast, here +the same construct is defined using @dfn{right recursion}: + +@example +@group +expseq1: + exp +| exp ',' expseq1 +; +@end group +@end example + +@noindent +Any kind of sequence can be defined using either left recursion or right +recursion, but you should always use left recursion, because it can +parse a sequence of any number of elements with bounded stack space. 
+Right recursion uses up space on the Bison stack in proportion to the +number of elements in the sequence, because all the elements must be +shifted onto the stack before the rule can be applied even once. +@xref{Algorithm, ,The Bison Parser Algorithm}, for further explanation +of this. + +@cindex mutual recursion +@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the +rule does not appear directly on its right hand side, but does appear +in rules for other nonterminals which do appear on its right hand +side. + +For example: + +@example +@group +expr: + primary +| primary '+' primary +; +@end group + +@group +primary: + constant +| '(' expr ')' +; +@end group +@end example + +@noindent +defines two mutually-recursive nonterminals, since each refers to the +other. + +@node Semantics +@section Defining Language Semantics +@cindex defining language semantics +@cindex language semantics, defining + +The grammar rules for a language determine only the syntax. The semantics +are determined by the semantic values associated with various tokens and +groupings, and by the actions taken when various groupings are recognized. + +For example, the calculator calculates properly because the value +associated with each expression is the proper number; it adds properly +because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add +the numbers associated with @var{x} and @var{y}. + +@menu +* Value Type:: Specifying one data type for all semantic values. +* Multiple Types:: Specifying several alternative data types. +* Actions:: An action is the semantic definition of a grammar rule. +* Action Types:: Specifying data types for actions to operate on. +* Mid-Rule Actions:: Most actions go at the end of a rule. + This says when, why and how to use the exceptional + action in the middle of a rule. 
+@end menu + +@node Value Type +@subsection Data Types of Semantic Values +@cindex semantic value type +@cindex value type, semantic +@cindex data types of semantic values +@cindex default data type + +In a simple program it may be sufficient to use the same data type for +the semantic values of all language constructs. This was true in the +RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish +Notation Calculator}). + +Bison normally uses the type @code{int} for semantic values if your +program uses the same data type for all language constructs. To +specify some other type, define @code{YYSTYPE} as a macro, like this: + +@example +#define YYSTYPE double +@end example + +@noindent +@code{YYSTYPE}'s replacement list should be a type name +that does not contain parentheses or square brackets. +This macro definition must go in the prologue of the grammar file +(@pxref{Grammar Outline, ,Outline of a Bison Grammar}). + +@node Multiple Types +@subsection More Than One Value Type + +In most programs, you will need different data types for different kinds +of tokens and groupings. For example, a numeric constant may need type +@code{int} or @code{long int}, while a string constant needs type +@code{char *}, and an identifier might need a pointer to an entry in the +symbol table. + +To use more than one data type for semantic values in one parser, Bison +requires you to do two things: + +@itemize @bullet +@item +Specify the entire collection of possible data types, either by using the +@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of +Value Types}), or by using a @code{typedef} or a @code{#define} to +define @code{YYSTYPE} to be a union type whose member names are +the type tags. + +@item +Choose one of those types for each symbol (terminal or nonterminal) for +which semantic values are used. 
This is done for tokens with the +@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) +and for groupings with the @code{%type} Bison declaration (@pxref{Type +Decl, ,Nonterminal Symbols}). +@end itemize + +@node Actions +@subsection Actions +@cindex action +@vindex $$ +@vindex $@var{n} +@vindex $@var{name} +@vindex $[@var{name}] + +An action accompanies a syntactic rule and contains C code to be executed +each time an instance of that rule is recognized. The task of most actions +is to compute a semantic value for the grouping built by the rule from the +semantic values associated with tokens or smaller groupings. + +An action consists of braced code containing C statements, and can be +placed at any position in the rule; +it is executed at that position. Most rules have just one action at the +end of the rule, following all the components. Actions in the middle of +a rule are tricky and used only for special purposes (@pxref{Mid-Rule +Actions, ,Actions in Mid-Rule}). + +The C code in an action can refer to the semantic values of the +components matched by the rule with the construct @code{$@var{n}}, +which stands for the value of the @var{n}th component. The semantic +value for the grouping being constructed is @code{$$}. In addition, +the semantic values of symbols can be accessed with the named +references construct @code{$@var{name}} or @code{$[@var{name}]}. +Bison translates both of these constructs into expressions of the +appropriate type when it copies the actions into the parser +implementation file. @code{$$} (or @code{$@var{name}}, when it stands +for the current grouping) is translated to a modifiable lvalue, so it +can be assigned to. 
+ +Here is a typical example: + +@example +@group +exp: +@dots{} +| exp '+' exp @{ $$ = $1 + $3; @} +@end group +@end example + +Or, in terms of named references: + +@example +@group +exp[result]: +@dots{} +| exp[left] '+' exp[right] @{ $result = $left + $right; @} +@end group +@end example + +@noindent +This rule constructs an @code{exp} from two smaller @code{exp} groupings +connected by a plus-sign token. In the action, @code{$1} and @code{$3} +(@code{$left} and @code{$right}) +refer to the semantic values of the two component @code{exp} groupings, +which are the first and third symbols on the right hand side of the rule. +The sum is stored into @code{$$} (@code{$result}) so that it becomes the +semantic value of +the addition-expression just recognized by the rule. If there were a +useful semantic value associated with the @samp{+} token, it could be +referred to as @code{$2}. + +@xref{Named References}, for more information about using the named +references construct. + +Note that the vertical-bar character @samp{|} is really a rule +separator, and actions are attached to a single rule. This is a +difference with tools like Flex, for which @samp{|} stands for either +``or'', or ``the same action as that of the next rule''. In the +following example, the action is triggered only when @samp{b} is found: + +@example +@group +a-or-b: 'a'|'b' @{ a_or_b_found = 1; @}; +@end group +@end example + +@cindex default action +If you don't specify an action for a rule, Bison supplies a default: +@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule +becomes the value of the whole rule. Of course, the default action is +valid only if the two data types match. There is no meaningful default +action for an empty rule; every empty rule must have an explicit action +unless the rule's value does not matter. 
+ +@code{$@var{n}} with @var{n} zero or negative is allowed for reference +to tokens and groupings on the stack @emph{before} those that match the +current rule. This is a very risky practice, and to use it reliably +you must be certain of the context in which the rule is applied. Here +is a case in which you can use this reliably: + +@example +@group +foo: + expr bar '+' expr @{ @dots{} @} +| expr bar '-' expr @{ @dots{} @} +; +@end group + +@group +bar: + /* empty */ @{ previous_expr = $0; @} +; +@end group +@end example + +As long as @code{bar} is used only in the fashion shown here, @code{$0} +always refers to the @code{expr} which precedes @code{bar} in the +definition of @code{foo}. + +@vindex yylval +It is also possible to access the semantic value of the lookahead token, if +any, from a semantic action. +This semantic value is stored in @code{yylval}. +@xref{Action Features, ,Special Features for Use in Actions}. + +@node Action Types +@subsection Data Types of Values in Actions +@cindex action data types +@cindex data types in actions + +If you have chosen a single data type for semantic values, the @code{$$} +and @code{$@var{n}} constructs always have that data type. + +If you have used @code{%union} to specify a variety of data types, then you +must declare a choice among these types for each terminal or nonterminal +symbol that can have a semantic value. Then each time you use @code{$$} or +@code{$@var{n}}, its data type is determined by which symbol it refers to +in the rule. In this example, + +@example +@group +exp: + @dots{} +| exp '+' exp @{ $$ = $1 + $3; @} +@end group +@end example + +@noindent +@code{$1} and @code{$3} refer to instances of @code{exp}, so they all +have the data type declared for the nonterminal symbol @code{exp}. If +@code{$2} were used, it would have the data type declared for the +terminal symbol @code{'+'}, whatever that might be. 
+ +Alternatively, you can specify the data type when you refer to the value, +by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the +reference. For example, if you have defined types as shown here: + +@example +@group +%union @{ + int itype; + double dtype; +@} +@end group +@end example + +@noindent +then you can write @code{$<itype>1} to refer to the first subunit of the +rule as an integer, or @code{$<dtype>1} to refer to it as a double. + +@node Mid-Rule Actions +@subsection Actions in Mid-Rule +@cindex actions in mid-rule +@cindex mid-rule actions + +Occasionally it is useful to put an action in the middle of a rule. +These actions are written just like usual end-of-rule actions, but they +are executed before the parser even recognizes the following components. + +A mid-rule action may refer to the components preceding it using +@code{$@var{n}}, but it may not refer to subsequent components because +it is run before they are parsed. + +The mid-rule action itself counts as one of the components of the rule. +This makes a difference when there is another action later in the same rule +(and usually there is another at the end): you have to count the actions +along with the symbols when working out which number @var{n} to use in +@code{$@var{n}}. + +The mid-rule action can also have a semantic value. The action can set +its value with an assignment to @code{$$}, and actions later in the rule +can refer to the value using @code{$@var{n}}. Since there is no symbol +to name the action, there is no way to declare a data type for the value +in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to +specify a data type each time you refer to this value. + +There is no way to set the value of the entire rule with a mid-rule +action, because assignments to @code{$$} do not have that effect. The +only way to set the value for the entire rule is with an ordinary action +at the end of the rule.
+ +Here is an example from a hypothetical compiler, handling a @code{let} +statement that looks like @samp{let (@var{variable}) @var{statement}} and +serves to create a variable named @var{variable} temporarily for the +duration of @var{statement}. To parse this construct, we must put +@var{variable} into the symbol table while @var{statement} is parsed, then +remove it afterward. Here is how it is done: + +@example +@group +stmt: + LET '(' var ')' + @{ $<context>$ = push_context (); declare_variable ($3); @} + stmt + @{ $$ = $6; pop_context ($<context>5); @} +@end group +@end example + +@noindent +As soon as @samp{let (@var{variable})} has been recognized, the first +action is run. It saves a copy of the current semantic context (the +list of accessible variables) as its semantic value, using alternative +@code{context} in the data-type union. Then it calls +@code{declare_variable} to add the new variable to that list. Once the +first action is finished, the embedded statement @code{stmt} can be +parsed. Note that the mid-rule action is component number 5, so the +@samp{stmt} is component number 6. + +After the embedded statement is parsed, its semantic value becomes the +value of the entire @code{let}-statement. Then the semantic value from the +earlier action is used to restore the prior list of variables. This +removes the temporary @code{let}-variable from the list so that it won't +appear to exist while the rest of the program is parsed. + +@findex %destructor +@cindex discarded symbols, mid-rule actions +@cindex error recovery, mid-rule actions +In the above example, if the parser initiates error recovery (@pxref{Error +Recovery}) while parsing the tokens in the embedded statement @code{stmt}, +it might discard the previous semantic context @code{$<context>5} without +restoring it. +Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing +Discarded Symbols}).
+However, Bison currently provides no means to declare a destructor specific to +a particular mid-rule action's semantic value. + +One solution is to bury the mid-rule action inside a nonterminal symbol and to +declare a destructor for that symbol: + +@example +@group +%type <context> let +%destructor @{ pop_context ($$); @} let + +%% + +stmt: + let stmt + @{ + $$ = $2; + pop_context ($1); + @}; + +let: + LET '(' var ')' + @{ + $$ = push_context (); + declare_variable ($3); + @}; + +@end group +@end example + +@noindent +Note that the action is now at the end of its rule. +Any mid-rule action can be converted to an end-of-rule action in this way, and +this is what Bison actually does to implement mid-rule actions. + +Taking action before a rule is completely recognized often leads to +conflicts since the parser must commit to a parse in order to execute the +action. For example, the following two rules, without mid-rule actions, +can coexist in a working parser because the parser can shift the open-brace +token and look at what follows before deciding whether there is a +declaration or not: + +@example +@group +compound: + '@{' declarations statements '@}' +| '@{' statements '@}' +; +@end group +@end example + +@noindent +But when we add a mid-rule action as follows, the rules become nonfunctional: + +@example +@group +compound: + @{ prepare_for_local_variables (); @} + '@{' declarations statements '@}' +@end group +@group +| '@{' statements '@}' +; +@end group +@end example + +@noindent +Now the parser is forced to decide whether to run the mid-rule action +when it has read no farther than the open-brace. In other words, it +must commit to using one rule or the other, without sufficient +information to do it correctly. (The open-brace token is what is called +the @dfn{lookahead} token at this time, since the parser is still +deciding what to do about it. @xref{Lookahead, ,Lookahead Tokens}.)
+ +You might think that you could correct the problem by putting identical +actions into the two rules, like this: + +@example +@group +compound: + @{ prepare_for_local_variables (); @} + '@{' declarations statements '@}' +| @{ prepare_for_local_variables (); @} + '@{' statements '@}' +; +@end group +@end example + +@noindent +But this does not help, because Bison does not realize that the two actions +are identical. (Bison never tries to understand the C code in an action.) + +If the grammar is such that a declaration can be distinguished from a +statement by the first token (which is true in C), then one solution which +does work is to put the action after the open-brace, like this: + +@example +@group +compound: + '@{' @{ prepare_for_local_variables (); @} + declarations statements '@}' +| '@{' statements '@}' +; +@end group +@end example + +@noindent +Now the first token of the following declaration or statement, +which would in any case tell Bison which rule to use, can still do so. + +Another solution is to bury the action inside a nonterminal symbol which +serves as a subroutine: + +@example +@group +subroutine: + /* empty */ @{ prepare_for_local_variables (); @} +; +@end group + +@group +compound: + subroutine '@{' declarations statements '@}' +| subroutine '@{' statements '@}' +; +@end group +@end example + +@noindent +Now Bison can execute the action in the rule for @code{subroutine} without +deciding which rule for @code{compound} it will eventually use. + +@node Tracking Locations +@section Tracking Locations +@cindex location +@cindex textual location +@cindex location, textual + +Though grammar rules and semantic actions are enough to write a fully +functional parser, it can be useful to process some additional information, +especially symbol locations. + +The way locations are handled is defined by providing a data type, and +actions to take when rules are matched. + +@menu +* Location Type:: Specifying a data type for locations. 
+* Actions and Locations:: Using locations in actions. +* Location Default Action:: Defining a general way to compute locations. +@end menu + +@node Location Type +@subsection Data Type of Locations +@cindex data type of locations +@cindex default location type + +Defining a data type for locations is much simpler than for semantic values, +since all tokens and groupings always use the same type. + +You can specify the type of locations by defining a macro called +@code{YYLTYPE}, just as you can specify the semantic value type by +defining a @code{YYSTYPE} macro (@pxref{Value Type}). +When @code{YYLTYPE} is not defined, Bison uses a default structure type with +four members: + +@example +typedef struct YYLTYPE +@{ + int first_line; + int first_column; + int last_line; + int last_column; +@} YYLTYPE; +@end example + +When @code{YYLTYPE} is not defined, at the beginning of parsing, Bison +initializes all these fields to 1 for @code{yylloc}. To initialize +@code{yylloc} with a custom location type (or to choose a different +initialization), use the @code{%initial-action} directive. @xref{Initial +Action Decl, , Performing Actions before Parsing}. + +@node Actions and Locations +@subsection Actions and Locations +@cindex location actions +@cindex actions, location +@vindex @@$ +@vindex @@@var{n} +@vindex @@@var{name} +@vindex @@[@var{name}] + +Actions are not only useful for defining language semantics, but also for +describing the behavior of the output parser with locations. + +The most obvious way to build locations of syntactic groupings is very +similar to the way semantic values are computed. In a given rule, several +constructs can be used to access the locations of the elements being matched. +The location of the @var{n}th component of the right hand side is +@code{@@@var{n}}, while the location of the left hand side grouping is +@code{@@$}.
+ +In addition, the named references construct @code{@@@var{name}} and +@code{@@[@var{name}]} may also be used to address the symbol locations. +@xref{Named References}, for more information about using the named +references construct. + +Here is a basic example using the default data type for locations: + +@example +@group +exp: + @dots{} +| exp '/' exp + @{ + @@$.first_column = @@1.first_column; + @@$.first_line = @@1.first_line; + @@$.last_column = @@3.last_column; + @@$.last_line = @@3.last_line; + if ($3) + $$ = $1 / $3; + else + @{ + $$ = 1; + fprintf (stderr, + "Division by zero, l%d,c%d-l%d,c%d", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); + @} + @} +@end group +@end example + +As for semantic values, there is a default action for locations that is +run each time a rule is matched. It sets the beginning of @code{@@$} to the +beginning of the first symbol, and the end of @code{@@$} to the end of the +last symbol. + +With this default action, the location tracking can be fully automatic. The +example above simply rewrites this way: + +@example +@group +exp: + @dots{} +| exp '/' exp + @{ + if ($3) + $$ = $1 / $3; + else + @{ + $$ = 1; + fprintf (stderr, + "Division by zero, l%d,c%d-l%d,c%d", + @@3.first_line, @@3.first_column, + @@3.last_line, @@3.last_column); + @} + @} +@end group +@end example + +@vindex yylloc +It is also possible to access the location of the lookahead token, if any, +from a semantic action. +This location is stored in @code{yylloc}. +@xref{Action Features, ,Special Features for Use in Actions}. + +@node Location Default Action +@subsection Default Action for Locations +@vindex YYLLOC_DEFAULT +@cindex GLR parsers and @code{YYLLOC_DEFAULT} + +Actually, actions are not the best place to compute locations. Since +locations are much more general than semantic values, there is room in +the output parser to redefine the default action to take for each +rule. 
The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is +matched, before the associated action is run. It is also invoked +while processing a syntax error, to compute the error's location. +Before reporting an unresolvable syntactic ambiguity, a GLR +parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location +of that ambiguity. + +Most of the time, this macro is general enough to suppress location +dedicated code from semantic actions. + +The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is +the location of the grouping (the result of the computation). When a +rule is matched, the second parameter identifies locations of +all right hand side elements of the rule being matched, and the third +parameter is the size of the rule's right hand side. +When a GLR parser reports an ambiguity, which of multiple candidate +right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined. +When processing a syntax error, the second parameter identifies locations +of the symbols that were discarded during error processing, and the third +parameter is the number of discarded symbols. + +By default, @code{YYLLOC_DEFAULT} is defined this way: + +@example +@group +# define YYLLOC_DEFAULT(Cur, Rhs, N) \ +do \ + if (N) \ + @{ \ + (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \ + (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \ + (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \ + (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \ + @} \ + else \ + @{ \ + (Cur).first_line = (Cur).last_line = \ + YYRHSLOC(Rhs, 0).last_line; \ + (Cur).first_column = (Cur).last_column = \ + YYRHSLOC(Rhs, 0).last_column; \ + @} \ +while (0) +@end group +@end example + +@noindent +where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol +in @var{rhs} when @var{k} is positive, and the location of the symbol +just before the reduction when @var{k} and @var{n} are both zero. 
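The behavior of the default definition can be checked outside a parser with a small C sketch. It repeats the default @code{YYLTYPE} and @code{YYLLOC_DEFAULT} shown in this section; the definition of @code{YYRHSLOC} as plain array indexing and the helper @code{reduce_location} are assumptions made for this illustration, with index 0 standing for the symbol just before the reduction.

```c
#include <assert.h>

/* Default location type, as shown earlier in this section.  */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Assumption for this sketch: Rhs points at the location just before
   the right hand side, so index K designates its Kth symbol.  */
#define YYRHSLOC(Rhs, K) ((Rhs)[K])

#define YYLLOC_DEFAULT(Cur, Rhs, N)                             \
do                                                              \
  if (N)                                                        \
    {                                                           \
      (Cur).first_line   = YYRHSLOC (Rhs, 1).first_line;        \
      (Cur).first_column = YYRHSLOC (Rhs, 1).first_column;      \
      (Cur).last_line    = YYRHSLOC (Rhs, N).last_line;         \
      (Cur).last_column  = YYRHSLOC (Rhs, N).last_column;       \
    }                                                           \
  else                                                          \
    {                                                           \
      (Cur).first_line   = (Cur).last_line   =                  \
        YYRHSLOC (Rhs, 0).last_line;                            \
      (Cur).first_column = (Cur).last_column =                  \
        YYRHSLOC (Rhs, 0).last_column;                          \
    }                                                           \
while (0)

/* Hypothetical helper: compute the location of a grouping whose
   right hand side has N symbols located at rhs[1] .. rhs[N].  */
YYLTYPE
reduce_location (const YYLTYPE *rhs, int n)
{
  YYLTYPE cur;
  assert (n >= 0);
  YYLLOC_DEFAULT (cur, rhs, n);
  return cur;
}
```

With three right hand side symbols, the result spans from the start of @code{rhs[1]} to the end of @code{rhs[3]}; with an empty right hand side, it collapses to the end of @code{rhs[0]}.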
+ +When defining @code{YYLLOC_DEFAULT}, you should consider that: + +@itemize @bullet +@item +All arguments are free of side-effects. However, only the first one (the +result) should be modified by @code{YYLLOC_DEFAULT}. + +@item +For consistency with semantic actions, valid indexes within the +right hand side range from 1 to @var{n}. When @var{n} is zero, only 0 is a +valid index, and it refers to the symbol just before the reduction. +During error processing @var{n} is always positive. + +@item +Your macro should parenthesize its arguments, if need be, since the +actual arguments may not be surrounded by parentheses. Also, your +macro should expand to something that can be used as a single +statement when it is followed by a semicolon. +@end itemize + +@node Named References +@section Named References +@cindex named references + +As described in the preceding sections, the traditional way to refer to any +semantic value or location is a @dfn{positional reference}, which takes the +form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}. However, +such a reference is not very descriptive. Moreover, if you later decide to +insert or remove symbols in the right-hand side of a grammar rule, the need +to renumber such references can be tedious and error-prone. + +To avoid these issues, you can also refer to a semantic value or location +using a @dfn{named reference}. First of all, original symbol names may be +used as named references. For example: + +@example +@group +invocation: op '(' args ')' + @{ $invocation = new_invocation ($op, $args, @@invocation); @} +@end group +@end example + +@noindent +Positional and named references can be mixed arbitrarily. 
For example: + +@example +@group +invocation: op '(' args ')' + @{ $$ = new_invocation ($op, $args, @@$); @} +@end group +@end example + +@noindent +However, sometimes regular symbol names are not sufficient due to +ambiguities: + +@example +@group +exp: exp '/' exp + @{ $exp = $exp / $exp; @} // $exp is ambiguous. + +exp: exp '/' exp + @{ $$ = $1 / $exp; @} // One usage is ambiguous. + +exp: exp '/' exp + @{ $$ = $1 / $3; @} // No error. +@end group +@end example + +@noindent +When ambiguity occurs, explicitly declared names may be used for values and +locations. Explicit names are declared as a bracketed name after a symbol +appearance in rule definitions. For example: +@example +@group +exp[result]: exp[left] '/' exp[right] + @{ $result = $left / $right; @} +@end group +@end example + +@noindent +In order to access a semantic value generated by a mid-rule action, an +explicit name may also be declared by putting a bracketed name after the +closing brace of the mid-rule action code: +@example +@group +exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right] + @{ $res = $left + $right; @} +@end group +@end example + +@noindent + +In references, in order to specify names containing dots and dashes, an explicit +bracketed syntax @code{$[name]} and @code{@@[name]} must be used: +@example +@group +if-stmt: "if" '(' expr ')' "then" then.stmt ';' + @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @} +@end group +@end example + +It often happens that named references are followed by a dot, dash or other +C punctuation marks and operators. By default, Bison will read +@samp{$name.suffix} as a reference to symbol value @code{$name} followed by +@samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic +value. In order to force Bison to recognize @samp{name.suffix} in its +entirety as the name of a semantic value, the bracketed syntax +@samp{$[name.suffix]} must be used. + +The named references feature is experimental. 
More user feedback will help +to stabilize it. + +@node Declarations +@section Bison Declarations +@cindex declarations, Bison +@cindex Bison declarations + +The @dfn{Bison declarations} section of a Bison grammar defines the symbols +used in formulating the grammar and the data types of semantic values. +@xref{Symbols}. + +All token type names (but not single-character literal tokens such as +@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be +declared if you need to specify which data type to use for the semantic +value (@pxref{Multiple Types, ,More Than One Value Type}). + +The first rule in the grammar file also specifies the start symbol, by +default. If you want some other symbol to be the start symbol, you +must declare it explicitly (@pxref{Language and Grammar, ,Languages +and Context-Free Grammars}). + +@menu +* Require Decl:: Requiring a Bison version. +* Token Decl:: Declaring terminal symbols. +* Precedence Decl:: Declaring terminals with precedence and associativity. +* Union Decl:: Declaring the set of all semantic value types. +* Type Decl:: Declaring the choice of type for a nonterminal symbol. +* Initial Action Decl:: Code run before parsing starts. +* Destructor Decl:: Declaring how symbols are freed. +* Printer Decl:: Declaring how symbol values are displayed. +* Expect Decl:: Suppressing warnings about parsing conflicts. +* Start Decl:: Specifying the start symbol. +* Pure Decl:: Requesting a reentrant parser. +* Push Decl:: Requesting a push parser. +* Decl Summary:: Table of all Bison declarations. +* %define Summary:: Defining variables to adjust Bison's behavior. +* %code Summary:: Inserting code into the parser source. +@end menu + +@node Require Decl +@subsection Require a Version of Bison +@cindex version requirement +@cindex requiring a version of Bison +@findex %require + +You may require the minimum version of Bison to process the grammar. 
If +the requirement is not met, @command{bison} exits with an error (exit +status 63). + +@example +%require "@var{version}" +@end example + +@node Token Decl +@subsection Token Type Names +@cindex declaring token type names +@cindex token type names, declaring +@cindex declaring literal string tokens +@findex %token + +The basic way to declare a token type name (terminal symbol) is as follows: + +@example +%token @var{name} +@end example + +Bison will convert this into a @code{#define} directive in +the parser, so that the function @code{yylex} (if it is in this file) +can use the name @var{name} to stand for this token type's code. + +Alternatively, you can use @code{%left}, @code{%right}, or +@code{%nonassoc} instead of @code{%token}, if you wish to specify +associativity and precedence. @xref{Precedence Decl, ,Operator +Precedence}. + +You can explicitly specify the numeric code for a token type by appending +a nonnegative decimal or hexadecimal integer value in the field immediately +following the token name: + +@example +%token NUM 300 +%token XNUM 0x12d // a GNU extension +@end example + +@noindent +It is generally best, however, to let Bison choose the numeric codes for +all token types. Bison will automatically select codes that don't conflict +with each other or with normal characters. + +In the event that the stack type is a union, you must augment the +@code{%token} or other token declaration to include the data type +alternative delimited by angle-brackets (@pxref{Multiple Types, ,More +Than One Value Type}). + +For example: + +@example +@group +%union @{ /* define stack type */ + double val; + symrec *tptr; +@} +%token <val> NUM /* define token NUM and its type */ +@end group +@end example + +You can associate a literal string token with a token type name by +writing the literal string at the end of a @code{%token} +declaration which declares the name.
For example: + +@example +%token arrow "=>" +@end example + +@noindent +For example, a grammar for the C language might specify these names with +equivalent literal string tokens: + +@example +%token OR "||" +%token LE 134 "<=" +%left OR "<=" +@end example + +@noindent +Once you equate the literal string and the token name, you can use them +interchangeably in further declarations or the grammar rules. The +@code{yylex} function can use the token name or the literal string to +obtain the token type code number (@pxref{Calling Convention}). +Syntax error messages passed to @code{yyerror} from the parser will reference +the literal string instead of the token name. + +The token numbered as 0 corresponds to end of file; the following line +allows for nicer error messages referring to ``end of file'' instead +of ``$end'': + +@example +%token END 0 "end of file" +@end example + +@node Precedence Decl +@subsection Operator Precedence +@cindex precedence declarations +@cindex declaring operator precedence +@cindex operator precedence, declaring + +Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to +declare a token and specify its precedence and associativity, all at +once. These are called @dfn{precedence declarations}. +@xref{Precedence, ,Operator Precedence}, for general information on +operator precedence. + +The syntax of a precedence declaration is nearly the same as that of +@code{%token}: either + +@example +%left @var{symbols}@dots{} +@end example + +@noindent +or + +@example +%left <@var{type}> @var{symbols}@dots{} +@end example + +And indeed any of these declarations serves the purposes of @code{%token}. 
+But in addition, they specify the associativity and relative precedence for +all the @var{symbols}: + +@itemize @bullet +@item +The associativity of an operator @var{op} determines how repeated uses +of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op} +@var{z}} is parsed by grouping @var{x} with @var{y} first or by +grouping @var{y} with @var{z} first. @code{%left} specifies +left-associativity (grouping @var{x} with @var{y} first) and +@code{%right} specifies right-associativity (grouping @var{y} with +@var{z} first). @code{%nonassoc} specifies no associativity, which +means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is +considered a syntax error. + +@item +The precedence of an operator determines how it nests with other operators. +All the tokens declared in a single precedence declaration have equal +precedence and nest together according to their associativity. +When two tokens declared in different precedence declarations associate, +the one declared later has the higher precedence and is grouped first. +@end itemize + +For backward compatibility, there is a confusing difference between the +argument lists of @code{%token} and precedence declarations. +Only a @code{%token} can associate a literal string with a token type name. +A precedence declaration always interprets a literal string as a reference to a +separate token. +For example: + +@example +%left OR "<=" // Does not declare an alias. +%left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=". +@end example + +@node Union Decl +@subsection The Collection of Value Types +@cindex declaring value types +@cindex value types, declaring +@findex %union + +The @code{%union} declaration specifies the entire collection of +possible data types for semantic values. The keyword @code{%union} is +followed by braced code containing the same thing that goes inside a +@code{union} in C@. 
+ +For example: + +@example +@group +%union @{ + double val; + symrec *tptr; +@} +@end group +@end example + +@noindent +This says that the two alternative types are @code{double} and @code{symrec +*}. They are given names @code{val} and @code{tptr}; these names are used +in the @code{%token} and @code{%type} declarations to pick one of the types +for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}). + +As an extension to POSIX, a tag is allowed after the +@code{union}. For example: + +@example +@group +%union value @{ + double val; + symrec *tptr; +@} +@end group +@end example + +@noindent +specifies the union tag @code{value}, so the corresponding C type is +@code{union value}. If you do not specify a tag, it defaults to +@code{YYSTYPE}. + +As another extension to POSIX, you may specify multiple +@code{%union} declarations; their contents are concatenated. However, +only the first @code{%union} declaration can specify a tag. + +Note that, unlike making a @code{union} declaration in C, you need not write +a semicolon after the closing brace. + +Instead of @code{%union}, you can define and use your own union type +@code{YYSTYPE} if your grammar contains at least one +@samp{<@var{type}>} tag. For example, you can put the following into +a header file @file{parser.h}: + +@example +@group +union YYSTYPE @{ + double val; + symrec *tptr; +@}; +typedef union YYSTYPE YYSTYPE; +@end group +@end example + +@noindent +and then your grammar can use the following +instead of @code{%union}: + +@example +@group +%@{ +#include "parser.h" +%@} +%type <val> expr +%token <tptr> ID +@end group +@end example + +@node Type Decl +@subsection Nonterminal Symbols +@cindex declaring value types, nonterminals +@cindex value types, nonterminals, declaring +@findex %type + +@noindent +When you use @code{%union} to specify multiple value types, you must +declare the value type of each nonterminal symbol for which values are +used.
This is done with a @code{%type} declaration, like this: + +@example +%type <@var{type}> @var{nonterminal}@dots{} +@end example + +@noindent +Here @var{nonterminal} is the name of a nonterminal symbol, and +@var{type} is the name given in the @code{%union} to the alternative +that you want (@pxref{Union Decl, ,The Collection of Value Types}). You +can give any number of nonterminal symbols in the same @code{%type} +declaration, if they have the same value type. Use spaces to separate +the symbol names. + +You can also declare the value type of a terminal symbol. To do this, +use the same @code{<@var{type}>} construction in a declaration for the +terminal symbol. All kinds of token declarations allow +@code{<@var{type}>}. + +@node Initial Action Decl +@subsection Performing Actions before Parsing +@findex %initial-action + +Sometimes your parser needs to perform some initializations before +parsing. The @code{%initial-action} directive allows for such arbitrary +code. + +@deffn {Directive} %initial-action @{ @var{code} @} +@findex %initial-action +Declare that the braced @var{code} must be invoked before parsing each time +@code{yyparse} is called. The @var{code} may use @code{$$} and +@code{@@$} --- initial value and location of the lookahead --- and the +@code{%parse-param}. +@end deffn + +For instance, if your locations use a file name, you may use + +@example +%parse-param @{ char const *file_name @}; +%initial-action +@{ + @@$.initialize (file_name); +@}; +@end example + + +@node Destructor Decl +@subsection Freeing Discarded Symbols +@cindex freeing discarded symbols +@findex %destructor +@findex <*> +@findex <> +During error recovery (@pxref{Error Recovery}), symbols already pushed +on the stack and tokens coming from the rest of the file are discarded +until the parser falls on its feet. If the parser runs out of memory, +or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the +symbols on the stack must be discarded. 
Even if the parser succeeds, it +must discard the start symbol. + +When discarded symbols convey heap based information, this memory is +lost. While this behavior can be tolerable for batch parsers, such as +in traditional compilers, it is unacceptable for programs like shells or +protocol implementations that may parse and execute indefinitely. + +The @code{%destructor} directive defines code that is called when a +symbol is automatically discarded. + +@deffn {Directive} %destructor @{ @var{code} @} @var{symbols} +@findex %destructor +Invoke the braced @var{code} whenever the parser discards one of the +@var{symbols}. +Within @var{code}, @code{$$} designates the semantic value associated +with the discarded symbol, and @code{@@$} designates its location. +The additional parser parameters are also available (@pxref{Parser Function, , +The Parser Function @code{yyparse}}). + +When a symbol is listed among @var{symbols}, its @code{%destructor} is called a +per-symbol @code{%destructor}. +You may also define a per-type @code{%destructor} by listing a semantic type +tag among @var{symbols}. +In that case, the parser will invoke this @var{code} whenever it discards any +grammar symbol that has that semantic type tag unless that symbol has its own +per-symbol @code{%destructor}. + +Finally, you can define two different kinds of default @code{%destructor}s. +(These default forms are experimental. +More user feedback will help to determine whether they should become permanent +features.) +You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of +exactly one @code{%destructor} declaration in your grammar file. +The parser will invoke the @var{code} associated with one of these whenever it +discards any user-defined grammar symbol that has no per-symbol and no per-type +@code{%destructor}. 
+The parser uses the @var{code} for @code{<*>} in the case of such a grammar +symbol for which you have formally declared a semantic type tag (@code{%type} +counts as such a declaration, but @code{$<tag>$} does not). +The parser uses the @var{code} for @code{<>} in the case of such a grammar +symbol that has no declared semantic type tag. +@end deffn + +@noindent +For example: + +@example +%union @{ char *string; @} +%token <string> STRING1 +%token <string> STRING2 +%type <string> string1 +%type <string> string2 +%union @{ char character; @} +%token <character> CHR +%type <character> chr +%token TAGLESS + +%destructor @{ @} <character> +%destructor @{ free ($$); @} <*> +%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1 +%destructor @{ printf ("Discarding tagless symbol.\n"); @} <> +@end example + +@noindent +guarantees that, when the parser discards any user-defined symbol that has a +semantic type tag other than @code{<character>}, it passes its semantic value +to @code{free} by default. +However, when the parser discards a @code{STRING1} or a @code{string1}, it also +prints its line number to @code{stdout}. +It performs only the second @code{%destructor} in this case, so it invokes +@code{free} only once. +Finally, the parser merely prints a message whenever it discards any symbol, +such as @code{TAGLESS}, that has no semantic type tag. + +A Bison-generated parser invokes the default @code{%destructor}s only for +user-defined as opposed to Bison-defined symbols. +For example, the parser will not invoke either kind of default +@code{%destructor} for the special Bison-defined symbols @code{$accept}, +@code{$undefined}, or @code{$end} (@pxref{Table of Symbols, ,Bison Symbols}), +none of which you can reference in your grammar. +It also will not invoke either for the @code{error} token (@pxref{Table of +Symbols, ,error}), which is always defined by Bison regardless of whether you +reference it in your grammar.
+However, it may invoke one of them for the end token (token 0) if you +redefine it from @code{$end} to, for example, @code{END}: + +@example +%token END 0 +@end example + +@cindex actions in mid-rule +@cindex mid-rule actions +Finally, Bison will never invoke a @code{%destructor} for an unreferenced +mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}). +That is, Bison does not consider a mid-rule to have a semantic value if you +do not reference @code{$$} in the mid-rule's action or @code{$@var{n}} +(where @var{n} is the right-hand side symbol position of the mid-rule) in +any later action in that rule. However, if you do reference either, the +Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever +it discards the mid-rule symbol. + +@ignore +@noindent +In the future, it may be possible to redefine the @code{error} token as a +nonterminal that captures the discarded symbols. +In that case, the parser will invoke the default destructor for it as well. +@end ignore + +@sp 1 + +@cindex discarded symbols +@dfn{Discarded symbols} are the following: + +@itemize +@item +stacked symbols popped during the first phase of error recovery, +@item +incoming terminals during the second phase of error recovery, +@item +the current lookahead and the entire stack (except the current +right-hand side symbols) when the parser returns immediately, and +@item +the start symbol, when the parser succeeds. +@end itemize + +The parser can @dfn{return immediately} because of an explicit call to +@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory +exhaustion. + +Right-hand side symbols of a rule that explicitly triggers a syntax +error via @code{YYERROR} are not discarded automatically. As a rule +of thumb, destructors are invoked only when user actions cannot manage +the memory. 
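The selection rules described in this section can be summarized by a small C model. This is only a sketch of the documented precedence (per-symbol, then per-type, then the @code{<*>} or @code{<>} default for user-defined symbols); it is not Bison's implementation, and every name in it (@code{dtor_kind}, @code{select_destructor}, @code{classify_example}, and so on) is hypothetical. The table mirrors the declarations of the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; none of these names belong to Bison. */
typedef enum {
  DTOR_NONE,            /* no %destructor applies */
  DTOR_PER_SYMBOL,      /* the symbol is listed by name */
  DTOR_PER_TYPE,        /* the symbol's type tag is listed */
  DTOR_TYPED_DEFAULT,   /* the <*> default */
  DTOR_UNTYPED_DEFAULT  /* the <> default */
} dtor_kind;

typedef struct {
  const char *name;      /* symbol name, e.g. "STRING1" */
  const char *type_tag;  /* declared semantic type tag, or NULL if tagless */
  int user_defined;      /* 0 for Bison-defined symbols such as error */
} symbol;

typedef struct {
  const char *const *per_symbol;  /* NULL-terminated list of symbol names */
  const char *const *per_type;    /* NULL-terminated list of type tags */
  int has_typed_default;          /* a %destructor for <*> was declared */
  int has_untyped_default;        /* a %destructor for <> was declared */
} destructor_table;

/* Return 1 if key occurs in the NULL-terminated list, 0 otherwise. */
static int contains(const char *const *list, const char *key)
{
  if (list == NULL || key == NULL)
    return 0;
  for (; *list != NULL; ++list)
    if (strcmp(*list, key) == 0)
      return 1;
  return 0;
}

/* Pick the kind of %destructor that applies to a discarded symbol. */
dtor_kind select_destructor(const destructor_table *t, const symbol *s)
{
  assert(t != NULL && s != NULL);
  if (contains(t->per_symbol, s->name))
    return DTOR_PER_SYMBOL;           /* per-symbol overrides per-type */
  if (contains(t->per_type, s->type_tag))
    return DTOR_PER_TYPE;
  if (!s->user_defined)
    return DTOR_NONE;                 /* defaults skip Bison-defined symbols */
  if (s->type_tag != NULL)
    return t->has_typed_default ? DTOR_TYPED_DEFAULT : DTOR_NONE;
  return t->has_untyped_default ? DTOR_UNTYPED_DEFAULT : DTOR_NONE;
}

/* Declarations mirroring the example grammar in this section. */
static const char *const example_per_symbol[] = { "STRING1", "string1", NULL };
static const char *const example_per_type[] = { "character", NULL };
static const destructor_table example_table = {
  example_per_symbol, example_per_type, 1, 1
};

/* Convenience wrapper: classify one symbol against the example table. */
dtor_kind classify_example(const char *name, const char *tag, int user_defined)
{
  symbol s;
  s.name = name;
  s.type_tag = tag;
  s.user_defined = user_defined;
  return select_destructor(&example_table, &s);
}
```

In this model, @code{classify_example ("STRING2", "string", 1)} selects the @code{<*>} default, while the same call for @code{STRING1} selects its per-symbol destructor, which is why @code{free} runs only once for that symbol.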
+ +@node Printer Decl +@subsection Printing Semantic Values +@cindex printing semantic values +@findex %printer +@findex <*> +@findex <> +When run-time traces are enabled (@pxref{Tracing, ,Tracing Your Parser}), +the parser reports its actions, such as reductions. When a symbol involved +in an action is reported, only its kind is displayed, as the parser cannot +know how semantic values should be formatted. + +The @code{%printer} directive defines code that is called when a symbol is +reported. Its syntax is the same as @code{%destructor} (@pxref{Destructor +Decl, , Freeing Discarded Symbols}). + +@deffn {Directive} %printer @{ @var{code} @} @var{symbols} +@findex %printer +@vindex yyoutput +@c This is the same text as for %destructor. +Invoke the braced @var{code} whenever the parser displays one of the +@var{symbols}. Within @var{code}, @code{yyoutput} denotes the output stream +(a @code{FILE*} in C, and an @code{std::ostream&} in C++), +@code{$$} designates the semantic value associated with the symbol, and +@code{@@$} its location. The additional parser parameters are also +available (@pxref{Parser Function, , The Parser Function @code{yyparse}}). + +The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor +Decl, , Freeing Discarded Symbols}): they can be per-type (e.g., +@samp{<@var{type}>}), per-symbol (e.g., @samp{exp}, @samp{NUM}, @samp{"float"}), +typed per-default (i.e., @samp{<*>}), or untyped per-default (i.e., +@samp{<>}).
+@end deffn + +@noindent +For example: + +@example +%union @{ char *string; @} +%token <string> STRING1 +%token <string> STRING2 +%type <string> string1 +%type <string> string2 +%union @{ char character; @} +%token <character> CHR +%type <character> chr +%token TAGLESS + +%printer @{ fprintf (yyoutput, "'%c'", $$); @} <character> +%printer @{ fprintf (yyoutput, "&%p", $$); @} <*> +%printer @{ fprintf (yyoutput, "\"%s\"", $$); @} STRING1 string1 +%printer @{ fprintf (yyoutput, "<>"); @} <> +@end example + +@noindent +guarantees that, when the parser prints any symbol that has a semantic type +tag other than @code{<character>}, it displays the address of the semantic +value by default. However, when the parser displays a @code{STRING1} or a +@code{string1}, it formats it as a string in double quotes. It performs +only the second @code{%printer} in this case, so it prints only once. +Finally, the parser prints @samp{<>} for any symbol, such as @code{TAGLESS}, +that has no semantic type tag. + + +@node Expect Decl +@subsection Suppressing Conflict Warnings +@cindex suppressing conflict warnings +@cindex preventing warnings about conflicts +@cindex warnings, preventing +@cindex conflicts, suppressing warnings of +@findex %expect +@findex %expect-rr + +Bison normally warns if there are any conflicts in the grammar +(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars +have harmless shift/reduce conflicts which are resolved in a predictable +way and would be difficult to eliminate. It is desirable to suppress +the warning about these conflicts unless the number of conflicts +changes. You can do this with the @code{%expect} declaration. + +The declaration looks like this: + +@example +%expect @var{n} +@end example + +Here @var{n} is a decimal integer. The declaration says there should +be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. +Bison reports an error if the number of shift/reduce conflicts differs +from @var{n}, or if there are any reduce/reduce conflicts.
+ +For deterministic parsers, reduce/reduce conflicts are more +serious, and should be eliminated entirely. Bison will always report +reduce/reduce conflicts for these parsers. With GLR +parsers, however, both kinds of conflicts are routine; otherwise, +there would be no need to use GLR parsing. Therefore, it is +also possible to specify an expected number of reduce/reduce conflicts +in GLR parsers, using the declaration: + +@example +%expect-rr @var{n} +@end example + +In general, using @code{%expect} involves these steps: + +@itemize @bullet +@item +Compile your grammar without @code{%expect}. Use the @samp{-v} option +to get a verbose list of where the conflicts occur. Bison will also +print the number of conflicts. + +@item +Check each of the conflicts to make sure that Bison's default +resolution is what you really want. If not, rewrite the grammar and +go back to the beginning. + +@item +Add an @code{%expect} declaration, copying the number @var{n} from the +number which Bison printed. With GLR parsers, add an +@code{%expect-rr} declaration as well. +@end itemize + +Now Bison will report an error if you introduce an unexpected conflict, +but will keep silent otherwise. + +@node Start Decl +@subsection The Start-Symbol +@cindex declaring the start symbol +@cindex start symbol, declaring +@cindex default start symbol +@findex %start + +Bison assumes by default that the start symbol for the grammar is the first +nonterminal specified in the grammar specification section. The programmer +may override this restriction with the @code{%start} declaration as follows: + +@example +%start @var{symbol} +@end example + +@node Pure Decl +@subsection A Pure (Reentrant) Parser +@cindex reentrant parser +@cindex pure parser +@findex %define api.pure + +A @dfn{reentrant} program is one which does not alter in the course of +execution; in other words, it consists entirely of @dfn{pure} (read-only) +code. 
Reentrancy is important whenever asynchronous execution is possible; +for example, a nonreentrant program may not be safe to call from a signal +handler. In systems with multiple threads of control, a nonreentrant +program must be called only within interlocks. + +Normally, Bison generates a parser which is not reentrant. This is +suitable for most uses, and it permits compatibility with Yacc. (The +standard Yacc interfaces are inherently nonreentrant, because they use +statically allocated variables for communication with @code{yylex}, +including @code{yylval} and @code{yylloc}.) + +Alternatively, you can generate a pure, reentrant parser. The Bison +declaration @code{%define api.pure} says that you want the parser to be +reentrant. It looks like this: + +@example +%define api.pure +@end example + +The result is that the communication variables @code{yylval} and +@code{yylloc} become local variables in @code{yyparse}, and a different +calling convention is used for the lexical analyzer function +@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure +Parsers}, for the details of this. The variable @code{yynerrs} +becomes local in @code{yyparse} in pull mode but it becomes a member +of @code{yypstate} in push mode (@pxref{Error Reporting, ,The Error +Reporting Function @code{yyerror}}). The convention for calling +@code{yyparse} itself is unchanged. + +Whether the parser is pure has nothing to do with the grammar rules. +You can generate either a pure parser or a nonreentrant parser from any +valid grammar. + +@node Push Decl +@subsection A Push Parser +@cindex push parser +@findex %define api.push-pull + +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) + +A pull parser is called once and it takes control until all its input +is completely parsed. A push parser, on the other hand, is called +each time a new token is made available.
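The difference in control flow can be illustrated with a toy C sketch that is independent of Bison. The "parser" here merely sums token values until an end token arrives, and all names (@code{pull_sum}, @code{push_token}, @code{push_state}, @code{FEED_MORE}, and so on) are hypothetical; they are not part of the interface that Bison generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not Bison's generated API. */
enum { TOK_END = 0 };                   /* end-of-input token */
enum { FEED_DONE = 0, FEED_MORE = 1 };  /* results of push_token */

/* Pull style: the parser owns the loop and asks the lexer for tokens. */
typedef int (*lexer_fn)(void *lexer_state);

long pull_sum(lexer_fn next_token, void *lexer_state)
{
  long total = 0;
  for (;;) {
    int tok = next_token(lexer_state);
    if (tok == TOK_END)
      return total;
    total += tok;
  }
}

/* Push style: the caller owns the loop, so the parser's progress
   must live in a structure that survives between calls. */
typedef struct {
  long total;
  int finished;
} push_state;

void push_init(push_state *ps)
{
  ps->total = 0;
  ps->finished = 0;
}

int push_token(push_state *ps, int tok)
{
  assert(ps != NULL && !ps->finished);
  if (tok == TOK_END) {
    ps->finished = 1;
    return FEED_DONE;
  }
  ps->total += tok;
  return FEED_MORE;
}

/* A trivial lexer that reads tokens from a TOK_END-terminated array. */
typedef struct {
  const int *tokens;
  size_t pos;
} array_lexer;

int array_next(void *lexer_state)
{
  array_lexer *lx = lexer_state;
  return lx->tokens[lx->pos++];
}

/* Drive the pull parser: one call consumes the whole input. */
long pull_sum_array(const int *tokens)
{
  array_lexer lx = { tokens, 0 };
  return pull_sum(array_next, &lx);
}

/* Drive the push parser: the caller hands over one token per call. */
long push_sum(const int *tokens)
{
  push_state ps;
  size_t i = 0;
  push_init(&ps);
  while (push_token(&ps, tokens[i++]) == FEED_MORE)
    continue;
  return ps.total;
}
```

Both drivers compute the same result for the same token array. What changes is who owns the loop: in the push form, an event loop could call @code{push_token} whenever a token arrives and do other work in between.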
+ +A push parser is typically useful when the parser is part of a +main event loop in the client's application. This is typically +a requirement of a GUI, when the main event loop needs to be triggered +within a certain time period. + +Normally, Bison generates a pull parser. +The following Bison declaration says that you want the parser to be a push +parser (@pxref{%define Summary,,api.push-pull}): + +@example +%define api.push-pull push +@end example + +In almost all cases, you want to ensure that your push parser is also +a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The only +time you should create an impure push parser is to have backwards +compatibility with the impure Yacc pull mode interface. Unless you know +what you are doing, your declarations should look like this: + +@example +%define api.pure +%define api.push-pull push +@end example + +There is a notable functional difference between the pure push parser +and the impure push parser. It is acceptable for a pure push parser to have +many parser instances, of the same type of parser, in memory at the same time. +An impure push parser should only use one parser at a time. + +When a push parser is selected, Bison will generate some new symbols in +the generated parser. @code{yypstate} is a structure that the generated +parser uses to store the parser's state. @code{yypstate_new} is the +function that will create a new parser instance. @code{yypstate_delete} +will free the resources associated with the corresponding parser instance. +Finally, @code{yypush_parse} is the function that should be called whenever a +token is available to provide to the parser.
A trivial example +of using a pure push parser would look like this: + +@example +int status; +yypstate *ps = yypstate_new (); +do @{ + status = yypush_parse (ps, yylex (), NULL); +@} while (status == YYPUSH_MORE); +yypstate_delete (ps); +@end example + +If the user decided to use an impure push parser, a few things about +the generated parser will change. The @code{yychar} variable becomes +a global variable instead of a variable in the @code{yypush_parse} function. +For this reason, the signature of the @code{yypush_parse} function is +changed to remove the token as a parameter. A nonreentrant push parser +example would thus look like this: + +@example +extern int yychar; +int status; +yypstate *ps = yypstate_new (); +do @{ + yychar = yylex (); + status = yypush_parse (ps); +@} while (status == YYPUSH_MORE); +yypstate_delete (ps); +@end example + +That's it. Notice the next token is put into the global variable @code{yychar} +for use by the next invocation of the @code{yypush_parse} function. + +Bison also supports the push parser interface along with the pull parser +interface in the same generated parser. In order to get this functionality, +you should replace the @code{%define api.push-pull push} declaration with the +@code{%define api.push-pull both} declaration. Doing this will create all of +the symbols mentioned earlier along with the two extra symbols, @code{yyparse} +and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally +would be used. However, the user should note that it is implemented in the +generated parser by calling @code{yypull_parse}. +This makes the @code{yyparse} function that is generated with the +@code{%define api.push-pull both} declaration slower than the normal +@code{yyparse} function. If the user +calls the @code{yypull_parse} function it will parse the rest of the input +stream. It is possible to @code{yypush_parse} tokens to select a subgrammar +and then @code{yypull_parse} the rest of the input stream.
If you would like +to switch back and forth between parsing styles, you would have to +write your own @code{yypull_parse} function that knows when to quit looking +for input. An example of using the @code{yypull_parse} function would look +like this: + +@example +yypstate *ps = yypstate_new (); +yypull_parse (ps); /* Will call the lexer */ +yypstate_delete (ps); +@end example + +Adding the @code{%define api.pure} declaration does exactly the same thing to +the generated parser with @code{%define api.push-pull both} as it did for +@code{%define api.push-pull push}. + +@node Decl Summary +@subsection Bison Declaration Summary +@cindex Bison declaration summary +@cindex declaration summary +@cindex summary, Bison declaration + +Here is a summary of the declarations used to define a grammar: + +@deffn {Directive} %union +Declare the collection of data types that semantic values may have +(@pxref{Union Decl, ,The Collection of Value Types}). +@end deffn + +@deffn {Directive} %token +Declare a terminal symbol (token type name) with no precedence +or associativity specified (@pxref{Token Decl, ,Token Type Names}). +@end deffn + +@deffn {Directive} %right +Declare a terminal symbol (token type name) that is right-associative +(@pxref{Precedence Decl, ,Operator Precedence}). +@end deffn + +@deffn {Directive} %left +Declare a terminal symbol (token type name) that is left-associative +(@pxref{Precedence Decl, ,Operator Precedence}). +@end deffn + +@deffn {Directive} %nonassoc +Declare a terminal symbol (token type name) that is nonassociative +(@pxref{Precedence Decl, ,Operator Precedence}). +Using it in a way that would be associative is a syntax error. +@end deffn + +@ifset defaultprec +@deffn {Directive} %default-prec +Assign a precedence to rules lacking an explicit @code{%prec} modifier +(@pxref{Contextual Precedence, ,Context-Dependent Precedence}).
+@end deffn +@end ifset + +@deffn {Directive} %type +Declare the type of semantic values for a nonterminal symbol +(@pxref{Type Decl, ,Nonterminal Symbols}). +@end deffn + +@deffn {Directive} %start +Specify the grammar's start symbol (@pxref{Start Decl, ,The +Start-Symbol}). +@end deffn + +@deffn {Directive} %expect +Declare the expected number of shift-reduce conflicts +(@pxref{Expect Decl, ,Suppressing Conflict Warnings}). +@end deffn + + +@sp 1 +@noindent +In order to change the behavior of @command{bison}, use the following +directives: + +@deffn {Directive} %code @{@var{code}@} +@deffnx {Directive} %code @var{qualifier} @{@var{code}@} +@findex %code +Insert @var{code} verbatim into the output parser source at the +default location or at the location specified by @var{qualifier}. +@xref{%code Summary}. +@end deffn + +@deffn {Directive} %debug +In the parser implementation file, define the macro @code{YYDEBUG} to +1 if it is not already defined, so that the debugging facilities are +compiled. @xref{Tracing, ,Tracing Your Parser}. +@end deffn + +@deffn {Directive} %define @var{variable} +@deffnx {Directive} %define @var{variable} @var{value} +@deffnx {Directive} %define @var{variable} "@var{value}" +Define a variable to adjust Bison's behavior. @xref{%define Summary}. +@end deffn + +@deffn {Directive} %defines +Write a parser header file containing macro definitions for the token +type names defined in the grammar as well as a few other declarations. +If the parser implementation file is named @file{@var{name}.c} then +the parser header file is named @file{@var{name}.h}. + +For C parsers, the parser header file declares @code{YYSTYPE} unless +@code{YYSTYPE} is already defined as a macro or you have used a +@code{<@var{type}>} tag without using @code{%union}. 
Therefore, if +you are using a @code{%union} (@pxref{Multiple Types, ,More Than One +Value Type}) with components that require other definitions, or if you +have defined a @code{YYSTYPE} macro or type definition (@pxref{Value +Type, ,Data Types of Semantic Values}), you need to arrange for these +definitions to be propagated to all modules, e.g., by putting them in +a prerequisite header that is included both by your parser and by any +other module that needs @code{YYSTYPE}. + +Unless your parser is pure, the parser header file declares +@code{yylval} as an external variable. @xref{Pure Decl, ,A Pure +(Reentrant) Parser}. + +If you have also used locations, the parser header file declares +@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the +@code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}. + +This parser header file is normally essential if you wish to put the +definition of @code{yylex} in a separate source file, because +@code{yylex} typically needs to be able to refer to the +above-mentioned declarations and to the token type codes. @xref{Token +Values, ,Semantic Values of Tokens}. + +@findex %code requires +@findex %code provides +If you have declared @code{%code requires} or @code{%code provides}, the output +header also contains their code. +@xref{%code Summary}. +@end deffn + +@deffn {Directive} %defines @var{defines-file} +Same as above, but save in the file @var{defines-file}. +@end deffn + +@deffn {Directive} %destructor +Specify how the parser should reclaim the memory associated to +discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}. +@end deffn + +@deffn {Directive} %file-prefix "@var{prefix}" +Specify a prefix to use for all Bison output file names. The names +are chosen as if the grammar file were named @file{@var{prefix}.y}. +@end deffn + +@deffn {Directive} %language "@var{language}" +Specify the programming language for the generated parser. 
Currently +supported languages include C, C++, and Java. +@var{language} is case-insensitive. + +This directive is experimental and its effect may be modified in future +releases. +@end deffn + +@deffn {Directive} %locations +Generate the code processing the locations (@pxref{Action Features, +,Special Features for Use in Actions}). This mode is enabled as soon as +the grammar uses the special @samp{@@@var{n}} tokens, but if your +grammar does not use it, using @samp{%locations} allows for more +accurate syntax error messages. +@end deffn + +@deffn {Directive} %name-prefix "@var{prefix}" +Rename the external symbols used in the parser so that they start with +@var{prefix} instead of @samp{yy}. The precise list of symbols renamed +in C parsers +is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, +@code{yylval}, @code{yychar}, @code{yydebug}, and +(if locations are used) @code{yylloc}. If you use a push parser, +@code{yypush_parse}, @code{yypull_parse}, @code{yypstate}, +@code{yypstate_new} and @code{yypstate_delete} will +also be renamed. For example, if you use @samp{%name-prefix "c_"}, the +names become @code{c_parse}, @code{c_lex}, and so on. +For C++ parsers, see the @code{%define namespace} documentation in this +section. +@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. +@end deffn + +@ifset defaultprec +@deffn {Directive} %no-default-prec +Do not assign a precedence to rules lacking an explicit @code{%prec} +modifier (@pxref{Contextual Precedence, ,Context-Dependent +Precedence}). +@end deffn +@end ifset + +@deffn {Directive} %no-lines +Don't generate any @code{#line} preprocessor commands in the parser +implementation file. Ordinarily Bison writes these commands in the +parser implementation file so that the C compiler and debuggers will +associate errors and object code with your source file (the grammar +file). 
This directive causes them to associate errors with the parser +implementation file, treating it as an independent source file in its +own right. +@end deffn + +@deffn {Directive} %output "@var{file}" +Specify @var{file} for the parser implementation file. +@end deffn + +@deffn {Directive} %pure-parser +Deprecated version of @code{%define api.pure} (@pxref{%define +Summary,,api.pure}), for which Bison is more careful to warn about +unreasonable usage. +@end deffn + +@deffn {Directive} %require "@var{version}" +Require version @var{version} or higher of Bison. @xref{Require Decl, , +Require a Version of Bison}. +@end deffn + +@deffn {Directive} %skeleton "@var{file}" +Specify the skeleton to use. + +@c You probably don't need this option unless you are developing Bison. +@c You should use @code{%language} if you want to specify the skeleton for a +@c different language, because it is clearer and because it will always choose the +@c correct skeleton for non-deterministic or push parsers. + +If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton +file in the Bison installation directory. +If it does, @var{file} is an absolute file name or a file name relative to the +directory of the grammar file. +This is similar to how most shells resolve commands. +@end deffn + +@deffn {Directive} %token-table +Generate an array of token names in the parser implementation file. +The name of the array is @code{yytname}; @code{yytname[@var{i}]} is +the name of the token whose internal Bison token code number is +@var{i}. The first three elements of @code{yytname} correspond to the +predefined tokens @code{"$end"}, @code{"error"}, and +@code{"$undefined"}; after these come the symbols defined in the +grammar file. + +The name in the table includes all the characters needed to represent +the token in Bison. For single-character literals and literal +strings, this includes the surrounding quoting characters and any +escape sequences. 
For example, the Bison single-character literal +@code{'+'} corresponds to a three-character name, represented in C as +@code{"'+'"}; and the Bison two-character literal string @code{"\\/"} +corresponds to a five-character name, represented in C as +@code{"\"\\\\/\""}. + +When you specify @code{%token-table}, Bison also generates macro +definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, +@code{YYNRULES}, and @code{YYNSTATES}: + +@table @code +@item YYNTOKENS +The highest token number, plus one. +@item YYNNTS +The number of nonterminal symbols. +@item YYNRULES +The number of grammar rules. +@item YYNSTATES +The number of parser states (@pxref{Parser States}). +@end table +@end deffn + +@deffn {Directive} %verbose +Write an extra output file containing verbose descriptions of the +parser states and what is done for each type of lookahead token in +that state. @xref{Understanding, , Understanding Your Parser}, for more +information. +@end deffn + +@deffn {Directive} %yacc +Pretend the option @option{--yacc} was given, i.e., imitate Yacc, +including its naming conventions. @xref{Bison Options}, for more. +@end deffn + + +@node %define Summary +@subsection %define Summary + +There are many features of Bison's behavior that can be controlled by +assigning the feature a single value. For historical reasons, some +such features are assigned values by dedicated directives, such as +@code{%start}, which assigns the start symbol. However, newer such +features are associated with variables, which are assigned by the +@code{%define} directive: + +@deffn {Directive} %define @var{variable} +@deffnx {Directive} %define @var{variable} @var{value} +@deffnx {Directive} %define @var{variable} "@var{value}" +Define @var{variable} to @var{value}. + +@var{value} must be placed in quotation marks if it contains any +character other than a letter, underscore, period, or non-initial dash +or digit.
Omitting @code{"@var{value}"} entirely is always equivalent +to specifying @code{""}. + +It is an error if a @var{variable} is defined by @code{%define} +multiple times, but see @ref{Bison Options,,-D +@var{name}[=@var{value}]}. +@end deffn + +The rest of this section summarizes variables and values that +@code{%define} accepts. + +Some @var{variable}s take Boolean values. In this case, Bison will +complain if the variable definition does not meet one of the following +four conditions: + +@enumerate +@item @code{@var{value}} is @code{true} + +@item @code{@var{value}} is omitted (or @code{""} is specified). +This is equivalent to @code{true}. + +@item @code{@var{value}} is @code{false}. + +@item @var{variable} is never defined. +In this case, Bison selects a default value. +@end enumerate + +What @var{variable}s are accepted, as well as their meanings and default +values, depend on the selected target language and/or the parser +skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl +Summary,,%skeleton}). +Unaccepted @var{variable}s produce an error. +Some of the accepted @var{variable}s are: + +@itemize @bullet +@c ================================================== api.pure +@item api.pure +@findex %define api.pure + +@itemize @bullet +@item Language(s): C + +@item Purpose: Request a pure (reentrant) parser program. +@xref{Pure Decl, ,A Pure (Reentrant) Parser}. + +@item Accepted Values: Boolean + +@item Default Value: @code{false} +@end itemize + +@item api.push-pull +@findex %define api.push-pull + +@itemize @bullet +@item Language(s): C (deterministic parsers only) + +@item Purpose: Request a pull parser, a push parser, or both. +@xref{Push Decl, ,A Push Parser}. +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) 
+ +@item Accepted Values: @code{pull}, @code{push}, @code{both} + +@item Default Value: @code{pull} +@end itemize + +@c ================================================== lr.default-reductions + +@item lr.default-reductions +@findex %define lr.default-reductions + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specify the kind of states that are permitted to +contain default reductions. @xref{Default Reductions}. (The ability to +specify where default reductions should be used is experimental. More user +feedback will help to stabilize it.) + +@item Accepted Values: @code{most}, @code{consistent}, @code{accepting} +@item Default Value: +@itemize +@item @code{accepting} if @code{lr.type} is @code{canonical-lr}. +@item @code{most} otherwise. +@end itemize +@end itemize + +@c ============================================ lr.keep-unreachable-states + +@item lr.keep-unreachable-states +@findex %define lr.keep-unreachable-states + +@itemize @bullet +@item Language(s): all +@item Purpose: Request that Bison allow unreachable parser states to +remain in the parser tables. @xref{Unreachable States}. +@item Accepted Values: Boolean +@item Default Value: @code{false} +@end itemize + +@c ================================================== lr.type + +@item lr.type +@findex %define lr.type + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specify the type of parser tables within the +LR(1) family. @xref{LR Table Construction}. (This feature is experimental. +More user feedback will help to stabilize it.) + +@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr} + +@item Default Value: @code{lalr} +@end itemize + +@item namespace +@findex %define namespace + +@itemize +@item Language(s): C++ + +@item Purpose: Specify the namespace for the parser class.
+For example, if you specify: + +@smallexample +%define namespace "foo::bar" +@end smallexample + +Bison uses @code{foo::bar} verbatim in references such as: + +@smallexample +foo::bar::parser::semantic_type +@end smallexample + +However, to open a namespace, Bison removes any leading @code{::} and then +splits on any remaining occurrences: + +@smallexample +namespace foo @{ namespace bar @{ + class position; + class location; +@} @} +@end smallexample + +@item Accepted Values: Any absolute or relative C++ namespace reference without +a trailing @code{"::"}. +For example, @code{"foo"} or @code{"::foo::bar"}. + +@item Default Value: The value specified by @code{%name-prefix}, which defaults +to @code{yy}. +This usage of @code{%name-prefix} is for backward compatibility and can be +confusing since @code{%name-prefix} also specifies the textual prefix for the +lexical analyzer function. +Thus, if you specify @code{%name-prefix}, it is best to also specify +@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the +lexical analyzer function. +For example, if you specify: + +@smallexample +%define namespace "foo" +%name-prefix "bar::" +@end smallexample + +The parser namespace is @code{foo} and @code{yylex} is referenced as +@code{bar::lex}. +@end itemize + +@c ================================================== parse.lac +@item parse.lac +@findex %define parse.lac + +@itemize +@item Language(s): C (deterministic parsers only) + +@item Purpose: Enable LAC (lookahead correction) to improve +syntax error handling. @xref{LAC}. +@item Accepted Values: @code{none}, @code{full} +@item Default Value: @code{none} +@end itemize +@end itemize + + +@node %code Summary +@subsection %code Summary +@findex %code +@cindex Prologue + +The @code{%code} directive inserts code verbatim into the output +parser source at any of a predefined set of locations.
It thus serves +as a flexible and user-friendly alternative to the traditional Yacc +prologue, @code{%@{@var{code}%@}}. This section summarizes the +functionality of @code{%code} for the various target languages +supported by Bison. For a detailed discussion of how to use +@code{%code} in place of @code{%@{@var{code}%@}} for C/C++ and why it +is advantageous to do so, @pxref{Prologue Alternatives}. + +@deffn {Directive} %code @{@var{code}@} +This is the unqualified form of the @code{%code} directive. It +inserts @var{code} verbatim at a language-dependent default location +in the parser implementation. + +For C/C++, the default location is the parser implementation file +after the usual contents of the parser header file. Thus, the +unqualified form replaces @code{%@{@var{code}%@}} for most purposes. + +For Java, the default location is inside the parser class. +@end deffn + +@deffn {Directive} %code @var{qualifier} @{@var{code}@} +This is the qualified form of the @code{%code} directive. +@var{qualifier} identifies the purpose of @var{code} and thus the +location(s) where Bison should insert it. That is, if you need to +specify location-sensitive @var{code} that does not belong at the +default location selected by the unqualified @code{%code} form, use +this form instead. +@end deffn + +For any particular qualifier or for the unqualified form, if there are +multiple occurrences of the @code{%code} directive, Bison concatenates +the specified code in the order in which it appears in the grammar +file. + +Not all qualifiers are accepted for all target languages. Unaccepted +qualifiers produce an error. Some of the accepted qualifiers are: + +@itemize @bullet +@item requires +@findex %code requires + +@itemize @bullet +@item Language(s): C, C++ + +@item Purpose: This is the best place to write dependency code required for +@code{YYSTYPE} and @code{YYLTYPE}. 
+In other words, it's the best place to define types referenced in @code{%union} +directives, and it's the best place to override Bison's default @code{YYSTYPE} +and @code{YYLTYPE} definitions. + +@item Location(s): The parser header file and the parser implementation file +before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE} +definitions. +@end itemize + +@item provides +@findex %code provides + +@itemize @bullet +@item Language(s): C, C++ + +@item Purpose: This is the best place to write additional definitions and +declarations that should be provided to other modules. + +@item Location(s): The parser header file and the parser implementation +file after the Bison-generated @code{YYSTYPE}, @code{YYLTYPE}, and +token definitions. +@end itemize + +@item top +@findex %code top + +@itemize @bullet +@item Language(s): C, C++ + +@item Purpose: The unqualified @code{%code} or @code{%code requires} +should usually be more appropriate than @code{%code top}. However, +occasionally it is necessary to insert code much nearer the top of the +parser implementation file. For example: + +@example +%code top @{ + #define _GNU_SOURCE + #include <stdio.h> +@} +@end example + +@item Location(s): Near the top of the parser implementation file. +@end itemize + +@item imports +@findex %code imports + +@itemize @bullet +@item Language(s): Java + +@item Purpose: This is the best place to write Java import directives. + +@item Location(s): The parser Java file after any Java package directive and +before any class definitions. +@end itemize +@end itemize + +Though we say the insertion locations are language-dependent, they are +technically skeleton-dependent. Writers of non-standard skeletons +however should choose their locations consistently with the behavior +of the standard Bison skeletons. + + +@node Multiple Parsers +@section Multiple Parsers in the Same Program + +Most programs that use Bison parse only one language and therefore contain +only one Bison parser.
But what if you want to parse more than one +language with the same program? Then you need to avoid a name conflict +between different definitions of @code{yyparse}, @code{yylval}, and so on. + +The easy way to do this is to use the option @samp{-p @var{prefix}} +(@pxref{Invocation, ,Invoking Bison}). This renames the interface +functions and variables of the Bison parser to start with @var{prefix} +instead of @samp{yy}. You can use this to give each parser distinct +names that do not conflict. + +The precise list of symbols renamed is @code{yyparse}, @code{yylex}, +@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yylloc}, +@code{yychar} and @code{yydebug}. If you use a push parser, +@code{yypush_parse}, @code{yypull_parse}, @code{yypstate}, +@code{yypstate_new} and @code{yypstate_delete} will also be renamed. +For example, if you use @samp{-p c}, the names become @code{cparse}, +@code{clex}, and so on. + +@strong{All the other variables and macros associated with Bison are not +renamed.} These others are not global; there is no conflict if the same +name is used in different parsers. For example, @code{YYSTYPE} is not +renamed, but defining this in different ways in different parsers causes +no trouble (@pxref{Value Type, ,Data Types of Semantic Values}). + +The @samp{-p} option works by adding macro definitions to the +beginning of the parser implementation file, defining @code{yyparse} +as @code{@var{prefix}parse}, and so on. This effectively substitutes +one name for the other in the entire parser implementation file. + +@node Interface +@chapter Parser C-Language Interface +@cindex C-language interface +@cindex interface + +The Bison parser is actually a C function named @code{yyparse}. Here we +describe the interface conventions of @code{yyparse} and the other +functions that it needs to use. + +Keep in mind that the parser uses many C identifiers starting with +@samp{yy} and @samp{YY} for internal purposes. 
If you use such an +identifier (aside from those in this manual) in an action or in the epilogue +in the grammar file, you are likely to run into trouble. + +@menu +* Parser Function:: How to call @code{yyparse} and what it returns. +* Push Parser Function:: How to call @code{yypush_parse} and what it returns. +* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. +* Parser Create Function:: How to call @code{yypstate_new} and what it returns. +* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. +* Lexical:: You must supply a function @code{yylex} + which reads tokens. +* Error Reporting:: You must supply a function @code{yyerror}. +* Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. +@end menu + +@node Parser Function +@section The Parser Function @code{yyparse} +@findex yyparse + +You call the function @code{yyparse} to cause parsing to occur. This +function reads tokens, executes actions, and ultimately returns when it +encounters end-of-input or an unrecoverable syntax error. You can also +write an action which directs @code{yyparse} to return immediately +without reading further. + + +@deftypefun int yyparse (void) +The value returned by @code{yyparse} is 0 if parsing was successful (return +is due to end-of-input). + +The value is 1 if parsing failed because of invalid input, i.e., input +that contains a syntax error or that causes @code{YYABORT} to be +invoked. + +The value is 2 if parsing failed due to memory exhaustion. +@end deftypefun + +In an action, you can cause immediate return from @code{yyparse} by using +these macros: + +@defmac YYACCEPT +@findex YYACCEPT +Return immediately with value 0 (to report success). +@end defmac + +@defmac YYABORT +@findex YYABORT +Return immediately with value 1 (to report failure).
+@end defmac + +If you use a reentrant parser, you can optionally pass additional +parameter information to it in a reentrant way. To do so, use the +declaration @code{%parse-param}: + +@deffn {Directive} %parse-param @{@var{argument-declaration}@} +@findex %parse-param +Declare that an argument declared by the braced-code +@var{argument-declaration} is an additional @code{yyparse} argument. +The @var{argument-declaration} is used when declaring +functions or prototypes. The last identifier in +@var{argument-declaration} must be the argument name. +@end deffn + +Here's an example. Write this in the parser: + +@example +%parse-param @{int *nastiness@} +%parse-param @{int *randomness@} +@end example + +@noindent +Then call the parser like this: + +@example +@{ + int nastiness, randomness; + @dots{} /* @r{Store proper data in @code{nastiness} and @code{randomness}.} */ + value = yyparse (&nastiness, &randomness); + @dots{} +@} +@end example + +@noindent +In the grammar actions, use expressions like this to refer to the data: + +@example +exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} +@end example + +@node Push Parser Function +@section The Push Parser Function @code{yypush_parse} +@findex yypush_parse + +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) + +You call the function @code{yypush_parse} to parse a single token. This +function is available if either the @code{%define api.push-pull push} or +@code{%define api.push-pull both} declaration is used. +@xref{Push Decl, ,A Push Parser}. + +@deftypefun int yypush_parse (yypstate *yyps) +The value returned by @code{yypush_parse} is the same as for @code{yyparse}, with the +following exception: @code{yypush_parse} returns @code{YYPUSH_MORE} if more input +is required to finish parsing the grammar.
+@end deftypefun + +@node Pull Parser Function +@section The Pull Parser Function @code{yypull_parse} +@findex yypull_parse + +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) + +You call the function @code{yypull_parse} to parse the rest of the input +stream. This function is available if the @code{%define api.push-pull both} +declaration is used. +@xref{Push Decl, ,A Push Parser}. + +@deftypefun int yypull_parse (yypstate *yyps) +The value returned by @code{yypull_parse} is the same as for @code{yyparse}. +@end deftypefun + +@node Parser Create Function +@section The Parser Create Function @code{yypstate_new} +@findex yypstate_new + +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) + +You call the function @code{yypstate_new} to create a new parser instance. +This function is available if either the @code{%define api.push-pull push} or +@code{%define api.push-pull both} declaration is used. +@xref{Push Decl, ,A Push Parser}. + +@deftypefun {yypstate*} yypstate_new (void) +The function will return a valid parser instance if there was memory available +or 0 if no memory was available. +In impure mode, it will also return 0 if a parser instance is currently +allocated. +@end deftypefun + +@node Parser Delete Function +@section The Parser Delete Function @code{yypstate_delete} +@findex yypstate_delete + +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) + +You call the function @code{yypstate_delete} to delete a parser instance. +This function is available if either the @code{%define api.push-pull push} or +@code{%define api.push-pull both} declaration is used. +@xref{Push Decl, ,A Push Parser}. + +@deftypefun void yypstate_delete (yypstate *yyps) +This function will reclaim the memory associated with a parser instance.
+After this call, you should no longer attempt to use the parser instance. +@end deftypefun + +@node Lexical +@section The Lexical Analyzer Function @code{yylex} +@findex yylex +@cindex lexical analyzer + +The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from +the input stream and returns them to the parser. Bison does not create +this function automatically; you must write it so that @code{yyparse} can +call it. The function is sometimes referred to as a lexical scanner. + +In simple programs, @code{yylex} is often defined at the end of the +Bison grammar file. If @code{yylex} is defined in a separate source +file, you need to arrange for the token-type macro definitions to be +available there. To do this, use the @samp{-d} option when you run +Bison, so that it will write these macro definitions into the separate +parser header file, @file{@var{name}.tab.h}, which you can include in +the other source files that need it. @xref{Invocation, ,Invoking +Bison}. + +@menu +* Calling Convention:: How @code{yyparse} calls @code{yylex}. +* Token Values:: How @code{yylex} must return the semantic value + of the token it has read. +* Token Locations:: How @code{yylex} must return the text location + (line number, etc.) of the token, if the + actions want that. +* Pure Calling:: How the calling convention differs in a pure parser + (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +@end menu + +@node Calling Convention +@subsection Calling Convention for @code{yylex} + +The value that @code{yylex} returns must be the positive numeric code +for the type of token it has just found; a zero or negative value +signifies end-of-input. + +When a token is referred to in the grammar rules by a name, that name +in the parser implementation file becomes a C macro whose definition +is the proper numeric code for that token type. So @code{yylex} can +use the name to indicate that type. @xref{Symbols}. 
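As a rough illustration of the convention just described, here is a minimal, self-contained sketch of a hand-written @code{yylex} that returns named token codes and zero at end-of-input. It is not taken from the manual: the names @code{WORD}, @code{set_input} and @code{cursor} are hypothetical, and the numeric values given to @code{INT} and @code{WORD} are placeholders, since in a real project these macros are defined by the Bison-generated code rather than written by hand.

```c
#include <ctype.h>

/* Hypothetical token codes for this sketch only.  In a real project
   these macros come from Bison-generated code, not hand-written code.  */
#define INT  258
#define WORD 259

/* Assumed input source: a NUL-terminated string chosen by the caller.  */
static const char *cursor = "";

/* Hypothetical helper: point the scanner at a new input string.  */
void
set_input (const char *text)
{
  cursor = text;
}

int
yylex (void)
{
  /* Skip every character that cannot start a token in this toy language.  */
  while (*cursor != '\0' && !isalnum ((unsigned char) *cursor))
    cursor++;

  if (*cursor == '\0')
    return 0;                   /* Zero signifies end-of-input.  */

  if (isdigit ((unsigned char) *cursor))
    {
      while (isdigit ((unsigned char) *cursor))
        cursor++;
      return INT;               /* Use the token name as the return value.  */
    }

  /* Not a digit, so the current character is a letter.  */
  while (isalpha ((unsigned char) *cursor))
    cursor++;
  return WORD;
}
```

The sketch deliberately ignores semantic values and locations; it only shows that the scanner refers to token types by name and never hard-codes their numeric codes at the point of use.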
+ +When a token is referred to in the grammar rules by a character literal, +the numeric code for that character is also the code for the token type. +So @code{yylex} can simply return that character code, possibly converted +to @code{unsigned char} to avoid sign-extension. The null character +must not be used this way, because its code is zero and that +signifies end-of-input. + +Here is an example showing these things: + +@example +int +yylex (void) +@{ + @dots{} + if (c == EOF) /* Detect end-of-input. */ + return 0; + @dots{} + if (c == '+' || c == '-') + return c; /* Assume token type for `+' is '+'. */ + @dots{} + return INT; /* Return the type of the token. */ + @dots{} +@} +@end example + +@noindent +This interface has been designed so that the output from the @code{lex} +utility can be used without change as the definition of @code{yylex}. + +If the grammar uses literal string tokens, there are two ways that +@code{yylex} can determine the token type codes for them: + +@itemize @bullet +@item +If the grammar defines symbolic token names as aliases for the +literal string tokens, @code{yylex} can use these symbolic names like +all others. In this case, the use of the literal string tokens in +the grammar file has no effect on @code{yylex}. + +@item +@code{yylex} can find the multicharacter token in the @code{yytname} +table. The index of the token in the table is the token type's code. +The name of a multicharacter token is recorded in @code{yytname} with a +double-quote, the token's characters, and another double-quote. The +token's characters are escaped as necessary to be suitable as input +to Bison. + +Here's code for looking up a multicharacter token in @code{yytname}, +assuming that the characters of the token are stored in +@code{token_buffer}, and assuming that the token does not contain any +characters like @samp{"} that require escaping. + +@example +for (i = 0; i < YYNTOKENS; i++) + @{ + if (yytname[i] != 0 + && yytname[i][0] == '"' + && ! 
strncmp (yytname[i] + 1, token_buffer, + strlen (token_buffer)) + && yytname[i][strlen (token_buffer) + 1] == '"' + && yytname[i][strlen (token_buffer) + 2] == 0) + break; + @} +@end example + +The @code{yytname} table is generated only if you use the +@code{%token-table} declaration. @xref{Decl Summary}. +@end itemize + +@node Token Values +@subsection Semantic Values of Tokens + +@vindex yylval +In an ordinary (nonreentrant) parser, the semantic value of the token must +be stored into the global variable @code{yylval}. When you are using +just one data type for semantic values, @code{yylval} has that type. +Thus, if the type is @code{int} (the default), you might write this in +@code{yylex}: + +@example +@group + @dots{} + yylval = value; /* Put value onto Bison stack. */ + return INT; /* Return the type of the token. */ + @dots{} +@end group +@end example + +When you are using multiple data types, @code{yylval}'s type is a union +made from the @code{%union} declaration (@pxref{Union Decl, ,The +Collection of Value Types}). So when you store a token's value, you +must use the proper member of the union. If the @code{%union} +declaration looks like this: + +@example +@group +%union @{ + int intval; + double val; + symrec *tptr; +@} +@end group +@end example + +@noindent +then the code in @code{yylex} might look like this: + +@example +@group + @dots{} + yylval.intval = value; /* Put value onto Bison stack. */ + return INT; /* Return the type of the token. */ + @dots{} +@end group +@end example + +@node Token Locations +@subsection Textual Locations of Tokens + +@vindex yylloc +If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations}) +in actions to keep track of the textual locations of tokens and groupings, +then you must provide this information in @code{yylex}. The function +@code{yyparse} expects to find the textual location of a token just parsed +in the global variable @code{yylloc}. 
So @code{yylex} must store the proper +data in that variable. + +By default, the value of @code{yylloc} is a structure and you need only +initialize the members that are going to be used by the actions. The +four members are called @code{first_line}, @code{first_column}, +@code{last_line} and @code{last_column}. Note that the use of this +feature makes the parser noticeably slower. + +@tindex YYLTYPE +The data type of @code{yylloc} has the name @code{YYLTYPE}. + +@node Pure Calling +@subsection Calling Conventions for Pure Parsers + +When you use the Bison declaration @code{%define api.pure} to request a +pure, reentrant parser, the global communication variables @code{yylval} +and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) +Parser}.) In such parsers the two global variables are replaced by +pointers passed as arguments to @code{yylex}. You must declare them as +shown here, and pass the information back by storing it through those +pointers. + +@example +int +yylex (YYSTYPE *lvalp, YYLTYPE *llocp) +@{ + @dots{} + *lvalp = value; /* Put value onto Bison stack. */ + return INT; /* Return the type of the token. */ + @dots{} +@} +@end example + +If the grammar file does not use the @samp{@@} constructs to refer to +textual locations, then the type @code{YYLTYPE} will not be defined. In +this case, omit the second argument; @code{yylex} will be called with +only one argument. + + +If you wish to pass additional parameter data to @code{yylex}, use +@code{%lex-param} just like @code{%parse-param} (@pxref{Parser +Function}). + +@deffn {Directive} %lex-param @{@var{argument-declaration}@} +@findex %lex-param +Declare that the braced-code @var{argument-declaration} is an +additional @code{yylex} argument declaration.
+@end deffn + +For instance: + +@example +%parse-param @{int *nastiness@} +%lex-param @{int *nastiness@} +%parse-param @{int *randomness@} +@end example + +@noindent +results in the following signatures: + +@example +int yylex (int *nastiness); +int yyparse (int *nastiness, int *randomness); +@end example + +If @code{%define api.pure} is added: + +@example +int yylex (YYSTYPE *lvalp, int *nastiness); +int yyparse (int *nastiness, int *randomness); +@end example + +@noindent +and finally, if both @code{%define api.pure} and @code{%locations} are used: + +@example +int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); +int yyparse (int *nastiness, int *randomness); +@end example + +@node Error Reporting +@section The Error Reporting Function @code{yyerror} +@cindex error reporting function +@findex yyerror +@cindex parse error +@cindex syntax error + +The Bison parser detects a @dfn{syntax error} or @dfn{parse error} +whenever it reads a token which cannot satisfy any syntax rule. An +action in the grammar can also explicitly proclaim an error, using the +macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use +in Actions}). + +The Bison parser expects to report the error by calling an error +reporting function named @code{yyerror}, which you must supply. It is +called by @code{yyparse} whenever a syntax error is found, and it +receives one argument. For a syntax error, the string is normally +@w{@code{"syntax error"}}. + +@findex %error-verbose +If you invoke the directive @code{%error-verbose} in the Bison declarations +section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then +Bison provides a more verbose and specific error message string instead of +just plain @w{@code{"syntax error"}}. However, that message sometimes +contains incorrect information if LAC is not enabled (@pxref{LAC}). + +The parser can detect one other kind of error: memory exhaustion. 
This +can happen when the input contains constructions that are very deeply +nested. It isn't likely you will encounter this, since the Bison +parser normally extends its stack automatically up to a very large limit. But +if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual +fashion, except that the argument string is @w{@code{"memory exhausted"}}. + +In some cases diagnostics like @w{@code{"syntax error"}} are +translated automatically from English to some other language before +they are passed to @code{yyerror}. @xref{Internationalization}. + +The following definition suffices in simple programs: + +@example +@group +void +yyerror (char const *s) +@{ +@end group +@group + fprintf (stderr, "%s\n", s); +@} +@end group +@end example + +After @code{yyerror} returns to @code{yyparse}, the latter will attempt +error recovery if you have written suitable error recovery grammar rules +(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will +immediately return 1. + +Obviously, in location-tracking pure parsers, @code{yyerror} should have +access to the current location. +This is indeed the case for the GLR +parsers, but not for the Yacc parser, for historical reasons. That is, if +@samp{%locations %define api.pure} is passed, then the prototypes for +@code{yyerror} are: + +@example +void yyerror (char const *msg); /* Yacc parsers. */ +void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ +@end example + +If @samp{%parse-param @{int *nastiness@}} is used, then: + +@example +void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */ +void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ +@end example + +Finally, GLR and Yacc parsers share the same @code{yyerror} calling +convention for absolutely pure parsers, i.e., when the calling +convention of @code{yylex} @emph{and} the calling convention of +@code{%define api.pure} are pure. +For example: + +@example +/* Location tracking. */ +%locations +/* Pure yylex.
*/ +%define api.pure +%lex-param @{int *nastiness@} +/* Pure yyparse. */ +%parse-param @{int *nastiness@} +%parse-param @{int *randomness@} +@end example + +@noindent +results in the following signatures for all the parser kinds: + +@example +int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); +int yyparse (int *nastiness, int *randomness); +void yyerror (YYLTYPE *locp, + int *nastiness, int *randomness, + char const *msg); +@end example + +@noindent +The prototypes are only indications of how the code produced by Bison +uses @code{yyerror}. Bison-generated code always ignores the returned +value, so @code{yyerror} can return any type, including @code{void}. +Also, @code{yyerror} can be a variadic function; that is why the +message is always passed last. + +Traditionally @code{yyerror} returns an @code{int} that is always +ignored, but this is purely for historical reasons, and @code{void} is +preferable since it more accurately describes the return type for +@code{yyerror}. + +@vindex yynerrs +The variable @code{yynerrs} contains the number of syntax errors +reported so far. Normally this variable is global; but if you +request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) +then it is a local variable which only the actions can access. + +@node Action Features +@section Special Features for Use in Actions +@cindex summary, action features +@cindex action features summary + +Here is a table of Bison constructs, variables and macros that +are useful in actions. + +@deffn {Variable} $$ +Acts like a variable that contains the semantic value for the +grouping made by the current rule. @xref{Actions}. +@end deffn + +@deffn {Variable} $@var{n} +Acts like a variable that contains the semantic value for the +@var{n}th component of the current rule. @xref{Actions}. +@end deffn + +@deffn {Variable} $<@var{typealt}>$ +Like @code{$$} but specifies alternative @var{typealt} in the union +specified by the @code{%union} declaration. 
@xref{Action Types, ,Data +Types of Values in Actions}. +@end deffn + +@deffn {Variable} $<@var{typealt}>@var{n} +Like @code{$@var{n}} but specifies alternative @var{typealt} in the +union specified by the @code{%union} declaration. +@xref{Action Types, ,Data Types of Values in Actions}. +@end deffn + +@deffn {Macro} YYABORT @code{;} +Return immediately from @code{yyparse}, indicating failure. +@xref{Parser Function, ,The Parser Function @code{yyparse}}. +@end deffn + +@deffn {Macro} YYACCEPT @code{;} +Return immediately from @code{yyparse}, indicating success. +@xref{Parser Function, ,The Parser Function @code{yyparse}}. +@end deffn + +@deffn {Macro} YYBACKUP (@var{token}, @var{value})@code{;} +@findex YYBACKUP +Unshift a token. This macro is allowed only for rules that reduce +a single value, and only when there is no lookahead token. +It is also disallowed in GLR parsers. +It installs a lookahead token with token type @var{token} and +semantic value @var{value}; then it discards the value that was +going to be reduced by this rule. + +If the macro is used when it is not valid, such as when there is +a lookahead token already, then it reports a syntax error with +a message @samp{cannot back up} and performs ordinary error +recovery. + +In either case, the rest of the action is not executed. +@end deffn + +@deffn {Macro} YYEMPTY +Value stored in @code{yychar} when there is no lookahead token. +@end deffn + +@deffn {Macro} YYEOF +Value stored in @code{yychar} when the lookahead is the end of the input +stream. +@end deffn + +@deffn {Macro} YYERROR @code{;} +Cause an immediate syntax error. This statement initiates error +recovery just as if the parser itself had detected an error; however, it +does not call @code{yyerror}, and does not print any message. If you +want to print an error message, call @code{yyerror} explicitly before +the @samp{YYERROR;} statement. @xref{Error Recovery}. 
+@end deffn + +@deffn {Macro} YYRECOVERING +@findex YYRECOVERING +The expression @code{YYRECOVERING ()} yields 1 when the parser +is recovering from a syntax error, and 0 otherwise. +@xref{Error Recovery}. +@end deffn + +@deffn {Variable} yychar +Variable containing either the lookahead token, or @code{YYEOF} when the +lookahead is the end of the input stream, or @code{YYEMPTY} when no lookahead +has been performed so the next token is not yet known. +Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic +Actions}). +@xref{Lookahead, ,Lookahead Tokens}. +@end deffn + +@deffn {Macro} yyclearin @code{;} +Discard the current lookahead token. This is useful primarily in +error rules. +Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR +Semantic Actions}). +@xref{Error Recovery}. +@end deffn + +@deffn {Macro} yyerrok @code{;} +Resume generating error messages immediately for subsequent syntax +errors. This is useful primarily in error rules. +@xref{Error Recovery}. +@end deffn + +@deffn {Variable} yylloc +Variable containing the lookahead token location when @code{yychar} is not set +to @code{YYEMPTY} or @code{YYEOF}. +Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic +Actions}). +@xref{Actions and Locations, ,Actions and Locations}. +@end deffn + +@deffn {Variable} yylval +Variable containing the lookahead token semantic value when @code{yychar} is +not set to @code{YYEMPTY} or @code{YYEOF}. +Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic +Actions}). +@xref{Actions, ,Actions}. +@end deffn + +@deffn {Value} @@$ +@findex @@$ +Acts like a structure variable containing information on the textual +location of the grouping made by the current rule. @xref{Tracking +Locations}. + +@c Check if those paragraphs are still useful or not. 
+ +@c @example +@c struct @{ +@c int first_line, last_line; +@c int first_column, last_column; +@c @}; +@c @end example + +@c Thus, to get the starting line number of the third component, you would +@c use @samp{@@3.first_line}. + +@c In order for the members of this structure to contain valid information, +@c you must make @code{yylex} supply this information about each token. +@c If you need only certain members, then @code{yylex} need only fill in +@c those members. + +@c The use of this feature makes the parser noticeably slower. +@end deffn + +@deffn {Value} @@@var{n} +@findex @@@var{n} +Acts like a structure variable containing information on the textual +location of the @var{n}th component of the current rule. @xref{Tracking +Locations}. +@end deffn + +@node Internationalization +@section Parser Internationalization +@cindex internationalization +@cindex i18n +@cindex NLS +@cindex gettext +@cindex bison-po + +A Bison-generated parser can print diagnostics, including error and +tracing messages. By default, they appear in English. However, Bison +also supports outputting diagnostics in the user's native language. To +make this work, the user should set the usual environment variables. +@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}. +For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might +set the user's locale to French Canadian using the UTF-8 +encoding. The exact set of available locales depends on the user's +installation. + +The maintainer of a package that uses a Bison-generated parser enables +the internationalization of the parser's output through the following +steps. Here we assume a package that uses GNU Autoconf and +GNU Automake. + +@enumerate +@item +@cindex bison-i18n.m4 +Into the directory containing the GNU Autoconf macros used +by the package---often called @file{m4}---copy the +@file{bison-i18n.m4} file installed by Bison under +@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory. 
+For example: + +@example +cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4 +@end example + +@item +@findex BISON_I18N +@vindex BISON_LOCALEDIR +@vindex YYENABLE_NLS +In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT} +invocation, add an invocation of @code{BISON_I18N}. This macro is +defined in the file @file{bison-i18n.m4} that you copied earlier. It +causes @samp{configure} to find the value of the +@code{BISON_LOCALEDIR} variable, and it defines the source-language +symbol @code{YYENABLE_NLS} to enable translations in the +Bison-generated parser. + +@item +In the @code{main} function of your program, designate the directory +containing Bison's runtime message catalog, through a call to +@samp{bindtextdomain} with domain name @samp{bison-runtime}. +For example: + +@example +bindtextdomain ("bison-runtime", BISON_LOCALEDIR); +@end example + +Typically this appears after any other call @code{bindtextdomain +(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on +@samp{BISON_LOCALEDIR} to be defined as a string through the +@file{Makefile}. + +@item +In the @file{Makefile.am} that controls the compilation of the @code{main} +function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro, +either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example: + +@example +DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' +@end example + +or: + +@example +AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' +@end example + +@item +Finally, invoke the command @command{autoreconf} to generate the build +infrastructure. +@end enumerate + + +@node Algorithm +@chapter The Bison Parser Algorithm +@cindex Bison parser algorithm +@cindex algorithm of parser +@cindex shifting +@cindex reduction +@cindex parser stack +@cindex stack, parser + +As Bison reads tokens, it pushes them onto a stack along with their +semantic values. The stack is called the @dfn{parser stack}. 
Pushing a +token is traditionally called @dfn{shifting}. + +For example, suppose the infix calculator has read @samp{1 + 5 *}, with a +@samp{3} to come. The stack will have four elements, one for each token +that was shifted. + +But the stack does not always have an element for each token read. When +the last @var{n} tokens and groupings shifted match the components of a +grammar rule, they can be combined according to that rule. This is called +@dfn{reduction}. Those tokens and groupings are replaced on the stack by a +single grouping whose symbol is the result (left hand side) of that rule. +Running the rule's action is part of the process of reduction, because this +is what computes the semantic value of the resulting grouping. + +For example, if the infix calculator's parser stack contains this: + +@example +1 + 5 * 3 +@end example + +@noindent +and the next input token is a newline character, then the last three +elements can be reduced to 15 via the rule: + +@example +expr: expr '*' expr; +@end example + +@noindent +Then the stack contains just these three elements: + +@example +1 + 15 +@end example + +@noindent +At this point, another reduction can be made, resulting in the single value +16. Then the newline token can be shifted. + +The parser tries, by shifts and reductions, to reduce the entire input down +to a single grouping whose symbol is the grammar's start-symbol +(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}). + +This kind of parser is known in the literature as a bottom-up parser. + +@menu +* Lookahead:: Parser looks one token ahead when deciding what to do. +* Shift/Reduce:: Conflicts: when either shifting or reduction is valid. +* Precedence:: Operator precedence works by resolving conflicts. +* Contextual Precedence:: When an operator's precedence depends on context. +* Parser States:: The parser is a finite-state-machine with stack. +* Reduce/Reduce:: When two rules are applicable in the same situation. 
+* Mysterious Conflicts:: Conflicts that look unjustified. +* Tuning LR:: How to tune fundamental aspects of LR-based parsing. +* Generalized LR Parsing:: Parsing arbitrary context-free grammars. +* Memory Management:: What happens when memory is exhausted. How to avoid it. +@end menu + +@node Lookahead +@section Lookahead Tokens +@cindex lookahead token + +The Bison parser does @emph{not} always reduce immediately as soon as the +last @var{n} tokens and groupings match a rule. This is because such a +simple strategy is inadequate to handle most languages. Instead, when a +reduction is possible, the parser sometimes ``looks ahead'' at the next +token in order to decide what to do. + +When a token is read, it is not immediately shifted; first it becomes the +@dfn{lookahead token}, which is not on the stack. Now the parser can +perform one or more reductions of tokens and groupings on the stack, while +the lookahead token remains off to the side. When no more reductions +should take place, the lookahead token is shifted onto the stack. This +does not mean that all possible reductions have been done; depending on the +token type of the lookahead token, some rules may choose to delay their +application. + +Here is a simple case where lookahead is needed. These three rules define +expressions which contain binary addition operators and postfix unary +factorial operators (@samp{!}), and allow parentheses for grouping. + +@example +@group +expr: + term '+' expr +| term +; +@end group + +@group +term: + '(' expr ')' +| term '!' +| NUMBER +; +@end group +@end example + +Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what +should be done? If the following token is @samp{)}, then the first three +tokens must be reduced to form an @code{expr}. This is the only valid +course, because shifting the @samp{)} would produce a sequence of symbols +@w{@code{term ')'}}, and no rule allows this. 
+ +If the following token is @samp{!}, then it must be shifted immediately so +that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the +parser were to reduce before shifting, @w{@samp{1 + 2}} would become an +@code{expr}. It would then be impossible to shift the @samp{!} because +doing so would produce on the stack the sequence of symbols @code{expr +'!'}. No rule allows that sequence. + +@vindex yychar +@vindex yylval +@vindex yylloc +The lookahead token is stored in the variable @code{yychar}. +Its semantic value and location, if any, are stored in the variables +@code{yylval} and @code{yylloc}. +@xref{Action Features, ,Special Features for Use in Actions}. + +@node Shift/Reduce +@section Shift/Reduce Conflicts +@cindex conflicts +@cindex shift/reduce conflicts +@cindex dangling @code{else} +@cindex @code{else}, dangling + +Suppose we are parsing a language which has if-then and if-then-else +statements, with a pair of rules like this: + +@example +@group +if_stmt: + IF expr THEN stmt +| IF expr THEN stmt ELSE stmt +; +@end group +@end example + +@noindent +Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are +terminal symbols for specific keyword tokens. + +When the @code{ELSE} token is read and becomes the lookahead token, the +contents of the stack (assuming the input is valid) are just right for +reduction by the first rule. But it is also legitimate to shift the +@code{ELSE}, because that would lead to eventual reduction by the second +rule. + +This situation, where either a shift or a reduction would be valid, is +called a @dfn{shift/reduce conflict}. Bison is designed to resolve +these conflicts by choosing to shift, unless otherwise directed by +operator precedence declarations. To see the reason for this, let's +contrast it with the other alternative. 
+ +Since the parser prefers to shift the @code{ELSE}, the result is to attach +the else-clause to the innermost if-statement, making these two inputs +equivalent: + +@example +if x then if y then win (); else lose; + +if x then do; if y then win (); else lose; end; +@end example + +But if the parser chose to reduce when possible rather than shift, the +result would be to attach the else-clause to the outermost if-statement, +making these two inputs equivalent: + +@example +if x then if y then win (); else lose; + +if x then do; if y then win (); end; else lose; +@end example + +The conflict exists because the grammar as written is ambiguous: either +parsing of the simple nested if-statement is legitimate. The established +convention is that these ambiguities are resolved by attaching the +else-clause to the innermost if-statement; this is what Bison accomplishes +by choosing to shift rather than reduce. (It would ideally be cleaner to +write an unambiguous grammar, but that is very hard to do in this case.) +This particular ambiguity was first encountered in the specifications of +Algol 60 and is called the ``dangling @code{else}'' ambiguity. + +To avoid warnings from Bison about predictable, legitimate shift/reduce +conflicts, use the @code{%expect @var{n}} declaration. +There will be no warning as long as the number of shift/reduce conflicts +is exactly @var{n}, and Bison will report an error if there is a +different number. +@xref{Expect Decl, ,Suppressing Conflict Warnings}. + +The definition of @code{if_stmt} above is solely to blame for the +conflict, but the conflict does not actually appear without additional +rules. 
Here is a complete Bison grammar file that actually manifests +the conflict: + +@example +@group +%token IF THEN ELSE variable +%% +@end group +@group +stmt: + expr +| if_stmt +; +@end group + +@group +if_stmt: + IF expr THEN stmt +| IF expr THEN stmt ELSE stmt +; +@end group + +expr: + variable +; +@end example + +@node Precedence +@section Operator Precedence +@cindex operator precedence +@cindex precedence of operators + +Another situation where shift/reduce conflicts appear is in arithmetic +expressions. Here shifting is not always the preferred resolution; the +Bison declarations for operator precedence allow you to specify when to +shift and when to reduce. + +@menu +* Why Precedence:: An example showing why precedence is needed. +* Using Precedence:: How to specify precedence in Bison grammars. +* Precedence Examples:: How these features are used in the previous example. +* How Precedence:: How they work. +@end menu + +@node Why Precedence +@subsection When Precedence is Needed + +Consider the following ambiguous grammar fragment (ambiguous because the +input @w{@samp{1 - 2 * 3}} can be parsed in two different ways): + +@example +@group +expr: + expr '-' expr +| expr '*' expr +| expr '<' expr +| '(' expr ')' +@dots{} +; +@end group +@end example + +@noindent +Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2}; +should it reduce them via the rule for the subtraction operator? It +depends on the next token. Of course, if the next token is @samp{)}, we +must reduce; shifting is invalid because no single rule can reduce the +token sequence @w{@samp{- 2 )}} or anything starting with that. But if +the next token is @samp{*} or @samp{<}, we have a choice: either +shifting or reduction would allow the parse to complete, but with +different results. + +To decide which one Bison should do, we must consider the results. 
If +the next operator token @var{op} is shifted, then it must be reduced +first in order to permit another opportunity to reduce the difference. +The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other +hand, if the subtraction is reduced before shifting @var{op}, the result +is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or +reduce should depend on the relative precedence of the operators +@samp{-} and @var{op}: @samp{*} should be shifted first, but not +@samp{<}. + +@cindex associativity +What about input such as @w{@samp{1 - 2 - 5}}; should this be +@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most +operators we prefer the former, which is called @dfn{left association}. +The latter alternative, @dfn{right association}, is desirable for +assignment operators. The choice of left or right association is a +matter of whether the parser chooses to shift or reduce when the stack +contains @w{@samp{1 - 2}} and the lookahead token is @samp{-}: shifting +produces right association, and reducing produces left association. + +@node Using Precedence +@subsection Specifying Operator Precedence +@findex %left +@findex %right +@findex %nonassoc + +Bison allows you to specify these choices with the operator precedence +declarations @code{%left} and @code{%right}. Each such declaration +contains a list of tokens, which are operators whose precedence and +associativity are being declared. The @code{%left} declaration makes all +those operators left-associative and the @code{%right} declaration makes +them right-associative. A third alternative is @code{%nonassoc}, which +declares that it is a syntax error to find the same operator twice ``in a +row''. + +The relative precedence of different operators is controlled by the +order in which they are declared.
The first @code{%left} or +@code{%right} declaration in the file declares the operators whose +precedence is lowest, the next such declaration declares the operators +whose precedence is a little higher, and so on. + +@node Precedence Examples +@subsection Precedence Examples + +In our example, we would want the following declarations: + +@example +%left '<' +%left '-' +%left '*' +@end example + +In a more complete example, which supports other operators as well, we +would declare them in groups of equal precedence. For example, @code{'+'} is +declared with @code{'-'}: + +@example +%left '<' '>' '=' NE LE GE +%left '+' '-' +%left '*' '/' +@end example + +@noindent +(Here @code{NE} and so on stand for the operators for ``not equal'' +and so on. We assume that these tokens are more than one character long +and therefore are represented by names, not character literals.) + +@node How Precedence +@subsection How Precedence Works + +The first effect of the precedence declarations is to assign precedence +levels to the terminal symbols declared. The second effect is to assign +precedence levels to certain rules: each rule gets its precedence from +the last terminal symbol mentioned in the components. (You can also +specify explicitly the precedence of a rule. @xref{Contextual +Precedence, ,Context-Dependent Precedence}.) + +Finally, the resolution of conflicts works by comparing the precedence +of the rule being considered with that of the lookahead token. If the +token's precedence is higher, the choice is to shift. If the rule's +precedence is higher, the choice is to reduce. If they have equal +precedence, the choice is made based on the associativity of that +precedence level. The verbose output file made by @samp{-v} +(@pxref{Invocation, ,Invoking Bison}) says how each conflict was +resolved. + +Not all rules and not all tokens have precedence. If either the rule or +the lookahead token has no precedence, then the default is to shift. 
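The resolution procedure described above can be summarized in a few lines of C. This is only an illustrative sketch, not Bison's implementation: the names @code{resolve_shift_reduce}, @code{struct prec} and the enumerators, as well as the convention that level 0 means ``no precedence declared'', are assumptions made for this example.

```c
#include <assert.h>

enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum decision { DO_SHIFT, DO_REDUCE, DO_ERROR };

/* Hypothetical representation: level 0 means "no precedence declared";
   larger levels were declared later in the file and bind tighter.  */
struct prec
{
  int level;
  enum assoc assoc;
};

/* Decide a shift/reduce conflict between a rule and the lookahead token.  */
static enum decision
resolve_shift_reduce (struct prec rule, struct prec lookahead)
{
  assert (rule.level >= 0 && lookahead.level >= 0);

  /* Either side lacks a precedence: the default is to shift.  */
  if (rule.level == 0 || lookahead.level == 0)
    return DO_SHIFT;

  /* Different levels: the higher precedence wins.  */
  if (lookahead.level > rule.level)
    return DO_SHIFT;
  if (rule.level > lookahead.level)
    return DO_REDUCE;

  /* Equal levels: the associativity of that level decides.  */
  switch (lookahead.assoc)
    {
    case ASSOC_LEFT:  return DO_REDUCE;  /* (1 - 2) - 5 */
    case ASSOC_RIGHT: return DO_SHIFT;   /* 1 - (2 - 5) */
    default:          return DO_ERROR;   /* %nonassoc: syntax error */
    }
}
```

With the declarations from the earlier example (@code{%left '<'}, then @code{%left '-'}, then @code{%left '*'}, giving levels 1, 2 and 3 in this sketch), the rule @code{expr '-' expr} has level 2. It is reduced when the lookahead is @samp{<} or @samp{-}, and the lookahead is shifted when it is @samp{*}.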
+ +@node Contextual Precedence +@section Context-Dependent Precedence +@cindex context-dependent precedence +@cindex unary operator precedence +@cindex precedence, context-dependent +@cindex precedence, unary operator +@findex %prec + +Often the precedence of an operator depends on the context. This sounds +outlandish at first, but it is really very common. For example, a minus +sign typically has a very high precedence as a unary operator, and a +somewhat lower precedence (lower than multiplication) as a binary operator. + +The Bison precedence declarations, @code{%left}, @code{%right} and +@code{%nonassoc}, can only be used once for a given token; so a token has +only one precedence declared in this way. For context-dependent +precedence, you need to use an additional mechanism: the @code{%prec} +modifier for rules. + +The @code{%prec} modifier declares the precedence of a particular rule by +specifying a terminal symbol whose precedence should be used for that rule. +It's not necessary for that symbol to appear otherwise in the rule. The +modifier's syntax is: + +@example +%prec @var{terminal-symbol} +@end example + +@noindent +and it is written after the components of the rule. Its effect is to +assign the rule the precedence of @var{terminal-symbol}, overriding +the precedence that would be deduced for it in the ordinary way. The +altered rule precedence then affects how conflicts involving that rule +are resolved (@pxref{Precedence, ,Operator Precedence}). + +Here is how @code{%prec} solves the problem of unary minus. First, declare +a precedence for a fictitious terminal symbol named @code{UMINUS}. 
There +are no tokens of this type, but the symbol serves to stand for its +precedence: + +@example +@dots{} +%left '+' '-' +%left '*' +%left UMINUS +@end example + +Now the precedence of @code{UMINUS} can be used in specific rules: + +@example +@group +exp: + @dots{} +| exp '-' exp + @dots{} +| '-' exp %prec UMINUS +@end group +@end example + +@ifset defaultprec +If you forget to append @code{%prec UMINUS} to the rule for unary +minus, Bison silently assumes that minus has its usual precedence. +This kind of problem can be tricky to debug, since one typically +discovers the mistake only by testing the code. + +The @code{%no-default-prec;} declaration makes it easier to discover +this kind of problem systematically. It causes rules that lack a +@code{%prec} modifier to have no precedence, even if the last terminal +symbol mentioned in their components has a declared precedence. + +If @code{%no-default-prec;} is in effect, you must specify @code{%prec} +for all rules that participate in precedence conflict resolution. +Then you will see any shift/reduce conflict until you tell Bison how +to resolve it, either by changing your grammar or by adding an +explicit precedence. This will probably add declarations to the +grammar, but it helps to protect against incorrect rule precedences. + +The effect of @code{%no-default-prec;} can be reversed by giving +@code{%default-prec;}, which is the default. +@end ifset + +@node Parser States +@section Parser States +@cindex finite-state machine +@cindex parser state +@cindex state (of parser) + +The function @code{yyparse} is implemented using a finite-state machine. +The values pushed on the parser stack are not simply token type codes; they +represent the entire sequence of terminal and nonterminal symbols at or +near the top of the stack. The current state collects all the information +about previous input which is relevant to deciding what to do next. 
+ +Each time a lookahead token is read, the current parser state together +with the type of lookahead token are looked up in a table. This table +entry can say, ``Shift the lookahead token.'' In this case, it also +specifies the new parser state, which is pushed onto the top of the +parser stack. Or it can say, ``Reduce using rule number @var{n}.'' +This means that a certain number of tokens or groupings are taken off +the top of the stack, and replaced by one grouping. In other words, +that number of states are popped from the stack, and one new state is +pushed. + +There is one other alternative: the table can say that the lookahead token +is erroneous in the current state. This causes error processing to begin +(@pxref{Error Recovery}). + +@node Reduce/Reduce +@section Reduce/Reduce Conflicts +@cindex reduce/reduce conflict +@cindex conflicts, reduce/reduce + +A reduce/reduce conflict occurs if there are two or more rules that apply +to the same sequence of input. This usually indicates a serious error +in the grammar. + +For example, here is an erroneous attempt to define a sequence +of zero or more @code{word} groupings. + +@example +@group +sequence: + /* empty */ @{ printf ("empty sequence\n"); @} +| maybeword +| sequence word @{ printf ("added word %s\n", $2); @} +; +@end group + +@group +maybeword: + /* empty */ @{ printf ("empty maybeword\n"); @} +| word @{ printf ("single word %s\n", $1); @} +; +@end group +@end example + +@noindent +The error is an ambiguity: there is more than one way to parse a single +@code{word} into a @code{sequence}. It could be reduced to a +@code{maybeword} and then into a @code{sequence} via the second rule. +Alternatively, nothing-at-all could be reduced into a @code{sequence} +via the first rule, and this could be combined with the @code{word} +using the third rule for @code{sequence}. + +There is also more than one way to reduce nothing-at-all into a +@code{sequence}. 
This can be done directly via the first rule, +or indirectly via @code{maybeword} and then the second rule. + +You might think that this is a distinction without a difference, because it +does not change whether any particular input is valid or not. But it does +affect which actions are run. One parsing order runs the second rule's +action; the other runs the first rule's action and the third rule's action. +In this example, the output of the program changes. + +Bison resolves a reduce/reduce conflict by choosing to use the rule that +appears first in the grammar, but it is very risky to rely on this. Every +reduce/reduce conflict must be studied and usually eliminated. Here is the +proper way to define @code{sequence}: + +@example +sequence: + /* empty */ @{ printf ("empty sequence\n"); @} +| sequence word @{ printf ("added word %s\n", $2); @} +; +@end example + +Here is another common error that yields a reduce/reduce conflict: + +@example +sequence: + /* empty */ +| sequence words +| sequence redirects +; + +words: + /* empty */ +| words word +; + +redirects: + /* empty */ +| redirects redirect +; +@end example + +@noindent +The intention here is to define a sequence which can contain either +@code{word} or @code{redirect} groupings. The individual definitions of +@code{sequence}, @code{words} and @code{redirects} are error-free, but the +three together make a subtle ambiguity: even an empty input can be parsed +in infinitely many ways! + +Consider: nothing-at-all could be a @code{words}. Or it could be two +@code{words} in a row, or three, or any number. It could equally well be a +@code{redirects}, or two, or any number. Or it could be a @code{words} +followed by three @code{redirects} and another @code{words}. And so on. + +Here are two ways to correct these rules. 
First, to make it a single level +of sequence: + +@example +sequence: + /* empty */ +| sequence word +| sequence redirect +; +@end example + +Second, to prevent either a @code{words} or a @code{redirects} +from being empty: + +@example +@group +sequence: + /* empty */ +| sequence words +| sequence redirects +; +@end group + +@group +words: + word +| words word +; +@end group + +@group +redirects: + redirect +| redirects redirect +; +@end group +@end example + +@node Mysterious Conflicts +@section Mysterious Conflicts +@cindex Mysterious Conflicts + +Sometimes reduce/reduce conflicts can occur that don't look warranted. +Here is an example: + +@example +@group +%token ID + +%% +def: param_spec return_spec ','; +param_spec: + type +| name_list ':' type +; +@end group +@group +return_spec: + type +| name ':' type +; +@end group +@group +type: ID; +@end group +@group +name: ID; +name_list: + name +| name ',' name_list +; +@end group +@end example + +It would seem that this grammar can be parsed with only a single token +of lookahead: when a @code{param_spec} is being read, an @code{ID} is +a @code{name} if a comma or colon follows, or a @code{type} if another +@code{ID} follows. In other words, this grammar is LR(1). + +@cindex LR +@cindex LALR +However, for historical reasons, Bison cannot by default handle all +LR(1) grammars. +In this grammar, two contexts, that after an @code{ID} at the beginning +of a @code{param_spec} and likewise at the beginning of a +@code{return_spec}, are similar enough that Bison assumes they are the +same. +They appear similar because the same set of rules would be +active---the rule for reducing to a @code{name} and that for reducing to +a @code{type}. Bison is unable to determine at that stage of processing +that the rules would require different lookahead tokens in the two +contexts, so it makes a single parser state for them both. Combining +the two contexts causes a conflict later. 
In parser terminology, this +occurrence means that the grammar is not LALR(1). + +@cindex IELR +@cindex canonical LR +For many practical grammars (specifically those that fall into the non-LR(1) +class), the limitations of LALR(1) result in difficulties beyond just +mysterious reduce/reduce conflicts. The best way to fix all these problems +is to select a different parser table construction algorithm. Either +IELR(1) or canonical LR(1) would suffice, but the former is more efficient +and easier to debug during development. @xref{LR Table Construction}, for +details. (Bison's IELR(1) and canonical LR(1) implementations are +experimental. More user feedback will help to stabilize them.) + +If you instead wish to work around LALR(1)'s limitations, you +can often fix a mysterious conflict by identifying the two parser states +that are being confused, and adding something to make them look +distinct. In the above example, adding one rule to +@code{return_spec} as follows makes the problem go away: + +@example +@group +%token BOGUS +@dots{} +%% +@dots{} +return_spec: + type +| name ':' type +| ID BOGUS /* This rule is never used. */ +; +@end group +@end example + +This corrects the problem because it introduces the possibility of an +additional active rule in the context after the @code{ID} at the beginning of +@code{return_spec}. This rule is not active in the corresponding context +in a @code{param_spec}, so the two contexts receive distinct parser states. +As long as the token @code{BOGUS} is never generated by @code{yylex}, +the added rule cannot alter the way actual input is parsed. + +In this particular example, there is another way to solve the problem: +rewrite the rule for @code{return_spec} to use @code{ID} directly +instead of via @code{name}. This also causes the two confusing +contexts to have different sets of active rules, because the one for +@code{return_spec} activates the altered rule for @code{return_spec} +rather than the one for @code{name}. 
+ +@example +param_spec: + type +| name_list ':' type +; +return_spec: + type +| ID ':' type +; +@end example + +For a more detailed exposition of LALR(1) parsers and parser +generators, @pxref{Bibliography,,DeRemer 1982}. + +@node Tuning LR +@section Tuning LR + +The default behavior of Bison's LR-based parsers is chosen mostly for +historical reasons, but that behavior is often not robust. For example, in +the previous section, we discussed the mysterious conflicts that can be +produced by LALR(1), Bison's default parser table construction algorithm. +Another example is Bison's @code{%error-verbose} directive, which instructs +the generated parser to produce verbose syntax error messages, which can +sometimes contain incorrect information. + +In this section, we explore several modern features of Bison that allow you +to tune fundamental aspects of the generated LR-based parsers. Some of +these features easily eliminate shortcomings like those mentioned above. +Others can be helpful purely for understanding your parser. + +Most of the features discussed in this section are still experimental. More +user feedback will help to stabilize them. + +@menu +* LR Table Construction:: Choose a different construction algorithm. +* Default Reductions:: Disable default reductions. +* LAC:: Correct lookahead sets in the parser states. +* Unreachable States:: Keep unreachable parser states for debugging. +@end menu + +@node LR Table Construction +@subsection LR Table Construction +@cindex Mysterious Conflict +@cindex LALR +@cindex IELR +@cindex canonical LR +@findex %define lr.type + +For historical reasons, Bison constructs LALR(1) parser tables by default. +However, LALR does not possess the full language-recognition power of LR. +As a result, the behavior of parsers employing LALR parser tables is often +mysterious. We presented a simple example of this effect in @ref{Mysterious +Conflicts}. 
+ +As we also demonstrated in that example, the traditional approach to +eliminating such mysterious behavior is to restructure the grammar. +Unfortunately, doing so correctly is often difficult. Moreover, merely +discovering that LALR causes mysterious behavior in your parser can be +difficult as well. + +Fortunately, Bison provides an easy way to eliminate the possibility of such +mysterious behavior altogether. You simply need to activate a more powerful +parser table construction algorithm by using the @code{%define lr.type} +directive. + +@deffn {Directive} {%define lr.type @var{TYPE}} +Specify the type of parser tables within the LR(1) family. The accepted +values for @var{TYPE} are: + +@itemize +@item @code{lalr} (default) +@item @code{ielr} +@item @code{canonical-lr} +@end itemize + +(This feature is experimental. More user feedback will help to stabilize +it.) +@end deffn + +For example, to activate IELR, you might add the following directive to your +grammar file: + +@example +%define lr.type ielr +@end example + +@noindent For the example in @ref{Mysterious Conflicts}, the mysterious +conflict is then eliminated, so there is no need to invest time in +comprehending the conflict or restructuring the grammar to fix it. If, +during future development, the grammar evolves such that all mysterious +behavior would have disappeared using just LALR, you need not fear that +continuing to use IELR will result in unnecessarily large parser tables. +That is, IELR generates LALR tables when LALR (using a deterministic parsing +algorithm) is sufficient to support the full language-recognition power of +LR. Thus, by enabling IELR at the start of grammar development, you can +safely and completely eliminate the need to consider LALR's shortcomings. + +While IELR is almost always preferable, there are circumstances where LALR +or the canonical LR parser tables described by Knuth +(@pxref{Bibliography,,Knuth 1965}) can be useful.
Here we summarize the +relative advantages of each parser table construction algorithm within +Bison: + +@itemize +@item LALR + +There are at least two scenarios where LALR can be worthwhile: + +@itemize +@item GLR without static conflict resolution. + +@cindex GLR with LALR +When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any +conflicts statically (for example, with @code{%left} or @code{%prec}), then +the parser explores all potential parses of any given input. In this case, +the choice of parser table construction algorithm is guaranteed not to alter +the language accepted by the parser. LALR parser tables are the smallest +parser tables Bison can currently construct, so they may then be preferable. +Nevertheless, once you begin to resolve conflicts statically, GLR behaves +more like a deterministic parser in the syntactic contexts where those +conflicts appear, and so either IELR or canonical LR can then be helpful to +avoid LALR's mysterious behavior. + +@item Malformed grammars. + +Occasionally during development, an especially malformed grammar with a +major recurring flaw may severely impede the IELR or canonical LR parser +table construction algorithm. LALR can be a quick way to construct parser +tables in order to investigate such problems while ignoring the more subtle +differences from IELR and canonical LR. +@end itemize + +@item IELR + +IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given +any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables +always accept exactly the same set of sentences. However, like LALR, IELR +merges parser states during parser table construction so that the number of +parser states is often an order of magnitude less than for canonical LR. +More importantly, because canonical LR's extra parser states may contain +duplicate conflicts in the case of non-LR grammars, the number of conflicts +for IELR is often an order of magnitude less as well. 
This effect can +significantly reduce the complexity of developing a grammar. + +@item Canonical LR + +@cindex delayed syntax error detection +@cindex LAC +@findex %nonassoc +While inefficient, canonical LR parser tables can be an interesting means to +explore a grammar because they possess a property that IELR and LALR tables +do not. That is, if @code{%nonassoc} is not used and default reductions are +left disabled (@pxref{Default Reductions}), then, for every left context of +every canonical LR state, the set of tokens accepted by that state is +guaranteed to be the exact set of tokens that is syntactically acceptable in +that left context. It might then seem that an advantage of canonical LR +parsers in production is that, under the above constraints, they are +guaranteed to detect a syntax error as soon as possible without performing +any unnecessary reductions. However, IELR parsers that use LAC are also +able to achieve this behavior without sacrificing @code{%nonassoc} or +default reductions. For details and a few caveats of LAC, @pxref{LAC}. +@end itemize + +For a more detailed exposition of the mysterious behavior in LALR parsers +and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and +@ref{Bibliography,,Denny 2010 November}. + +@node Default Reductions +@subsection Default Reductions +@cindex default reductions +@findex %define lr.default-reductions +@findex %nonassoc + +After parser table construction, Bison identifies the reduction with the +largest lookahead set in each parser state. To reduce the size of the +parser state, traditional Bison behavior is to remove that lookahead set and +to assign that reduction to be the default parser action. Such a reduction +is known as a @dfn{default reduction}. + +Default reductions affect more than the size of the parser tables. They +also affect the behavior of the parser: + +@itemize +@item Delayed @code{yylex} invocations. 
+ +@cindex delayed yylex invocations +@cindex consistent states +@cindex defaulted states +A @dfn{consistent state} is a state that has only one possible parser +action. If that action is a reduction and is encoded as a default +reduction, then that consistent state is called a @dfn{defaulted state}. +Upon reaching a defaulted state, a Bison-generated parser does not bother to +invoke @code{yylex} to fetch the next token before performing the reduction. +In other words, whether default reductions are enabled in consistent states +determines how soon a Bison-generated parser invokes @code{yylex} for a +token: immediately when it @emph{reaches} that token in the input or when it +eventually @emph{needs} that token as a lookahead to determine the next +parser action. Traditionally, default reductions are enabled, and so the +parser exhibits the latter behavior. + +The presence of defaulted states is an important consideration when +designing @code{yylex} and the grammar file. That is, if the behavior of +@code{yylex} can influence or be influenced by the semantic actions +associated with the reductions in defaulted states, then the delay of the +next @code{yylex} invocation until after those reductions is significant. +For example, the semantic actions might pop a scope stack that @code{yylex} +uses to determine what token to return. Thus, the delay might be necessary +to ensure that @code{yylex} does not look up the next token in a scope that +should already be considered closed. + +@item Delayed syntax error detection. + +@cindex delayed syntax error detection +When the parser fetches a new token by invoking @code{yylex}, it checks +whether there is an action for that token in the current parser state. The +parser detects a syntax error if and only if either (1) there is no action +for that token or (2) the action for that token is the error action (due to +the use of @code{%nonassoc}). 
However, if there is a default reduction in +that state (which might or might not be a defaulted state), then it is +impossible for condition 1 to exist. That is, all tokens have an action. +Thus, the parser sometimes fails to detect the syntax error until it reaches +a later state. + +@cindex LAC +@c If there's an infinite loop, default reductions can prevent an incorrect +@c sentence from being rejected. +While default reductions never cause the parser to accept syntactically +incorrect sentences, the delay of syntax error detection can have unexpected +effects on the behavior of the parser. However, the delay can be caused +anyway by parser state merging and the use of @code{%nonassoc}, and it can +be fixed by another Bison feature, LAC. We discuss the effects of delayed +syntax error detection and LAC more in the next section (@pxref{LAC}). +@end itemize + +For canonical LR, the only default reduction that Bison enables by default +is the accept action, which appears only in the accepting state, which has +no other action and is thus a defaulted state. However, the default accept +action does not delay any @code{yylex} invocation or syntax error detection +because the accept action ends the parse. + +For LALR and IELR, Bison enables default reductions in nearly all states by +default. There are only two exceptions. First, states that have a shift +action on the @code{error} token do not have default reductions because +delayed syntax error detection could then prevent the @code{error} token +from ever being shifted in that state. However, parser state merging can +cause the same effect anyway, and LAC fixes it in both cases, so future +versions of Bison might drop this exception when LAC is activated. Second, +GLR parsers do not record the default reduction as the action on a lookahead +token for which there is a conflict. The correct action in this case is to +split the parse instead. 
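To make the selection step from the beginning of this section concrete, here is a small C sketch that picks the reduction with the largest lookahead set in one state. It is not taken from Bison's sources: the bit-mask representation of lookahead sets, the function names and the tie-breaking rule (the earliest reduction wins) are assumptions made for this example.

```c
#include <assert.h>

/* Count the tokens in a lookahead set stored as a bit mask
   (bit i set means token number i is in the set).  */
static int
set_size (unsigned set)
{
  int n = 0;
  for (; set != 0; set &= set - 1)  /* clear the lowest set bit */
    n++;
  return n;
}

/* Return the index of the reduction whose lookahead set is largest, or -1
   if the state has no reductions.  Ties go to the earliest reduction,
   which is an assumption of this sketch.  */
static int
pick_default_reduction (const unsigned *lookaheads, int nreductions)
{
  int best = -1;
  int best_size = -1;

  assert (nreductions >= 0);
  for (int i = 0; i < nreductions; i++)
    {
      int size = set_size (lookaheads[i]);
      if (size > best_size)
        {
          best = i;
          best_size = size;
        }
    }
  return best;
}
```

Once a reduction has been chosen this way, its lookahead set no longer needs to be stored: the parser performs that reduction for every token that has no other action in the state.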
+ +To adjust which states have default reductions enabled, use the +@code{%define lr.default-reductions} directive. + +@deffn {Directive} {%define lr.default-reductions @var{WHERE}} +Specify the kind of states that are permitted to contain default reductions. +The accepted values of @var{WHERE} are: +@itemize +@item @code{most} (default for LALR and IELR) +@item @code{consistent} +@item @code{accepting} (default for canonical LR) +@end itemize + +(The ability to specify where default reductions are permitted is +experimental. More user feedback will help to stabilize it.) +@end deffn + +@node LAC +@subsection LAC +@findex %define parse.lac +@cindex LAC +@cindex lookahead correction + +Canonical LR, IELR, and LALR can suffer from a couple of problems upon +encountering a syntax error. First, the parser might perform additional +parser stack reductions before discovering the syntax error. Such +reductions can perform user semantic actions that are unexpected because +they are based on an invalid token, and they cause error recovery to begin +in a different syntactic context than the one in which the invalid token was +encountered. Second, when verbose error messages are enabled (@pxref{Error +Reporting}), the expected token list in the syntax error message can both +contain invalid tokens and omit valid tokens. + +The culprits for the above problems are @code{%nonassoc}, default reductions +in inconsistent states (@pxref{Default Reductions}), and parser state +merging. Because IELR and LALR merge parser states, they suffer the most. +Canonical LR can suffer only if @code{%nonassoc} is used or if default +reductions are enabled for inconsistent states. + +LAC (Lookahead Correction) is a new mechanism within the parsing algorithm +that solves these problems for canonical LR, IELR, and LALR without +sacrificing @code{%nonassoc}, default reductions, or state merging. You can +enable LAC with the @code{%define parse.lac} directive. 
+ +@deffn {Directive} {%define parse.lac @var{VALUE}} +Enable LAC to improve syntax error handling. +@itemize +@item @code{none} (default) +@item @code{full} +@end itemize +(This feature is experimental. More user feedback will help to stabilize +it. Moreover, it is currently only available for deterministic parsers in +C.) +@end deffn + +Conceptually, the LAC mechanism is straightforward. Whenever the parser +fetches a new token from the scanner so that it can determine the next +parser action, it immediately suspends normal parsing and performs an +exploratory parse using a temporary copy of the normal parser state stack. +During this exploratory parse, the parser does not perform user semantic +actions. If the exploratory parse reaches a shift action, normal parsing +then resumes on the normal parser stacks. If the exploratory parse reaches +an error instead, the parser reports a syntax error. If verbose syntax +error messages are enabled, the parser must then discover the list of +expected tokens, so it performs a separate exploratory parse for each token +in the grammar. + +There is one subtlety about the use of LAC. That is, when in a consistent +parser state with a default reduction, the parser will not attempt to fetch +a token from the scanner because no lookahead is needed to determine the +next parser action. Thus, whether default reductions are enabled in +consistent states (@pxref{Default Reductions}) affects how soon the parser +detects a syntax error: immediately when it @emph{reaches} an erroneous +token or when it eventually @emph{needs} that token as a lookahead to +determine the next parser action. The latter behavior is probably more +intuitive, so Bison currently provides no way to achieve the former behavior +while default reductions are enabled in consistent states.
+ +Thus, when LAC is in use, for some fixed decision of whether to enable +default reductions in consistent states, canonical LR and IELR behave almost +exactly the same for both syntactically acceptable and syntactically +unacceptable input. While LALR still does not support the full +language-recognition power of canonical LR and IELR, LAC at least enables +LALR's syntax error handling to correctly reflect LALR's +language-recognition power. + +There are a few caveats to consider when using LAC: + +@itemize +@item Infinite parsing loops. + +IELR plus LAC does have one shortcoming relative to canonical LR. Some +parsers generated by Bison can loop infinitely. LAC does not fix infinite +parsing loops that occur between encountering a syntax error and detecting +it, but enabling canonical LR or disabling default reductions sometimes +does. + +@item Verbose error message limitations. + +Because of internationalization considerations, Bison-generated parsers +limit the size of the expected token list they are willing to report in a +verbose syntax error message. If the number of expected tokens exceeds that +limit, the list is simply dropped from the message. Enabling LAC can +increase the size of the list and thus cause the parser to drop it. Of +course, dropping the list is better than reporting an incorrect list. + +@item Performance. + +Because LAC requires many parse actions to be performed twice, it can have a +performance penalty. However, not all parse actions must be performed +twice. Specifically, during a series of default reductions in consistent +states and shift actions, the parser never has to initiate an exploratory +parse. Moreover, the most time-consuming tasks in a parse are often the +file I/O, the lexical analysis performed by the scanner, and the user's +semantic actions, but none of these are performed during the exploratory +parse. 
Finally, the base of the temporary stack used during an exploratory +parse is a pointer into the normal parser state stack so that the stack is +never physically copied. In our experience, the performance penalty of LAC +has proved insignificant for practical grammars. +@end itemize + +While the LAC algorithm shares techniques that have been recognized in the +parser community for years, for the publication that introduces LAC, +@pxref{Bibliography,,Denny 2010 May}. + +@node Unreachable States +@subsection Unreachable States +@findex %define lr.keep-unreachable-states +@cindex unreachable states + +If there exists no sequence of transitions from the parser's start state to +some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable +state}. A state can become unreachable during conflict resolution if Bison +disables a shift action leading to it from a predecessor state. + +By default, Bison removes unreachable states from the parser after conflict +resolution because they are useless in the generated parser. However, +keeping unreachable states is sometimes useful when trying to understand the +relationship between the parser and the grammar. + +@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}} +Request that Bison allow unreachable states to remain in the parser tables. +@var{VALUE} must be a Boolean. The default is @code{false}. +@end deffn + +There are a few caveats to consider: + +@itemize @bullet +@item Missing or extraneous warnings. + +Unreachable states may contain conflicts and may use rules not used in any +other state. Thus, keeping unreachable states may induce warnings that are +irrelevant to your parser's behavior, and it may eliminate warnings that are +relevant. Of course, the change in warnings may actually be relevant to a +parser table analysis that wants to keep unreachable states, so this +behavior will likely remain in future Bison releases. + +@item Other useless states. 
+ +While Bison is able to remove unreachable states, it is not guaranteed to +remove other kinds of useless states. Specifically, when Bison disables +reduce actions during conflict resolution, some goto actions may become +useless, and thus some additional states may become useless. If Bison were +to compute which goto actions were useless and then disable those actions, +it could identify such states as unreachable and then remove those states. +However, Bison does not compute which goto actions are useless. +@end itemize + +@node Generalized LR Parsing +@section Generalized LR (GLR) Parsing +@cindex GLR parsing +@cindex generalized LR (GLR) parsing +@cindex ambiguous grammars +@cindex nondeterministic parsing + +Bison produces @emph{deterministic} parsers that choose uniquely +when to reduce and which reduction to apply +based on a summary of the preceding input and on one extra token of lookahead. +As a result, normal Bison handles a proper subset of the family of +context-free languages. +Ambiguous grammars, since they have strings with more than one possible +sequence of reductions, cannot have deterministic parsers in this sense. +The same is true of languages that require more than one symbol of +lookahead, since the parser lacks the information necessary to make a +decision at the point it must be made in a shift-reduce parser. +Finally, as previously mentioned (@pxref{Mysterious Conflicts}), +there are languages where Bison's default choice of how to +summarize the input seen so far loses necessary information. + +When you use the @samp{%glr-parser} declaration in your grammar file, +Bison generates a parser that uses a different algorithm, called +Generalized LR (or GLR). A Bison GLR +parser uses the same basic +algorithm for parsing as an ordinary Bison parser, but behaves +differently in cases where there is a shift-reduce conflict that has not +been resolved by precedence rules (@pxref{Precedence}) or a +reduce-reduce conflict.
When a GLR parser encounters such a +situation, it +effectively @emph{splits} into several parsers, one for each possible +shift or reduction. These parsers then proceed as usual, consuming +tokens in lock-step. Some of the stacks may encounter other conflicts +and split further, with the result that instead of a sequence of states, +a Bison GLR parsing stack is in effect a tree of states. + +In effect, each stack represents a guess as to what the proper parse +is. Additional input may indicate that a guess was wrong, in which case +the appropriate stack silently disappears. Otherwise, the semantic +actions generated in each stack are saved, rather than being executed +immediately. When a stack disappears, its saved semantic actions never +get executed. When a reduction causes two stacks to become equivalent, +their sets of semantic actions are both saved with the state that +results from the reduction. We say that two stacks are equivalent +when they both represent the same sequence of states, +and each pair of corresponding states represents a +grammar symbol that produces the same segment of the input token +stream. + +Whenever the parser makes a transition from having multiple +states to having one, it reverts to the normal deterministic parsing +algorithm, after resolving and executing the saved-up actions. +At this transition, some of the states on the stack will have semantic +values that are sets (actually multisets) of possible actions. The +parser tries to pick one of the actions by first finding one whose rule +has the highest dynamic precedence, as set by the @samp{%dprec} +declaration. Otherwise, if the alternative actions are not ordered by +precedence, but the same merging function is declared for both +rules by the @samp{%merge} declaration, +Bison resolves and evaluates both and then calls the merge function on +the result. Otherwise, it reports an ambiguity.
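The resolution order described above (dynamic precedence first, then a shared merge function, then an ambiguity report) can be sketched in C for the case of two competing reductions. This is only a model of the rule as stated in the text, not code taken from a Bison skeleton: the names @code{alternative}, @code{resolution}, @code{resolve} and @code{sum_merge} are hypothetical, and the sketch assumes that precedence orders a pair only when both rules declare @samp{%dprec} with different values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; none of these names come from Bison. */
typedef int (*merge_fn)(int, int);

typedef struct {
    int dprec;      /* value given to %dprec, or 0 if the rule has none */
    merge_fn merge; /* function named by %merge, or NULL if none */
    int value;      /* semantic value the saved action would produce */
} alternative;

typedef enum { PICK_FIRST, PICK_SECOND, MERGE_BOTH, AMBIGUOUS } resolution;

/* Decide how two competing reductions would be resolved. */
resolution resolve(const alternative *a, const alternative *b)
{
    assert(a != NULL && b != NULL);

    /* Assumption of this sketch: precedence orders the pair only when
       both rules declare %dprec and the two values differ. */
    if (a->dprec != 0 && b->dprec != 0 && a->dprec != b->dprec)
        return a->dprec > b->dprec ? PICK_FIRST : PICK_SECOND;

    /* Not ordered by precedence: merge if both rules name the same
       merge function. */
    if (a->merge != NULL && a->merge == b->merge)
        return MERGE_BOTH;

    /* Neither mechanism applies, so the ambiguity is reported. */
    return AMBIGUOUS;
}

/* Sample merge function, used only to exercise the model. */
int sum_merge(int x, int y)
{
    return x + y;
}
```

With this model, two alternatives carrying precedences 2 and 1 resolve to the first one, two alternatives that both name @code{sum_merge} are merged, and two alternatives with neither declaration are reported as ambiguous.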
+ +It is possible to use a data structure for the GLR parsing tree that +permits the processing of any LR(1) grammar in linear time (in the +size of the input), any unambiguous (not necessarily +LR(1)) grammar in +quadratic worst-case time, and any general (possibly ambiguous) +context-free grammar in cubic worst-case time. However, Bison currently +uses a simpler data structure that requires time proportional to the +length of the input times the maximum number of stacks required for any +prefix of the input. Thus, really ambiguous or nondeterministic +grammars can require exponential time and space to process. Such badly +behaving examples, however, are not generally of practical interest. +Usually, nondeterminism in a grammar is local---the parser is ``in +doubt'' only for a few tokens at a time. Therefore, the current data +structure should generally be adequate. On LR(1) portions of a +grammar, in particular, it is only slightly slower than with the +deterministic LR(1) Bison parser. + +For a more detailed exposition of GLR parsers, @pxref{Bibliography,,Scott +2000}. + +@node Memory Management +@section Memory Management, and How to Avoid Memory Exhaustion +@cindex memory exhaustion +@cindex memory management +@cindex stack overflow +@cindex parser stack overflow +@cindex overflow of parser stack + +The Bison parser stack can run out of memory if too many tokens are shifted and +not reduced. When this happens, the parser function @code{yyparse} +calls @code{yyerror} and then returns 2. + +Because Bison parsers have growing stacks, hitting the upper limit +usually results from using a right recursion instead of a left +recursion, see @ref{Recursion, ,Recursive Rules}. + +@vindex YYMAXDEPTH +By defining the macro @code{YYMAXDEPTH}, you can control how deep the +parser stack can become before memory is exhausted. Define the +macro with a value that is an integer. This value is the maximum number +of tokens that can be shifted (and not reduced) before overflow. 
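To see why right recursion is the usual cause of hitting the limit, the following C sketch simulates the peak stack depth for a comma-separated sequence of @var{n} items under a left-recursive rule (@samp{seq: item | seq ',' item}) and a right-recursive rule (@samp{seq: item | item ',' seq}). It is a simplified model that counts grammar symbols rather than parser states; the function names are hypothetical, and treating ``peak greater than limit'' as overflow is an assumption, so the exact boundary in a generated parser may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Peak stack depth, counted in grammar symbols, for the left-recursive
   rule  seq: item | seq ',' item  applied to n items. */
size_t peak_depth_left(size_t n)
{
    size_t depth = 0, peak = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            depth++;        /* shift ',' */
        depth++;            /* shift item */
        if (depth > peak)
            peak = depth;
        depth = 1;          /* reduce at once: a single seq remains */
    }
    return peak;
}

/* Same input parsed with the right-recursive rule
   seq: item | item ',' seq. */
size_t peak_depth_right(size_t n)
{
    size_t depth = 0, peak = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            depth++;        /* shift ',' */
        depth++;            /* shift item; nothing can be reduced yet */
        if (depth > peak)
            peak = depth;
    }
    /* Reductions begin only after the last item has been shifted,
       so the peak has already been reached here. */
    return peak;
}

/* Assumption of this sketch: overflow means the peak exceeds the limit. */
int exceeds_limit(size_t peak, size_t max_depth)
{
    return peak > max_depth;
}
```

In this model the left-recursive rule never needs more than three symbols, whereas the right-recursive rule needs 2@var{n} - 1, which grows past any fixed limit as the input gets longer.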
+ +The stack space allowed is not necessarily allocated. If you specify a +large value for @code{YYMAXDEPTH}, the parser normally allocates a small +stack at first, and then makes it bigger by stages as needed. This +increasing allocation happens automatically and silently. Therefore, +you do not need to make @code{YYMAXDEPTH} painfully small merely to save +space for ordinary inputs that do not need much stack. + +However, do not allow @code{YYMAXDEPTH} to be a value so large that +arithmetic overflow could occur when calculating the size of the stack +space. Also, do not allow @code{YYMAXDEPTH} to be less than +@code{YYINITDEPTH}. + +@cindex default stack limit +The default value of @code{YYMAXDEPTH}, if you do not define it, is +10000. + +@vindex YYINITDEPTH +You can control how much stack is allocated initially by defining the +macro @code{YYINITDEPTH} to a positive integer. For the deterministic +parser in C, this value must be a compile-time constant +unless you are assuming C99 or some other target language or compiler +that allows variable-length arrays. The default is 200. + +Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. + +@c FIXME: C++ output. +Because of semantic differences between C and C++, the stacks of the deterministic +parsers in C produced by Bison cannot grow when compiled +by C++ compilers. In this precise case (compiling a C parser as C++), we +suggest increasing @code{YYINITDEPTH}. The Bison maintainers hope to fix +this deficiency in a future release. + +@node Error Recovery +@chapter Error Recovery +@cindex error recovery +@cindex recovery from errors + +It is not usually acceptable to have a program terminate on a syntax +error. For example, a compiler should recover sufficiently to parse the +rest of the input file and check it for errors; a calculator should accept +another expression.
+ +In a simple interactive command parser where each input is one line, it may +be sufficient to allow @code{yyparse} to return 1 on error and have the +caller ignore the rest of the input line when that happens (and then call +@code{yyparse} again). But this is inadequate for a compiler, because it +forgets all the syntactic context leading up to the error. A syntax error +deep within a function in the compiler input should not cause the compiler +to treat the following line like the beginning of a source file. + +@findex error +You can define how to recover from a syntax error by writing rules to +recognize the special token @code{error}. This is a terminal symbol that +is always defined (you need not declare it) and reserved for error +handling. The Bison parser generates an @code{error} token whenever a +syntax error happens; if you have provided a rule to recognize this token +in the current context, the parse can continue. + +For example: + +@example +stmts: + /* empty string */ +| stmts '\n' +| stmts exp '\n' +| stmts error '\n' +@end example + +The fourth rule in this example says that an error followed by a newline +makes a valid addition to any @code{stmts}. + +What happens if a syntax error occurs in the middle of an @code{exp}? The +error recovery rule, interpreted strictly, applies to the precise sequence +of a @code{stmts}, an @code{error} and a newline. If an error occurs in +the middle of an @code{exp}, there will probably be some additional tokens +and subexpressions on the stack after the last @code{stmts}, and there +will be tokens to read before the next newline. So the rule is not +applicable in the ordinary way. + +But Bison can force the situation to fit the rule, by discarding part of +the semantic context and part of the input. First it discards states +and objects from the stack until it gets back to a state in which the +@code{error} token is acceptable. 
(This means that the subexpressions +already parsed are discarded, back to the last complete @code{stmts}.) +At this point the @code{error} token can be shifted. Then, if the old +lookahead token is not acceptable to be shifted next, the parser reads +tokens and discards them until it finds a token which is acceptable. In +this example, Bison reads and discards input until the next newline so +that the fourth rule can apply. Note that discarded symbols are +possible sources of memory leaks, see @ref{Destructor Decl, , Freeing +Discarded Symbols}, for a means to reclaim this memory. + +The choice of error rules in the grammar is a choice of strategies for +error recovery. A simple and useful strategy is simply to skip the rest of +the current input line or current statement if an error is detected: + +@example +stmt: error ';' /* On error, skip until ';' is read. */ +@end example + +It is also useful to recover to the matching close-delimiter of an +opening-delimiter that has already been parsed. Otherwise the +close-delimiter will probably appear to be unmatched, and generate another, +spurious error message: + +@example +primary: + '(' expr ')' +| '(' error ')' +@dots{} +; +@end example + +Error recovery strategies are necessarily guesses. When they guess wrong, +one syntax error often leads to another. In the above example, the error +recovery rule guesses that an error is due to bad input within one +@code{stmt}. Suppose that instead a spurious semicolon is inserted in the +middle of a valid @code{stmt}. After the error recovery rule recovers +from the first error, another syntax error will be found straightaway, +since the text following the spurious semicolon is also an invalid +@code{stmt}. + +To prevent an outpouring of error messages, the parser will output no error +message for another syntax error that happens shortly after the first; only +after three consecutive input tokens have been successfully shifted will +error messages resume. 
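The suppression rule just described can be modeled with a small counter, sketched here in C. This is an illustration of the behavior stated in the text, not the generated parser's code: the type @code{recovery_model} and the functions @code{model_shift}, @code{model_syntax_error} and @code{model_errok} are hypothetical names, and the model ignores details such as the shifting of the @code{error} token itself. @code{model_errok} stands in for the effect of the @code{yyerrok} macro documented later in this chapter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of error-message suppression. */
typedef struct {
    int shifts_needed; /* successful shifts still required before
                          error messages resume */
    int messages;      /* number of messages actually emitted */
} recovery_model;

/* A token was shifted successfully. */
void model_shift(recovery_model *m)
{
    assert(m != NULL);
    if (m->shifts_needed > 0)
        m->shifts_needed--;
}

/* A syntax error was detected.  Returns 1 if a message is emitted,
   0 if it is suppressed.  Either way, three consecutive shifts are
   required again before the next message. */
int model_syntax_error(recovery_model *m)
{
    int report;
    assert(m != NULL);
    report = (m->shifts_needed == 0);
    if (report)
        m->messages++;
    m->shifts_needed = 3;
    return report;
}

/* Resume error messages immediately. */
void model_errok(recovery_model *m)
{
    assert(m != NULL);
    m->shifts_needed = 0;
}
```

Starting from a zeroed model, a first error is reported; a second error after only two shifts is suppressed and restarts the count; an error after three further shifts is reported again.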
+ +Note that rules which accept the @code{error} token may have actions, just +as any other rules can. + +@findex yyerrok +You can make error messages resume immediately by using the macro +@code{yyerrok} in an action. If you do this in the error rule's action, no +error messages will be suppressed. This macro requires no arguments; +@samp{yyerrok;} is a valid C statement. + +@findex yyclearin +The previous lookahead token is reanalyzed immediately after an error. If +this is unacceptable, then the macro @code{yyclearin} may be used to clear +this token. Write the statement @samp{yyclearin;} in the error rule's +action. +@xref{Action Features, ,Special Features for Use in Actions}. + +For example, suppose that on a syntax error, an error handling routine is +called that advances the input stream to some point where parsing should +once again commence. The next symbol returned by the lexical scanner is +probably correct. The previous lookahead token ought to be discarded +with @samp{yyclearin;}. + +@vindex YYRECOVERING +The expression @code{YYRECOVERING ()} yields 1 when the parser +is recovering from a syntax error, and 0 otherwise. +Syntax error diagnostics are suppressed while recovering from a syntax +error. + +@node Context Dependency +@chapter Handling Context Dependencies + +The Bison paradigm is to parse tokens first, then group them into larger +syntactic units. In many languages, the meaning of a token is affected by +its context. Although this violates the Bison paradigm, certain techniques +(known as @dfn{kludges}) may enable you to write Bison parsers for such +languages. + +@menu +* Semantic Tokens:: Token parsing can depend on the semantic context. +* Lexical Tie-ins:: Token parsing can depend on the syntactic context. +* Tie-in Recovery:: Lexical tie-ins have implications for how + error recovery rules must be written. +@end menu + +(Actually, ``kludge'' means any technique that gets its job done but is +neither clean nor robust.) 
+ +@node Semantic Tokens +@section Semantic Info in Token Types + +The C language has a context dependency: the way an identifier is used +depends on what its current meaning is. For example, consider this: + +@example +foo (x); +@end example + +This looks like a function call statement, but if @code{foo} is a typedef +name, then this is actually a declaration of @code{x}. How can a Bison +parser for C decide how to parse this input? + +The method used in GNU C is to have two different token types, +@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an +identifier, it looks up the current declaration of the identifier in order +to decide which token type to return: @code{TYPENAME} if the identifier is +declared as a typedef, @code{IDENTIFIER} otherwise. + +The grammar rules can then express the context dependency by the choice of +token type to recognize. @code{IDENTIFIER} is accepted as an expression, +but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but +@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier +is @emph{not} significant, such as in declarations that can shadow a +typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is +accepted---there is one rule for each of the two token types. + +This technique is simple to use if the decision of which kinds of +identifiers to allow is made at a place close to where the identifier is +parsed. But in C this is not always so: C allows a declaration to +redeclare a typedef name provided an explicit type has been specified +earlier: + +@example +typedef int foo, bar; +int baz (void) +@group +@{ + static bar (bar); /* @r{redeclare @code{bar} as static variable} */ + extern foo foo (foo); /* @r{redeclare @code{foo} as function} */ + return foo (bar); +@} +@end group +@end example + +Unfortunately, the name being declared is separated from the declaration +construct itself by a complicated syntactic structure---the ``declarator''. 
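The token-type decision described earlier in this section, where the scanner returns @code{TYPENAME} or @code{IDENTIFIER} depending on the current declaration of a name, can be sketched as a lookup in a table of typedef names. The sketch below is a deliberately simplified illustration and not how GNU C implements it: the table, the functions @code{declare_typedef} and @code{classify_identifier}, and the numeric token values are all hypothetical, and scopes (and therefore redeclarations that shadow a typedef) are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real parser takes them from the
   header that Bison generates. */
enum token_kind { IDENTIFIER = 1, TYPENAME = 2 };

#define MAX_TYPEDEFS 64

/* Flat table of names currently declared as typedefs.  The table
   stores the pointers it is given, so the strings must outlive it. */
static const char *typedef_names[MAX_TYPEDEFS];
static size_t typedef_count;

/* Record NAME as a typedef.  Returns 0 on success, -1 if the table
   is full. */
int declare_typedef(const char *name)
{
    assert(name != NULL);
    if (typedef_count == MAX_TYPEDEFS)
        return -1;
    typedef_names[typedef_count++] = name;
    return 0;
}

/* What the scanner would do after reading an identifier: return
   TYPENAME if the name is a known typedef, IDENTIFIER otherwise. */
enum token_kind classify_identifier(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPENAME;
    return IDENTIFIER;
}
```

In such a design, the action for a typedef declaration would call @code{declare_typedef}, and @code{yylex} would call @code{classify_identifier} on each identifier it reads. A realistic version needs a scoped symbol table so that the redeclarations shown above are handled correctly.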
+ +As a result, part of the Bison parser for C needs to be duplicated, with +all the nonterminal names changed: once for parsing a declaration in +which a typedef name can be redefined, and once for parsing a +declaration in which that can't be done. Here is a part of the +duplication, with actions omitted for brevity: + +@example +@group +initdcl: + declarator maybeasm '=' init +| declarator maybeasm +; +@end group + +@group +notype_initdcl: + notype_declarator maybeasm '=' init +| notype_declarator maybeasm +; +@end group +@end example + +@noindent +Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl} +cannot. The distinction between @code{declarator} and +@code{notype_declarator} is the same sort of thing. + +There is some similarity between this technique and a lexical tie-in +(described next), in that information which alters the lexical analysis is +changed during parsing by other parts of the program. The difference is +that here the information is global, and is used for other purposes in the +program. A true lexical tie-in has a special-purpose flag controlled by +the syntactic context. + +@node Lexical Tie-ins +@section Lexical Tie-ins +@cindex lexical tie-in + +One way to handle context-dependency is the @dfn{lexical tie-in}: a flag +which is set by Bison actions, whose purpose is to alter the way tokens are +parsed. + +For example, suppose we have a language vaguely like C, but with a special +construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes +an expression in parentheses in which all integers are hexadecimal. In +particular, the token @samp{a1b} must be treated as an integer rather than +as an identifier if it appears in that context.
Here is how you can do it: + +@example +@group +%@{ + int hexflag; + int yylex (void); + void yyerror (char const *); +%@} +%% +@dots{} +@end group +@group +expr: + IDENTIFIER +| constant +| HEX '(' @{ hexflag = 1; @} + expr ')' @{ hexflag = 0; $$ = $4; @} +| expr '+' expr @{ $$ = make_sum ($1, $3); @} +@dots{} +; +@end group + +@group +constant: + INTEGER +| STRING +; +@end group +@end example + +@noindent +Here we assume that @code{yylex} looks at the value of @code{hexflag}; when +it is nonzero, all integers are parsed in hexadecimal, and tokens starting +with letters are parsed as integers if possible. + +The declaration of @code{hexflag} shown in the prologue of the grammar +file is needed to make it accessible to the actions (@pxref{Prologue, +,The Prologue}). You must also write the code in @code{yylex} to obey +the flag. + +@node Tie-in Recovery +@section Lexical Tie-ins and Error Recovery + +Lexical tie-ins make strict demands on any error recovery rules you have. +@xref{Error Recovery}. + +The reason for this is that the purpose of an error recovery rule is to +abort the parsing of one construct and resume in some larger construct. +For example, in C-like languages, a typical error recovery rule is to skip +tokens until the next semicolon, and then start a new statement, like this: + +@example +stmt: + expr ';' +| IF '(' expr ')' stmt @{ @dots{} @} +@dots{} +| error ';' @{ hexflag = 0; @} +; +@end example + +If there is a syntax error in the middle of a @samp{hex (@var{expr})} +construct, this error rule will apply, and then the action for the +completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would +remain set for the entire rest of the input, or until the next @code{hex} +keyword, causing identifiers to be misinterpreted as integers. + +To avoid this problem the error recovery rule itself clears @code{hexflag}. + +There may also be an error recovery rule that works within expressions. 
+For example, there could be a rule which applies within parentheses +and skips to the close-parenthesis: + +@example +@group +expr: + @dots{} +| '(' expr ')' @{ $$ = $2; @} +| '(' error ')' +@dots{} +@end group +@end example + +If this rule acts within the @code{hex} construct, it is not going to abort +that construct (since it applies to an inner level of parentheses within +the construct). Therefore, it should not clear the flag: the rest of +the @code{hex} construct should be parsed with the flag still in effect. + +What if there is an error recovery rule which might abort out of the +@code{hex} construct or might not, depending on circumstances? There is no +way you can write the action to determine whether a @code{hex} construct is +being aborted or not. So if you are using a lexical tie-in, you had better +make sure your error recovery rules are not of this kind. Each rule must +be such that you can be sure that it always will, or always won't, have to +clear the flag. + +@c ================================================== Debugging Your Parser + +@node Debugging +@chapter Debugging Your Parser + +Developing a parser can be a challenge, especially if you don't understand +the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This +chapter explains how to generate and read the detailed description of the +automaton, and how to enable and understand the parser run-time traces. + +@menu +* Understanding:: Understanding the structure of your parser. +* Tracing:: Tracing the execution of your parser. +@end menu + +@node Understanding +@section Understanding Your Parser + +As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm}) +Bison parsers are @dfn{shift/reduce automata}. In some cases (much more +frequent than one would hope), looking at this automaton is required to +tune or simply fix a parser. Bison provides two different +representations of it, either textually or graphically (as a DOT file).
+ +The textual file is generated when the options @option{--report} or +@option{--verbose} are specified, see @ref{Invocation, , Invoking +Bison}. Its name is made by removing @samp{.tab.c} or @samp{.c} from +the parser implementation file name, and adding @samp{.output} +instead. Therefore, if the grammar file is @file{foo.y}, then the +parser implementation file is called @file{foo.tab.c} by default. As +a consequence, the verbose output file is called @file{foo.output}. + +The following grammar file, @file{calc.y}, will be used in the sequel: + +@example +%token NUM STR +%left '+' '-' +%left '*' +%% +exp: + exp '+' exp +| exp '-' exp +| exp '*' exp +| exp '/' exp +| NUM +; +useless: STR; +%% +@end example + +@command{bison} reports: + +@example +calc.y: warning: 1 nonterminal useless in grammar +calc.y: warning: 1 rule useless in grammar +calc.y:11.1-7: warning: nonterminal useless in grammar: useless +calc.y:11.10-12: warning: rule useless in grammar: useless: STR +calc.y: conflicts: 7 shift/reduce +@end example + +When given @option{--report=state}, in addition to @file{calc.tab.c}, it +creates a file @file{calc.output} with contents detailed below. The +order of the output and the exact presentation might vary, but the +interpretation is the same. + +@noindent +@cindex token, useless +@cindex useless token +@cindex nonterminal, useless +@cindex useless nonterminal +@cindex rule, useless +@cindex useless rule +The first section reports useless tokens, nonterminals and rules. Useless +nonterminals and rules are removed in order to produce a smaller parser, but +useless tokens are preserved, since they might be used by the scanner (note +the difference between ``useless'' and ``unused'' below): + +@example +Nonterminals useless in grammar + useless + +Terminals unused in grammar + STR + +Rules useless in grammar + 6 useless: STR +@end example + +@noindent +The next section lists states that still have conflicts. 
+ +@example +State 8 conflicts: 1 shift/reduce +State 9 conflicts: 1 shift/reduce +State 10 conflicts: 1 shift/reduce +State 11 conflicts: 4 shift/reduce +@end example + +@noindent +Then Bison reproduces the exact grammar it used: + +@example +Grammar + + 0 $accept: exp $end + + 1 exp: exp '+' exp + 2 | exp '-' exp + 3 | exp '*' exp + 4 | exp '/' exp + 5 | NUM +@end example + +@noindent +and reports the uses of the symbols: + +@example +@group +Terminals, with rules where they appear + +$end (0) 0 +'*' (42) 3 +'+' (43) 1 +'-' (45) 2 +'/' (47) 4 +error (256) +NUM (258) 5 +STR (259) +@end group + +@group +Nonterminals, with rules where they appear + +$accept (9) + on left: 0 +exp (10) + on left: 1 2 3 4 5, on right: 0 1 2 3 4 +@end group +@end example + +@noindent +@cindex item +@cindex pointed rule +@cindex rule, pointed +Bison then proceeds onto the automaton itself, describing each state +with its set of @dfn{items}, also known as @dfn{pointed rules}. Each +item is a production rule together with a point (@samp{.}) marking +the location of the input cursor. + +@example +state 0 + + 0 $accept: . exp $end + + NUM shift, and go to state 1 + + exp go to state 2 +@end example + +This reads as follows: ``state 0 corresponds to being at the very +beginning of the parsing, in the initial rule, right before the start +symbol (here, @code{exp}). When the parser returns to this state right +after having reduced a rule that produced an @code{exp}, the control +flow jumps to state 2. If there is no such transition on a nonterminal +symbol, and the lookahead is a @code{NUM}, then this token is shifted onto +the parse stack, and the control flow jumps to state 1. 
Any other +lookahead triggers a syntax error.'' + +@cindex core, item set +@cindex item set core +@cindex kernel, item set +@cindex item set core +Even though the only active rule in state 0 seems to be rule 0, the +report lists @code{NUM} as a lookahead token because @code{NUM} can be +at the beginning of any rule deriving an @code{exp}. By default Bison +reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if +you want to see more detail you can invoke @command{bison} with +@option{--report=itemset} to list the derived items as well: + +@example +state 0 + + 0 $accept: . exp $end + 1 exp: . exp '+' exp + 2 | . exp '-' exp + 3 | . exp '*' exp + 4 | . exp '/' exp + 5 | . NUM + + NUM shift, and go to state 1 + + exp go to state 2 +@end example + +@noindent +In the state 1@dots{} + +@example +state 1 + + 5 exp: NUM . + + $default reduce using rule 5 (exp) +@end example + +@noindent +the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token +(@samp{$default}), the parser will reduce it. If it was coming from +state 0, then, after this reduction it will return to state 0, and will +jump to state 2 (@samp{exp: go to state 2}). + +@example +state 2 + + 0 $accept: exp . $end + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp + + $end shift, and go to state 3 + '+' shift, and go to state 4 + '-' shift, and go to state 5 + '*' shift, and go to state 6 + '/' shift, and go to state 7 +@end example + +@noindent +In state 2, the automaton can only shift a symbol. For instance, +because of the item @samp{exp: exp . '+' exp}, if the lookahead is +@samp{+} it is shifted onto the parse stack, and the automaton +jumps to state 4, corresponding to the item @samp{exp: exp '+' . exp}. +Since there is no default action, any lookahead not listed triggers a syntax +error. + +@cindex accepting state +The state 3 is named the @dfn{final state}, or the @dfn{accepting +state}: + +@example +state 3 + + 0 $accept: exp $end . 
+ + $default accept +@end example + +@noindent +the initial rule is completed (the start symbol and the end-of-input were +read), the parsing exits successfully. + +The interpretation of states 4 to 7 is straightforward, and is left to +the reader. + +@example +state 4 + + 1 exp: exp '+' . exp + + NUM shift, and go to state 1 + + exp go to state 8 + + +state 5 + + 2 exp: exp '-' . exp + + NUM shift, and go to state 1 + + exp go to state 9 + + +state 6 + + 3 exp: exp '*' . exp + + NUM shift, and go to state 1 + + exp go to state 10 + + +state 7 + + 4 exp: exp '/' . exp + + NUM shift, and go to state 1 + + exp go to state 11 +@end example + +As was announced at the beginning of the report, @samp{State 8 conflicts: +1 shift/reduce}: + +@example +state 8 + + 1 exp: exp . '+' exp + 1 | exp '+' exp . + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp + + '*' shift, and go to state 6 + '/' shift, and go to state 7 + + '/' [reduce using rule 1 (exp)] + $default reduce using rule 1 (exp) +@end example + +Indeed, there are two actions associated with the lookahead @samp{/}: +either shifting (and going to state 7), or reducing rule 1. The +conflict means that either the grammar is ambiguous, or the parser lacks +information to make the right decision. The grammar is indeed +ambiguous: since we did not specify the precedence of @samp{/}, the +sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM / +NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) / +NUM}, which corresponds to reducing rule 1. + +Because in deterministic parsing only a single decision can be made, Bison +arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, , +Shift/Reduce Conflicts}. Discarded actions are reported between +square brackets. + +Note that all the previous states had a single possible action: either +shifting the next token and going to the corresponding state, or +reducing a single rule.
In the other cases, i.e., when shifting +@emph{and} reducing is possible or when @emph{several} reductions are +possible, the lookahead is required to select the action. State 8 is +one such state: if the lookahead is @samp{*} or @samp{/} then the action +is shifting, otherwise the action is reducing rule 1. In other words, +the first two items, corresponding to rule 1, are not eligible when the +lookahead token is @samp{*}, since we specified that @samp{*} has higher +precedence than @samp{+}. More generally, some items are eligible only +with some set of possible lookahead tokens. When run with +@option{--report=lookahead}, Bison specifies these lookahead tokens: + +@example +state 8 + + 1 exp: exp . '+' exp + 1 | exp '+' exp . [$end, '+', '-', '/'] + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp + + '*' shift, and go to state 6 + '/' shift, and go to state 7 + + '/' [reduce using rule 1 (exp)] + $default reduce using rule 1 (exp) +@end example + +Note however that while @samp{NUM + NUM / NUM} is ambiguous (which results in +the conflicts on @samp{/}), @samp{NUM + NUM * NUM} is not: the conflict was +solved thanks to associativity and precedence directives. If invoked with +@option{--report=solved}, Bison includes information about the solved +conflicts in the report: + +@example +Conflict between rule 1 and token '+' resolved as reduce (%left '+'). +Conflict between rule 1 and token '-' resolved as reduce (%left '-'). +Conflict between rule 1 and token '*' resolved as shift ('+' < '*'). +@end example + + +The remaining states are similar: + +@example +@group +state 9 + + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 2 | exp '-' exp . + 3 | exp . '*' exp + 4 | exp . '/' exp + + '*' shift, and go to state 6 + '/' shift, and go to state 7 + + '/' [reduce using rule 2 (exp)] + $default reduce using rule 2 (exp) +@end group + +@group +state 10 + + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 3 | exp . '*' exp + 3 | exp '*' exp . + 4 | exp . 
'/' exp + + '/' shift, and go to state 7 + + '/' [reduce using rule 3 (exp)] + $default reduce using rule 3 (exp) +@end group + +@group +state 11 + + 1 exp: exp . '+' exp + 2 | exp . '-' exp + 3 | exp . '*' exp + 4 | exp . '/' exp + 4 | exp '/' exp . + + '+' shift, and go to state 4 + '-' shift, and go to state 5 + '*' shift, and go to state 6 + '/' shift, and go to state 7 + + '+' [reduce using rule 4 (exp)] + '-' [reduce using rule 4 (exp)] + '*' [reduce using rule 4 (exp)] + '/' [reduce using rule 4 (exp)] + $default reduce using rule 4 (exp) +@end group +@end example + +@noindent +Observe that state 11 contains conflicts not only due to the lack of +precedence of @samp{/} with respect to @samp{+}, @samp{-}, and +@samp{*}, but also because the +associativity of @samp{/} is not specified. + + +@node Tracing +@section Tracing Your Parser +@findex yydebug +@cindex debugging +@cindex tracing the parser + +When a Bison grammar compiles properly but parses ``incorrectly'', the +@code{yydebug} parser-trace feature helps figuring out why. + +@menu +* Enabling Traces:: Activating run-time trace support +* Mfcalc Traces:: Extending @code{mfcalc} to support traces +* The YYPRINT Macro:: Obsolete interface for semantic value reports +@end menu + +@node Enabling Traces +@subsection Enabling Traces +There are several means to enable compilation of trace facilities: + +@table @asis +@item the macro @code{YYDEBUG} +@findex YYDEBUG +Define the macro @code{YYDEBUG} to a nonzero value when you compile the +parser. This is compliant with POSIX Yacc. You could use +@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define +YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The +Prologue}). + +@item the option @option{-t}, @option{--debug} +Use the @samp{-t} option when you run Bison (@pxref{Invocation, +,Invoking Bison}). This is POSIX compliant too. 
+ +@item the directive @samp{%debug} +@findex %debug +Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison +Declaration Summary}). This is a Bison extension, which will prove +useful when Bison outputs parsers for languages that don't use a +preprocessor. Unless POSIX and Yacc portability matter to +you, this is +the preferred solution. +@end table + +We suggest that you always enable the debug option so that debugging is +always possible. + +@findex YYFPRINTF +The trace facility outputs messages with macro calls of the form +@code{YYFPRINTF (stderr, @var{format}, @var{args})} where +@var{format} and @var{args} are the usual @code{printf} format and variadic +arguments. If you define @code{YYDEBUG} to a nonzero value but do not +define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included +and @code{YYFPRINTF} is defined to @code{fprintf}. + +Once you have compiled the program with trace facilities, the way to +request a trace is to store a nonzero value in the variable @code{yydebug}. +You can do this by making the C code do it (in @code{main}, perhaps), or +you can alter the value with a C debugger. + +Each step taken by the parser when @code{yydebug} is nonzero produces a +line or two of trace information, written on @code{stderr}. The trace +messages tell you these things: + +@itemize @bullet +@item +Each time the parser calls @code{yylex}, what kind of token was read. + +@item +Each time a token is shifted, the depth and complete contents of the +state stack (@pxref{Parser States}). + +@item +Each time a rule is reduced, which rule it is, and the complete contents +of the state stack afterward. +@end itemize + +To make sense of this information, it helps to refer to the automaton +description file (@pxref{Understanding, ,Understanding Your Parser}). +This file shows the meaning of each state in terms of +positions in various rules, and also what each state will do with each +possible input token.
As you read the successive trace messages, you +can see that the parser is functioning according to its specification in +the listing file. Eventually you will arrive at the place where +something undesirable happens, and you will see which parts of the +grammar are to blame. + +The parser implementation file is a C/C++/Java program and you can use +debuggers on it, but it's not easy to interpret what it is doing. The +parser function is a finite-state machine interpreter, and aside from +the actions, it executes the same code over and over. Only the values +of variables show where in the grammar it is working. + +@node Mfcalc Traces +@subsection Enabling Debug Traces for @code{mfcalc} + +The debugging information normally gives the token type of each token read, +but not its semantic value. The @code{%printer} directive allows you to specify +how semantic values are reported, see @ref{Printer Decl, , Printing +Semantic Values}. For backward compatibility, Yacc-like C parsers may also +use the @code{YYPRINT} macro (@pxref{The YYPRINT Macro, , The @code{YYPRINT} +Macro}), but its use is discouraged. + +As a demonstration of @code{%printer}, consider the multi-function +calculator, @code{mfcalc} (@pxref{Multi-function Calc}). To enable run-time +traces, and semantic value reports, insert the following directives in its +prologue: + +@comment file: mfcalc.y: 2 +@example +/* Generate the parser description file. */ +%verbose +/* Enable run-time traces (yydebug). */ +%define parse.trace + +/* Formatting semantic values. */ +%printer @{ fprintf (yyoutput, "%s", $$->name); @} VAR; +%printer @{ fprintf (yyoutput, "%s()", $$->name); @} FNCT; +%printer @{ fprintf (yyoutput, "%g", $$); @} <val>; +@end example + +The @code{%define} directive instructs Bison to generate run-time trace +support. Then, activation of these traces is controlled at run-time by the +@code{yydebug} variable, which is disabled by default.
Because these traces +will refer to the ``states'' of the parser, it is helpful to ask for the +creation of a description of that parser; this is the purpose of the (admittedly +ill-named) @code{%verbose} directive. + +The set of @code{%printer} directives demonstrates how to format the +semantic value in the traces. Note that the specification can be done +either on the symbol type (e.g., @code{VAR} or @code{FNCT}), or on the type +tag: since @code{<val>} is the type for both @code{NUM} and @code{exp}, this +printer will be used for them. + +Here is a sample of the information provided by run-time traces. The traces +are sent to standard error. + +@example +$ @kbd{echo 'sin(1-1)' | ./mfcalc -p} +Starting parse +Entering state 0 +Reducing stack by rule 1 (line 34): +-> $$ = nterm input () +Stack now 0 +Entering state 1 +@end example + +@noindent +This first batch shows a specific feature of this grammar: the first rule +(which is in line 34 of @file{mfcalc.y}) can be reduced without even having +to look for the first token. The resulting left-hand symbol (@code{$$}) is +a valueless (@samp{()}) @code{input} nonterminal (@code{nterm}). + +Then the parser calls the scanner. +@example +Reading a token: Next token is token FNCT (sin()) +Shifting token FNCT (sin()) +Entering state 6 +@end example + +@noindent +That token (@code{token}) is a function (@code{FNCT}) whose value is +@samp{sin} as formatted per our @code{%printer} specification: @samp{sin()}. +The parser stores (@code{Shifting}) that token, and others, until it can do +something about it.
+ +@example +Reading a token: Next token is token '(' () +Shifting token '(' () +Entering state 14 +Reading a token: Next token is token NUM (1.000000) +Shifting token NUM (1.000000) +Entering state 4 +Reducing stack by rule 6 (line 44): + $1 = token NUM (1.000000) +-> $$ = nterm exp (1.000000) +Stack now 0 1 6 14 +Entering state 24 +@end example + +@noindent +The previous reduction demonstrates the @code{%printer} directive for +@code{<val>}: both the token @code{NUM} and the resulting non-terminal +@code{exp} have @samp{1} as value. + +@example +Reading a token: Next token is token '-' () +Shifting token '-' () +Entering state 17 +Reading a token: Next token is token NUM (1.000000) +Shifting token NUM (1.000000) +Entering state 4 +Reducing stack by rule 6 (line 44): + $1 = token NUM (1.000000) +-> $$ = nterm exp (1.000000) +Stack now 0 1 6 14 24 17 +Entering state 26 +Reading a token: Next token is token ')' () +Reducing stack by rule 11 (line 49): + $1 = nterm exp (1.000000) + $2 = token '-' () + $3 = nterm exp (1.000000) +-> $$ = nterm exp (0.000000) +Stack now 0 1 6 14 +Entering state 24 +@end example + +@noindent +The rule for the subtraction was just reduced. The parser is about to +discover the end of the call to @code{sin}. + +@example +Next token is token ')' () +Shifting token ')' () +Entering state 31 +Reducing stack by rule 9 (line 47): + $1 = token FNCT (sin()) + $2 = token '(' () + $3 = nterm exp (0.000000) + $4 = token ')' () +-> $$ = nterm exp (0.000000) +Stack now 0 1 +Entering state 11 +@end example + +@noindent +Finally, the end-of-line allows the parser to complete the computation, and +display its result.
+ +@example +Reading a token: Next token is token '\n' () +Shifting token '\n' () +Entering state 22 +Reducing stack by rule 4 (line 40): + $1 = nterm exp (0.000000) + $2 = token '\n' () +@result{} 0 +-> $$ = nterm line () +Stack now 0 1 +Entering state 10 +Reducing stack by rule 2 (line 35): + $1 = nterm input () + $2 = nterm line () +-> $$ = nterm input () +Stack now 0 +Entering state 1 +@end example + +The parser has returned into state 1, in which it is waiting for the next +expression to evaluate, or for the end-of-file token, which causes the +completion of the parsing. + +@example +Reading a token: Now at end of input. +Shifting token $end () +Entering state 2 +Stack now 0 1 2 +Cleanup: popping token $end () +Cleanup: popping nterm input () +@end example + + +@node The YYPRINT Macro +@subsection The @code{YYPRINT} Macro + +@findex YYPRINT +Before @code{%printer} support, semantic values could be displayed using the +@code{YYPRINT} macro, which works only for terminal symbols and only with +the @file{yacc.c} skeleton. + +@deffn {Macro} YYPRINT (@var{stream}, @var{token}, @var{value}); +@findex YYPRINT +If you define @code{YYPRINT}, it should take three arguments. The parser +will pass a standard I/O stream, the numeric code for the token type, and +the token value (from @code{yylval}). + +For @file{yacc.c} only. Obsoleted by @code{%printer}. 
+@end deffn + +Here is an example of @code{YYPRINT} suitable for the multi-function +calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}): + +@example +%@{ + static void print_token_value (FILE *, int, YYSTYPE); + #define YYPRINT(File, Type, Value) \ + print_token_value (File, Type, Value) +%@} + +@dots{} %% @dots{} %% @dots{} + +static void +print_token_value (FILE *file, int type, YYSTYPE value) +@{ + if (type == VAR) + fprintf (file, "%s", value.tptr->name); + else if (type == NUM) + fprintf (file, "%g", value.val); +@} +@end example + +@c ================================================= Invoking Bison + +@node Invocation +@chapter Invoking Bison +@cindex invoking Bison +@cindex Bison invocation +@cindex options for invoking Bison + +The usual way to invoke Bison is as follows: + +@example +bison @var{infile} +@end example + +Here @var{infile} is the grammar file name, which usually ends in +@samp{.y}. The parser implementation file's name is made by replacing +the @samp{.y} with @samp{.tab.c} and removing any leading directory. +Thus, @samp{bison foo.y} yields @file{foo.tab.c}, and +@samp{bison hack/foo.y} also yields @file{foo.tab.c}. It's +also possible, in case you are writing C++ code instead of C in your +grammar file, to name it @file{foo.ypp} or @file{foo.y++}. Then, the +output files will take an extension like the given one as input +(respectively @file{foo.tab.cpp} and @file{foo.tab.c++}). This +feature takes effect with all options that manipulate file names like +@samp{-o} or @samp{-d}. + +For example: + +@example +bison -d @var{infile.yxx} +@end example +@noindent +will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}, and + +@example +bison -d -o @var{output.c++} @var{infile.y} +@end example +@noindent +will produce @file{output.c++} and @file{output.h++}.
+ +For compatibility with POSIX, the standard Bison +distribution also contains a shell script called @command{yacc} that +invokes Bison with the @option{-y} option. + +@menu +* Bison Options:: All the options described in detail, + in alphabetical order by short options. +* Option Cross Key:: Alphabetical list of long options. +* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}. +@end menu + +@node Bison Options +@section Bison Options + +Bison supports both traditional single-letter options and mnemonic long +option names. Long option names are indicated with @samp{--} instead of +@samp{-}. Abbreviations for option names are allowed as long as they +are unique. When a long option takes an argument, like +@samp{--file-prefix}, connect the option name and the argument with +@samp{=}. + +Here is a list of options that can be used with Bison, alphabetized by +short option. It is followed by a cross key alphabetized by long +option. + +@c Please, keep this ordered as in `bison --help'. +@noindent +Operations modes: +@table @option +@item -h +@itemx --help +Print a summary of the command-line options to Bison and exit. + +@item -V +@itemx --version +Print the version number of Bison and exit. + +@item --print-localedir +Print the name of the directory containing locale-dependent data. + +@item --print-datadir +Print the name of the directory containing skeletons and XSLT. + +@item -y +@itemx --yacc +Act more like the traditional Yacc command. This can cause different +diagnostics to be generated, and may change behavior in other minor +ways. Most importantly, imitate Yacc's output file name conventions, +so that the parser implementation file is called @file{y.tab.c}, and +the other outputs are called @file{y.output} and @file{y.tab.h}. +Also, if generating a deterministic parser in C, generate +@code{#define} statements in addition to an @code{enum} to associate +token numbers with token names. 
Thus, the following shell script can +substitute for Yacc, and the Bison distribution contains such a script +for compatibility with POSIX: + +@example +#! /bin/sh +bison -y "$@@" +@end example + +The @option{-y}/@option{--yacc} option is intended for use with +traditional Yacc grammars. If your grammar uses a Bison extension +like @samp{%glr-parser}, Bison might not be Yacc-compatible even if +this option is specified. + +@item -W [@var{category}] +@itemx --warnings[=@var{category}] +Output warnings falling in @var{category}. @var{category} can be one +of: +@table @code +@item midrule-values +Warn about mid-rule values that are set but not used within any of the actions +of the parent rule. +For example, warn about unused @code{$2} in: + +@example +exp: '1' @{ $$ = 1; @} '+' exp @{ $$ = $1 + $4; @}; +@end example + +Also warn about mid-rule values that are used but not set. +For example, warn about unset @code{$$} in the mid-rule action in: + +@example +exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @}; +@end example + +These warnings are not enabled by default since they sometimes prove to +be false alarms in existing grammars employing the Yacc constructs +@code{$0} or @code{$-@var{n}} (where @var{n} is some positive integer). + +@item yacc +Incompatibilities with POSIX Yacc. + +@item conflicts-sr +@itemx conflicts-rr +S/R and R/R conflicts. These warnings are enabled by default. However, if +the @code{%expect} or @code{%expect-rr} directive is specified, an +unexpected number of conflicts is an error, and an expected number of +conflicts is not reported, so @option{-W} and @option{--warning} then have +no effect on the conflict report. + +@item other +All warnings not categorized above. These warnings are enabled by default. + +This category is provided merely for the sake of completeness. Future +releases of Bison may move warnings from this category to new, more specific +categories. + +@item all +All the warnings. +@item none +Turn off all the warnings. 
+@item error +Treat warnings as errors. +@end table + +A category can be turned off by prefixing its name with @samp{no-}. For +instance, @option{-Wno-yacc} will hide the warnings about +POSIX Yacc incompatibilities. +@end table + +@noindent +Tuning the parser: + +@table @option +@item -t +@itemx --debug +In the parser implementation file, define the macro @code{YYDEBUG} to +1 if it is not already defined, so that the debugging facilities are +compiled. @xref{Tracing, ,Tracing Your Parser}. + +@item -D @var{name}[=@var{value}] +@itemx --define=@var{name}[=@var{value}] +@itemx -F @var{name}[=@var{value}] +@itemx --force-define=@var{name}[=@var{value}] +Each of these is equivalent to @samp{%define @var{name} "@var{value}"} +(@pxref{%define Summary}) except that Bison processes multiple +definitions for the same @var{name} as follows: + +@itemize +@item +Bison quietly ignores all command-line definitions for @var{name} except +the last. +@item +If that command-line definition is specified by a @code{-D} or +@code{--define}, Bison reports an error for any @code{%define} +definition for @var{name}. +@item +If that command-line definition is specified by a @code{-F} or +@code{--force-define} instead, Bison quietly ignores all @code{%define} +definitions for @var{name}. +@item +Otherwise, Bison reports an error if there are multiple @code{%define} +definitions for @var{name}. +@end itemize + +You should avoid using @code{-F} and @code{--force-define} in your +make files unless you are confident that it is safe to quietly ignore +any conflicting @code{%define} that may be added to the grammar file. + +@item -L @var{language} +@itemx --language=@var{language} +Specify the programming language for the generated parser, as if +@code{%language} was specified (@pxref{Decl Summary, , Bison Declaration +Summary}). Currently supported languages include C, C++, and Java. +@var{language} is case-insensitive. 
+ +This option is experimental and its effect may be modified in future +releases. + +@item --locations +Pretend that @code{%locations} was specified. @xref{Decl Summary}. + +@item -p @var{prefix} +@itemx --name-prefix=@var{prefix} +Pretend that @code{%name-prefix "@var{prefix}"} was specified. +@xref{Decl Summary}. + +@item -l +@itemx --no-lines +Don't put any @code{#line} preprocessor commands in the parser +implementation file. Ordinarily Bison puts them in the parser +implementation file so that the C compiler and debuggers will +associate errors with your source file, the grammar file. This option +causes them to associate errors with the parser implementation file, +treating it as an independent source file in its own right. + +@item -S @var{file} +@itemx --skeleton=@var{file} +Specify the skeleton to use, similar to @code{%skeleton} +(@pxref{Decl Summary, , Bison Declaration Summary}). + +@c You probably don't need this option unless you are developing Bison. +@c You should use @option{--language} if you want to specify the skeleton for a +@c different language, because it is clearer and because it will always +@c choose the correct skeleton for non-deterministic or push parsers. + +If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton +file in the Bison installation directory. +If it does, @var{file} is an absolute file name or a file name relative to the +current working directory. +This is similar to how most shells resolve commands. + +@item -k +@itemx --token-table +Pretend that @code{%token-table} was specified. @xref{Decl Summary}. +@end table + +@noindent +Adjust the output: + +@table @option +@item --defines[=@var{file}] +Pretend that @code{%defines} was specified, i.e., write an extra output +file containing macro definitions for the token type names defined in +the grammar, as well as a few other declarations. @xref{Decl Summary}. 
+ +@item -d +This is the same as @code{--defines} except @code{-d} does not accept a +@var{file} argument since POSIX Yacc requires that @code{-d} can be bundled +with other short options. + +@item -b @var{file-prefix} +@itemx --file-prefix=@var{prefix} +Pretend that @code{%file-prefix} was specified, i.e., specify prefix to use +for all Bison output file names. @xref{Decl Summary}. + +@item -r @var{things} +@itemx --report=@var{things} +Write an extra output file containing verbose description of the comma +separated list of @var{things} among: + +@table @code +@item state +Description of the grammar, conflicts (resolved and unresolved), and +parser's automaton. + +@item lookahead +Implies @code{state} and augments the description of the automaton with +each rule's lookahead set. + +@item itemset +Implies @code{state} and augments the description of the automaton with +the full set of items for each state, instead of its core only. +@end table + +@item --report-file=@var{file} +Specify the @var{file} for the verbose description. + +@item -v +@itemx --verbose +Pretend that @code{%verbose} was specified, i.e., write an extra output +file containing verbose descriptions of the grammar and +parser. @xref{Decl Summary}. + +@item -o @var{file} +@itemx --output=@var{file} +Specify the @var{file} for the parser implementation file. + +The other output files' names are constructed from @var{file} as +described under the @samp{-v} and @samp{-d} options. + +@item -g [@var{file}] +@itemx --graph[=@var{file}] +Output a graphical representation of the parser's +automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz} +@uref{http://www.graphviz.org/doc/info/lang.html, DOT} format. +@code{@var{file}} is optional. +If omitted and the grammar file is @file{foo.y}, the output file will be +@file{foo.dot}. + +@item -x [@var{file}] +@itemx --xml[=@var{file}] +Output an XML report of the parser's automaton computed by Bison. +@code{@var{file}} is optional. 
+If omitted and the grammar file is @file{foo.y}, the output file will be +@file{foo.xml}. +(The current XML schema is experimental and may evolve. +More user feedback will help to stabilize it.) +@end table + +@node Option Cross Key +@section Option Cross Key + +Here is a list of options, alphabetized by long option, to help you find +the corresponding short option and directive. + +@multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}} +@headitem Long Option @tab Short Option @tab Bison Directive +@include cross-options.texi +@end multitable + +@node Yacc Library +@section Yacc Library + +The Yacc library contains default implementations of the +@code{yyerror} and @code{main} functions. These default +implementations are normally not useful, but POSIX requires +them. To use the Yacc library, link your program with the +@option{-ly} option. Note that Bison's implementation of the Yacc +library is distributed under the terms of the GNU General +Public License (@pxref{Copying}). + +If you use the Yacc library's @code{yyerror} function, you should +declare @code{yyerror} as follows: + +@example +int yyerror (char const *); +@end example + +Bison ignores the @code{int} value returned by this @code{yyerror}. +If you use the Yacc library's @code{main} function, your +@code{yyparse} function should have the following type signature: + +@example +int yyparse (void); +@end example + +@c ================================================= C++ Bison + +@node Other Languages +@chapter Parsers Written In Other Languages + +@menu +* C++ Parsers:: The interface to generate C++ parser classes +* Java Parsers:: The interface to generate Java parser classes +@end menu + +@node C++ Parsers +@section C++ Parsers + +@menu +* C++ Bison Interface:: Asking for C++ parser generation +* C++ Semantic Values:: %union vs. 
C++ +* C++ Location Values:: The position and location classes +* C++ Parser Interface:: Instantiating and running the parser +* C++ Scanner Interface:: Exchanges between yylex and parse +* A Complete C++ Example:: Demonstrating their use +@end menu + +@node C++ Bison Interface +@subsection C++ Bison Interface +@c - %skeleton "lalr1.cc" +@c - Always pure +@c - initial action + +The C++ deterministic parser is selected using the skeleton directive, +@samp{%skeleton "lalr1.cc"}, or the synonymous command-line option +@option{--skeleton=lalr1.cc}. +@xref{Decl Summary}. + +When run, @command{bison} will create several entities in the @samp{yy} +namespace. +@findex %define namespace +Use the @samp{%define namespace} directive to change the namespace +name, see @ref{%define Summary,,namespace}. The various classes are +generated in the following files: + +@table @file +@item position.hh +@itemx location.hh +The definition of the classes @code{position} and @code{location}, +used for location tracking. @xref{C++ Location Values}. + +@item stack.hh +An auxiliary class @code{stack} used by the parser. + +@item @var{file}.hh +@itemx @var{file}.cc +(Assuming the extension of the grammar file was @samp{.yy}.) The +declaration and implementation of the C++ parser class. The basename +and extension of these two files follow the same rules as with regular C +parsers (@pxref{Invocation}). + +The header is @emph{mandatory}; you must either pass +@option{-d}/@option{--defines} to @command{bison}, or use the +@samp{%defines} directive. +@end table + +All these files are documented using Doxygen; run @command{doxygen} +for a complete and accurate documentation. + +@node C++ Semantic Values +@subsection C++ Semantic Values +@c - No objects in unions +@c - YYSTYPE +@c - Printer and destructor + +The @code{%union} directive works as for C, see @ref{Union Decl, ,The +Collection of Value Types}. 
In particular it produces a genuine +@code{union}@footnote{In the future techniques to allow complex types +within pseudo-unions (similar to Boost variants) might be implemented to +alleviate these issues.}, which has a few specific features in C++. +@itemize @minus +@item +The type @code{YYSTYPE} is defined but its use is discouraged: rather +you should refer to the parser's encapsulated type +@code{yy::parser::semantic_type}. +@item +Non-POD (Plain Old Data) types cannot be used. C++ forbids any +instance of classes with constructors in unions: only @emph{pointers} +to such objects are allowed. +@end itemize + +Because objects have to be stored via pointers, memory is not +reclaimed automatically: using the @code{%destructor} directive is the +only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded +Symbols}. + + +@node C++ Location Values +@subsection C++ Location Values +@c - %locations +@c - class Position +@c - class Location +@c - %define filename_type "const symbol::Symbol" + +When the directive @code{%locations} is used, the C++ parser supports +location tracking, see @ref{Tracking Locations}. Two auxiliary classes +define a @code{position}, a single point in a file, and a @code{location}, a +range composed of a pair of @code{position}s (possibly spanning several +files). + +@tindex uint +In this section @code{uint} is an abbreviation for @code{unsigned int}: in +genuine code only the latter is used. + +@menu +* C++ position:: One point in the source file +* C++ location:: Two points in the source file +@end menu + +@node C++ position +@subsubsection C++ @code{position} + +@deftypeop {Constructor} {position} {} position (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) +Create a @code{position} denoting a given point. Note that @code{file} is +not reclaimed when the @code{position} is destroyed: memory management must be +handled elsewhere.
+@end deftypeop + +@deftypemethod {position} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) +Reset the position to the given values. +@end deftypemethod + +@deftypeivar {position} {std::string*} file +The name of the file. It will always be handled as a pointer, the +parser will never duplicate nor deallocate it. As an experimental +feature you may change it to @samp{@var{type}*} using @samp{%define +filename_type "@var{type}"}. +@end deftypeivar + +@deftypeivar {position} {uint} line +The line, starting at 1. +@end deftypeivar + +@deftypemethod {position} {uint} lines (int @var{height} = 1) +Advance by @var{height} lines, resetting the column number. +@end deftypemethod + +@deftypeivar {position} {uint} column +The column, starting at 1. +@end deftypeivar + +@deftypemethod {position} {uint} columns (int @var{width} = 1) +Advance by @var{width} columns, without changing the line number. +@end deftypemethod + +@deftypemethod {position} {position&} operator+= (int @var{width}) +@deftypemethodx {position} {position} operator+ (int @var{width}) +@deftypemethodx {position} {position&} operator-= (int @var{width}) +@deftypemethodx {position} {position} operator- (int @var{width}) +Various forms of syntactic sugar for @code{columns}. +@end deftypemethod + +@deftypemethod {position} {bool} operator== (const position& @var{that}) +@deftypemethodx {position} {bool} operator!= (const position& @var{that}) +Whether @code{*this} and @code{that} denote equal/different positions. +@end deftypemethod + +@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p}) +Report @var{p} on @var{o} like this: +@samp{@var{file}:@var{line}.@var{column}}, or +@samp{@var{line}.@var{column}} if @var{file} is null. 
+@end deftypefun + +@node C++ location +@subsubsection C++ @code{location} + +@deftypeop {Constructor} {location} {} location (const position& @var{begin}, const position& @var{end}) +Create a @code{location} from the endpoints of the range. +@end deftypeop + +@deftypeop {Constructor} {location} {} location (const position& @var{pos} = position()) +@deftypeopx {Constructor} {location} {} location (std::string* @var{file}, uint @var{line}, uint @var{col}) +Create a @code{location} denoting an empty range located at a given point. +@end deftypeop + +@deftypemethod {location} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) +Reset the location to an empty range at the given values. +@end deftypemethod + +@deftypeivar {location} {position} begin +@deftypeivarx {location} {position} end +The first, inclusive, position of the range, and the first beyond. +@end deftypeivar + +@deftypemethod {location} {uint} columns (int @var{width} = 1) +@deftypemethodx {location} {uint} lines (int @var{height} = 1) +Advance the @code{end} position. +@end deftypemethod + +@deftypemethod {location} {location} operator+ (const location& @var{end}) +@deftypemethodx {location} {location} operator+ (int @var{width}) +@deftypemethodx {location} {location} operator+= (int @var{width}) +Various forms of syntactic sugar. +@end deftypemethod + +@deftypemethod {location} {void} step () +Move @code{begin} onto @code{end}. +@end deftypemethod + +@deftypemethod {location} {bool} operator== (const location& @var{that}) +@deftypemethodx {location} {bool} operator!= (const location& @var{that}) +Whether @code{*this} and @code{that} denote equal/different ranges of +positions. +@end deftypemethod + +@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const location& @var{p}) +Report @var{p} on @var{o}, taking care of special cases such as: no +@code{filename} defined, or equal filename/line or column.
+@end deftypefun + +@node C++ Parser Interface +@subsection C++ Parser Interface +@c - define parser_class_name +@c - Ctor +@c - parse, error, set_debug_level, debug_level, set_debug_stream, +@c debug_stream. +@c - Reporting errors + +The output files @file{@var{output}.hh} and @file{@var{output}.cc} +declare and define the parser class in the namespace @code{yy}. The +class name defaults to @code{parser}, but may be changed using +@samp{%define parser_class_name "@var{name}"}. The interface of +this class is detailed below. It can be extended using the +@code{%parse-param} feature: its semantics is slightly changed since +it describes an additional member of the parser class, and an +additional argument for its constructor. + +@defcv {Type} {parser} {semantic_type} +@defcvx {Type} {parser} {location_type} +The types for semantic values and locations. +@end defcv + +@defcv {Type} {parser} {token} +A structure that contains (only) the @code{yytokentype} enumeration, which +defines the tokens. To refer to the token @code{FOO}, +use @code{yy::parser::token::FOO}. The scanner can use +@samp{typedef yy::parser::token token;} to ``import'' the token enumeration +(@pxref{Calc++ Scanner}). +@end defcv + +@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...) +Build a new parser object. There are no arguments by default, unless +@samp{%parse-param @{@var{type1} @var{arg1}@}} was used. +@end deftypemethod + +@deftypemethod {parser} {int} parse () +Run the syntactic analysis, and return 0 on success, 1 otherwise. +@end deftypemethod + +@deftypemethod {parser} {std::ostream&} debug_stream () +@deftypemethodx {parser} {void} set_debug_stream (std::ostream& @var{o}) +Get or set the stream used for tracing the parsing. It defaults to +@code{std::cerr}. +@end deftypemethod + +@deftypemethod {parser} {debug_level_type} debug_level () +@deftypemethodx {parser} {void} set_debug_level (debug_level_type @var{l}) +Get or set the tracing level.
Currently its value is either 0, no trace, +or nonzero, full tracing. +@end deftypemethod + +@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m}) +The definition for this member function must be supplied by the user: +the parser uses it to report a parser error occurring at @var{l}, +described by @var{m}. +@end deftypemethod + + +@node C++ Scanner Interface +@subsection C++ Scanner Interface +@c - prefix for yylex. +@c - Pure interface to yylex +@c - %lex-param + +The parser invokes the scanner by calling @code{yylex}. Contrary to C +parsers, C++ parsers are always pure: there is no point in using the +@code{%define api.pure} directive. Therefore the interface is as follows. + +@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...) +Return the next token. Its type is the return value, its semantic +value and location being @var{yylval} and @var{yylloc}. Invocations of +@samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments. +@end deftypemethod + + +@node A Complete C++ Example +@subsection A Complete C++ Example + +This section demonstrates the use of a C++ parser with a simple but +complete example. This example should be available on your system, +ready to compile, in the directory @file{../bison/examples/calc++}. It +focuses on the use of Bison, therefore the design of the various C++ +classes is very naive: no accessors, no encapsulation of members etc. +We will use a Lex scanner, and more precisely, a Flex scanner, to +demonstrate the various interactions. A hand-written scanner is +actually easier to interface with.
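To give an idea of what such a hand-written scanner might look like, here is a minimal sketch that follows the @code{yylex} signature described in the C++ Scanner Interface. It is not part of the calc++ example: the namespace @code{sketch}, the stand-in @code{location}, @code{semantic_type} and @code{token} definitions, the token numbers, and the extra @code{std::istream&} argument (as if declared with @code{%lex-param}) are all assumptions made so that the block compiles without any generated header. A real scanner would use the types from the generated parser header instead.

```cpp
#include <cctype>
#include <cstdio>     // EOF
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for what a generated parser header provides.
namespace sketch
{
  struct position { unsigned line = 1; unsigned column = 1; };

  struct location
  {
    position begin, end;
    void step () { begin = end; }                       // begin moves onto end
    void columns (unsigned n = 1) { end.column += n; }  // advance end by n columns
    void lines (unsigned n = 1) { end.line += n; end.column = 1; }
  };

  // Only PODs and pointers, as in a genuine union.
  union semantic_type { int ival; std::string* sval; };

  // Token numbers are illustrative, not the ones Bison would assign.
  enum token { END = 0, ASSIGN = 258, IDENTIFIER, NUMBER };

  // Return the next token type; the semantic value and the location
  // are passed back through yylval and yylloc.
  int yylex (semantic_type* yylval, location* yylloc, std::istream& in)
  {
    yylloc->step ();
    int c;
    // Skip blanks and newlines, keeping the location up to date.
    while ((c = in.get ()) != EOF && std::isspace (c))
      {
        if (c == '\n')
          yylloc->lines ();
        else
          yylloc->columns ();
        yylloc->step ();
      }
    if (c == EOF)
      return END;
    yylloc->columns ();
    if (std::isdigit (c))
      {
        // No overflow check in this sketch.
        int n = c - '0';
        while (std::isdigit (in.peek ()))
          {
            n = n * 10 + (in.get () - '0');
            yylloc->columns ();
          }
        yylval->ival = n;
        return NUMBER;
      }
    if (std::isalpha (c))
      {
        // The caller owns the string (compare with %destructor).
        std::string* s = new std::string (1, char (c));
        while (std::isalnum (in.peek ()) || in.peek () == '_')
          {
            *s += char (in.get ());
            yylloc->columns ();
          }
        yylval->sval = s;
        return IDENTIFIER;
      }
    if (c == ':' && in.peek () == '=')
      {
        in.get ();
        yylloc->columns ();
        return ASSIGN;
      }
    return c;   // single-character tokens such as '+' or '*'
  }
}
```

Called repeatedly on a @code{std::istringstream} holding @samp{x := 42}, this sketch would yield an identifier, the assignment token, a number, and then the end-of-input token, with @code{yylloc} covering each token in turn.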
+ +@menu +* Calc++ --- C++ Calculator:: The specifications +* Calc++ Parsing Driver:: An active parsing context +* Calc++ Parser:: A parser class +* Calc++ Scanner:: A pure C++ Flex scanner +* Calc++ Top Level:: Conducting the band +@end menu + +@node Calc++ --- C++ Calculator +@subsubsection Calc++ --- C++ Calculator + +Of course the grammar is dedicated to arithmetic, a single +expression, possibly preceded by variable assignments. An +environment containing possibly predefined variables such as +@code{one} and @code{two}, is exchanged with the parser. An example +of valid input follows. + +@example +three := 3 +seven := one + two * three +seven * seven +@end example + +@node Calc++ Parsing Driver +@subsubsection Calc++ Parsing Driver +@c - An env +@c - A place to store error messages +@c - A place for the result + +To support a pure interface with the parser (and the scanner) the +technique of the ``parsing context'' is convenient: a structure +containing all the data to exchange. Since, in addition to simply +launching the parsing, there are several auxiliary tasks to execute (open +the file for parsing, instantiate the parser etc.), we recommend +transforming the simple parsing context structure into a fully blown +@dfn{parsing driver} class. + +The declaration of this driver class, @file{calc++-driver.hh}, is as +follows. The first part includes the CPP guard and imports the +required standard library components, and the declaration of the parser +class. + +@comment file: calc++-driver.hh +@example +#ifndef CALCXX_DRIVER_HH +# define CALCXX_DRIVER_HH +# include <string> +# include <map> +# include "calc++-parser.hh" +@end example + + +@noindent +Then comes the declaration of the scanning function. Flex expects +the signature of @code{yylex} to be defined in the macro +@code{YY_DECL}, and the C++ parser expects it to be declared. We can +factor both as follows. + +@comment file: calc++-driver.hh +@example +// Tell Flex the lexer's prototype ...
+# define YY_DECL \ + yy::calcxx_parser::token_type \ + yylex (yy::calcxx_parser::semantic_type* yylval, \ + yy::calcxx_parser::location_type* yylloc, \ + calcxx_driver& driver) +// ... and declare it for the parser's sake. +YY_DECL; +@end example + +@noindent +The @code{calcxx_driver} class is then declared with its most obvious +members. + +@comment file: calc++-driver.hh +@example +// Conducting the whole scanning and parsing of Calc++. +class calcxx_driver +@{ +public: + calcxx_driver (); + virtual ~calcxx_driver (); + + std::map<std::string, int> variables; + + int result; +@end example + +@noindent +To encapsulate the coordination with the Flex scanner, it is useful to +have two member functions to open and close the scanning phase. + +@comment file: calc++-driver.hh +@example + // Handling the scanner. + void scan_begin (); + void scan_end (); + bool trace_scanning; +@end example + +@noindent +Similarly for the parser itself. + +@comment file: calc++-driver.hh +@example + // Run the parser. Return 0 on success. + int parse (const std::string& f); + std::string file; + bool trace_parsing; +@end example + +@noindent +To demonstrate pure handling of parse errors, instead of simply +dumping them on the standard error output, we will pass them to the +compiler driver using the following two member functions. Finally, we +close the class declaration and CPP guard. + +@comment file: calc++-driver.hh +@example + // Error handling. + void error (const yy::location& l, const std::string& m); + void error (const std::string& m); +@}; +#endif // ! CALCXX_DRIVER_HH +@end example + +The implementation of the driver is straightforward. The @code{parse} +member function deserves some attention. The @code{error} functions +are simple stubs: they should actually register the located error +messages and set an error state.
+ +@comment file: calc++-driver.cc +@example +#include "calc++-driver.hh" +#include "calc++-parser.hh" + +calcxx_driver::calcxx_driver () + : trace_scanning (false), trace_parsing (false) +@{ + variables["one"] = 1; + variables["two"] = 2; +@} + +calcxx_driver::~calcxx_driver () +@{ +@} + +int +calcxx_driver::parse (const std::string &f) +@{ + file = f; + scan_begin (); + yy::calcxx_parser parser (*this); + parser.set_debug_level (trace_parsing); + int res = parser.parse (); + scan_end (); + return res; +@} + +void +calcxx_driver::error (const yy::location& l, const std::string& m) +@{ + std::cerr << l << ": " << m << std::endl; +@} + +void +calcxx_driver::error (const std::string& m) +@{ + std::cerr << m << std::endl; +@} +@end example + +@node Calc++ Parser +@subsubsection Calc++ Parser + +The grammar file @file{calc++-parser.yy} starts by asking for the C++ +deterministic parser skeleton, the creation of the parser header file, +and specifies the name of the parser class. Because the C++ skeleton +changed several times, it is safer to require the version you designed +the grammar for. + +@comment file: calc++-parser.yy +@example +%skeleton "lalr1.cc" /* -*- C++ -*- */ +%require "@value{VERSION}" +%defines +%define parser_class_name "calcxx_parser" +@end example + +@noindent +@findex %code requires +Then come the declarations/inclusions needed to define the +@code{%union}. Because the parser uses the parsing driver and +vice versa, they cannot both include each other's header. Because the +driver's header needs detailed knowledge about the parser class (in +particular its inner types), it is the parser's header which will simply +use a forward declaration of the driver. +@xref{%code Summary}. + +@comment file: calc++-parser.yy +@example +%code requires @{ +# include <string> +class calcxx_driver; +@} +@end example + +@noindent +The driver is passed by reference to the parser and to the scanner.
+This provides a simple but effective pure interface, not relying on +global variables. + +@comment file: calc++-parser.yy +@example +// The parsing context. +%parse-param @{ calcxx_driver& driver @} +%lex-param @{ calcxx_driver& driver @} +@end example + +@noindent +Then we request the location tracking feature, and initialize the +first location's file name. Afterward new locations are computed +relative to the previous locations: the file name will be +automatically propagated. + +@comment file: calc++-parser.yy +@example +%locations +%initial-action +@{ + // Initialize the initial location. + @@$.begin.filename = @@$.end.filename = &driver.file; +@}; +@end example + +@noindent +Use the two following directives to enable parser tracing and verbose error +messages. However, verbose error messages can contain incorrect information +(@pxref{LAC}). + +@comment file: calc++-parser.yy +@example +%debug +%error-verbose +@end example + +@noindent +Semantic values cannot use ``real'' objects, but only pointers to +them. + +@comment file: calc++-parser.yy +@example +// Symbols. +%union +@{ + int ival; + std::string *sval; +@}; +@end example + +@noindent +@findex %code +The code between @samp{%code @{} and @samp{@}} is output in the +@file{*.cc} file; it needs detailed knowledge about the driver. + +@comment file: calc++-parser.yy +@example +%code @{ +# include "calc++-driver.hh" +@} +@end example + + +@noindent +The token numbered as 0 corresponds to end of file; the following line +allows for nicer error messages referring to ``end of file'' instead +of ``$end''. Similarly user-friendly names are provided for each +symbol. Note that the token names are prefixed by @code{TOKEN_} to +avoid name clashes. + +@comment file: calc++-parser.yy +@example +%token END 0 "end of file" +%token ASSIGN ":=" +%token <sval> IDENTIFIER "identifier" +%token <ival> NUMBER "number" +%type <ival> exp +@end example + +@noindent +To enable memory deallocation during error recovery, use +@code{%destructor}.
+ +@c FIXME: Document %printer, and mention that it takes a braced-code operand. +@comment file: calc++-parser.yy +@example +%printer @{ yyoutput << *$$; @} "identifier" +%destructor @{ delete $$; @} "identifier" + +%printer @{ yyoutput << $$; @} <ival> +@end example + +@noindent +The grammar itself is straightforward. + +@comment file: calc++-parser.yy +@example +%% +%start unit; +unit: assignments exp @{ driver.result = $2; @}; + +assignments: + /* Nothing. */ @{@} +| assignments assignment @{@}; + +assignment: + "identifier" ":=" exp + @{ driver.variables[*$1] = $3; delete $1; @}; + +%left '+' '-'; +%left '*' '/'; +exp: exp '+' exp @{ $$ = $1 + $3; @} + | exp '-' exp @{ $$ = $1 - $3; @} + | exp '*' exp @{ $$ = $1 * $3; @} + | exp '/' exp @{ $$ = $1 / $3; @} + | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} + | "number" @{ $$ = $1; @}; +%% +@end example + +@noindent +Finally the @code{error} member function registers the errors to the +driver. + +@comment file: calc++-parser.yy +@example +void +yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l, + const std::string& m) +@{ + driver.error (l, m); +@} +@end example + +@node Calc++ Scanner +@subsubsection Calc++ Scanner + +The Flex scanner first includes the driver declaration, then the +parser's to get the set of defined tokens. + +@comment file: calc++-scanner.ll +@example +%@{ /* -*- C++ -*- */ +# include <cstdlib> +# include <cerrno> +# include <climits> +# include <string> +# include "calc++-driver.hh" +# include "calc++-parser.hh" + +/* Work around an incompatibility in flex (at least versions + 2.5.31 through 2.5.33): it generates code that does + not conform to C89. See Debian bug 333231. */ +# undef yywrap +# define yywrap() 1 + +/* By default yylex returns int, we use token_type. + Unfortunately yyterminate by default returns 0, which is + not of token_type.
*/ +#define yyterminate() return token::END +%@} +@end example + +@noindent +Because there is no @code{#include}-like feature, we don't need +@code{yywrap}; we don't need @code{unput} either; and since we parse an +actual file, this is not an interactive session with the user. +Finally we enable the scanner tracing features. + +@comment file: calc++-scanner.ll +@example +%option noyywrap nounput batch debug +@end example + +@noindent +Abbreviations allow for more readable rules. + +@comment file: calc++-scanner.ll +@example +id [a-zA-Z][a-zA-Z_0-9]* +int [0-9]+ +blank [ \t] +@end example + +@noindent +The following paragraph suffices to track locations accurately. Each +time @code{yylex} is invoked, the begin position is moved onto the end +position. Then when a pattern is matched, the end position is +advanced by its width. In case it matched ends of lines, the end +cursor is adjusted, and each time blanks are matched, the begin cursor +is moved onto the end cursor to effectively ignore the blanks +preceding tokens. Comments would be treated equally. + +@comment file: calc++-scanner.ll +@example +@group +%@{ +# define YY_USER_ACTION yylloc->columns (yyleng); +%@} +@end group +%% +%@{ + yylloc->step (); +%@} +@{blank@}+ yylloc->step (); +[\n]+ yylloc->lines (yyleng); yylloc->step (); +@end example + +@noindent +The rules are simple, just note the use of the driver to report errors. +It is convenient to use a typedef to shorten +@code{yy::calcxx_parser::token::IDENTIFIER} into +@code{token::IDENTIFIER} for instance. + +@comment file: calc++-scanner.ll +@example +%@{ + typedef yy::calcxx_parser::token token; +%@} + /* Convert ints to the actual type of tokens. */ +[-+*/] return yy::calcxx_parser::token_type (yytext[0]); +":=" return token::ASSIGN; +@{int@} @{ + errno = 0; + long n = strtol (yytext, NULL, 10); + if (!
(INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) + driver.error (*yylloc, "integer is out of range"); + yylval->ival = n; + return token::NUMBER; +@} +@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; +. driver.error (*yylloc, "invalid character"); +%% +@end example + +@noindent +Finally, because the scanner-related member functions of the driver depend +on the scanner's data, it is simpler to implement them in this file. + +@comment file: calc++-scanner.ll +@example +@group +void +calcxx_driver::scan_begin () +@{ + yy_flex_debug = trace_scanning; + if (file.empty () || file == "-") + yyin = stdin; + else if (!(yyin = fopen (file.c_str (), "r"))) + @{ + error ("cannot open " + file + ": " + strerror(errno)); + exit (EXIT_FAILURE); + @} +@} +@end group + +@group +void +calcxx_driver::scan_end () +@{ + fclose (yyin); +@} +@end group +@end example + +@node Calc++ Top Level +@subsubsection Calc++ Top Level + +The top level file, @file{calc++.cc}, poses no problem. + +@comment file: calc++.cc +@example +#include <iostream> +#include "calc++-driver.hh" + +@group +int +main (int argc, char *argv[]) +@{ + calcxx_driver driver; + for (int i = 1; i < argc; ++i) + if (argv[i] == std::string ("-p")) + driver.trace_parsing = true; + else if (argv[i] == std::string ("-s")) + driver.trace_scanning = true; + else if (!driver.parse (argv[i])) + std::cout << driver.result << std::endl; +@} +@end group +@end example + +@node Java Parsers +@section Java Parsers + +@menu +* Java Bison Interface:: Asking for Java parser generation +* Java Semantic Values:: %type and %token vs.
Java +* Java Location Values:: The position and location classes +* Java Parser Interface:: Instantiating and running the parser +* Java Scanner Interface:: Specifying the scanner for the parser +* Java Action Features:: Special features for use in actions +* Java Differences:: Differences between C/C++ and Java Grammars +* Java Declarations Summary:: List of Bison declarations used with Java +@end menu + +@node Java Bison Interface +@subsection Java Bison Interface +@c - %language "Java" + +(The current Java interface is experimental and may evolve. +More user feedback will help to stabilize it.) + +The Java parser skeletons are selected using the @code{%language "Java"} +directive or the @option{-L java}/@option{--language=java} option. + +@c FIXME: Documented bug. +When generating a Java parser, @code{bison @var{basename}.y} will +create a single Java source file named @file{@var{basename}.java} +containing the parser implementation. Using a grammar file without a +@file{.y} suffix is currently broken. The basename of the parser +implementation file can be changed by the @code{%file-prefix} +directive or the @option{-b}/@option{--file-prefix} option. The +entire parser implementation file name can be changed by the +@code{%output} directive or the @option{-o}/@option{--output} option. +The parser implementation file contains a single class for the parser. + +You can create documentation for generated parsers using Javadoc. + +Contrary to C parsers, Java parsers do not use global variables; the +state of the parser is always local to an instance of the parser class. +Therefore, all Java parsers are ``pure'', and the @code{%pure-parser} +and @code{%define api.pure} directives do not do anything when used in +Java. + +Push parsers are currently unsupported in Java and @code{%define +api.push-pull} has no effect. + +GLR parsers are currently unsupported in Java. Do not use the +@code{%glr-parser} directive. + +No header file can be generated for Java parsers.
Do not use the +@code{%defines} directive or the @option{-d}/@option{--defines} options. + +@c FIXME: Possible code change. +Currently, support for debugging and verbose errors is always compiled +in. Thus the @code{%debug} and @code{%token-table} directives and the +@option{-t}/@option{--debug} and @option{-k}/@option{--token-table} +options have no effect. This may change in the future to eliminate +unused code in the generated parser, so use @code{%debug} and +@code{%error-verbose} explicitly if needed. Also, in the future the +@code{%token-table} directive might enable a public interface to +access the token names and codes. + +@node Java Semantic Values +@subsection Java Semantic Values +@c - No %union, specify type in %type/%token. +@c - YYSTYPE +@c - Printer and destructor + +There is no @code{%union} directive in Java parsers. Instead, the +semantic values' types (class names) should be specified in the +@code{%type} or @code{%token} directive: + +@example +%type <Expression> expr assignment_expr term factor +%type <Integer> number +@end example + +By default, the semantic stack is declared to have @code{Object} members, +which means that the class types you specify can be of any class. +To improve the type safety of the parser, you can declare the common +superclass of all the semantic values using the @code{%define stype} +directive. For example, after the following declaration: + +@example +%define stype "ASTNode" +@end example + +@noindent +any @code{%type} or @code{%token} specifying a semantic type which +is not a subclass of @code{ASTNode} will cause a compile-time error. + +@c FIXME: Documented bug. +Types used in the directives may be qualified with a package name. +Primitive data types are accepted for Java version 1.5 or later. Note +that in this case the autoboxing feature of Java 1.5 will be used. +Generic types may not be used; this is due to a limitation in the +implementation of Bison, and may change in future releases.
+ +Java parsers do not support @code{%destructor}, since the language +adopts garbage collection. The parser will try to hold references +to semantic values for as little time as needed. + +Java parsers do not support @code{%printer}, as @code{toString()} +can be used to print the semantic values. This however may change +(in a backwards-compatible way) in future versions of Bison. + + +@node Java Location Values +@subsection Java Location Values +@c - %locations +@c - class Position +@c - class Location + +When the directive @code{%locations} is used, the Java parser supports +location tracking, see @ref{Tracking Locations}. An auxiliary user-defined +class defines a @dfn{position}, a single point in a file; Bison itself +defines a class representing a @dfn{location}, a range composed of a pair of +positions (possibly spanning several files). The location class is an inner +class of the parser; the name is @code{Location} by default, and may also be +renamed using @code{%define location_type "@var{class-name}"}. + +The location class treats the position as a completely opaque value. +By default, the class name is @code{Position}, but this can be changed +with @code{%define position_type "@var{class-name}"}. This class must +be supplied by the user. + + +@deftypeivar {Location} {Position} begin +@deftypeivarx {Location} {Position} end +The first, inclusive, position of the range, and the first beyond. +@end deftypeivar + +@deftypeop {Constructor} {Location} {} Location (Position @var{loc}) +Create a @code{Location} denoting an empty range located at a given point. +@end deftypeop + +@deftypeop {Constructor} {Location} {} Location (Position @var{begin}, Position @var{end}) +Create a @code{Location} from the endpoints of the range. +@end deftypeop + +@deftypemethod {Location} {String} toString () +Prints the range represented by the location. For this to work +properly, the position class should override the @code{equals} and +@code{toString} methods appropriately. 
+@end deftypemethod + + +@node Java Parser Interface +@subsection Java Parser Interface +@c - define parser_class_name +@c - Ctor +@c - parse, error, set_debug_level, debug_level, set_debug_stream, +@c debug_stream. +@c - Reporting errors + +The name of the generated parser class defaults to @code{YYParser}. The +@code{YY} prefix may be changed using the @code{%name-prefix} directive +or the @option{-p}/@option{--name-prefix} option. Alternatively, use +@code{%define parser_class_name "@var{name}"} to give a custom name to +the class. The interface of this class is detailed below. + +By default, the parser class has package visibility. A declaration +@code{%define public} will change to public visibility. Remember that, +according to the Java language specification, the name of the @file{.java} +file should match the name of the class in this case. Similarly, you can +use @code{abstract}, @code{final} and @code{strictfp} with the +@code{%define} declaration to add other modifiers to the parser class. + +The Java package name of the parser class can be specified using the +@code{%define package} directive. The superclass and the implemented +interfaces of the parser class can be specified with the @code{%define +extends} and @code{%define implements} directives. + +The parser class defines an inner class, @code{Location}, that is used +for location tracking (see @ref{Java Location Values}), and an inner +interface, @code{Lexer} (see @ref{Java Scanner Interface}). Other than +this inner class and interface, and the members described in the interface +below, all the other members and fields are preceded with a @code{yy} or +@code{YY} prefix to avoid clashes with user code. + +@c FIXME: The following constants and variables are still undocumented: +@c @code{bisonVersion}, @code{bisonSkeleton} and @code{errorVerbose}. + +The parser class can be extended using the @code{%parse-param} +directive.
Each occurrence of the directive will add a @code{protected +final} field to the parser class, and an argument to its constructor, +which initializes it automatically. + +Token names defined by @code{%token} and the predefined @code{EOF} token +name are added as constant fields to the parser class. + +@deftypeop {Constructor} {YYParser} {} YYParser (@var{lex_param}, @dots{}, @var{parse_param}, @dots{}) +Build a new parser object with embedded @code{%code lexer}. There are +no parameters, unless @code{%parse-param}s and/or @code{%lex-param}s are +used. +@end deftypeop + +@deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{}) +Build a new parser object using the specified scanner. There are no +additional parameters unless @code{%parse-param}s are used. + +If the scanner is defined by @code{%code lexer}, this constructor is +declared @code{protected} and is called automatically with a scanner +created with the correct @code{%lex-param}s. +@end deftypeop + +@deftypemethod {YYParser} {boolean} parse () +Run the syntactic analysis, and return @code{true} on success, +@code{false} otherwise. +@end deftypemethod + +@deftypemethod {YYParser} {boolean} recovering () +During the syntactic analysis, return @code{true} if recovering +from a syntax error. +@xref{Error Recovery}. +@end deftypemethod + +@deftypemethod {YYParser} {java.io.PrintStream} getDebugStream () +@deftypemethodx {YYParser} {void} setDebugStream (java.io.PrintStream @var{o}) +Get or set the stream used for tracing the parsing. It defaults to +@code{System.err}. +@end deftypemethod + +@deftypemethod {YYParser} {int} getDebugLevel () +@deftypemethodx {YYParser} {void} setDebugLevel (int @var{l}) +Get or set the tracing level. Currently its value is either 0, no trace, +or nonzero, full tracing.
+@end deftypemethod + + +@node Java Scanner Interface +@subsection Java Scanner Interface +@c - %code lexer +@c - %lex-param +@c - Lexer interface + +There are two possible ways to interface a Bison-generated Java parser +with a scanner: the scanner may be defined by @code{%code lexer}, or +defined elsewhere. In either case, the scanner has to implement the +@code{Lexer} inner interface of the parser class. + +In the first case, the body of the scanner class is placed in +@code{%code lexer} blocks. If you want to pass parameters from the +parser constructor to the scanner constructor, specify them with +@code{%lex-param}; they are passed before @code{%parse-param}s to the +constructor. + +In the second case, the scanner has to implement the @code{Lexer} interface, +which is defined within the parser class (e.g., @code{YYParser.Lexer}). +The constructor of the parser object will then accept an object +implementing the interface; @code{%lex-param} is not used in this +case. + +In both cases, the scanner has to implement the following methods. + +@deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) +This method is defined by the user to emit an error message. The first +parameter is omitted if location tracking is not active. Its type can be +changed using @code{%define location_type "@var{class-name}".} +@end deftypemethod + +@deftypemethod {Lexer} {int} yylex () +Return the next token. Its type is the return value, its semantic +value and location are saved and returned by their methods in the +interface. + +Use @code{%define lex_throws} to specify any uncaught exceptions. +Default is @code{java.io.IOException}. +@end deftypemethod + +@deftypemethod {Lexer} {Position} getStartPos () +@deftypemethodx {Lexer} {Position} getEndPos () +Return respectively the first position of the last token that +@code{yylex} returned, and the first position beyond it. These +methods are not needed unless location tracking is active.
+ +The return type can be changed using @code{%define position_type +"@var{class-name}".} +@end deftypemethod + +@deftypemethod {Lexer} {Object} getLVal () +Return the semantic value of the last token that @code{yylex} returned. + +The return type can be changed using @code{%define stype +"@var{class-name}".} +@end deftypemethod + + +@node Java Action Features +@subsection Special Features for Use in Java Actions + +The following special constructs can be used in Java actions. +Other analogous C action features are currently unavailable for Java. + +Use @code{%define throws} to specify any uncaught exceptions from parser +actions, and initial actions specified by @code{%initial-action}. + +@defvar $@var{n} +The semantic value for the @var{n}th component of the current rule. +This may not be assigned to. +@xref{Java Semantic Values}. +@end defvar + +@defvar $<@var{typealt}>@var{n} +Like @code{$@var{n}} but specifies an alternative type @var{typealt}. +@xref{Java Semantic Values}. +@end defvar + +@defvar $$ +The semantic value for the grouping made by the current rule. As a +value, this is in the base type (@code{Object} or as specified by +@code{%define stype}) and is not cast to the declared subtype because +casts are not allowed on the left-hand side of Java assignments. +Use an explicit Java cast if the correct subtype is needed. +@xref{Java Semantic Values}. +@end defvar + +@defvar $<@var{typealt}>$ +Same as @code{$$} since Java always allows assigning to the base type. +Perhaps we should use this and @code{$<>$} for the value and @code{$$} +for setting the value but there is currently no easy way to distinguish +these constructs. +@xref{Java Semantic Values}. +@end defvar + +@defvar @@@var{n} +The location information of the @var{n}th component of the current rule. +This may not be assigned to. +@xref{Java Location Values}. +@end defvar + +@defvar @@$ +The location information of the grouping made by the current rule. +@xref{Java Location Values}.
+@end defvar + +@deftypefn {Statement} return YYABORT @code{;} +Return immediately from the parser, indicating failure. +@xref{Java Parser Interface}. +@end deftypefn + +@deftypefn {Statement} return YYACCEPT @code{;} +Return immediately from the parser, indicating success. +@xref{Java Parser Interface}. +@end deftypefn + +@deftypefn {Statement} {return} YYERROR @code{;} +Start error recovery (without printing an error message). +@xref{Error Recovery}. +@end deftypefn + +@deftypefn {Function} {boolean} recovering () +Return whether error recovery is being done. In this state, the parser +reads tokens until it reaches a known state, and then restarts normal +operation. +@xref{Error Recovery}. +@end deftypefn + +@deftypefn {Function} {protected void} yyerror (String msg) +@deftypefnx {Function} {protected void} yyerror (Position pos, String msg) +@deftypefnx {Function} {protected void} yyerror (Location loc, String msg) +Print an error message using the @code{yyerror} method of the scanner +instance in use. +@end deftypefn + + +@node Java Differences +@subsection Differences between C/C++ and Java Grammars + +The different structure of the Java language forces several differences +between C/C++ grammars and grammars designed for Java parsers. This +section summarizes these differences. + +@itemize +@item +Java lacks a preprocessor, so the @code{YYERROR}, @code{YYACCEPT}, +@code{YYABORT} symbols (@pxref{Table of Symbols}) obviously cannot be +macros. Instead, they should be preceded by @code{return} when they +appear in an action. The actual definition of these symbols is +opaque to the Bison grammar, and it might change in the future. The +only meaningful operation you can perform on them is to return them. +@xref{Java Action Features}.
+ +Note that of these three symbols, only @code{YYACCEPT} and +@code{YYABORT} will cause a return from the @code{yyparse} +method@footnote{Java parsers include the actions in a method separate +from @code{yyparse} in order to have an intuitive syntax that +corresponds to these C macros.}. + +@item +Java lacks unions, so @code{%union} has no effect. Instead, semantic +values have a common base type: @code{Object} or as specified by +@samp{%define stype}. Angle brackets on @code{%token}, @code{%type}, +@code{$@var{n}} and @code{$$} specify subtypes rather than fields of +a union. The type of @code{$$}, even with angle brackets, is the base +type since Java casts are not allowed on the left-hand side of assignments. +Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the +left-hand side of assignments. @xref{Java Semantic Values}, and +@ref{Java Action Features}. + +@item +The prologue declarations have a different meaning than in C/C++ code. +@table @asis +@item @code{%code imports} +blocks are placed at the beginning of the Java source code. They may +include copyright notices. For @code{package} declarations, it is +suggested to use @code{%define package} instead. + +@item unqualified @code{%code} +blocks are placed inside the parser class. + +@item @code{%code lexer} +blocks, if specified, should include the implementation of the +scanner. If there is no such block, the scanner can be any class +that implements the appropriate interface (@pxref{Java Scanner +Interface}). +@end table + +Other @code{%code} blocks are not supported in Java parsers. +In particular, @code{%@{ @dots{} %@}} blocks should not be used +and may give an error in future versions of Bison. + +The epilogue has the same meaning as in C/C++ code and it can +be used to define other classes used by the parser @emph{outside} +the parser class.
+@end itemize + + +@node Java Declarations Summary +@subsection Java Declarations Summary + +This summary includes only declarations that are specific to Java or that have a special +meaning when used in a Java parser. + +@deffn {Directive} {%language "Java"} +Generate a Java class for the parser. +@end deffn + +@deffn {Directive} %lex-param @{@var{type} @var{name}@} +A parameter for the lexer class defined by @code{%code lexer} +@emph{only}, added as parameters to the lexer constructor and the parser +constructor that @emph{creates} a lexer. Default is none. +@xref{Java Scanner Interface}. +@end deffn + +@deffn {Directive} %name-prefix "@var{prefix}" +The prefix of the parser class name @code{@var{prefix}Parser} if +@code{%define parser_class_name} is not used. Default is @code{YY}. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} %parse-param @{@var{type} @var{name}@} +A parameter for the parser class added as parameters to constructor(s) +and as fields initialized by the constructor(s). Default is none. +@xref{Java Parser Interface}. +@end deffn + +@deffn {Directive} %token <@var{type}> @var{token} @dots{} +Declare tokens. Note that the angle brackets enclose a Java @emph{type}. +@xref{Java Semantic Values}. +@end deffn + +@deffn {Directive} %type <@var{type}> @var{nonterminal} @dots{} +Declare the type of nonterminals. Note that the angle brackets enclose +a Java @emph{type}. +@xref{Java Semantic Values}. +@end deffn + +@deffn {Directive} %code @{ @var{code} @dots{} @} +Code appended to the inside of the parser class. +@xref{Java Differences}. +@end deffn + +@deffn {Directive} {%code imports} @{ @var{code} @dots{} @} +Code inserted just after the @code{package} declaration. +@xref{Java Differences}. +@end deffn + +@deffn {Directive} {%code lexer} @{ @var{code} @dots{} @} +Code added to the body of an inner lexer class within the parser class. +@xref{Java Scanner Interface}.
+@end deffn + +@deffn {Directive} %% @var{code} @dots{} +Code (after the second @code{%%}) appended to the end of the file, +@emph{outside} the parser class. +@xref{Java Differences}. +@end deffn + +@deffn {Directive} %@{ @var{code} @dots{} %@} +Not supported. Use @code{%code imports} instead. +@xref{Java Differences}. +@end deffn + +@deffn {Directive} {%define abstract} +Whether the parser class is declared @code{abstract}. Default is false. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define extends} "@var{superclass}" +The superclass of the parser class. Default is none. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define final} +Whether the parser class is declared @code{final}. Default is false. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define implements} "@var{interfaces}" +The implemented interfaces of the parser class, a comma-separated list. +Default is none. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define lex_throws} "@var{exceptions}" +The exceptions thrown by the @code{yylex} method of the lexer, a +comma-separated list. Default is @code{java.io.IOException}. +@xref{Java Scanner Interface}. +@end deffn + +@deffn {Directive} {%define location_type} "@var{class}" +The name of the class used for locations (a range between two +positions). This class is generated as an inner class of the parser +class by @command{bison}. Default is @code{Location}. +@xref{Java Location Values}. +@end deffn + +@deffn {Directive} {%define package} "@var{package}" +The package to put the parser class in. Default is none. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define parser_class_name} "@var{name}" +The name of the parser class. Default is @code{YYParser} or +@code{@var{name-prefix}Parser}. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define position_type} "@var{class}" +The name of the class used for positions.
This class must be supplied by +the user. Default is @code{Position}. +@xref{Java Location Values}. +@end deffn + +@deffn {Directive} {%define public} +Whether the parser class is declared @code{public}. Default is false. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define stype} "@var{class}" +The base type of semantic values. Default is @code{Object}. +@xref{Java Semantic Values}. +@end deffn + +@deffn {Directive} {%define strictfp} +Whether the parser class is declared @code{strictfp}. Default is false. +@xref{Java Bison Interface}. +@end deffn + +@deffn {Directive} {%define throws} "@var{exceptions}" +The exceptions thrown by user-supplied parser actions and +@code{%initial-action}, a comma-separated list. Default is none. +@xref{Java Parser Interface}. +@end deffn + + +@c ================================================= FAQ + +@node FAQ +@chapter Frequently Asked Questions +@cindex frequently asked questions +@cindex questions + +Several questions about Bison come up occasionally. Here some of them +are addressed. + +@menu +* Memory Exhausted:: Breaking the Stack Limits +* How Can I Reset the Parser:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings +* Implementing Gotos/Loops:: Control Flow in the Calculator +* Multiple start-symbols:: Factoring closely related grammars +* Secure? Conform?:: Is Bison POSIX safe? +* I can't build Bison:: Troubleshooting +* Where can I find help?:: Troubleshouting +* Bug Reports:: Troublereporting +* More Languages:: Parsers in C++, Java, and so on +* Beta Testing:: Experimenting development versions +* Mailing Lists:: Meeting other Bison users +@end menu + +@node Memory Exhausted +@section Memory Exhausted + +@quotation +My parser returns with error with a @samp{memory exhausted} +message. What can I do? +@end quotation + +This question is already addressed elsewhere, see @ref{Recursion, ,Recursive +Rules}. 
+ +@node How Can I Reset the Parser +@section How Can I Reset the Parser + +The following phenomenon has several symptoms, resulting in the +following typical questions: + +@quotation +I invoke @code{yyparse} several times, and on correct input it works +properly; but when a parse error is found, all the other calls fail +too. How can I reset the error flag of @code{yyparse}? +@end quotation + +@noindent +or + +@quotation +My parser includes support for an @samp{#include}-like feature, in +which case I run @code{yyparse} from @code{yyparse}. This fails +although I did specify @samp{%define api.pure}. +@end quotation + +These problems typically come not from Bison itself, but from +Lex-generated scanners. Because these scanners use large buffers for +speed, they might not notice a change of input file. As a +demonstration, consider the following source file, +@file{first-line.l}: + +@example +@group +%@{ +#include <stdio.h> +#include <stdlib.h> +%@} +@end group +%% +.*\n ECHO; return 1; +%% +@group +int +yyparse (char const *file) +@{ + yyin = fopen (file, "r"); + if (!yyin) + @{ + perror ("fopen"); + exit (EXIT_FAILURE); + @} +@end group +@group + /* One token only. */ + yylex (); + if (fclose (yyin) != 0) + @{ + perror ("fclose"); + exit (EXIT_FAILURE); + @} + return 0; +@} +@end group + +@group +int +main (void) +@{ + yyparse ("input"); + yyparse ("input"); + return 0; +@} +@end group +@end example + +@noindent +If the file @file{input} contains + +@example +input:1: Hello, +input:2: World! +@end example + +@noindent +then instead of getting the first line twice, you get: + +@example +$ @kbd{flex -ofirst-line.c first-line.l} +$ @kbd{gcc -ofirst-line first-line.c -ll} +$ @kbd{./first-line} +input:1: Hello, +input:2: World! +@end example + +Therefore, whenever you change @code{yyin}, you must tell the +Lex-generated scanner to discard its current buffer and switch to the +new one. This depends upon your implementation of Lex; see its +documentation for more.
For Flex, it suffices to call +@samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your +Flex-generated scanner needs to read from several input streams to +handle features like include files, you might consider using Flex +functions like @samp{yy_switch_to_buffer} that manipulate multiple +input buffers. + +If your Flex-generated scanner uses start conditions (@pxref{Start +conditions, , Start conditions, flex, The Flex Manual}), you might +also want to reset the scanner's state, i.e., go back to the initial +start condition, through a call to @samp{BEGIN (0)}. + +@node Strings are Destroyed +@section Strings are Destroyed + +@quotation +My parser seems to destroy old strings, or maybe it loses track of +them. Instead of reporting @samp{"foo", "bar"}, it reports +@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}. +@end quotation + +This error is probably the single most frequent ``bug report'' sent to +Bison lists, but it merely reflects a misunderstanding of the role +of the scanner. Consider the following Lex code: + +@example +@group +%@{ +#include <stdio.h> +char *yylval = NULL; +%@} +@end group +@group +%% +.* yylval = yytext; return 1; +\n /* IGNORE */ +%% +@end group +@group +int +main () +@{ + /* Similar to using $1, $2 in a Bison action. */ + char *fst = (yylex (), yylval); + char *snd = (yylex (), yylval); + printf ("\"%s\", \"%s\"\n", fst, snd); + return 0; +@} +@end group +@end example + +If you compile and run this code, you get: + +@example +$ @kbd{flex -osplit-lines.c split-lines.l} +$ @kbd{gcc -osplit-lines split-lines.c -ll} +$ @kbd{printf 'one\ntwo\n' | ./split-lines} +"one +two", "two" +@end example + +@noindent +this is because @code{yytext} is a buffer provided for @emph{reading} +in the action, but if you want to keep it, you have to duplicate it +(e.g., using @code{strdup}). Note that the output may depend on how +your implementation of Lex handles @code{yytext}.
For instance, when +given the Lex compatibility option @option{-l} (which triggers the +option @samp{%array}) Flex generates a different behavior: + +@example +$ @kbd{flex -l -osplit-lines.c split-lines.l} +$ @kbd{gcc -osplit-lines split-lines.c -ll} +$ @kbd{printf 'one\ntwo\n' | ./split-lines} +"two", "two" +@end example + + +@node Implementing Gotos/Loops +@section Implementing Gotos/Loops + +@quotation +My simple calculator supports variables, assignments, and functions, +but how can I implement gotos, or loops? +@end quotation + +Although very pedagogical, the examples included in the document blur +the distinction between the parser---whose job is to recover +the structure of a text and to transmit it to subsequent modules of +the program---and the processing (such as the execution) of this +structure. This works well with so-called straight-line programs, +i.e., precisely those that have a straightforward execution model: +execute simple instructions one after the other. + +@cindex abstract syntax tree +@cindex AST +If you want a richer model, you will probably need to use the parser +to construct a tree that does represent the structure it has +recovered; this tree is usually called the @dfn{abstract syntax tree}, +or @dfn{AST} for short. Then, walking through this tree, +traversing it in various ways, will enable treatments such as its +execution or its translation, which will result in an interpreter or a +compiler. + +This topic is way beyond the scope of this manual, and the reader is +invited to consult the dedicated literature. + + +@node Multiple start-symbols +@section Multiple start-symbols + +@quotation +I have several closely related grammars, and I would like to share their +implementations. In fact, I could use a single grammar but with +multiple entry points. +@end quotation + +Bison does not support multiple start-symbols, but there is a very +simple means to simulate them.
If @code{foo} and @code{bar} are the two +pseudo start-symbols, then introduce two new tokens, say +@code{START_FOO} and @code{START_BAR}, and use them as switches from the +real start-symbol: + +@example +%token START_FOO START_BAR; +%start start; +start: + START_FOO foo +| START_BAR bar; +@end example + +These tokens prevent the introduction of new conflicts. As far as the +parser goes, that is all that is needed. + +Now the difficult part is ensuring that the scanner will send these +tokens first. If your scanner is hand-written, that should be +straightforward. If your scanner is generated by Lex, then there is a +simple means to do it: recall that anything between @samp{%@{ ... %@}} +after the first @code{%%} is copied verbatim to the top of the generated +@code{yylex} function. Make sure a variable @code{start_token} is +available in the scanner (e.g., a global variable or using +@code{%lex-param} etc.), and use the following: + +@example + /* @r{Prologue.} */ +%% +%@{ + if (start_token) + @{ + int t = start_token; + start_token = 0; + return t; + @} +%@} + /* @r{The rules.} */ +@end example + + +@node Secure? Conform? +@section Secure? Conform? + +@quotation +Is Bison secure? Does it conform to POSIX? +@end quotation + +If you're looking for a guarantee or certification, we don't provide it. +However, Bison is intended to be a reliable program that conforms to the +POSIX specification for Yacc. If you run into problems, +please send us a bug report. + +@node I can't build Bison +@section I can't build Bison + +@quotation +I can't build Bison because @command{make} complains that +@code{msgfmt} is not found. +What should I do? +@end quotation + +As in most GNU packages with internationalization support, that feature +is turned on by default. If you have problems building in the @file{po} +subdirectory, it indicates that your system's internationalization +support is lacking.
You can re-configure Bison with +@option{--disable-nls} to turn off this support, or you can install GNU +gettext from @url{ftp://ftp.gnu.org/gnu/gettext/} and re-configure +Bison. See the file @file{ABOUT-NLS} for more information. + + +@node Where can I find help? +@section Where can I find help? + +@quotation +I'm having trouble using Bison. Where can I find help? +@end quotation + +First, read this fine manual. Beyond that, you can send mail to +@email{help-bison@@gnu.org}. This mailing list is intended to be +populated with people who are willing to answer questions about using +and installing Bison. Please keep in mind that (most of) the people on +the list have aspects of their lives which are not related to Bison (!), +so you may not receive an answer to your question right away. This can +be frustrating, but please try not to honk them off; remember that any +help they provide is purely voluntary and out of the kindness of their +hearts. + +@node Bug Reports +@section Bug Reports + +@quotation +I found a bug. What should I include in the bug report? +@end quotation + +Before you send a bug report, make sure you are using the latest +version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its +mirrors. Be sure to include the version number in your bug report. If +the bug is present in the latest version but not in a previous version, +try to determine the most recent version which did not contain the bug. + +If the bug is parser-related, you should include the smallest grammar +you can which demonstrates the bug. The grammar file should also be +complete (i.e., I should be able to run it through Bison without having +to edit or add anything). The smaller and simpler the grammar, the +easier it will be to fix the bug. + +Include information about your compilation environment, including your +operating system's name and version and your compiler's name and +version. 
If you have trouble compiling, you should also include a +transcript of the build session, starting with the invocation of +@command{configure}. Depending on the nature of the bug, you may be asked to +send additional files as well (such as @file{config.h} or @file{config.cache}). + +Patches are most welcome, but not required. That is, do not hesitate to +send a bug report just because you cannot provide a fix. + +Send bug reports to @email{bug-bison@@gnu.org}. + +@node More Languages +@section More Languages + +@quotation +Will Bison ever have C++ and Java support? How about @var{insert your +favorite language here}? +@end quotation + +C++ and Java support is there now, and is documented. We'd love to add other +languages; contributions are welcome. + +@node Beta Testing +@section Beta Testing + +@quotation +What is involved in being a beta tester? +@end quotation + +It's not terribly involved. Basically, you would download a test +release, compile it, and use it to build and run a parser or two. After +that, you would submit either a bug report or a message saying that +everything is okay. It is important to report successes as well as +failures because test releases eventually become mainstream releases, +but only if they are adequately tested. If no one tests, development is +essentially halted. + +Beta testers are particularly needed for operating systems to which the +developers do not have easy access. They currently have easy access to +recent GNU/Linux and Solaris versions. Reports about other operating +systems are especially welcome. + +@node Mailing Lists +@section Mailing Lists + +@quotation +How do I join the help-bison and bug-bison mailing lists? +@end quotation + +See @url{http://lists.gnu.org/}.
+ +@c ================================================= Table of Symbols + +@node Table of Symbols +@appendix Bison Symbols +@cindex Bison symbols, table of +@cindex symbols in Bison, table of + +@deffn {Variable} @@$ +In an action, the location of the left-hand side of the rule. +@xref{Tracking Locations}. +@end deffn + +@deffn {Variable} @@@var{n} +In an action, the location of the @var{n}-th symbol of the right-hand side +of the rule. @xref{Tracking Locations}. +@end deffn + +@deffn {Variable} @@@var{name} +In an action, the location of a symbol addressed by name. @xref{Tracking +Locations}. +@end deffn + +@deffn {Variable} @@[@var{name}] +In an action, the location of a symbol addressed by name. @xref{Tracking +Locations}. +@end deffn + +@deffn {Variable} $$ +In an action, the semantic value of the left-hand side of the rule. +@xref{Actions}. +@end deffn + +@deffn {Variable} $@var{n} +In an action, the semantic value of the @var{n}-th symbol of the +right-hand side of the rule. @xref{Actions}. +@end deffn + +@deffn {Variable} $@var{name} +In an action, the semantic value of a symbol addressed by name. +@xref{Actions}. +@end deffn + +@deffn {Variable} $[@var{name}] +In an action, the semantic value of a symbol addressed by name. +@xref{Actions}. +@end deffn + +@deffn {Delimiter} %% +Delimiter used to separate the grammar rule section from the +Bison declarations section or the epilogue. +@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}. +@end deffn + +@c Don't insert spaces, or check the DVI output. +@deffn {Delimiter} %@{@var{code}%@} +All code listed between @samp{%@{} and @samp{%@}} is copied verbatim +to the parser implementation file. Such code forms the prologue of +the grammar file. @xref{Grammar Outline, ,Outline of a Bison +Grammar}. +@end deffn + +@deffn {Construct} /*@dots{}*/ +Comment delimiters, as in C. +@end deffn + +@deffn {Delimiter} : +Separates a rule's result from its components. @xref{Rules, ,Syntax of +Grammar Rules}. 
+@end deffn + +@deffn {Delimiter} ; +Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}. +@end deffn + +@deffn {Delimiter} | +Separates alternate rules for the same result nonterminal. +@xref{Rules, ,Syntax of Grammar Rules}. +@end deffn + +@deffn {Directive} <*> +Used to define a default tagged @code{%destructor} or default tagged +@code{%printer}. + +This feature is experimental. +More user feedback will help to determine whether it should become a permanent +feature. + +@xref{Destructor Decl, , Freeing Discarded Symbols}. +@end deffn + +@deffn {Directive} <> +Used to define a default tagless @code{%destructor} or default tagless +@code{%printer}. + +This feature is experimental. +More user feedback will help to determine whether it should become a permanent +feature. + +@xref{Destructor Decl, , Freeing Discarded Symbols}. +@end deffn + +@deffn {Symbol} $accept +The predefined nonterminal whose only rule is @samp{$accept: @var{start} +$end}, where @var{start} is the start symbol. @xref{Start Decl, , The +Start-Symbol}. It cannot be used in the grammar. +@end deffn + +@deffn {Directive} %code @{@var{code}@} +@deffnx {Directive} %code @var{qualifier} @{@var{code}@} +Insert @var{code} verbatim into the output parser source at the +default location or at the location specified by @var{qualifier}. +@xref{%code Summary}. +@end deffn + +@deffn {Directive} %debug +Equip the parser for debugging. @xref{Decl Summary}. +@end deffn + +@ifset defaultprec +@deffn {Directive} %default-prec +Assign a precedence to rules that lack an explicit @samp{%prec} +modifier. @xref{Contextual Precedence, ,Context-Dependent +Precedence}. +@end deffn +@end ifset + +@deffn {Directive} %define @var{variable} +@deffnx {Directive} %define @var{variable} @var{value} +@deffnx {Directive} %define @var{variable} "@var{value}" +Define a variable to adjust Bison's behavior. @xref{%define Summary}. 
+@end deffn + +@deffn {Directive} %defines +Bison declaration to create a parser header file, which is usually +meant for the scanner. @xref{Decl Summary}. +@end deffn + +@deffn {Directive} %defines @var{defines-file} +Same as above, but save in the file @var{defines-file}. +@xref{Decl Summary}. +@end deffn + +@deffn {Directive} %destructor +Specify how the parser should reclaim the memory associated to +discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}. +@end deffn + +@deffn {Directive} %dprec +Bison declaration to assign a precedence to a rule that is used at parse +time to resolve reduce/reduce conflicts. @xref{GLR Parsers, ,Writing +GLR Parsers}. +@end deffn + +@deffn {Symbol} $end +The predefined token marking the end of the token stream. It cannot be +used in the grammar. +@end deffn + +@deffn {Symbol} error +A token name reserved for error recovery. This token may be used in +grammar rules so as to allow the Bison parser to recognize an error in +the grammar without halting the process. In effect, a sentence +containing an error may be recognized as valid. On a syntax error, the +token @code{error} becomes the current lookahead token. Actions +corresponding to @code{error} are then executed, and the lookahead +token is reset to the token that originally caused the violation. +@xref{Error Recovery}. +@end deffn + +@deffn {Directive} %error-verbose +Bison declaration to request verbose, specific error message strings +when @code{yyerror} is called. @xref{Error Reporting}. +@end deffn + +@deffn {Directive} %file-prefix "@var{prefix}" +Bison declaration to set the prefix of the output files. @xref{Decl +Summary}. +@end deffn + +@deffn {Directive} %glr-parser +Bison declaration to produce a GLR parser. @xref{GLR +Parsers, ,Writing GLR Parsers}. +@end deffn + +@deffn {Directive} %initial-action +Run user code before parsing. @xref{Initial Action Decl, , Performing Actions before Parsing}. 
+@end deffn + +@deffn {Directive} %language +Specify the programming language for the generated parser. +@xref{Decl Summary}. +@end deffn + +@deffn {Directive} %left +Bison declaration to assign left associativity to token(s). +@xref{Precedence Decl, ,Operator Precedence}. +@end deffn + +@deffn {Directive} %lex-param @{@var{argument-declaration}@} +Bison declaration to specify an additional parameter that +@code{yylex} should accept. @xref{Pure Calling,, Calling Conventions +for Pure Parsers}. +@end deffn + +@deffn {Directive} %merge +Bison declaration to assign a merging function to a rule. If there is a +reduce/reduce conflict with a rule having the same merging function, the +function is applied to the two semantic values to get a single result. +@xref{GLR Parsers, ,Writing GLR Parsers}. +@end deffn + +@deffn {Directive} %name-prefix "@var{prefix}" +Bison declaration to rename the external symbols. @xref{Decl Summary}. +@end deffn + +@ifset defaultprec +@deffn {Directive} %no-default-prec +Do not assign a precedence to rules that lack an explicit @samp{%prec} +modifier. @xref{Contextual Precedence, ,Context-Dependent +Precedence}. +@end deffn +@end ifset + +@deffn {Directive} %no-lines +Bison declaration to avoid generating @code{#line} directives in the +parser implementation file. @xref{Decl Summary}. +@end deffn + +@deffn {Directive} %nonassoc +Bison declaration to assign nonassociativity to token(s). +@xref{Precedence Decl, ,Operator Precedence}. +@end deffn + +@deffn {Directive} %output "@var{file}" +Bison declaration to set the name of the parser implementation file. +@xref{Decl Summary}. +@end deffn + +@deffn {Directive} %parse-param @{@var{argument-declaration}@} +Bison declaration to specify an additional parameter that +@code{yyparse} should accept. @xref{Parser Function,, The Parser +Function @code{yyparse}}. +@end deffn + +@deffn {Directive} %prec +Bison declaration to assign a precedence to a specific rule.
+@xref{Contextual Precedence, ,Context-Dependent Precedence}. +@end deffn + +@deffn {Directive} %pure-parser +Deprecated version of @code{%define api.pure} (@pxref{%define +Summary,,api.pure}), for which Bison is more careful to warn about +unreasonable usage. +@end deffn + +@deffn {Directive} %require "@var{version}" +Require version @var{version} or higher of Bison. @xref{Require Decl, , +Require a Version of Bison}. +@end deffn + +@deffn {Directive} %right +Bison declaration to assign right associativity to token(s). +@xref{Precedence Decl, ,Operator Precedence}. +@end deffn + +@deffn {Directive} %skeleton +Specify the skeleton to use; usually for development. +@xref{Decl Summary}. +@end deffn + +@deffn {Directive} %start +Bison declaration to specify the start symbol. @xref{Start Decl, ,The +Start-Symbol}. +@end deffn + +@deffn {Directive} %token +Bison declaration to declare token(s) without specifying precedence. +@xref{Token Decl, ,Token Type Names}. +@end deffn + +@deffn {Directive} %token-table +Bison declaration to include a token name table in the parser +implementation file. @xref{Decl Summary}. +@end deffn + +@deffn {Directive} %type +Bison declaration to declare nonterminals. @xref{Type Decl, +,Nonterminal Symbols}. +@end deffn + +@deffn {Symbol} $undefined +The predefined token onto which all undefined values returned by +@code{yylex} are mapped. It cannot be used in the grammar, rather, use +@code{error}. +@end deffn + +@deffn {Directive} %union +Bison declaration to specify several possible data types for semantic +values. @xref{Union Decl, ,The Collection of Value Types}. +@end deffn + +@deffn {Macro} YYABORT +Macro to pretend that an unrecoverable syntax error has occurred, by +making @code{yyparse} return 1 immediately. The error reporting +function @code{yyerror} is not called. @xref{Parser Function, ,The +Parser Function @code{yyparse}}. + +For Java parsers, this functionality is invoked using @code{return YYABORT;} +instead. 
+@end deffn + +@deffn {Macro} YYACCEPT +Macro to pretend that a complete utterance of the language has been +read, by making @code{yyparse} return 0 immediately. +@xref{Parser Function, ,The Parser Function @code{yyparse}}. + +For Java parsers, this functionality is invoked using @code{return YYACCEPT;} +instead. +@end deffn + +@deffn {Macro} YYBACKUP +Macro to discard a value from the parser stack and fake a lookahead +token. @xref{Action Features, ,Special Features for Use in Actions}. +@end deffn + +@deffn {Variable} yychar +External integer variable that contains the integer value of the +lookahead token. (In a pure parser, it is a local variable within +@code{yyparse}.) Error-recovery rule actions may examine this variable. +@xref{Action Features, ,Special Features for Use in Actions}. +@end deffn + +@deffn {Variable} yyclearin +Macro used in error-recovery rule actions. It clears the previous +lookahead token. @xref{Error Recovery}. +@end deffn + +@deffn {Macro} YYDEBUG +Macro to define to equip the parser with tracing code. @xref{Tracing, +,Tracing Your Parser}. +@end deffn + +@deffn {Variable} yydebug +External integer variable set to zero by default. If @code{yydebug} +is given a nonzero value, the parser will output information on input +symbols and parser action. @xref{Tracing, ,Tracing Your Parser}. +@end deffn + +@deffn {Macro} yyerrok +Macro to cause parser to recover immediately to its normal mode +after a syntax error. @xref{Error Recovery}. +@end deffn + +@deffn {Macro} YYERROR +Cause an immediate syntax error. This statement initiates error +recovery just as if the parser itself had detected an error; however, it +does not call @code{yyerror}, and does not print any message. If you +want to print an error message, call @code{yyerror} explicitly before +the @samp{YYERROR;} statement. @xref{Error Recovery}. + +For Java parsers, this functionality is invoked using @code{return YYERROR;} +instead. 
+@end deffn + +@deffn {Function} yyerror +User-supplied function to be called by @code{yyparse} on error. +@xref{Error Reporting, ,The Error +Reporting Function @code{yyerror}}. +@end deffn + +@deffn {Macro} YYERROR_VERBOSE +An obsolete macro that you define with @code{#define} in the prologue +to request verbose, specific error message strings +when @code{yyerror} is called. It doesn't matter what definition you +use for @code{YYERROR_VERBOSE}, just whether you define it. Using +@code{%error-verbose} is preferred. @xref{Error Reporting}. +@end deffn + +@deffn {Macro} YYFPRINTF +Macro used to output run-time traces. +@xref{Enabling Traces}. +@end deffn + +@deffn {Macro} YYINITDEPTH +Macro for specifying the initial size of the parser stack. +@xref{Memory Management}. +@end deffn + +@deffn {Function} yylex +User-supplied lexical analyzer function, called with no arguments to get +the next token. @xref{Lexical, ,The Lexical Analyzer Function +@code{yylex}}. +@end deffn + +@deffn {Macro} YYLEX_PARAM +An obsolete macro for specifying an extra argument (or list of extra +arguments) for @code{yyparse} to pass to @code{yylex}. The use of this +macro is deprecated, and is supported only for Yacc-like parsers. +@xref{Pure Calling,, Calling Conventions for Pure Parsers}. +@end deffn + +@deffn {Variable} yylloc +External variable in which @code{yylex} should place the line and column +numbers associated with a token. (In a pure parser, it is a local +variable within @code{yyparse}, and its address is passed to +@code{yylex}.) +You can ignore this variable if you don't use the @samp{@@} feature in the +grammar actions. +@xref{Token Locations, ,Textual Locations of Tokens}. +In semantic actions, it stores the location of the lookahead token. +@xref{Actions and Locations, ,Actions and Locations}. +@end deffn + +@deffn {Type} YYLTYPE +Data type of @code{yylloc}; by default, a structure with four +members. @xref{Location Type, , Data Types of Locations}.
+@end deffn + +@deffn {Variable} yylval +External variable in which @code{yylex} should place the semantic +value associated with a token. (In a pure parser, it is a local +variable within @code{yyparse}, and its address is passed to +@code{yylex}.) +@xref{Token Values, ,Semantic Values of Tokens}. +In semantic actions, it stores the semantic value of the lookahead token. +@xref{Actions, ,Actions}. +@end deffn + +@deffn {Macro} YYMAXDEPTH +Macro for specifying the maximum size of the parser stack. @xref{Memory +Management}. +@end deffn + +@deffn {Variable} yynerrs +Global variable which Bison increments each time it reports a syntax error. +(In a pure parser, it is a local variable within @code{yyparse}. In a +pure push parser, it is a member of yypstate.) +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. +@end deffn + +@deffn {Function} yyparse +The parser function produced by Bison; call this function to start +parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. +@end deffn + +@deffn {Macro} YYPRINT +Macro used to output token semantic values. For @file{yacc.c} only. +Obsoleted by @code{%printer}. +@xref{The YYPRINT Macro, , The @code{YYPRINT} Macro}. +@end deffn + +@deffn {Function} yypstate_delete +The function to delete a parser instance, produced by Bison in push mode; +call this function to delete the memory associated with a parser. +@xref{Parser Delete Function, ,The Parser Delete Function +@code{yypstate_delete}}. +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) +@end deffn + +@deffn {Function} yypstate_new +The function to create a parser instance, produced by Bison in push mode; +call this function to create a new parser. +@xref{Parser Create Function, ,The Parser Create Function +@code{yypstate_new}}. +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) 
+@end deffn + +@deffn {Function} yypull_parse +The parser function produced by Bison in push mode; call this function to +parse the rest of the input stream. +@xref{Pull Parser Function, ,The Pull Parser Function +@code{yypull_parse}}. +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) +@end deffn + +@deffn {Function} yypush_parse +The parser function produced by Bison in push mode; call this function to +parse a single token. @xref{Push Parser Function, ,The Push Parser Function +@code{yypush_parse}}. +(The current push parsing interface is experimental and may evolve. +More user feedback will help to stabilize it.) +@end deffn + +@deffn {Macro} YYPARSE_PARAM +An obsolete macro for specifying the name of a parameter that +@code{yyparse} should accept. The use of this macro is deprecated, and +is supported only for Yacc-like parsers. @xref{Pure Calling,, Calling +Conventions for Pure Parsers}. +@end deffn + +@deffn {Macro} YYRECOVERING +The expression @code{YYRECOVERING ()} yields 1 when the parser +is recovering from a syntax error, and 0 otherwise. +@xref{Action Features, ,Special Features for Use in Actions}. +@end deffn + +@deffn {Macro} YYSTACK_USE_ALLOCA +Macro used to control the use of @code{alloca} when the +deterministic parser in C needs to extend its stacks. If defined to 0, +the parser will use @code{malloc} to extend its stacks. If defined to +1, the parser will use @code{alloca}. Values other than 0 and 1 are +reserved for future Bison extensions. If not defined, +@code{YYSTACK_USE_ALLOCA} defaults to 0. + +In the all-too-common case where your code may run on a host with a +limited stack and with unreliable stack-overflow checking, you should +set @code{YYMAXDEPTH} to a value that cannot possibly result in +unchecked stack overflow on any of your target hosts when +@code{alloca} is called. You can inspect the code that Bison +generates in order to determine the proper numeric values.
This will +require some expertise in low-level implementation details. +@end deffn + +@deffn {Type} YYSTYPE +Data type of semantic values; @code{int} by default. +@xref{Value Type, ,Data Types of Semantic Values}. +@end deffn + +@node Glossary +@appendix Glossary +@cindex glossary + +@table @asis +@item Accepting state +A state whose only action is the accept action. +The accepting state is thus a consistent state. +@xref{Understanding,,}. + +@item Backus-Naur Form (BNF; also called ``Backus Normal Form'') +Formal method of specifying context-free grammars originally proposed +by John Backus, and slightly improved by Peter Naur in his 1960-01-02 +committee document contributing to what became the Algol 60 report. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. + +@item Consistent state +A state containing only one possible action. @xref{Default Reductions}. + +@item Context-free grammars +Grammars specified as rules that can be applied regardless of context. +Thus, if there is a rule which says that an integer can be used as an +expression, integers are allowed @emph{anywhere} an expression is +permitted. @xref{Language and Grammar, ,Languages and Context-Free +Grammars}. + +@item Default reduction +The reduction that a parser should perform if the current parser state +contains no other action for the lookahead token. In permitted parser +states, Bison declares the reduction with the largest lookahead set to be +the default reduction and removes that lookahead set. @xref{Default +Reductions}. + +@item Defaulted state +A consistent state with a default reduction. @xref{Default Reductions}. + +@item Dynamic allocation +Allocation of memory that occurs during execution, rather than at +compile time or on entry to a function. + +@item Empty string +Analogous to the empty set in set theory, the empty string is a +character string of length zero. 
+ +@item Finite-state stack machine +A ``machine'' that has discrete states in which it is said to exist at +each instant in time. As input to the machine is processed, the +machine moves from state to state as specified by the logic of the +machine. In the case of the parser, the input is the language being +parsed, and the states correspond to various stages in the grammar +rules. @xref{Algorithm, ,The Bison Parser Algorithm}. + +@item Generalized LR (GLR) +A parsing algorithm that can handle all context-free grammars, including those +that are not LR(1). It resolves situations that Bison's +deterministic parsing +algorithm cannot by effectively splitting off multiple parsers, trying all +possible parsers, and discarding those that fail in the light of additional +right context. @xref{Generalized LR Parsing, ,Generalized +LR Parsing}. + +@item Grouping +A language construct that is (in general) grammatically divisible; +for example, `expression' or `declaration' in C@. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. + +@item IELR(1) (Inadequacy Elimination LR(1)) +A minimal LR(1) parser table construction algorithm. That is, given any +context-free grammar, IELR(1) generates parser tables with the full +language-recognition power of canonical LR(1) but with nearly the same +number of parser states as LALR(1). This reduction in parser states is +often an order of magnitude. More importantly, because canonical LR(1)'s +extra parser states may contain duplicate conflicts in the case of non-LR(1) +grammars, the number of conflicts for IELR(1) is often an order of magnitude +less as well. This can significantly reduce the complexity of developing a +grammar. @xref{LR Table Construction}. + +@item Infix operator +An arithmetic operator that is placed between the operands on which it +performs some operation. + +@item Input stream +A continuous flow of data between devices or programs. 
+ +@item LAC (Lookahead Correction) +A parsing mechanism that fixes the problem of delayed syntax error +detection, which is caused by LR state merging, default reductions, and the +use of @code{%nonassoc}. Delayed syntax error detection results in +unexpected semantic actions, initiation of error recovery in the wrong +syntactic context, and an incorrect list of expected tokens in a verbose +syntax error message. @xref{LAC}. + +@item Language construct +One of the typical usage schemas of the language. For example, one of +the constructs of the C language is the @code{if} statement. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. + +@item Left associativity +Operators having left associativity are analyzed from left to right: +@samp{a+b+c} first computes @samp{a+b} and then combines with +@samp{c}. @xref{Precedence, ,Operator Precedence}. + +@item Left recursion +A rule whose result symbol is also its first component symbol; for +example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive +Rules}. + +@item Left-to-right parsing +Parsing a sentence of a language by analyzing it token by token from +left to right. @xref{Algorithm, ,The Bison Parser Algorithm}. + +@item Lexical analyzer (scanner) +A function that reads an input stream and returns tokens one by one. +@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. + +@item Lexical tie-in +A flag, set by actions in the grammar rules, which alters the way +tokens are parsed. @xref{Lexical Tie-ins}. + +@item Literal string token +A token which consists of two or more fixed characters. @xref{Symbols}. + +@item Lookahead token +A token already read but not yet shifted. @xref{Lookahead, ,Lookahead +Tokens}. + +@item LALR(1) +The class of context-free grammars that Bison (like most other parser +generators) can handle by default; a subset of LR(1). +@xref{Mysterious Conflicts}. 
+ +@item LR(1) +The class of context-free grammars in which at most one token of +lookahead is needed to disambiguate the parsing of any piece of input. + +@item Nonterminal symbol +A grammar symbol standing for a grammatical construct that can +be expressed through rules in terms of smaller constructs; in other +words, a construct that is not a token. @xref{Symbols}. + +@item Parser +A function that recognizes valid sentences of a language by analyzing +the syntax structure of a set of tokens passed to it from a lexical +analyzer. + +@item Postfix operator +An arithmetic operator that is placed after the operands upon which it +performs some operation. + +@item Reduction +Replacing a string of nonterminals and/or terminals with a single +nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison +Parser Algorithm}. + +@item Reentrant +A reentrant subprogram is a subprogram which can be invoked any +number of times in parallel, without interference between the various +invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. + +@item Reverse Polish notation +A language in which all operators are postfix operators. + +@item Right recursion +A rule whose result symbol is also its last component symbol; for +example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive +Rules}. + +@item Semantics +In computer languages, the semantics are specified by the actions +taken for each instance of the language, i.e., the meaning of +each statement. @xref{Semantics, ,Defining Language Semantics}. + +@item Shift +A parser is said to shift when it makes the choice of analyzing +further input from the stream rather than immediately reducing some +already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm}. + +@item Single-character literal +A single character that is recognized and interpreted as is. +@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
+ +@item Start symbol +The nonterminal symbol that stands for a complete valid utterance in +the language being parsed. The start symbol is usually listed as the +first nonterminal symbol in a language specification. +@xref{Start Decl, ,The Start-Symbol}. + +@item Symbol table +A data structure where symbol names and associated data are stored +during parsing to allow for recognition and use of existing +information in repeated uses of a symbol. @xref{Multi-function Calc}. + +@item Syntax error +An error encountered during parsing of an input stream due to invalid +syntax. @xref{Error Recovery}. + +@item Token +A basic, grammatically indivisible unit of a language. The symbol +that describes a token in the grammar is a terminal symbol. +The input of the Bison parser is a stream of tokens which comes from +the lexical analyzer. @xref{Symbols}. + +@item Terminal symbol +A grammar symbol that has no rules in the grammar and therefore is +grammatically indivisible. The piece of text it represents is a token. +@xref{Language and Grammar, ,Languages and Context-Free Grammars}. + +@item Unreachable state +A parser state to which there does not exist a sequence of transitions from +the parser's start state. A state can become unreachable during conflict +resolution. @xref{Unreachable States}. +@end table + +@node Copying This Manual +@appendix Copying This Manual +@include fdl.texi + +@node Bibliography +@unnumbered Bibliography + +@table @asis +@item [Denny 2008] +Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser Tables +for Non-LR(1) Grammars with Conflict Resolution, in @cite{Proceedings of the +2008 ACM Symposium on Applied Computing} (SAC'08), ACM, New York, NY, USA, +pp.@: 240--245. @uref{http://dx.doi.org/10.1145/1363686.1363747} + +@item [Denny 2010 May] +Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the +Deterministic Parsing of Composite Languages, Ph.D. Dissertation, Clemson +University, Clemson, SC, USA (May 2010). 
+@uref{http://proquest.umi.com/pqdlink?did=2041473591&Fmt=7&clientId=79356&RQT=309&VName=PQD} + +@item [Denny 2010 November] +Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for Generating +Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution, +in @cite{Science of Computer Programming}, Vol.@: 75, Issue 11 (November +2010), pp.@: 943--979. @uref{http://dx.doi.org/10.1016/j.scico.2009.08.001} + +@item [DeRemer 1982] +Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1) +Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and +Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@: +615--649. @uref{http://dx.doi.org/10.1145/69622.357187} + +@item [Knuth 1965] +Donald E. Knuth, On the Translation of Languages from Left to Right, in +@cite{Information and Control}, Vol.@: 8, Issue 6 (December 1965), pp.@: +607--639. @uref{http://dx.doi.org/10.1016/S0019-9958(65)90426-2} + +@item [Scott 2000] +Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain, +@cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of +London, Department of Computer Science, TR-00-12 (December 2000). 
+@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps} +@end table + +@node Index +@unnumbered Index + +@printindex cp + +@bye + +@c LocalWords: texinfo setfilename settitle setchapternewpage finalout texi FSF +@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex FSF's +@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry Naur +@c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa Multi +@c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc multi +@c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex defaultprec Donnelly Gotos +@c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref yypush +@c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex lr +@c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge POSIX +@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG yypull +@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit nonfree +@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok rr +@c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln Stallman Destructor +@c LocalWords: symrec val tptr FNCT fnctptr func struct sym enum IEC syntaxes +@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof Lex +@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum DOTDOT +@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype Unary +@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless yynerrs nonterminal +@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES reentrant +@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param yypstate +@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP subrange +@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword loc +@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH 
inline +@c LocalWords: YYINITDEPTH stmts ref initdcl maybeasm notype Lookahead yyoutput +@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args Autoconf +@c LocalWords: infile ypp yxx outfile itemx tex leaderfill Troubleshouting sqrt +@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead +@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th +@c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps +@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC +@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs sr +@c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC nterm LR's +@c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK +@c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative Ph +@c LocalWords: deffnx namespace xml goto lalr ielr runtime lex yacc yyps env +@c LocalWords: yystate variadic Unshift NLS gettext po UTF Automake LOCALEDIR +@c LocalWords: YYENABLE bindtextdomain Makefile DEFS CPPFLAGS DBISON DeRemer +@c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz ACM +@c LocalWords: redeclare automata Dparse localedir datadir XSLT midrule Wno +@c LocalWords: Graphviz multitable headitem hh basename Doxygen fno filename +@c LocalWords: doxygen ival sval deftypemethod deallocate pos deftypemethodx +@c LocalWords: Ctor defcv defcvx arg accessors arithmetics CPP ifndef CALCXX +@c LocalWords: lexer's calcxx bool LPAREN RPAREN deallocation cerrno climits +@c LocalWords: cstdlib Debian undef yywrap unput noyywrap nounput zA yyleng +@c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc PSLR +@c LocalWords: bytecode initializers superclass stype ASTNode autoboxing nls +@c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp +@c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv +@c 
LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url +@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos +@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's +@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy +@c LocalWords: Scannerless ispell american + +@c Local Variables: +@c ispell-dictionary: "american" +@c fill-column: 76 +@c End: diff --git a/doc/bison.texinfo b/doc/bison.texinfo deleted file mode 100644 index 4f2e1c62..00000000 --- a/doc/bison.texinfo +++ /dev/null @@ -1,11687 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@comment %**start of header -@setfilename bison.info -@include version.texi -@settitle Bison @value{VERSION} -@setchapternewpage odd - -@finalout - -@c SMALL BOOK version -@c This edition has been formatted so that you can format and print it in -@c the smallbook format. -@c @smallbook - -@c Set following if you want to document %default-prec and %no-default-prec. -@c This feature is experimental and may change in future Bison versions. -@c @set defaultprec - -@ifnotinfo -@syncodeindex fn cp -@syncodeindex vr cp -@syncodeindex tp cp -@end ifnotinfo -@ifinfo -@synindex fn cp -@synindex vr cp -@synindex tp cp -@end ifinfo -@comment %**end of header - -@copying - -This manual (@value{UPDATED}) is for GNU Bison (version -@value{VERSION}), the GNU parser generator. - -Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software -Foundation, Inc. - -@quotation -Permission is granted to copy, distribute and/or modify this document -under the terms of the GNU Free Documentation License, -Version 1.3 or any later version published by the Free Software -Foundation; with no Invariant Sections, with the Front-Cover texts -being ``A GNU Manual,'' and with the Back-Cover Texts as in -(a) below. 
A copy of the license is included in the section entitled -``GNU Free Documentation License.'' - -(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and -modify this GNU manual. Buying copies from the FSF -supports it in developing GNU and promoting software -freedom.'' -@end quotation -@end copying - -@dircategory Software development -@direntry -* bison: (bison). GNU parser generator (Yacc replacement). -@end direntry - -@titlepage -@title Bison -@subtitle The Yacc-compatible Parser Generator -@subtitle @value{UPDATED}, Bison Version @value{VERSION} - -@author by Charles Donnelly and Richard Stallman - -@page -@vskip 0pt plus 1filll -@insertcopying -@sp 2 -Published by the Free Software Foundation @* -51 Franklin Street, Fifth Floor @* -Boston, MA 02110-1301 USA @* -Printed copies are available from the Free Software Foundation.@* -ISBN 1-882114-44-2 -@sp 2 -Cover art by Etienne Suvasa. -@end titlepage - -@contents - -@ifnottex -@node Top -@top Bison -@insertcopying -@end ifnottex - -@menu -* Introduction:: -* Conditions:: -* Copying:: The GNU General Public License says - how you can copy and share Bison. - -Tutorial sections: -* Concepts:: Basic concepts for understanding Bison. -* Examples:: Three simple explained examples of using Bison. - -Reference sections: -* Grammar File:: Writing Bison declarations and rules. -* Interface:: C-language interface to the parser function @code{yyparse}. -* Algorithm:: How the Bison parser works at run-time. -* Error Recovery:: Writing rules for error recovery. -* Context Dependency:: What to do if your language syntax is too - messy for Bison to handle straightforwardly. -* Debugging:: Understanding or debugging Bison parsers. -* Invocation:: How to run Bison (to produce the parser implementation). -* Other Languages:: Creating C++ and Java parsers. -* FAQ:: Frequently Asked Questions -* Table of Symbols:: All the keywords of the Bison language are explained. -* Glossary:: Basic concepts are explained. 
-* Copying This Manual:: License for copying this manual. -* Bibliography:: Publications cited in this manual. -* Index:: Cross-references to the text. - -@detailmenu - --- The Detailed Node Listing --- - -The Concepts of Bison - -* Language and Grammar:: Languages and context-free grammars, - as mathematical ideas. -* Grammar in Bison:: How we represent grammars for Bison's sake. -* Semantic Values:: Each token or syntactic grouping can have - a semantic value (the value of an integer, - the name of an identifier, etc.). -* Semantic Actions:: Each rule can have an action containing C code. -* GLR Parsers:: Writing parsers for general context-free languages. -* Locations:: Overview of location tracking. -* Bison Parser:: What are Bison's input and output, - how is the output used? -* Stages:: Stages in writing and running Bison grammars. -* Grammar Layout:: Overall structure of a Bison grammar file. - -Writing GLR Parsers - -* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. -* Merging GLR Parses:: Using GLR parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. -* Compiler Requirements:: GLR parsers require a modern C compiler. - -Examples - -* RPN Calc:: Reverse polish notation calculator; - a first example with no operator precedence. -* Infix Calc:: Infix (algebraic) notation calculator. - Operator precedence is introduced. -* Simple Error Recovery:: Continuing after syntax errors. -* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. -* Exercises:: Ideas for improving the multi-function calculator. - -Reverse Polish Notation Calculator - -* Rpcalc Declarations:: Prologue (declarations) for rpcalc. -* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. -* Rpcalc Lexer:: The lexical analyzer. -* Rpcalc Main:: The controlling function. 
-* Rpcalc Error:: The error reporting function. -* Rpcalc Generate:: Running Bison on the grammar file. -* Rpcalc Compile:: Run the C compiler on the output code. - -Grammar Rules for @code{rpcalc} - -* Rpcalc Input:: -* Rpcalc Line:: -* Rpcalc Expr:: - -Location Tracking Calculator: @code{ltcalc} - -* Ltcalc Declarations:: Bison and C declarations for ltcalc. -* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. -* Ltcalc Lexer:: The lexical analyzer. - -Multi-Function Calculator: @code{mfcalc} - -* Mfcalc Declarations:: Bison declarations for multi-function calculator. -* Mfcalc Rules:: Grammar rules for the calculator. -* Mfcalc Symbol Table:: Symbol table management subroutines. - -Bison Grammar Files - -* Grammar Outline:: Overall layout of the grammar file. -* Symbols:: Terminal and nonterminal symbols. -* Rules:: How to write grammar rules. -* Recursion:: Writing recursive rules. -* Semantics:: Semantic values and actions. -* Tracking Locations:: Locations and actions. -* Named References:: Using named references in actions. -* Declarations:: All kinds of Bison declarations are described here. -* Multiple Parsers:: Putting more than one Bison parser in one program. - -Outline of a Bison Grammar - -* Prologue:: Syntax and usage of the prologue. -* Prologue Alternatives:: Syntax and usage of alternatives to the prologue. -* Bison Declarations:: Syntax and usage of the Bison declarations section. -* Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue. - -Defining Language Semantics - -* Value Type:: Specifying one data type for all semantic values. -* Multiple Types:: Specifying several alternative data types. -* Actions:: An action is the semantic definition of a grammar rule. -* Action Types:: Specifying data types for actions to operate on. -* Mid-Rule Actions:: Most actions go at the end of a rule. - This says when, why and how to use the exceptional - action in the middle of a rule. 
- -Tracking Locations - -* Location Type:: Specifying a data type for locations. -* Actions and Locations:: Using locations in actions. -* Location Default Action:: Defining a general way to compute locations. - -Bison Declarations - -* Require Decl:: Requiring a Bison version. -* Token Decl:: Declaring terminal symbols. -* Precedence Decl:: Declaring terminals with precedence and associativity. -* Union Decl:: Declaring the set of all semantic value types. -* Type Decl:: Declaring the choice of type for a nonterminal symbol. -* Initial Action Decl:: Code run before parsing starts. -* Destructor Decl:: Declaring how symbols are freed. -* Printer Decl:: Declaring how symbol values are displayed. -* Expect Decl:: Suppressing warnings about parsing conflicts. -* Start Decl:: Specifying the start symbol. -* Pure Decl:: Requesting a reentrant parser. -* Push Decl:: Requesting a push parser. -* Decl Summary:: Table of all Bison declarations. -* %define Summary:: Defining variables to adjust Bison's behavior. -* %code Summary:: Inserting code into the parser source. - -Parser C-Language Interface - -* Parser Function:: How to call @code{yyparse} and what it returns. -* Push Parser Function:: How to call @code{yypush_parse} and what it returns. -* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. -* Parser Create Function:: How to call @code{yypstate_new} and what it returns. -* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. -* Lexical:: You must supply a function @code{yylex} - which reads tokens. -* Error Reporting:: You must supply a function @code{yyerror}. -* Action Features:: Special features for use in actions. -* Internationalization:: How to let the parser speak in the user's - native language. - -The Lexical Analyzer Function @code{yylex} - -* Calling Convention:: How @code{yyparse} calls @code{yylex}. -* Token Values:: How @code{yylex} must return the semantic value - of the token it has read. 
-* Token Locations:: How @code{yylex} must return the text location - (line number, etc.) of the token, if the - actions want that. -* Pure Calling:: How the calling convention differs in a pure parser - (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). - -The Bison Parser Algorithm - -* Lookahead:: Parser looks one token ahead when deciding what to do. -* Shift/Reduce:: Conflicts: when either shifting or reduction is valid. -* Precedence:: Operator precedence works by resolving conflicts. -* Contextual Precedence:: When an operator's precedence depends on context. -* Parser States:: The parser is a finite-state-machine with stack. -* Reduce/Reduce:: When two rules are applicable in the same situation. -* Mysterious Conflicts:: Conflicts that look unjustified. -* Tuning LR:: How to tune fundamental aspects of LR-based parsing. -* Generalized LR Parsing:: Parsing arbitrary context-free grammars. -* Memory Management:: What happens when memory is exhausted. How to avoid it. - -Operator Precedence - -* Why Precedence:: An example showing why precedence is needed. -* Using Precedence:: How to specify precedence in Bison grammars. -* Precedence Examples:: How these features are used in the previous example. -* How Precedence:: How they work. - -Tuning LR - -* LR Table Construction:: Choose a different construction algorithm. -* Default Reductions:: Disable default reductions. -* LAC:: Correct lookahead sets in the parser states. -* Unreachable States:: Keep unreachable parser states for debugging. - -Handling Context Dependencies - -* Semantic Tokens:: Token parsing can depend on the semantic context. -* Lexical Tie-ins:: Token parsing can depend on the syntactic context. -* Tie-in Recovery:: Lexical tie-ins have implications for how - error recovery rules must be written. - -Debugging Your Parser - -* Understanding:: Understanding the structure of your parser. -* Tracing:: Tracing the execution of your parser. 
- -Tracing Your Parser - -* Enabling Traces:: Activating run-time trace support -* Mfcalc Traces:: Extending @code{mfcalc} to support traces -* The YYPRINT Macro:: Obsolete interface for semantic value reports - -Invoking Bison - -* Bison Options:: All the options described in detail, - in alphabetical order by short options. -* Option Cross Key:: Alphabetical list of long options. -* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}. - -Parsers Written In Other Languages - -* C++ Parsers:: The interface to generate C++ parser classes -* Java Parsers:: The interface to generate Java parser classes - -C++ Parsers - -* C++ Bison Interface:: Asking for C++ parser generation -* C++ Semantic Values:: %union vs. C++ -* C++ Location Values:: The position and location classes -* C++ Parser Interface:: Instantiating and running the parser -* C++ Scanner Interface:: Exchanges between yylex and parse -* A Complete C++ Example:: Demonstrating their use - -C++ Location Values - -* C++ position:: One point in the source file -* C++ location:: Two points in the source file - -A Complete C++ Example - -* Calc++ --- C++ Calculator:: The specifications -* Calc++ Parsing Driver:: An active parsing context -* Calc++ Parser:: A parser class -* Calc++ Scanner:: A pure C++ Flex scanner -* Calc++ Top Level:: Conducting the band - -Java Parsers - -* Java Bison Interface:: Asking for Java parser generation -* Java Semantic Values:: %type and %token vs. 
Java -* Java Location Values:: The position and location classes -* Java Parser Interface:: Instantiating and running the parser -* Java Scanner Interface:: Specifying the scanner for the parser -* Java Action Features:: Special features for use in actions -* Java Differences:: Differences between C/C++ and Java Grammars -* Java Declarations Summary:: List of Bison declarations used with Java - -Frequently Asked Questions - -* Memory Exhausted:: Breaking the Stack Limits -* How Can I Reset the Parser:: @code{yyparse} Keeps some State -* Strings are Destroyed:: @code{yylval} Loses Track of Strings -* Implementing Gotos/Loops:: Control Flow in the Calculator -* Multiple start-symbols:: Factoring closely related grammars -* Secure? Conform?:: Is Bison POSIX safe? -* I can't build Bison:: Troubleshooting -* Where can I find help?:: Troubleshouting -* Bug Reports:: Troublereporting -* More Languages:: Parsers in C++, Java, and so on -* Beta Testing:: Experimenting development versions -* Mailing Lists:: Meeting other Bison users - -Copying This Manual - -* Copying This Manual:: License for copying this manual. - -@end detailmenu -@end menu - -@node Introduction -@unnumbered Introduction -@cindex introduction - -@dfn{Bison} is a general-purpose parser generator that converts an -annotated context-free grammar into a deterministic LR or generalized -LR (GLR) parser employing LALR(1) parser tables. As an experimental -feature, Bison can also generate IELR(1) or canonical LR(1) parser -tables. Once you are proficient with Bison, you can use it to develop -a wide range of language parsers, from those used in simple desk -calculators to complex programming languages. - -Bison is upward compatible with Yacc: all properly-written Yacc -grammars ought to work with Bison with no change. Anyone familiar -with Yacc should be able to use Bison with little trouble. You need -to be fluent in C or C++ programming in order to use Bison or to -understand this manual. 
Java is also supported as an experimental -feature. - -We begin with tutorial chapters that explain the basic concepts of -using Bison and show three explained examples, each building on the -last. If you don't know Bison or Yacc, start by reading these -chapters. Reference chapters follow, which describe specific aspects -of Bison in detail. - -Bison was written originally by Robert Corbett. Richard Stallman made -it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University -added multi-character string literals and other features. Since then, -Bison has grown more robust and evolved many other new features thanks -to the hard work of a long list of volunteers. For details, see the -@file{THANKS} and @file{ChangeLog} files included in the Bison -distribution. - -This edition corresponds to version @value{VERSION} of Bison. - -@node Conditions -@unnumbered Conditions for Using Bison - -The distribution terms for Bison-generated parsers permit using the -parsers in nonfree programs. Before Bison version 2.2, these extra -permissions applied only when Bison was generating LALR(1) -parsers in C@. And before Bison version 1.24, Bison-generated -parsers could be used only in programs that were free software. - -The other GNU programming tools, such as the GNU C -compiler, have never -had such a requirement. They could always be used for nonfree -software. The reason Bison was different was not due to a special -policy decision; it resulted from applying the usual General Public -License to all of the Bison source code. - -The main output of the Bison utility---the Bison parser implementation -file---contains a verbatim copy of a sizable piece of Bison, which is -the code for the parser's implementation. (The actions from your -grammar are inserted into this implementation at one point, but most -of the rest of the implementation is not changed.) 
When we applied -the GPL terms to the skeleton code for the parser's implementation, -the effect was to restrict the use of Bison output to free software. - -We didn't change the terms because of sympathy for people who want to -make software proprietary. @strong{Software should be free.} But we -concluded that limiting Bison's use to free software was doing little to -encourage people to make other software free. So we decided to make the -practical conditions for using Bison match the practical conditions for -using the other GNU tools. - -This exception applies when Bison is generating code for a parser. -You can tell whether the exception applies to a Bison output file by -inspecting the file for text beginning with ``As a special -exception@dots{}''. The text spells out the exact terms of the -exception. - -@node Copying -@unnumbered GNU GENERAL PUBLIC LICENSE -@include gpl-3.0.texi - -@node Concepts -@chapter The Concepts of Bison - -This chapter introduces many of the basic concepts without which the -details of Bison will not make sense. If you do not already know how to -use Bison or Yacc, we suggest you start by reading this chapter carefully. - -@menu -* Language and Grammar:: Languages and context-free grammars, - as mathematical ideas. -* Grammar in Bison:: How we represent grammars for Bison's sake. -* Semantic Values:: Each token or syntactic grouping can have - a semantic value (the value of an integer, - the name of an identifier, etc.). -* Semantic Actions:: Each rule can have an action containing C code. -* GLR Parsers:: Writing parsers for general context-free languages. -* Locations:: Overview of location tracking. -* Bison Parser:: What are Bison's input and output, - how is the output used? -* Stages:: Stages in writing and running Bison grammars. -* Grammar Layout:: Overall structure of a Bison grammar file. 
-@end menu - -@node Language and Grammar -@section Languages and Context-Free Grammars - -@cindex context-free grammar -@cindex grammar, context-free -In order for Bison to parse a language, it must be described by a -@dfn{context-free grammar}. This means that you specify one or more -@dfn{syntactic groupings} and give rules for constructing them from their -parts. For example, in the C language, one kind of grouping is called an -`expression'. One rule for making an expression might be, ``An expression -can be made of a minus sign and another expression''. Another would be, -``An expression can be an integer''. As you can see, rules are often -recursive, but there must be at least one rule which leads out of the -recursion. - -@cindex BNF -@cindex Backus-Naur form -The most common formal system for presenting such rules for humans to read -is @dfn{Backus-Naur Form} or ``BNF'', which was developed in -order to specify the language Algol 60. Any grammar expressed in -BNF is a context-free grammar. The input to Bison is -essentially machine-readable BNF. - -@cindex LALR grammars -@cindex IELR grammars -@cindex LR grammars -There are various important subclasses of context-free grammars. Although -it can handle almost all context-free grammars, Bison is optimized for what -are called LR(1) grammars. In brief, in these grammars, it must be possible -to tell how to parse any portion of an input string with just a single token -of lookahead. For historical reasons, Bison by default is limited by the -additional restrictions of LALR(1), which is hard to explain simply. -@xref{Mysterious Conflicts}, for more information on this. As an -experimental feature, you can escape these additional restrictions by -requesting IELR(1) or canonical LR(1) parser tables. @xref{LR Table -Construction}, to learn how. 
- -@cindex GLR parsing -@cindex generalized LR (GLR) parsing -@cindex ambiguous grammars -@cindex nondeterministic parsing - -Parsers for LR(1) grammars are @dfn{deterministic}, meaning -roughly that the next grammar rule to apply at any point in the input is -uniquely determined by the preceding input and a fixed, finite portion -(called a @dfn{lookahead}) of the remaining input. A context-free -grammar can be @dfn{ambiguous}, meaning that there are multiple ways to -apply the grammar rules to get the same inputs. Even unambiguous -grammars can be @dfn{nondeterministic}, meaning that no fixed -lookahead always suffices to determine the next grammar rule to apply. -With the proper declarations, Bison is also able to parse these more -general context-free grammars, using a technique known as GLR -parsing (for Generalized LR). Bison's GLR parsers -are able to handle any context-free grammar for which the number of -possible parses of any given string is finite. - -@cindex symbols (abstract) -@cindex token -@cindex syntactic grouping -@cindex grouping, syntactic -In the formal grammatical rules for a language, each kind of syntactic -unit or grouping is named by a @dfn{symbol}. Those which are built by -grouping smaller constructs according to grammatical rules are called -@dfn{nonterminal symbols}; those which can't be subdivided are called -@dfn{terminal symbols} or @dfn{token types}. We call a piece of input -corresponding to a single terminal symbol a @dfn{token}, and a piece -corresponding to a single nonterminal symbol a @dfn{grouping}. - -We can use the C language as an example of what symbols, terminal and -nonterminal, mean. The tokens of C are identifiers, constants (numeric -and string), and the various keywords, arithmetic operators and -punctuation marks. 
So the terminal symbols of a grammar for C include -`identifier', `number', `string', plus one symbol for each keyword, -operator or punctuation mark: `if', `return', `const', `static', `int', -`char', `plus-sign', `open-brace', `close-brace', `comma' and many more. -(These tokens can be subdivided into characters, but that is a matter of -lexicography, not grammar.) - -Here is a simple C function subdivided into tokens: - -@example -int /* @r{keyword `int'} */ -square (int x) /* @r{identifier, open-paren, keyword `int',} - @r{identifier, close-paren} */ -@{ /* @r{open-brace} */ - return x * x; /* @r{keyword `return', identifier, asterisk,} - @r{identifier, semicolon} */ -@} /* @r{close-brace} */ -@end example - -The syntactic groupings of C include the expression, the statement, the -declaration, and the function definition. These are represented in the -grammar of C by nonterminal symbols `expression', `statement', -`declaration' and `function definition'. The full grammar uses dozens of -additional language constructs, each with its own nonterminal symbol, in -order to express the meanings of these four. The example above is a -function definition; it contains one declaration, and one statement. In -the statement, each @samp{x} is an expression and so is @samp{x * x}. - -Each nonterminal symbol must have grammatical rules showing how it is made -out of simpler constructs. For example, one kind of C statement is the -@code{return} statement; this would be described with a grammar rule which -reads informally as follows: - -@quotation -A `statement' can be made of a `return' keyword, an `expression' and a -`semicolon'. -@end quotation - -@noindent -There would be many other rules for `statement', one for each kind of -statement in C. - -@cindex start symbol -One nonterminal symbol must be distinguished as the special one which -defines a complete utterance in the language. It is called the @dfn{start -symbol}. In a compiler, this means a complete input program. 
In the C -language, the nonterminal symbol `sequence of definitions and declarations' -plays this role. - -For example, @samp{1 + 2} is a valid C expression---a valid part of a C -program---but it is not valid as an @emph{entire} C program. In the -context-free grammar of C, this follows from the fact that `expression' is -not the start symbol. - -The Bison parser reads a sequence of tokens as its input, and groups the -tokens using the grammar rules. If the input is valid, the end result is -that the entire token sequence reduces to a single grouping whose symbol is -the grammar's start symbol. If we use a grammar for C, the entire input -must be a `sequence of definitions and declarations'. If not, the parser -reports a syntax error. - -@node Grammar in Bison -@section From Formal Rules to Bison Input -@cindex Bison grammar -@cindex grammar, Bison -@cindex formal grammar - -A formal grammar is a mathematical construct. To define the language -for Bison, you must write a file expressing the grammar in Bison syntax: -a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}. - -A nonterminal symbol in the formal grammar is represented in Bison input -as an identifier, like an identifier in C@. By convention, it should be -in lower case, such as @code{expr}, @code{stmt} or @code{declaration}. - -The Bison representation for a terminal symbol is also called a @dfn{token -type}. Token types as well can be represented as C-like identifiers. By -convention, these identifiers should be upper case to distinguish them from -nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or -@code{RETURN}. A terminal symbol that stands for a particular keyword in -the language should be named after that keyword converted to upper case. -The terminal symbol @code{error} is reserved for error recovery. -@xref{Symbols}. - -A terminal symbol can also be represented as a character literal, just like -a C character constant. 
You should do this whenever a token is just a -single character (parenthesis, plus-sign, etc.): use that same character in -a literal as the terminal symbol for that token. - -A third way to represent a terminal symbol is with a C string constant -containing several characters. @xref{Symbols}, for more information. - -The grammar rules also have an expression in Bison syntax. For example, -here is the Bison rule for a C @code{return} statement. The semicolon in -quotes is a literal character token, representing part of the C syntax for -the statement; the naked semicolon, and the colon, are Bison punctuation -used in every rule. - -@example -stmt: RETURN expr ';' ; -@end example - -@noindent -@xref{Rules, ,Syntax of Grammar Rules}. - -@node Semantic Values -@section Semantic Values -@cindex semantic value -@cindex value, semantic - -A formal grammar selects tokens only by their classifications: for example, -if a rule mentions the terminal symbol `integer constant', it means that -@emph{any} integer constant is grammatically valid in that position. The -precise value of the constant is irrelevant to how to parse the input: if -@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally -grammatical. - -But the precise value is very important for what the input means once it is -parsed. A compiler is useless if it fails to distinguish between 4, 1 and -3989 as constants in the program! Therefore, each token in a Bison grammar -has both a token type and a @dfn{semantic value}. @xref{Semantics, -,Defining Language Semantics}, -for details. - -The token type is a terminal symbol defined in the grammar, such as -@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything -you need to know to decide where the token may validly appear and how to -group it with other tokens. The grammar rules know nothing about tokens -except their types. 
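To make the split between token type and semantic value concrete, here is a minimal sketch of a hand-written scanner in C. It is not taken from Bison. The names @code{next_token}, @code{tok_value}, @code{TOK_EOF} and @code{TOK_INTEGER} are invented for this illustration, and the numeric code chosen for @code{TOK_INTEGER} is arbitrary. In a real Bison parser the scanner is called @code{yylex} and the token type codes come from the generated parser.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token type codes, invented for this sketch.  */
enum { TOK_EOF = 0, TOK_INTEGER = 258 };

/* Hypothetical holder for the semantic value of the last token.  */
int tok_value;

/* Scan one token starting at *INPUT, advance *INPUT past it,
   and return the token type.  */
int
next_token (const char **input)
{
  const char *p = *input;

  /* Skip blanks between tokens.  */
  while (*p == ' ' || *p == '\t')
    p++;

  if (*p == '\0')
    {
      *input = p;
      return TOK_EOF;
    }

  if (isdigit ((unsigned char) *p))
    {
      /* Every integer has the same token type; only the
         semantic value tells 4 from 3989.  */
      char *end;
      tok_value = (int) strtol (p, &end, 10);
      *input = end;
      return TOK_INTEGER;
    }

  /* A single-character token such as ',': the character itself
     serves as the token type, and no semantic value is stored.  */
  *input = p + 1;
  return (unsigned char) *p;
}
```

Under these assumptions, scanning the text @samp{4 , 3989} yields the types @code{TOK_INTEGER}, @code{','} and @code{TOK_INTEGER} in turn. The two integers differ only in the value left in @code{tok_value}.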
- -The semantic value has all the rest of the information about the -meaning of the token, such as the value of an integer, or the name of an -identifier. (A token such as @code{','} which is just punctuation doesn't -need to have any semantic value.) - -For example, an input token might be classified as token type -@code{INTEGER} and have the semantic value 4. Another input token might -have the same token type @code{INTEGER} but value 3989. When a grammar -rule says that @code{INTEGER} is allowed, either of these tokens is -acceptable because each is an @code{INTEGER}. When the parser accepts the -token, it keeps track of the token's semantic value. - -Each grouping can also have a semantic value as well as its nonterminal -symbol. For example, in a calculator, an expression typically has a -semantic value that is a number. In a compiler for a programming -language, an expression typically has a semantic value that is a tree -structure describing the meaning of the expression. - -@node Semantic Actions -@section Semantic Actions -@cindex semantic actions -@cindex actions, semantic - -In order to be useful, a program must do more than parse input; it must -also produce some output based on the input. In a Bison grammar, a grammar -rule can have an @dfn{action} made up of C statements. Each time the -parser recognizes a match for that rule, the action is executed. -@xref{Actions}. - -Most of the time, the purpose of an action is to compute the semantic value -of the whole construct from the semantic values of its parts. For example, -suppose we have a rule which says an expression can be the sum of two -expressions. When the parser recognizes such a sum, each of the -subexpressions has a semantic value which describes how it was built up. -The action for this rule should create a similar sort of value for the -newly recognized larger expression. 
- -For example, here is a rule that says an expression can be the sum of -two subexpressions: - -@example -expr: expr '+' expr @{ $$ = $1 + $3; @} ; -@end example - -@noindent -The action says how to produce the semantic value of the sum expression -from the values of the two subexpressions. - -@node GLR Parsers -@section Writing GLR Parsers -@cindex GLR parsing -@cindex generalized LR (GLR) parsing -@findex %glr-parser -@cindex conflicts -@cindex shift/reduce conflicts -@cindex reduce/reduce conflicts - -In some grammars, Bison's deterministic -LR(1) parsing algorithm cannot decide whether to apply a -certain grammar rule at a given point. That is, it may not be able to -decide (on the basis of the input read so far) which of two possible -reductions (applications of a grammar rule) applies, or whether to apply -a reduction or read more of the input and apply a reduction later in the -input. These are known respectively as @dfn{reduce/reduce} conflicts -(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts -(@pxref{Shift/Reduce}). - -To use a grammar that is not easily modified to be LR(1), a -more general parsing algorithm is sometimes necessary. If you include -@code{%glr-parser} among the Bison declarations in your file -(@pxref{Grammar Outline}), the result is a Generalized LR -(GLR) parser. These parsers handle Bison grammars that -contain no unresolved conflicts (i.e., after applying precedence -declarations) identically to deterministic parsers. However, when -faced with unresolved shift/reduce and reduce/reduce conflicts, -GLR parsers use the simple expedient of doing both, -effectively cloning the parser to follow both possibilities. Each of -the resulting parsers can again split, so that at any given time, there -can be any number of possible parses being explored. The parsers -proceed in lockstep; that is, all of them consume (shift) a given input -symbol before any of them proceed to the next. 
Each of the cloned -parsers eventually meets one of two possible fates: either it runs into -a parsing error, in which case it simply vanishes, or it merges with -another parser, because the two of them have reduced the input to an -identical set of symbols. - -During the time that there are multiple parsers, semantic actions are -recorded, but not performed. When a parser disappears, its recorded -semantic actions disappear as well, and are never performed. When a -reduction makes two parsers identical, causing them to merge, Bison -records both sets of semantic actions. Whenever the last two parsers -merge, reverting to the single-parser case, Bison resolves all the -outstanding actions either by precedences given to the grammar rules -involved, or by performing both actions, and then calling a designated -user-defined function on the resulting values to produce an arbitrary -merged result. - -@menu -* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars. -* Merging GLR Parses:: Using GLR parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. -* Compiler Requirements:: GLR parsers require a modern C compiler. -@end menu - -@node Simple GLR Parsers -@subsection Using GLR on Unambiguous Grammars -@cindex GLR parsing, unambiguous grammars -@cindex generalized LR (GLR) parsing, unambiguous grammars -@findex %glr-parser -@findex %expect-rr -@cindex conflicts -@cindex reduce/reduce conflicts -@cindex shift/reduce conflicts - -In the simplest cases, you can use the GLR algorithm -to parse grammars that are unambiguous but fail to be LR(1). -Such grammars typically require more than one symbol of lookahead. - -Consider a problem that -arises in the declaration of enumerated and subrange types in the -programming language Pascal. Here are some examples: - -@example -type subrange = lo .. 
hi; -type enum = (a, b, c); -@end example - -@noindent -The original language standard allows only numeric -literals and constant identifiers for the subrange bounds (@samp{lo} -and @samp{hi}), but Extended Pascal (ISO/IEC -10206) and many other -Pascal implementations allow arbitrary expressions there. This gives -rise to the following situation, containing a superfluous pair of -parentheses: - -@example -type subrange = (a) .. b; -@end example - -@noindent -Compare this to the following declaration of an enumerated -type with only one value: - -@example -type enum = (a); -@end example - -@noindent -(These declarations are contrived, but they are syntactically -valid, and more-complicated cases can come up in practical programs.) - -These two declarations look identical until the @samp{..} token. -With normal LR(1) one-token lookahead it is not -possible to decide between the two forms when the identifier -@samp{a} is parsed. It is, however, desirable -for a parser to decide this, since in the latter case -@samp{a} must become a new identifier to represent the enumeration -value, while in the former case @samp{a} must be evaluated with its -current meaning, which may be a constant or even a function call. - -You could parse @samp{(a)} as an ``unspecified identifier in parentheses'', -to be resolved later, but this typically requires substantial -contortions in both semantic actions and large parts of the -grammar, where the parentheses are nested in the recursive rules for -expressions. - -You might think of using the lexer to distinguish between the two -forms by returning different tokens for currently defined and -undefined identifiers. But if these declarations occur in a local -scope, and @samp{a} is defined in an outer scope, then both forms -are possible---either locally redefining @samp{a}, or using the -value of @samp{a} from the outer scope. So this approach cannot -work. 
- -A simple solution to this problem is to declare the parser to -use the GLR algorithm. -When the GLR parser reaches the critical state, it -merely splits into two branches and pursues both syntax rules -simultaneously. Sooner or later, one of them runs into a parsing -error. If there is a @samp{..} token before the next -@samp{;}, the rule for enumerated types fails since it cannot -accept @samp{..} anywhere; otherwise, the subrange type rule -fails since it requires a @samp{..} token. So one of the branches -fails silently, and the other one continues normally, performing -all the intermediate actions that were postponed during the split. - -If the input is syntactically incorrect, both branches fail and the parser -reports a syntax error as usual. - -The effect of all this is that the parser seems to ``guess'' the -correct branch to take, or in other words, it seems to use more -lookahead than the underlying LR(1) algorithm actually allows -for. In this example, LR(2) would suffice, but also some cases -that are not LR(@math{k}) for any @math{k} can be handled this way. - -In general, a GLR parser can take quadratic or cubic worst-case time, -and the current Bison parser even takes exponential time and space -for some grammars. In practice, this rarely happens, and for many -grammars it is possible to prove that it cannot happen. -The present example contains only one conflict between two -rules, and the type-declaration context containing the conflict -cannot be nested. So the number of -branches that can exist at any time is limited by the constant 2, -and the parsing time is still linear. - -Here is a Bison grammar corresponding to the example above. It -parses a vastly simplified form of Pascal type declarations. 
- -@example -%token TYPE DOTDOT ID - -@group -%left '+' '-' -%left '*' '/' -@end group - -%% - -@group -type_decl: TYPE ID '=' type ';' ; -@end group - -@group -type: - '(' id_list ')' -| expr DOTDOT expr -; -@end group - -@group -id_list: - ID -| id_list ',' ID -; -@end group - -@group -expr: - '(' expr ')' -| expr '+' expr -| expr '-' expr -| expr '*' expr -| expr '/' expr -| ID -; -@end group -@end example - -When used as a normal LR(1) grammar, Bison correctly complains -about one reduce/reduce conflict. In the conflicting situation the -parser chooses one of the alternatives, arbitrarily the one -declared first. Therefore the following correct input is not -recognized: - -@example -type t = (a) .. b; -@end example - -The parser can be turned into a GLR parser, while also telling Bison -to be silent about the one known reduce/reduce conflict, by adding -these two declarations to the Bison grammar file (before the first -@samp{%%}): - -@example -%glr-parser -%expect-rr 1 -@end example - -@noindent -No change in the grammar itself is required. Now the -parser recognizes all valid declarations, according to the -limited syntax above, transparently. In fact, the user does not even -notice when the parser splits. - -So here we have a case where we can use the benefits of GLR, -almost without disadvantages. Even in simple cases like this, however, -there are at least two potential problems to beware. First, always -analyze the conflicts reported by Bison to make sure that GLR -splitting is only done where it is intended. A GLR parser -splitting inadvertently may cause problems less obvious than an -LR parser statically choosing the wrong alternative in a -conflict. Second, consider interactions with the lexer (@pxref{Semantic -Tokens}) with great care. Since a split parser consumes tokens without -performing any actions during the split, the lexer cannot obtain -information via parser actions. 
Some cases of lexer interactions can be -eliminated by using GLR to shift the complications from the -lexer to the parser. You must check the remaining cases for -correctness. - -In our example, it would be safe for the lexer to return tokens based on -their current meanings in some symbol table, because no new symbols are -defined in the middle of a type declaration. Though it is possible for -a parser to define the enumeration constants as they are parsed, before -the type declaration is completed, it actually makes no difference since -they cannot be used within the same enumerated type declaration. - -@node Merging GLR Parses -@subsection Using GLR to Resolve Ambiguities -@cindex GLR parsing, ambiguous grammars -@cindex generalized LR (GLR) parsing, ambiguous grammars -@findex %dprec -@findex %merge -@cindex conflicts -@cindex reduce/reduce conflicts - -Let's consider an example, vastly simplified from a C++ grammar. - -@example -%@{ - #include <stdio.h> - #define YYSTYPE char const * - int yylex (void); - void yyerror (char const *); -%@} - -%token TYPENAME ID - -%right '=' -%left '+' - -%glr-parser - -%% - -prog: - /* Nothing. */ -| prog stmt @{ printf ("\n"); @} -; - -stmt: - expr ';' %dprec 1 -| decl %dprec 2 -; - -expr: - ID @{ printf ("%s ", $$); @} -| TYPENAME '(' expr ')' - @{ printf ("%s ", $1); @} -| expr '+' expr @{ printf ("+ "); @} -| expr '=' expr @{ printf ("= "); @} -; - -decl: - TYPENAME declarator ';' - @{ printf ("%s ", $1); @} -| TYPENAME declarator '=' expr ';' - @{ printf ("%s ", $1); @} -; - -declarator: - ID @{ printf ("\"%s\" ", $1); @} -| '(' declarator ')' -; -@end example - -@noindent -This models a problematic part of the C++ grammar---the ambiguity between -certain declarations and statements. For example, - -@example -T (x) = y+z; -@end example - -@noindent -parses as either an @code{expr} or a @code{decl} -(assuming that @samp{T} is recognized as a @code{TYPENAME} and -@samp{x} as an @code{ID}).
-Bison detects this as a reduce/reduce conflict between the rules -@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the -time it encounters @code{x} in the example above. Since this is a -GLR parser, it therefore splits the problem into two parses, one for -each choice of resolving the reduce/reduce conflict. -Unlike the example from the previous section (@pxref{Simple GLR Parsers}), -however, neither of these parses ``dies,'' because the grammar as it stands is -ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and -the other reduces @code{stmt : decl}, after which both parsers are in an -identical state: they've seen @samp{prog stmt} and have the same unprocessed -input remaining. We say that these parses have @dfn{merged.} - -At this point, the GLR parser requires a specification in the -grammar of how to choose between the competing parses. -In the example above, the two @code{%dprec} -declarations specify that Bison is to give precedence -to the parse that interprets the example as a -@code{decl}, which implies that @code{x} is a declarator. -The parser therefore prints - -@example -"x" y z + T -@end example - -The @code{%dprec} declarations only come into play when more than one -parse survives. Consider a different input string for this parser: - -@example -T (x) + y; -@end example - -@noindent -This is another example of using GLR to parse an unambiguous -construct, as shown in the previous section (@pxref{Simple GLR Parsers}). -Here, there is no ambiguity (this cannot be parsed as a declaration). -However, at the time the Bison parser encounters @code{x}, it does not -have enough information to resolve the reduce/reduce conflict (again, -between @code{x} as an @code{expr} or a @code{declarator}). In this -case, no precedence declaration is used. Again, the parser splits -into two, one assuming that @code{x} is an @code{expr}, and the other -assuming @code{x} is a @code{declarator}. 
The second of these parsers -then vanishes when it sees @code{+}, and the parser prints - -@example -x T y + -@end example - -Suppose that instead of resolving the ambiguity, you wanted to see all -the possibilities. For this purpose, you must merge the semantic -actions of the two possible parsers, rather than choosing one over the -other. To do so, you could change the declaration of @code{stmt} as -follows: - -@example -stmt: - expr ';' %merge <stmtMerge> -| decl %merge <stmtMerge> -; -@end example - -@noindent -and define the @code{stmtMerge} function as: - -@example -static YYSTYPE -stmtMerge (YYSTYPE x0, YYSTYPE x1) -@{ - printf ("<OR> "); - return ""; -@} -@end example - -@noindent -with an accompanying forward declaration -in the C declarations at the beginning of the file: - -@example -%@{ - #define YYSTYPE char const * - static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1); -%@} -@end example - -@noindent -With these declarations, the resulting parser parses the first example -as both an @code{expr} and a @code{decl}, and prints - -@example -"x" y z + T <OR> x T y z + = -@end example - -Bison requires that all of the -productions that participate in any particular merge have identical -@samp{%merge} clauses. Otherwise, the ambiguity is unresolvable, -and the parser reports an error during any parse that results in -the offending merge. - -@node GLR Semantic Actions -@subsection GLR Semantic Actions - -@cindex deferred semantic actions -By definition, a deferred semantic action is not performed at the same time as -the associated reduction. -This raises caveats for several Bison features you might use in a semantic -action in a GLR parser. - -@vindex yychar -@cindex GLR parsers and @code{yychar} -@vindex yylval -@cindex GLR parsers and @code{yylval} -@vindex yylloc -@cindex GLR parsers and @code{yylloc} -In any semantic action, you can examine @code{yychar} to determine the type of -the lookahead token present at the time of the associated reduction.
-After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF}, -you can then examine @code{yylval} and @code{yylloc} to determine the -lookahead token's semantic value and location, if any. -In a nondeferred semantic action, you can also modify any of these variables to -influence syntax analysis. -@xref{Lookahead, ,Lookahead Tokens}. - -@findex yyclearin -@cindex GLR parsers and @code{yyclearin} -In a deferred semantic action, it's too late to influence syntax analysis. -In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to -shallow copies of the values they had at the time of the associated reduction. -For this reason alone, modifying them is dangerous. -Moreover, the result of modifying them is undefined and subject to change with -future versions of Bison. -For example, if a semantic action might be deferred, you should never write it -to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free -memory referenced by @code{yylval}. - -@findex YYERROR -@cindex GLR parsers and @code{YYERROR} -Another Bison feature requiring special consideration is @code{YYERROR} -(@pxref{Action Features}), which you can invoke in a semantic action to -initiate error recovery. -During deterministic GLR operation, the effect of @code{YYERROR} is -the same as its effect in a deterministic parser. -In a deferred semantic action, its effect is undefined. -@c The effect is probably a syntax error at the split point. - -Also, see @ref{Location Default Action, ,Default Action for Locations}, which -describes a special usage of @code{YYLLOC_DEFAULT} in GLR parsers. - -@node Compiler Requirements -@subsection Considerations when Compiling GLR Parsers -@cindex @code{inline} -@cindex GLR parsers and @code{inline} - -The GLR parsers require a compiler for ISO C89 or -later. In addition, they use the @code{inline} keyword, which is not -C89, but is C99 and is a common extension in pre-C99 compilers. 
It is -up to the user of these parsers to handle -portability issues. For instance, if using Autoconf and the Autoconf -macro @code{AC_C_INLINE}, a mere - -@example -%@{ - #include <config.h> -%@} -@end example - -@noindent -will suffice. Otherwise, we suggest - -@example -%@{ - #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \ - && ! defined inline) - # define inline - #endif -%@} -@end example - -@node Locations -@section Locations -@cindex location -@cindex textual location -@cindex location, textual - -Many applications, like interpreters or compilers, have to produce verbose -and useful error messages. To achieve this, one must be able to keep track of -the @dfn{textual location}, or @dfn{location}, of each syntactic construct. -Bison provides a mechanism for handling these locations. - -Each token has a semantic value. In a similar fashion, each token has an -associated location, but the type of locations is the same for all tokens -and groupings. Moreover, the output parser is equipped with a default data -structure for storing locations (@pxref{Tracking Locations}, for more -details). - -Like semantic values, locations can be reached in actions using a dedicated -set of constructs. In the example above, the location of the whole grouping -is @code{@@$}, while the locations of the subexpressions are @code{@@1} and -@code{@@3}. - -When a rule is matched, a default action is used to compute the semantic value -of its left hand side (@pxref{Actions}). In the same way, another default -action is used for locations. However, the action for locations is general -enough for most cases, meaning there is usually no need to describe for each -rule how @code{@@$} should be formed. When building a new location for a given -grouping, the default behavior of the output parser is to take the beginning -of the first symbol, and the end of the last symbol.
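The default behavior just described can be pictured with a small C sketch. The struct below mirrors the four integer fields of the default location type; the names @code{span} and @code{span_merge} are hypothetical and exist only for this illustration. This is not the code Bison generates.

```c
#include <assert.h>

/* Hypothetical stand-in for the default location type: four integer
   fields, named as in the default location structure.  */
typedef struct span
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} span;

/* Build the location of a grouping from the locations of its
   components: begin where the first one begins, end where the last
   one ends.  RHS holds N component locations, N >= 1.  */
static span
span_merge (span const *rhs, int n)
{
  span res;
  assert (n >= 1);
  res.first_line = rhs[0].first_line;
  res.first_column = rhs[0].first_column;
  res.last_line = rhs[n - 1].last_line;
  res.last_column = rhs[n - 1].last_column;
  return res;
}
```

For a rule with three components, passing the locations of all three yields a location that starts where the first component starts and ends where the third one ends, whatever lies in between.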
- -@node Bison Parser -@section Bison Output: the Parser Implementation File -@cindex Bison parser -@cindex Bison utility -@cindex lexical analyzer, purpose -@cindex parser - -When you run Bison, you give it a Bison grammar file as input. The -most important output is a C source file that implements a parser for -the language described by the grammar. This parser is called a -@dfn{Bison parser}, and this file is called a @dfn{Bison parser -implementation file}. Keep in mind that the Bison utility and the -Bison parser are two distinct programs: the Bison utility is a program -whose output is the Bison parser implementation file that becomes part -of your program. - -The job of the Bison parser is to group tokens into groupings according to -the grammar rules---for example, to build identifiers and operators into -expressions. As it does this, it runs the actions for the grammar rules it -uses. - -The tokens come from a function called the @dfn{lexical analyzer} that -you must supply in some fashion (such as by writing it in C). The Bison -parser calls the lexical analyzer each time it wants a new token. It -doesn't know what is ``inside'' the tokens (though their semantic values -may reflect this). Typically the lexical analyzer makes the tokens by -parsing characters of text, but Bison does not depend on this. -@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. - -The Bison parser implementation file is C code which defines a -function named @code{yyparse} which implements that grammar. This -function does not make a complete C program: you must supply some -additional functions. One is the lexical analyzer. Another is an -error-reporting function which the parser calls to report an error. -In addition, a complete C program must start with a function called -@code{main}; you have to provide this, and arrange for it to call -@code{yyparse} or the parser will never run. @xref{Interface, ,Parser -C-Language Interface}. 
- -Aside from the token type names and the symbols in the actions you -write, all symbols defined in the Bison parser implementation file -itself begin with @samp{yy} or @samp{YY}. This includes interface -functions such as the lexical analyzer function @code{yylex}, the -error reporting function @code{yyerror} and the parser function -@code{yyparse} itself. This also includes numerous identifiers used -for internal purposes. Therefore, you should avoid using C -identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar -file except for the ones defined in this manual. Also, you should -avoid using the C identifiers @samp{malloc} and @samp{free} for -anything other than their usual meanings. - -In some cases the Bison parser implementation file includes system -headers, and in those cases your code should respect the identifiers -reserved by those headers. On some non-GNU hosts, @code{<alloca.h>}, -@code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are -included as needed to declare memory allocators and related types. -@code{<libintl.h>} is included if message translation is in use -(@pxref{Internationalization}). Other system headers may be included -if you define @code{YYDEBUG} to a nonzero value (@pxref{Tracing, -,Tracing Your Parser}). - -@node Stages -@section Stages in Using Bison -@cindex stages in using Bison -@cindex using Bison - -The actual language-design process using Bison, from grammar specification -to a working compiler or interpreter, has these parts: - -@enumerate -@item -Formally specify the grammar in a form recognized by Bison -(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule -in the language, describe the action that is to be taken when an -instance of that rule is recognized. The action is described by a -sequence of C statements. - -@item -Write a lexical analyzer to process input and pass tokens to the parser. -The lexical analyzer may be written by hand in C (@pxref{Lexical, ,The -Lexical Analyzer Function @code{yylex}}).
It could also be produced -using Lex, but the use of Lex is not discussed in this manual. - -@item -Write a controlling function that calls the Bison-produced parser. - -@item -Write error-reporting routines. -@end enumerate - -To turn this source code as written into a runnable program, you -must follow these steps: - -@enumerate -@item -Run Bison on the grammar to produce the parser. - -@item -Compile the code output by Bison, as well as any other source files. - -@item -Link the object files to produce the finished product. -@end enumerate - -@node Grammar Layout -@section The Overall Layout of a Bison Grammar -@cindex grammar file -@cindex file format -@cindex format of grammar file -@cindex layout of Bison grammar - -The input file for the Bison utility is a @dfn{Bison grammar file}. The -general form of a Bison grammar file is as follows: - -@example -%@{ -@var{Prologue} -%@} - -@var{Bison declarations} - -%% -@var{Grammar rules} -%% -@var{Epilogue} -@end example - -@noindent -The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears -in every Bison grammar file to separate the sections. - -The prologue may define types and variables used in the actions. You can -also use preprocessor commands to define macros used there, and use -@code{#include} to include header files that do any of these things. -You need to declare the lexical analyzer @code{yylex} and the error -printer @code{yyerror} here, along with any other global identifiers -used by the actions in the grammar rules. - -The Bison declarations declare the names of the terminal and nonterminal -symbols, and may also describe operator precedence and the data types of -semantic values of various symbols. - -The grammar rules define how to construct each nonterminal symbol from its -parts. - -The epilogue can contain any code you want to use. Often the -definitions of functions declared in the prologue go here. In a -simple program, all the rest of the program can go here. 
- -@node Examples -@chapter Examples -@cindex simple examples -@cindex examples, simple - -Now we show and explain several sample programs written using Bison: a -reverse polish notation calculator, an algebraic (infix) notation -calculator --- later extended to track ``locations'' --- -and a multi-function calculator. All -produce usable, though limited, interactive desk-top calculators. - -These examples are simple, but Bison grammars for real programming -languages are written the same way. You can copy these examples into a -source file to try them. - -@menu -* RPN Calc:: Reverse polish notation calculator; - a first example with no operator precedence. -* Infix Calc:: Infix (algebraic) notation calculator. - Operator precedence is introduced. -* Simple Error Recovery:: Continuing after syntax errors. -* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. -* Exercises:: Ideas for improving the multi-function calculator. -@end menu - -@node RPN Calc -@section Reverse Polish Notation Calculator -@cindex reverse polish notation -@cindex polish notation calculator -@cindex @code{rpcalc} -@cindex calculator, simple - -The first example is that of a simple double-precision @dfn{reverse polish -notation} calculator (a calculator using postfix operators). This example -provides a good starting point, since operator precedence is not an issue. -The second example will illustrate how operator precedence is handled. - -The source code for this calculator is named @file{rpcalc.y}. The -@samp{.y} extension is a convention used for Bison grammar files. - -@menu -* Rpcalc Declarations:: Prologue (declarations) for rpcalc. -* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. -* Rpcalc Lexer:: The lexical analyzer. -* Rpcalc Main:: The controlling function. -* Rpcalc Error:: The error reporting function. 
-* Rpcalc Generate:: Running Bison on the grammar file. -* Rpcalc Compile:: Run the C compiler on the output code. -@end menu - -@node Rpcalc Declarations -@subsection Declarations for @code{rpcalc} - -Here are the C and Bison declarations for the reverse polish notation -calculator. As in C, comments are placed between @samp{/*@dots{}*/}. - -@example -/* Reverse polish notation calculator. */ - -%@{ - #define YYSTYPE double - #include <math.h> - int yylex (void); - void yyerror (char const *); -%@} - -%token NUM - -%% /* Grammar rules and actions follow. */ -@end example - -The declarations section (@pxref{Prologue, , The prologue}) contains two -preprocessor directives and two forward declarations. - -The @code{#define} directive defines the macro @code{YYSTYPE}, thus -specifying the C data type for semantic values of both tokens and -groupings (@pxref{Value Type, ,Data Types of Semantic Values}). The -Bison parser will use whatever type @code{YYSTYPE} is defined as; if you -don't define it, @code{int} is the default. Because we specify -@code{double}, each token and each expression has an associated value, -which is a floating point number. - -The @code{#include} directive is used to declare the exponentiation -function @code{pow}. - -The forward declarations for @code{yylex} and @code{yyerror} are -needed because the C language requires that functions be declared -before they are used. These functions will be defined in the -epilogue, but the parser calls them so they must be declared in the -prologue. - -The second section, Bison declarations, provides information to Bison -about the token types (@pxref{Bison Declarations, ,The Bison -Declarations Section}). Each terminal symbol that is not a -single-character literal must be declared here. (Single-character -literals normally don't need to be declared.)
In this example, all the -arithmetic operators are designated by single-character literals, so the -only terminal symbol that needs to be declared is @code{NUM}, the token -type for numeric constants. - -@node Rpcalc Rules -@subsection Grammar Rules for @code{rpcalc} - -Here are the grammar rules for the reverse polish notation calculator. - -@example -@group -input: - /* empty */ -| input line -; -@end group - -@group -line: - '\n' -| exp '\n' @{ printf ("%.10g\n", $1); @} -; -@end group - -@group -exp: - NUM @{ $$ = $1; @} -| exp exp '+' @{ $$ = $1 + $2; @} -| exp exp '-' @{ $$ = $1 - $2; @} -| exp exp '*' @{ $$ = $1 * $2; @} -| exp exp '/' @{ $$ = $1 / $2; @} -| exp exp '^' @{ $$ = pow ($1, $2); @} /* Exponentiation */ -| exp 'n' @{ $$ = -$1; @} /* Unary minus */ -; -@end group -%% -@end example - -The groupings of the rpcalc ``language'' defined here are the expression -(given the name @code{exp}), the line of input (@code{line}), and the -complete input transcript (@code{input}). Each of these nonterminal -symbols has several alternate rules, joined by the vertical bar @samp{|} -which is read as ``or''. The following sections explain what these rules -mean. - -The semantics of the language is determined by the actions taken when a -grouping is recognized. The actions are the C code that appears inside -braces. @xref{Actions}. - -You must specify these actions in C, but Bison provides the means for -passing semantic values between the rules. In each action, the -pseudo-variable @code{$$} stands for the semantic value for the grouping -that the rule is going to construct. Assigning a value to @code{$$} is the -main job of most actions. The semantic values of the components of the -rule are referred to as @code{$1}, @code{$2}, and so on. 
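One way to see what these actions compute is to evaluate the same postfix notation by hand with an explicit stack. The following C sketch is not what Bison generates; @code{rpn_eval} and its fixed-size stack are assumptions made for illustration, and exponentiation is left out so that the sketch does not depend on the math library. In each binary case, the two popped values play the roles of @code{$1} and @code{$2}, and the pushed result plays the role of @code{$$}.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Evaluate a postfix expression such as "3 7 + 3 4 5 * + -".
   Hypothetical helper: it assumes well-formed input with at most
   32 pending operands, and returns the value left on the stack.  */
static double
rpn_eval (char const *s)
{
  double stack[32];
  int top = 0;

  while (*s)
    {
      if (isspace ((unsigned char) *s))
        s++;
      else if (*s == '.' || isdigit ((unsigned char) *s))
        {
          /* NUM: push its semantic value.  */
          char *end;
          assert (top < 32);
          stack[top++] = strtod (s, &end);
          if (end == s)
            abort ();           /* A lone '.' is not a number.  */
          s = end;
        }
      else if (*s == 'n')
        {
          /* exp 'n': negate the value on top of the stack.  */
          assert (top >= 1);
          stack[top - 1] = -stack[top - 1];
          s++;
        }
      else
        {
          /* exp exp OP: pop $2, then $1, and push $$.  */
          double v2, v1;
          assert (top >= 2);
          v2 = stack[--top];
          v1 = stack[--top];
          switch (*s++)
            {
            case '+': stack[top++] = v1 + v2; break;
            case '-': stack[top++] = v1 - v2; break;
            case '*': stack[top++] = v1 * v2; break;
            case '/': stack[top++] = v1 / v2; break;
            default: abort ();  /* Not handled by this sketch.  */
            }
        }
    }
  assert (top == 1);
  return stack[0];
}
```

The order of the two pops matters for @samp{-} and @samp{/}: the value pushed last is the second operand.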
- -@menu -* Rpcalc Input:: -* Rpcalc Line:: -* Rpcalc Expr:: -@end menu - -@node Rpcalc Input -@subsubsection Explanation of @code{input} - -Consider the definition of @code{input}: - -@example -input: - /* empty */ -| input line -; -@end example - -This definition reads as follows: ``A complete input is either an empty -string, or a complete input followed by an input line''. Notice that -``complete input'' is defined in terms of itself. This definition is said -to be @dfn{left recursive} since @code{input} appears always as the -leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}. - -The first alternative is empty because there are no symbols between the -colon and the first @samp{|}; this means that @code{input} can match an -empty string of input (no tokens). We write the rules this way because it -is legitimate to type @kbd{Ctrl-d} right after you start the calculator. -It's conventional to put an empty alternative first and write the comment -@samp{/* empty */} in it. - -The second alternate rule (@code{input line}) handles all nontrivial input. -It means, ``After reading any number of lines, read one more line if -possible.'' The left recursion makes this rule into a loop. Since the -first alternative matches empty input, the loop can be executed zero or -more times. - -The parser function @code{yyparse} continues to process input until a -grammatical error is seen or the lexical analyzer says there are no more -input tokens; we will arrange for the latter to happen at end-of-input. - -@node Rpcalc Line -@subsubsection Explanation of @code{line} - -Now consider the definition of @code{line}: - -@example -line: - '\n' -| exp '\n' @{ printf ("%.10g\n", $1); @} -; -@end example - -The first alternative is a token which is a newline character; this means -that rpcalc accepts a blank line (and ignores it, since there is no -action). The second alternative is an expression followed by a newline. -This is the alternative that makes rpcalc useful. 
The semantic value of -the @code{exp} grouping is the value of @code{$1} because the @code{exp} in -question is the first symbol in the alternative. The action prints this -value, which is the result of the computation the user asked for. - -This action is unusual because it does not assign a value to @code{$$}. As -a consequence, the semantic value associated with the @code{line} is -uninitialized (its value will be unpredictable). This would be a bug if -that value were ever used, but we don't use it: once rpcalc has printed the -value of the user's input line, that value is no longer needed. - -@node Rpcalc Expr -@subsubsection Explanation of @code{exp} - -The @code{exp} grouping has several rules, one for each kind of expression. -The first rule handles the simplest expressions: those that are just numbers. -The second handles an addition-expression, which looks like two expressions -followed by a plus-sign. The third handles subtraction, and so on. - -@example -exp: - NUM -| exp exp '+' @{ $$ = $1 + $2; @} -| exp exp '-' @{ $$ = $1 - $2; @} -@dots{} -; -@end example - -We have used @samp{|} to join all the rules for @code{exp}, but we could -equally well have written them separately: - -@example -exp: NUM ; -exp: exp exp '+' @{ $$ = $1 + $2; @}; -exp: exp exp '-' @{ $$ = $1 - $2; @}; -@dots{} -@end example - -Most of the rules have actions that compute the value of the expression in -terms of the value of its parts. For example, in the rule for addition, -@code{$1} refers to the first component @code{exp} and @code{$2} refers to -the second one. The third component, @code{'+'}, has no meaningful -associated semantic value, but if it had one you could refer to it as -@code{$3}. When @code{yyparse} recognizes a sum expression using this -rule, the sum of the two subexpressions' values is produced as the value of -the entire expression. @xref{Actions}. - -You don't have to give an action for every rule.
When a rule has no -action, Bison by default copies the value of @code{$1} into @code{$$}. -This is what happens in the first rule (the one that uses @code{NUM}). - -The formatting shown here is the recommended convention, but Bison does -not require it. You can add or change white space as much as you wish. -For example, this: - -@example -exp: NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ; -@end example - -@noindent -means the same thing as this: - -@example -exp: - NUM -| exp exp '+' @{ $$ = $1 + $2; @} -| @dots{} -; -@end example - -@noindent -The latter, however, is much more readable. - -@node Rpcalc Lexer -@subsection The @code{rpcalc} Lexical Analyzer -@cindex writing a lexical analyzer -@cindex lexical analyzer, writing - -The lexical analyzer's job is low-level parsing: converting characters -or sequences of characters into tokens. The Bison parser gets its -tokens by calling the lexical analyzer. @xref{Lexical, ,The Lexical -Analyzer Function @code{yylex}}. - -Only a simple lexical analyzer is needed for the RPN -calculator. This -lexical analyzer skips blanks and tabs, then reads in numbers as -@code{double} and returns them as @code{NUM} tokens. Any other character -that isn't part of a number is a separate token. Note that the token-code -for such a single-character token is the character itself. - -The return value of the lexical analyzer function is a numeric code which -represents a token type. The same text used in Bison rules to stand for -this token type is also a C expression for the numeric code for the type. -This works in two ways. If the token type is a character literal, then its -numeric code is that of the character; you can use the same -character literal in the lexical analyzer to express the number. If the -token type is an identifier, that identifier is defined by Bison as a C -macro whose definition is the appropriate number. In this example, -therefore, @code{NUM} becomes a macro for @code{yylex} to use. 
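The two kinds of token codes can be illustrated with a lexer that reads from a string instead of standard input. This is only a sketch: @code{NUM} is defined by hand here with an arbitrary value above the range of character codes, whereas in a real parser Bison supplies the definition, and @code{next_token}, @code{cursor} and @code{token_value} are hypothetical names standing in for @code{yylex}, the input stream and @code{yylval}.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Assumption for illustration only: Bison normally defines NUM.  Any
   value outside the range of character codes would do here.  */
enum { NUM = 258 };

static char const *cursor;      /* Next unread character.  */
static double token_value;      /* Plays the role of yylval.  */

/* Return the code of the next token, or 0 at end of input.  */
static int
next_token (void)
{
  assert (cursor != NULL);

  /* Skip blanks and tabs.  */
  while (*cursor == ' ' || *cursor == '\t')
    cursor++;

  /* The end of the string stands for end-of-input.  */
  if (*cursor == '\0')
    return 0;

  /* A number: a named token type, with its value stored on the side.  */
  if (isdigit ((unsigned char) *cursor))
    {
      char *end;
      token_value = strtod (cursor, &end);
      cursor = end;
      return NUM;
    }

  /* Anything else: the character is its own token code.  */
  return (unsigned char) *cursor++;
}
```

Because the code of a single-character token is the character itself, a caller can compare the result directly with @code{'+'} or @code{'\n'}, while numbers are recognized by comparing with @code{NUM}.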
- -The semantic value of the token (if it has one) is stored into the -global variable @code{yylval}, which is where the Bison parser will look -for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was -defined at the beginning of the grammar; @pxref{Rpcalc Declarations, -,Declarations for @code{rpcalc}}.) - -A token type code of zero is returned if the end-of-input is encountered. -(Bison recognizes any nonpositive value as indicating end-of-input.) - -Here is the code for the lexical analyzer: - -@example -@group -/* The lexical analyzer returns a double floating point - number on the stack and the token NUM, or the numeric code - of the character read if not a number. It skips all blanks - and tabs, and returns 0 for end-of-input. */ - -#include <ctype.h> -#include <stdio.h> -@end group - -@group -int -yylex (void) -@{ - int c; - - /* Skip white space. */ - while ((c = getchar ()) == ' ' || c == '\t') - continue; -@end group -@group - /* Process numbers. */ - if (c == '.' || isdigit (c)) - @{ - ungetc (c, stdin); - scanf ("%lf", &yylval); - return NUM; - @} -@end group -@group - /* Return end-of-input. */ - if (c == EOF) - return 0; - /* Return a single char. */ - return c; -@} -@end group -@end example - -@node Rpcalc Main -@subsection The Controlling Function -@cindex controlling function -@cindex main function in simple example - -In keeping with the spirit of this example, the controlling function is -kept to the bare minimum. The only requirement is that it call -@code{yyparse} to start the process of parsing. - -@example -@group -int -main (void) -@{ - return yyparse (); -@} -@end group -@end example - -@node Rpcalc Error -@subsection The Error Reporting Routine -@cindex error reporting routine - -When @code{yyparse} detects a syntax error, it calls the error reporting -function @code{yyerror} to print an error message (usually but not -always @code{"syntax error"}).
It is up to the programmer to supply -@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so -here is the definition we will use: - -@example -@group -#include <stdio.h> -@end group - -@group -/* Called by yyparse on error. */ -void -yyerror (char const *s) -@{ - fprintf (stderr, "%s\n", s); -@} -@end group -@end example - -After @code{yyerror} returns, the Bison parser may recover from the error -and continue parsing if the grammar contains a suitable error rule -(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We -have not written any error rules in this example, so any invalid input will -cause the calculator program to exit. This is not clean behavior for a -real calculator, but it is adequate for the first example. - -@node Rpcalc Generate -@subsection Running Bison to Make the Parser -@cindex running Bison (introduction) - -Before running Bison to produce a parser, we need to decide how to -arrange all the source code in one or more source files. For such a -simple example, the easiest thing is to put everything in one file, -the grammar file. The definitions of @code{yylex}, @code{yyerror} and -@code{main} go at the end, in the epilogue of the grammar file -(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}). - -For a large project, you would probably have several source files, and use -@code{make} to arrange to recompile them. - -With all the source in the grammar file, you use the following command -to convert it into a parser implementation file: - -@example -bison @var{file}.y -@end example - -@noindent -In this example, the grammar file is called @file{rpcalc.y} (for -``Reverse Polish @sc{calc}ulator''). Bison produces a parser -implementation file named @file{@var{file}.tab.c}, removing the -@samp{.y} from the grammar file name. The parser implementation file -contains the source code for @code{yyparse}.
The additional functions -in the grammar file (@code{yylex}, @code{yyerror} and @code{main}) are -copied verbatim to the parser implementation file. - -@node Rpcalc Compile -@subsection Compiling the Parser Implementation File -@cindex compiling the parser - -Here is how to compile and run the parser implementation file: - -@example -@group -# @r{List files in current directory.} -$ @kbd{ls} -rpcalc.tab.c rpcalc.y -@end group - -@group -# @r{Compile the Bison parser.} -# @r{@samp{-lm} tells compiler to search math library for @code{pow}.} -$ @kbd{cc rpcalc.tab.c -lm -o rpcalc} -@end group - -@group -# @r{List files again.} -$ @kbd{ls} -rpcalc rpcalc.tab.c rpcalc.y -@end group -@end example - -The file @file{rpcalc} now contains the executable code. Here is an -example session using @code{rpcalc}. - -@example -$ @kbd{rpcalc} -@kbd{4 9 +} -13 -@kbd{3 7 + 3 4 5 *+-} --13 -@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}} -13 -@kbd{5 6 / 4 n +} --3.166666667 -@kbd{3 4 ^} @r{Exponentiation} -81 -@kbd{^D} @r{End-of-file indicator} -$ -@end example - -@node Infix Calc -@section Infix Notation Calculator: @code{calc} -@cindex infix notation calculator -@cindex @code{calc} -@cindex calculator, infix notation - -We now modify rpcalc to handle infix operators instead of postfix. Infix -notation involves the concept of operator precedence and the need for -parentheses nested to arbitrary depth. Here is the Bison code for -@file{calc.y}, an infix desk-top calculator. - -@example -/* Infix notation calculator. */ - -@group -%@{ - #define YYSTYPE double - #include <math.h> - #include <stdio.h> - int yylex (void); - void yyerror (char const *); -%@} -@end group - -@group -/* Bison declarations. */ -%token NUM -%left '-' '+' -%left '*' '/' -%left NEG /* negation--unary minus */ -%right '^' /* exponentiation */ -@end group - -%% /* The grammar follows.
*/ -@group -input: - /* empty */ -| input line -; -@end group - -@group -line: - '\n' -| exp '\n' @{ printf ("\t%.10g\n", $1); @} -; -@end group - -@group -exp: - NUM @{ $$ = $1; @} -| exp '+' exp @{ $$ = $1 + $3; @} -| exp '-' exp @{ $$ = $1 - $3; @} -| exp '*' exp @{ $$ = $1 * $3; @} -| exp '/' exp @{ $$ = $1 / $3; @} -| '-' exp %prec NEG @{ $$ = -$2; @} -| exp '^' exp @{ $$ = pow ($1, $3); @} -| '(' exp ')' @{ $$ = $2; @} -; -@end group -%% -@end example - -@noindent -The functions @code{yylex}, @code{yyerror} and @code{main} can be the -same as before. - -There are two important new features shown in this code. - -In the second section (Bison declarations), @code{%left} declares token -types and says they are left-associative operators. The declarations -@code{%left} and @code{%right} (right associativity) take the place of -@code{%token} which is used to declare a token type name without -associativity. (These tokens are single-character literals, which -ordinarily don't need to be declared. We declare them here to specify -the associativity.) - -Operator precedence is determined by the line ordering of the -declarations; the higher the line number of the declaration (lower on -the page or screen), the higher the precedence. Hence, exponentiation -has the highest precedence, unary minus (@code{NEG}) is next, followed -by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator -Precedence}. - -The other important new feature is the @code{%prec} in the grammar -section for the unary minus operator. The @code{%prec} simply instructs -Bison that the rule @samp{| '-' exp} has the same precedence as -@code{NEG}---in this case the next-to-highest. @xref{Contextual -Precedence, ,Context-Dependent Precedence}. 
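The effect of these declarations can be imitated by hand with a small precedence-climbing evaluator. This is a different technique from the table-driven parser that Bison generates; it is shown only to make the ordering concrete. The numeric levels, the names @code{eval}, @code{parse_exp}, @code{parse_operand} and @code{int_pow}, and the restriction of @samp{^} to integer exponents (which avoids the math library) are all assumptions of this sketch, and it does no error checking.

```c
#include <assert.h>
#include <stdlib.h>

/* Levels follow the declaration order: later lines bind tighter.  */
enum { PREC_ADD = 1, PREC_MUL = 2, PREC_NEG = 3, PREC_POW = 4 };

static char const *in;          /* Next unread character.  */
static double parse_exp (int min_prec);

static void
skip_blanks (void)
{
  while (*in == ' ' || *in == '\t')
    in++;
}

/* Hypothetical helper: BASE raised to the integer power N.  */
static double
int_pow (double base, long n)
{
  double res = 1.0;
  long i;
  for (i = 0; i < (n < 0 ? -n : n); i++)
    res *= base;
  return n < 0 ? 1.0 / res : res;
}

/* Binding level of the binary operator C, or 0 if C is not one.  */
static int
binary_prec (int c)
{
  return (c == '+' || c == '-' ? PREC_ADD
          : c == '*' || c == '/' ? PREC_MUL
          : c == '^' ? PREC_POW : 0);
}

/* NUM, '(' exp ')', or '-' exp with the precedence of NEG.  */
static double
parse_operand (void)
{
  double v;
  char *end;
  skip_blanks ();
  if (*in == '-')
    {
      in++;
      /* Only operators that bind tighter than NEG join the operand.  */
      return -parse_exp (PREC_NEG + 1);
    }
  if (*in == '(')
    {
      in++;
      v = parse_exp (PREC_ADD);
      skip_blanks ();
      if (*in == ')')
        in++;
      return v;
    }
  v = strtod (in, &end);
  in = end;
  return v;
}

/* Parse binary operators whose level is at least MIN_PREC.  */
static double
parse_exp (int min_prec)
{
  double lhs = parse_operand ();
  for (;;)
    {
      int op, prec;
      double rhs;
      skip_blanks ();
      op = *in;
      prec = binary_prec (op);
      if (prec == 0 || prec < min_prec)
        return lhs;
      in++;
      /* Left-associative: the right operand takes only tighter
         operators.  Right-associative: same level is allowed too.  */
      rhs = parse_exp (op == '^' ? prec : prec + 1);
      lhs = (op == '+' ? lhs + rhs
             : op == '-' ? lhs - rhs
             : op == '*' ? lhs * rhs
             : op == '/' ? lhs / rhs
             : int_pow (lhs, (long) rhs));
    }
}

/* Evaluate the single expression held in the string S.  */
static double
eval (char const *s)
{
  assert (s != NULL);
  in = s;
  return parse_exp (PREC_ADD);
}
```

With these levels, @code{eval ("-2 ^ 2")} groups as the negation of @samp{2 ^ 2}, because exponentiation binds tighter than @code{NEG}, whereas @code{eval ("-2 * 3")} negates @samp{2} first. Right associativity makes @code{eval ("2 ^ 3 ^ 2")} group as @samp{2 ^ (3 ^ 2)}.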
- -Here is a sample run of @file{calc.y}: - -@need 500 -@example -$ @kbd{calc} -@kbd{4 + 4.5 - (34/(8*3+-3))} -6.880952381 -@kbd{-56 + 2} --54 -@kbd{3 ^ 2} -9 -@end example - -@node Simple Error Recovery -@section Simple Error Recovery -@cindex error recovery, simple - -Up to this point, this manual has not addressed the issue of @dfn{error -recovery}---how to continue parsing after the parser detects a syntax -error. All we have handled is error reporting with @code{yyerror}. -Recall that by default @code{yyparse} returns after calling -@code{yyerror}. This means that an erroneous input line causes the -calculator program to exit. Now we show how to rectify this deficiency. - -The Bison language itself includes the reserved word @code{error}, which -may be included in the grammar rules. In the example below it has -been added to one of the alternatives for @code{line}: - -@example -@group -line: - '\n' -| exp '\n' @{ printf ("\t%.10g\n", $1); @} -| error '\n' @{ yyerrok; @} -; -@end group -@end example - -This addition to the grammar allows for simple error recovery in the -event of a syntax error. If an expression that cannot be evaluated is -read, the error will be recognized by the third rule for @code{line}, -and parsing will continue. (The @code{yyerror} function is still called -upon to print its message as well.) The action executes the statement -@code{yyerrok}, a macro defined automatically by Bison; its meaning is -that error recovery is complete (@pxref{Error Recovery}). Note the -difference between @code{yyerrok} and @code{yyerror}; neither one is a -misprint. - -This form of error recovery deals with syntax errors. There are other -kinds of errors; for example, division by zero, which raises an exception -signal that is normally fatal. A real calculator program must handle this -signal and use @code{longjmp} to return to @code{main} and resume parsing -input lines; it would also have to discard the rest of the current line of -input. 
We won't discuss this issue further because it is not specific to -Bison programs. - -@node Location Tracking Calc -@section Location Tracking Calculator: @code{ltcalc} -@cindex location tracking calculator -@cindex @code{ltcalc} -@cindex calculator, location tracking - -This example extends the infix notation calculator with location -tracking. This feature will be used to improve the error messages. For -the sake of clarity, this example is a simple integer calculator, since -most of the work needed to use locations will be done in the lexical -analyzer. - -@menu -* Ltcalc Declarations:: Bison and C declarations for ltcalc. -* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. -* Ltcalc Lexer:: The lexical analyzer. -@end menu - -@node Ltcalc Declarations -@subsection Declarations for @code{ltcalc} - -The C and Bison declarations for the location tracking calculator are -the same as the declarations for the infix notation calculator. - -@example -/* Location tracking calculator. */ - -%@{ - #define YYSTYPE int - #include <math.h> - int yylex (void); - void yyerror (char const *); -%@} - -/* Bison declarations. */ -%token NUM - -%left '-' '+' -%left '*' '/' -%left NEG -%right '^' - -%% /* The grammar follows. */ -@end example - -@noindent -Note there are no declarations specific to locations. Defining a data -type for storing locations is not needed: we will use the type provided -by default (@pxref{Location Type, ,Data Types of Locations}), which is a -four member structure with the following integer fields: -@code{first_line}, @code{first_column}, @code{last_line} and -@code{last_column}. By convention, and in accordance with the GNU -Coding Standards and common practice, the line and column count both -start at 1. - -@node Ltcalc Rules -@subsection Grammar Rules for @code{ltcalc} - -Whether handling locations or not has no effect on the syntax of your -language.
Therefore, grammar rules for this example will be very close -to those of the previous example: we will only modify them to benefit -from the new information. - -Here, we will use locations to report divisions by zero, and locate the -wrong expressions or subexpressions. - -@example -@group -input: - /* empty */ -| input line -; -@end group - -@group -line: - '\n' -| exp '\n' @{ printf ("%d\n", $1); @} -; -@end group - -@group -exp: - NUM @{ $$ = $1; @} -| exp '+' exp @{ $$ = $1 + $3; @} -| exp '-' exp @{ $$ = $1 - $3; @} -| exp '*' exp @{ $$ = $1 * $3; @} -@end group -@group -| exp '/' exp - @{ - if ($3) - $$ = $1 / $3; - else - @{ - $$ = 1; - fprintf (stderr, "%d.%d-%d.%d: division by zero", - @@3.first_line, @@3.first_column, - @@3.last_line, @@3.last_column); - @} - @} -@end group -@group -| '-' exp %prec NEG @{ $$ = -$2; @} -| exp '^' exp @{ $$ = pow ($1, $3); @} -| '(' exp ')' @{ $$ = $2; @} -@end group -@end example - -This code shows how to reach locations inside of semantic actions, by -using the pseudo-variables @code{@@@var{n}} for rule components, and the -pseudo-variable @code{@@$} for groupings. - -We don't need to assign a value to @code{@@$}: the output parser does it -automatically. By default, before executing the C code of each action, -@code{@@$} is set to range from the beginning of @code{@@1} to the end -of @code{@@@var{n}}, for a rule with @var{n} components. This behavior -can be redefined (@pxref{Location Default Action, , Default Action for -Locations}), and for very specific rules, @code{@@$} can be computed by -hand. - -@node Ltcalc Lexer -@subsection The @code{ltcalc} Lexical Analyzer. - -Until now, we relied on Bison's defaults to enable location -tracking. The next step is to rewrite the lexical analyzer, and make it -able to feed the parser with the token locations, as it already does for -semantic values. 
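As a reminder of what those defaults compute for @code{@@$}, here is a minimal C sketch of the rule described above for a rule with at least one component. The names @code{loc_range} and @code{default_location} are hypothetical and exist only for this illustration; Bison's own implementation is a macro and also handles empty rules, which this sketch does not.

```c
#include <assert.h>

/* Hypothetical stand-in for the default four-field location type. */
struct loc_range
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
};

/* Sketch of the default rule: the location of a grouping runs from the
   beginning of its first component to the end of its last component.
   rhs[0] .. rhs[n - 1] are the component locations; n must be >= 1.  */
struct loc_range
default_location (const struct loc_range *rhs, int n)
{
  struct loc_range result;
  assert (n >= 1);
  result.first_line = rhs[0].first_line;
  result.first_column = rhs[0].first_column;
  result.last_line = rhs[n - 1].last_line;
  result.last_column = rhs[n - 1].last_column;
  return result;
}
```

With three component locations, the result starts where the first begins and ends where the third ends, even when the components span several lines.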
- -To this end, we must take into account every single character of the -input text, so that the computed locations are neither fuzzy nor wrong: - -@example -@group -int -yylex (void) -@{ - int c; -@end group - -@group - /* Skip white space. */ - while ((c = getchar ()) == ' ' || c == '\t') - ++yylloc.last_column; -@end group - -@group - /* Step. */ - yylloc.first_line = yylloc.last_line; - yylloc.first_column = yylloc.last_column; -@end group - -@group - /* Process numbers. */ - if (isdigit (c)) - @{ - yylval = c - '0'; - ++yylloc.last_column; - while (isdigit (c = getchar ())) - @{ - ++yylloc.last_column; - yylval = yylval * 10 + c - '0'; - @} - ungetc (c, stdin); - return NUM; - @} -@end group - - /* Return end-of-input. */ - if (c == EOF) - return 0; - -@group - /* Return a single char, and update location. */ - if (c == '\n') - @{ - ++yylloc.last_line; - yylloc.last_column = 0; - @} - else - ++yylloc.last_column; - return c; -@} -@end group -@end example - -Basically, the lexical analyzer performs the same processing as before: -it skips blanks and tabs, and reads numbers or single-character tokens. -In addition, it updates @code{yylloc}, the global variable (of type -@code{YYLTYPE}) containing the token's location. - -Now, each time this function returns a token, the parser has its number -as well as its semantic value, and its location in the text. The last -needed change is to initialize @code{yylloc}, for example in the -controlling function: - -@example -@group -int -main (void) -@{ - yylloc.first_line = yylloc.last_line = 1; - yylloc.first_column = yylloc.last_column = 0; - return yyparse (); -@} -@end group -@end example - -Remember that computing locations is not a matter of syntax. Every -character must be associated with a location update, whether it is in -valid input, in comments, in literal strings, and so on.
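The same bookkeeping can be tried outside of a Bison parser. The following is a minimal sketch, assuming a scanner that reads from a string instead of @code{stdin}; @code{struct span}, @code{struct scanner}, @code{scanner_init}, @code{scanner_next} and @code{TOK_NUM} are hypothetical names introduced for this illustration and are not part of Bison or of @code{ltcalc}.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the default location type. */
struct span { int first_line, first_column, last_line, last_column; };

/* Hypothetical scanner state: the text, a read position, a running location. */
struct scanner { const char *text; size_t pos; struct span loc; };

enum { TOK_NUM = 258 };   /* Illustrative token code for numbers. */

void
scanner_init (struct scanner *s, const char *text)
{
  assert (text != NULL);
  s->text = text;
  s->pos = 0;
  s->loc.first_line = s->loc.last_line = 1;
  s->loc.first_column = s->loc.last_column = 0;
}

/* Return TOK_NUM (storing the number in *value), 0 at end of text,
   or the character itself; every consumed character updates s->loc.  */
int
scanner_next (struct scanner *s, int *value)
{
  int c;

  /* Skip white space, still counting the columns it occupies. */
  while ((c = (unsigned char) s->text[s->pos]) == ' ' || c == '\t')
    {
      ++s->pos;
      ++s->loc.last_column;
    }

  /* Step: the new token begins where the previous one ended. */
  s->loc.first_line = s->loc.last_line;
  s->loc.first_column = s->loc.last_column;

  if (c == '\0')
    return 0;

  if (isdigit (c))
    {
      *value = 0;
      while (isdigit (c = (unsigned char) s->text[s->pos]))
        {
          *value = *value * 10 + (c - '0');
          ++s->pos;
          ++s->loc.last_column;
        }
      return TOK_NUM;
    }

  /* Single character: a newline starts a new line at column 0. */
  ++s->pos;
  if (c == '\n')
    {
      ++s->loc.last_line;
      s->loc.last_column = 0;
    }
  else
    ++s->loc.last_column;
  return c;
}
```

Reading from a string keeps the sketch easy to exercise; the column arithmetic follows the @code{yylex} shown above, including counting skipped blanks.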
- -@node Multi-function Calc -@section Multi-Function Calculator: @code{mfcalc} -@cindex multi-function calculator -@cindex @code{mfcalc} -@cindex calculator, multi-function - -Now that the basics of Bison have been discussed, it is time to move on to -a more advanced problem. The above calculators provided only five -functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would -be nice to have a calculator that provides other mathematical functions such -as @code{sin}, @code{cos}, etc. - -It is easy to add new operators to the infix calculator as long as they are -only single-character literals. The lexical analyzer @code{yylex} passes -back all nonnumeric characters as tokens, so new grammar rules suffice for -adding a new operator. But we want something more flexible: built-in -functions whose syntax has this form: - -@example -@var{function_name} (@var{argument}) -@end example - -@noindent -At the same time, we will add memory to the calculator, by allowing you -to create named variables, store values in them, and use them later. -Here is a sample session with the multi-function calculator: - -@example -$ @kbd{mfcalc} -@kbd{pi = 3.141592653589} -3.1415926536 -@kbd{sin(pi)} -0.0000000000 -@kbd{alpha = beta1 = 2.3} -2.3000000000 -@kbd{alpha} -2.3000000000 -@kbd{ln(alpha)} -0.8329091229 -@kbd{exp(ln(beta1))} -2.3000000000 -$ -@end example - -Note that multiple assignment and nested function calls are permitted. - -@menu -* Mfcalc Declarations:: Bison declarations for multi-function calculator. -* Mfcalc Rules:: Grammar rules for the calculator. -* Mfcalc Symbol Table:: Symbol table management subroutines. -@end menu - -@node Mfcalc Declarations -@subsection Declarations for @code{mfcalc} - -Here are the C and Bison declarations for the multi-function calculator. - -@comment file: mfcalc.y: 1 -@example -@group -%@{ - #include <math.h> /* For math functions, cos(), sin(), etc. */ - #include "calc.h" /* Contains definition of `symrec'.
*/ - int yylex (void); - void yyerror (char const *); -%@} -@end group - -@group -%union @{ - double val; /* For returning numbers. */ - symrec *tptr; /* For returning symbol-table pointers. */ -@} -@end group -%token <val> NUM /* Simple double precision number. */ -%token <tptr> VAR FNCT /* Variable and function. */ -%type <val> exp - -@group -%right '=' -%left '-' '+' -%left '*' '/' -%left NEG /* negation--unary minus */ -%right '^' /* exponentiation */ -@end group -@end example - -The above grammar introduces only two new features of the Bison language. -These features allow semantic values to have various data types -(@pxref{Multiple Types, ,More Than One Value Type}). - -The @code{%union} declaration specifies the entire list of possible types; -this is instead of defining @code{YYSTYPE}. The allowable types are now -double-floats (for @code{exp} and @code{NUM}) and pointers to entries in -the symbol table. @xref{Union Decl, ,The Collection of Value Types}. - -Since values can now have various types, it is necessary to associate a -type with each grammar symbol whose semantic value is used. These symbols -are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their -declarations are augmented with information about their data type (placed -between angle brackets). - -The Bison construct @code{%type} is used for declaring nonterminal -symbols, just as @code{%token} is used for declaring token types. We -have not used @code{%type} before because nonterminal symbols are -normally declared implicitly by the rules that define them. But -@code{exp} must be declared explicitly so we can specify its value type. -@xref{Type Decl, ,Nonterminal Symbols}. - -@node Mfcalc Rules -@subsection Grammar Rules for @code{mfcalc} - -Here are the grammar rules for the multi-function calculator. -Most of them are copied directly from @code{calc}; three rules, -those which mention @code{VAR} or @code{FNCT}, are new. - -@comment file: mfcalc.y: 3 -@example -%% /* The grammar follows.
*/ -@group -input: - /* empty */ -| input line -; -@end group - -@group -line: - '\n' -| exp '\n' @{ printf ("%.10g\n", $1); @} -| error '\n' @{ yyerrok; @} -; -@end group - -@group -exp: - NUM @{ $$ = $1; @} -| VAR @{ $$ = $1->value.var; @} -| VAR '=' exp @{ $$ = $3; $1->value.var = $3; @} -| FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @} -| exp '+' exp @{ $$ = $1 + $3; @} -| exp '-' exp @{ $$ = $1 - $3; @} -| exp '*' exp @{ $$ = $1 * $3; @} -| exp '/' exp @{ $$ = $1 / $3; @} -| '-' exp %prec NEG @{ $$ = -$2; @} -| exp '^' exp @{ $$ = pow ($1, $3); @} -| '(' exp ')' @{ $$ = $2; @} -; -@end group -/* End of grammar. */ -%% -@end example - -@node Mfcalc Symbol Table -@subsection The @code{mfcalc} Symbol Table -@cindex symbol table example - -The multi-function calculator requires a symbol table to keep track of the -names and meanings of variables and functions. This doesn't affect the -grammar rules (except for the actions) or the Bison declarations, but it -requires some additional C functions for support. - -The symbol table itself consists of a linked list of records. Its -definition, which is kept in the header @file{calc.h}, is as follows. It -provides for either functions or variables to be placed in the table. - -@comment file: calc.h -@example -@group -/* Function type. */ -typedef double (*func_t) (double); -@end group - -@group -/* Data type for links in the chain of symbols. */ -struct symrec -@{ - char *name; /* name of symbol */ - int type; /* type of symbol: either VAR or FNCT */ - union - @{ - double var; /* value of a VAR */ - func_t fnctptr; /* value of a FNCT */ - @} value; - struct symrec *next; /* link field */ -@}; -@end group - -@group -typedef struct symrec symrec; - -/* The symbol table: a chain of `struct symrec'. 
*/ -extern symrec *sym_table; - -symrec *putsym (char const *, int); -symrec *getsym (char const *); -@end group -@end example - -The new version of @code{main} includes a call to @code{init_table}, a -function that initializes the symbol table. Here it is, and -@code{init_table} as well: - -@comment file: mfcalc.y: 3 -@example -#include <stdio.h> - -@group -/* Called by yyparse on error. */ -void -yyerror (char const *s) -@{ - printf ("%s\n", s); -@} -@end group - -@group -struct init -@{ - char const *fname; - double (*fnct) (double); -@}; -@end group - -@group -struct init const arith_fncts[] = -@{ - "sin", sin, - "cos", cos, - "atan", atan, - "ln", log, - "exp", exp, - "sqrt", sqrt, - 0, 0 -@}; -@end group - -@group -/* The symbol table: a chain of `struct symrec'. */ -symrec *sym_table; -@end group - -@group -/* Put arithmetic functions in table. */ -void -init_table (void) -@{ - int i; - for (i = 0; arith_fncts[i].fname != 0; i++) - @{ - symrec *ptr = putsym (arith_fncts[i].fname, FNCT); - ptr->value.fnctptr = arith_fncts[i].fnct; - @} -@} -@end group - -@group -int -main (void) -@{ - init_table (); - return yyparse (); -@} -@end group -@end example - -By simply editing the initialization list and adding the necessary include -files, you can add additional functions to the calculator. - -Two important functions allow look-up and installation of symbols in the -symbol table. The function @code{putsym} is passed a name and the type -(@code{VAR} or @code{FNCT}) of the object to be installed. The object is -linked to the front of the list, and a pointer to the object is returned. -The function @code{getsym} is passed the name of the symbol to look up. If -found, a pointer to that symbol is returned; otherwise zero is returned. - -@comment file: mfcalc.y: 3 -@example -#include <stdlib.h> /* malloc. */ -#include <string.h> /* strlen.
*/ - -@group -symrec * -putsym (char const *sym_name, int sym_type) -@{ - symrec *ptr = (symrec *) malloc (sizeof (symrec)); - ptr->name = (char *) malloc (strlen (sym_name) + 1); - strcpy (ptr->name,sym_name); - ptr->type = sym_type; - ptr->value.var = 0; /* Set value to 0 even if fctn. */ - ptr->next = (struct symrec *)sym_table; - sym_table = ptr; - return ptr; -@} -@end group - -@group -symrec * -getsym (char const *sym_name) -@{ - symrec *ptr; - for (ptr = sym_table; ptr != (symrec *) 0; - ptr = (symrec *)ptr->next) - if (strcmp (ptr->name,sym_name) == 0) - return ptr; - return 0; -@} -@end group -@end example - -The function @code{yylex} must now recognize variables, numeric values, and -the single-character arithmetic operators. Strings of alphanumeric -characters with a leading letter are recognized as either variables or -functions depending on what the symbol table says about them. - -The string is passed to @code{getsym} for look up in the symbol table. If -the name appears in the table, a pointer to its location and its type -(@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not -already in the table, then it is installed as a @code{VAR} using -@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is -returned to @code{yyparse}. - -No change is needed in the handling of numeric values and arithmetic -operators in @code{yylex}. - -@comment file: mfcalc.y: 3 -@example -@group -#include <ctype.h> -@end group - -@group -int -yylex (void) -@{ - int c; - - /* Ignore white space, get first nonwhite character. */ - while ((c = getchar ()) == ' ' || c == '\t') - continue; - - if (c == EOF) - return 0; -@end group - -@group - /* Char starts a number => parse the number. */ - if (c == '.' || isdigit (c)) - @{ - ungetc (c, stdin); - scanf ("%lf", &yylval.val); - return NUM; - @} -@end group - -@group - /* Char starts an identifier => read the name.
*/ - if (isalpha (c)) - @{ - /* Initially make the buffer long enough - for a 40-character symbol name. */ - static size_t length = 40; - static char *symbuf = 0; - symrec *s; - int i; -@end group - - if (!symbuf) - symbuf = (char *) malloc (length + 1); - - i = 0; - do -@group - @{ - /* If buffer is full, make it bigger. */ - if (i == length) - @{ - length *= 2; - symbuf = (char *) realloc (symbuf, length + 1); - @} - /* Add this character to the buffer. */ - symbuf[i++] = c; - /* Get another character. */ - c = getchar (); - @} -@end group -@group - while (isalnum (c)); - - ungetc (c, stdin); - symbuf[i] = '\0'; -@end group - -@group - s = getsym (symbuf); - if (s == 0) - s = putsym (symbuf, VAR); - yylval.tptr = s; - return s->type; - @} - - /* Any other character is a token by itself. */ - return c; -@} -@end group -@end example - -The error reporting function is unchanged, and the new version of -@code{main} includes a call to @code{init_table} and sets the @code{yydebug} -on user demand (@xref{Tracing, , Tracing Your Parser}, for details): - -@comment file: mfcalc.y: 3 -@example -@group -/* Called by yyparse on error. */ -void -yyerror (char const *s) -@{ - fprintf (stderr, "%s\n", s); -@} -@end group - -@group -int -main (int argc, char const* argv[]) -@{ - int i; - /* Enable parse traces on option -p. */ - for (i = 1; i < argc; ++i) - if (!strcmp(argv[i], "-p")) - yydebug = 1; - init_table (); - return yyparse (); -@} -@end group -@end example - -This program is both powerful and flexible. You may easily add new -functions, and it is a simple job to modify this code to install -predefined variables such as @code{pi} or @code{e} as well. - -@node Exercises -@section Exercises -@cindex exercises - -@enumerate -@item -Add some new functions from @file{math.h} to the initialization list. - -@item -Add another array that contains constants and their values. Then -modify @code{init_table} to add these constants to the symbol table. 
-It will be easiest to give the constants type @code{VAR}. - -@item -Make the program report an error if the user refers to an -uninitialized variable in any way except to store a value in it. -@end enumerate - -@node Grammar File -@chapter Bison Grammar Files - -Bison takes as input a context-free grammar specification and produces a -C-language function that recognizes correct instances of the grammar. - -The Bison grammar file conventionally has a name ending in @samp{.y}. -@xref{Invocation, ,Invoking Bison}. - -@menu -* Grammar Outline:: Overall layout of the grammar file. -* Symbols:: Terminal and nonterminal symbols. -* Rules:: How to write grammar rules. -* Recursion:: Writing recursive rules. -* Semantics:: Semantic values and actions. -* Tracking Locations:: Locations and actions. -* Named References:: Using named references in actions. -* Declarations:: All kinds of Bison declarations are described here. -* Multiple Parsers:: Putting more than one Bison parser in one program. -@end menu - -@node Grammar Outline -@section Outline of a Bison Grammar - -A Bison grammar file has four main sections, shown here with the -appropriate delimiters: - -@example -%@{ - @var{Prologue} -%@} - -@var{Bison declarations} - -%% -@var{Grammar rules} -%% - -@var{Epilogue} -@end example - -Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections. -As a GNU extension, @samp{//} introduces a comment that -continues until end of line. - -@menu -* Prologue:: Syntax and usage of the prologue. -* Prologue Alternatives:: Syntax and usage of alternatives to the prologue. -* Bison Declarations:: Syntax and usage of the Bison declarations section. -* Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue. 
-@end menu - -@node Prologue -@subsection The prologue -@cindex declarations section -@cindex Prologue -@cindex declarations - -The @var{Prologue} section contains macro definitions and declarations -of functions and variables that are used in the actions in the grammar -rules. These are copied to the beginning of the parser implementation -file so that they precede the definition of @code{yyparse}. You can -use @samp{#include} to get the declarations from a header file. If -you don't need any C declarations, you may omit the @samp{%@{} and -@samp{%@}} delimiters that bracket this section. - -The @var{Prologue} section is terminated by the first occurrence -of @samp{%@}} that is outside a comment, a string literal, or a -character constant. - -You may have more than one @var{Prologue} section, intermixed with the -@var{Bison declarations}. This allows you to have C and Bison -declarations that refer to each other. For example, the @code{%union} -declaration may use types defined in a header file, and you may wish to -prototype functions that take arguments of type @code{YYSTYPE}. This -can be done with two @var{Prologue} blocks, one before and one after the -@code{%union} declaration. - -@example -%@{ - #define _GNU_SOURCE - #include <stdio.h> - #include "ptypes.h" -%@} - -%union @{ - long int n; - tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ -@} - -%@{ - static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(F, N, L) print_token_value (F, N, L) -%@} - -@dots{} -@end example - -When in doubt, it is usually safer to put prologue code before all -Bison declarations, rather than after. For example, any definitions -of feature test macros like @code{_GNU_SOURCE} or -@code{_POSIX_C_SOURCE} should appear before all Bison declarations, as -feature test macros can affect the behavior of Bison-generated -@code{#include} directives.
- -@node Prologue Alternatives -@subsection Prologue Alternatives -@cindex Prologue Alternatives - -@findex %code -@findex %code requires -@findex %code provides -@findex %code top - -The functionality of @var{Prologue} sections can often be subtle and -inflexible. As an alternative, Bison provides a @code{%code} -directive with an explicit qualifier field, which identifies the -purpose of the code and thus the location(s) where Bison should -generate it. For C/C++, the qualifier can be omitted for the default -location, or it can be one of @code{requires}, @code{provides}, -@code{top}. @xref{%code Summary}. - -Look again at the example of the previous section: - -@example -%@{ - #define _GNU_SOURCE - #include <stdio.h> - #include "ptypes.h" -%@} - -%union @{ - long int n; - tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ -@} - -%@{ - static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(F, N, L) print_token_value (F, N, L) -%@} - -@dots{} -@end example - -@noindent -Notice that there are two @var{Prologue} sections here, but there's a -subtle distinction between their functionality. For example, if you -decide to override Bison's default definition for @code{YYLTYPE}, in -which @var{Prologue} section should you write your new definition? -You should write it in the first since Bison will insert that code -into the parser implementation file @emph{before} the default -@code{YYLTYPE} definition. In which @var{Prologue} section should you -prototype an internal function, @code{trace_token}, that accepts -@code{YYLTYPE} and @code{yytokentype} as arguments? You should -prototype it in the second since Bison will insert that code -@emph{after} the @code{YYLTYPE} and @code{yytokentype} definitions. - -This distinction in functionality between the two @var{Prologue} sections is -established by the appearance of the @code{%union} between them. -This behavior raises a few questions.
-First, why should the position of a @code{%union} affect definitions related to -@code{YYLTYPE} and @code{yytokentype}? -Second, what if there is no @code{%union}? -In that case, the second kind of @var{Prologue} section is not available. -This behavior is not intuitive. - -To avoid this subtle @code{%union} dependency, rewrite the example using a -@code{%code top} and an unqualified @code{%code}. -Let's go ahead and add the new @code{YYLTYPE} definition and the -@code{trace_token} prototype at the same time: - -@example -%code top @{ - #define _GNU_SOURCE - #include <stdio.h> - - /* WARNING: The following code really belongs - * in a `%code requires'; see below. */ - - #include "ptypes.h" - #define YYLTYPE YYLTYPE - typedef struct YYLTYPE - @{ - int first_line; - int first_column; - int last_line; - int last_column; - char *filename; - @} YYLTYPE; -@} - -%union @{ - long int n; - tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ -@} - -%code @{ - static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(F, N, L) print_token_value (F, N, L) - static void trace_token (enum yytokentype token, YYLTYPE loc); -@} - -@dots{} -@end example - -@noindent -In this way, @code{%code top} and the unqualified @code{%code} achieve the same -functionality as the two kinds of @var{Prologue} sections, but it's always -explicit which kind you intend. -Moreover, both kinds are always available even in the absence of @code{%union}. - -The @code{%code top} block above logically contains two parts. The -first two lines before the warning need to appear near the top of the -parser implementation file. The first line after the warning is -required by @code{YYSTYPE} and thus also needs to appear in the parser -implementation file. However, if you've instructed Bison to generate -a parser header file (@pxref{Decl Summary, ,%defines}), you probably -want that line to appear before the @code{YYSTYPE} definition in that -header file as well.
The @code{YYLTYPE} definition should also appear -in the parser header file to override the default @code{YYLTYPE} -definition there. - -In other words, in the @code{%code top} block above, all but the first two -lines are dependency code required by the @code{YYSTYPE} and @code{YYLTYPE} -definitions. -Thus, they belong in one or more @code{%code requires}: - -@example -@group -%code top @{ - #define _GNU_SOURCE - #include <stdio.h> -@} -@end group - -@group -%code requires @{ - #include "ptypes.h" -@} -@end group -@group -%union @{ - long int n; - tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ -@} -@end group - -@group -%code requires @{ - #define YYLTYPE YYLTYPE - typedef struct YYLTYPE - @{ - int first_line; - int first_column; - int last_line; - int last_column; - char *filename; - @} YYLTYPE; -@} -@end group - -@group -%code @{ - static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(F, N, L) print_token_value (F, N, L) - static void trace_token (enum yytokentype token, YYLTYPE loc); -@} -@end group - -@dots{} -@end example - -@noindent -Now Bison will insert @code{#include "ptypes.h"} and the new -@code{YYLTYPE} definition before the Bison-generated @code{YYSTYPE} -and @code{YYLTYPE} definitions in both the parser implementation file -and the parser header file. (By the same reasoning, @code{%code -requires} would also be the appropriate place to write your own -definition for @code{YYSTYPE}.) - -When you are writing dependency code for @code{YYSTYPE} and -@code{YYLTYPE}, you should prefer @code{%code requires} over -@code{%code top} regardless of whether you instruct Bison to generate -a parser header file. When you are writing code that you need Bison -to insert only into the parser implementation file and that has no -special need to appear at the top of that file, you should prefer the -unqualified @code{%code} over @code{%code top}.
These practices will -make the purpose of each block of your code explicit to Bison and to -other developers reading your grammar file. Following these -practices, we expect the unqualified @code{%code} and @code{%code -requires} to be the most important of the four @var{Prologue} -alternatives. - -At some point while developing your parser, you might decide to -provide @code{trace_token} to modules that are external to your -parser. Thus, you might wish for Bison to insert the prototype into -both the parser header file and the parser implementation file. Since -this function is not a dependency required by @code{YYSTYPE} or -@code{YYLTYPE}, it doesn't make sense to move its prototype to a -@code{%code requires}. More importantly, since it depends upon -@code{YYLTYPE} and @code{yytokentype}, @code{%code requires} is not -sufficient. Instead, move its prototype from the unqualified -@code{%code} to a @code{%code provides}: - -@example -@group -%code top @{ - #define _GNU_SOURCE - #include <stdio.h> -@} -@end group - -@group -%code requires @{ - #include "ptypes.h" -@} -@end group -@group -%union @{ - long int n; - tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */ -@} -@end group - -@group -%code requires @{ - #define YYLTYPE YYLTYPE - typedef struct YYLTYPE - @{ - int first_line; - int first_column; - int last_line; - int last_column; - char *filename; - @} YYLTYPE; -@} -@end group - -@group -%code provides @{ - void trace_token (enum yytokentype token, YYLTYPE loc); -@} -@end group - -@group -%code @{ - static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(F, N, L) print_token_value (F, N, L) -@} -@end group - -@dots{} -@end example - -@noindent -Bison will insert the @code{trace_token} prototype into both the -parser header file and the parser implementation file after the -definitions for @code{yytokentype}, @code{YYLTYPE}, and -@code{YYSTYPE}.
- -The above examples are careful to write directives in an order that -reflects the layout of the generated parser implementation and header -files: @code{%code top}, @code{%code requires}, @code{%code provides}, -and then @code{%code}. While your grammar files may generally be -easier to read if you also follow this order, Bison does not require -it. Instead, Bison lets you choose an organization that makes sense -to you. - -You may declare any of these directives multiple times in the grammar file. -In that case, Bison concatenates the contained code in declaration order. -This is the only way in which the position of one of these directives within -the grammar file affects its functionality. - -The result of the previous two properties is greater flexibility in how you may -organize your grammar file. -For example, you may organize semantic-type-related directives by semantic -type: - -@example -@group -%code requires @{ #include "type1.h" @} -%union @{ type1 field1; @} -%destructor @{ type1_free ($$); @} -%printer @{ type1_print (yyoutput, $$); @} -@end group - -@group -%code requires @{ #include "type2.h" @} -%union @{ type2 field2; @} -%destructor @{ type2_free ($$); @} -%printer @{ type2_print (yyoutput, $$); @} -@end group -@end example - -@noindent -You could even place each of the above directive groups in the rules section of -the grammar file next to the set of rules that uses the associated semantic -type. -(In the rules section, you must terminate each of those directives with a -semicolon.) -And you don't have to worry that some directive (like a @code{%union}) in the -definitions section is going to adversely affect their functionality in some -counter-intuitive manner just because it comes first. -Such an organization is not possible using @var{Prologue} sections. - -This section has been concerned with explaining the advantages of the four -@var{Prologue} alternatives over the original Yacc @var{Prologue}. 
-However, in most cases when using these directives, you shouldn't need to -think about all the low-level ordering issues discussed here. -Instead, you should simply use these directives to label each block of your -code according to its purpose and let Bison handle the ordering. -@code{%code} is the most generic label. -Move code to @code{%code requires}, @code{%code provides}, or @code{%code top} -as needed. - -@node Bison Declarations -@subsection The Bison Declarations Section -@cindex Bison declarations (introduction) -@cindex declarations, Bison (introduction) - -The @var{Bison declarations} section contains declarations that define -terminal and nonterminal symbols, specify precedence, and so on. -In some simple grammars you may not need any declarations. -@xref{Declarations, ,Bison Declarations}. - -@node Grammar Rules -@subsection The Grammar Rules Section -@cindex grammar rules section -@cindex rules section for grammar - -The @dfn{grammar rules} section contains one or more Bison grammar -rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}. - -There must always be at least one grammar rule, and the first -@samp{%%} (which precedes the grammar rules) may never be omitted even -if it is the first thing in the file. - -@node Epilogue -@subsection The epilogue -@cindex additional C code section -@cindex epilogue -@cindex C code, section for additional - -The @var{Epilogue} is copied verbatim to the end of the parser -implementation file, just as the @var{Prologue} is copied to the -beginning. This is the most convenient place to put anything that you -want to have in the parser implementation file but which need not come -before the definition of @code{yyparse}. For example, the definitions -of @code{yylex} and @code{yyerror} often go here. Because C requires -functions to be declared before being used, you often need to declare -functions like @code{yylex} and @code{yyerror} in the Prologue, even -if you define them in the Epilogue. 
@xref{Interface, ,Parser -C-Language Interface}. - -If the last section is empty, you may omit the @samp{%%} that separates it -from the grammar rules. - -The Bison parser itself contains many macros and identifiers whose names -start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using -any such names (except those documented in this manual) in the epilogue -of the grammar file. - -@node Symbols -@section Symbols, Terminal and Nonterminal -@cindex nonterminal symbol -@cindex terminal symbol -@cindex token type -@cindex symbol - -@dfn{Symbols} in Bison grammars represent the grammatical classifications -of the language. - -A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a -class of syntactically equivalent tokens. You use the symbol in grammar -rules to mean that a token in that class is allowed. The symbol is -represented in the Bison parser by a numeric code, and the @code{yylex} -function returns a token type code to indicate what kind of token has -been read. You don't need to know what the code value is; you can use -the symbol to stand for it. - -A @dfn{nonterminal symbol} stands for a class of syntactically -equivalent groupings. The symbol name is used in writing grammar rules. -By convention, it should be all lower case. - -Symbol names can contain letters, underscores, periods, and non-initial -digits and dashes. Dashes in symbol names are a GNU extension, incompatible -with POSIX Yacc. Periods and dashes make symbol names less convenient to -use with named references, which require brackets around such names -(@pxref{Named References}). Terminal symbols that contain periods or dashes -make little sense: since they are not valid symbols (in most programming -languages) they are not exported as token names. - -There are three ways of writing terminal symbols in the grammar: - -@itemize @bullet -@item -A @dfn{named token type} is written with an identifier, like an -identifier in C@. 
By convention, it should be all upper case. Each -such name must be defined with a Bison declaration such as -@code{%token}. @xref{Token Decl, ,Token Type Names}. - -@item -@cindex character token -@cindex literal token -@cindex single-character literal -A @dfn{character token type} (or @dfn{literal character token}) is -written in the grammar using the same syntax used in C for character -constants; for example, @code{'+'} is a character token type. A -character token type doesn't need to be declared unless you need to -specify its semantic value data type (@pxref{Value Type, ,Data Types of -Semantic Values}), associativity, or precedence (@pxref{Precedence, -,Operator Precedence}). - -By convention, a character token type is used only to represent a -token that consists of that particular character. Thus, the token -type @code{'+'} is used to represent the character @samp{+} as a -token. Nothing enforces this convention, but if you depart from it, -your program will confuse other readers. - -All the usual escape sequences used in character literals in C can be -used in Bison as well, but you must not use the null character as a -character literal because its numeric code, zero, signifies -end-of-input (@pxref{Calling Convention, ,Calling Convention -for @code{yylex}}). Also, unlike standard C, trigraphs have no -special meaning in Bison character literals, nor is backslash-newline -allowed. - -@item -@cindex string token -@cindex literal string token -@cindex multicharacter literal -A @dfn{literal string token} is written like a C string constant; for -example, @code{"<="} is a literal string token. A literal string token -doesn't need to be declared unless you need to specify its semantic -value data type (@pxref{Value Type}), associativity, or precedence -(@pxref{Precedence}). - -You can associate the literal string token with a symbolic name as an -alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token -Declarations}). 
If you don't do that, the lexical analyzer has to -retrieve the token number for the literal string token from the -@code{yytname} table (@pxref{Calling Convention}). - -@strong{Warning}: literal string tokens do not work in Yacc. - -By convention, a literal string token is used only to represent a token -that consists of that particular string. Thus, you should use the token -type @code{"<="} to represent the string @samp{<=} as a token. Bison -does not enforce this convention, but if you depart from it, people who -read your program will be confused. - -All the escape sequences used in string literals in C can be used in -Bison as well, except that you must not use a null character within a -string literal. Also, unlike Standard C, trigraphs have no special -meaning in Bison string literals, nor is backslash-newline allowed. A -literal string token must contain two or more characters; for a token -containing just one character, use a character token (see above). -@end itemize - -How you choose to write a terminal symbol has no effect on its -grammatical meaning. That depends only on where it appears in rules and -on when the parser function returns that symbol. - -The value returned by @code{yylex} is always one of the terminal -symbols, except that a zero or negative value signifies end-of-input. -Whichever way you write the token type in the grammar rules, you write -it the same way in the definition of @code{yylex}. The numeric code -for a character token type is simply the positive numeric code of the -character, so @code{yylex} can use the identical value to generate the -requisite code, though you may need to convert it to @code{unsigned -char} to avoid sign-extension on hosts where @code{char} is signed. -Each named token type becomes a C macro in the parser implementation -file, so @code{yylex} can use the name to stand for the code. (This -is why periods don't make sense in terminal symbols.) 
@xref{Calling -Convention, ,Calling Convention for @code{yylex}}. - -If @code{yylex} is defined in a separate file, you need to arrange for the -token-type macro definitions to be available there. Use the @samp{-d} -option when you run Bison, so that it will write these macro definitions -into a separate header file @file{@var{name}.tab.h} which you can include -in the other source files that need it. @xref{Invocation, ,Invoking Bison}. - -If you want to write a grammar that is portable to any Standard C -host, you must use only nonnull character tokens taken from the basic -execution character set of Standard C@. This set consists of the ten -digits, the 52 lower- and upper-case English letters, and the -characters in the following C-language string: - -@example -"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~" -@end example - -The @code{yylex} function and Bison must use a consistent character set -and encoding for character tokens. For example, if you run Bison in an -ASCII environment, but then compile and run the resulting -program in an environment that uses an incompatible character set like -EBCDIC, the resulting program may not work because the tables -generated by Bison will assume ASCII numeric values for -character tokens. It is standard practice for software distributions to -contain C source files that were generated by Bison in an -ASCII environment, so installers on platforms that are -incompatible with ASCII must rebuild those files before -compiling them. - -The symbol @code{error} is a terminal symbol reserved for error recovery -(@pxref{Error Recovery}); you shouldn't use it for any other purpose. -In particular, @code{yylex} should never return this value. The default -value of the error token is 256, unless you explicitly assigned 256 to -one of your tokens with a @code{%token} declaration. 
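The rule given earlier in this section for character token codes can be
illustrated with a small C sketch. The helper name @code{char_token_code}
is hypothetical and is not part of Bison; it only shows the conversion to
@code{unsigned char} that keeps the code positive on hosts where
@code{char} is signed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of Bison): map a character read by the
   scanner to the numeric code of the matching character token.  */
int
char_token_code (char c)
{
  /* Going through unsigned char avoids sign-extension on hosts where
     plain char is signed.  */
  int code = (unsigned char) c;
  assert (0 <= code && code <= UCHAR_MAX);
  return code;
}
```

A @code{yylex} written along these lines could end with
@code{return char_token_code (c);} for any character it does not
otherwise recognize.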
- -@node Rules -@section Syntax of Grammar Rules -@cindex rule syntax -@cindex grammar rule syntax -@cindex syntax of grammar rules - -A Bison grammar rule has the following general form: - -@example -@group -@var{result}: @var{components}@dots{}; -@end group -@end example - -@noindent -where @var{result} is the nonterminal symbol that this rule describes, -and @var{components} are various terminal and nonterminal symbols that -are put together by this rule (@pxref{Symbols}). - -For example, - -@example -@group -exp: exp '+' exp; -@end group -@end example - -@noindent -says that two groupings of type @code{exp}, with a @samp{+} token in between, -can be combined into a larger grouping of type @code{exp}. - -White space in rules is significant only to separate symbols. You can add -extra white space as you wish. - -Scattered among the components can be @var{actions} that determine -the semantics of the rule. An action looks like this: - -@example -@{@var{C statements}@} -@end example - -@noindent -@cindex braced code -This is an example of @dfn{braced code}, that is, C code surrounded by -braces, much like a compound statement in C@. Braced code can contain -any sequence of C tokens, so long as its braces are balanced. Bison -does not check the braced code for correctness directly; it merely -copies the code to the parser implementation file, where the C -compiler can check it. - -Within braced code, the balanced-brace count is not affected by braces -within comments, string literals, or character constants, but it is -affected by the C digraphs @samp{<%} and @samp{%>} that represent -braces. At the top level braced code must be terminated by @samp{@}} -and not by a digraph. Bison does not look for trigraphs, so if braced -code uses trigraphs you should ensure that they do not affect the -nesting of braces or the boundaries of comments, string literals, or -character constants. - -Usually there is only one action and it follows the components. -@xref{Actions}. 
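
The brace-balancing rule described above can be pictured with the
following C sketch. It is not Bison's scanner: the function name
@code{braced_code_end} is hypothetical, and the sketch deliberately
ignores digraphs and trigraphs. It only shows why braces inside
comments, string literals, and character constants do not affect the
count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Bison's actual scanner.  S must start with
   '{'.  Return the index just past the '}' that closes the braced
   code, or 0 if the braces are not balanced.  Digraphs and trigraphs
   are not handled in this sketch.  */
size_t
braced_code_end (const char *s)
{
  size_t i = 0;
  int depth = 0;

  assert (s != NULL);
  if (s[0] != '{')
    return 0;

  while (s[i] != '\0')
    {
      char c = s[i];
      if (c == '"' || c == '\'')
        {
          /* Skip a string literal or character constant.  */
          char quote = c;
          i++;
          while (s[i] != '\0' && s[i] != quote)
            {
              if (s[i] == '\\' && s[i + 1] != '\0')
                i++;            /* Skip the escaped character too.  */
              i++;
            }
          if (s[i] == '\0')
            return 0;           /* Unterminated literal.  */
          i++;
        }
      else if (c == '/' && s[i + 1] == '*')
        {
          /* Skip a block comment.  */
          i += 2;
          while (s[i] != '\0' && !(s[i] == '*' && s[i + 1] == '/'))
            i++;
          if (s[i] == '\0')
            return 0;           /* Unterminated comment.  */
          i += 2;
        }
      else if (c == '/' && s[i + 1] == '/')
        {
          /* Skip a line comment up to the newline.  */
          while (s[i] != '\0' && s[i] != '\n')
            i++;
        }
      else
        {
          if (c == '{')
            depth++;
          else if (c == '}' && --depth == 0)
            return i + 1;
          i++;
        }
    }
  return 0;
}
```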
- -@findex | -Multiple rules for the same @var{result} can be written separately or can -be joined with the vertical-bar character @samp{|} as follows: - -@example -@group -@var{result}: - @var{rule1-components}@dots{} -| @var{rule2-components}@dots{} -@dots{} -; -@end group -@end example - -@noindent -They are still considered distinct rules even when joined in this way. - -If @var{components} in a rule is empty, it means that @var{result} can -match the empty string. For example, here is how to define a -comma-separated sequence of zero or more @code{exp} groupings: - -@example -@group -expseq: - /* empty */ -| expseq1 -; -@end group - -@group -expseq1: - exp -| expseq1 ',' exp -; -@end group -@end example - -@noindent -It is customary to write a comment @samp{/* empty */} in each rule -with no components. - -@node Recursion -@section Recursive Rules -@cindex recursive rule - -A rule is called @dfn{recursive} when its @var{result} nonterminal -appears also on its right hand side. Nearly all Bison grammars need to -use recursion, because that is the only way to define a sequence of any -number of a particular thing. Consider this recursive definition of a -comma-separated sequence of one or more expressions: - -@example -@group -expseq1: - exp -| expseq1 ',' exp -; -@end group -@end example - -@cindex left recursion -@cindex right recursion -@noindent -Since the recursive use of @code{expseq1} is the leftmost symbol in the -right hand side, we call this @dfn{left recursion}. By contrast, here -the same construct is defined using @dfn{right recursion}: - -@example -@group -expseq1: - exp -| exp ',' expseq1 -; -@end group -@end example - -@noindent -Any kind of sequence can be defined using either left recursion or right -recursion, but you should always use left recursion, because it can -parse a sequence of any number of elements with bounded stack space. 
-Right recursion uses up space on the Bison stack in proportion to the -number of elements in the sequence, because all the elements must be -shifted onto the stack before the rule can be applied even once. -@xref{Algorithm, ,The Bison Parser Algorithm}, for further explanation -of this. - -@cindex mutual recursion -@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the -rule does not appear directly on its right hand side, but does appear -in rules for other nonterminals which do appear on its right hand -side. - -For example: - -@example -@group -expr: - primary -| primary '+' primary -; -@end group - -@group -primary: - constant -| '(' expr ')' -; -@end group -@end example - -@noindent -defines two mutually-recursive nonterminals, since each refers to the -other. - -@node Semantics -@section Defining Language Semantics -@cindex defining language semantics -@cindex language semantics, defining - -The grammar rules for a language determine only the syntax. The semantics -are determined by the semantic values associated with various tokens and -groupings, and by the actions taken when various groupings are recognized. - -For example, the calculator calculates properly because the value -associated with each expression is the proper number; it adds properly -because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add -the numbers associated with @var{x} and @var{y}. - -@menu -* Value Type:: Specifying one data type for all semantic values. -* Multiple Types:: Specifying several alternative data types. -* Actions:: An action is the semantic definition of a grammar rule. -* Action Types:: Specifying data types for actions to operate on. -* Mid-Rule Actions:: Most actions go at the end of a rule. - This says when, why and how to use the exceptional - action in the middle of a rule. 
-@end menu - -@node Value Type -@subsection Data Types of Semantic Values -@cindex semantic value type -@cindex value type, semantic -@cindex data types of semantic values -@cindex default data type - -In a simple program it may be sufficient to use the same data type for -the semantic values of all language constructs. This was true in the -RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish -Notation Calculator}). - -Bison normally uses the type @code{int} for semantic values if your -program uses the same data type for all language constructs. To -specify some other type, define @code{YYSTYPE} as a macro, like this: - -@example -#define YYSTYPE double -@end example - -@noindent -@code{YYSTYPE}'s replacement list should be a type name -that does not contain parentheses or square brackets. -This macro definition must go in the prologue of the grammar file -(@pxref{Grammar Outline, ,Outline of a Bison Grammar}). - -@node Multiple Types -@subsection More Than One Value Type - -In most programs, you will need different data types for different kinds -of tokens and groupings. For example, a numeric constant may need type -@code{int} or @code{long int}, while a string constant needs type -@code{char *}, and an identifier might need a pointer to an entry in the -symbol table. - -To use more than one data type for semantic values in one parser, Bison -requires you to do two things: - -@itemize @bullet -@item -Specify the entire collection of possible data types, either by using the -@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of -Value Types}), or by using a @code{typedef} or a @code{#define} to -define @code{YYSTYPE} to be a union type whose member names are -the type tags. - -@item -Choose one of those types for each symbol (terminal or nonterminal) for -which semantic values are used. 
This is done for tokens with the -@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) -and for groupings with the @code{%type} Bison declaration (@pxref{Type -Decl, ,Nonterminal Symbols}). -@end itemize - -@node Actions -@subsection Actions -@cindex action -@vindex $$ -@vindex $@var{n} -@vindex $@var{name} -@vindex $[@var{name}] - -An action accompanies a syntactic rule and contains C code to be executed -each time an instance of that rule is recognized. The task of most actions -is to compute a semantic value for the grouping built by the rule from the -semantic values associated with tokens or smaller groupings. - -An action consists of braced code containing C statements, and can be -placed at any position in the rule; -it is executed at that position. Most rules have just one action at the -end of the rule, following all the components. Actions in the middle of -a rule are tricky and used only for special purposes (@pxref{Mid-Rule -Actions, ,Actions in Mid-Rule}). - -The C code in an action can refer to the semantic values of the -components matched by the rule with the construct @code{$@var{n}}, -which stands for the value of the @var{n}th component. The semantic -value for the grouping being constructed is @code{$$}. In addition, -the semantic values of symbols can be accessed with the named -references construct @code{$@var{name}} or @code{$[@var{name}]}. -Bison translates both of these constructs into expressions of the -appropriate type when it copies the actions into the parser -implementation file. @code{$$} (or @code{$@var{name}}, when it stands -for the current grouping) is translated to a modifiable lvalue, so it -can be assigned to. 
- -Here is a typical example: - -@example -@group -exp: -@dots{} -| exp '+' exp @{ $$ = $1 + $3; @} -@end group -@end example - -Or, in terms of named references: - -@example -@group -exp[result]: -@dots{} -| exp[left] '+' exp[right] @{ $result = $left + $right; @} -@end group -@end example - -@noindent -This rule constructs an @code{exp} from two smaller @code{exp} groupings -connected by a plus-sign token. In the action, @code{$1} and @code{$3} -(@code{$left} and @code{$right}) -refer to the semantic values of the two component @code{exp} groupings, -which are the first and third symbols on the right hand side of the rule. -The sum is stored into @code{$$} (@code{$result}) so that it becomes the -semantic value of -the addition-expression just recognized by the rule. If there were a -useful semantic value associated with the @samp{+} token, it could be -referred to as @code{$2}. - -@xref{Named References}, for more information about using the named -references construct. - -Note that the vertical-bar character @samp{|} is really a rule -separator, and actions are attached to a single rule. This is a -difference with tools like Flex, for which @samp{|} stands for either -``or'', or ``the same action as that of the next rule''. In the -following example, the action is triggered only when @samp{b} is found: - -@example -@group -a-or-b: 'a'|'b' @{ a_or_b_found = 1; @}; -@end group -@end example - -@cindex default action -If you don't specify an action for a rule, Bison supplies a default: -@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule -becomes the value of the whole rule. Of course, the default action is -valid only if the two data types match. There is no meaningful default -action for an empty rule; every empty rule must have an explicit action -unless the rule's value does not matter. 
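
To make the roles of @code{$$} and @code{$@var{n}} more concrete, here
is a minimal C sketch of a value stack. It is only a model: the names
@code{value_stack}, @code{rhs_value}, @code{reduce}, @code{reduce_add},
and @code{reduce_default} are hypothetical, and the code Bison actually
generates is organized differently. The sketch assumes a single
semantic type, @code{double}.

```c
#include <assert.h>

enum { STACK_MAX = 64 };        /* Arbitrary capacity for the sketch.  */

typedef struct
{
  double values[STACK_MAX];
  int size;
} value_stack;

void
push_value (value_stack *s, double v)
{
  assert (s->size < STACK_MAX);
  s->values[s->size++] = v;
}

/* Model of $N: the value of the Nth component of a rule whose LEN
   components are the top LEN entries of the stack.  */
double
rhs_value (const value_stack *s, int len, int n)
{
  assert (1 <= n && n <= len && len <= s->size);
  return s->values[s->size - len + n - 1];
}

/* Model of a reduction: pop the LEN components, push RESULT ($$).  */
void
reduce (value_stack *s, int len, double result)
{
  assert (0 <= len && len <= s->size);
  s->size -= len;
  push_value (s, result);
}

/* Model of: exp: exp '+' exp { $$ = $1 + $3; }  */
void
reduce_add (value_stack *s)
{
  reduce (s, 3, rhs_value (s, 3, 1) + rhs_value (s, 3, 3));
}

/* Model of the default action $$ = $1.  It needs at least one
   component, which is why an empty rule has no meaningful default.  */
void
reduce_default (value_stack *s, int len)
{
  reduce (s, len, rhs_value (s, len, 1));
}
```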
- -@code{$@var{n}} with @var{n} zero or negative is allowed for reference -to tokens and groupings on the stack @emph{before} those that match the -current rule. This is a very risky practice, and to use it reliably -you must be certain of the context in which the rule is applied. Here -is a case in which you can use this reliably: - -@example -@group -foo: - expr bar '+' expr @{ @dots{} @} -| expr bar '-' expr @{ @dots{} @} -; -@end group - -@group -bar: - /* empty */ @{ previous_expr = $0; @} -; -@end group -@end example - -As long as @code{bar} is used only in the fashion shown here, @code{$0} -always refers to the @code{expr} which precedes @code{bar} in the -definition of @code{foo}. - -@vindex yylval -It is also possible to access the semantic value of the lookahead token, if -any, from a semantic action. -This semantic value is stored in @code{yylval}. -@xref{Action Features, ,Special Features for Use in Actions}. - -@node Action Types -@subsection Data Types of Values in Actions -@cindex action data types -@cindex data types in actions - -If you have chosen a single data type for semantic values, the @code{$$} -and @code{$@var{n}} constructs always have that data type. - -If you have used @code{%union} to specify a variety of data types, then you -must declare a choice among these types for each terminal or nonterminal -symbol that can have a semantic value. Then each time you use @code{$$} or -@code{$@var{n}}, its data type is determined by which symbol it refers to -in the rule. In this example, - -@example -@group -exp: - @dots{} -| exp '+' exp @{ $$ = $1 + $3; @} -@end group -@end example - -@noindent -@code{$1} and @code{$3} refer to instances of @code{exp}, so they all -have the data type declared for the nonterminal symbol @code{exp}. If -@code{$2} were used, it would have the data type declared for the -terminal symbol @code{'+'}, whatever that might be. 
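
As a rough picture of what this means in C, the following sketch
assumes a union with two hypothetical members, @code{ival} and
@code{dval}, and a nonterminal @code{exp} declared with the
@code{dval} type. Under those assumptions, every @code{$$},
@code{$1}, and @code{$3} in the rule above behaves like an access to
the @code{dval} member. The names @code{semantic_value} and
@code{add_exp} are invented for the illustration and do not appear in
generated parsers.

```c
#include <assert.h>

/* Hypothetical stand-in for the union that %union would produce.  */
typedef union
{
  int ival;
  double dval;
} semantic_value;

/* Model of: exp: exp '+' exp { $$ = $1 + $3; }
   with exp declared to use the dval member.  */
semantic_value
add_exp (semantic_value lhs, semantic_value rhs)
{
  semantic_value result;
  result.dval = lhs.dval + rhs.dval;    /* $$ = $1 + $3 */
  return result;
}
```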
- -Alternatively, you can specify the data type when you refer to the value, -by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the -reference. For example, if you have defined types as shown here: - -@example -@group -%union @{ - int itype; - double dtype; -@} -@end group -@end example - -@noindent -then you can write @code{$<itype>1} to refer to the first subunit of the -rule as an integer, or @code{$<dtype>1} to refer to it as a double. - -@node Mid-Rule Actions -@subsection Actions in Mid-Rule -@cindex actions in mid-rule -@cindex mid-rule actions - -Occasionally it is useful to put an action in the middle of a rule. -These actions are written just like usual end-of-rule actions, but they -are executed before the parser even recognizes the following components. - -A mid-rule action may refer to the components preceding it using -@code{$@var{n}}, but it may not refer to subsequent components because -it is run before they are parsed. - -The mid-rule action itself counts as one of the components of the rule. -This makes a difference when there is another action later in the same rule -(and usually there is another at the end): you have to count the actions -along with the symbols when working out which number @var{n} to use in -@code{$@var{n}}. - -The mid-rule action can also have a semantic value. The action can set -its value with an assignment to @code{$$}, and actions later in the rule -can refer to the value using @code{$@var{n}}. Since there is no symbol -to name the action, there is no way to declare a data type for the value -in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to -specify a data type each time you refer to this value. - -There is no way to set the value of the entire rule with a mid-rule -action, because assignments to @code{$$} do not have that effect. The -only way to set the value for the entire rule is with an ordinary action -at the end of the rule.
- -Here is an example from a hypothetical compiler, handling a @code{let} -statement that looks like @samp{let (@var{variable}) @var{statement}} and -serves to create a variable named @var{variable} temporarily for the -duration of @var{statement}. To parse this construct, we must put -@var{variable} into the symbol table while @var{statement} is parsed, then -remove it afterward. Here is how it is done: - -@example -@group -stmt: - LET '(' var ')' - @{ $<context>$ = push_context (); declare_variable ($3); @} - stmt - @{ $$ = $6; pop_context ($<context>5); @} -@end group -@end example - -@noindent -As soon as @samp{let (@var{variable})} has been recognized, the first -action is run. It saves a copy of the current semantic context (the -list of accessible variables) as its semantic value, using alternative -@code{context} in the data-type union. Then it calls -@code{declare_variable} to add the new variable to that list. Once the -first action is finished, the embedded statement @code{stmt} can be -parsed. Note that the mid-rule action is component number 5, so the -@samp{stmt} is component number 6. - -After the embedded statement is parsed, its semantic value becomes the -value of the entire @code{let}-statement. Then the semantic value from the -earlier action is used to restore the prior list of variables. This -removes the temporary @code{let}-variable from the list so that it won't -appear to exist while the rest of the program is parsed. - -@findex %destructor -@cindex discarded symbols, mid-rule actions -@cindex error recovery, mid-rule actions -In the above example, if the parser initiates error recovery (@pxref{Error -Recovery}) while parsing the tokens in the embedded statement @code{stmt}, -it might discard the previous semantic context @code{$<context>5} without -restoring it. -Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing -Discarded Symbols}).
-However, Bison currently provides no means to declare a destructor specific to -a particular mid-rule action's semantic value. - -One solution is to bury the mid-rule action inside a nonterminal symbol and to -declare a destructor for that symbol: - -@example -@group -%type <context> let -%destructor @{ pop_context ($$); @} let - -%% - -stmt: - let stmt - @{ - $$ = $2; - pop_context ($1); - @}; - -let: - LET '(' var ')' - @{ - $$ = push_context (); - declare_variable ($3); - @}; - -@end group -@end example - -@noindent -Note that the action is now at the end of its rule. -Any mid-rule action can be converted to an end-of-rule action in this way, and -this is what Bison actually does to implement mid-rule actions. - -Taking action before a rule is completely recognized often leads to -conflicts since the parser must commit to a parse in order to execute the -action. For example, the following two rules, without mid-rule actions, -can coexist in a working parser because the parser can shift the open-brace -token and look at what follows before deciding whether there is a -declaration or not: - -@example -@group -compound: - '@{' declarations statements '@}' -| '@{' statements '@}' -; -@end group -@end example - -@noindent -But when we add a mid-rule action as follows, the rules become nonfunctional: - -@example -@group -compound: - @{ prepare_for_local_variables (); @} - '@{' declarations statements '@}' -@end group -@group -| '@{' statements '@}' -; -@end group -@end example - -@noindent -Now the parser is forced to decide whether to run the mid-rule action -when it has read no farther than the open-brace. In other words, it -must commit to using one rule or the other, without sufficient -information to do it correctly. (The open-brace token is what is called -the @dfn{lookahead} token at this time, since the parser is still -deciding what to do about it. @xref{Lookahead, ,Lookahead Tokens}.)
- -You might think that you could correct the problem by putting identical -actions into the two rules, like this: - -@example -@group -compound: - @{ prepare_for_local_variables (); @} - '@{' declarations statements '@}' -| @{ prepare_for_local_variables (); @} - '@{' statements '@}' -; -@end group -@end example - -@noindent -But this does not help, because Bison does not realize that the two actions -are identical. (Bison never tries to understand the C code in an action.) - -If the grammar is such that a declaration can be distinguished from a -statement by the first token (which is true in C), then one solution which -does work is to put the action after the open-brace, like this: - -@example -@group -compound: - '@{' @{ prepare_for_local_variables (); @} - declarations statements '@}' -| '@{' statements '@}' -; -@end group -@end example - -@noindent -Now the first token of the following declaration or statement, -which would in any case tell Bison which rule to use, can still do so. - -Another solution is to bury the action inside a nonterminal symbol which -serves as a subroutine: - -@example -@group -subroutine: - /* empty */ @{ prepare_for_local_variables (); @} -; -@end group - -@group -compound: - subroutine '@{' declarations statements '@}' -| subroutine '@{' statements '@}' -; -@end group -@end example - -@noindent -Now Bison can execute the action in the rule for @code{subroutine} without -deciding which rule for @code{compound} it will eventually use. - -@node Tracking Locations -@section Tracking Locations -@cindex location -@cindex textual location -@cindex location, textual - -Though grammar rules and semantic actions are enough to write a fully -functional parser, it can be useful to process some additional information, -especially symbol locations. - -The way locations are handled is defined by providing a data type, and -actions to take when rules are matched. - -@menu -* Location Type:: Specifying a data type for locations. 
-* Actions and Locations:: Using locations in actions. -* Location Default Action:: Defining a general way to compute locations. -@end menu - -@node Location Type -@subsection Data Type of Locations -@cindex data type of locations -@cindex default location type - -Defining a data type for locations is much simpler than for semantic values, -since all tokens and groupings always use the same type. - -You can specify the type of locations by defining a macro called -@code{YYLTYPE}, just as you can specify the semantic value type by -defining a @code{YYSTYPE} macro (@pxref{Value Type}). -When @code{YYLTYPE} is not defined, Bison uses a default structure type with -four members: - -@example -typedef struct YYLTYPE -@{ - int first_line; - int first_column; - int last_line; - int last_column; -@} YYLTYPE; -@end example - -When @code{YYLTYPE} is not defined, at the beginning of parsing, Bison -initializes all these fields to 1 for @code{yylloc}. To initialize -@code{yylloc} with a custom location type (or to choose a different -initialization), use the @code{%initial-action} directive. @xref{Initial -Action Decl, , Performing Actions before Parsing}. - -@node Actions and Locations -@subsection Actions and Locations -@cindex location actions -@cindex actions, location -@vindex @@$ -@vindex @@@var{n} -@vindex @@@var{name} -@vindex @@[@var{name}] - -Actions are not only useful for defining language semantics, but also for -describing the behavior of the output parser with locations. - -The most obvious way for building locations of syntactic groupings is very -similar to the way semantic values are computed. In a given rule, several -constructs can be used to access the locations of the elements being matched. -The location of the @var{n}th component of the right hand side is -@code{@@@var{n}}, while the location of the left hand side grouping is -@code{@@$}.
- -In addition, the named references construct @code{@@@var{name}} and -@code{@@[@var{name}]} may also be used to address the symbol locations. -@xref{Named References}, for more information about using the named -references construct. - -Here is a basic example using the default data type for locations: - -@example -@group -exp: - @dots{} -| exp '/' exp - @{ - @@$.first_column = @@1.first_column; - @@$.first_line = @@1.first_line; - @@$.last_column = @@3.last_column; - @@$.last_line = @@3.last_line; - if ($3) - $$ = $1 / $3; - else - @{ - $$ = 1; - fprintf (stderr, - "Division by zero, l%d,c%d-l%d,c%d", - @@3.first_line, @@3.first_column, - @@3.last_line, @@3.last_column); - @} - @} -@end group -@end example - -As for semantic values, there is a default action for locations that is -run each time a rule is matched. It sets the beginning of @code{@@$} to the -beginning of the first symbol, and the end of @code{@@$} to the end of the -last symbol. - -With this default action, the location tracking can be fully automatic. The -example above simply rewrites this way: - -@example -@group -exp: - @dots{} -| exp '/' exp - @{ - if ($3) - $$ = $1 / $3; - else - @{ - $$ = 1; - fprintf (stderr, - "Division by zero, l%d,c%d-l%d,c%d", - @@3.first_line, @@3.first_column, - @@3.last_line, @@3.last_column); - @} - @} -@end group -@end example - -@vindex yylloc -It is also possible to access the location of the lookahead token, if any, -from a semantic action. -This location is stored in @code{yylloc}. -@xref{Action Features, ,Special Features for Use in Actions}. - -@node Location Default Action -@subsection Default Action for Locations -@vindex YYLLOC_DEFAULT -@cindex GLR parsers and @code{YYLLOC_DEFAULT} - -Actually, actions are not the best place to compute locations. Since -locations are much more general than semantic values, there is room in -the output parser to redefine the default action to take for each -rule. 
The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is -matched, before the associated action is run. It is also invoked -while processing a syntax error, to compute the error's location. -Before reporting an unresolvable syntactic ambiguity, a GLR -parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location -of that ambiguity. - -Most of the time, this macro is general enough to suppress location -dedicated code from semantic actions. - -The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is -the location of the grouping (the result of the computation). When a -rule is matched, the second parameter identifies locations of -all right hand side elements of the rule being matched, and the third -parameter is the size of the rule's right hand side. -When a GLR parser reports an ambiguity, which of multiple candidate -right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined. -When processing a syntax error, the second parameter identifies locations -of the symbols that were discarded during error processing, and the third -parameter is the number of discarded symbols. - -By default, @code{YYLLOC_DEFAULT} is defined this way: - -@example -@group -# define YYLLOC_DEFAULT(Cur, Rhs, N) \ -do \ - if (N) \ - @{ \ - (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \ - (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \ - (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \ - (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \ - @} \ - else \ - @{ \ - (Cur).first_line = (Cur).last_line = \ - YYRHSLOC(Rhs, 0).last_line; \ - (Cur).first_column = (Cur).last_column = \ - YYRHSLOC(Rhs, 0).last_column; \ - @} \ -while (0) -@end group -@end example - -@noindent -where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol -in @var{rhs} when @var{k} is positive, and the location of the symbol -just before the reduction when @var{k} and @var{n} are both zero. 
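
The same computation can be written as an ordinary C function, which
may be easier to read than the macro. This is only a sketch: the names
@code{location} and @code{default_location} are hypothetical, and the
array @code{rhs} is assumed to be laid out so that @code{rhs[0]} is the
symbol just before the reduction and @code{rhs[1]} through
@code{rhs[n]} are the right hand side symbols, mirroring
@code{YYRHSLOC}.

```c
#include <assert.h>
#include <stddef.h>

/* Same four members as the default YYLTYPE structure.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Hypothetical function form of the default location computation.
   RHS[0] is the symbol just before the reduction; RHS[1..N] are the
   right hand side symbols.  */
location
default_location (const location *rhs, int n)
{
  location cur;

  assert (rhs != NULL && n >= 0);
  if (n > 0)
    {
      /* Span from the start of the first symbol to the end of the
         last one.  */
      cur.first_line = rhs[1].first_line;
      cur.first_column = rhs[1].first_column;
      cur.last_line = rhs[n].last_line;
      cur.last_column = rhs[n].last_column;
    }
  else
    {
      /* Empty rule: an empty range just after the previous symbol.  */
      cur.first_line = cur.last_line = rhs[0].last_line;
      cur.first_column = cur.last_column = rhs[0].last_column;
    }
  return cur;
}
```

A function like this cannot replace the macro directly, since
@code{YYLLOC_DEFAULT} assigns to its first argument, but it shows the
two cases the macro has to cover.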
- -When defining @code{YYLLOC_DEFAULT}, you should consider that: - -@itemize @bullet -@item -All arguments are free of side-effects. However, only the first one (the -result) should be modified by @code{YYLLOC_DEFAULT}. - -@item -For consistency with semantic actions, valid indexes within the -right hand side range from 1 to @var{n}. When @var{n} is zero, only 0 is a -valid index, and it refers to the symbol just before the reduction. -During error processing @var{n} is always positive. - -@item -Your macro should parenthesize its arguments, if need be, since the -actual arguments may not be surrounded by parentheses. Also, your -macro should expand to something that can be used as a single -statement when it is followed by a semicolon. -@end itemize - -@node Named References -@section Named References -@cindex named references - -As described in the preceding sections, the traditional way to refer to any -semantic value or location is a @dfn{positional reference}, which takes the -form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}. However, -such a reference is not very descriptive. Moreover, if you later decide to -insert or remove symbols in the right-hand side of a grammar rule, the need -to renumber such references can be tedious and error-prone. - -To avoid these issues, you can also refer to a semantic value or location -using a @dfn{named reference}. First of all, original symbol names may be -used as named references. For example: - -@example -@group -invocation: op '(' args ')' - @{ $invocation = new_invocation ($op, $args, @@invocation); @} -@end group -@end example - -@noindent -Positional and named references can be mixed arbitrarily. 
For example: - -@example -@group -invocation: op '(' args ')' - @{ $$ = new_invocation ($op, $args, @@$); @} -@end group -@end example - -@noindent -However, sometimes regular symbol names are not sufficient due to -ambiguities: - -@example -@group -exp: exp '/' exp - @{ $exp = $exp / $exp; @} // $exp is ambiguous. - -exp: exp '/' exp - @{ $$ = $1 / $exp; @} // One usage is ambiguous. - -exp: exp '/' exp - @{ $$ = $1 / $3; @} // No error. -@end group -@end example - -@noindent -When ambiguity occurs, explicitly declared names may be used for values and -locations. Explicit names are declared as a bracketed name after a symbol -appearance in rule definitions. For example: -@example -@group -exp[result]: exp[left] '/' exp[right] - @{ $result = $left / $right; @} -@end group -@end example - -@noindent -In order to access a semantic value generated by a mid-rule action, an -explicit name may also be declared by putting a bracketed name after the -closing brace of the mid-rule action code: -@example -@group -exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right] - @{ $res = $left + $right; @} -@end group -@end example - -@noindent - -In references, in order to specify names containing dots and dashes, an explicit -bracketed syntax @code{$[name]} and @code{@@[name]} must be used: -@example -@group -if-stmt: "if" '(' expr ')' "then" then.stmt ';' - @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @} -@end group -@end example - -It often happens that named references are followed by a dot, dash or other -C punctuation marks and operators. By default, Bison will read -@samp{$name.suffix} as a reference to symbol value @code{$name} followed by -@samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic -value. In order to force Bison to recognize @samp{name.suffix} in its -entirety as the name of a semantic value, the bracketed syntax -@samp{$[name.suffix]} must be used. - -The named references feature is experimental. 
More user feedback will help -to stabilize it. - -@node Declarations -@section Bison Declarations -@cindex declarations, Bison -@cindex Bison declarations - -The @dfn{Bison declarations} section of a Bison grammar defines the symbols -used in formulating the grammar and the data types of semantic values. -@xref{Symbols}. - -All token type names (but not single-character literal tokens such as -@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be -declared if you need to specify which data type to use for the semantic -value (@pxref{Multiple Types, ,More Than One Value Type}). - -The first rule in the grammar file also specifies the start symbol, by -default. If you want some other symbol to be the start symbol, you -must declare it explicitly (@pxref{Language and Grammar, ,Languages -and Context-Free Grammars}). - -@menu -* Require Decl:: Requiring a Bison version. -* Token Decl:: Declaring terminal symbols. -* Precedence Decl:: Declaring terminals with precedence and associativity. -* Union Decl:: Declaring the set of all semantic value types. -* Type Decl:: Declaring the choice of type for a nonterminal symbol. -* Initial Action Decl:: Code run before parsing starts. -* Destructor Decl:: Declaring how symbols are freed. -* Printer Decl:: Declaring how symbol values are displayed. -* Expect Decl:: Suppressing warnings about parsing conflicts. -* Start Decl:: Specifying the start symbol. -* Pure Decl:: Requesting a reentrant parser. -* Push Decl:: Requesting a push parser. -* Decl Summary:: Table of all Bison declarations. -* %define Summary:: Defining variables to adjust Bison's behavior. -* %code Summary:: Inserting code into the parser source. -@end menu - -@node Require Decl -@subsection Require a Version of Bison -@cindex version requirement -@cindex requiring a version of Bison -@findex %require - -You may require the minimum version of Bison to process the grammar. 
If -the requirement is not met, @command{bison} exits with an error (exit -status 63). - -@example -%require "@var{version}" -@end example - -@node Token Decl -@subsection Token Type Names -@cindex declaring token type names -@cindex token type names, declaring -@cindex declaring literal string tokens -@findex %token - -The basic way to declare a token type name (terminal symbol) is as follows: - -@example -%token @var{name} -@end example - -Bison will convert this into a @code{#define} directive in -the parser, so that the function @code{yylex} (if it is in this file) -can use the name @var{name} to stand for this token type's code. - -Alternatively, you can use @code{%left}, @code{%right}, or -@code{%nonassoc} instead of @code{%token}, if you wish to specify -associativity and precedence. @xref{Precedence Decl, ,Operator -Precedence}. - -You can explicitly specify the numeric code for a token type by appending -a nonnegative decimal or hexadecimal integer value in the field immediately -following the token name: - -@example -%token NUM 300 -%token XNUM 0x12d // a GNU extension -@end example - -@noindent -It is generally best, however, to let Bison choose the numeric codes for -all token types. Bison will automatically select codes that don't conflict -with each other or with normal characters. - -In the event that the stack type is a union, you must augment the -@code{%token} or other token declaration to include the data type -alternative delimited by angle-brackets (@pxref{Multiple Types, ,More -Than One Value Type}). - -For example: - -@example -@group -%union @{ /* define stack type */ - double val; - symrec *tptr; -@} -%token <val> NUM /* define token NUM and its type */ -@end group -@end example - -You can associate a literal string token with a token type name by -writing the literal string at the end of a @code{%token} -declaration which declares the name.
For example: - -@example -%token arrow "=>" -@end example - -@noindent -For example, a grammar for the C language might specify these names with -equivalent literal string tokens: - -@example -%token OR "||" -%token LE 134 "<=" -%left OR "<=" -@end example - -@noindent -Once you equate the literal string and the token name, you can use them -interchangeably in further declarations or the grammar rules. The -@code{yylex} function can use the token name or the literal string to -obtain the token type code number (@pxref{Calling Convention}). -Syntax error messages passed to @code{yyerror} from the parser will reference -the literal string instead of the token name. - -The token numbered as 0 corresponds to end of file; the following line -allows for nicer error messages referring to ``end of file'' instead -of ``$end'': - -@example -%token END 0 "end of file" -@end example - -@node Precedence Decl -@subsection Operator Precedence -@cindex precedence declarations -@cindex declaring operator precedence -@cindex operator precedence, declaring - -Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to -declare a token and specify its precedence and associativity, all at -once. These are called @dfn{precedence declarations}. -@xref{Precedence, ,Operator Precedence}, for general information on -operator precedence. - -The syntax of a precedence declaration is nearly the same as that of -@code{%token}: either - -@example -%left @var{symbols}@dots{} -@end example - -@noindent -or - -@example -%left <@var{type}> @var{symbols}@dots{} -@end example - -And indeed any of these declarations serves the purposes of @code{%token}. 
-But in addition, they specify the associativity and relative precedence for -all the @var{symbols}: - -@itemize @bullet -@item -The associativity of an operator @var{op} determines how repeated uses -of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op} -@var{z}} is parsed by grouping @var{x} with @var{y} first or by -grouping @var{y} with @var{z} first. @code{%left} specifies -left-associativity (grouping @var{x} with @var{y} first) and -@code{%right} specifies right-associativity (grouping @var{y} with -@var{z} first). @code{%nonassoc} specifies no associativity, which -means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is -considered a syntax error. - -@item -The precedence of an operator determines how it nests with other operators. -All the tokens declared in a single precedence declaration have equal -precedence and nest together according to their associativity. -When two tokens declared in different precedence declarations associate, -the one declared later has the higher precedence and is grouped first. -@end itemize - -For backward compatibility, there is a confusing difference between the -argument lists of @code{%token} and precedence declarations. -Only a @code{%token} can associate a literal string with a token type name. -A precedence declaration always interprets a literal string as a reference to a -separate token. -For example: - -@example -%left OR "<=" // Does not declare an alias. -%left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=". -@end example - -@node Union Decl -@subsection The Collection of Value Types -@cindex declaring value types -@cindex value types, declaring -@findex %union - -The @code{%union} declaration specifies the entire collection of -possible data types for semantic values. The keyword @code{%union} is -followed by braced code containing the same thing that goes inside a -@code{union} in C@. 
- -For example: - -@example -@group -%union @{ - double val; - symrec *tptr; -@} -@end group -@end example - -@noindent -This says that the two alternative types are @code{double} and @code{symrec -*}. They are given names @code{val} and @code{tptr}; these names are used -in the @code{%token} and @code{%type} declarations to pick one of the types -for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}). - -As an extension to POSIX, a tag is allowed after the -@code{union}. For example: - -@example -@group -%union value @{ - double val; - symrec *tptr; -@} -@end group -@end example - -@noindent -specifies the union tag @code{value}, so the corresponding C type is -@code{union value}. If you do not specify a tag, it defaults to -@code{YYSTYPE}. - -As another extension to POSIX, you may specify multiple -@code{%union} declarations; their contents are concatenated. However, -only the first @code{%union} declaration can specify a tag. - -Note that, unlike making a @code{union} declaration in C, you need not write -a semicolon after the closing brace. - -Instead of @code{%union}, you can define and use your own union type -@code{YYSTYPE} if your grammar contains at least one -@samp{<@var{type}>} tag. For example, you can put the following into -a header file @file{parser.h}: - -@example -@group -union YYSTYPE @{ - double val; - symrec *tptr; -@}; -typedef union YYSTYPE YYSTYPE; -@end group -@end example - -@noindent -and then your grammar can use the following -instead of @code{%union}: - -@example -@group -%@{ -#include "parser.h" -%@} -%type <val> expr -%token <tptr> ID -@end group -@end example - -@node Type Decl -@subsection Nonterminal Symbols -@cindex declaring value types, nonterminals -@cindex value types, nonterminals, declaring -@findex %type - -@noindent -When you use @code{%union} to specify multiple value types, you must -declare the value type of each nonterminal symbol for which values are -used.
This is done with a @code{%type} declaration, like this: - -@example -%type <@var{type}> @var{nonterminal}@dots{} -@end example - -@noindent -Here @var{nonterminal} is the name of a nonterminal symbol, and -@var{type} is the name given in the @code{%union} to the alternative -that you want (@pxref{Union Decl, ,The Collection of Value Types}). You -can give any number of nonterminal symbols in the same @code{%type} -declaration, if they have the same value type. Use spaces to separate -the symbol names. - -You can also declare the value type of a terminal symbol. To do this, -use the same @code{<@var{type}>} construction in a declaration for the -terminal symbol. All kinds of token declarations allow -@code{<@var{type}>}. - -@node Initial Action Decl -@subsection Performing Actions before Parsing -@findex %initial-action - -Sometimes your parser needs to perform some initializations before -parsing. The @code{%initial-action} directive allows for such arbitrary -code. - -@deffn {Directive} %initial-action @{ @var{code} @} -@findex %initial-action -Declare that the braced @var{code} must be invoked before parsing each time -@code{yyparse} is called. The @var{code} may use @code{$$} and -@code{@@$} --- initial value and location of the lookahead --- and the -@code{%parse-param}. -@end deffn - -For instance, if your locations use a file name, you may use - -@example -%parse-param @{ char const *file_name @}; -%initial-action -@{ - @@$.initialize (file_name); -@}; -@end example - - -@node Destructor Decl -@subsection Freeing Discarded Symbols -@cindex freeing discarded symbols -@findex %destructor -@findex <*> -@findex <> -During error recovery (@pxref{Error Recovery}), symbols already pushed -on the stack and tokens coming from the rest of the file are discarded -until the parser falls on its feet. If the parser runs out of memory, -or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the -symbols on the stack must be discarded. 
Even if the parser succeeds, it -must discard the start symbol. - -When discarded symbols convey heap based information, this memory is -lost. While this behavior can be tolerable for batch parsers, such as -in traditional compilers, it is unacceptable for programs like shells or -protocol implementations that may parse and execute indefinitely. - -The @code{%destructor} directive defines code that is called when a -symbol is automatically discarded. - -@deffn {Directive} %destructor @{ @var{code} @} @var{symbols} -@findex %destructor -Invoke the braced @var{code} whenever the parser discards one of the -@var{symbols}. -Within @var{code}, @code{$$} designates the semantic value associated -with the discarded symbol, and @code{@@$} designates its location. -The additional parser parameters are also available (@pxref{Parser Function, , -The Parser Function @code{yyparse}}). - -When a symbol is listed among @var{symbols}, its @code{%destructor} is called a -per-symbol @code{%destructor}. -You may also define a per-type @code{%destructor} by listing a semantic type -tag among @var{symbols}. -In that case, the parser will invoke this @var{code} whenever it discards any -grammar symbol that has that semantic type tag unless that symbol has its own -per-symbol @code{%destructor}. - -Finally, you can define two different kinds of default @code{%destructor}s. -(These default forms are experimental. -More user feedback will help to determine whether they should become permanent -features.) -You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of -exactly one @code{%destructor} declaration in your grammar file. -The parser will invoke the @var{code} associated with one of these whenever it -discards any user-defined grammar symbol that has no per-symbol and no per-type -@code{%destructor}. 
-The parser uses the @var{code} for @code{<*>} in the case of such a grammar -symbol for which you have formally declared a semantic type tag (@code{%type} -counts as such a declaration, but @code{$$} does not). -The parser uses the @var{code} for @code{<>} in the case of such a grammar -symbol that has no declared semantic type tag. -@end deffn - -@noindent -For example: - -@example -%union @{ char *string; @} -%token <string> STRING1 -%token <string> STRING2 -%type <string> string1 -%type <string> string2 -%union @{ char character; @} -%token <character> CHR -%type <character> chr -%token TAGLESS - -%destructor @{ @} <character> -%destructor @{ free ($$); @} <*> -%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1 -%destructor @{ printf ("Discarding tagless symbol.\n"); @} <> -@end example - -@noindent -guarantees that, when the parser discards any user-defined symbol that has a -semantic type tag other than @code{<character>}, it passes its semantic value -to @code{free} by default. -However, when the parser discards a @code{STRING1} or a @code{string1}, it also -prints its line number to @code{stdout}. -It performs only the second @code{%destructor} in this case, so it invokes -@code{free} only once. -Finally, the parser merely prints a message whenever it discards any symbol, -such as @code{TAGLESS}, that has no semantic type tag. - -A Bison-generated parser invokes the default @code{%destructor}s only for -user-defined as opposed to Bison-defined symbols. -For example, the parser will not invoke either kind of default -@code{%destructor} for the special Bison-defined symbols @code{$accept}, -@code{$undefined}, or @code{$end} (@pxref{Table of Symbols, ,Bison Symbols}), -none of which you can reference in your grammar. -It also will not invoke either for the @code{error} token (@pxref{Table of -Symbols, ,error}), which is always defined by Bison regardless of whether you -reference it in your grammar.
-However, it may invoke one of them for the end token (token 0) if you -redefine it from @code{$end} to, for example, @code{END}: - -@example -%token END 0 -@end example - -@cindex actions in mid-rule -@cindex mid-rule actions -Finally, Bison will never invoke a @code{%destructor} for an unreferenced -mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}). -That is, Bison does not consider a mid-rule to have a semantic value if you -do not reference @code{$$} in the mid-rule's action or @code{$@var{n}} -(where @var{n} is the right-hand side symbol position of the mid-rule) in -any later action in that rule. However, if you do reference either, the -Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever -it discards the mid-rule symbol. - -@ignore -@noindent -In the future, it may be possible to redefine the @code{error} token as a -nonterminal that captures the discarded symbols. -In that case, the parser will invoke the default destructor for it as well. -@end ignore - -@sp 1 - -@cindex discarded symbols -@dfn{Discarded symbols} are the following: - -@itemize -@item -stacked symbols popped during the first phase of error recovery, -@item -incoming terminals during the second phase of error recovery, -@item -the current lookahead and the entire stack (except the current -right-hand side symbols) when the parser returns immediately, and -@item -the start symbol, when the parser succeeds. -@end itemize - -The parser can @dfn{return immediately} because of an explicit call to -@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory -exhaustion. - -Right-hand side symbols of a rule that explicitly triggers a syntax -error via @code{YYERROR} are not discarded automatically. As a rule -of thumb, destructors are invoked only when user actions cannot manage -the memory. 
- -@node Printer Decl -@subsection Printing Semantic Values -@cindex printing semantic values -@findex %printer -@findex <*> -@findex <> -When run-time traces are enabled (@pxref{Tracing, ,Tracing Your Parser}), -the parser reports its actions, such as reductions. When a symbol involved -in an action is reported, only its kind is displayed, as the parser cannot -know how semantic values should be formatted. - -The @code{%printer} directive defines code that is called when a symbol is -reported. Its syntax is the same as @code{%destructor} (@pxref{Destructor -Decl, , Freeing Discarded Symbols}). - -@deffn {Directive} %printer @{ @var{code} @} @var{symbols} -@findex %printer -@vindex yyoutput -@c This is the same text as for %destructor. -Invoke the braced @var{code} whenever the parser displays one of the -@var{symbols}. Within @var{code}, @code{yyoutput} denotes the output stream -(a @code{FILE*} in C, and an @code{std::ostream&} in C++), -@code{$$} designates the semantic value associated with the symbol, and -@code{@@$} its location. The additional parser parameters are also -available (@pxref{Parser Function, , The Parser Function @code{yyparse}}). - -The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor -Decl, , Freeing Discarded Symbols}): they can be per-type (e.g., -@samp{}), per-symbol (e.g., @samp{exp}, @samp{NUM}, @samp{"float"}), -typed per-default (i.e., @samp{<*>}), or untyped per-default (i.e., -@samp{<>}).
-@end deffn - -@noindent -For example: - -@example -%union @{ char *string; @} -%token <string> STRING1 -%token <string> STRING2 -%type <string> string1 -%type <string> string2 -%union @{ char character; @} -%token <character> CHR -%type <character> chr -%token TAGLESS - -%printer @{ fprintf (yyoutput, "'%c'", $$); @} <character> -%printer @{ fprintf (yyoutput, "&%p", $$); @} <*> -%printer @{ fprintf (yyoutput, "\"%s\"", $$); @} STRING1 string1 -%printer @{ fprintf (yyoutput, "<>"); @} <> -@end example - -@noindent -guarantees that, when the parser prints any symbol that has a semantic type -tag other than @code{<character>}, it displays the address of the semantic -value by default. However, when the parser displays a @code{STRING1} or a -@code{string1}, it formats it as a string in double quotes. It performs -only the second @code{%printer} in this case, so it prints only once. -Finally, the parser prints @samp{<>} for any symbol, such as @code{TAGLESS}, -that has no semantic type tag. - - -@node Expect Decl -@subsection Suppressing Conflict Warnings -@cindex suppressing conflict warnings -@cindex preventing warnings about conflicts -@cindex warnings, preventing -@cindex conflicts, suppressing warnings of -@findex %expect -@findex %expect-rr - -Bison normally warns if there are any conflicts in the grammar -(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars -have harmless shift/reduce conflicts which are resolved in a predictable -way and would be difficult to eliminate. It is desirable to suppress -the warning about these conflicts unless the number of conflicts -changes. You can do this with the @code{%expect} declaration. - -The declaration looks like this: - -@example -%expect @var{n} -@end example - -Here @var{n} is a decimal integer. The declaration says there should -be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. -Bison reports an error if the number of shift/reduce conflicts differs -from @var{n}, or if there are any reduce/reduce conflicts.
- -For deterministic parsers, reduce/reduce conflicts are more -serious, and should be eliminated entirely. Bison will always report -reduce/reduce conflicts for these parsers. With GLR -parsers, however, both kinds of conflicts are routine; otherwise, -there would be no need to use GLR parsing. Therefore, it is -also possible to specify an expected number of reduce/reduce conflicts -in GLR parsers, using the declaration: - -@example -%expect-rr @var{n} -@end example - -In general, using @code{%expect} involves these steps: - -@itemize @bullet -@item -Compile your grammar without @code{%expect}. Use the @samp{-v} option -to get a verbose list of where the conflicts occur. Bison will also -print the number of conflicts. - -@item -Check each of the conflicts to make sure that Bison's default -resolution is what you really want. If not, rewrite the grammar and -go back to the beginning. - -@item -Add an @code{%expect} declaration, copying the number @var{n} from the -number which Bison printed. With GLR parsers, add an -@code{%expect-rr} declaration as well. -@end itemize - -Now Bison will report an error if you introduce an unexpected conflict, -but will keep silent otherwise. - -@node Start Decl -@subsection The Start-Symbol -@cindex declaring the start symbol -@cindex start symbol, declaring -@cindex default start symbol -@findex %start - -Bison assumes by default that the start symbol for the grammar is the first -nonterminal specified in the grammar specification section. The programmer -may override this restriction with the @code{%start} declaration as follows: - -@example -%start @var{symbol} -@end example - -@node Pure Decl -@subsection A Pure (Reentrant) Parser -@cindex reentrant parser -@cindex pure parser -@findex %define api.pure - -A @dfn{reentrant} program is one which does not alter in the course of -execution; in other words, it consists entirely of @dfn{pure} (read-only) -code. 
Reentrancy is important whenever asynchronous execution is possible; -for example, a nonreentrant program may not be safe to call from a signal -handler. In systems with multiple threads of control, a nonreentrant -program must be called only within interlocks. - -Normally, Bison generates a parser which is not reentrant. This is -suitable for most uses, and it permits compatibility with Yacc. (The -standard Yacc interfaces are inherently nonreentrant, because they use -statically allocated variables for communication with @code{yylex}, -including @code{yylval} and @code{yylloc}.) - -Alternatively, you can generate a pure, reentrant parser. The Bison -declaration @code{%define api.pure} says that you want the parser to be -reentrant. It looks like this: - -@example -%define api.pure -@end example - -The result is that the communication variables @code{yylval} and -@code{yylloc} become local variables in @code{yyparse}, and a different -calling convention is used for the lexical analyzer function -@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure -Parsers}, for the details of this. The variable @code{yynerrs} -becomes local in @code{yyparse} in pull mode but it becomes a member -of yypstate in push mode. (@pxref{Error Reporting, ,The Error -Reporting Function @code{yyerror}}). The convention for calling -@code{yyparse} itself is unchanged. - -Whether the parser is pure has nothing to do with the grammar rules. -You can generate either a pure parser or a nonreentrant parser from any -valid grammar. - -@node Push Decl -@subsection A Push Parser -@cindex push parser -@cindex push parser -@findex %define api.push-pull - -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) - -A pull parser is called once and it takes control until all its input -is completely parsed. A push parser, on the other hand, is called -each time a new token is made available. 
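The difference in control flow between the two styles can be sketched with a toy example. This is not Bison's generated code: `toy_state`, `toy_push`, `toy_pull`, `array_lex` and the `TOY_*` status codes are all hypothetical names introduced for illustration, and the "grammar" is simply a run of positive tokens terminated by token 0.

```c
#include <assert.h>

/* Hypothetical status codes for this sketch only.  */
enum { TOY_DONE = 0, TOY_ERROR = 1, TOY_MORE = 2 };

/* Hypothetical parser state: the running sum of the tokens seen.  */
typedef struct { long sum; } toy_state;

/* Push style: the caller owns the loop and hands over one token per
   call; the parser returns after consuming it.  */
int
toy_push (toy_state *ps, int token)
{
  if (token < 0)
    return TOY_ERROR;
  if (token == 0)               /* Token 0 stands for end of input.  */
    return TOY_DONE;
  ps->sum += token;
  return TOY_MORE;
}

/* Pull style: the parser owns the loop and calls the lexer itself
   until the input is completely consumed.  */
int
toy_pull (toy_state *ps, int (*lex) (void *), void *lex_data)
{
  int status;
  do
    status = toy_push (ps, lex (lex_data));
  while (status == TOY_MORE);
  return status;
}

/* Hypothetical lexer: returns successive values from a 0-terminated
   array; DATA points to a cursor into that array.  */
int
array_lex (void *data)
{
  const int **cursor = data;
  return *(*cursor)++;
}
```

In the push style the application decides when each token is delivered, which is what lets the parser live inside an event loop; in the pull style a single call does not return until the lexer reports the end of input.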
- -A push parser is typically useful when the parser is part of a -main event loop in the client's application. This is typically -a requirement of a GUI, when the main event loop needs to be triggered -within a certain time period. - -Normally, Bison generates a pull parser. -The following Bison declaration says that you want the parser to be a push -parser (@pxref{%define Summary,,api.push-pull}): - -@example -%define api.push-pull push -@end example - -In almost all cases, you want to ensure that your push parser is also -a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The only -time you should create an impure push parser is to have backwards -compatibility with the impure Yacc pull mode interface. Unless you know -what you are doing, your declarations should look like this: - -@example -%define api.pure -%define api.push-pull push -@end example - -There is a major notable functional difference between the pure push parser -and the impure push parser. It is acceptable for a pure push parser to have -many parser instances, of the same type of parser, in memory at the same time. -An impure push parser should only use one parser at a time. - -When a push parser is selected, Bison will generate some new symbols in -the generated parser. @code{yypstate} is a structure that the generated -parser uses to store the parser's state. @code{yypstate_new} is the -function that will create a new parser instance. @code{yypstate_delete} -will free the resources associated with the corresponding parser instance. -Finally, @code{yypush_parse} is the function that should be called whenever a -token is available to provide the parser. 
A trivial example -of using a pure push parser would look like this: - -@example -int status; -yypstate *ps = yypstate_new (); -do @{ - status = yypush_parse (ps, yylex (), NULL); -@} while (status == YYPUSH_MORE); -yypstate_delete (ps); -@end example - -If the user decided to use an impure push parser, a few things about -the generated parser will change. The @code{yychar} variable becomes -a global variable instead of a variable in the @code{yypush_parse} function. -For this reason, the signature of the @code{yypush_parse} function is -changed to remove the token as a parameter. A nonreentrant push parser -example would thus look like this: - -@example -extern int yychar; -int status; -yypstate *ps = yypstate_new (); -do @{ - yychar = yylex (); - status = yypush_parse (ps); -@} while (status == YYPUSH_MORE); -yypstate_delete (ps); -@end example - -That's it. Notice the next token is put into the global variable @code{yychar} -for use by the next invocation of the @code{yypush_parse} function. - -Bison also supports both the push parser interface along with the pull parser -interface in the same generated parser. In order to get this functionality, -you should replace the @code{%define api.push-pull push} declaration with the -@code{%define api.push-pull both} declaration. Doing this will create all of -the symbols mentioned earlier along with the two extra symbols, @code{yyparse} -and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally -would be used. However, the user should note that it is implemented in the -generated parser by calling @code{yypull_parse}. -This makes the @code{yyparse} function that is generated with the -@code{%define api.push-pull both} declaration slower than the normal -@code{yyparse} function. If the user -calls the @code{yypull_parse} function it will parse the rest of the input -stream. It is possible to @code{yypush_parse} tokens to select a subgrammar -and then @code{yypull_parse} the rest of the input stream. 
If you would like -to switch back and forth between parsing styles, you would have to -write your own @code{yypull_parse} function that knows when to quit looking -for input. An example of using the @code{yypull_parse} function would look -like this: - -@example -yypstate *ps = yypstate_new (); -yypull_parse (ps); /* Will call the lexer */ -yypstate_delete (ps); -@end example - -Adding the @code{%define api.pure} declaration does exactly the same thing to -the generated parser with @code{%define api.push-pull both} as it did for -@code{%define api.push-pull push}. - -@node Decl Summary -@subsection Bison Declaration Summary -@cindex Bison declaration summary -@cindex declaration summary -@cindex summary, Bison declaration - -Here is a summary of the declarations used to define a grammar: - -@deffn {Directive} %union -Declare the collection of data types that semantic values may have -(@pxref{Union Decl, ,The Collection of Value Types}). -@end deffn - -@deffn {Directive} %token -Declare a terminal symbol (token type name) with no precedence -or associativity specified (@pxref{Token Decl, ,Token Type Names}). -@end deffn - -@deffn {Directive} %right -Declare a terminal symbol (token type name) that is right-associative -(@pxref{Precedence Decl, ,Operator Precedence}). -@end deffn - -@deffn {Directive} %left -Declare a terminal symbol (token type name) that is left-associative -(@pxref{Precedence Decl, ,Operator Precedence}). -@end deffn - -@deffn {Directive} %nonassoc -Declare a terminal symbol (token type name) that is nonassociative -(@pxref{Precedence Decl, ,Operator Precedence}). -Using it in a way that would be associative is a syntax error. -@end deffn - -@ifset defaultprec -@deffn {Directive} %default-prec -Assign a precedence to rules lacking an explicit @code{%prec} modifier -(@pxref{Contextual Precedence, ,Context-Dependent Precedence}).
-@end deffn -@end ifset - -@deffn {Directive} %type -Declare the type of semantic values for a nonterminal symbol -(@pxref{Type Decl, ,Nonterminal Symbols}). -@end deffn - -@deffn {Directive} %start -Specify the grammar's start symbol (@pxref{Start Decl, ,The -Start-Symbol}). -@end deffn - -@deffn {Directive} %expect -Declare the expected number of shift-reduce conflicts -(@pxref{Expect Decl, ,Suppressing Conflict Warnings}). -@end deffn - - -@sp 1 -@noindent -In order to change the behavior of @command{bison}, use the following -directives: - -@deffn {Directive} %code @{@var{code}@} -@deffnx {Directive} %code @var{qualifier} @{@var{code}@} -@findex %code -Insert @var{code} verbatim into the output parser source at the -default location or at the location specified by @var{qualifier}. -@xref{%code Summary}. -@end deffn - -@deffn {Directive} %debug -In the parser implementation file, define the macro @code{YYDEBUG} to -1 if it is not already defined, so that the debugging facilities are -compiled. @xref{Tracing, ,Tracing Your Parser}. -@end deffn - -@deffn {Directive} %define @var{variable} -@deffnx {Directive} %define @var{variable} @var{value} -@deffnx {Directive} %define @var{variable} "@var{value}" -Define a variable to adjust Bison's behavior. @xref{%define Summary}. -@end deffn - -@deffn {Directive} %defines -Write a parser header file containing macro definitions for the token -type names defined in the grammar as well as a few other declarations. -If the parser implementation file is named @file{@var{name}.c} then -the parser header file is named @file{@var{name}.h}. - -For C parsers, the parser header file declares @code{YYSTYPE} unless -@code{YYSTYPE} is already defined as a macro or you have used a -@code{<@var{type}>} tag without using @code{%union}. 
Therefore, if -you are using a @code{%union} (@pxref{Multiple Types, ,More Than One -Value Type}) with components that require other definitions, or if you -have defined a @code{YYSTYPE} macro or type definition (@pxref{Value -Type, ,Data Types of Semantic Values}), you need to arrange for these -definitions to be propagated to all modules, e.g., by putting them in -a prerequisite header that is included both by your parser and by any -other module that needs @code{YYSTYPE}. - -Unless your parser is pure, the parser header file declares -@code{yylval} as an external variable. @xref{Pure Decl, ,A Pure -(Reentrant) Parser}. - -If you have also used locations, the parser header file declares -@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the -@code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}. - -This parser header file is normally essential if you wish to put the -definition of @code{yylex} in a separate source file, because -@code{yylex} typically needs to be able to refer to the -above-mentioned declarations and to the token type codes. @xref{Token -Values, ,Semantic Values of Tokens}. - -@findex %code requires -@findex %code provides -If you have declared @code{%code requires} or @code{%code provides}, the output -header also contains their code. -@xref{%code Summary}. -@end deffn - -@deffn {Directive} %defines @var{defines-file} -Same as above, but save in the file @var{defines-file}. -@end deffn - -@deffn {Directive} %destructor -Specify how the parser should reclaim the memory associated to -discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}. -@end deffn - -@deffn {Directive} %file-prefix "@var{prefix}" -Specify a prefix to use for all Bison output file names. The names -are chosen as if the grammar file were named @file{@var{prefix}.y}. -@end deffn - -@deffn {Directive} %language "@var{language}" -Specify the programming language for the generated parser. 
Currently -supported languages include C, C++, and Java. -@var{language} is case-insensitive. - -This directive is experimental and its effect may be modified in future -releases. -@end deffn - -@deffn {Directive} %locations -Generate the code processing the locations (@pxref{Action Features, -,Special Features for Use in Actions}). This mode is enabled as soon as -the grammar uses the special @samp{@@@var{n}} tokens, but if your -grammar does not use it, using @samp{%locations} allows for more -accurate syntax error messages. -@end deffn - -@deffn {Directive} %name-prefix "@var{prefix}" -Rename the external symbols used in the parser so that they start with -@var{prefix} instead of @samp{yy}. The precise list of symbols renamed -in C parsers -is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, -@code{yylval}, @code{yychar}, @code{yydebug}, and -(if locations are used) @code{yylloc}. If you use a push parser, -@code{yypush_parse}, @code{yypull_parse}, @code{yypstate}, -@code{yypstate_new} and @code{yypstate_delete} will -also be renamed. For example, if you use @samp{%name-prefix "c_"}, the -names become @code{c_parse}, @code{c_lex}, and so on. -For C++ parsers, see the @code{%define namespace} documentation in this -section. -@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. -@end deffn - -@ifset defaultprec -@deffn {Directive} %no-default-prec -Do not assign a precedence to rules lacking an explicit @code{%prec} -modifier (@pxref{Contextual Precedence, ,Context-Dependent -Precedence}). -@end deffn -@end ifset - -@deffn {Directive} %no-lines -Don't generate any @code{#line} preprocessor commands in the parser -implementation file. Ordinarily Bison writes these commands in the -parser implementation file so that the C compiler and debuggers will -associate errors and object code with your source file (the grammar -file). 
This directive causes them to associate errors with the parser -implementation file, treating it as an independent source file in its -own right. -@end deffn - -@deffn {Directive} %output "@var{file}" -Specify @var{file} for the parser implementation file. -@end deffn - -@deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{%define -Summary,,api.pure}), for which Bison is more careful to warn about -unreasonable usage. -@end deffn - -@deffn {Directive} %require "@var{version}" -Require version @var{version} or higher of Bison. @xref{Require Decl, , -Require a Version of Bison}. -@end deffn - -@deffn {Directive} %skeleton "@var{file}" -Specify the skeleton to use. - -@c You probably don't need this option unless you are developing Bison. -@c You should use @code{%language} if you want to specify the skeleton for a -@c different language, because it is clearer and because it will always choose the -@c correct skeleton for non-deterministic or push parsers. - -If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton -file in the Bison installation directory. -If it does, @var{file} is an absolute file name or a file name relative to the -directory of the grammar file. -This is similar to how most shells resolve commands. -@end deffn - -@deffn {Directive} %token-table -Generate an array of token names in the parser implementation file. -The name of the array is @code{yytname}; @code{yytname[@var{i}]} is -the name of the token whose internal Bison token code number is -@var{i}. The first three elements of @code{yytname} correspond to the -predefined tokens @code{"$end"}, @code{"error"}, and -@code{"$undefined"}; after these come the symbols defined in the -grammar file. - -The name in the table includes all the characters needed to represent -the token in Bison. For single-character literals and literal -strings, this includes the surrounding quoting characters and any -escape sequences. 
For example, the Bison single-character literal -@code{'+'} corresponds to a three-character name, represented in C as -@code{"'+'"}; and the Bison two-character literal string @code{"\\/"} -corresponds to a five-character name, represented in C as -@code{"\"\\\\/\""}. - -When you specify @code{%token-table}, Bison also generates macro -definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, -@code{YYNRULES}, and @code{YYNSTATES}: - -@table @code -@item YYNTOKENS -The highest token number, plus one. -@item YYNNTS -The number of nonterminal symbols. -@item YYNRULES -The number of grammar rules. -@item YYNSTATES -The number of parser states (@pxref{Parser States}). -@end table -@end deffn - -@deffn {Directive} %verbose -Write an extra output file containing verbose descriptions of the -parser states and what is done for each type of lookahead token in -that state. @xref{Understanding, , Understanding Your Parser}, for more -information. -@end deffn - -@deffn {Directive} %yacc -Pretend the option @option{--yacc} was given, i.e., imitate Yacc, -including its naming conventions. @xref{Bison Options}, for more. -@end deffn - - -@node %define Summary -@subsection %define Summary - -There are many features of Bison's behavior that can be controlled by -assigning the feature a single value. For historical reasons, some -such features are assigned values by dedicated directives, such as -@code{%start}, which assigns the start symbol. However, newer such -features are associated with variables, which are assigned by the -@code{%define} directive: - -@deffn {Directive} %define @var{variable} -@deffnx {Directive} %define @var{variable} @var{value} -@deffnx {Directive} %define @var{variable} "@var{value}" -Define @var{variable} to @var{value}. - -@var{value} must be placed in quotation marks if it contains any -character other than a letter, underscore, period, or non-initial dash -or digit.
Omitting @code{"@var{value}"} entirely is always equivalent -to specifying @code{""}. - -It is an error if a @var{variable} is defined by @code{%define} -multiple times, but see @ref{Bison Options,,-D -@var{name}[=@var{value}]}. -@end deffn - -The rest of this section summarizes variables and values that -@code{%define} accepts. - -Some @var{variable}s take Boolean values. In this case, Bison will -complain if the variable definition does not meet one of the following -four conditions: - -@enumerate -@item @code{@var{value}} is @code{true} - -@item @code{@var{value}} is omitted (or @code{""} is specified). -This is equivalent to @code{true}. - -@item @code{@var{value}} is @code{false}. - -@item @var{variable} is never defined. -In this case, Bison selects a default value. -@end enumerate - -What @var{variable}s are accepted, as well as their meanings and default -values, depend on the selected target language and/or the parser -skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl -Summary,,%skeleton}). -Unaccepted @var{variable}s produce an error. -Some of the accepted @var{variable}s are: - -@itemize @bullet -@c ================================================== api.pure -@item api.pure -@findex %define api.pure - -@itemize @bullet -@item Language(s): C - -@item Purpose: Request a pure (reentrant) parser program. -@xref{Pure Decl, ,A Pure (Reentrant) Parser}. - -@item Accepted Values: Boolean - -@item Default Value: @code{false} -@end itemize - -@item api.push-pull -@findex %define api.push-pull - -@itemize @bullet -@item Language(s): C (deterministic parsers only) - -@item Purpose: Request a pull parser, a push parser, or both. -@xref{Push Decl, ,A Push Parser}. -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) 
- -@item Accepted Values: @code{pull}, @code{push}, @code{both} - -@item Default Value: @code{pull} -@end itemize - -@c ================================================== lr.default-reductions - -@item lr.default-reductions -@findex %define lr.default-reductions - -@itemize @bullet -@item Language(s): all - -@item Purpose: Specify the kind of states that are permitted to -contain default reductions. @xref{Default Reductions}. (The ability to -specify where default reductions should be used is experimental. More user -feedback will help to stabilize it.) - -@item Accepted Values: @code{most}, @code{consistent}, @code{accepting} -@item Default Value: -@itemize -@item @code{accepting} if @code{lr.type} is @code{canonical-lr}. -@item @code{most} otherwise. -@end itemize -@end itemize - -@c ============================================ lr.keep-unreachable-states - -@item lr.keep-unreachable-states -@findex %define lr.keep-unreachable-states - -@itemize @bullet -@item Language(s): all -@item Purpose: Request that Bison allow unreachable parser states to -remain in the parser tables. @xref{Unreachable States}. -@item Accepted Values: Boolean -@item Default Value: @code{false} -@end itemize - -@c ================================================== lr.type - -@item lr.type -@findex %define lr.type - -@itemize @bullet -@item Language(s): all - -@item Purpose: Specify the type of parser tables within the -LR(1) family. @xref{LR Table Construction}. (This feature is experimental. -More user feedback will help to stabilize it.) - -@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr} - -@item Default Value: @code{lalr} -@end itemize - -@item namespace -@findex %define namespace - -@itemize -@item Language(s): C++ - -@item Purpose: Specify the namespace for the parser class.
-For example, if you specify: - -@smallexample -%define namespace "foo::bar" -@end smallexample - -Bison uses @code{foo::bar} verbatim in references such as: - -@smallexample -foo::bar::parser::semantic_type -@end smallexample - -However, to open a namespace, Bison removes any leading @code{::} and then -splits on any remaining occurrences: - -@smallexample -namespace foo @{ namespace bar @{ - class position; - class location; -@} @} -@end smallexample - -@item Accepted Values: Any absolute or relative C++ namespace reference without -a trailing @code{"::"}. -For example, @code{"foo"} or @code{"::foo::bar"}. - -@item Default Value: The value specified by @code{%name-prefix}, which defaults -to @code{yy}. -This usage of @code{%name-prefix} is for backward compatibility and can be -confusing since @code{%name-prefix} also specifies the textual prefix for the -lexical analyzer function. -Thus, if you specify @code{%name-prefix}, it is best to also specify -@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the -lexical analyzer function. -For example, if you specify: - -@smallexample -%define namespace "foo" -%name-prefix "bar::" -@end smallexample - -The parser namespace is @code{foo} and @code{yylex} is referenced as -@code{bar::lex}. -@end itemize - -@c ================================================== parse.lac -@item parse.lac -@findex %define parse.lac - -@itemize -@item Language(s): C (deterministic parsers only) - -@item Purpose: Enable LAC (lookahead correction) to improve -syntax error handling. @xref{LAC}. -@item Accepted Values: @code{none}, @code{full} -@item Default Value: @code{none} -@end itemize -@end itemize - - -@node %code Summary -@subsection %code Summary -@findex %code -@cindex Prologue - -The @code{%code} directive inserts code verbatim into the output -parser source at any of a predefined set of locations.
It thus serves -as a flexible and user-friendly alternative to the traditional Yacc -prologue, @code{%@{@var{code}%@}}. This section summarizes the -functionality of @code{%code} for the various target languages -supported by Bison. For a detailed discussion of how to use -@code{%code} in place of @code{%@{@var{code}%@}} for C/C++ and why it -is advantageous to do so, @pxref{Prologue Alternatives}. - -@deffn {Directive} %code @{@var{code}@} -This is the unqualified form of the @code{%code} directive. It -inserts @var{code} verbatim at a language-dependent default location -in the parser implementation. - -For C/C++, the default location is the parser implementation file -after the usual contents of the parser header file. Thus, the -unqualified form replaces @code{%@{@var{code}%@}} for most purposes. - -For Java, the default location is inside the parser class. -@end deffn - -@deffn {Directive} %code @var{qualifier} @{@var{code}@} -This is the qualified form of the @code{%code} directive. -@var{qualifier} identifies the purpose of @var{code} and thus the -location(s) where Bison should insert it. That is, if you need to -specify location-sensitive @var{code} that does not belong at the -default location selected by the unqualified @code{%code} form, use -this form instead. -@end deffn - -For any particular qualifier or for the unqualified form, if there are -multiple occurrences of the @code{%code} directive, Bison concatenates -the specified code in the order in which it appears in the grammar -file. - -Not all qualifiers are accepted for all target languages. Unaccepted -qualifiers produce an error. Some of the accepted qualifiers are: - -@itemize @bullet -@item requires -@findex %code requires - -@itemize @bullet -@item Language(s): C, C++ - -@item Purpose: This is the best place to write dependency code required for -@code{YYSTYPE} and @code{YYLTYPE}. 
-In other words, it's the best place to define types referenced in @code{%union} -directives, and it's the best place to override Bison's default @code{YYSTYPE} -and @code{YYLTYPE} definitions. - -@item Location(s): The parser header file and the parser implementation file -before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE} -definitions. -@end itemize - -@item provides -@findex %code provides - -@itemize @bullet -@item Language(s): C, C++ - -@item Purpose: This is the best place to write additional definitions and -declarations that should be provided to other modules. - -@item Location(s): The parser header file and the parser implementation -file after the Bison-generated @code{YYSTYPE}, @code{YYLTYPE}, and -token definitions. -@end itemize - -@item top -@findex %code top - -@itemize @bullet -@item Language(s): C, C++ - -@item Purpose: The unqualified @code{%code} or @code{%code requires} -should usually be more appropriate than @code{%code top}. However, -occasionally it is necessary to insert code much nearer the top of the -parser implementation file. For example: - -@example -%code top @{ - #define _GNU_SOURCE - #include -@} -@end example - -@item Location(s): Near the top of the parser implementation file. -@end itemize - -@item imports -@findex %code imports - -@itemize @bullet -@item Language(s): Java - -@item Purpose: This is the best place to write Java import directives. - -@item Location(s): The parser Java file after any Java package directive and -before any class definitions. -@end itemize -@end itemize - -Though we say the insertion locations are language-dependent, they are -technically skeleton-dependent. Writers of non-standard skeletons -however should choose their locations consistently with the behavior -of the standard Bison skeletons. - - -@node Multiple Parsers -@section Multiple Parsers in the Same Program - -Most programs that use Bison parse only one language and therefore contain -only one Bison parser. 
But what if you want to parse more than one -language with the same program? Then you need to avoid a name conflict -between different definitions of @code{yyparse}, @code{yylval}, and so on. - -The easy way to do this is to use the option @samp{-p @var{prefix}} -(@pxref{Invocation, ,Invoking Bison}). This renames the interface -functions and variables of the Bison parser to start with @var{prefix} -instead of @samp{yy}. You can use this to give each parser distinct -names that do not conflict. - -The precise list of symbols renamed is @code{yyparse}, @code{yylex}, -@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yylloc}, -@code{yychar} and @code{yydebug}. If you use a push parser, -@code{yypush_parse}, @code{yypull_parse}, @code{yypstate}, -@code{yypstate_new} and @code{yypstate_delete} will also be renamed. -For example, if you use @samp{-p c}, the names become @code{cparse}, -@code{clex}, and so on. - -@strong{All the other variables and macros associated with Bison are not -renamed.} These others are not global; there is no conflict if the same -name is used in different parsers. For example, @code{YYSTYPE} is not -renamed, but defining this in different ways in different parsers causes -no trouble (@pxref{Value Type, ,Data Types of Semantic Values}). - -The @samp{-p} option works by adding macro definitions to the -beginning of the parser implementation file, defining @code{yyparse} -as @code{@var{prefix}parse}, and so on. This effectively substitutes -one name for the other in the entire parser implementation file. - -@node Interface -@chapter Parser C-Language Interface -@cindex C-language interface -@cindex interface - -The Bison parser is actually a C function named @code{yyparse}. Here we -describe the interface conventions of @code{yyparse} and the other -functions that it needs to use. - -Keep in mind that the parser uses many C identifiers starting with -@samp{yy} and @samp{YY} for internal purposes. 
If you use such an -identifier (aside from those in this manual) in an action or in epilogue -in the grammar file, you are likely to run into trouble. - -@menu -* Parser Function:: How to call @code{yyparse} and what it returns. -* Push Parser Function:: How to call @code{yypush_parse} and what it returns. -* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. -* Parser Create Function:: How to call @code{yypstate_new} and what it returns. -* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. -* Lexical:: You must supply a function @code{yylex} - which reads tokens. -* Error Reporting:: You must supply a function @code{yyerror}. -* Action Features:: Special features for use in actions. -* Internationalization:: How to let the parser speak in the user's - native language. -@end menu - -@node Parser Function -@section The Parser Function @code{yyparse} -@findex yyparse - -You call the function @code{yyparse} to cause parsing to occur. This -function reads tokens, executes actions, and ultimately returns when it -encounters end-of-input or an unrecoverable syntax error. You can also -write an action which directs @code{yyparse} to return immediately -without reading further. - - -@deftypefun int yyparse (void) -The value returned by @code{yyparse} is 0 if parsing was successful (return -is due to end-of-input). - -The value is 1 if parsing failed because of invalid input, i.e., input -that contains a syntax error or that causes @code{YYABORT} to be -invoked. - -The value is 2 if parsing failed due to memory exhaustion. -@end deftypefun - -In an action, you can cause immediate return from @code{yyparse} by using -these macros: - -@defmac YYACCEPT -@findex YYACCEPT -Return immediately with value 0 (to report success). -@end defmac - -@defmac YYABORT -@findex YYABORT -Return immediately with value 1 (to report failure). 
-@end defmac - -If you use a reentrant parser, you can optionally pass additional -parameter information to it in a reentrant way. To do so, use the -declaration @code{%parse-param}: - -@deffn {Directive} %parse-param @{@var{argument-declaration}@} -@findex %parse-param -Declare that an argument declared by the braced-code -@var{argument-declaration} is an additional @code{yyparse} argument. -The @var{argument-declaration} is used when declaring -functions or prototypes. The last identifier in -@var{argument-declaration} must be the argument name. -@end deffn - -Here's an example. Write this in the parser: - -@example -%parse-param @{int *nastiness@} -%parse-param @{int *randomness@} -@end example - -@noindent -Then call the parser like this: - -@example -@{ - int nastiness, randomness; - @dots{} /* @r{Store proper data in @code{nastiness} and @code{randomness}.} */ - value = yyparse (&nastiness, &randomness); - @dots{} -@} -@end example - -@noindent -In the grammar actions, use expressions like this to refer to the data: - -@example -exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} -@end example - -@node Push Parser Function -@section The Push Parser Function @code{yypush_parse} -@findex yypush_parse - -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) - -You call the function @code{yypush_parse} to parse a single token. This -function is available if either the @code{%define api.push-pull push} or -@code{%define api.push-pull both} declaration is used. -@xref{Push Decl, ,A Push Parser}. - -@deftypefun int yypush_parse (yypstate *yyps) -The value returned by @code{yypush_parse} is the same as for @code{yyparse}, with the -following exception: @code{yypush_parse} returns @code{YYPUSH_MORE} if more input -is required to finish parsing the grammar.
-@end deftypefun - -@node Pull Parser Function -@section The Pull Parser Function @code{yypull_parse} -@findex yypull_parse - -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) - -You call the function @code{yypull_parse} to parse the rest of the input -stream. This function is available if the @code{%define api.push-pull both} -declaration is used. -@xref{Push Decl, ,A Push Parser}. - -@deftypefun int yypull_parse (yypstate *yyps) -The value returned by @code{yypull_parse} is the same as for @code{yyparse}. -@end deftypefun - -@node Parser Create Function -@section The Parser Create Function @code{yypstate_new} -@findex yypstate_new - -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) - -You call the function @code{yypstate_new} to create a new parser instance. -This function is available if either the @code{%define api.push-pull push} or -@code{%define api.push-pull both} declaration is used. -@xref{Push Decl, ,A Push Parser}. - -@deftypefun {yypstate*} yypstate_new (void) -The function will return a valid parser instance if there was memory available -or 0 if no memory was available. -In impure mode, it will also return 0 if a parser instance is currently -allocated. -@end deftypefun - -@node Parser Delete Function -@section The Parser Delete Function @code{yypstate_delete} -@findex yypstate_delete - -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) - -You call the function @code{yypstate_delete} to delete a parser instance. -This function is available if either the @code{%define api.push-pull push} or -@code{%define api.push-pull both} declaration is used. -@xref{Push Decl, ,A Push Parser}. - -@deftypefun void yypstate_delete (yypstate *yyps) -This function will reclaim the memory associated with a parser instance.
-After this call, you should no longer attempt to use the parser instance. -@end deftypefun - -@node Lexical -@section The Lexical Analyzer Function @code{yylex} -@findex yylex -@cindex lexical analyzer - -The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from -the input stream and returns them to the parser. Bison does not create -this function automatically; you must write it so that @code{yyparse} can -call it. The function is sometimes referred to as a lexical scanner. - -In simple programs, @code{yylex} is often defined at the end of the -Bison grammar file. If @code{yylex} is defined in a separate source -file, you need to arrange for the token-type macro definitions to be -available there. To do this, use the @samp{-d} option when you run -Bison, so that it will write these macro definitions into the separate -parser header file, @file{@var{name}.tab.h}, which you can include in -the other source files that need it. @xref{Invocation, ,Invoking -Bison}. - -@menu -* Calling Convention:: How @code{yyparse} calls @code{yylex}. -* Token Values:: How @code{yylex} must return the semantic value - of the token it has read. -* Token Locations:: How @code{yylex} must return the text location - (line number, etc.) of the token, if the - actions want that. -* Pure Calling:: How the calling convention differs in a pure parser - (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). -@end menu - -@node Calling Convention -@subsection Calling Convention for @code{yylex} - -The value that @code{yylex} returns must be the positive numeric code -for the type of token it has just found; a zero or negative value -signifies end-of-input. - -When a token is referred to in the grammar rules by a name, that name -in the parser implementation file becomes a C macro whose definition -is the proper numeric code for that token type. So @code{yylex} can -use the name to indicate that type. @xref{Symbols}. 
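As a rough, self-contained sketch of this convention (not taken from the Bison sources), the C fragment below scans a string instead of a real input stream. The names @code{sketch_yylex}, @code{input_cursor} and @code{token_value} are hypothetical, and @code{INT} is defined by hand with an assumed value only so that the fragment compiles on its own; in a real parser that macro comes from the Bison-generated files.

```c
#include <ctype.h>

/* Assumed token code, for illustration only.  In a real build this
   macro is supplied by the Bison-generated parser files.  */
#define INT 258

/* Hypothetical input cursor and semantic value holder.  */
static const char *input_cursor = "";
static int token_value;

/* Sketch of a yylex-style scanner: returns a positive code for each
   token found and 0 once the input is exhausted.  */
int
sketch_yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input_cursor == ' ' || *input_cursor == '\t')
    input_cursor++;

  /* The end of the string plays the role of end-of-input.  */
  if (*input_cursor == '\0')
    return 0;

  /* A run of digits is reported through the token name INT,
     never through its numeric value.  */
  if (isdigit ((unsigned char) *input_cursor))
    {
      token_value = 0;
      while (isdigit ((unsigned char) *input_cursor))
        token_value = token_value * 10 + (*input_cursor++ - '0');
      return INT;
    }

  /* Any other character is returned as its own (positive) code.  */
  return (unsigned char) *input_cursor++;
}
```

With @code{input_cursor} pointing at the string @samp{12 +}, successive calls to @code{sketch_yylex} return @code{INT} (leaving 12 in @code{token_value}), then the character code of @samp{+}, then 0 on every later call.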
- -When a token is referred to in the grammar rules by a character literal, -the numeric code for that character is also the code for the token type. -So @code{yylex} can simply return that character code, possibly converted -to @code{unsigned char} to avoid sign-extension. The null character -must not be used this way, because its code is zero and that -signifies end-of-input. - -Here is an example showing these things: - -@example -int -yylex (void) -@{ - @dots{} - if (c == EOF) /* Detect end-of-input. */ - return 0; - @dots{} - if (c == '+' || c == '-') - return c; /* Assume token type for `+' is '+'. */ - @dots{} - return INT; /* Return the type of the token. */ - @dots{} -@} -@end example - -@noindent -This interface has been designed so that the output from the @code{lex} -utility can be used without change as the definition of @code{yylex}. - -If the grammar uses literal string tokens, there are two ways that -@code{yylex} can determine the token type codes for them: - -@itemize @bullet -@item -If the grammar defines symbolic token names as aliases for the -literal string tokens, @code{yylex} can use these symbolic names like -all others. In this case, the use of the literal string tokens in -the grammar file has no effect on @code{yylex}. - -@item -@code{yylex} can find the multicharacter token in the @code{yytname} -table. The index of the token in the table is the token type's code. -The name of a multicharacter token is recorded in @code{yytname} with a -double-quote, the token's characters, and another double-quote. The -token's characters are escaped as necessary to be suitable as input -to Bison. - -Here's code for looking up a multicharacter token in @code{yytname}, -assuming that the characters of the token are stored in -@code{token_buffer}, and assuming that the token does not contain any -characters like @samp{"} that require escaping. - -@example -for (i = 0; i < YYNTOKENS; i++) - @{ - if (yytname[i] != 0 - && yytname[i][0] == '"' - && ! 
strncmp (yytname[i] + 1, token_buffer, - strlen (token_buffer)) - && yytname[i][strlen (token_buffer) + 1] == '"' - && yytname[i][strlen (token_buffer) + 2] == 0) - break; - @} -@end example - -The @code{yytname} table is generated only if you use the -@code{%token-table} declaration. @xref{Decl Summary}. -@end itemize - -@node Token Values -@subsection Semantic Values of Tokens - -@vindex yylval -In an ordinary (nonreentrant) parser, the semantic value of the token must -be stored into the global variable @code{yylval}. When you are using -just one data type for semantic values, @code{yylval} has that type. -Thus, if the type is @code{int} (the default), you might write this in -@code{yylex}: - -@example -@group - @dots{} - yylval = value; /* Put value onto Bison stack. */ - return INT; /* Return the type of the token. */ - @dots{} -@end group -@end example - -When you are using multiple data types, @code{yylval}'s type is a union -made from the @code{%union} declaration (@pxref{Union Decl, ,The -Collection of Value Types}). So when you store a token's value, you -must use the proper member of the union. If the @code{%union} -declaration looks like this: - -@example -@group -%union @{ - int intval; - double val; - symrec *tptr; -@} -@end group -@end example - -@noindent -then the code in @code{yylex} might look like this: - -@example -@group - @dots{} - yylval.intval = value; /* Put value onto Bison stack. */ - return INT; /* Return the type of the token. */ - @dots{} -@end group -@end example - -@node Token Locations -@subsection Textual Locations of Tokens - -@vindex yylloc -If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations}) -in actions to keep track of the textual locations of tokens and groupings, -then you must provide this information in @code{yylex}. The function -@code{yyparse} expects to find the textual location of a token just parsed -in the global variable @code{yylloc}. 
So @code{yylex} must store the proper -data in that variable. - -By default, the value of @code{yylloc} is a structure and you need only -initialize the members that are going to be used by the actions. The -four members are called @code{first_line}, @code{first_column}, -@code{last_line} and @code{last_column}. Note that the use of this -feature makes the parser noticeably slower. - -@tindex YYLTYPE -The data type of @code{yylloc} has the name @code{YYLTYPE}. - -@node Pure Calling -@subsection Calling Conventions for Pure Parsers - -When you use the Bison declaration @code{%define api.pure} to request a -pure, reentrant parser, the global communication variables @code{yylval} -and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) -Parser}.) In such parsers the two global variables are replaced by -pointers passed as arguments to @code{yylex}. You must declare them as -shown here, and pass the information back by storing it through those -pointers. - -@example -int -yylex (YYSTYPE *lvalp, YYLTYPE *llocp) -@{ - @dots{} - *lvalp = value; /* Put value onto Bison stack. */ - return INT; /* Return the type of the token. */ - @dots{} -@} -@end example - -If the grammar file does not use the @samp{@@} constructs to refer to -textual locations, then the type @code{YYLTYPE} will not be defined. In -this case, omit the second argument; @code{yylex} will be called with -only one argument. - - -If you wish to pass the additional parameter data to @code{yylex}, use -@code{%lex-param} just like @code{%parse-param} (@pxref{Parser -Function}). - -@deffn {Directive} lex-param @{@var{argument-declaration}@} -@findex %lex-param -Declare that the braced-code @var{argument-declaration} is an -additional @code{yylex} argument declaration. 
-@end deffn - -For instance: - -@example -%parse-param @{int *nastiness@} -%lex-param @{int *nastiness@} -%parse-param @{int *randomness@} -@end example - -@noindent -results in the following signatures: - -@example -int yylex (int *nastiness); -int yyparse (int *nastiness, int *randomness); -@end example - -If @code{%define api.pure} is added: - -@example -int yylex (YYSTYPE *lvalp, int *nastiness); -int yyparse (int *nastiness, int *randomness); -@end example - -@noindent -and finally, if both @code{%define api.pure} and @code{%locations} are used: - -@example -int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); -int yyparse (int *nastiness, int *randomness); -@end example - -@node Error Reporting -@section The Error Reporting Function @code{yyerror} -@cindex error reporting function -@findex yyerror -@cindex parse error -@cindex syntax error - -The Bison parser detects a @dfn{syntax error} or @dfn{parse error} -whenever it reads a token which cannot satisfy any syntax rule. An -action in the grammar can also explicitly proclaim an error, using the -macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use -in Actions}). - -The Bison parser expects to report the error by calling an error -reporting function named @code{yyerror}, which you must supply. It is -called by @code{yyparse} whenever a syntax error is found, and it -receives one argument. For a syntax error, the string is normally -@w{@code{"syntax error"}}. - -@findex %error-verbose -If you invoke the directive @code{%error-verbose} in the Bison declarations -section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then -Bison provides a more verbose and specific error message string instead of -just plain @w{@code{"syntax error"}}. However, that message sometimes -contains incorrect information if LAC is not enabled (@pxref{LAC}). - -The parser can detect one other kind of error: memory exhaustion. 
This -can happen when the input contains constructions that are very deeply -nested. It isn't likely you will encounter this, since the Bison -parser normally extends its stack automatically up to a very large limit. But -if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual -fashion, except that the argument string is @w{@code{"memory exhausted"}}. - -In some cases diagnostics like @w{@code{"syntax error"}} are -translated automatically from English to some other language before -they are passed to @code{yyerror}. @xref{Internationalization}. - -The following definition suffices in simple programs: - -@example -@group -void -yyerror (char const *s) -@{ -@end group -@group - fprintf (stderr, "%s\n", s); -@} -@end group -@end example - -After @code{yyerror} returns to @code{yyparse}, the latter will attempt -error recovery if you have written suitable error recovery grammar rules -(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will -immediately return 1. - -Obviously, in location tracking pure parsers, @code{yyerror} should have -an access to the current location. -This is indeed the case for the GLR -parsers, but not for the Yacc parser, for historical reasons. I.e., if -@samp{%locations %define api.pure} is passed then the prototypes for -@code{yyerror} are: - -@example -void yyerror (char const *msg); /* Yacc parsers. */ -void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ -@end example - -If @samp{%parse-param @{int *nastiness@}} is used, then: - -@example -void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */ -void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ -@end example - -Finally, GLR and Yacc parsers share the same @code{yyerror} calling -convention for absolutely pure parsers, i.e., when the calling -convention of @code{yylex} @emph{and} the calling convention of -@code{%define api.pure} are pure. -I.e.: - -@example -/* Location tracking. */ -%locations -/* Pure yylex. 
*/ -%define api.pure -%lex-param @{int *nastiness@} -/* Pure yyparse. */ -%parse-param @{int *nastiness@} -%parse-param @{int *randomness@} -@end example - -@noindent -results in the following signatures for all the parser kinds: - -@example -int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); -int yyparse (int *nastiness, int *randomness); -void yyerror (YYLTYPE *locp, - int *nastiness, int *randomness, - char const *msg); -@end example - -@noindent -The prototypes are only indications of how the code produced by Bison -uses @code{yyerror}. Bison-generated code always ignores the returned -value, so @code{yyerror} can return any type, including @code{void}. -Also, @code{yyerror} can be a variadic function; that is why the -message is always passed last. - -Traditionally @code{yyerror} returns an @code{int} that is always -ignored, but this is purely for historical reasons, and @code{void} is -preferable since it more accurately describes the return type for -@code{yyerror}. - -@vindex yynerrs -The variable @code{yynerrs} contains the number of syntax errors -reported so far. Normally this variable is global; but if you -request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) -then it is a local variable which only the actions can access. - -@node Action Features -@section Special Features for Use in Actions -@cindex summary, action features -@cindex action features summary - -Here is a table of Bison constructs, variables and macros that -are useful in actions. - -@deffn {Variable} $$ -Acts like a variable that contains the semantic value for the -grouping made by the current rule. @xref{Actions}. -@end deffn - -@deffn {Variable} $@var{n} -Acts like a variable that contains the semantic value for the -@var{n}th component of the current rule. @xref{Actions}. -@end deffn - -@deffn {Variable} $<@var{typealt}>$ -Like @code{$$} but specifies alternative @var{typealt} in the union -specified by the @code{%union} declaration. 
@xref{Action Types, ,Data -Types of Values in Actions}. -@end deffn - -@deffn {Variable} $<@var{typealt}>@var{n} -Like @code{$@var{n}} but specifies alternative @var{typealt} in the -union specified by the @code{%union} declaration. -@xref{Action Types, ,Data Types of Values in Actions}. -@end deffn - -@deffn {Macro} YYABORT @code{;} -Return immediately from @code{yyparse}, indicating failure. -@xref{Parser Function, ,The Parser Function @code{yyparse}}. -@end deffn - -@deffn {Macro} YYACCEPT @code{;} -Return immediately from @code{yyparse}, indicating success. -@xref{Parser Function, ,The Parser Function @code{yyparse}}. -@end deffn - -@deffn {Macro} YYBACKUP (@var{token}, @var{value})@code{;} -@findex YYBACKUP -Unshift a token. This macro is allowed only for rules that reduce -a single value, and only when there is no lookahead token. -It is also disallowed in GLR parsers. -It installs a lookahead token with token type @var{token} and -semantic value @var{value}; then it discards the value that was -going to be reduced by this rule. - -If the macro is used when it is not valid, such as when there is -a lookahead token already, then it reports a syntax error with -a message @samp{cannot back up} and performs ordinary error -recovery. - -In either case, the rest of the action is not executed. -@end deffn - -@deffn {Macro} YYEMPTY -Value stored in @code{yychar} when there is no lookahead token. -@end deffn - -@deffn {Macro} YYEOF -Value stored in @code{yychar} when the lookahead is the end of the input -stream. -@end deffn - -@deffn {Macro} YYERROR @code{;} -Cause an immediate syntax error. This statement initiates error -recovery just as if the parser itself had detected an error; however, it -does not call @code{yyerror}, and does not print any message. If you -want to print an error message, call @code{yyerror} explicitly before -the @samp{YYERROR;} statement. @xref{Error Recovery}. 
-@end deffn - -@deffn {Macro} YYRECOVERING -@findex YYRECOVERING -The expression @code{YYRECOVERING ()} yields 1 when the parser -is recovering from a syntax error, and 0 otherwise. -@xref{Error Recovery}. -@end deffn - -@deffn {Variable} yychar -Variable containing either the lookahead token, or @code{YYEOF} when the -lookahead is the end of the input stream, or @code{YYEMPTY} when no lookahead -has been performed so the next token is not yet known. -Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic -Actions}). -@xref{Lookahead, ,Lookahead Tokens}. -@end deffn - -@deffn {Macro} yyclearin @code{;} -Discard the current lookahead token. This is useful primarily in -error rules. -Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR -Semantic Actions}). -@xref{Error Recovery}. -@end deffn - -@deffn {Macro} yyerrok @code{;} -Resume generating error messages immediately for subsequent syntax -errors. This is useful primarily in error rules. -@xref{Error Recovery}. -@end deffn - -@deffn {Variable} yylloc -Variable containing the lookahead token location when @code{yychar} is not set -to @code{YYEMPTY} or @code{YYEOF}. -Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic -Actions}). -@xref{Actions and Locations, ,Actions and Locations}. -@end deffn - -@deffn {Variable} yylval -Variable containing the lookahead token semantic value when @code{yychar} is -not set to @code{YYEMPTY} or @code{YYEOF}. -Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic -Actions}). -@xref{Actions, ,Actions}. -@end deffn - -@deffn {Value} @@$ -@findex @@$ -Acts like a structure variable containing information on the textual -location of the grouping made by the current rule. @xref{Tracking -Locations}. - -@c Check if those paragraphs are still useful or not. 
- -@c @example -@c struct @{ -@c int first_line, last_line; -@c int first_column, last_column; -@c @}; -@c @end example - -@c Thus, to get the starting line number of the third component, you would -@c use @samp{@@3.first_line}. - -@c In order for the members of this structure to contain valid information, -@c you must make @code{yylex} supply this information about each token. -@c If you need only certain members, then @code{yylex} need only fill in -@c those members. - -@c The use of this feature makes the parser noticeably slower. -@end deffn - -@deffn {Value} @@@var{n} -@findex @@@var{n} -Acts like a structure variable containing information on the textual -location of the @var{n}th component of the current rule. @xref{Tracking -Locations}. -@end deffn - -@node Internationalization -@section Parser Internationalization -@cindex internationalization -@cindex i18n -@cindex NLS -@cindex gettext -@cindex bison-po - -A Bison-generated parser can print diagnostics, including error and -tracing messages. By default, they appear in English. However, Bison -also supports outputting diagnostics in the user's native language. To -make this work, the user should set the usual environment variables. -@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}. -For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might -set the user's locale to French Canadian using the UTF-8 -encoding. The exact set of available locales depends on the user's -installation. - -The maintainer of a package that uses a Bison-generated parser enables -the internationalization of the parser's output through the following -steps. Here we assume a package that uses GNU Autoconf and -GNU Automake. - -@enumerate -@item -@cindex bison-i18n.m4 -Into the directory containing the GNU Autoconf macros used -by the package---often called @file{m4}---copy the -@file{bison-i18n.m4} file installed by Bison under -@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory. 
-For example: - -@example -cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4 -@end example - -@item -@findex BISON_I18N -@vindex BISON_LOCALEDIR -@vindex YYENABLE_NLS -In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT} -invocation, add an invocation of @code{BISON_I18N}. This macro is -defined in the file @file{bison-i18n.m4} that you copied earlier. It -causes @samp{configure} to find the value of the -@code{BISON_LOCALEDIR} variable, and it defines the source-language -symbol @code{YYENABLE_NLS} to enable translations in the -Bison-generated parser. - -@item -In the @code{main} function of your program, designate the directory -containing Bison's runtime message catalog, through a call to -@samp{bindtextdomain} with domain name @samp{bison-runtime}. -For example: - -@example -bindtextdomain ("bison-runtime", BISON_LOCALEDIR); -@end example - -Typically this appears after any other call @code{bindtextdomain -(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on -@samp{BISON_LOCALEDIR} to be defined as a string through the -@file{Makefile}. - -@item -In the @file{Makefile.am} that controls the compilation of the @code{main} -function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro, -either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example: - -@example -DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' -@end example - -or: - -@example -AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' -@end example - -@item -Finally, invoke the command @command{autoreconf} to generate the build -infrastructure. -@end enumerate - - -@node Algorithm -@chapter The Bison Parser Algorithm -@cindex Bison parser algorithm -@cindex algorithm of parser -@cindex shifting -@cindex reduction -@cindex parser stack -@cindex stack, parser - -As Bison reads tokens, it pushes them onto a stack along with their -semantic values. The stack is called the @dfn{parser stack}. 
Pushing a -token is traditionally called @dfn{shifting}. - -For example, suppose the infix calculator has read @samp{1 + 5 *}, with a -@samp{3} to come. The stack will have four elements, one for each token -that was shifted. - -But the stack does not always have an element for each token read. When -the last @var{n} tokens and groupings shifted match the components of a -grammar rule, they can be combined according to that rule. This is called -@dfn{reduction}. Those tokens and groupings are replaced on the stack by a -single grouping whose symbol is the result (left hand side) of that rule. -Running the rule's action is part of the process of reduction, because this -is what computes the semantic value of the resulting grouping. - -For example, if the infix calculator's parser stack contains this: - -@example -1 + 5 * 3 -@end example - -@noindent -and the next input token is a newline character, then the last three -elements can be reduced to 15 via the rule: - -@example -expr: expr '*' expr; -@end example - -@noindent -Then the stack contains just these three elements: - -@example -1 + 15 -@end example - -@noindent -At this point, another reduction can be made, resulting in the single value -16. Then the newline token can be shifted. - -The parser tries, by shifts and reductions, to reduce the entire input down -to a single grouping whose symbol is the grammar's start-symbol -(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}). - -This kind of parser is known in the literature as a bottom-up parser. - -@menu -* Lookahead:: Parser looks one token ahead when deciding what to do. -* Shift/Reduce:: Conflicts: when either shifting or reduction is valid. -* Precedence:: Operator precedence works by resolving conflicts. -* Contextual Precedence:: When an operator's precedence depends on context. -* Parser States:: The parser is a finite-state-machine with stack. -* Reduce/Reduce:: When two rules are applicable in the same situation. 
-* Mysterious Conflicts:: Conflicts that look unjustified. -* Tuning LR:: How to tune fundamental aspects of LR-based parsing. -* Generalized LR Parsing:: Parsing arbitrary context-free grammars. -* Memory Management:: What happens when memory is exhausted. How to avoid it. -@end menu - -@node Lookahead -@section Lookahead Tokens -@cindex lookahead token - -The Bison parser does @emph{not} always reduce immediately as soon as the -last @var{n} tokens and groupings match a rule. This is because such a -simple strategy is inadequate to handle most languages. Instead, when a -reduction is possible, the parser sometimes ``looks ahead'' at the next -token in order to decide what to do. - -When a token is read, it is not immediately shifted; first it becomes the -@dfn{lookahead token}, which is not on the stack. Now the parser can -perform one or more reductions of tokens and groupings on the stack, while -the lookahead token remains off to the side. When no more reductions -should take place, the lookahead token is shifted onto the stack. This -does not mean that all possible reductions have been done; depending on the -token type of the lookahead token, some rules may choose to delay their -application. - -Here is a simple case where lookahead is needed. These three rules define -expressions which contain binary addition operators and postfix unary -factorial operators (@samp{!}), and allow parentheses for grouping. - -@example -@group -expr: - term '+' expr -| term -; -@end group - -@group -term: - '(' expr ')' -| term '!' -| NUMBER -; -@end group -@end example - -Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what -should be done? If the following token is @samp{)}, then the first three -tokens must be reduced to form an @code{expr}. This is the only valid -course, because shifting the @samp{)} would produce a sequence of symbols -@w{@code{term ')'}}, and no rule allows this. 
- -If the following token is @samp{!}, then it must be shifted immediately so -that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the -parser were to reduce before shifting, @w{@samp{1 + 2}} would become an -@code{expr}. It would then be impossible to shift the @samp{!} because -doing so would produce on the stack the sequence of symbols @code{expr -'!'}. No rule allows that sequence. - -@vindex yychar -@vindex yylval -@vindex yylloc -The lookahead token is stored in the variable @code{yychar}. -Its semantic value and location, if any, are stored in the variables -@code{yylval} and @code{yylloc}. -@xref{Action Features, ,Special Features for Use in Actions}. - -@node Shift/Reduce -@section Shift/Reduce Conflicts -@cindex conflicts -@cindex shift/reduce conflicts -@cindex dangling @code{else} -@cindex @code{else}, dangling - -Suppose we are parsing a language which has if-then and if-then-else -statements, with a pair of rules like this: - -@example -@group -if_stmt: - IF expr THEN stmt -| IF expr THEN stmt ELSE stmt -; -@end group -@end example - -@noindent -Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are -terminal symbols for specific keyword tokens. - -When the @code{ELSE} token is read and becomes the lookahead token, the -contents of the stack (assuming the input is valid) are just right for -reduction by the first rule. But it is also legitimate to shift the -@code{ELSE}, because that would lead to eventual reduction by the second -rule. - -This situation, where either a shift or a reduction would be valid, is -called a @dfn{shift/reduce conflict}. Bison is designed to resolve -these conflicts by choosing to shift, unless otherwise directed by -operator precedence declarations. To see the reason for this, let's -contrast it with the other alternative. 
- -Since the parser prefers to shift the @code{ELSE}, the result is to attach -the else-clause to the innermost if-statement, making these two inputs -equivalent: - -@example -if x then if y then win (); else lose; - -if x then do; if y then win (); else lose; end; -@end example - -But if the parser chose to reduce when possible rather than shift, the -result would be to attach the else-clause to the outermost if-statement, -making these two inputs equivalent: - -@example -if x then if y then win (); else lose; - -if x then do; if y then win (); end; else lose; -@end example - -The conflict exists because the grammar as written is ambiguous: either -parsing of the simple nested if-statement is legitimate. The established -convention is that these ambiguities are resolved by attaching the -else-clause to the innermost if-statement; this is what Bison accomplishes -by choosing to shift rather than reduce. (It would ideally be cleaner to -write an unambiguous grammar, but that is very hard to do in this case.) -This particular ambiguity was first encountered in the specifications of -Algol 60 and is called the ``dangling @code{else}'' ambiguity. - -To avoid warnings from Bison about predictable, legitimate shift/reduce -conflicts, use the @code{%expect @var{n}} declaration. -There will be no warning as long as the number of shift/reduce conflicts -is exactly @var{n}, and Bison will report an error if there is a -different number. -@xref{Expect Decl, ,Suppressing Conflict Warnings}. - -The definition of @code{if_stmt} above is solely to blame for the -conflict, but the conflict does not actually appear without additional -rules. 
Here is a complete Bison grammar file that actually manifests -the conflict: - -@example -@group -%token IF THEN ELSE variable -%% -@end group -@group -stmt: - expr -| if_stmt -; -@end group - -@group -if_stmt: - IF expr THEN stmt -| IF expr THEN stmt ELSE stmt -; -@end group - -expr: - variable -; -@end example - -@node Precedence -@section Operator Precedence -@cindex operator precedence -@cindex precedence of operators - -Another situation where shift/reduce conflicts appear is in arithmetic -expressions. Here shifting is not always the preferred resolution; the -Bison declarations for operator precedence allow you to specify when to -shift and when to reduce. - -@menu -* Why Precedence:: An example showing why precedence is needed. -* Using Precedence:: How to specify precedence in Bison grammars. -* Precedence Examples:: How these features are used in the previous example. -* How Precedence:: How they work. -@end menu - -@node Why Precedence -@subsection When Precedence is Needed - -Consider the following ambiguous grammar fragment (ambiguous because the -input @w{@samp{1 - 2 * 3}} can be parsed in two different ways): - -@example -@group -expr: - expr '-' expr -| expr '*' expr -| expr '<' expr -| '(' expr ')' -@dots{} -; -@end group -@end example - -@noindent -Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2}; -should it reduce them via the rule for the subtraction operator? It -depends on the next token. Of course, if the next token is @samp{)}, we -must reduce; shifting is invalid because no single rule can reduce the -token sequence @w{@samp{- 2 )}} or anything starting with that. But if -the next token is @samp{*} or @samp{<}, we have a choice: either -shifting or reduction would allow the parse to complete, but with -different results. - -To decide which one Bison should do, we must consider the results. 
If -the next operator token @var{op} is shifted, then it must be reduced -first in order to permit another opportunity to reduce the difference. -The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other -hand, if the subtraction is reduced before shifting @var{op}, the result -is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or -reduce should depend on the relative precedence of the operators -@samp{-} and @var{op}: @samp{*} should be shifted first, but not -@samp{<}. - -@cindex associativity -What about input such as @w{@samp{1 - 2 - 5}}; should this be -@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most -operators we prefer the former, which is called @dfn{left association}. -The latter alternative, @dfn{right association}, is desirable for -assignment operators. The choice of left or right association is a -matter of whether the parser chooses to shift or reduce when the stack -contains @w{@samp{1 - 2}} and the lookahead token is @samp{-}: shifting -makes right-associativity. - -@node Using Precedence -@subsection Specifying Operator Precedence -@findex %left -@findex %right -@findex %nonassoc - -Bison allows you to specify these choices with the operator precedence -declarations @code{%left} and @code{%right}. Each such declaration -contains a list of tokens, which are operators whose precedence and -associativity is being declared. The @code{%left} declaration makes all -those operators left-associative and the @code{%right} declaration makes -them right-associative. A third alternative is @code{%nonassoc}, which -declares that it is a syntax error to find the same operator twice ``in a -row''. - -The relative precedence of different operators is controlled by the -order in which they are declared. 
The first @code{%left} or -@code{%right} declaration in the file declares the operators whose -precedence is lowest, the next such declaration declares the operators -whose precedence is a little higher, and so on. - -@node Precedence Examples -@subsection Precedence Examples - -In our example, we would want the following declarations: - -@example -%left '<' -%left '-' -%left '*' -@end example - -In a more complete example, which supports other operators as well, we -would declare them in groups of equal precedence. For example, @code{'+'} is -declared with @code{'-'}: - -@example -%left '<' '>' '=' NE LE GE -%left '+' '-' -%left '*' '/' -@end example - -@noindent -(Here @code{NE} and so on stand for the operators for ``not equal'' -and so on. We assume that these tokens are more than one character long -and therefore are represented by names, not character literals.) - -@node How Precedence -@subsection How Precedence Works - -The first effect of the precedence declarations is to assign precedence -levels to the terminal symbols declared. The second effect is to assign -precedence levels to certain rules: each rule gets its precedence from -the last terminal symbol mentioned in the components. (You can also -specify explicitly the precedence of a rule. @xref{Contextual -Precedence, ,Context-Dependent Precedence}.) - -Finally, the resolution of conflicts works by comparing the precedence -of the rule being considered with that of the lookahead token. If the -token's precedence is higher, the choice is to shift. If the rule's -precedence is higher, the choice is to reduce. If they have equal -precedence, the choice is made based on the associativity of that -precedence level. The verbose output file made by @samp{-v} -(@pxref{Invocation, ,Invoking Bison}) says how each conflict was -resolved. - -Not all rules and not all tokens have precedence. If either the rule or -the lookahead token has no precedence, then the default is to shift. 
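The resolution rule described above can be sketched in C. This is only an illustration under stated assumptions: the names @code{resolve_conflict}, @code{enum assoc} and @code{enum action} are hypothetical and are not part of Bison's interface or generated code, and precedence levels are modeled as small integers where 0 stands for ``no precedence declared'' and larger numbers bind more tightly.

```c
#include <assert.h>

/* Hypothetical types for illustration; Bison does not expose these. */
enum assoc { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };
enum action { ACTION_SHIFT, ACTION_REDUCE, ACTION_ERROR };

/* Decide a shift/reduce conflict between a rule and the lookahead token.
   A level of 0 means no precedence was declared (an assumption of this
   sketch); higher levels correspond to later %left/%right/%nonassoc
   declarations in the grammar file.  */
enum action
resolve_conflict (int rule_prec, int token_prec, enum assoc token_assoc)
{
  assert (rule_prec >= 0 && token_prec >= 0);

  /* If either side has no precedence, the default is to shift.  */
  if (rule_prec == 0 || token_prec == 0)
    return ACTION_SHIFT;

  /* The higher precedence wins.  */
  if (token_prec > rule_prec)
    return ACTION_SHIFT;
  if (rule_prec > token_prec)
    return ACTION_REDUCE;

  /* Equal precedence: the associativity of that level decides.  */
  switch (token_assoc)
    {
    case ASSOC_LEFT:
      return ACTION_REDUCE;   /* (1 - 2) - 5 */
    case ASSOC_RIGHT:
      return ACTION_SHIFT;    /* 1 - (2 - 5) */
    default:
      return ACTION_ERROR;    /* %nonassoc: same level twice in a row */
    }
}
```

With the declarations from the earlier example (@code{'<'} at level 1, @code{'-'} at level 2, @code{'*'} at level 3), a call such as @code{resolve_conflict (2, 3, ASSOC_LEFT)} models the subtraction rule facing a @code{'*'} lookahead and selects a shift, while @code{resolve_conflict (2, 1, ASSOC_LEFT)} models a @code{'<'} lookahead and selects a reduction.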
- -@node Contextual Precedence -@section Context-Dependent Precedence -@cindex context-dependent precedence -@cindex unary operator precedence -@cindex precedence, context-dependent -@cindex precedence, unary operator -@findex %prec - -Often the precedence of an operator depends on the context. This sounds -outlandish at first, but it is really very common. For example, a minus -sign typically has a very high precedence as a unary operator, and a -somewhat lower precedence (lower than multiplication) as a binary operator. - -The Bison precedence declarations, @code{%left}, @code{%right} and -@code{%nonassoc}, can only be used once for a given token; so a token has -only one precedence declared in this way. For context-dependent -precedence, you need to use an additional mechanism: the @code{%prec} -modifier for rules. - -The @code{%prec} modifier declares the precedence of a particular rule by -specifying a terminal symbol whose precedence should be used for that rule. -It's not necessary for that symbol to appear otherwise in the rule. The -modifier's syntax is: - -@example -%prec @var{terminal-symbol} -@end example - -@noindent -and it is written after the components of the rule. Its effect is to -assign the rule the precedence of @var{terminal-symbol}, overriding -the precedence that would be deduced for it in the ordinary way. The -altered rule precedence then affects how conflicts involving that rule -are resolved (@pxref{Precedence, ,Operator Precedence}). - -Here is how @code{%prec} solves the problem of unary minus. First, declare -a precedence for a fictitious terminal symbol named @code{UMINUS}. 
There -are no tokens of this type, but the symbol serves to stand for its -precedence: - -@example -@dots{} -%left '+' '-' -%left '*' -%left UMINUS -@end example - -Now the precedence of @code{UMINUS} can be used in specific rules: - -@example -@group -exp: - @dots{} -| exp '-' exp - @dots{} -| '-' exp %prec UMINUS -@end group -@end example - -@ifset defaultprec -If you forget to append @code{%prec UMINUS} to the rule for unary -minus, Bison silently assumes that minus has its usual precedence. -This kind of problem can be tricky to debug, since one typically -discovers the mistake only by testing the code. - -The @code{%no-default-prec;} declaration makes it easier to discover -this kind of problem systematically. It causes rules that lack a -@code{%prec} modifier to have no precedence, even if the last terminal -symbol mentioned in their components has a declared precedence. - -If @code{%no-default-prec;} is in effect, you must specify @code{%prec} -for all rules that participate in precedence conflict resolution. -Then you will see any shift/reduce conflict until you tell Bison how -to resolve it, either by changing your grammar or by adding an -explicit precedence. This will probably add declarations to the -grammar, but it helps to protect against incorrect rule precedences. - -The effect of @code{%no-default-prec;} can be reversed by giving -@code{%default-prec;}, which is the default. -@end ifset - -@node Parser States -@section Parser States -@cindex finite-state machine -@cindex parser state -@cindex state (of parser) - -The function @code{yyparse} is implemented using a finite-state machine. -The values pushed on the parser stack are not simply token type codes; they -represent the entire sequence of terminal and nonterminal symbols at or -near the top of the stack. The current state collects all the information -about previous input which is relevant to deciding what to do next. 
- -Each time a lookahead token is read, the current parser state together -with the type of lookahead token are looked up in a table. This table -entry can say, ``Shift the lookahead token.'' In this case, it also -specifies the new parser state, which is pushed onto the top of the -parser stack. Or it can say, ``Reduce using rule number @var{n}.'' -This means that a certain number of tokens or groupings are taken off -the top of the stack, and replaced by one grouping. In other words, -that number of states are popped from the stack, and one new state is -pushed. - -There is one other alternative: the table can say that the lookahead token -is erroneous in the current state. This causes error processing to begin -(@pxref{Error Recovery}). - -@node Reduce/Reduce -@section Reduce/Reduce Conflicts -@cindex reduce/reduce conflict -@cindex conflicts, reduce/reduce - -A reduce/reduce conflict occurs if there are two or more rules that apply -to the same sequence of input. This usually indicates a serious error -in the grammar. - -For example, here is an erroneous attempt to define a sequence -of zero or more @code{word} groupings. - -@example -@group -sequence: - /* empty */ @{ printf ("empty sequence\n"); @} -| maybeword -| sequence word @{ printf ("added word %s\n", $2); @} -; -@end group - -@group -maybeword: - /* empty */ @{ printf ("empty maybeword\n"); @} -| word @{ printf ("single word %s\n", $1); @} -; -@end group -@end example - -@noindent -The error is an ambiguity: there is more than one way to parse a single -@code{word} into a @code{sequence}. It could be reduced to a -@code{maybeword} and then into a @code{sequence} via the second rule. -Alternatively, nothing-at-all could be reduced into a @code{sequence} -via the first rule, and this could be combined with the @code{word} -using the third rule for @code{sequence}. - -There is also more than one way to reduce nothing-at-all into a -@code{sequence}. 
This can be done directly via the first rule, -or indirectly via @code{maybeword} and then the second rule. - -You might think that this is a distinction without a difference, because it -does not change whether any particular input is valid or not. But it does -affect which actions are run. One parsing order runs the second rule's -action; the other runs the first rule's action and the third rule's action. -In this example, the output of the program changes. - -Bison resolves a reduce/reduce conflict by choosing to use the rule that -appears first in the grammar, but it is very risky to rely on this. Every -reduce/reduce conflict must be studied and usually eliminated. Here is the -proper way to define @code{sequence}: - -@example -sequence: - /* empty */ @{ printf ("empty sequence\n"); @} -| sequence word @{ printf ("added word %s\n", $2); @} -; -@end example - -Here is another common error that yields a reduce/reduce conflict: - -@example -sequence: - /* empty */ -| sequence words -| sequence redirects -; - -words: - /* empty */ -| words word -; - -redirects: - /* empty */ -| redirects redirect -; -@end example - -@noindent -The intention here is to define a sequence which can contain either -@code{word} or @code{redirect} groupings. The individual definitions of -@code{sequence}, @code{words} and @code{redirects} are error-free, but the -three together make a subtle ambiguity: even an empty input can be parsed -in infinitely many ways! - -Consider: nothing-at-all could be a @code{words}. Or it could be two -@code{words} in a row, or three, or any number. It could equally well be a -@code{redirects}, or two, or any number. Or it could be a @code{words} -followed by three @code{redirects} and another @code{words}. And so on. - -Here are two ways to correct these rules. 
First, to make it a single level -of sequence: - -@example -sequence: - /* empty */ -| sequence word -| sequence redirect -; -@end example - -Second, to prevent either a @code{words} or a @code{redirects} -from being empty: - -@example -@group -sequence: - /* empty */ -| sequence words -| sequence redirects -; -@end group - -@group -words: - word -| words word -; -@end group - -@group -redirects: - redirect -| redirects redirect -; -@end group -@end example - -@node Mysterious Conflicts -@section Mysterious Conflicts -@cindex Mysterious Conflicts - -Sometimes reduce/reduce conflicts can occur that don't look warranted. -Here is an example: - -@example -@group -%token ID - -%% -def: param_spec return_spec ','; -param_spec: - type -| name_list ':' type -; -@end group -@group -return_spec: - type -| name ':' type -; -@end group -@group -type: ID; -@end group -@group -name: ID; -name_list: - name -| name ',' name_list -; -@end group -@end example - -It would seem that this grammar can be parsed with only a single token -of lookahead: when a @code{param_spec} is being read, an @code{ID} is -a @code{name} if a comma or colon follows, or a @code{type} if another -@code{ID} follows. In other words, this grammar is LR(1). - -@cindex LR -@cindex LALR -However, for historical reasons, Bison cannot by default handle all -LR(1) grammars. -In this grammar, two contexts, that after an @code{ID} at the beginning -of a @code{param_spec} and likewise at the beginning of a -@code{return_spec}, are similar enough that Bison assumes they are the -same. -They appear similar because the same set of rules would be -active---the rule for reducing to a @code{name} and that for reducing to -a @code{type}. Bison is unable to determine at that stage of processing -that the rules would require different lookahead tokens in the two -contexts, so it makes a single parser state for them both. Combining -the two contexts causes a conflict later. 
In parser terminology, this -occurrence means that the grammar is not LALR(1). - -@cindex IELR -@cindex canonical LR -For many practical grammars (specifically those that fall into the non-LR(1) -class), the limitations of LALR(1) result in difficulties beyond just -mysterious reduce/reduce conflicts. The best way to fix all these problems -is to select a different parser table construction algorithm. Either -IELR(1) or canonical LR(1) would suffice, but the former is more efficient -and easier to debug during development. @xref{LR Table Construction}, for -details. (Bison's IELR(1) and canonical LR(1) implementations are -experimental. More user feedback will help to stabilize them.) - -If you instead wish to work around LALR(1)'s limitations, you -can often fix a mysterious conflict by identifying the two parser states -that are being confused, and adding something to make them look -distinct. In the above example, adding one rule to -@code{return_spec} as follows makes the problem go away: - -@example -@group -%token BOGUS -@dots{} -%% -@dots{} -return_spec: - type -| name ':' type -| ID BOGUS /* This rule is never used. */ -; -@end group -@end example - -This corrects the problem because it introduces the possibility of an -additional active rule in the context after the @code{ID} at the beginning of -@code{return_spec}. This rule is not active in the corresponding context -in a @code{param_spec}, so the two contexts receive distinct parser states. -As long as the token @code{BOGUS} is never generated by @code{yylex}, -the added rule cannot alter the way actual input is parsed. - -In this particular example, there is another way to solve the problem: -rewrite the rule for @code{return_spec} to use @code{ID} directly -instead of via @code{name}. This also causes the two confusing -contexts to have different sets of active rules, because the one for -@code{return_spec} activates the altered rule for @code{return_spec} -rather than the one for @code{name}. 
- -@example -param_spec: - type -| name_list ':' type -; -return_spec: - type -| ID ':' type -; -@end example - -For a more detailed exposition of LALR(1) parsers and parser -generators, @pxref{Bibliography,,DeRemer 1982}. - -@node Tuning LR -@section Tuning LR - -The default behavior of Bison's LR-based parsers is chosen mostly for -historical reasons, but that behavior is often not robust. For example, in -the previous section, we discussed the mysterious conflicts that can be -produced by LALR(1), Bison's default parser table construction algorithm. -Another example is Bison's @code{%error-verbose} directive, which instructs -the generated parser to produce verbose syntax error messages, which can -sometimes contain incorrect information. - -In this section, we explore several modern features of Bison that allow you -to tune fundamental aspects of the generated LR-based parsers. Some of -these features easily eliminate shortcomings like those mentioned above. -Others can be helpful purely for understanding your parser. - -Most of the features discussed in this section are still experimental. More -user feedback will help to stabilize them. - -@menu -* LR Table Construction:: Choose a different construction algorithm. -* Default Reductions:: Disable default reductions. -* LAC:: Correct lookahead sets in the parser states. -* Unreachable States:: Keep unreachable parser states for debugging. -@end menu - -@node LR Table Construction -@subsection LR Table Construction -@cindex Mysterious Conflict -@cindex LALR -@cindex IELR -@cindex canonical LR -@findex %define lr.type - -For historical reasons, Bison constructs LALR(1) parser tables by default. -However, LALR does not possess the full language-recognition power of LR. -As a result, the behavior of parsers employing LALR parser tables is often -mysterious. We presented a simple example of this effect in @ref{Mysterious -Conflicts}. 
- -As we also demonstrated in that example, the traditional approach to -eliminating such mysterious behavior is to restructure the grammar. -Unfortunately, doing so correctly is often difficult. Moreover, merely -discovering that LALR causes mysterious behavior in your parser can be -difficult as well. - -Fortunately, Bison provides an easy way to eliminate the possibility of such -mysterious behavior altogether. You simply need to activate a more powerful -parser table construction algorithm by using the @code{%define lr.type} -directive. - -@deffn {Directive} {%define lr.type @var{TYPE}} -Specify the type of parser tables within the LR(1) family. The accepted -values for @var{TYPE} are: - -@itemize -@item @code{lalr} (default) -@item @code{ielr} -@item @code{canonical-lr} -@end itemize - -(This feature is experimental. More user feedback will help to stabilize -it.) -@end deffn - -For example, to activate IELR, you might add the following directive to your -grammar file: - -@example -%define lr.type ielr -@end example - -@noindent For the example in @ref{Mysterious Conflicts}, the mysterious -conflict is then eliminated, so there is no need to invest time in -comprehending the conflict or restructuring the grammar to fix it. If, -during future development, the grammar evolves such that all mysterious -behavior would have disappeared using just LALR, you need not fear that -continuing to use IELR will result in unnecessarily large parser tables. -That is, IELR generates LALR tables when LALR (using a deterministic parsing -algorithm) is sufficient to support the full language-recognition power of -LR. Thus, by enabling IELR at the start of grammar development, you can -safely and completely eliminate the need to consider LALR's shortcomings. - -While IELR is almost always preferable, there are circumstances where LALR -or the canonical LR parser tables described by Knuth -(@pxref{Bibliography,,Knuth 1965}) can be useful.
Here we summarize the -relative advantages of each parser table construction algorithm within -Bison: - -@itemize -@item LALR - -There are at least two scenarios where LALR can be worthwhile: - -@itemize -@item GLR without static conflict resolution. - -@cindex GLR with LALR -When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any -conflicts statically (for example, with @code{%left} or @code{%prec}), then -the parser explores all potential parses of any given input. In this case, -the choice of parser table construction algorithm is guaranteed not to alter -the language accepted by the parser. LALR parser tables are the smallest -parser tables Bison can currently construct, so they may then be preferable. -Nevertheless, once you begin to resolve conflicts statically, GLR behaves -more like a deterministic parser in the syntactic contexts where those -conflicts appear, and so either IELR or canonical LR can then be helpful to -avoid LALR's mysterious behavior. - -@item Malformed grammars. - -Occasionally during development, an especially malformed grammar with a -major recurring flaw may severely impede the IELR or canonical LR parser -table construction algorithm. LALR can be a quick way to construct parser -tables in order to investigate such problems while ignoring the more subtle -differences from IELR and canonical LR. -@end itemize - -@item IELR - -IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given -any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables -always accept exactly the same set of sentences. However, like LALR, IELR -merges parser states during parser table construction so that the number of -parser states is often an order of magnitude less than for canonical LR. -More importantly, because canonical LR's extra parser states may contain -duplicate conflicts in the case of non-LR grammars, the number of conflicts -for IELR is often an order of magnitude less as well. 
This effect can -significantly reduce the complexity of developing a grammar. - -@item Canonical LR - -@cindex delayed syntax error detection -@cindex LAC -@findex %nonassoc -While inefficient, canonical LR parser tables can be an interesting means to -explore a grammar because they possess a property that IELR and LALR tables -do not. That is, if @code{%nonassoc} is not used and default reductions are -left disabled (@pxref{Default Reductions}), then, for every left context of -every canonical LR state, the set of tokens accepted by that state is -guaranteed to be the exact set of tokens that is syntactically acceptable in -that left context. It might then seem that an advantage of canonical LR -parsers in production is that, under the above constraints, they are -guaranteed to detect a syntax error as soon as possible without performing -any unnecessary reductions. However, IELR parsers that use LAC are also -able to achieve this behavior without sacrificing @code{%nonassoc} or -default reductions. For details and a few caveats of LAC, @pxref{LAC}. -@end itemize - -For a more detailed exposition of the mysterious behavior in LALR parsers -and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and -@ref{Bibliography,,Denny 2010 November}. - -@node Default Reductions -@subsection Default Reductions -@cindex default reductions -@findex %define lr.default-reductions -@findex %nonassoc - -After parser table construction, Bison identifies the reduction with the -largest lookahead set in each parser state. To reduce the size of the -parser state, traditional Bison behavior is to remove that lookahead set and -to assign that reduction to be the default parser action. Such a reduction -is known as a @dfn{default reduction}. - -Default reductions affect more than the size of the parser tables. They -also affect the behavior of the parser: - -@itemize -@item Delayed @code{yylex} invocations. 
- -@cindex delayed yylex invocations -@cindex consistent states -@cindex defaulted states -A @dfn{consistent state} is a state that has only one possible parser -action. If that action is a reduction and is encoded as a default -reduction, then that consistent state is called a @dfn{defaulted state}. -Upon reaching a defaulted state, a Bison-generated parser does not bother to -invoke @code{yylex} to fetch the next token before performing the reduction. -In other words, whether default reductions are enabled in consistent states -determines how soon a Bison-generated parser invokes @code{yylex} for a -token: immediately when it @emph{reaches} that token in the input or when it -eventually @emph{needs} that token as a lookahead to determine the next -parser action. Traditionally, default reductions are enabled, and so the -parser exhibits the latter behavior. - -The presence of defaulted states is an important consideration when -designing @code{yylex} and the grammar file. That is, if the behavior of -@code{yylex} can influence or be influenced by the semantic actions -associated with the reductions in defaulted states, then the delay of the -next @code{yylex} invocation until after those reductions is significant. -For example, the semantic actions might pop a scope stack that @code{yylex} -uses to determine what token to return. Thus, the delay might be necessary -to ensure that @code{yylex} does not look up the next token in a scope that -should already be considered closed. - -@item Delayed syntax error detection. - -@cindex delayed syntax error detection -When the parser fetches a new token by invoking @code{yylex}, it checks -whether there is an action for that token in the current parser state. The -parser detects a syntax error if and only if either (1) there is no action -for that token or (2) the action for that token is the error action (due to -the use of @code{%nonassoc}). 
However, if there is a default reduction in -that state (which might or might not be a defaulted state), then it is -impossible for condition 1 to exist. That is, all tokens have an action. -Thus, the parser sometimes fails to detect the syntax error until it reaches -a later state. - -@cindex LAC -@c If there's an infinite loop, default reductions can prevent an incorrect -@c sentence from being rejected. -While default reductions never cause the parser to accept syntactically -incorrect sentences, the delay of syntax error detection can have unexpected -effects on the behavior of the parser. However, the delay can be caused -anyway by parser state merging and the use of @code{%nonassoc}, and it can -be fixed by another Bison feature, LAC. We discuss the effects of delayed -syntax error detection and LAC more in the next section (@pxref{LAC}). -@end itemize - -For canonical LR, the only default reduction that Bison enables by default -is the accept action, which appears only in the accepting state, which has -no other action and is thus a defaulted state. However, the default accept -action does not delay any @code{yylex} invocation or syntax error detection -because the accept action ends the parse. - -For LALR and IELR, Bison enables default reductions in nearly all states by -default. There are only two exceptions. First, states that have a shift -action on the @code{error} token do not have default reductions because -delayed syntax error detection could then prevent the @code{error} token -from ever being shifted in that state. However, parser state merging can -cause the same effect anyway, and LAC fixes it in both cases, so future -versions of Bison might drop this exception when LAC is activated. Second, -GLR parsers do not record the default reduction as the action on a lookahead -token for which there is a conflict. The correct action in this case is to -split the parse instead. 
- -To adjust which states have default reductions enabled, use the -@code{%define lr.default-reductions} directive. - -@deffn {Directive} {%define lr.default-reductions @var{WHERE}} -Specify the kind of states that are permitted to contain default reductions. -The accepted values of @var{WHERE} are: -@itemize -@item @code{most} (default for LALR and IELR) -@item @code{consistent} -@item @code{accepting} (default for canonical LR) -@end itemize - -(The ability to specify where default reductions are permitted is -experimental. More user feedback will help to stabilize it.) -@end deffn - -@node LAC -@subsection LAC -@findex %define parse.lac -@cindex LAC -@cindex lookahead correction - -Canonical LR, IELR, and LALR can suffer from a couple of problems upon -encountering a syntax error. First, the parser might perform additional -parser stack reductions before discovering the syntax error. Such -reductions can perform user semantic actions that are unexpected because -they are based on an invalid token, and they cause error recovery to begin -in a different syntactic context than the one in which the invalid token was -encountered. Second, when verbose error messages are enabled (@pxref{Error -Reporting}), the expected token list in the syntax error message can both -contain invalid tokens and omit valid tokens. - -The culprits for the above problems are @code{%nonassoc}, default reductions -in inconsistent states (@pxref{Default Reductions}), and parser state -merging. Because IELR and LALR merge parser states, they suffer the most. -Canonical LR can suffer only if @code{%nonassoc} is used or if default -reductions are enabled for inconsistent states. - -LAC (Lookahead Correction) is a new mechanism within the parsing algorithm -that solves these problems for canonical LR, IELR, and LALR without -sacrificing @code{%nonassoc}, default reductions, or state merging. You can -enable LAC with the @code{%define parse.lac} directive. 
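-
-For example, to enable LAC you might add the first directive below to your
-grammar file.  This is only a sketch: the second directive is optional and
-is shown on the assumption that you also want verbose syntax error
-messages, whose expected token list LAC is meant to correct.
-
-@example
-%define parse.lac full
-%error-verbose
-@end example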
- -@deffn {Directive} {%define parse.lac @var{VALUE}} -Enable LAC to improve syntax error handling. -@itemize -@item @code{none} (default) -@item @code{full} -@end itemize -(This feature is experimental. More user feedback will help to stabilize -it. Moreover, it is currently only available for deterministic parsers in -C.) -@end deffn - -Conceptually, the LAC mechanism is straightforward. Whenever the parser -fetches a new token from the scanner so that it can determine the next -parser action, it immediately suspends normal parsing and performs an -exploratory parse using a temporary copy of the normal parser state stack. -During this exploratory parse, the parser does not perform user semantic -actions. If the exploratory parse reaches a shift action, normal parsing -then resumes on the normal parser stacks. If the exploratory parse reaches -an error instead, the parser reports a syntax error. If verbose syntax -error messages are enabled, the parser must then discover the list of -expected tokens, so it performs a separate exploratory parse for each token -in the grammar. - -There is one subtlety about the use of LAC. That is, when in a consistent -parser state with a default reduction, the parser will not attempt to fetch -a token from the scanner because no lookahead is needed to determine the -next parser action. Thus, whether default reductions are enabled in -consistent states (@pxref{Default Reductions}) affects how soon the parser -detects a syntax error: immediately when it @emph{reaches} an erroneous -token or when it eventually @emph{needs} that token as a lookahead to -determine the next parser action. The latter behavior is probably more -intuitive, so Bison currently provides no way to achieve the former behavior -while default reductions are enabled in consistent states.
- -Thus, when LAC is in use, for some fixed decision of whether to enable -default reductions in consistent states, canonical LR and IELR behave almost -exactly the same for both syntactically acceptable and syntactically -unacceptable input. While LALR still does not support the full -language-recognition power of canonical LR and IELR, LAC at least enables -LALR's syntax error handling to correctly reflect LALR's -language-recognition power. - -There are a few caveats to consider when using LAC: - -@itemize -@item Infinite parsing loops. - -IELR plus LAC does have one shortcoming relative to canonical LR. Some -parsers generated by Bison can loop infinitely. LAC does not fix infinite -parsing loops that occur between encountering a syntax error and detecting -it, but enabling canonical LR or disabling default reductions sometimes -does. - -@item Verbose error message limitations. - -Because of internationalization considerations, Bison-generated parsers -limit the size of the expected token list they are willing to report in a -verbose syntax error message. If the number of expected tokens exceeds that -limit, the list is simply dropped from the message. Enabling LAC can -increase the size of the list and thus cause the parser to drop it. Of -course, dropping the list is better than reporting an incorrect list. - -@item Performance. - -Because LAC requires many parse actions to be performed twice, it can have a -performance penalty. However, not all parse actions must be performed -twice. Specifically, during a series of default reductions in consistent -states and shift actions, the parser never has to initiate an exploratory -parse. Moreover, the most time-consuming tasks in a parse are often the -file I/O, the lexical analysis performed by the scanner, and the user's -semantic actions, but none of these are performed during the exploratory -parse. 
Finally, the base of the temporary stack used during an exploratory -parse is a pointer into the normal parser state stack so that the stack is -never physically copied. In our experience, the performance penalty of LAC -has proved insignificant for practical grammars. -@end itemize - -While the LAC algorithm shares techniques that have been recognized in the -parser community for years, for the publication that introduces LAC, -@pxref{Bibliography,,Denny 2010 May}. - -@node Unreachable States -@subsection Unreachable States -@findex %define lr.keep-unreachable-states -@cindex unreachable states - -If there exists no sequence of transitions from the parser's start state to -some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable -state}. A state can become unreachable during conflict resolution if Bison -disables a shift action leading to it from a predecessor state. - -By default, Bison removes unreachable states from the parser after conflict -resolution because they are useless in the generated parser. However, -keeping unreachable states is sometimes useful when trying to understand the -relationship between the parser and the grammar. - -@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}} -Request that Bison allow unreachable states to remain in the parser tables. -@var{VALUE} must be a Boolean. The default is @code{false}. -@end deffn - -There are a few caveats to consider: - -@itemize @bullet -@item Missing or extraneous warnings. - -Unreachable states may contain conflicts and may use rules not used in any -other state. Thus, keeping unreachable states may induce warnings that are -irrelevant to your parser's behavior, and it may eliminate warnings that are -relevant. Of course, the change in warnings may actually be relevant to a -parser table analysis that wants to keep unreachable states, so this -behavior will likely remain in future Bison releases. - -@item Other useless states. 
- -While Bison is able to remove unreachable states, it is not guaranteed to -remove other kinds of useless states. Specifically, when Bison disables -reduce actions during conflict resolution, some goto actions may become -useless, and thus some additional states may become useless. If Bison were -to compute which goto actions were useless and then disable those actions, -it could identify such states as unreachable and then remove those states. -However, Bison does not compute which goto actions are useless. -@end itemize - -@node Generalized LR Parsing -@section Generalized LR (GLR) Parsing -@cindex GLR parsing -@cindex generalized LR (GLR) parsing -@cindex ambiguous grammars -@cindex nondeterministic parsing - -Bison produces @emph{deterministic} parsers that choose uniquely -when to reduce and which reduction to apply -based on a summary of the preceding input and on one extra token of lookahead. -As a result, normal Bison handles a proper subset of the family of -context-free languages. -Ambiguous grammars, since they have strings with more than one possible -sequence of reductions, cannot have deterministic parsers in this sense. -The same is true of languages that require more than one symbol of -lookahead, since the parser lacks the information necessary to make a -decision at the point it must be made in a shift-reduce parser. -Finally, as previously mentioned (@pxref{Mysterious Conflicts}), -there are languages where Bison's default choice of how to -summarize the input seen so far loses necessary information. - -When you use the @samp{%glr-parser} declaration in your grammar file, -Bison generates a parser that uses a different algorithm, called -Generalized LR (or GLR). A Bison GLR -parser uses the same basic -algorithm for parsing as an ordinary Bison parser, but behaves -differently in cases where there is a shift-reduce conflict that has not -been resolved by precedence rules (@pxref{Precedence}) or a -reduce-reduce conflict.
When a GLR parser encounters such a -situation, it -effectively @emph{splits} into several parsers, one for each possible -shift or reduction. These parsers then proceed as usual, consuming -tokens in lock-step. Some of the stacks may encounter other conflicts -and split further, with the result that instead of a sequence of states, -a Bison GLR parsing stack is in effect a tree of states. - -In effect, each stack represents a guess as to what the proper parse -is. Additional input may indicate that a guess was wrong, in which case -the appropriate stack silently disappears. Otherwise, the semantic -actions generated in each stack are saved, rather than being executed -immediately. When a stack disappears, its saved semantic actions never -get executed. When a reduction causes two stacks to become equivalent, -their sets of semantic actions are both saved with the state that -results from the reduction. We say that two stacks are equivalent -when they both represent the same sequence of states, -and each pair of corresponding states represents a -grammar symbol that produces the same segment of the input token -stream. - -Whenever the parser makes a transition from having multiple -states to having one, it reverts to the normal deterministic parsing -algorithm, after resolving and executing the saved-up actions. -At this transition, some of the states on the stack will have semantic -values that are sets (actually multisets) of possible actions. The -parser tries to pick one of the actions by first finding one whose rule -has the highest dynamic precedence, as set by the @samp{%dprec} -declaration. Otherwise, if the alternative actions are not ordered by -precedence, but the same merging function is declared for both -rules by the @samp{%merge} declaration, -Bison resolves and evaluates both and then calls the merge function on -the result. Otherwise, it reports an ambiguity.
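-
-As a sketch of these two declarations, consider a hypothetical grammar in
-which some input can be read either as an expression statement or as a
-declaration.  The nonterminals @code{stmt}, @code{expr} and @code{decl} and
-the function @code{stmtMerge} are illustrative names only, not part of
-Bison.  With @samp{%dprec}, the rule carrying the higher number is
-preferred when both parses survive:
-
-@example
-@group
-%glr-parser
-@dots{}
-%%
-stmt:
-  expr ';'  %dprec 1
-| decl      %dprec 2
-;
-@end group
-@end example
-
-@noindent
-With @samp{%merge}, both rules instead name the same function, which
-receives the semantic values of the two alternatives and returns the
-merged value.  The prototype below assumes the C GLR parser and would be
-declared in the prologue:
-
-@example
-@group
-%@{
-  static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
-%@}
-@dots{}
-%%
-stmt:
-  expr ';'  %merge <stmtMerge>
-| decl      %merge <stmtMerge>
-;
-@end group
-@end example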
- -It is possible to use a data structure for the GLR parsing tree that -permits the processing of any LR(1) grammar in linear time (in the -size of the input), any unambiguous (not necessarily -LR(1)) grammar in -quadratic worst-case time, and any general (possibly ambiguous) -context-free grammar in cubic worst-case time. However, Bison currently -uses a simpler data structure that requires time proportional to the -length of the input times the maximum number of stacks required for any -prefix of the input. Thus, really ambiguous or nondeterministic -grammars can require exponential time and space to process. Such badly -behaving examples, however, are not generally of practical interest. -Usually, nondeterminism in a grammar is local---the parser is ``in -doubt'' only for a few tokens at a time. Therefore, the current data -structure should generally be adequate. On LR(1) portions of a -grammar, in particular, it is only slightly slower than with the -deterministic LR(1) Bison parser. - -For a more detailed exposition of GLR parsers, @pxref{Bibliography,,Scott -2000}. - -@node Memory Management -@section Memory Management, and How to Avoid Memory Exhaustion -@cindex memory exhaustion -@cindex memory management -@cindex stack overflow -@cindex parser stack overflow -@cindex overflow of parser stack - -The Bison parser stack can run out of memory if too many tokens are shifted and -not reduced. When this happens, the parser function @code{yyparse} -calls @code{yyerror} and then returns 2. - -Because Bison parsers have growing stacks, hitting the upper limit -usually results from using a right recursion instead of a left -recursion, see @ref{Recursion, ,Recursive Rules}. - -@vindex YYMAXDEPTH -By defining the macro @code{YYMAXDEPTH}, you can control how deep the -parser stack can become before memory is exhausted. Define the -macro with a value that is an integer. This value is the maximum number -of tokens that can be shifted (and not reduced) before overflow. 
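-
-For instance, a grammar file might set the limit in its prologue and have
-the caller of @code{yyparse} test for the memory-exhaustion status.  This
-is only a sketch: the value 20000 and the wording of the message are
-arbitrary choices made for illustration, and the definitions of
-@code{yylex} and @code{yyerror} are omitted.
-
-@example
-@group
-%@{
-#include <stdio.h>
-#define YYMAXDEPTH 20000  /* Arbitrary illustrative limit.  */
-%@}
-@end group
-@dots{}
-%%
-@dots{}
-%%
-@group
-int
-main (void)
-@{
-  int status = yyparse ();
-  if (status == 2)  /* yyerror has already been called.  */
-    fprintf (stderr, "giving up: parser stack exhausted\n");
-  return status;
-@}
-@end group
-@end example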
- -The stack space allowed is not necessarily allocated. If you specify a -large value for @code{YYMAXDEPTH}, the parser normally allocates a small -stack at first, and then makes it bigger by stages as needed. This -increasing allocation happens automatically and silently. Therefore, -you do not need to make @code{YYMAXDEPTH} painfully small merely to save -space for ordinary inputs that do not need much stack. - -However, do not allow @code{YYMAXDEPTH} to be a value so large that -arithmetic overflow could occur when calculating the size of the stack -space. Also, do not allow @code{YYMAXDEPTH} to be less than -@code{YYINITDEPTH}. - -@cindex default stack limit -The default value of @code{YYMAXDEPTH}, if you do not define it, is -10000. - -@vindex YYINITDEPTH -You can control how much stack is allocated initially by defining the -macro @code{YYINITDEPTH} to a positive integer. For the deterministic -parser in C, this value must be a compile-time constant -unless you are assuming C99 or some other target language or compiler -that allows variable-length arrays. The default is 200. - -Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. - -@c FIXME: C++ output. -Because of semantic differences between C and C++, the stack of the -deterministic parsers in C produced by Bison cannot grow when compiled -by C++ compilers. In this precise case (compiling a C parser as C++), we -suggest that you increase @code{YYINITDEPTH}. The Bison maintainers hope to fix -this deficiency in a future release. - -@node Error Recovery -@chapter Error Recovery -@cindex error recovery -@cindex recovery from errors - -It is not usually acceptable to have a program terminate on a syntax -error. For example, a compiler should recover sufficiently to parse the -rest of the input file and check it for errors; a calculator should accept -another expression.
- -In a simple interactive command parser where each input is one line, it may -be sufficient to allow @code{yyparse} to return 1 on error and have the -caller ignore the rest of the input line when that happens (and then call -@code{yyparse} again). But this is inadequate for a compiler, because it -forgets all the syntactic context leading up to the error. A syntax error -deep within a function in the compiler input should not cause the compiler -to treat the following line like the beginning of a source file. - -@findex error -You can define how to recover from a syntax error by writing rules to -recognize the special token @code{error}. This is a terminal symbol that -is always defined (you need not declare it) and reserved for error -handling. The Bison parser generates an @code{error} token whenever a -syntax error happens; if you have provided a rule to recognize this token -in the current context, the parse can continue. - -For example: - -@example -stmts: - /* empty string */ -| stmts '\n' -| stmts exp '\n' -| stmts error '\n' -@end example - -The fourth rule in this example says that an error followed by a newline -makes a valid addition to any @code{stmts}. - -What happens if a syntax error occurs in the middle of an @code{exp}? The -error recovery rule, interpreted strictly, applies to the precise sequence -of a @code{stmts}, an @code{error} and a newline. If an error occurs in -the middle of an @code{exp}, there will probably be some additional tokens -and subexpressions on the stack after the last @code{stmts}, and there -will be tokens to read before the next newline. So the rule is not -applicable in the ordinary way. - -But Bison can force the situation to fit the rule, by discarding part of -the semantic context and part of the input. First it discards states -and objects from the stack until it gets back to a state in which the -@code{error} token is acceptable. 
(This means that the subexpressions -already parsed are discarded, back to the last complete @code{stmts}.) -At this point the @code{error} token can be shifted. Then, if the old -lookahead token is not acceptable to be shifted next, the parser reads -tokens and discards them until it finds a token which is acceptable. In -this example, Bison reads and discards input until the next newline so -that the fourth rule can apply. Note that discarded symbols are -possible sources of memory leaks, see @ref{Destructor Decl, , Freeing -Discarded Symbols}, for a means to reclaim this memory. - -The choice of error rules in the grammar is a choice of strategies for -error recovery. A simple and useful strategy is simply to skip the rest of -the current input line or current statement if an error is detected: - -@example -stmt: error ';' /* On error, skip until ';' is read. */ -@end example - -It is also useful to recover to the matching close-delimiter of an -opening-delimiter that has already been parsed. Otherwise the -close-delimiter will probably appear to be unmatched, and generate another, -spurious error message: - -@example -primary: - '(' expr ')' -| '(' error ')' -@dots{} -; -@end example - -Error recovery strategies are necessarily guesses. When they guess wrong, -one syntax error often leads to another. In the above example, the error -recovery rule guesses that an error is due to bad input within one -@code{stmt}. Suppose that instead a spurious semicolon is inserted in the -middle of a valid @code{stmt}. After the error recovery rule recovers -from the first error, another syntax error will be found straightaway, -since the text following the spurious semicolon is also an invalid -@code{stmt}. - -To prevent an outpouring of error messages, the parser will output no error -message for another syntax error that happens shortly after the first; only -after three consecutive input tokens have been successfully shifted will -error messages resume. 
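The suppression rule just described can be modeled with a small counter. The C sketch below is a simplified illustration, not the code Bison generates; @code{error_gate} and its functions are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of error-message suppression: after a syntax
   error, further messages stay silent until three tokens have been
   shifted successfully.  Illustration only, not Bison's own code.  */
typedef struct
{
  int shifts_needed;  /* shifts still required before messages resume */
} error_gate;

/* Start in the normal state, where errors are reported.  */
void
error_gate_init (error_gate *g)
{
  g->shifts_needed = 0;
}

/* Record a syntax error.  Return true if a message should be printed.  */
bool
error_gate_on_error (error_gate *g)
{
  bool report = g->shifts_needed == 0;
  g->shifts_needed = 3;  /* every error restarts the three-token count */
  return report;
}

/* Record one successfully shifted input token.  */
void
error_gate_on_shift (error_gate *g)
{
  if (g->shifts_needed > 0)
    g->shifts_needed--;
}
```

With this model, a second error that arrives after only one or two shifted tokens is silent and restarts the count, so a burst of related errors produces a single message.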
- -Note that rules which accept the @code{error} token may have actions, just -as any other rules can. - -@findex yyerrok -You can make error messages resume immediately by using the macro -@code{yyerrok} in an action. If you do this in the error rule's action, no -error messages will be suppressed. This macro requires no arguments; -@samp{yyerrok;} is a valid C statement. - -@findex yyclearin -The previous lookahead token is reanalyzed immediately after an error. If -this is unacceptable, then the macro @code{yyclearin} may be used to clear -this token. Write the statement @samp{yyclearin;} in the error rule's -action. -@xref{Action Features, ,Special Features for Use in Actions}. - -For example, suppose that on a syntax error, an error handling routine is -called that advances the input stream to some point where parsing should -once again commence. The next symbol returned by the lexical scanner is -probably correct. The previous lookahead token ought to be discarded -with @samp{yyclearin;}. - -@vindex YYRECOVERING -The expression @code{YYRECOVERING ()} yields 1 when the parser -is recovering from a syntax error, and 0 otherwise. -Syntax error diagnostics are suppressed while recovering from a syntax -error. - -@node Context Dependency -@chapter Handling Context Dependencies - -The Bison paradigm is to parse tokens first, then group them into larger -syntactic units. In many languages, the meaning of a token is affected by -its context. Although this violates the Bison paradigm, certain techniques -(known as @dfn{kludges}) may enable you to write Bison parsers for such -languages. - -@menu -* Semantic Tokens:: Token parsing can depend on the semantic context. -* Lexical Tie-ins:: Token parsing can depend on the syntactic context. -* Tie-in Recovery:: Lexical tie-ins have implications for how - error recovery rules must be written. -@end menu - -(Actually, ``kludge'' means any technique that gets its job done but is -neither clean nor robust.) 
- -@node Semantic Tokens -@section Semantic Info in Token Types - -The C language has a context dependency: the way an identifier is used -depends on what its current meaning is. For example, consider this: - -@example -foo (x); -@end example - -This looks like a function call statement, but if @code{foo} is a typedef -name, then this is actually a declaration of @code{x}. How can a Bison -parser for C decide how to parse this input? - -The method used in GNU C is to have two different token types, -@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an -identifier, it looks up the current declaration of the identifier in order -to decide which token type to return: @code{TYPENAME} if the identifier is -declared as a typedef, @code{IDENTIFIER} otherwise. - -The grammar rules can then express the context dependency by the choice of -token type to recognize. @code{IDENTIFIER} is accepted as an expression, -but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but -@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier -is @emph{not} significant, such as in declarations that can shadow a -typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is -accepted---there is one rule for each of the two token types. - -This technique is simple to use if the decision of which kinds of -identifiers to allow is made at a place close to where the identifier is -parsed. But in C this is not always so: C allows a declaration to -redeclare a typedef name provided an explicit type has been specified -earlier: - -@example -typedef int foo, bar; -int baz (void) -@group -@{ - static bar (bar); /* @r{redeclare @code{bar} as static variable} */ - extern foo foo (foo); /* @r{redeclare @code{foo} as function} */ - return foo (bar); -@} -@end group -@end example - -Unfortunately, the name being declared is separated from the declaration -construct itself by a complicated syntactic structure---the ``declarator''. 
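The lookup that @code{yylex} performs in the two-token-type method described above can be sketched in plain C. This is a minimal sketch under stated assumptions: the token codes, the fixed-size table, and the function names @code{declare_typedef} and @code{classify_identifier} are all hypothetical, and a real compiler would consult its scoped symbol table instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; in a real parser they come from the
   header that Bison generates.  */
enum token_kind { IDENTIFIER = 258, TYPENAME = 259 };

/* A deliberately tiny stand-in for the compiler's symbol table.  */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Record NAME as a typedef name.  Return 0 on success, -1 if full.  */
int
declare_typedef (const char *name)
{
  if (typedef_count == MAX_TYPEDEFS)
    return -1;
  typedef_names[typedef_count++] = name;
  return 0;
}

/* Choose the token type for an identifier, as the scanner would:
   TYPENAME if it is currently declared as a typedef, else IDENTIFIER.  */
enum token_kind
classify_identifier (const char *name)
{
  for (int i = 0; i < typedef_count; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

The same spelling can therefore yield different tokens at different points of the input, depending on the declarations seen so far.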
- -As a result, part of the Bison parser for C needs to be duplicated, with -all the nonterminal names changed: once for parsing a declaration in -which a typedef name can be redefined, and once for parsing a -declaration in which that can't be done. Here is a part of the -duplication, with actions omitted for brevity: - -@example -@group -initdcl: - declarator maybeasm '=' init -| declarator maybeasm -; -@end group - -@group -notype_initdcl: - notype_declarator maybeasm '=' init -| notype_declarator maybeasm -; -@end group -@end example - -@noindent -Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl} -cannot. The distinction between @code{declarator} and -@code{notype_declarator} is the same sort of thing. - -There is some similarity between this technique and a lexical tie-in -(described next), in that information which alters the lexical analysis is -changed during parsing by other parts of the program. The difference is -that here the information is global, and is used for other purposes in the -program. A true lexical tie-in has a special-purpose flag controlled by -the syntactic context. - -@node Lexical Tie-ins -@section Lexical Tie-ins -@cindex lexical tie-in - -One way to handle context-dependency is the @dfn{lexical tie-in}: a flag -which is set by Bison actions, whose purpose is to alter the way tokens are -parsed. - -For example, suppose we have a language vaguely like C, but with a special -construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes -an expression in parentheses in which all integers are hexadecimal. In -particular, the token @samp{a1b} must be treated as an integer rather than -as an identifier if it appears in that context.
Here is how you can do it: - -@example -@group -%@{ - int hexflag; - int yylex (void); - void yyerror (char const *); -%@} -%% -@dots{} -@end group -@group -expr: - IDENTIFIER -| constant -| HEX '(' @{ hexflag = 1; @} - expr ')' @{ hexflag = 0; $$ = $4; @} -| expr '+' expr @{ $$ = make_sum ($1, $3); @} -@dots{} -; -@end group - -@group -constant: - INTEGER -| STRING -; -@end group -@end example - -@noindent -Here we assume that @code{yylex} looks at the value of @code{hexflag}; when -it is nonzero, all integers are parsed in hexadecimal, and tokens starting -with letters are parsed as integers if possible. - -The declaration of @code{hexflag} shown in the prologue of the grammar -file is needed to make it accessible to the actions (@pxref{Prologue, -,The Prologue}). You must also write the code in @code{yylex} to obey -the flag. - -@node Tie-in Recovery -@section Lexical Tie-ins and Error Recovery - -Lexical tie-ins make strict demands on any error recovery rules you have. -@xref{Error Recovery}. - -The reason for this is that the purpose of an error recovery rule is to -abort the parsing of one construct and resume in some larger construct. -For example, in C-like languages, a typical error recovery rule is to skip -tokens until the next semicolon, and then start a new statement, like this: - -@example -stmt: - expr ';' -| IF '(' expr ')' stmt @{ @dots{} @} -@dots{} -| error ';' @{ hexflag = 0; @} -; -@end example - -If there is a syntax error in the middle of a @samp{hex (@var{expr})} -construct, this error rule will apply, and then the action for the -completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would -remain set for the entire rest of the input, or until the next @code{hex} -keyword, causing identifiers to be misinterpreted as integers. - -To avoid this problem the error recovery rule itself clears @code{hexflag}. - -There may also be an error recovery rule that works within expressions. 
-For example, there could be a rule which applies within parentheses -and skips to the close-parenthesis: - -@example -@group -expr: - @dots{} -| '(' expr ')' @{ $$ = $2; @} -| '(' error ')' -@dots{} -@end group -@end example - -If this rule acts within the @code{hex} construct, it is not going to abort -that construct (since it applies to an inner level of parentheses within -the construct). Therefore, it should not clear the flag: the rest of -the @code{hex} construct should be parsed with the flag still in effect. - -What if there is an error recovery rule which might abort out of the -@code{hex} construct or might not, depending on circumstances? There is no -way you can write the action to determine whether a @code{hex} construct is -being aborted or not. So if you are using a lexical tie-in, you had better -make sure your error recovery rules are not of this kind. Each rule must -be such that you can be sure that it always will, or always won't, have to -clear the flag. - -@c ================================================== Debugging Your Parser - -@node Debugging -@chapter Debugging Your Parser - -Developing a parser can be a challenge, especially if you don't understand -the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This -chapter explains how to generate and read the detailed description of the -automaton, and how to enable and understand the parser run-time traces. - -@menu -* Understanding:: Understanding the structure of your parser. -* Tracing:: Tracing the execution of your parser. -@end menu - -@node Understanding -@section Understanding Your Parser - -As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm}) -Bison parsers are @dfn{shift/reduce automata}. In some cases (much more -frequent than one would hope), looking at this automaton is required to -tune or simply fix a parser. Bison provides two different -representations of it, either textual or graphical (as a DOT file).
- -The textual file is generated when the options @option{--report} or -@option{--verbose} are specified, see @ref{Invocation, , Invoking -Bison}. Its name is made by removing @samp{.tab.c} or @samp{.c} from -the parser implementation file name, and adding @samp{.output} -instead. Therefore, if the grammar file is @file{foo.y}, then the -parser implementation file is called @file{foo.tab.c} by default. As -a consequence, the verbose output file is called @file{foo.output}. - -The following grammar file, @file{calc.y}, will be used in the sequel: - -@example -%token NUM STR -%left '+' '-' -%left '*' -%% -exp: - exp '+' exp -| exp '-' exp -| exp '*' exp -| exp '/' exp -| NUM -; -useless: STR; -%% -@end example - -@command{bison} reports: - -@example -calc.y: warning: 1 nonterminal useless in grammar -calc.y: warning: 1 rule useless in grammar -calc.y:11.1-7: warning: nonterminal useless in grammar: useless -calc.y:11.10-12: warning: rule useless in grammar: useless: STR -calc.y: conflicts: 7 shift/reduce -@end example - -When given @option{--report=state}, in addition to @file{calc.tab.c}, it -creates a file @file{calc.output} with contents detailed below. The -order of the output and the exact presentation might vary, but the -interpretation is the same. - -@noindent -@cindex token, useless -@cindex useless token -@cindex nonterminal, useless -@cindex useless nonterminal -@cindex rule, useless -@cindex useless rule -The first section reports useless tokens, nonterminals and rules. Useless -nonterminals and rules are removed in order to produce a smaller parser, but -useless tokens are preserved, since they might be used by the scanner (note -the difference between ``useless'' and ``unused'' below): - -@example -Nonterminals useless in grammar - useless - -Terminals unused in grammar - STR - -Rules useless in grammar - 6 useless: STR -@end example - -@noindent -The next section lists states that still have conflicts. 
- -@example -State 8 conflicts: 1 shift/reduce -State 9 conflicts: 1 shift/reduce -State 10 conflicts: 1 shift/reduce -State 11 conflicts: 4 shift/reduce -@end example - -@noindent -Then Bison reproduces the exact grammar it used: - -@example -Grammar - - 0 $accept: exp $end - - 1 exp: exp '+' exp - 2 | exp '-' exp - 3 | exp '*' exp - 4 | exp '/' exp - 5 | NUM -@end example - -@noindent -and reports the uses of the symbols: - -@example -@group -Terminals, with rules where they appear - -$end (0) 0 -'*' (42) 3 -'+' (43) 1 -'-' (45) 2 -'/' (47) 4 -error (256) -NUM (258) 5 -STR (259) -@end group - -@group -Nonterminals, with rules where they appear - -$accept (9) - on left: 0 -exp (10) - on left: 1 2 3 4 5, on right: 0 1 2 3 4 -@end group -@end example - -@noindent -@cindex item -@cindex pointed rule -@cindex rule, pointed -Bison then proceeds onto the automaton itself, describing each state -with its set of @dfn{items}, also known as @dfn{pointed rules}. Each -item is a production rule together with a point (@samp{.}) marking -the location of the input cursor. - -@example -state 0 - - 0 $accept: . exp $end - - NUM shift, and go to state 1 - - exp go to state 2 -@end example - -This reads as follows: ``state 0 corresponds to being at the very -beginning of the parsing, in the initial rule, right before the start -symbol (here, @code{exp}). When the parser returns to this state right -after having reduced a rule that produced an @code{exp}, the control -flow jumps to state 2. If there is no such transition on a nonterminal -symbol, and the lookahead is a @code{NUM}, then this token is shifted onto -the parse stack, and the control flow jumps to state 1. 
Any other -lookahead triggers a syntax error.'' - -@cindex core, item set -@cindex item set core -@cindex kernel, item set -@cindex item set core -Even though the only active rule in state 0 seems to be rule 0, the -report lists @code{NUM} as a lookahead token because @code{NUM} can be -at the beginning of any rule deriving an @code{exp}. By default Bison -reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if -you want to see more detail you can invoke @command{bison} with -@option{--report=itemset} to list the derived items as well: - -@example -state 0 - - 0 $accept: . exp $end - 1 exp: . exp '+' exp - 2 | . exp '-' exp - 3 | . exp '*' exp - 4 | . exp '/' exp - 5 | . NUM - - NUM shift, and go to state 1 - - exp go to state 2 -@end example - -@noindent -In the state 1@dots{} - -@example -state 1 - - 5 exp: NUM . - - $default reduce using rule 5 (exp) -@end example - -@noindent -the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token -(@samp{$default}), the parser will reduce it. If it was coming from -state 0, then, after this reduction it will return to state 0, and will -jump to state 2 (@samp{exp: go to state 2}). - -@example -state 2 - - 0 $accept: exp . $end - 1 exp: exp . '+' exp - 2 | exp . '-' exp - 3 | exp . '*' exp - 4 | exp . '/' exp - - $end shift, and go to state 3 - '+' shift, and go to state 4 - '-' shift, and go to state 5 - '*' shift, and go to state 6 - '/' shift, and go to state 7 -@end example - -@noindent -In state 2, the automaton can only shift a symbol. For instance, -because of the item @samp{exp: exp . '+' exp}, if the lookahead is -@samp{+} it is shifted onto the parse stack, and the automaton -jumps to state 4, corresponding to the item @samp{exp: exp '+' . exp}. -Since there is no default action, any lookahead not listed triggers a syntax -error. - -@cindex accepting state -The state 3 is named the @dfn{final state}, or the @dfn{accepting -state}: - -@example -state 3 - - 0 $accept: exp $end . 
- - $default accept -@end example - -@noindent -the initial rule is completed (the start symbol and the end-of-input were -read), the parsing exits successfully. - -The interpretation of states 4 to 7 is straightforward, and is left to -the reader. - -@example -state 4 - - 1 exp: exp '+' . exp - - NUM shift, and go to state 1 - - exp go to state 8 - - -state 5 - - 2 exp: exp '-' . exp - - NUM shift, and go to state 1 - - exp go to state 9 - - -state 6 - - 3 exp: exp '*' . exp - - NUM shift, and go to state 1 - - exp go to state 10 - - -state 7 - - 4 exp: exp '/' . exp - - NUM shift, and go to state 1 - - exp go to state 11 -@end example - -As was announced at the beginning of the report, @samp{State 8 conflicts: -1 shift/reduce}: - -@example -state 8 - - 1 exp: exp . '+' exp - 1 | exp '+' exp . - 2 | exp . '-' exp - 3 | exp . '*' exp - 4 | exp . '/' exp - - '*' shift, and go to state 6 - '/' shift, and go to state 7 - - '/' [reduce using rule 1 (exp)] - $default reduce using rule 1 (exp) -@end example - -Indeed, there are two actions associated with the lookahead @samp{/}: -either shifting (and going to state 7), or reducing rule 1. The -conflict means that either the grammar is ambiguous, or the parser lacks -information to make the right decision. Indeed the grammar is -ambiguous: because we did not specify the precedence of @samp{/}, the -sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM / -NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) / -NUM}, which corresponds to reducing rule 1. - -Because in deterministic parsing only a single decision can be made, Bison -arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, , -Shift/Reduce Conflicts}. Discarded actions are reported between -square brackets. - -Note that all the previous states had a single possible action: either -shifting the next token and going to the corresponding state, or -reducing a single rule.
In the other cases, i.e., when shifting -@emph{and} reducing is possible or when @emph{several} reductions are -possible, the lookahead is required to select the action. State 8 is -one such state: if the lookahead is @samp{*} or @samp{/} then the action -is shifting, otherwise the action is reducing rule 1. In other words, -the first two items, corresponding to rule 1, are not eligible when the -lookahead token is @samp{*}, since we specified that @samp{*} has higher -precedence than @samp{+}. More generally, some items are eligible only -with some set of possible lookahead tokens. When run with -@option{--report=lookahead}, Bison specifies these lookahead tokens: - -@example -state 8 - - 1 exp: exp . '+' exp - 1 | exp '+' exp . [$end, '+', '-', '/'] - 2 | exp . '-' exp - 3 | exp . '*' exp - 4 | exp . '/' exp - - '*' shift, and go to state 6 - '/' shift, and go to state 7 - - '/' [reduce using rule 1 (exp)] - $default reduce using rule 1 (exp) -@end example - -Note however that while @samp{NUM + NUM / NUM} is ambiguous (which results in -the conflicts on @samp{/}), @samp{NUM + NUM * NUM} is not: the conflict was -solved thanks to associativity and precedence directives. If invoked with -@option{--report=solved}, Bison includes information about the solved -conflicts in the report: - -@example -Conflict between rule 1 and token '+' resolved as reduce (%left '+'). -Conflict between rule 1 and token '-' resolved as reduce (%left '-'). -Conflict between rule 1 and token '*' resolved as shift ('+' < '*'). -@end example - - -The remaining states are similar: - -@example -@group -state 9 - - 1 exp: exp . '+' exp - 2 | exp . '-' exp - 2 | exp '-' exp . - 3 | exp . '*' exp - 4 | exp . '/' exp - - '*' shift, and go to state 6 - '/' shift, and go to state 7 - - '/' [reduce using rule 2 (exp)] - $default reduce using rule 2 (exp) -@end group - -@group -state 10 - - 1 exp: exp . '+' exp - 2 | exp . '-' exp - 3 | exp . '*' exp - 3 | exp '*' exp . - 4 | exp . 
'/' exp - - '/' shift, and go to state 7 - - '/' [reduce using rule 3 (exp)] - $default reduce using rule 3 (exp) -@end group - -@group -state 11 - - 1 exp: exp . '+' exp - 2 | exp . '-' exp - 3 | exp . '*' exp - 4 | exp . '/' exp - 4 | exp '/' exp . - - '+' shift, and go to state 4 - '-' shift, and go to state 5 - '*' shift, and go to state 6 - '/' shift, and go to state 7 - - '+' [reduce using rule 4 (exp)] - '-' [reduce using rule 4 (exp)] - '*' [reduce using rule 4 (exp)] - '/' [reduce using rule 4 (exp)] - $default reduce using rule 4 (exp) -@end group -@end example - -@noindent -Observe that state 11 contains conflicts not only due to the lack of -precedence of @samp{/} with respect to @samp{+}, @samp{-}, and -@samp{*}, but also because the -associativity of @samp{/} is not specified. - - -@node Tracing -@section Tracing Your Parser -@findex yydebug -@cindex debugging -@cindex tracing the parser - -When a Bison grammar compiles properly but parses ``incorrectly'', the -@code{yydebug} parser-trace feature helps you figure out why. - -@menu -* Enabling Traces:: Activating run-time trace support -* Mfcalc Traces:: Extending @code{mfcalc} to support traces -* The YYPRINT Macro:: Obsolete interface for semantic value reports -@end menu - -@node Enabling Traces -@subsection Enabling Traces -There are several means to enable compilation of trace facilities: - -@table @asis -@item the macro @code{YYDEBUG} -@findex YYDEBUG -Define the macro @code{YYDEBUG} to a nonzero value when you compile the -parser. This is compliant with POSIX Yacc. You could use -@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define -YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The -Prologue}). - -@item the option @option{-t}, @option{--debug} -Use the @samp{-t} option when you run Bison (@pxref{Invocation, -,Invoking Bison}). This is POSIX compliant too.
- -@item the directive @samp{%debug} -@findex %debug -Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison -Declaration Summary}). This is a Bison extension, which will prove -useful when Bison outputs parsers for languages that don't use a -preprocessor. Unless POSIX and Yacc portability matter to -you, this is -the preferred solution. -@end table - -We suggest that you always enable the debug option so that debugging is -always possible. - -@findex YYFPRINTF -The trace facility outputs messages with macro calls of the form -@code{YYFPRINTF (stderr, @var{format}, @var{args})} where -@var{format} and @var{args} are the usual @code{printf} format and variadic -arguments. If you define @code{YYDEBUG} to a nonzero value but do not -define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included -and @code{YYFPRINTF} is defined to @code{fprintf}. - -Once you have compiled the program with trace facilities, the way to -request a trace is to store a nonzero value in the variable @code{yydebug}. -You can do this by making the C code do it (in @code{main}, perhaps), or -you can alter the value with a C debugger. - -Each step taken by the parser when @code{yydebug} is nonzero produces a -line or two of trace information, written on @code{stderr}. The trace -messages tell you these things: - -@itemize @bullet -@item -Each time the parser calls @code{yylex}, what kind of token was read. - -@item -Each time a token is shifted, the depth and complete contents of the -state stack (@pxref{Parser States}). - -@item -Each time a rule is reduced, which rule it is, and the complete contents -of the state stack afterward. -@end itemize - -To make sense of this information, it helps to refer to the automaton -description file (@pxref{Understanding, ,Understanding Your Parser}). -This file shows the meaning of each state in terms of -positions in various rules, and also what each state will do with each -possible input token.
As you read the successive trace messages, you -can see that the parser is functioning according to its specification in -the listing file. Eventually you will arrive at the place where -something undesirable happens, and you will see which parts of the -grammar are to blame. - -The parser implementation file is a C/C++/Java program and you can use -debuggers on it, but it's not easy to interpret what it is doing. The -parser function is a finite-state machine interpreter, and aside from -the actions it executes the same code over and over. Only the values -of variables show where in the grammar it is working. - -@node Mfcalc Traces -@subsection Enabling Debug Traces for @code{mfcalc} - -The debugging information normally gives the token type of each token read, -but not its semantic value. The @code{%printer} directive allows you to specify -how semantic values are reported, see @ref{Printer Decl, , Printing -Semantic Values}. For backward compatibility, Yacc-like C parsers may also -use the @code{YYPRINT} macro (@pxref{The YYPRINT Macro, , The @code{YYPRINT} -Macro}), but its use is discouraged. - -As a demonstration of @code{%printer}, consider the multi-function -calculator, @code{mfcalc} (@pxref{Multi-function Calc}). To enable run-time -traces, and semantic value reports, insert the following directives in its -prologue: - -@comment file: mfcalc.y: 2 -@example -/* Generate the parser description file. */ -%verbose -/* Enable run-time traces (yydebug). */ -%define parse.trace - -/* Formatting semantic values. */ -%printer @{ fprintf (yyoutput, "%s", $$->name); @} VAR; -%printer @{ fprintf (yyoutput, "%s()", $$->name); @} FNCT; -%printer @{ fprintf (yyoutput, "%g", $$); @} <val>; -@end example - -The @code{%define} directive instructs Bison to generate run-time trace -support. Then, activation of these traces is controlled at run-time by the -@code{yydebug} variable, which is disabled by default.
Because these traces -will refer to the ``states'' of the parser, it is helpful to ask for the -creation of a description of that parser; this is the purpose of the (admittedly -ill-named) @code{%verbose} directive. - -The set of @code{%printer} directives demonstrates how to format the -semantic value in the traces. Note that the specification can be done -either on the symbol type (e.g., @code{VAR} or @code{FNCT}), or on the type -tag: since @code{<val>} is the type for both @code{NUM} and @code{exp}, this -printer will be used for them. - -Here is a sample of the information provided by run-time traces. The traces -are sent to standard error. - -@example -$ @kbd{echo 'sin(1-1)' | ./mfcalc -p} -Starting parse -Entering state 0 -Reducing stack by rule 1 (line 34): --> $$ = nterm input () -Stack now 0 -Entering state 1 -@end example - -@noindent -This first batch shows a specific feature of this grammar: the first rule -(which is in line 34 of @file{mfcalc.y}) can be reduced without even having -to look for the first token. The resulting left-hand symbol (@code{$$}) is -a valueless (@samp{()}) @code{input} nonterminal (@code{nterm}). - -Then the parser calls the scanner. -@example -Reading a token: Next token is token FNCT (sin()) -Shifting token FNCT (sin()) -Entering state 6 -@end example - -@noindent -That token (@code{token}) is a function (@code{FNCT}) whose value is -@samp{sin} as formatted per our @code{%printer} specification: @samp{sin()}. -The parser stores (@code{Shifting}) that token, and others, until it can do -something about it.
- -@example -Reading a token: Next token is token '(' () -Shifting token '(' () -Entering state 14 -Reading a token: Next token is token NUM (1.000000) -Shifting token NUM (1.000000) -Entering state 4 -Reducing stack by rule 6 (line 44): - $1 = token NUM (1.000000) --> $$ = nterm exp (1.000000) -Stack now 0 1 6 14 -Entering state 24 -@end example - -@noindent -The previous reduction demonstrates the @code{%printer} directive for -@code{<val>}: both the token @code{NUM} and the resulting nonterminal -@code{exp} have @samp{1} as value. - -@example -Reading a token: Next token is token '-' () -Shifting token '-' () -Entering state 17 -Reading a token: Next token is token NUM (1.000000) -Shifting token NUM (1.000000) -Entering state 4 -Reducing stack by rule 6 (line 44): - $1 = token NUM (1.000000) --> $$ = nterm exp (1.000000) -Stack now 0 1 6 14 24 17 -Entering state 26 -Reading a token: Next token is token ')' () -Reducing stack by rule 11 (line 49): - $1 = nterm exp (1.000000) - $2 = token '-' () - $3 = nterm exp (1.000000) --> $$ = nterm exp (0.000000) -Stack now 0 1 6 14 -Entering state 24 -@end example - -@noindent -The rule for the subtraction was just reduced. The parser is about to -discover the end of the call to @code{sin}. - -@example -Next token is token ')' () -Shifting token ')' () -Entering state 31 -Reducing stack by rule 9 (line 47): - $1 = token FNCT (sin()) - $2 = token '(' () - $3 = nterm exp (0.000000) - $4 = token ')' () --> $$ = nterm exp (0.000000) -Stack now 0 1 -Entering state 11 -@end example - -@noindent -Finally, the end-of-line allows the parser to complete the computation, and -display its result.
- -@example -Reading a token: Next token is token '\n' () -Shifting token '\n' () -Entering state 22 -Reducing stack by rule 4 (line 40): - $1 = nterm exp (0.000000) - $2 = token '\n' () -@result{} 0 --> $$ = nterm line () -Stack now 0 1 -Entering state 10 -Reducing stack by rule 2 (line 35): - $1 = nterm input () - $2 = nterm line () --> $$ = nterm input () -Stack now 0 -Entering state 1 -@end example - -The parser has returned into state 1, in which it is waiting for the next -expression to evaluate, or for the end-of-file token, which causes the -completion of the parsing. - -@example -Reading a token: Now at end of input. -Shifting token $end () -Entering state 2 -Stack now 0 1 2 -Cleanup: popping token $end () -Cleanup: popping nterm input () -@end example - - -@node The YYPRINT Macro -@subsection The @code{YYPRINT} Macro - -@findex YYPRINT -Before @code{%printer} support, semantic values could be displayed using the -@code{YYPRINT} macro, which works only for terminal symbols and only with -the @file{yacc.c} skeleton. - -@deffn {Macro} YYPRINT (@var{stream}, @var{token}, @var{value}); -@findex YYPRINT -If you define @code{YYPRINT}, it should take three arguments. The parser -will pass a standard I/O stream, the numeric code for the token type, and -the token value (from @code{yylval}). - -For @file{yacc.c} only. Obsoleted by @code{%printer}. 
-@end deffn - -Here is an example of @code{YYPRINT} suitable for the multi-function -calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}): - -@example -%@{ - static void print_token_value (FILE *, int, YYSTYPE); - #define YYPRINT(File, Type, Value) \ - print_token_value (File, Type, Value) -%@} - -@dots{} %% @dots{} %% @dots{} - -static void -print_token_value (FILE *file, int type, YYSTYPE value) -@{ - if (type == VAR) - fprintf (file, "%s", value.tptr->name); - else if (type == NUM) - fprintf (file, "%f", value.val); -@} -@end example - -@c ================================================= Invoking Bison - -@node Invocation -@chapter Invoking Bison -@cindex invoking Bison -@cindex Bison invocation -@cindex options for invoking Bison - -The usual way to invoke Bison is as follows: - -@example -bison @var{infile} -@end example - -Here @var{infile} is the grammar file name, which usually ends in -@samp{.y}. The parser implementation file's name is made by replacing -the @samp{.y} with @samp{.tab.c} and removing any leading directory. -Thus, @samp{bison foo.y} yields @file{foo.tab.c}, and -@samp{bison hack/foo.y} also yields @file{foo.tab.c}. If you are -writing C++ code instead of C in your grammar file, you can also -name it @file{foo.ypp} or @file{foo.y++}. The output files then -take an extension modeled on that of the input file -(respectively @file{foo.tab.cpp} and @file{foo.tab.c++}). This -feature takes effect with all options that manipulate file names like -@samp{-o} or @samp{-d}. - -For example: - -@example -bison -d @var{infile.yxx} -@end example -@noindent -will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}, and - -@example -bison -d -o @var{output.c++} @var{infile.y} -@end example -@noindent -will produce @file{output.c++} and @file{output.h++}.
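The naming rule described above can be mimicked in a few lines of shell. This is only a sketch: @code{bison_output_name} is a hypothetical helper written for this illustration, not something Bison provides, and it assumes a grammar file whose extension begins with @samp{y}, as in the cases listed above.

```shell
# Hypothetical helper (not part of Bison): predict the name of the
# parser implementation file from the grammar file name, following
# the rule described above.
bison_output_name () {
  base=${1##*/}                   # remove any leading directory
  case $base in
    *.*) ext=${base##*.} ;;       # grammar extension: y, ypp, y++, yxx
    *)   return 1 ;;              # no extension: outside this sketch
  esac
  case $ext in
    y*) ;;                        # only extensions starting with "y"
    *)  return 1 ;;
  esac
  # Keep the stem, add ".tab.", and turn the leading "y" into "c".
  printf '%s.tab.c%s\n' "${base%.*}" "${ext#y}"
}

bison_output_name hack/foo.y
bison_output_name foo.ypp
bison_output_name infile.yxx
```

The three calls print @file{foo.tab.c}, @file{foo.tab.cpp} and @file{infile.tab.cxx}. The header written by @samp{-d} follows the same pattern with @samp{h} in place of @samp{c}.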
- -For compatibility with POSIX, the standard Bison -distribution also contains a shell script called @command{yacc} that -invokes Bison with the @option{-y} option. - -@menu -* Bison Options:: All the options described in detail, - in alphabetical order by short options. -* Option Cross Key:: Alphabetical list of long options. -* Yacc Library:: Yacc-compatible @code{yylex} and @code{main}. -@end menu - -@node Bison Options -@section Bison Options - -Bison supports both traditional single-letter options and mnemonic long -option names. Long option names are indicated with @samp{--} instead of -@samp{-}. Abbreviations for option names are allowed as long as they -are unique. When a long option takes an argument, like -@samp{--file-prefix}, connect the option name and the argument with -@samp{=}. - -Here is a list of options that can be used with Bison, alphabetized by -short option. It is followed by a cross key alphabetized by long -option. - -@c Please, keep this ordered as in `bison --help'. -@noindent -Operations modes: -@table @option -@item -h -@itemx --help -Print a summary of the command-line options to Bison and exit. - -@item -V -@itemx --version -Print the version number of Bison and exit. - -@item --print-localedir -Print the name of the directory containing locale-dependent data. - -@item --print-datadir -Print the name of the directory containing skeletons and XSLT. - -@item -y -@itemx --yacc -Act more like the traditional Yacc command. This can cause different -diagnostics to be generated, and may change behavior in other minor -ways. Most importantly, imitate Yacc's output file name conventions, -so that the parser implementation file is called @file{y.tab.c}, and -the other outputs are called @file{y.output} and @file{y.tab.h}. -Also, if generating a deterministic parser in C, generate -@code{#define} statements in addition to an @code{enum} to associate -token numbers with token names. 
Thus, the following shell script can -substitute for Yacc, and the Bison distribution contains such a script -for compatibility with POSIX: - -@example -#! /bin/sh -bison -y "$@@" -@end example - -The @option{-y}/@option{--yacc} option is intended for use with -traditional Yacc grammars. If your grammar uses a Bison extension -like @samp{%glr-parser}, Bison might not be Yacc-compatible even if -this option is specified. - -@item -W [@var{category}] -@itemx --warnings[=@var{category}] -Output warnings falling in @var{category}. @var{category} can be one -of: -@table @code -@item midrule-values -Warn about mid-rule values that are set but not used within any of the actions -of the parent rule. -For example, warn about unused @code{$2} in: - -@example -exp: '1' @{ $$ = 1; @} '+' exp @{ $$ = $1 + $4; @}; -@end example - -Also warn about mid-rule values that are used but not set. -For example, warn about unset @code{$$} in the mid-rule action in: - -@example -exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @}; -@end example - -These warnings are not enabled by default since they sometimes prove to -be false alarms in existing grammars employing the Yacc constructs -@code{$0} or @code{$-@var{n}} (where @var{n} is some positive integer). - -@item yacc -Incompatibilities with POSIX Yacc. - -@item conflicts-sr -@itemx conflicts-rr -S/R and R/R conflicts. These warnings are enabled by default. However, if -the @code{%expect} or @code{%expect-rr} directive is specified, an -unexpected number of conflicts is an error, and an expected number of -conflicts is not reported, so @option{-W} and @option{--warning} then have -no effect on the conflict report. - -@item other -All warnings not categorized above. These warnings are enabled by default. - -This category is provided merely for the sake of completeness. Future -releases of Bison may move warnings from this category to new, more specific -categories. - -@item all -All the warnings. -@item none -Turn off all the warnings. 
-@item error -Treat warnings as errors. -@end table - -A category can be turned off by prefixing its name with @samp{no-}. For -instance, @option{-Wno-yacc} will hide the warnings about -POSIX Yacc incompatibilities. -@end table - -@noindent -Tuning the parser: - -@table @option -@item -t -@itemx --debug -In the parser implementation file, define the macro @code{YYDEBUG} to -1 if it is not already defined, so that the debugging facilities are -compiled. @xref{Tracing, ,Tracing Your Parser}. - -@item -D @var{name}[=@var{value}] -@itemx --define=@var{name}[=@var{value}] -@itemx -F @var{name}[=@var{value}] -@itemx --force-define=@var{name}[=@var{value}] -Each of these is equivalent to @samp{%define @var{name} "@var{value}"} -(@pxref{%define Summary}) except that Bison processes multiple -definitions for the same @var{name} as follows: - -@itemize -@item -Bison quietly ignores all command-line definitions for @var{name} except -the last. -@item -If that command-line definition is specified by a @code{-D} or -@code{--define}, Bison reports an error for any @code{%define} -definition for @var{name}. -@item -If that command-line definition is specified by a @code{-F} or -@code{--force-define} instead, Bison quietly ignores all @code{%define} -definitions for @var{name}. -@item -Otherwise, Bison reports an error if there are multiple @code{%define} -definitions for @var{name}. -@end itemize - -You should avoid using @code{-F} and @code{--force-define} in your -make files unless you are confident that it is safe to quietly ignore -any conflicting @code{%define} that may be added to the grammar file. - -@item -L @var{language} -@itemx --language=@var{language} -Specify the programming language for the generated parser, as if -@code{%language} was specified (@pxref{Decl Summary, , Bison Declaration -Summary}). Currently supported languages include C, C++, and Java. -@var{language} is case-insensitive. 
- -This option is experimental and its effect may be modified in future -releases. - -@item --locations -Pretend that @code{%locations} was specified. @xref{Decl Summary}. - -@item -p @var{prefix} -@itemx --name-prefix=@var{prefix} -Pretend that @code{%name-prefix "@var{prefix}"} was specified. -@xref{Decl Summary}. - -@item -l -@itemx --no-lines -Don't put any @code{#line} preprocessor commands in the parser -implementation file. Ordinarily Bison puts them in the parser -implementation file so that the C compiler and debuggers will -associate errors with your source file, the grammar file. This option -causes them to associate errors with the parser implementation file, -treating it as an independent source file in its own right. - -@item -S @var{file} -@itemx --skeleton=@var{file} -Specify the skeleton to use, similar to @code{%skeleton} -(@pxref{Decl Summary, , Bison Declaration Summary}). - -@c You probably don't need this option unless you are developing Bison. -@c You should use @option{--language} if you want to specify the skeleton for a -@c different language, because it is clearer and because it will always -@c choose the correct skeleton for non-deterministic or push parsers. - -If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton -file in the Bison installation directory. -If it does, @var{file} is an absolute file name or a file name relative to the -current working directory. -This is similar to how most shells resolve commands. - -@item -k -@itemx --token-table -Pretend that @code{%token-table} was specified. @xref{Decl Summary}. -@end table - -@noindent -Adjust the output: - -@table @option -@item --defines[=@var{file}] -Pretend that @code{%defines} was specified, i.e., write an extra output -file containing macro definitions for the token type names defined in -the grammar, as well as a few other declarations. @xref{Decl Summary}. 
- -@item -d -This is the same as @code{--defines} except @code{-d} does not accept a -@var{file} argument since POSIX Yacc requires that @code{-d} can be bundled -with other short options. - -@item -b @var{file-prefix} -@itemx --file-prefix=@var{prefix} -Pretend that @code{%file-prefix} was specified, i.e., specify prefix to use -for all Bison output file names. @xref{Decl Summary}. - -@item -r @var{things} -@itemx --report=@var{things} -Write an extra output file containing verbose description of the comma -separated list of @var{things} among: - -@table @code -@item state -Description of the grammar, conflicts (resolved and unresolved), and -parser's automaton. - -@item lookahead -Implies @code{state} and augments the description of the automaton with -each rule's lookahead set. - -@item itemset -Implies @code{state} and augments the description of the automaton with -the full set of items for each state, instead of its core only. -@end table - -@item --report-file=@var{file} -Specify the @var{file} for the verbose description. - -@item -v -@itemx --verbose -Pretend that @code{%verbose} was specified, i.e., write an extra output -file containing verbose descriptions of the grammar and -parser. @xref{Decl Summary}. - -@item -o @var{file} -@itemx --output=@var{file} -Specify the @var{file} for the parser implementation file. - -The other output files' names are constructed from @var{file} as -described under the @samp{-v} and @samp{-d} options. - -@item -g [@var{file}] -@itemx --graph[=@var{file}] -Output a graphical representation of the parser's -automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz} -@uref{http://www.graphviz.org/doc/info/lang.html, DOT} format. -@code{@var{file}} is optional. -If omitted and the grammar file is @file{foo.y}, the output file will be -@file{foo.dot}. - -@item -x [@var{file}] -@itemx --xml[=@var{file}] -Output an XML report of the parser's automaton computed by Bison. -@code{@var{file}} is optional. 
-If omitted and the grammar file is @file{foo.y}, the output file will be -@file{foo.xml}. -(The current XML schema is experimental and may evolve. -More user feedback will help to stabilize it.) -@end table - -@node Option Cross Key -@section Option Cross Key - -Here is a list of options, alphabetized by long option, to help you find -the corresponding short option and directive. - -@multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}} -@headitem Long Option @tab Short Option @tab Bison Directive -@include cross-options.texi -@end multitable - -@node Yacc Library -@section Yacc Library - -The Yacc library contains default implementations of the -@code{yyerror} and @code{main} functions. These default -implementations are normally not useful, but POSIX requires -them. To use the Yacc library, link your program with the -@option{-ly} option. Note that Bison's implementation of the Yacc -library is distributed under the terms of the GNU General -Public License (@pxref{Copying}). - -If you use the Yacc library's @code{yyerror} function, you should -declare @code{yyerror} as follows: - -@example -int yyerror (char const *); -@end example - -Bison ignores the @code{int} value returned by this @code{yyerror}. -If you use the Yacc library's @code{main} function, your -@code{yyparse} function should have the following type signature: - -@example -int yyparse (void); -@end example - -@c ================================================= C++ Bison - -@node Other Languages -@chapter Parsers Written In Other Languages - -@menu -* C++ Parsers:: The interface to generate C++ parser classes -* Java Parsers:: The interface to generate Java parser classes -@end menu - -@node C++ Parsers -@section C++ Parsers - -@menu -* C++ Bison Interface:: Asking for C++ parser generation -* C++ Semantic Values:: %union vs. 
C++ -* C++ Location Values:: The position and location classes -* C++ Parser Interface:: Instantiating and running the parser -* C++ Scanner Interface:: Exchanges between yylex and parse -* A Complete C++ Example:: Demonstrating their use -@end menu - -@node C++ Bison Interface -@subsection C++ Bison Interface -@c - %skeleton "lalr1.cc" -@c - Always pure -@c - initial action - -The C++ deterministic parser is selected using the skeleton directive, -@samp{%skeleton "lalr1.cc"}, or the synonymous command-line option -@option{--skeleton=lalr1.cc}. -@xref{Decl Summary}. - -When run, @command{bison} will create several entities in the @samp{yy} -namespace. -@findex %define namespace -Use the @samp{%define namespace} directive to change the namespace -name, see @ref{%define Summary,,namespace}. The various classes are -generated in the following files: - -@table @file -@item position.hh -@itemx location.hh -The definition of the classes @code{position} and @code{location}, -used for location tracking. @xref{C++ Location Values}. - -@item stack.hh -An auxiliary class @code{stack} used by the parser. - -@item @var{file}.hh -@itemx @var{file}.cc -(Assuming the extension of the grammar file was @samp{.yy}.) The -declaration and implementation of the C++ parser class. The basename -and extension of these two files follow the same rules as with regular C -parsers (@pxref{Invocation}). - -The header is @emph{mandatory}; you must either pass -@option{-d}/@option{--defines} to @command{bison}, or use the -@samp{%defines} directive. -@end table - -All these files are documented using Doxygen; run @command{doxygen} -for a complete and accurate documentation. - -@node C++ Semantic Values -@subsection C++ Semantic Values -@c - No objects in unions -@c - YYSTYPE -@c - Printer and destructor - -The @code{%union} directive works as for C, see @ref{Union Decl, ,The -Collection of Value Types}. 
In particular it produces a genuine -@code{union}@footnote{In the future, techniques to allow complex types -within pseudo-unions (similar to Boost variants) might be implemented to -alleviate these issues.}, which has a few specific features in C++. -@itemize @minus -@item -The type @code{YYSTYPE} is defined but its use is discouraged: rather -you should refer to the parser's encapsulated type -@code{yy::parser::semantic_type}. -@item -Non POD (Plain Old Data) types cannot be used. C++ forbids any -instance of classes with constructors in unions: only @emph{pointers} -to such objects are allowed. -@end itemize - -Because objects have to be stored via pointers, memory is not -reclaimed automatically: using the @code{%destructor} directive is the -only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded -Symbols}. - - -@node C++ Location Values -@subsection C++ Location Values -@c - %locations -@c - class Position -@c - class Location -@c - %define filename_type "const symbol::Symbol" - -When the directive @code{%locations} is used, the C++ parser supports -location tracking, see @ref{Tracking Locations}. Two auxiliary classes -define a @code{position}, a single point in a file, and a @code{location}, a -range composed of a pair of @code{position}s (possibly spanning several -files). - -@tindex uint -In this section @code{uint} is an abbreviation for @code{unsigned int}: in -genuine code only the latter is used. - -@menu -* C++ position:: One point in the source file -* C++ location:: Two points in the source file -@end menu - -@node C++ position -@subsubsection C++ @code{position} - -@deftypeop {Constructor} {position} {} position (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) -Create a @code{position} denoting a given point. Note that @code{file} is -not reclaimed when the @code{position} is destroyed: memory management must be -handled elsewhere.
-@end deftypeop - -@deftypemethod {position} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) -Reset the position to the given values. -@end deftypemethod - -@deftypeivar {position} {std::string*} file -The name of the file. It will always be handled as a pointer, the -parser will never duplicate nor deallocate it. As an experimental -feature you may change it to @samp{@var{type}*} using @samp{%define -filename_type "@var{type}"}. -@end deftypeivar - -@deftypeivar {position} {uint} line -The line, starting at 1. -@end deftypeivar - -@deftypemethod {position} {uint} lines (int @var{height} = 1) -Advance by @var{height} lines, resetting the column number. -@end deftypemethod - -@deftypeivar {position} {uint} column -The column, starting at 1. -@end deftypeivar - -@deftypemethod {position} {uint} columns (int @var{width} = 1) -Advance by @var{width} columns, without changing the line number. -@end deftypemethod - -@deftypemethod {position} {position&} operator+= (int @var{width}) -@deftypemethodx {position} {position} operator+ (int @var{width}) -@deftypemethodx {position} {position&} operator-= (int @var{width}) -@deftypemethodx {position} {position} operator- (int @var{width}) -Various forms of syntactic sugar for @code{columns}. -@end deftypemethod - -@deftypemethod {position} {bool} operator== (const position& @var{that}) -@deftypemethodx {position} {bool} operator!= (const position& @var{that}) -Whether @code{*this} and @code{that} denote equal/different positions. -@end deftypemethod - -@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p}) -Report @var{p} on @var{o} like this: -@samp{@var{file}:@var{line}.@var{column}}, or -@samp{@var{line}.@var{column}} if @var{file} is null. 
-@end deftypefun - -@node C++ location -@subsubsection C++ @code{location} - -@deftypeop {Constructor} {location} {} location (const position& @var{begin}, const position& @var{end}) -Create a @code{location} from the endpoints of the range. -@end deftypeop - -@deftypeop {Constructor} {location} {} location (const position& @var{pos} = position()) -@deftypeopx {Constructor} {location} {} location (std::string* @var{file}, uint @var{line}, uint @var{col}) -Create a @code{location} denoting an empty range located at a given point. -@end deftypeop - -@deftypemethod {location} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1) -Reset the location to an empty range at the given values. -@end deftypemethod - -@deftypeivar {location} {position} begin -@deftypeivarx {location} {position} end -The first position of the range (inclusive), and the first position beyond it. -@end deftypeivar - -@deftypemethod {location} {uint} columns (int @var{width} = 1) -@deftypemethodx {location} {uint} lines (int @var{height} = 1) -Advance the @code{end} position. -@end deftypemethod - -@deftypemethod {location} {location} operator+ (const location& @var{end}) -@deftypemethodx {location} {location} operator+ (int @var{width}) -@deftypemethodx {location} {location} operator+= (int @var{width}) -Various forms of syntactic sugar. -@end deftypemethod - -@deftypemethod {location} {void} step () -Move @code{begin} onto @code{end}. -@end deftypemethod - -@deftypemethod {location} {bool} operator== (const location& @var{that}) -@deftypemethodx {location} {bool} operator!= (const location& @var{that}) -Whether @code{*this} and @code{that} denote equal/different ranges of -positions. -@end deftypemethod - -@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const location& @var{p}) -Report @var{p} on @var{o}, taking care of special cases such as: no -@code{filename} defined, or equal filename/line or column.
-@end deftypefun - -@node C++ Parser Interface -@subsection C++ Parser Interface -@c - define parser_class_name -@c - Ctor -@c - parse, error, set_debug_level, debug_level, set_debug_stream, -@c debug_stream. -@c - Reporting errors - -The output files @file{@var{output}.hh} and @file{@var{output}.cc} -declare and define the parser class in the namespace @code{yy}. The -class name defaults to @code{parser}, but may be changed using -@samp{%define parser_class_name "@var{name}"}. The interface of -this class is detailed below. It can be extended using the -@code{%parse-param} feature: its semantics is slightly changed since -it describes an additional member of the parser class, and an -additional argument for its constructor. - -@defcv {Type} {parser} {semantic_type} -@defcvx {Type} {parser} {location_type} -The types for semantic values and locations. -@end defcv - -@defcv {Type} {parser} {token} -A structure that contains (only) the @code{yytokentype} enumeration, which -defines the tokens. To refer to the token @code{FOO}, -use @code{yy::parser::token::FOO}. The scanner can use -@samp{typedef yy::parser::token token;} to ``import'' the token enumeration -(@pxref{Calc++ Scanner}). -@end defcv - -@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...) -Build a new parser object. There are no arguments by default, unless -@samp{%parse-param @{@var{type1} @var{arg1}@}} was used. -@end deftypemethod - -@deftypemethod {parser} {int} parse () -Run the syntactic analysis, and return 0 on success, 1 otherwise. -@end deftypemethod - -@deftypemethod {parser} {std::ostream&} debug_stream () -@deftypemethodx {parser} {void} set_debug_stream (std::ostream& @var{o}) -Get or set the stream used for tracing the parsing. It defaults to -@code{std::cerr}. -@end deftypemethod - -@deftypemethod {parser} {debug_level_type} debug_level () -@deftypemethodx {parser} {void} set_debug_level (debug_level_type @var{l}) -Get or set the tracing level.
Currently its value is either 0, no trace, -or nonzero, full tracing. -@end deftypemethod - -@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m}) -The definition for this member function must be supplied by the user: -the parser uses it to report a parser error occurring at @var{l}, -described by @var{m}. -@end deftypemethod - - -@node C++ Scanner Interface -@subsection C++ Scanner Interface -@c - prefix for yylex. -@c - Pure interface to yylex -@c - %lex-param - -The parser invokes the scanner by calling @code{yylex}. Contrary to C -parsers, C++ parsers are always pure: there is no point in using the -@code{%define api.pure} directive. Therefore the interface is as follows. - -@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...) -Return the next token. Its type is the return value, its semantic -value and location being @var{yylval} and @var{yylloc}. Invocations of -@samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments. -@end deftypemethod - - -@node A Complete C++ Example -@subsection A Complete C++ Example - -This section demonstrates the use of a C++ parser with a simple but -complete example. This example should be available on your system, -ready to compile, in the directory @dfn{../bison/examples/calc++}. It -focuses on the use of Bison, therefore the design of the various C++ -classes is very naive: no accessors, no encapsulation of members etc. -We will use a Lex scanner, and more precisely, a Flex scanner, to -demonstrate the various interactions. A hand-written scanner is -actually easier to interface with.
- -@menu -* Calc++ --- C++ Calculator:: The specifications -* Calc++ Parsing Driver:: An active parsing context -* Calc++ Parser:: A parser class -* Calc++ Scanner:: A pure C++ Flex scanner -* Calc++ Top Level:: Conducting the band -@end menu - -@node Calc++ --- C++ Calculator -@subsubsection Calc++ --- C++ Calculator - -Of course the grammar is dedicated to arithmetic: a single -expression, possibly preceded by variable assignments. An -environment containing possibly predefined variables, such as -@code{one} and @code{two}, is exchanged with the parser. An example -of valid input follows. - -@example -three := 3 -seven := one + two * three -seven * seven -@end example - -@node Calc++ Parsing Driver -@subsubsection Calc++ Parsing Driver -@c - An env -@c - A place to store error messages -@c - A place for the result - -To support a pure interface with the parser (and the scanner) the -technique of the ``parsing context'' is convenient: a structure -containing all the data to exchange. Since, in addition to simply -launching the parsing, there are several auxiliary tasks to execute (open -the file for parsing, instantiate the parser etc.), we recommend -transforming the simple parsing context structure into a fully blown -@dfn{parsing driver} class. - -The declaration of this driver class, @file{calc++-driver.hh}, is as -follows. The first part includes the CPP guard and imports the -required standard library components, and the declaration of the parser -class. - -@comment file: calc++-driver.hh -@example -#ifndef CALCXX_DRIVER_HH -# define CALCXX_DRIVER_HH -# include <string> -# include <map> -# include "calc++-parser.hh" -@end example - - -@noindent -Then comes the declaration of the scanning function. Flex expects -the signature of @code{yylex} to be defined in the macro -@code{YY_DECL}, and the C++ parser expects it to be declared. We can -factor both as follows. - -@comment file: calc++-driver.hh -@example -// Tell Flex the lexer's prototype ...
-# define YY_DECL \ - yy::calcxx_parser::token_type \ - yylex (yy::calcxx_parser::semantic_type* yylval, \ - yy::calcxx_parser::location_type* yylloc, \ - calcxx_driver& driver) -// ... and declare it for the parser's sake. -YY_DECL; -@end example - -@noindent -The @code{calcxx_driver} class is then declared with its most obvious -members. - -@comment file: calc++-driver.hh -@example -// Conducting the whole scanning and parsing of Calc++. -class calcxx_driver -@{ -public: - calcxx_driver (); - virtual ~calcxx_driver (); - - std::map<std::string, int> variables; - - int result; -@end example - -@noindent -To encapsulate the coordination with the Flex scanner, it is useful to -have two member functions to open and close the scanning phase. - -@comment file: calc++-driver.hh -@example - // Handling the scanner. - void scan_begin (); - void scan_end (); - bool trace_scanning; -@end example - -@noindent -Similarly for the parser itself. - -@comment file: calc++-driver.hh -@example - // Run the parser. Return 0 on success. - int parse (const std::string& f); - std::string file; - bool trace_parsing; -@end example - -@noindent -To demonstrate pure handling of parse errors, instead of simply -dumping them on the standard error output, we will pass them to the -compiler driver using the following two member functions. Finally, we -close the class declaration and CPP guard. - -@comment file: calc++-driver.hh -@example - // Error handling. - void error (const yy::location& l, const std::string& m); - void error (const std::string& m); -@}; -#endif // ! CALCXX_DRIVER_HH -@end example - -The implementation of the driver is straightforward. The @code{parse} -member function deserves some attention. The @code{error} functions -are simple stubs; they should actually register the located error -messages and set error state.
- -@comment file: calc++-driver.cc -@example -#include "calc++-driver.hh" -#include "calc++-parser.hh" - -calcxx_driver::calcxx_driver () - : trace_scanning (false), trace_parsing (false) -@{ - variables["one"] = 1; - variables["two"] = 2; -@} - -calcxx_driver::~calcxx_driver () -@{ -@} - -int -calcxx_driver::parse (const std::string &f) -@{ - file = f; - scan_begin (); - yy::calcxx_parser parser (*this); - parser.set_debug_level (trace_parsing); - int res = parser.parse (); - scan_end (); - return res; -@} - -void -calcxx_driver::error (const yy::location& l, const std::string& m) -@{ - std::cerr << l << ": " << m << std::endl; -@} - -void -calcxx_driver::error (const std::string& m) -@{ - std::cerr << m << std::endl; -@} -@end example - -@node Calc++ Parser -@subsubsection Calc++ Parser - -The grammar file @file{calc++-parser.yy} starts by asking for the C++ -deterministic parser skeleton and the creation of the parser header file, -and by specifying the name of the parser class. Because the C++ skeleton -changed several times, it is safer to require the version you designed -the grammar for. - -@comment file: calc++-parser.yy -@example -%skeleton "lalr1.cc" /* -*- C++ -*- */ -%require "@value{VERSION}" -%defines -%define parser_class_name "calcxx_parser" -@end example - -@noindent -@findex %code requires -Then come the declarations/inclusions needed to define the -@code{%union}. Because the parser uses the parsing driver and -vice versa, the two headers cannot include each other. Because the -driver's header needs detailed knowledge about the parser class (in -particular its inner types), it is the parser's header which will simply -use a forward declaration of the driver. -@xref{%code Summary}. - -@comment file: calc++-parser.yy -@example -%code requires @{ -# include <string> -class calcxx_driver; -@} -@end example - -@noindent -The driver is passed by reference to the parser and to the scanner.
-This provides a simple but effective pure interface, not relying on -global variables. - -@comment file: calc++-parser.yy -@example -// The parsing context. -%parse-param @{ calcxx_driver& driver @} -%lex-param @{ calcxx_driver& driver @} -@end example - -@noindent -Then we request the location tracking feature, and initialize the -first location's file name. Afterward new locations are computed -relative to the previous locations: the file name will be -automatically propagated. - -@comment file: calc++-parser.yy -@example -%locations -%initial-action -@{ - // Initialize the initial location. - @@$.begin.filename = @@$.end.filename = &driver.file; -@}; -@end example - -@noindent -Use the two following directives to enable parser tracing and verbose error -messages. However, verbose error messages can contain incorrect information -(@pxref{LAC}). - -@comment file: calc++-parser.yy -@example -%debug -%error-verbose -@end example - -@noindent -Semantic values cannot use ``real'' objects, but only pointers to -them. - -@comment file: calc++-parser.yy -@example -// Symbols. -%union -@{ - int ival; - std::string *sval; -@}; -@end example - -@noindent -@findex %code -The code between @samp{%code @{} and @samp{@}} is output in the -@file{*.cc} file; it needs detailed knowledge about the driver. - -@comment file: calc++-parser.yy -@example -%code @{ -# include "calc++-driver.hh" -@} -@end example - - -@noindent -The token numbered as 0 corresponds to end of file; the following line -allows for nicer error messages referring to ``end of file'' instead -of ``$end''. Similarly, user-friendly names are provided for each -symbol. Note that the token names are prefixed by @code{TOKEN_} to -avoid name clashes. - -@comment file: calc++-parser.yy -@example -%token END 0 "end of file" -%token ASSIGN ":=" -%token <sval> IDENTIFIER "identifier" -%token <ival> NUMBER "number" -%type <ival> exp -@end example - -@noindent -To enable memory deallocation during error recovery, use -@code{%destructor}.
- -@c FIXME: Document %printer, and mention that it takes a braced-code operand. -@comment file: calc++-parser.yy -@example -%printer @{ yyoutput << *$$; @} "identifier" -%destructor @{ delete $$; @} "identifier" - -%printer @{ yyoutput << $$; @} <ival> -@end example - -@noindent -The grammar itself is straightforward. - -@comment file: calc++-parser.yy -@example -%% -%start unit; -unit: assignments exp @{ driver.result = $2; @}; - -assignments: - /* Nothing. */ @{@} -| assignments assignment @{@}; - -assignment: - "identifier" ":=" exp - @{ driver.variables[*$1] = $3; delete $1; @}; - -%left '+' '-'; -%left '*' '/'; -exp: exp '+' exp @{ $$ = $1 + $3; @} - | exp '-' exp @{ $$ = $1 - $3; @} - | exp '*' exp @{ $$ = $1 * $3; @} - | exp '/' exp @{ $$ = $1 / $3; @} - | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} - | "number" @{ $$ = $1; @}; -%% -@end example - -@noindent -Finally, the @code{error} member function reports the errors to the -driver. - -@comment file: calc++-parser.yy -@example -void -yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l, - const std::string& m) -@{ - driver.error (l, m); -@} -@end example - -@node Calc++ Scanner -@subsubsection Calc++ Scanner - -The Flex scanner first includes the driver declaration, then the -parser's to get the set of defined tokens. - -@comment file: calc++-scanner.ll -@example -%@{ /* -*- C++ -*- */ -# include <cerrno> -# include <climits> -# include <cstdlib> -# include <string> -# include "calc++-driver.hh" -# include "calc++-parser.hh" - -/* Work around an incompatibility in flex (at least versions - 2.5.31 through 2.5.33): it generates code that does - not conform to C89. See Debian bug 333231 - . */ -# undef yywrap -# define yywrap() 1 - -/* By default yylex returns int; we use token_type. - Unfortunately yyterminate by default returns 0, which is - not of token_type.
*/ -#define yyterminate() return token::END -%@} -@end example - -@noindent -Because there is no @code{#include}-like feature, we don't need -@code{yywrap}; we don't need @code{unput} either; and since we parse an -actual file, this is not an interactive session with the user. -Finally, we enable the scanner tracing features. - -@comment file: calc++-scanner.ll -@example -%option noyywrap nounput batch debug -@end example - -@noindent -Abbreviations allow for more readable rules. - -@comment file: calc++-scanner.ll -@example -id [a-zA-Z][a-zA-Z_0-9]* -int [0-9]+ -blank [ \t] -@end example - -@noindent -The following paragraph suffices to track locations accurately. Each -time @code{yylex} is invoked, the begin position is moved onto the end -position. Then when a pattern is matched, the end position is -advanced by its width. If it matched ends of lines, the end -cursor is adjusted, and each time blanks are matched, the begin cursor -is moved onto the end cursor to effectively ignore the blanks -preceding tokens. Comments would be treated the same way. - -@comment file: calc++-scanner.ll -@example -@group -%@{ -# define YY_USER_ACTION yylloc->columns (yyleng); -%@} -@end group -%% -%@{ - yylloc->step (); -%@} -@{blank@}+ yylloc->step (); -[\n]+ yylloc->lines (yyleng); yylloc->step (); -@end example - -@noindent -The rules are simple; just note the use of the driver to report errors. -It is convenient to use a typedef to shorten -@code{yy::calcxx_parser::token::identifier} into -@code{token::identifier} for instance. - -@comment file: calc++-scanner.ll -@example -%@{ - typedef yy::calcxx_parser::token token; -%@} - /* Convert ints to the actual type of tokens. */ -[-+*/] return yy::calcxx_parser::token_type (yytext[0]); -":=" return token::ASSIGN; -@{int@} @{ - errno = 0; - long n = strtol (yytext, NULL, 10); - if (!
(INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) - driver.error (*yylloc, "integer is out of range"); - yylval->ival = n; - return token::NUMBER; -@} -@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; -. driver.error (*yylloc, "invalid character"); -%% -@end example - -@noindent -Finally, because the driver's scanner-related member functions depend -on the scanner's data, it is simpler to implement them in this file. - -@comment file: calc++-scanner.ll -@example -@group -void -calcxx_driver::scan_begin () -@{ - yy_flex_debug = trace_scanning; - if (file.empty () || file == "-") - yyin = stdin; - else if (!(yyin = fopen (file.c_str (), "r"))) - @{ - error ("cannot open " + file + ": " + strerror(errno)); - exit (EXIT_FAILURE); - @} -@} -@end group - -@group -void -calcxx_driver::scan_end () -@{ - fclose (yyin); -@} -@end group -@end example - -@node Calc++ Top Level -@subsubsection Calc++ Top Level - -The top level file, @file{calc++.cc}, poses no problem. - -@comment file: calc++.cc -@example -#include <iostream> -#include "calc++-driver.hh" - -@group -int -main (int argc, char *argv[]) -@{ - calcxx_driver driver; - for (int i = 1; i < argc; ++i) - if (argv[i] == std::string ("-p")) - driver.trace_parsing = true; - else if (argv[i] == std::string ("-s")) - driver.trace_scanning = true; - else if (!driver.parse (argv[i])) - std::cout << driver.result << std::endl; -@} -@end group -@end example - -@node Java Parsers -@section Java Parsers - -@menu -* Java Bison Interface:: Asking for Java parser generation -* Java Semantic Values:: %type and %token vs.
Java -* Java Location Values:: The position and location classes -* Java Parser Interface:: Instantiating and running the parser -* Java Scanner Interface:: Specifying the scanner for the parser -* Java Action Features:: Special features for use in actions -* Java Differences:: Differences between C/C++ and Java Grammars -* Java Declarations Summary:: List of Bison declarations used with Java -@end menu - -@node Java Bison Interface -@subsection Java Bison Interface -@c - %language "Java" - -(The current Java interface is experimental and may evolve. -More user feedback will help to stabilize it.) - -The Java parser skeletons are selected using the @code{%language "Java"} -directive or the @option{-L java}/@option{--language=java} option. - -@c FIXME: Documented bug. -When generating a Java parser, @code{bison @var{basename}.y} will -create a single Java source file named @file{@var{basename}.java} -containing the parser implementation. Using a grammar file without a -@file{.y} suffix is currently broken. The basename of the parser -implementation file can be changed by the @code{%file-prefix} -directive or the @option{-b}/@option{--file-prefix} option. The -entire parser implementation file name can be changed by the -@code{%output} directive or the @option{-o}/@option{--output} option. -The parser implementation file contains a single class for the parser. - -You can create documentation for generated parsers using Javadoc. - -Unlike C parsers, Java parsers do not use global variables; the -state of the parser is always local to an instance of the parser class. -Therefore, all Java parsers are ``pure'', and the @code{%pure-parser} -and @code{%define api.pure} directives do not do anything when used in -Java. - -Push parsers are currently unsupported in Java and @code{%define -api.push-pull} has no effect. - -GLR parsers are currently unsupported in Java. Do not use the -@code{%glr-parser} directive. - -No header file can be generated for Java parsers.
Do not use the -@code{%defines} directive or the @option{-d}/@option{--defines} options. - -@c FIXME: Possible code change. -Currently, support for debugging and verbose errors is always compiled -in. Thus the @code{%debug} and @code{%token-table} directives and the -@option{-t}/@option{--debug} and @option{-k}/@option{--token-table} -options have no effect. This may change in the future to eliminate -unused code in the generated parser, so use @code{%debug} and -@code{%error-verbose} explicitly if needed. Also, in the future the -@code{%token-table} directive might enable a public interface to -access the token names and codes. - -@node Java Semantic Values -@subsection Java Semantic Values -@c - No %union, specify type in %type/%token. -@c - YYSTYPE -@c - Printer and destructor - -There is no @code{%union} directive in Java parsers. Instead, the -semantic values' types (class names) should be specified in the -@code{%type} or @code{%token} directive: - -@example -%type expr assignment_expr term factor -%type number -@end example - -By default, the semantic stack is declared to have @code{Object} members, -which means that the class types you specify can be of any class. -To improve the type safety of the parser, you can declare the common -superclass of all the semantic values using the @code{%define stype} -directive. For example, after the following declaration: - -@example -%define stype "ASTNode" -@end example - -@noindent -any @code{%type} or @code{%token} specifying a semantic type which -is not a subclass of @code{ASTNode} will cause a compile-time error. - -@c FIXME: Documented bug. -Types used in the directives may be qualified with a package name. -Primitive data types are accepted for Java version 1.5 or later. Note -that in this case the autoboxing feature of Java 1.5 will be used. -Generic types may not be used; this is due to a limitation in the -implementation of Bison, and may change in future releases.
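As a rough illustration of the type-safety point made above for @code{%define stype}, the following self-contained Java sketch mimics a semantic stack whose element type is a common base class. It is not code generated by Bison: the classes @code{Num}, @code{Add} and @code{StypeSketch}, and the way the stack is handled, are assumptions made up for this example, and @code{ASTNode} is simply the name reused from the declaration above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical AST classes for illustration only; Bison does not generate them.
abstract class ASTNode {
    abstract int eval();
}

final class Num extends ASTNode {
    private final int value;
    Num(int value) { this.value = value; }
    @Override int eval() { return value; }
}

final class Add extends ASTNode {
    private final ASTNode lhs;
    private final ASTNode rhs;
    Add(ASTNode lhs, ASTNode rhs) { this.lhs = lhs; this.rhs = rhs; }
    @Override int eval() { return lhs.eval() + rhs.eval(); }
}

final class StypeSketch {
    // Stand-in for a semantic stack typed with the common base class,
    // loosely analogous to what "%define stype" asks for.
    static int demo() {
        Deque<ASTNode> stack = new ArrayDeque<>();
        stack.push(new Num(1));
        stack.push(new Num(2));
        // stack.push("text");  // would not compile: String is not an ASTNode

        // Roughly what an action for "exp: exp '+' exp" might do:
        // take the operands off the stack, cast them to the declared
        // subtype, and push the value of the new grouping.
        Num rhs = (Num) stack.pop();
        Num lhs = (Num) stack.pop();
        stack.push(new Add(lhs, rhs));
        return stack.peek().eval();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With an @code{Object}-typed stack the commented-out @code{push} would be accepted and the mistake would only surface as a failed cast at run time, which is the trade-off the paragraph above describes.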
- -Java parsers do not support @code{%destructor}, since the language -adopts garbage collection. The parser will try to hold references -to semantic values for as little time as needed. - -Java parsers do not support @code{%printer}, as @code{toString()} -can be used to print the semantic values. This however may change -(in a backwards-compatible way) in future versions of Bison. - - -@node Java Location Values -@subsection Java Location Values -@c - %locations -@c - class Position -@c - class Location - -When the directive @code{%locations} is used, the Java parser supports -location tracking, see @ref{Tracking Locations}. An auxiliary user-defined -class defines a @dfn{position}, a single point in a file; Bison itself -defines a class representing a @dfn{location}, a range composed of a pair of -positions (possibly spanning several files). The location class is an inner -class of the parser; the name is @code{Location} by default, and may also be -renamed using @code{%define location_type "@var{class-name}"}. - -The location class treats the position as a completely opaque value. -By default, the class name is @code{Position}, but this can be changed -with @code{%define position_type "@var{class-name}"}. This class must -be supplied by the user. - - -@deftypeivar {Location} {Position} begin -@deftypeivarx {Location} {Position} end -The first, inclusive, position of the range, and the first beyond. -@end deftypeivar - -@deftypeop {Constructor} {Location} {} Location (Position @var{loc}) -Create a @code{Location} denoting an empty range located at a given point. -@end deftypeop - -@deftypeop {Constructor} {Location} {} Location (Position @var{begin}, Position @var{end}) -Create a @code{Location} from the endpoints of the range. -@end deftypeop - -@deftypemethod {Location} {String} toString () -Prints the range represented by the location. For this to work -properly, the position class should override the @code{equals} and -@code{toString} methods appropriately. 
-@end deftypemethod - - -@node Java Parser Interface -@subsection Java Parser Interface -@c - define parser_class_name -@c - Ctor -@c - parse, error, set_debug_level, debug_level, set_debug_stream, -@c debug_stream. -@c - Reporting errors - -The name of the generated parser class defaults to @code{YYParser}. The -@code{YY} prefix may be changed using the @code{%name-prefix} directive -or the @option{-p}/@option{--name-prefix} option. Alternatively, use -@code{%define parser_class_name "@var{name}"} to give a custom name to -the class. The interface of this class is detailed below. - -By default, the parser class has package visibility. A declaration -@code{%define public} will change it to public visibility. Remember that, -according to the Java language specification, the name of the @file{.java} -file should match the name of the class in this case. Similarly, you can -use @code{abstract}, @code{final} and @code{strictfp} with the -@code{%define} declaration to add other modifiers to the parser class. - -The Java package name of the parser class can be specified using the -@code{%define package} directive. The superclass and the implemented -interfaces of the parser class can be specified with the @code{%define -extends} and @code{%define implements} directives. - -The parser class defines an inner class, @code{Location}, that is used -for location tracking (see @ref{Java Location Values}), and an inner -interface, @code{Lexer} (see @ref{Java Scanner Interface}). Other than -this inner class and interface, and the members described in the interface -below, all the other members and fields are preceded with a @code{yy} or -@code{YY} prefix to avoid clashes with user code. - -@c FIXME: The following constants and variables are still undocumented: -@c @code{bisonVersion}, @code{bisonSkeleton} and @code{errorVerbose}. - -The parser class can be extended using the @code{%parse-param} -directive.
Each occurrence of the directive will add a @code{protected -final} field to the parser class, and an argument to its constructor, -which initializes the field automatically. - -Token names defined by @code{%token} and the predefined @code{EOF} token -name are added as constant fields to the parser class. - -@deftypeop {Constructor} {YYParser} {} YYParser (@var{lex_param}, @dots{}, @var{parse_param}, @dots{}) -Build a new parser object with embedded @code{%code lexer}. There are -no parameters, unless @code{%parse-param}s and/or @code{%lex-param}s are -used. -@end deftypeop - -@deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{}) -Build a new parser object using the specified scanner. There are no -additional parameters unless @code{%parse-param}s are used. - -If the scanner is defined by @code{%code lexer}, this constructor is -declared @code{protected} and is called automatically with a scanner -created with the correct @code{%lex-param}s. -@end deftypeop - -@deftypemethod {YYParser} {boolean} parse () -Run the syntactic analysis, and return @code{true} on success, -@code{false} otherwise. -@end deftypemethod - -@deftypemethod {YYParser} {boolean} recovering () -During the syntactic analysis, return @code{true} if recovering -from a syntax error. -@xref{Error Recovery}. -@end deftypemethod - -@deftypemethod {YYParser} {java.io.PrintStream} getDebugStream () -@deftypemethodx {YYParser} {void} setDebugStream (java.io.PrintStream @var{o}) -Get or set the stream used for tracing the parsing. It defaults to -@code{System.err}. -@end deftypemethod - -@deftypemethod {YYParser} {int} getDebugLevel () -@deftypemethodx {YYParser} {void} setDebugLevel (int @var{l}) -Get or set the tracing level. Currently its value is either 0, no trace, -or nonzero, full tracing.
-@end deftypemethod - - -@node Java Scanner Interface -@subsection Java Scanner Interface -@c - %code lexer -@c - %lex-param -@c - Lexer interface - -There are two possible ways to interface a Bison-generated Java parser -with a scanner: the scanner may be defined by @code{%code lexer}, or -defined elsewhere. In either case, the scanner has to implement the -@code{Lexer} inner interface of the parser class. - -In the first case, the body of the scanner class is placed in -@code{%code lexer} blocks. If you want to pass parameters from the -parser constructor to the scanner constructor, specify them with -@code{%lex-param}; they are passed before @code{%parse-param}s to the -constructor. - -In the second case, the scanner has to implement the @code{Lexer} interface, -which is defined within the parser class (e.g., @code{YYParser.Lexer}). -The constructor of the parser object will then accept an object -implementing the interface; @code{%lex-param} is not used in this -case. - -In both cases, the scanner has to implement the following methods. - -@deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) -This method is defined by the user to emit an error message. The first -parameter is omitted if location tracking is not active. Its type can be -changed using @code{%define location_type "@var{class-name}".} -@end deftypemethod - -@deftypemethod {Lexer} {int} yylex () -Return the next token. Its type is the return value; its semantic -value and location are saved and returned by their own methods in the -interface. - -Use @code{%define lex_throws} to specify any uncaught exceptions. -Default is @code{java.io.IOException}. -@end deftypemethod - -@deftypemethod {Lexer} {Position} getStartPos () -@deftypemethodx {Lexer} {Position} getEndPos () -Return, respectively, the first position of the last token that -@code{yylex} returned, and the first position beyond it. These -methods are not needed unless location tracking is active.
- -The return type can be changed using @code{%define position_type -"@var{class-name}".} -@end deftypemethod - -@deftypemethod {Lexer} {Object} getLVal () -Return the semantic value of the last token that @code{yylex} returned. - -The return type can be changed using @code{%define stype -"@var{class-name}".} -@end deftypemethod - - -@node Java Action Features -@subsection Special Features for Use in Java Actions - -The following special constructs can be used in Java actions. -Other analogous C action features are currently unavailable for Java. - -Use @code{%define throws} to specify any uncaught exceptions from parser -actions, and initial actions specified by @code{%initial-action}. - -@defvar $@var{n} -The semantic value for the @var{n}th component of the current rule. -This may not be assigned to. -@xref{Java Semantic Values}. -@end defvar - -@defvar $<@var{typealt}>@var{n} -Like @code{$@var{n}} but specifies an alternative type @var{typealt}. -@xref{Java Semantic Values}. -@end defvar - -@defvar $$ -The semantic value for the grouping made by the current rule. As a -value, this is in the base type (@code{Object} or as specified by -@code{%define stype}) and is not cast to the declared subtype because -casts are not allowed on the left-hand side of Java assignments. -Use an explicit Java cast if the correct subtype is needed. -@xref{Java Semantic Values}. -@end defvar - -@defvar $<@var{typealt}>$ -Same as @code{$$} since Java always allows assigning to the base type. -Perhaps we should use this and @code{$<>$} for the value and @code{$$} -for setting the value but there is currently no easy way to distinguish -these constructs. -@xref{Java Semantic Values}. -@end defvar - -@defvar @@@var{n} -The location information of the @var{n}th component of the current rule. -This may not be assigned to. -@xref{Java Location Values}. -@end defvar - -@defvar @@$ -The location information of the grouping made by the current rule. -@xref{Java Location Values}.
-@end defvar - -@deftypefn {Statement} return YYABORT @code{;} -Return immediately from the parser, indicating failure. -@xref{Java Parser Interface}. -@end deftypefn - -@deftypefn {Statement} return YYACCEPT @code{;} -Return immediately from the parser, indicating success. -@xref{Java Parser Interface}. -@end deftypefn - -@deftypefn {Statement} {return} YYERROR @code{;} -Start error recovery (without printing an error message). -@xref{Error Recovery}. -@end deftypefn - -@deftypefn {Function} {boolean} recovering () -Return whether error recovery is being done. In this state, the parser -reads tokens until it reaches a known state, and then restarts normal -operation. -@xref{Error Recovery}. -@end deftypefn - -@deftypefn {Function} {protected void} yyerror (String msg) -@deftypefnx {Function} {protected void} yyerror (Position pos, String msg) -@deftypefnx {Function} {protected void} yyerror (Location loc, String msg) -Print an error message using the @code{yyerror} method of the scanner -instance in use. -@end deftypefn - - -@node Java Differences -@subsection Differences between C/C++ and Java Grammars - -The different structure of the Java language forces several differences -between C/C++ grammars and grammars designed for Java parsers. This -section summarizes these differences. - -@itemize -@item -Java lacks a preprocessor, so the @code{YYERROR}, @code{YYACCEPT}, -@code{YYABORT} symbols (@pxref{Table of Symbols}) obviously cannot be -macros. Instead, they should be preceded by @code{return} when they -appear in an action. The actual definition of these symbols is -opaque to the Bison grammar, and it might change in the future. The -only meaningful operation that you can perform is to return them. -@xref{Java Action Features}.
- -Note that of these three symbols, only @code{YYACCEPT} and -@code{YYABORT} will cause a return from the @code{yyparse} -method@footnote{Java parsers include the actions in a method separate -from @code{yyparse} in order to have an intuitive syntax that -corresponds to these C macros.}. - -@item -Java lacks unions, so @code{%union} has no effect. Instead, semantic -values have a common base type: @code{Object} or as specified by -@samp{%define stype}. Angle brackets on @code{%token}, @code{%type}, -@code{$@var{n}} and @code{$$} specify subtypes rather than fields of -a union. The type of @code{$$}, even with angle brackets, is the base -type since Java casts are not allowed on the left-hand side of assignments. -Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the -left-hand side of assignments. @xref{Java Semantic Values}, and -@ref{Java Action Features}. - -@item -The prologue declarations have a different meaning than in C/C++ code. -@table @asis -@item @code{%code imports} -blocks are placed at the beginning of the Java source code. They may -include copyright notices. For @code{package} declarations, it is -suggested to use @code{%define package} instead. - -@item unqualified @code{%code} -blocks are placed inside the parser class. - -@item @code{%code lexer} -blocks, if specified, should include the implementation of the -scanner. If there is no such block, the scanner can be any class -that implements the appropriate interface (@pxref{Java Scanner -Interface}). -@end table - -Other @code{%code} blocks are not supported in Java parsers. -In particular, @code{%@{ @dots{} %@}} blocks should not be used -and may give an error in future versions of Bison. - -The epilogue has the same meaning as in C/C++ code and it can -be used to define other classes used by the parser @emph{outside} -the parser class.
-@end itemize - - -@node Java Declarations Summary -@subsection Java Declarations Summary - -This summary only includes declarations that are specific to Java or that have special -meaning when used in a Java parser. - -@deffn {Directive} {%language "Java"} -Generate a Java class for the parser. -@end deffn - -@deffn {Directive} %lex-param @{@var{type} @var{name}@} -A parameter for the lexer class defined by @code{%code lexer} -@emph{only}, added as parameters to the lexer constructor and the parser -constructor that @emph{creates} a lexer. Default is none. -@xref{Java Scanner Interface}. -@end deffn - -@deffn {Directive} %name-prefix "@var{prefix}" -The prefix of the parser class name @code{@var{prefix}Parser} if -@code{%define parser_class_name} is not used. Default is @code{YY}. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} %parse-param @{@var{type} @var{name}@} -A parameter for the parser class added as parameters to constructor(s) -and as fields initialized by the constructor(s). Default is none. -@xref{Java Parser Interface}. -@end deffn - -@deffn {Directive} %token <@var{type}> @var{token} @dots{} -Declare tokens. Note that the angle brackets enclose a Java @emph{type}. -@xref{Java Semantic Values}. -@end deffn - -@deffn {Directive} %type <@var{type}> @var{nonterminal} @dots{} -Declare the type of nonterminals. Note that the angle brackets enclose -a Java @emph{type}. -@xref{Java Semantic Values}. -@end deffn - -@deffn {Directive} %code @{ @var{code} @dots{} @} -Code appended to the inside of the parser class. -@xref{Java Differences}. -@end deffn - -@deffn {Directive} {%code imports} @{ @var{code} @dots{} @} -Code inserted just after the @code{package} declaration. -@xref{Java Differences}. -@end deffn - -@deffn {Directive} {%code lexer} @{ @var{code} @dots{} @} -Code added to the body of an inner lexer class within the parser class. -@xref{Java Scanner Interface}.
-@end deffn - -@deffn {Directive} %% @var{code} @dots{} -Code (after the second @code{%%}) appended to the end of the file, -@emph{outside} the parser class. -@xref{Java Differences}. -@end deffn - -@deffn {Directive} %@{ @var{code} @dots{} %@} -Not supported. Use @code{%code imports} instead. -@xref{Java Differences}. -@end deffn - -@deffn {Directive} {%define abstract} -Whether the parser class is declared @code{abstract}. Default is false. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define extends} "@var{superclass}" -The superclass of the parser class. Default is none. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define final} -Whether the parser class is declared @code{final}. Default is false. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define implements} "@var{interfaces}" -The implemented interfaces of the parser class, a comma-separated list. -Default is none. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define lex_throws} "@var{exceptions}" -The exceptions thrown by the @code{yylex} method of the lexer, a -comma-separated list. Default is @code{java.io.IOException}. -@xref{Java Scanner Interface}. -@end deffn - -@deffn {Directive} {%define location_type} "@var{class}" -The name of the class used for locations (a range between two -positions). This class is generated as an inner class of the parser -class by @command{bison}. Default is @code{Location}. -@xref{Java Location Values}. -@end deffn - -@deffn {Directive} {%define package} "@var{package}" -The package to put the parser class in. Default is none. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define parser_class_name} "@var{name}" -The name of the parser class. Default is @code{YYParser} or -@code{@var{name-prefix}Parser}. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define position_type} "@var{class}" -The name of the class used for positions.
This class must be supplied by -the user. Default is @code{Position}. -@xref{Java Location Values}. -@end deffn - -@deffn {Directive} {%define public} -Whether the parser class is declared @code{public}. Default is false. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define stype} "@var{class}" -The base type of semantic values. Default is @code{Object}. -@xref{Java Semantic Values}. -@end deffn - -@deffn {Directive} {%define strictfp} -Whether the parser class is declared @code{strictfp}. Default is false. -@xref{Java Bison Interface}. -@end deffn - -@deffn {Directive} {%define throws} "@var{exceptions}" -The exceptions thrown by user-supplied parser actions and -@code{%initial-action}, a comma-separated list. Default is none. -@xref{Java Parser Interface}. -@end deffn - - -@c ================================================= FAQ - -@node FAQ -@chapter Frequently Asked Questions -@cindex frequently asked questions -@cindex questions - -Several questions about Bison come up occasionally. Some of them -are addressed here. - -@menu -* Memory Exhausted:: Breaking the Stack Limits -* How Can I Reset the Parser:: @code{yyparse} Keeps some State -* Strings are Destroyed:: @code{yylval} Loses Track of Strings -* Implementing Gotos/Loops:: Control Flow in the Calculator -* Multiple start-symbols:: Factoring closely related grammars -* Secure? Conform?:: Is Bison POSIX safe? -* I can't build Bison:: Troubleshooting -* Where can I find help?:: Troubleshouting -* Bug Reports:: Troublereporting -* More Languages:: Parsers in C++, Java, and so on -* Beta Testing:: Experimenting with development versions -* Mailing Lists:: Meeting other Bison users -@end menu - -@node Memory Exhausted -@section Memory Exhausted - -@quotation -My parser returns with an error and a @samp{memory exhausted} -message. What can I do? -@end quotation - -This question is already addressed elsewhere; see @ref{Recursion, ,Recursive -Rules}.
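The usual cause discussed in the section on recursive rules is right recursion, which needs stack space proportional to the length of the sequence because every element must be shifted before the first reduction can happen. The Java sketch below is only a simplified model of that effect, not Bison code: it counts grammar symbols on an imaginary parser stack for the two ways of writing a sequence rule, and the class and method names are made up for this illustration.

```java
// Simplified model of parser stack depth for a sequence of n items.
// Hypothetical illustration; it ignores lookahead and parser states.
final class RecursionDepthSketch {

    // Left recursion:  seq: item | seq item ;
    // A reduction is possible after every shift, so the stack never
    // holds more than two symbols.
    static int maxDepthLeftRecursive(int n) {
        int depth = 0;
        int max = 0;
        for (int i = 0; i < n; i++) {
            depth++;                      // shift the next item
            max = Math.max(max, depth);
            depth = 1;                    // reduce "item" or "seq item" to seq
        }
        return max;
    }

    // Right recursion:  seq: item | item seq ;
    // Nothing can be reduced until the last item has been shifted,
    // so all n items sit on the stack at once.
    static int maxDepthRightRecursive(int n) {
        int depth = 0;
        int max = 0;
        for (int i = 0; i < n; i++) {
            depth++;                      // shift the next item
            max = Math.max(max, depth);
        }
        while (depth > 1) {
            depth--;                      // reduce "item seq" to seq, one step at a time
        }
        return max;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("left:  " + maxDepthLeftRecursive(n));
        System.out.println("right: " + maxDepthRightRecursive(n));
    }
}
```

In this model the left-recursive form stays at a constant depth while the right-recursive form grows with the input, which is why rewriting a sequence rule to use left recursion is the first thing to try.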
- -@node How Can I Reset the Parser -@section How Can I Reset the Parser - -The following phenomenon has several symptoms, resulting in the -following typical questions: - -@quotation -I invoke @code{yyparse} several times, and on correct input it works -properly; but when a parse error is found, all the other calls fail -too. How can I reset the error flag of @code{yyparse}? -@end quotation - -@noindent -or - -@quotation -My parser includes support for an @samp{#include}-like feature, in -which case I run @code{yyparse} from @code{yyparse}. This fails -although I did specify @samp{%define api.pure}. -@end quotation - -These problems typically come not from Bison itself, but from -Lex-generated scanners. Because these scanners use large buffers for -speed, they might not notice a change of input file. As a -demonstration, consider the following source file, -@file{first-line.l}: - -@example -@group -%@{ -#include <stdio.h> -#include <stdlib.h> -%@} -@end group -%% -.*\n ECHO; return 1; -%% -@group -int -yyparse (char const *file) -@{ - yyin = fopen (file, "r"); - if (!yyin) - @{ - perror ("fopen"); - exit (EXIT_FAILURE); - @} -@end group -@group - /* One token only. */ - yylex (); - if (fclose (yyin) != 0) - @{ - perror ("fclose"); - exit (EXIT_FAILURE); - @} - return 0; -@} -@end group - -@group -int -main (void) -@{ - yyparse ("input"); - yyparse ("input"); - return 0; -@} -@end group -@end example - -@noindent -If the file @file{input} contains - -@example -input:1: Hello, -input:2: World! -@end example - -@noindent -then instead of getting the first line twice, you get: - -@example -$ @kbd{flex -ofirst-line.c first-line.l} -$ @kbd{gcc -ofirst-line first-line.c -ll} -$ @kbd{./first-line} -input:1: Hello, -input:2: World! -@end example - -Therefore, whenever you change @code{yyin}, you must tell the -Lex-generated scanner to discard its current buffer and switch to the -new one. This depends upon your implementation of Lex; see its -documentation for more.
For Flex, it suffices to call -@samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your -Flex-generated scanner needs to read from several input streams to -handle features like include files, you might consider using Flex -functions like @samp{yy_switch_to_buffer} that manipulate multiple -input buffers. - -If your Flex-generated scanner uses start conditions (@pxref{Start -conditions, , Start conditions, flex, The Flex Manual}), you might -also want to reset the scanner's state, i.e., go back to the initial -start condition, through a call to @samp{BEGIN (0)}. - -@node Strings are Destroyed -@section Strings are Destroyed - -@quotation -My parser seems to destroy old strings, or maybe it loses track of -them. Instead of reporting @samp{"foo", "bar"}, it reports -@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}. -@end quotation - -This error is probably the single most frequent ``bug report'' sent to -Bison lists, but it merely reflects a misunderstanding of the role -of the scanner. Consider the following Lex code: - -@example -@group -%@{ -#include <stdio.h> -char *yylval = NULL; -%@} -@end group -@group -%% -.* yylval = yytext; return 1; -\n /* IGNORE */ -%% -@end group -@group -int -main () -@{ - /* Similar to using $1, $2 in a Bison action. */ - char *fst = (yylex (), yylval); - char *snd = (yylex (), yylval); - printf ("\"%s\", \"%s\"\n", fst, snd); - return 0; -@} -@end group -@end example - -If you compile and run this code, you get: - -@example -$ @kbd{flex -osplit-lines.c split-lines.l} -$ @kbd{gcc -osplit-lines split-lines.c -ll} -$ @kbd{printf 'one\ntwo\n' | ./split-lines} -"one -two", "two" -@end example - -@noindent -This is because @code{yytext} is a buffer provided for @emph{reading} -in the action, but if you want to keep it, you have to duplicate it -(e.g., using @code{strdup}). Note that the output may depend on how -your implementation of Lex handles @code{yytext}.
For instance, when -given the Lex compatibility option @option{-l} (which triggers the -option @samp{%array}), Flex produces different behavior: - -@example -$ @kbd{flex -l -osplit-lines.c split-lines.l} -$ @kbd{gcc -osplit-lines split-lines.c -ll} -$ @kbd{printf 'one\ntwo\n' | ./split-lines} -"two", "two" -@end example - - -@node Implementing Gotos/Loops -@section Implementing Gotos/Loops - -@quotation -My simple calculator supports variables, assignments, and functions, -but how can I implement gotos, or loops? -@end quotation - -Although pedagogically useful, the examples included in this manual blur -the distinction between the parser---whose job is to recover -the structure of a text and to transmit it to subsequent modules of -the program---and the processing (such as the execution) of this -structure. This works well with so-called straight-line programs, -i.e., precisely those that have a straightforward execution model: -execute simple instructions one after another. - -@cindex abstract syntax tree -@cindex AST -If you want a richer model, you will probably need to use the parser -to construct a tree that represents the structure it has -recovered; this tree is usually called the @dfn{abstract syntax tree}, -or @dfn{AST} for short. Walking through this tree and -traversing it in various ways then enables treatments such as its -execution or its translation, which result in an interpreter or a -compiler. - -This topic is well beyond the scope of this manual, and the reader is -invited to consult the dedicated literature. - - -@node Multiple start-symbols -@section Multiple start-symbols - -@quotation -I have several closely related grammars, and I would like to share their -implementations. In fact, I could use a single grammar but with -multiple entry points. -@end quotation - -Bison does not support multiple start-symbols, but there is a very -simple means to simulate them.
If @code{foo} and @code{bar} are the two -pseudo start-symbols, then introduce two new tokens, say -@code{START_FOO} and @code{START_BAR}, and use them as switches from the -real start-symbol: - -@example -%token START_FOO START_BAR; -%start start; -start: - START_FOO foo -| START_BAR bar; -@end example - -These tokens prevent the introduction of new conflicts. As far as the -parser goes, that is all that is needed. - -Now the difficult part is ensuring that the scanner will send these -tokens first. If your scanner is hand-written, that should be -straightforward. If your scanner is generated by Lex, then there is a -simple means to do it: recall that anything between @samp{%@{ ... %@}} -after the first @code{%%} is copied verbatim to the top of the generated -@code{yylex} function. Make sure a variable @code{start_token} is -available in the scanner (e.g., a global variable or using -@code{%lex-param} etc.), and use the following: - -@example - /* @r{Prologue.} */ -%% -%@{ - if (start_token) - @{ - int t = start_token; - start_token = 0; - return t; - @} -%@} - /* @r{The rules.} */ -@end example - - -@node Secure? Conform? -@section Secure? Conform? - -@quotation -Is Bison secure? Does it conform to POSIX? -@end quotation - -If you're looking for a guarantee or certification, we don't provide it. -However, Bison is intended to be a reliable program that conforms to the -POSIX specification for Yacc. If you run into problems, -please send us a bug report. - -@node I can't build Bison -@section I can't build Bison - -@quotation -I can't build Bison because @command{make} complains that -@code{msgfmt} is not found. -What should I do? -@end quotation - -Like most GNU packages with internationalization support, Bison turns -that feature on by default. If you have problems building in the @file{po} -subdirectory, it indicates that your system's internationalization -support is lacking.
You can re-configure Bison with -@option{--disable-nls} to turn off this support, or you can install GNU -gettext from @url{ftp://ftp.gnu.org/gnu/gettext/} and re-configure -Bison. See the file @file{ABOUT-NLS} for more information. - - -@node Where can I find help? -@section Where can I find help? - -@quotation -I'm having trouble using Bison. Where can I find help? -@end quotation - -First, read this fine manual. Beyond that, you can send mail to -@email{help-bison@@gnu.org}. This mailing list is intended to be -populated with people who are willing to answer questions about using -and installing Bison. Please keep in mind that (most of) the people on -the list have aspects of their lives which are not related to Bison (!), -so you may not receive an answer to your question right away. This can -be frustrating, but please try not to honk them off; remember that any -help they provide is purely voluntary and out of the kindness of their -hearts. - -@node Bug Reports -@section Bug Reports - -@quotation -I found a bug. What should I include in the bug report? -@end quotation - -Before you send a bug report, make sure you are using the latest -version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its -mirrors. Be sure to include the version number in your bug report. If -the bug is present in the latest version but not in a previous version, -try to determine the most recent version which did not contain the bug. - -If the bug is parser-related, you should include the smallest grammar -you can which demonstrates the bug. The grammar file should also be -complete (i.e., I should be able to run it through Bison without having -to edit or add anything). The smaller and simpler the grammar, the -easier it will be to fix the bug. - -Include information about your compilation environment, including your -operating system's name and version and your compiler's name and -version. 
If you have trouble compiling, you should also include a -transcript of the build session, starting with the invocation of -`configure'. Depending on the nature of the bug, you may be asked to -send additional files as well (such as `config.h' or `config.cache'). - -Patches are most welcome, but not required. That is, do not hesitate to -send a bug report just because you cannot provide a fix. - -Send bug reports to @email{bug-bison@@gnu.org}. - -@node More Languages -@section More Languages - -@quotation -Will Bison ever have C++ and Java support? How about @var{insert your -favorite language here}? -@end quotation - -C++ and Java support is there now, and is documented. We'd love to add other -languages; contributions are welcome. - -@node Beta Testing -@section Beta Testing - -@quotation -What is involved in being a beta tester? -@end quotation - -It's not terribly involved. Basically, you would download a test -release, compile it, and use it to build and run a parser or two. After -that, you would submit either a bug report or a message saying that -everything is okay. It is important to report successes as well as -failures because test releases eventually become mainstream releases, -but only if they are adequately tested. If no one tests, development is -essentially halted. - -Beta testers are particularly needed for operating systems to which the -developers do not have easy access. They currently have easy access to -recent GNU/Linux and Solaris versions. Reports about other operating -systems are especially welcome. - -@node Mailing Lists -@section Mailing Lists - -@quotation -How do I join the help-bison and bug-bison mailing lists? -@end quotation - -See @url{http://lists.gnu.org/}. 
- -@c ================================================= Table of Symbols - -@node Table of Symbols -@appendix Bison Symbols -@cindex Bison symbols, table of -@cindex symbols in Bison, table of - -@deffn {Variable} @@$ -In an action, the location of the left-hand side of the rule. -@xref{Tracking Locations}. -@end deffn - -@deffn {Variable} @@@var{n} -In an action, the location of the @var{n}-th symbol of the right-hand side -of the rule. @xref{Tracking Locations}. -@end deffn - -@deffn {Variable} @@@var{name} -In an action, the location of a symbol addressed by name. @xref{Tracking -Locations}. -@end deffn - -@deffn {Variable} @@[@var{name}] -In an action, the location of a symbol addressed by name. @xref{Tracking -Locations}. -@end deffn - -@deffn {Variable} $$ -In an action, the semantic value of the left-hand side of the rule. -@xref{Actions}. -@end deffn - -@deffn {Variable} $@var{n} -In an action, the semantic value of the @var{n}-th symbol of the -right-hand side of the rule. @xref{Actions}. -@end deffn - -@deffn {Variable} $@var{name} -In an action, the semantic value of a symbol addressed by name. -@xref{Actions}. -@end deffn - -@deffn {Variable} $[@var{name}] -In an action, the semantic value of a symbol addressed by name. -@xref{Actions}. -@end deffn - -@deffn {Delimiter} %% -Delimiter used to separate the grammar rule section from the -Bison declarations section or the epilogue. -@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}. -@end deffn - -@c Don't insert spaces, or check the DVI output. -@deffn {Delimiter} %@{@var{code}%@} -All code listed between @samp{%@{} and @samp{%@}} is copied verbatim -to the parser implementation file. Such code forms the prologue of -the grammar file. @xref{Grammar Outline, ,Outline of a Bison -Grammar}. -@end deffn - -@deffn {Construct} /*@dots{}*/ -Comment delimiters, as in C. -@end deffn - -@deffn {Delimiter} : -Separates a rule's result from its components. @xref{Rules, ,Syntax of -Grammar Rules}. 
-@end deffn - -@deffn {Delimiter} ; -Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}. -@end deffn - -@deffn {Delimiter} | -Separates alternate rules for the same result nonterminal. -@xref{Rules, ,Syntax of Grammar Rules}. -@end deffn - -@deffn {Directive} <*> -Used to define a default tagged @code{%destructor} or default tagged -@code{%printer}. - -This feature is experimental. -More user feedback will help to determine whether it should become a permanent -feature. - -@xref{Destructor Decl, , Freeing Discarded Symbols}. -@end deffn - -@deffn {Directive} <> -Used to define a default tagless @code{%destructor} or default tagless -@code{%printer}. - -This feature is experimental. -More user feedback will help to determine whether it should become a permanent -feature. - -@xref{Destructor Decl, , Freeing Discarded Symbols}. -@end deffn - -@deffn {Symbol} $accept -The predefined nonterminal whose only rule is @samp{$accept: @var{start} -$end}, where @var{start} is the start symbol. @xref{Start Decl, , The -Start-Symbol}. It cannot be used in the grammar. -@end deffn - -@deffn {Directive} %code @{@var{code}@} -@deffnx {Directive} %code @var{qualifier} @{@var{code}@} -Insert @var{code} verbatim into the output parser source at the -default location or at the location specified by @var{qualifier}. -@xref{%code Summary}. -@end deffn - -@deffn {Directive} %debug -Equip the parser for debugging. @xref{Decl Summary}. -@end deffn - -@ifset defaultprec -@deffn {Directive} %default-prec -Assign a precedence to rules that lack an explicit @samp{%prec} -modifier. @xref{Contextual Precedence, ,Context-Dependent -Precedence}. -@end deffn -@end ifset - -@deffn {Directive} %define @var{variable} -@deffnx {Directive} %define @var{variable} @var{value} -@deffnx {Directive} %define @var{variable} "@var{value}" -Define a variable to adjust Bison's behavior. @xref{%define Summary}. 
-@end deffn - -@deffn {Directive} %defines -Bison declaration to create a parser header file, which is usually -meant for the scanner. @xref{Decl Summary}. -@end deffn - -@deffn {Directive} %defines @var{defines-file} -Same as above, but save in the file @var{defines-file}. -@xref{Decl Summary}. -@end deffn - -@deffn {Directive} %destructor -Specify how the parser should reclaim the memory associated to -discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}. -@end deffn - -@deffn {Directive} %dprec -Bison declaration to assign a precedence to a rule that is used at parse -time to resolve reduce/reduce conflicts. @xref{GLR Parsers, ,Writing -GLR Parsers}. -@end deffn - -@deffn {Symbol} $end -The predefined token marking the end of the token stream. It cannot be -used in the grammar. -@end deffn - -@deffn {Symbol} error -A token name reserved for error recovery. This token may be used in -grammar rules so as to allow the Bison parser to recognize an error in -the grammar without halting the process. In effect, a sentence -containing an error may be recognized as valid. On a syntax error, the -token @code{error} becomes the current lookahead token. Actions -corresponding to @code{error} are then executed, and the lookahead -token is reset to the token that originally caused the violation. -@xref{Error Recovery}. -@end deffn - -@deffn {Directive} %error-verbose -Bison declaration to request verbose, specific error message strings -when @code{yyerror} is called. @xref{Error Reporting}. -@end deffn - -@deffn {Directive} %file-prefix "@var{prefix}" -Bison declaration to set the prefix of the output files. @xref{Decl -Summary}. -@end deffn - -@deffn {Directive} %glr-parser -Bison declaration to produce a GLR parser. @xref{GLR -Parsers, ,Writing GLR Parsers}. -@end deffn - -@deffn {Directive} %initial-action -Run user code before parsing. @xref{Initial Action Decl, , Performing Actions before Parsing}. 
-@end deffn - -@deffn {Directive} %language -Specify the programming language for the generated parser. -@xref{Decl Summary}. -@end deffn - -@deffn {Directive} %left -Bison declaration to assign left associativity to token(s). -@xref{Precedence Decl, ,Operator Precedence}. -@end deffn - -@deffn {Directive} %lex-param @{@var{argument-declaration}@} -Bison declaration to specify an additional parameter that -@code{yylex} should accept. @xref{Pure Calling,, Calling Conventions -for Pure Parsers}. -@end deffn - -@deffn {Directive} %merge -Bison declaration to assign a merging function to a rule. If there is a -reduce/reduce conflict with a rule having the same merging function, the -function is applied to the two semantic values to get a single result. -@xref{GLR Parsers, ,Writing GLR Parsers}. -@end deffn - -@deffn {Directive} %name-prefix "@var{prefix}" -Bison declaration to rename the external symbols. @xref{Decl Summary}. -@end deffn - -@ifset defaultprec -@deffn {Directive} %no-default-prec -Do not assign a precedence to rules that lack an explicit @samp{%prec} -modifier. @xref{Contextual Precedence, ,Context-Dependent -Precedence}. -@end deffn -@end ifset - -@deffn {Directive} %no-lines -Bison declaration to avoid generating @code{#line} directives in the -parser implementation file. @xref{Decl Summary}. -@end deffn - -@deffn {Directive} %nonassoc -Bison declaration to assign nonassociativity to token(s). -@xref{Precedence Decl, ,Operator Precedence}. -@end deffn - -@deffn {Directive} %output "@var{file}" -Bison declaration to set the name of the parser implementation file. -@xref{Decl Summary}. -@end deffn - -@deffn {Directive} %parse-param @{@var{argument-declaration}@} -Bison declaration to specify an additional parameter that -@code{yyparse} should accept. @xref{Parser Function,, The Parser -Function @code{yyparse}}. -@end deffn - -@deffn {Directive} %prec -Bison declaration to assign a precedence to a specific rule.
-@xref{Contextual Precedence, ,Context-Dependent Precedence}. -@end deffn - -@deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{%define -Summary,,api.pure}), for which Bison is more careful to warn about -unreasonable usage. -@end deffn - -@deffn {Directive} %require "@var{version}" -Require version @var{version} or higher of Bison. @xref{Require Decl, , -Require a Version of Bison}. -@end deffn - -@deffn {Directive} %right -Bison declaration to assign right associativity to token(s). -@xref{Precedence Decl, ,Operator Precedence}. -@end deffn - -@deffn {Directive} %skeleton -Specify the skeleton to use; usually for development. -@xref{Decl Summary}. -@end deffn - -@deffn {Directive} %start -Bison declaration to specify the start symbol. @xref{Start Decl, ,The -Start-Symbol}. -@end deffn - -@deffn {Directive} %token -Bison declaration to declare token(s) without specifying precedence. -@xref{Token Decl, ,Token Type Names}. -@end deffn - -@deffn {Directive} %token-table -Bison declaration to include a token name table in the parser -implementation file. @xref{Decl Summary}. -@end deffn - -@deffn {Directive} %type -Bison declaration to declare nonterminals. @xref{Type Decl, -,Nonterminal Symbols}. -@end deffn - -@deffn {Symbol} $undefined -The predefined token onto which all undefined values returned by -@code{yylex} are mapped. It cannot be used in the grammar, rather, use -@code{error}. -@end deffn - -@deffn {Directive} %union -Bison declaration to specify several possible data types for semantic -values. @xref{Union Decl, ,The Collection of Value Types}. -@end deffn - -@deffn {Macro} YYABORT -Macro to pretend that an unrecoverable syntax error has occurred, by -making @code{yyparse} return 1 immediately. The error reporting -function @code{yyerror} is not called. @xref{Parser Function, ,The -Parser Function @code{yyparse}}. - -For Java parsers, this functionality is invoked using @code{return YYABORT;} -instead. 
-@end deffn - -@deffn {Macro} YYACCEPT -Macro to pretend that a complete utterance of the language has been -read, by making @code{yyparse} return 0 immediately. -@xref{Parser Function, ,The Parser Function @code{yyparse}}. - -For Java parsers, this functionality is invoked using @code{return YYACCEPT;} -instead. -@end deffn - -@deffn {Macro} YYBACKUP -Macro to discard a value from the parser stack and fake a lookahead -token. @xref{Action Features, ,Special Features for Use in Actions}. -@end deffn - -@deffn {Variable} yychar -External integer variable that contains the integer value of the -lookahead token. (In a pure parser, it is a local variable within -@code{yyparse}.) Error-recovery rule actions may examine this variable. -@xref{Action Features, ,Special Features for Use in Actions}. -@end deffn - -@deffn {Variable} yyclearin -Macro used in error-recovery rule actions. It clears the previous -lookahead token. @xref{Error Recovery}. -@end deffn - -@deffn {Macro} YYDEBUG -Macro to define to equip the parser with tracing code. @xref{Tracing, -,Tracing Your Parser}. -@end deffn - -@deffn {Variable} yydebug -External integer variable set to zero by default. If @code{yydebug} -is given a nonzero value, the parser will output information on input -symbols and parser action. @xref{Tracing, ,Tracing Your Parser}. -@end deffn - -@deffn {Macro} yyerrok -Macro to cause parser to recover immediately to its normal mode -after a syntax error. @xref{Error Recovery}. -@end deffn - -@deffn {Macro} YYERROR -Cause an immediate syntax error. This statement initiates error -recovery just as if the parser itself had detected an error; however, it -does not call @code{yyerror}, and does not print any message. If you -want to print an error message, call @code{yyerror} explicitly before -the @samp{YYERROR;} statement. @xref{Error Recovery}. - -For Java parsers, this functionality is invoked using @code{return YYERROR;} -instead. 
-@end deffn - -@deffn {Function} yyerror -User-supplied function to be called by @code{yyparse} on error. -@xref{Error Reporting, ,The Error -Reporting Function @code{yyerror}}. -@end deffn - -@deffn {Macro} YYERROR_VERBOSE -An obsolete macro that you define with @code{#define} in the prologue -to request verbose, specific error message strings -when @code{yyerror} is called. It doesn't matter what definition you -use for @code{YYERROR_VERBOSE}, just whether you define it. Using -@code{%error-verbose} is preferred. @xref{Error Reporting}. -@end deffn - -@deffn {Macro} YYFPRINTF -Macro used to output run-time traces. -@xref{Enabling Traces}. -@end deffn - -@deffn {Macro} YYINITDEPTH -Macro for specifying the initial size of the parser stack. -@xref{Memory Management}. -@end deffn - -@deffn {Function} yylex -User-supplied lexical analyzer function, called with no arguments to get -the next token. @xref{Lexical, ,The Lexical Analyzer Function -@code{yylex}}. -@end deffn - -@deffn {Macro} YYLEX_PARAM -An obsolete macro for specifying an extra argument (or list of extra -arguments) for @code{yyparse} to pass to @code{yylex}. The use of this -macro is deprecated, and is supported only for Yacc-like parsers. -@xref{Pure Calling,, Calling Conventions for Pure Parsers}. -@end deffn - -@deffn {Variable} yylloc -External variable in which @code{yylex} should place the line and column -numbers associated with a token. (In a pure parser, it is a local -variable within @code{yyparse}, and its address is passed to -@code{yylex}.) -You can ignore this variable if you don't use the @samp{@@} feature in the -grammar actions. -@xref{Token Locations, ,Textual Locations of Tokens}. -In semantic actions, it stores the location of the lookahead token. -@xref{Actions and Locations, ,Actions and Locations}. -@end deffn - -@deffn {Type} YYLTYPE -Data type of @code{yylloc}; by default, a structure with four -members. @xref{Location Type, , Data Types of Locations}.
-@end deffn - -@deffn {Variable} yylval -External variable in which @code{yylex} should place the semantic -value associated with a token. (In a pure parser, it is a local -variable within @code{yyparse}, and its address is passed to -@code{yylex}.) -@xref{Token Values, ,Semantic Values of Tokens}. -In semantic actions, it stores the semantic value of the lookahead token. -@xref{Actions, ,Actions}. -@end deffn - -@deffn {Macro} YYMAXDEPTH -Macro for specifying the maximum size of the parser stack. @xref{Memory -Management}. -@end deffn - -@deffn {Variable} yynerrs -Global variable which Bison increments each time it reports a syntax error. -(In a pure parser, it is a local variable within @code{yyparse}. In a -pure push parser, it is a member of yypstate.) -@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. -@end deffn - -@deffn {Function} yyparse -The parser function produced by Bison; call this function to start -parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}. -@end deffn - -@deffn {Macro} YYPRINT -Macro used to output token semantic values. For @file{yacc.c} only. -Obsoleted by @code{%printer}. -@xref{The YYPRINT Macro, , The @code{YYPRINT} Macro}. -@end deffn - -@deffn {Function} yypstate_delete -The function to delete a parser instance, produced by Bison in push mode; -call this function to delete the memory associated with a parser. -@xref{Parser Delete Function, ,The Parser Delete Function -@code{yypstate_delete}}. -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) -@end deffn - -@deffn {Function} yypstate_new -The function to create a parser instance, produced by Bison in push mode; -call this function to create a new parser. -@xref{Parser Create Function, ,The Parser Create Function -@code{yypstate_new}}. -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) 
-@end deffn - -@deffn {Function} yypull_parse -The parser function produced by Bison in push mode; call this function to -parse the rest of the input stream. -@xref{Pull Parser Function, ,The Pull Parser Function -@code{yypull_parse}}. -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) -@end deffn - -@deffn {Function} yypush_parse -The parser function produced by Bison in push mode; call this function to -parse a single token. @xref{Push Parser Function, ,The Push Parser Function -@code{yypush_parse}}. -(The current push parsing interface is experimental and may evolve. -More user feedback will help to stabilize it.) -@end deffn - -@deffn {Macro} YYPARSE_PARAM -An obsolete macro for specifying the name of a parameter that -@code{yyparse} should accept. The use of this macro is deprecated, and -is supported only for Yacc-like parsers. @xref{Pure Calling,, Calling -Conventions for Pure Parsers}. -@end deffn - -@deffn {Macro} YYRECOVERING -The expression @code{YYRECOVERING ()} yields 1 when the parser -is recovering from a syntax error, and 0 otherwise. -@xref{Action Features, ,Special Features for Use in Actions}. -@end deffn - -@deffn {Macro} YYSTACK_USE_ALLOCA -Macro used to control the use of @code{alloca} when the -deterministic parser in C needs to extend its stacks. If defined to 0, -the parser will use @code{malloc} to extend its stacks. If defined to -1, the parser will use @code{alloca}. Values other than 0 and 1 are -reserved for future Bison extensions. If not defined, -@code{YYSTACK_USE_ALLOCA} defaults to 0. - -In the all-too-common case where your code may run on a host with a -limited stack and with unreliable stack-overflow checking, you should -set @code{YYMAXDEPTH} to a value that cannot possibly result in -unchecked stack overflow on any of your target hosts when -@code{alloca} is called. You can inspect the code that Bison -generates in order to determine the proper numeric values.
This will -require some expertise in low-level implementation details. -@end deffn - -@deffn {Type} YYSTYPE -Data type of semantic values; @code{int} by default. -@xref{Value Type, ,Data Types of Semantic Values}. -@end deffn - -@node Glossary -@appendix Glossary -@cindex glossary - -@table @asis -@item Accepting state -A state whose only action is the accept action. -The accepting state is thus a consistent state. -@xref{Understanding,,}. - -@item Backus-Naur Form (BNF; also called ``Backus Normal Form'') -Formal method of specifying context-free grammars originally proposed -by John Backus, and slightly improved by Peter Naur in his 1960-01-02 -committee document contributing to what became the Algol 60 report. -@xref{Language and Grammar, ,Languages and Context-Free Grammars}. - -@item Consistent state -A state containing only one possible action. @xref{Default Reductions}. - -@item Context-free grammars -Grammars specified as rules that can be applied regardless of context. -Thus, if there is a rule which says that an integer can be used as an -expression, integers are allowed @emph{anywhere} an expression is -permitted. @xref{Language and Grammar, ,Languages and Context-Free -Grammars}. - -@item Default reduction -The reduction that a parser should perform if the current parser state -contains no other action for the lookahead token. In permitted parser -states, Bison declares the reduction with the largest lookahead set to be -the default reduction and removes that lookahead set. @xref{Default -Reductions}. - -@item Defaulted state -A consistent state with a default reduction. @xref{Default Reductions}. - -@item Dynamic allocation -Allocation of memory that occurs during execution, rather than at -compile time or on entry to a function. - -@item Empty string -Analogous to the empty set in set theory, the empty string is a -character string of length zero. 
- -@item Finite-state stack machine -A ``machine'' that has discrete states in which it is said to exist at -each instant in time. As input to the machine is processed, the -machine moves from state to state as specified by the logic of the -machine. In the case of the parser, the input is the language being -parsed, and the states correspond to various stages in the grammar -rules. @xref{Algorithm, ,The Bison Parser Algorithm}. - -@item Generalized LR (GLR) -A parsing algorithm that can handle all context-free grammars, including those -that are not LR(1). It resolves situations that Bison's -deterministic parsing -algorithm cannot by effectively splitting off multiple parsers, trying all -possible parsers, and discarding those that fail in the light of additional -right context. @xref{Generalized LR Parsing, ,Generalized -LR Parsing}. - -@item Grouping -A language construct that is (in general) grammatically divisible; -for example, `expression' or `declaration' in C@. -@xref{Language and Grammar, ,Languages and Context-Free Grammars}. - -@item IELR(1) (Inadequacy Elimination LR(1)) -A minimal LR(1) parser table construction algorithm. That is, given any -context-free grammar, IELR(1) generates parser tables with the full -language-recognition power of canonical LR(1) but with nearly the same -number of parser states as LALR(1). This reduction in parser states is -often an order of magnitude. More importantly, because canonical LR(1)'s -extra parser states may contain duplicate conflicts in the case of non-LR(1) -grammars, the number of conflicts for IELR(1) is often an order of magnitude -less as well. This can significantly reduce the complexity of developing a -grammar. @xref{LR Table Construction}. - -@item Infix operator -An arithmetic operator that is placed between the operands on which it -performs some operation. - -@item Input stream -A continuous flow of data between devices or programs. 
- -@item LAC (Lookahead Correction) -A parsing mechanism that fixes the problem of delayed syntax error -detection, which is caused by LR state merging, default reductions, and the -use of @code{%nonassoc}. Delayed syntax error detection results in -unexpected semantic actions, initiation of error recovery in the wrong -syntactic context, and an incorrect list of expected tokens in a verbose -syntax error message. @xref{LAC}. - -@item Language construct -One of the typical usage schemas of the language. For example, one of -the constructs of the C language is the @code{if} statement. -@xref{Language and Grammar, ,Languages and Context-Free Grammars}. - -@item Left associativity -Operators having left associativity are analyzed from left to right: -@samp{a+b+c} first computes @samp{a+b} and then combines with -@samp{c}. @xref{Precedence, ,Operator Precedence}. - -@item Left recursion -A rule whose result symbol is also its first component symbol; for -example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive -Rules}. - -@item Left-to-right parsing -Parsing a sentence of a language by analyzing it token by token from -left to right. @xref{Algorithm, ,The Bison Parser Algorithm}. - -@item Lexical analyzer (scanner) -A function that reads an input stream and returns tokens one by one. -@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}. - -@item Lexical tie-in -A flag, set by actions in the grammar rules, which alters the way -tokens are parsed. @xref{Lexical Tie-ins}. - -@item Literal string token -A token which consists of two or more fixed characters. @xref{Symbols}. - -@item Lookahead token -A token already read but not yet shifted. @xref{Lookahead, ,Lookahead -Tokens}. - -@item LALR(1) -The class of context-free grammars that Bison (like most other parser -generators) can handle by default; a subset of LR(1). -@xref{Mysterious Conflicts}. 
- -@item LR(1) -The class of context-free grammars in which at most one token of -lookahead is needed to disambiguate the parsing of any piece of input. - -@item Nonterminal symbol -A grammar symbol standing for a grammatical construct that can -be expressed through rules in terms of smaller constructs; in other -words, a construct that is not a token. @xref{Symbols}. - -@item Parser -A function that recognizes valid sentences of a language by analyzing -the syntax structure of a set of tokens passed to it from a lexical -analyzer. - -@item Postfix operator -An arithmetic operator that is placed after the operands upon which it -performs some operation. - -@item Reduction -Replacing a string of nonterminals and/or terminals with a single -nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison -Parser Algorithm}. - -@item Reentrant -A reentrant subprogram is a subprogram which can be invoked any -number of times in parallel, without interference between the various -invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. - -@item Reverse Polish notation -A language in which all operators are postfix operators. - -@item Right recursion -A rule whose result symbol is also its last component symbol; for -example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive -Rules}. - -@item Semantics -In computer languages, the semantics are specified by the actions -taken for each instance of the language, i.e., the meaning of -each statement. @xref{Semantics, ,Defining Language Semantics}. - -@item Shift -A parser is said to shift when it makes the choice of analyzing -further input from the stream rather than immediately reducing some -already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm}. - -@item Single-character literal -A single character that is recognized and interpreted as is. -@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
- -@item Start symbol -The nonterminal symbol that stands for a complete valid utterance in -the language being parsed. The start symbol is usually listed as the -first nonterminal symbol in a language specification. -@xref{Start Decl, ,The Start-Symbol}. - -@item Symbol table -A data structure where symbol names and associated data are stored -during parsing to allow for recognition and use of existing -information in repeated uses of a symbol. @xref{Multi-function Calc}. - -@item Syntax error -An error encountered during parsing of an input stream due to invalid -syntax. @xref{Error Recovery}. - -@item Token -A basic, grammatically indivisible unit of a language. The symbol -that describes a token in the grammar is a terminal symbol. -The input of the Bison parser is a stream of tokens which comes from -the lexical analyzer. @xref{Symbols}. - -@item Terminal symbol -A grammar symbol that has no rules in the grammar and therefore is -grammatically indivisible. The piece of text it represents is a token. -@xref{Language and Grammar, ,Languages and Context-Free Grammars}. - -@item Unreachable state -A parser state to which there does not exist a sequence of transitions from -the parser's start state. A state can become unreachable during conflict -resolution. @xref{Unreachable States}. -@end table - -@node Copying This Manual -@appendix Copying This Manual -@include fdl.texi - -@node Bibliography -@unnumbered Bibliography - -@table @asis -@item [Denny 2008] -Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser Tables -for Non-LR(1) Grammars with Conflict Resolution, in @cite{Proceedings of the -2008 ACM Symposium on Applied Computing} (SAC'08), ACM, New York, NY, USA, -pp.@: 240--245. @uref{http://dx.doi.org/10.1145/1363686.1363747} - -@item [Denny 2010 May] -Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the -Deterministic Parsing of Composite Languages, Ph.D. Dissertation, Clemson -University, Clemson, SC, USA (May 2010). 
-@uref{http://proquest.umi.com/pqdlink?did=2041473591&Fmt=7&clientId=79356&RQT=309&VName=PQD} - -@item [Denny 2010 November] -Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for Generating -Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution, -in @cite{Science of Computer Programming}, Vol.@: 75, Issue 11 (November -2010), pp.@: 943--979. @uref{http://dx.doi.org/10.1016/j.scico.2009.08.001} - -@item [DeRemer 1982] -Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1) -Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and -Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@: -615--649. @uref{http://dx.doi.org/10.1145/69622.357187} - -@item [Knuth 1965] -Donald E. Knuth, On the Translation of Languages from Left to Right, in -@cite{Information and Control}, Vol.@: 8, Issue 6 (December 1965), pp.@: -607--639. @uref{http://dx.doi.org/10.1016/S0019-9958(65)90426-2} - -@item [Scott 2000] -Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain, -@cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of -London, Department of Computer Science, TR-00-12 (December 2000). 
-@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps} -@end table - -@node Index -@unnumbered Index - -@printindex cp - -@bye - -@c LocalWords: texinfo setfilename settitle setchapternewpage finalout texi FSF -@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex FSF's -@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry Naur -@c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa Multi -@c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc multi -@c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex defaultprec Donnelly Gotos -@c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref yypush -@c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex lr -@c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge POSIX -@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG yypull -@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit nonfree -@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok rr -@c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln Stallman Destructor -@c LocalWords: symrec val tptr FNCT fnctptr func struct sym enum IEC syntaxes -@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof Lex -@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum DOTDOT -@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype Unary -@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless yynerrs nonterminal -@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES reentrant -@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param yypstate -@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP subrange -@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword loc -@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH 
inline -@c LocalWords: YYINITDEPTH stmts ref initdcl maybeasm notype Lookahead yyoutput -@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args Autoconf -@c LocalWords: infile ypp yxx outfile itemx tex leaderfill Troubleshouting sqrt -@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead -@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th -@c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps -@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC -@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs sr -@c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC nterm LR's -@c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK -@c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative Ph -@c LocalWords: deffnx namespace xml goto lalr ielr runtime lex yacc yyps env -@c LocalWords: yystate variadic Unshift NLS gettext po UTF Automake LOCALEDIR -@c LocalWords: YYENABLE bindtextdomain Makefile DEFS CPPFLAGS DBISON DeRemer -@c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz ACM -@c LocalWords: redeclare automata Dparse localedir datadir XSLT midrule Wno -@c LocalWords: Graphviz multitable headitem hh basename Doxygen fno filename -@c LocalWords: doxygen ival sval deftypemethod deallocate pos deftypemethodx -@c LocalWords: Ctor defcv defcvx arg accessors arithmetics CPP ifndef CALCXX -@c LocalWords: lexer's calcxx bool LPAREN RPAREN deallocation cerrno climits -@c LocalWords: cstdlib Debian undef yywrap unput noyywrap nounput zA yyleng -@c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc PSLR -@c LocalWords: bytecode initializers superclass stype ASTNode autoboxing nls -@c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp -@c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv -@c 
LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url -@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos -@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's -@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy -@c LocalWords: Scannerless ispell american - -@c Local Variables: -@c ispell-dictionary: "american" -@c fill-column: 76 -@c End: diff --git a/examples/calc++/Makefile.am b/examples/calc++/Makefile.am index d62c57ec..e8bbbaca 100644 --- a/examples/calc++/Makefile.am +++ b/examples/calc++/Makefile.am @@ -28,7 +28,7 @@ $(BISON): $(BISON_IN) ## Extracting. ## ## ------------ ## -doc = $(top_srcdir)/doc/bison.texinfo +doc = $(top_srcdir)/doc/bison.texi extexi = $(top_srcdir)/examples/extexi # Extract in src. $(srcdir)/calc.stamp: $(doc) $(extexi) @@ -36,7 +36,7 @@ $(srcdir)/calc.stamp: $(doc) $(extexi) $(AM_V_at)touch $@.tmp $(AM_V_at)cd $(srcdir) && \ $(AWK) -f ../extexi -v VERSION="$(VERSION)" \ - ../../doc/bison.texinfo -- calc++-parser.yy \ + ../../doc/bison.texi -- calc++-parser.yy \ calc++-scanner.ll calc++.cc calc++-driver.hh calc++-driver.cc $(AM_V_at)mv $@.tmp $@