X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/1a7a65f9d5aec5eda92d92b2148fee2e6f0c4199..c6abeab182fed54a2068fd75978a97f9c09d9da7:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 70f62672..12c5ed3b 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -34,8 +34,8 @@ This manual (@value{UPDATED}) is for @acronym{GNU} Bison (version @value{VERSION}), the @acronym{GNU} parser generator. Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, -1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software -Foundation, Inc. +1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free +Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document @@ -89,76 +89,76 @@ Cover art by Etienne Suvasa. @menu * Introduction:: * Conditions:: -* Copying:: The @acronym{GNU} General Public License says - how you can copy and share Bison +* Copying:: The @acronym{GNU} General Public License says + how you can copy and share Bison. Tutorial sections: -* Concepts:: Basic concepts for understanding Bison. -* Examples:: Three simple explained examples of using Bison. +* Concepts:: Basic concepts for understanding Bison. +* Examples:: Three simple explained examples of using Bison. Reference sections: -* Grammar File:: Writing Bison declarations and rules. -* Interface:: C-language interface to the parser function @code{yyparse}. -* Algorithm:: How the Bison parser works at run-time. -* Error Recovery:: Writing rules for error recovery. +* Grammar File:: Writing Bison declarations and rules. +* Interface:: C-language interface to the parser function @code{yyparse}. +* Algorithm:: How the Bison parser works at run-time. +* Error Recovery:: Writing rules for error recovery. * Context Dependency:: What to do if your language syntax is too - messy for Bison to handle straightforwardly. -* Debugging:: Understanding or debugging Bison parsers. -* Invocation:: How to run Bison (to produce the parser source file). -* Other Languages:: Creating C++ and Java parsers. -* FAQ:: Frequently Asked Questions -* Table of Symbols:: All the keywords of the Bison language are explained. -* Glossary:: Basic concepts are explained. -* Copying This Manual:: License for copying this manual. -* Index:: Cross-references to the text. + messy for Bison to handle straightforwardly. +* Debugging:: Understanding or debugging Bison parsers. +* Invocation:: How to run Bison (to produce the parser source file). +* Other Languages:: Creating C++ and Java parsers. +* FAQ:: Frequently Asked Questions +* Table of Symbols:: All the keywords of the Bison language are explained. +* Glossary:: Basic concepts are explained. +* Copying This Manual:: License for copying this manual. +* Index:: Cross-references to the text. @detailmenu --- The Detailed Node Listing --- The Concepts of Bison -* Language and Grammar:: Languages and context-free grammars, - as mathematical ideas. -* Grammar in Bison:: How we represent grammars for Bison's sake. -* Semantic Values:: Each token or syntactic grouping can have - a semantic value (the value of an integer, - the name of an identifier, etc.). -* Semantic Actions:: Each rule can have an action containing C code. -* GLR Parsers:: Writing parsers for general context-free languages. -* Locations Overview:: Tracking Locations. -* Bison Parser:: What are Bison's input and output, - how is the output used? -* Stages:: Stages in writing and running Bison grammars. -* Grammar Layout:: Overall structure of a Bison grammar file. +* Language and Grammar:: Languages and context-free grammars, + as mathematical ideas. +* Grammar in Bison:: How we represent grammars for Bison's sake. +* Semantic Values:: Each token or syntactic grouping can have + a semantic value (the value of an integer, + the name of an identifier, etc.). +* Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages. +* Locations Overview:: Tracking Locations. +* Bison Parser:: What are Bison's input and output, + how is the output used? +* Stages:: Stages in writing and running Bison grammars. +* Grammar Layout:: Overall structure of a Bison grammar file. Writing @acronym{GLR} Parsers -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. Examples -* RPN Calc:: Reverse polish notation calculator; - a first example with no operator precedence. -* Infix Calc:: Infix (algebraic) notation calculator. - Operator precedence is introduced. +* RPN Calc:: Reverse polish notation calculator; + a first example with no operator precedence. +* Infix Calc:: Infix (algebraic) notation calculator. + Operator precedence is introduced. * Simple Error Recovery:: Continuing after syntax errors. * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. -* Exercises:: Ideas for improving the multi-function calculator. +* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. +* Exercises:: Ideas for improving the multi-function calculator. Reverse Polish Notation Calculator -* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc. -* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation. -* Lexer: Rpcalc Lexer. The lexical analyzer. -* Main: Rpcalc Main. The controlling function. -* Error: Rpcalc Error. The error reporting function. -* Gen: Rpcalc Gen. Running Bison on the grammar file. -* Comp: Rpcalc Compile. Run the C compiler on the output code. +* Rpcalc Declarations:: Prologue (declarations) for rpcalc. +* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. +* Rpcalc Lexer:: The lexical analyzer. +* Rpcalc Main:: The controlling function. +* Rpcalc Error:: The error reporting function. +* Rpcalc Generate:: Running Bison on the grammar file. +* Rpcalc Compile:: Run the C compiler on the output code. Grammar Rules for @code{rpcalc} @@ -168,15 +168,15 @@ Grammar Rules for @code{rpcalc} Location Tracking Calculator: @code{ltcalc} -* Decls: Ltcalc Decls. Bison and C declarations for ltcalc. -* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations. -* Lexer: Ltcalc Lexer. The lexical analyzer. +* Ltcalc Declarations:: Bison and C declarations for ltcalc. +* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. +* Ltcalc Lexer:: The lexical analyzer. Multi-Function Calculator: @code{mfcalc} -* Decl: Mfcalc Decl. Bison declarations for multi-function calculator. -* Rules: Mfcalc Rules. Grammar rules for the calculator. -* Symtab: Mfcalc Symtab. Symbol table management subroutines. +* Mfcalc Declarations:: Bison declarations for multi-function calculator. +* Mfcalc Rules:: Grammar rules for the calculator. +* Mfcalc Symbol Table:: Symbol table management subroutines. Bison Grammar Files @@ -191,11 +191,11 @@ Bison Grammar Files Outline of a Bison Grammar -* Prologue:: Syntax and usage of the prologue. +* Prologue:: Syntax and usage of the prologue. * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. -* Bison Declarations:: Syntax and usage of the Bison declarations section. -* Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* Epilogue:: Syntax and usage of the epilogue. Defining Language Semantics @@ -230,24 +230,28 @@ Bison Declarations Parser C-Language Interface -* Parser Function:: How to call @code{yyparse} and what it returns. -* Lexical:: You must supply a function @code{yylex} - which reads tokens. -* Error Reporting:: You must supply a function @code{yyerror}. -* Action Features:: Special features for use in actions. -* Internationalization:: How to let the parser speak in the user's - native language. +* Parser Function:: How to call @code{yyparse} and what it returns. +* Push Parser Function:: How to call @code{yypush_parse} and what it returns. +* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. +* Parser Create Function:: How to call @code{yypstate_new} and what it returns. +* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. +* Lexical:: You must supply a function @code{yylex} + which reads tokens. +* Error Reporting:: You must supply a function @code{yyerror}. +* Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. The Lexical Analyzer Function @code{yylex} * Calling Convention:: How @code{yyparse} calls @code{yylex}. -* Token Values:: How @code{yylex} must return the semantic value - of the token it has read. -* Token Locations:: How @code{yylex} must return the text location - (line number, etc.) of the token, if the - actions want that. -* Pure Calling:: How the calling convention differs - in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +* Token Values:: How @code{yylex} must return the semantic value + of the token it has read. +* Token Locations:: How @code{yylex} must return the text location + (line number, etc.) of the token, if the + actions want that. +* Pure Calling:: How the calling convention differs in a pure parser + (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The Bison Parser Algorithm @@ -257,7 +261,7 @@ The Bison Parser Algorithm * Contextual Precedence:: When an operator's precedence depends on context. * Parser States:: The parser is a finite-state-machine with stack. * Reduce/Reduce:: When two rules are applicable in the same situation. -* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. +* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. * Generalized LR Parsing:: Parsing arbitrary context-free grammars. * Memory Management:: What happens when memory is exhausted. How to avoid it. @@ -312,33 +316,33 @@ A Complete C++ Example Java Parsers -* Java Bison Interface:: Asking for Java parser generation -* Java Semantic Values:: %type and %token vs. Java -* Java Location Values:: The position and location classes -* Java Parser Interface:: Instantiating and running the parser -* Java Scanner Interface:: Specifying the scanner for the parser -* Java Action Features:: Special features for use in actions. -* Java Differences:: Differences between C/C++ and Java Grammars -* Java Declarations Summary:: List of Bison declarations used with Java +* Java Bison Interface:: Asking for Java parser generation +* Java Semantic Values:: %type and %token vs. Java +* Java Location Values:: The position and location classes +* Java Parser Interface:: Instantiating and running the parser +* Java Scanner Interface:: Specifying the scanner for the parser +* Java Action Features:: Special features for use in actions +* Java Differences:: Differences between C/C++ and Java Grammars +* Java Declarations Summary:: List of Bison declarations used with Java Frequently Asked Questions -* Memory Exhausted:: Breaking the Stack Limits -* How Can I Reset the Parser:: @code{yyparse} Keeps some State -* Strings are Destroyed:: @code{yylval} Loses Track of Strings -* Implementing Gotos/Loops:: Control Flow in the Calculator -* Multiple start-symbols:: Factoring closely related grammars -* Secure? Conform?:: Is Bison @acronym{POSIX} safe? -* I can't build Bison:: Troubleshooting -* Where can I find help?:: Troubleshouting -* Bug Reports:: Troublereporting -* Other Languages:: Parsers in Java and others -* Beta Testing:: Experimenting development versions -* Mailing Lists:: Meeting other Bison users +* Memory Exhausted:: Breaking the Stack Limits +* How Can I Reset the Parser:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings +* Implementing Gotos/Loops:: Control Flow in the Calculator +* Multiple start-symbols:: Factoring closely related grammars +* Secure? Conform?:: Is Bison @acronym{POSIX} safe? +* I can't build Bison:: Troubleshooting +* Where can I find help?:: Troubleshouting +* Bug Reports:: Troublereporting +* More Languages:: Parsers in C++, Java, and so on +* Beta Testing:: Experimenting development versions +* Mailing Lists:: Meeting other Bison users Copying This Manual -* Copying This Manual:: License for copying this manual. +* Copying This Manual:: License for copying this manual. @end detailmenu @end menu @@ -348,10 +352,12 @@ Copying This Manual @cindex introduction @dfn{Bison} is a general-purpose parser generator that converts an -annotated context-free grammar into an @acronym{LALR}(1) or -@acronym{GLR} parser for that grammar. Once you are proficient with -Bison, you can use it to develop a wide range of language parsers, from those -used in simple desk calculators to complex programming languages. +annotated context-free grammar into a deterministic or @acronym{GLR} +parser employing @acronym{LALR}(1), @acronym{IELR}(1), or canonical +@acronym{LR}(1) parser tables. +Once you are proficient with Bison, you can use it to develop a wide +range of language parsers, from those used in simple desk calculators to +complex programming languages. Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc @@ -418,19 +424,19 @@ details of Bison will not make sense. If you do not already know how to use Bison or Yacc, we suggest you start by reading this chapter carefully. @menu -* Language and Grammar:: Languages and context-free grammars, - as mathematical ideas. -* Grammar in Bison:: How we represent grammars for Bison's sake. -* Semantic Values:: Each token or syntactic grouping can have - a semantic value (the value of an integer, - the name of an identifier, etc.). -* Semantic Actions:: Each rule can have an action containing C code. -* GLR Parsers:: Writing parsers for general context-free languages. -* Locations Overview:: Tracking Locations. -* Bison Parser:: What are Bison's input and output, - how is the output used? -* Stages:: Stages in writing and running Bison grammars. -* Grammar Layout:: Overall structure of a Bison grammar file. +* Language and Grammar:: Languages and context-free grammars, + as mathematical ideas. +* Grammar in Bison:: How we represent grammars for Bison's sake. +* Semantic Values:: Each token or syntactic grouping can have + a semantic value (the value of an integer, + the name of an identifier, etc.). +* Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages. +* Locations Overview:: Tracking Locations. +* Bison Parser:: What are Bison's input and output, + how is the output used? +* Stages:: Stages in writing and running Bison grammars. +* Grammar Layout:: Overall structure of a Bison grammar file. @end menu @node Language and Grammar @@ -457,26 +463,27 @@ order to specify the language Algol 60. Any grammar expressed in essentially machine-readable @acronym{BNF}. @cindex @acronym{LALR}(1) grammars +@cindex @acronym{IELR}(1) grammars @cindex @acronym{LR}(1) grammars -There are various important subclasses of context-free grammar. Although it -can handle almost all context-free grammars, Bison is optimized for what -are called @acronym{LALR}(1) grammars. -In brief, in these grammars, it must be possible to -tell how to parse any portion of an input string with just a single -token of lookahead. Strictly speaking, that is a description of an -@acronym{LR}(1) grammar, and @acronym{LALR}(1) involves additional -restrictions that are -hard to explain simply; but it is rare in actual practice to find an -@acronym{LR}(1) grammar that fails to be @acronym{LALR}(1). +There are various important subclasses of context-free grammars. +Although it can handle almost all context-free grammars, Bison is +optimized for what are called @acronym{LR}(1) grammars. +In brief, in these grammars, it must be possible to tell how to parse +any portion of an input string with just a single token of lookahead. +For historical reasons, Bison by default is limited by the additional +restrictions of @acronym{LALR}(1), which is hard to explain simply. @xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more information on this. +To escape these additional restrictions, you can request +@acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables. +@xref{Decl Summary,,lr.type}, to learn how. @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars @cindex nondeterministic parsing -Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning +Parsers for @acronym{LR}(1) grammars are @dfn{deterministic}, meaning roughly that the next grammar rule to apply at any point in the input is uniquely determined by the preceding input and a fixed, finite portion (called a @dfn{lookahead}) of the remaining input. A context-free @@ -705,8 +712,8 @@ from the values of the two subexpressions. @cindex shift/reduce conflicts @cindex reduce/reduce conflicts -In some grammars, Bison's standard -@acronym{LALR}(1) parsing algorithm cannot decide whether to apply a +In some grammars, Bison's deterministic +@acronym{LR}(1) parsing algorithm cannot decide whether to apply a certain grammar rule at a given point. That is, it may not be able to decide (on the basis of the input read so far) which of two possible reductions (applications of a grammar rule) applies, or whether to apply @@ -715,13 +722,13 @@ input. These are known respectively as @dfn{reduce/reduce} conflicts (@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts (@pxref{Shift/Reduce}). -To use a grammar that is not easily modified to be @acronym{LALR}(1), a +To use a grammar that is not easily modified to be @acronym{LR}(1), a more general parsing algorithm is sometimes necessary. If you include @code{%glr-parser} among the Bison declarations in your file (@pxref{Grammar Outline}), the result is a Generalized @acronym{LR} (@acronym{GLR}) parser. These parsers handle Bison grammars that contain no unresolved conflicts (i.e., after applying precedence -declarations) identically to @acronym{LALR}(1) parsers. However, when +declarations) identically to deterministic parsers. However, when faced with unresolved shift/reduce and reduce/reduce conflicts, @acronym{GLR} parsers use the simple expedient of doing both, effectively cloning the parser to follow both possibilities. Each of @@ -746,10 +753,10 @@ user-defined function on the resulting values to produce an arbitrary merged result. @menu -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. @end menu @node Simple GLR Parsers @@ -763,11 +770,8 @@ merged result. @cindex shift/reduce conflicts In the simplest cases, you can use the @acronym{GLR} algorithm -to parse grammars that are unambiguous, but fail to be @acronym{LALR}(1). -Such grammars typically require more than one symbol of lookahead, -or (in rare cases) fall into the category of grammars in which the -@acronym{LALR}(1) algorithm throws away too much information (they are in -@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}). +to parse grammars that are unambiguous but fail to be @acronym{LR}(1). +Such grammars typically require more than one symbol of lookahead. Consider a problem that arises in the declaration of enumerated and subrange types in the @@ -804,7 +808,7 @@ type enum = (a); valid, and more-complicated cases can come up in practical programs.) These two declarations look identical until the @samp{..} token. -With normal @acronym{LALR}(1) one-token lookahead it is not +With normal @acronym{LR}(1) one-token lookahead it is not possible to decide between the two forms when the identifier @samp{a} is parsed. It is, however, desirable for a parser to decide this, since in the latter case @@ -843,9 +847,9 @@ reports a syntax error as usual. The effect of all this is that the parser seems to ``guess'' the correct branch to take, or in other words, it seems to use more -lookahead than the underlying @acronym{LALR}(1) algorithm actually allows -for. In this example, @acronym{LALR}(2) would suffice, but also some cases -that are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way. +lookahead than the underlying @acronym{LR}(1) algorithm actually allows +for. In this example, @acronym{LR}(2) would suffice, but also some cases +that are not @acronym{LR}(@math{k}) for any @math{k} can be handled this way. In general, a @acronym{GLR} parser can take quadratic or cubic worst-case time, and the current Bison parser even takes exponential time and space @@ -898,7 +902,7 @@ expr : '(' expr ')' @end group @end example -When used as a normal @acronym{LALR}(1) grammar, Bison correctly complains +When used as a normal @acronym{LR}(1) grammar, Bison correctly complains about one reduce/reduce conflict. In the conflicting situation the parser chooses one of the alternatives, arbitrarily the one declared first. Therefore the following correct input is not @@ -930,7 +934,7 @@ there are at least two potential problems to beware. First, always analyze the conflicts reported by Bison to make sure that @acronym{GLR} splitting is only done where it is intended. A @acronym{GLR} parser splitting inadvertently may cause problems less obvious than an -@acronym{LALR} parser statically choosing the wrong alternative in a +@acronym{LR} parser statically choosing the wrong alternative in a conflict. Second, consider interactions with the lexer (@pxref{Semantic Tokens}) with great care. Since a split parser consumes tokens without performing any actions during the split, the lexer cannot obtain @@ -1150,7 +1154,7 @@ Another Bison feature requiring special consideration is @code{YYERROR} (@pxref{Action Features}), which you can invoke in a semantic action to initiate error recovery. During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is -the same as its effect in an @acronym{LALR}(1) parser. +the same as its effect in a deterministic parser. In a deferred semantic action, its effect is undefined. @c The effect is probably a syntax error at the split point. @@ -1377,15 +1381,15 @@ languages are written the same way. You can copy these examples into a source file to try them. @menu -* RPN Calc:: Reverse polish notation calculator; - a first example with no operator precedence. -* Infix Calc:: Infix (algebraic) notation calculator. - Operator precedence is introduced. +* RPN Calc:: Reverse polish notation calculator; + a first example with no operator precedence. +* Infix Calc:: Infix (algebraic) notation calculator. + Operator precedence is introduced. * Simple Error Recovery:: Continuing after syntax errors. * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. -* Exercises:: Ideas for improving the multi-function calculator. +* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. +* Exercises:: Ideas for improving the multi-function calculator. @end menu @node RPN Calc @@ -1404,16 +1408,16 @@ The source code for this calculator is named @file{rpcalc.y}. The @samp{.y} extension is a convention used for Bison input files. @menu -* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc. -* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation. -* Lexer: Rpcalc Lexer. The lexical analyzer. -* Main: Rpcalc Main. The controlling function. -* Error: Rpcalc Error. The error reporting function. -* Gen: Rpcalc Gen. Running Bison on the grammar file. -* Comp: Rpcalc Compile. Run the C compiler on the output code. +* Rpcalc Declarations:: Prologue (declarations) for rpcalc. +* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. +* Rpcalc Lexer:: The lexical analyzer. +* Rpcalc Main:: The controlling function. +* Rpcalc Error:: The error reporting function. +* Rpcalc Generate:: Running Bison on the grammar file. +* Rpcalc Compile:: Run the C compiler on the output code. @end menu -@node Rpcalc Decls +@node Rpcalc Declarations @subsection Declarations for @code{rpcalc} Here are the C and Bison declarations for the reverse polish notation @@ -1663,7 +1667,7 @@ therefore, @code{NUM} becomes a macro for @code{yylex} to use. The semantic value of the token (if it has one) is stored into the global variable @code{yylval}, which is where the Bison parser will look for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was -defined at the beginning of the grammar; @pxref{Rpcalc Decls, +defined at the beginning of the grammar; @pxref{Rpcalc Declarations, ,Declarations for @code{rpcalc}}.) A token type code of zero is returned if the end-of-input is encountered. @@ -1759,7 +1763,7 @@ have not written any error rules in this example, so any invalid input will cause the calculator program to exit. This is not clean behavior for a real calculator, but it is adequate for the first example. -@node Rpcalc Gen +@node Rpcalc Generate @subsection Running Bison to Make the Parser @cindex running Bison (introduction) @@ -1979,12 +1983,12 @@ most of the work needed to use locations will be done in the lexical analyzer. @menu -* Decls: Ltcalc Decls. Bison and C declarations for ltcalc. -* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations. -* Lexer: Ltcalc Lexer. The lexical analyzer. +* Ltcalc Declarations:: Bison and C declarations for ltcalc. +* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. +* Ltcalc Lexer:: The lexical analyzer. @end menu -@node Ltcalc Decls +@node Ltcalc Declarations @subsection Declarations for @code{ltcalc} The C and Bison declarations for the location tracking calculator are @@ -2220,12 +2224,12 @@ $ Note that multiple assignment and nested function calls are permitted. @menu -* Decl: Mfcalc Decl. Bison declarations for multi-function calculator. -* Rules: Mfcalc Rules. Grammar rules for the calculator. -* Symtab: Mfcalc Symtab. Symbol table management subroutines. +* Mfcalc Declarations:: Bison declarations for multi-function calculator. +* Mfcalc Rules:: Grammar rules for the calculator. +* Mfcalc Symbol Table:: Symbol table management subroutines. @end menu -@node Mfcalc Decl +@node Mfcalc Declarations @subsection Declarations for @code{mfcalc} Here are the C and Bison declarations for the multi-function calculator. @@ -2321,7 +2325,7 @@ exp: NUM @{ $$ = $1; @} %% @end smallexample -@node Mfcalc Symtab +@node Mfcalc Symbol Table @subsection The @code{mfcalc} Symbol Table @cindex symbol table example @@ -2634,11 +2638,11 @@ As a @acronym{GNU} extension, @samp{//} introduces a comment that continues until end of line. @menu -* Prologue:: Syntax and usage of the prologue. +* Prologue:: Syntax and usage of the prologue. * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. -* Bison Declarations:: Syntax and usage of the Bison declarations section. -* Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* Epilogue:: Syntax and usage of the epilogue. @end menu @node Prologue @@ -2702,9 +2706,6 @@ feature test macros can affect the behavior of Bison-generated @findex %code requires @findex %code provides @findex %code top -(The prologue alternatives described here are experimental. -More user feedback will help to determine whether they should become permanent -features.) The functionality of @var{Prologue} sections can often be subtle and inflexible. @@ -3049,8 +3050,12 @@ A @dfn{nonterminal symbol} stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case. -Symbol names can contain letters, digits (not at the beginning), -underscores and periods. Periods make sense only in nonterminals. +Symbol names can contain letters, underscores, periods, dashes, and (not +at the beginning) digits. Dashes in symbol names are a GNU +extension, incompatible with @acronym{POSIX} Yacc. Terminal symbols +that contain periods or dashes make little sense: since they are not +valid symbols (in most programming languages) they are not exported as +token names. There are three ways of writing terminal symbols in the grammar: @@ -3801,8 +3806,11 @@ typedef struct YYLTYPE @} YYLTYPE; @end example -At the beginning of the parsing, Bison initializes all these fields to 1 -for @code{yylloc}. +When @code{YYLTYPE} is not defined, at the beginning of the parsing, Bison +initializes all these fields to 1 for @code{yylloc}. To initialize +@code{yylloc} with a custom location type (or to chose a different +initialization), use the @code{%initial-action} directive. @xref{Initial +Action Decl, , Performing Actions before Parsing}. @node Actions and Locations @subsection Actions and Locations @@ -4467,7 +4475,7 @@ be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. Bison reports an error if the number of shift/reduce conflicts differs from @var{n}, or if there are any reduce/reduce conflicts. -For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more +For deterministic parsers, reduce/reduce conflicts are more serious, and should be eliminated entirely. Bison will always report reduce/reduce conflicts for these parsers. With @acronym{GLR} parsers, however, both kinds of conflicts are routine; otherwise, @@ -4536,7 +4544,7 @@ statically allocated variables for communication with @code{yylex}, including @code{yylval} and @code{yylloc}.) Alternatively, you can generate a pure, reentrant parser. The Bison -declaration @code{%define api.pure} says that you want the parser to be +declaration @samp{%define api.pure} says that you want the parser to be reentrant. It looks like this: @example @@ -4561,7 +4569,7 @@ valid grammar. @subsection A Push Parser @cindex push parser @cindex push parser -@findex %define api.push_pull +@findex %define api.push-pull (The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.) @@ -4577,10 +4585,10 @@ within a certain time period. Normally, Bison generates a pull parser. The following Bison declaration says that you want the parser to be a push -parser (@pxref{Decl Summary,,%define api.push_pull}): +parser (@pxref{Decl Summary,,%define api.push-pull}): @example -%define api.push_pull "push" +%define api.push-pull push @end example In almost all cases, you want to ensure that your push parser is also @@ -4591,7 +4599,7 @@ what you are doing, your declarations should look like this: @example %define api.pure -%define api.push_pull "push" +%define api.push-pull push @end example There is a major notable functional difference between the pure push parser @@ -4640,14 +4648,14 @@ for use by the next invocation of the @code{yypush_parse} function. Bison also supports both the push parser interface along with the pull parser interface in the same generated parser. In order to get this functionality, -you should replace the @code{%define api.push_pull "push"} declaration with the -@code{%define api.push_pull "both"} declaration. Doing this will create all of +you should replace the @samp{%define api.push-pull push} declaration with the +@samp{%define api.push-pull both} declaration. Doing this will create all of the symbols mentioned earlier along with the two extra symbols, @code{yyparse} and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally would be used. However, the user should note that it is implemented in the generated parser by calling @code{yypull_parse}. This makes the @code{yyparse} function that is generated with the -@code{%define api.push_pull "both"} declaration slower than the normal +@samp{%define api.push-pull both} declaration slower than the normal @code{yyparse} function. If the user calls the @code{yypull_parse} function it will parse the rest of the input stream. It is possible to @code{yypush_parse} tokens to select a subgrammar @@ -4663,9 +4671,9 @@ yypull_parse (ps); /* Will call the lexer */ yypstate_delete (ps); @end example -Adding the @code{%define api.pure} declaration does exactly the same thing to -the generated parser with @code{%define api.push_pull "both"} as it did for -@code{%define api.push_pull "push"}. +Adding the @samp{%define api.pure} declaration does exactly the same thing to +the generated parser with @samp{%define api.push-pull both} as it did for +@samp{%define api.push-pull push}. @node Decl Summary @subsection Bison Declaration Summary @@ -4745,10 +4753,6 @@ Thus, @code{%code} replaces the traditional Yacc prologue, For a detailed discussion, see @ref{Prologue Alternatives}. For Java, the default location is inside the parser class. - -(Like all the Yacc prologue alternatives, this directive is experimental. -More user feedback will help to determine whether it should become a permanent -feature.) @end deffn @deffn {Directive} %code @var{qualifier} @{@var{code}@} @@ -4759,7 +4763,9 @@ use this form instead. @var{qualifier} identifies the purpose of @var{code} and thus the location(s) where Bison should generate it. -Not all values of @var{qualifier} are available for all target languages: +Not all @var{qualifier}s are accepted for all target languages. +Unaccepted @var{qualifier}s produce an error. +Some of the accepted @var{qualifier}s are: @itemize @bullet @item requires @@ -4826,53 +4832,111 @@ before any class definitions. @end itemize @end itemize -(Like all the Yacc prologue alternatives, this directive is experimental. -More user feedback will help to determine whether it should become a permanent -feature.) - @cindex Prologue For a detailed discussion of how to use @code{%code} in place of the traditional Yacc prologue for C/C++, see @ref{Prologue Alternatives}. @end deffn @deffn {Directive} %debug -In the parser file, define the macro @code{YYDEBUG} to 1 if it is not -already defined, so that the debugging facilities are compiled. -@end deffn +Instrument the output parser for traces. Obsoleted by @samp{%define +parse.trace}. @xref{Tracing, ,Tracing Your Parser}. +@end deffn @deffn {Directive} %define @var{variable} +@deffnx {Directive} %define @var{variable} @var{value} @deffnx {Directive} %define @var{variable} "@var{value}" Define a variable to adjust Bison's behavior. -The possible choices for @var{variable}, as well as their meanings, depend on -the selected target language and/or the parser skeleton (@pxref{Decl -Summary,,%language}, @pxref{Decl Summary,,%skeleton}). -Bison will warn if a @var{variable} is defined multiple times. +It is an error if a @var{variable} is defined by @code{%define} multiple +times, but see @ref{Bison Options,,-D @var{name}[=@var{value}]}. + +@var{value} must be placed in quotation marks if it contains any +character other than a letter, underscore, period, dash, or non-initial +digit. -Omitting @code{"@var{value}"} is always equivalent to specifying it as +Omitting @code{"@var{value}"} entirely is always equivalent to specifying @code{""}. -Some @var{variable}s may be used as Booleans. +Some @var{variable}s take Boolean values. In this case, Bison will complain if the variable definition does not meet one of the following four conditions: @enumerate -@item @code{"@var{value}"} is @code{"true"} +@item @code{@var{value}} is @code{true} -@item @code{"@var{value}"} is omitted (or is @code{""}). -This is equivalent to @code{"true"}. +@item @code{@var{value}} is omitted (or @code{""} is specified). +This is equivalent to @code{true}. -@item @code{"@var{value}"} is @code{"false"}. +@item @code{@var{value}} is @code{false}. @item @var{variable} is never defined. -In this case, Bison selects a default value, which may depend on the selected -target language and/or parser skeleton. +In this case, Bison selects a default value. @end enumerate +What @var{variable}s are accepted, as well as their meanings and default +values, depend on the selected target language and/or the parser +skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl +Summary,,%skeleton}). +Unaccepted @var{variable}s produce an error. Some of the accepted @var{variable}s are: -@itemize @bullet +@table @code +@c ================================================== namespace +@item api.namespace +@findex %define api.namespace +@itemize +@item Languages(s): C++ + +@item Purpose: Specifies the namespace for the parser class. +For example, if you specify: + +@smallexample +%define api.namespace "foo::bar" +@end smallexample + +Bison uses @code{foo::bar} verbatim in references such as: + +@smallexample +foo::bar::parser::semantic_type +@end smallexample + +However, to open a namespace, Bison removes any leading @code{::} and then +splits on any remaining occurrences: + +@smallexample +namespace foo @{ namespace bar @{ + class position; + class location; +@} @} +@end smallexample + +@item Accepted Values: +Any absolute or relative C++ namespace reference without a trailing +@code{"::"}. For example, @code{"foo"} or @code{"::foo::bar"}. + +@item Default Value: +The value specified by @code{%name-prefix}, which defaults to @code{yy}. +This usage of @code{%name-prefix} is for backward compatibility and can +be confusing since @code{%name-prefix} also specifies the textual prefix +for the lexical analyzer function. Thus, if you specify +@code{%name-prefix}, it is best to also specify @samp{%define +api.namespace} so that @code{%name-prefix} @emph{only} affects the +lexical analyzer function. For example, if you specify: + +@smallexample +%define api.namespace "foo" +%name-prefix "bar::" +@end smallexample + +The parser namespace is @code{foo} and @code{yylex} is referenced as +@code{bar::lex}. +@end itemize +@c namespace + + + +@c ================================================== api.pure @item api.pure @findex %define api.pure @@ -4884,27 +4948,134 @@ Some of the accepted @var{variable}s are: @item Accepted Values: Boolean -@item Default Value: @code{"false"} +@item Default Value: @code{false} @end itemize +@c api.pure -@item api.push_pull -@findex %define api.push_pull + + +@c ================================================== api.push-pull +@item api.push-pull +@findex %define api.push-pull @itemize @bullet -@item Language(s): C (LALR(1) only) +@item Language(s): C (deterministic parsers only) @item Purpose: Requests a pull parser, a push parser, or both. @xref{Push Decl, ,A Push Parser}. (The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.) -@item Accepted Values: @code{"pull"}, @code{"push"}, @code{"both"} +@item Accepted Values: @code{pull}, @code{push}, @code{both} + +@item Default Value: @code{pull} +@end itemize +@c api.push-pull + +@item api.tokens.prefix +@findex %define api.tokens.prefix + +@itemize +@item Languages(s): all + +@item Purpose: +Add a prefix to the token names when generating their definition in the +target language. For instance + +@example +%token FILE for ERROR +%define api.tokens.prefix "TOK_" +%% +start: FILE for ERROR; +@end example + +@noindent +generates the definition of the symbols @code{TOK_FILE}, @code{TOK_for}, +and @code{TOK_ERROR} in the generated source files. In particular, the +scanner must use these prefixed token names, while the grammar itself +may still use the short names (as in the sample rule given above). The +generated informational files (@file{*.output}, @file{*.xml}, +@file{*.dot}) are not modified by this prefix. See @ref{Calc++ Parser} +and @ref{Calc++ Scanner}, for a complete example. + +@item Accepted Values: +Any string. Should be a valid identifier prefix in the target language, +in other words, it should typically be an identifier itself (sequence of +letters, underscores, and ---not at the beginning--- digits). + +@item Default Value: +empty +@end itemize +@c api.tokens.prefix + + +@item lr.default-reductions +@cindex default reductions +@findex %define lr.default-reductions +@cindex delayed syntax errors +@cindex syntax errors delayed + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specifies the kind of states that are permitted to +contain default reductions. +That is, in such a state, Bison declares the reduction with the largest +lookahead set to be the default reduction and then removes that +lookahead set. +The advantages of default reductions are discussed below. +The disadvantage is that, when the generated parser encounters a +syntactically unacceptable token, the parser might then perform +unnecessary default reductions before it can detect the syntax error. + +(This feature is experimental. +More user feedback will help to stabilize it.) + +@item Accepted Values: +@itemize +@item @code{all}. +For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl +Summary,,lr.type}) by default, all states are permitted to contain +default reductions. +The advantage is that parser table sizes can be significantly reduced. +The reason Bison does not by default attempt to address the disadvantage +of delayed syntax error detection is that this disadvantage is already +inherent in @acronym{LALR} and @acronym{IELR} parser tables. +That is, unlike in a canonical @acronym{LR} state, the lookahead sets of +reductions in an @acronym{LALR} or @acronym{IELR} state can contain +tokens that are syntactically incorrect for some left contexts. + +@item @code{consistent}. +@cindex consistent states +A consistent state is a state that has only one possible action. +If that action is a reduction, then the parser does not need to request +a lookahead token from the scanner before performing that action. +However, the parser only recognizes the ability to ignore the lookahead +token when such a reduction is encoded as a default reduction. +Thus, if default reductions are permitted in and only in consistent +states, then a canonical @acronym{LR} parser reports a syntax error as +soon as it @emph{needs} the syntactically unacceptable token from the +scanner. + +@item @code{accepting}. +@cindex accepting state +By default, the only default reduction permitted in a canonical +@acronym{LR} parser is the accept action in the accepting state, which +the parser reaches only after reading all tokens from the input. +Thus, the default canonical @acronym{LR} parser reports a syntax error +as soon as it @emph{reaches} the syntactically unacceptable token +without performing any extra reductions. +@end itemize -@item Default Value: @code{"pull"} +@item Default Value: +@itemize +@item @code{accepting} if @code{lr.type} is @code{canonical-lr}. +@item @code{all} otherwise. +@end itemize @end itemize -@item lr.keep_unreachable_states -@findex %define lr.keep_unreachable_states +@item lr.keep-unreachable-states +@findex %define lr.keep-unreachable-states @itemize @bullet @item Language(s): all @@ -4920,7 +5091,7 @@ are useless in the generated parser. @item Accepted Values: Boolean -@item Default Value: @code{"false"} +@item Default Value: @code{false} @item Caveats: @@ -4945,61 +5116,159 @@ states. However, Bison does not compute which goto actions are useless. @end itemize @end itemize +@c lr.keep-unreachable-states + +@item lr.type +@findex %define lr.type +@cindex @acronym{LALR} +@cindex @acronym{IELR} +@cindex @acronym{LR} + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specifies the type of parser tables within the +@acronym{LR}(1) family. +(This feature is experimental. +More user feedback will help to stabilize it.) + +@item Accepted Values: +@itemize +@item @code{lalr}. +While Bison generates @acronym{LALR} parser tables by default for +historical reasons, @acronym{IELR} or canonical @acronym{LR} is almost +always preferable for deterministic parsers. +The trouble is that @acronym{LALR} parser tables can suffer from +mysterious conflicts and thus may not accept the full set of sentences +that @acronym{IELR} and canonical @acronym{LR} accept. +@xref{Mystery Conflicts}, for details. +However, there are at least two scenarios where @acronym{LALR} may be +worthwhile: +@itemize +@cindex @acronym{GLR} with @acronym{LALR} +@item When employing @acronym{GLR} parsers (@pxref{GLR Parsers}), if you +do not resolve any conflicts statically (for example, with @code{%left} +or @code{%prec}), then the parser explores all potential parses of any +given input. +In this case, the use of @acronym{LALR} parser tables is guaranteed not +to alter the language accepted by the parser. +@acronym{LALR} parser tables are the smallest parser tables Bison can +currently generate, so they may be preferable. + +@item Occasionally during development, an especially malformed grammar +with a major recurring flaw may severely impede the @acronym{IELR} or +canonical @acronym{LR} parser table generation algorithm. +@acronym{LALR} can be a quick way to generate parser tables in order to +investigate such problems while ignoring the more subtle differences +from @acronym{IELR} and canonical @acronym{LR}. +@end itemize + +@item @code{ielr}. +@acronym{IELR} is a minimal @acronym{LR} algorithm. +That is, given any grammar (@acronym{LR} or non-@acronym{LR}), +@acronym{IELR} and canonical @acronym{LR} always accept exactly the same +set of sentences. +However, as for @acronym{LALR}, the number of parser states is often an +order of magnitude less for @acronym{IELR} than for canonical +@acronym{LR}. +More importantly, because canonical @acronym{LR}'s extra parser states +may contain duplicate conflicts in the case of non-@acronym{LR} +grammars, the number of conflicts for @acronym{IELR} is often an order +of magnitude less as well. +This can significantly reduce the complexity of developing of a grammar. + +@item @code{canonical-lr}. +@cindex delayed syntax errors +@cindex syntax errors delayed +The only advantage of canonical @acronym{LR} over @acronym{IELR} is +that, for every left context of every canonical @acronym{LR} state, the +set of tokens accepted by that state is the exact set of tokens that is +syntactically acceptable in that left context. +Thus, the only difference in parsing behavior is that the canonical +@acronym{LR} parser can report a syntax error as soon as possible +without performing any unnecessary reductions. +@xref{Decl Summary,,lr.default-reductions}, for further details. +Even when canonical @acronym{LR} behavior is ultimately desired, +@acronym{IELR}'s elimination of duplicate conflicts should still +facilitate the development of a grammar. +@end itemize + +@item Default Value: @code{lalr} +@end itemize + +@c ================================================== namespace @item namespace @findex %define namespace +Obsoleted by @code{api.namespace} +@c namespace + + +@c ================================================== parse.assert +@item parse.assert +@findex %define parse.assert @itemize @item Languages(s): C++ -@item Purpose: Specifies the namespace for the parser class. -For example, if you specify: +@item Purpose: Issue runtime assertions to catch invalid uses. +In C++, when variants are used, symbols must be constructed and +destroyed properly. This option checks these constraints. -@smallexample -%define namespace "foo::bar" -@end smallexample +@item Accepted Values: Boolean -Bison uses @code{foo::bar} verbatim in references such as: +@item Default Value: @code{false} +@end itemize +@c parse.assert -@smallexample -foo::bar::parser::semantic_type -@end smallexample -However, to open a namespace, Bison removes any leading @code{::} and then -splits on any remaining occurrences: +@c ================================================== parse.error +@item parse.error +@findex %define parse.error +@itemize +@item Languages(s): +all. +@item Purpose: +Control the kind of error messages passed to the error reporting +function. @xref{Error Reporting, ,The Error Reporting Function +@code{yyerror}}. +@item Accepted Values: +@itemize +@item @code{simple} +Error messages passed to @code{yyerror} are simply @w{@code{"syntax +error"}}. +@item @code{verbose} +Error messages report the unexpected token, and possibly the expected +ones. +@end itemize -@smallexample -namespace foo @{ namespace bar @{ - class position; - class location; -@} @} -@end smallexample +@item Default Value: +@code{simple} +@end itemize +@c parse.error -@item Accepted Values: Any absolute or relative C++ namespace reference without -a trailing @code{"::"}. -For example, @code{"foo"} or @code{"::foo::bar"}. - -@item Default Value: The value specified by @code{%name-prefix}, which defaults -to @code{yy}. -This usage of @code{%name-prefix} is for backward compatibility and can be -confusing since @code{%name-prefix} also specifies the textual prefix for the -lexical analyzer function. -Thus, if you specify @code{%name-prefix}, it is best to also specify -@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the -lexical analyzer function. -For example, if you specify: -@smallexample -%define namespace "foo" -%name-prefix "bar::" -@end smallexample +@c ================================================== parse.trace +@item parse.trace +@findex %define parse.trace -The parser namespace is @code{foo} and @code{yylex} is referenced as -@code{bar::lex}. -@end itemize +@itemize +@item Languages(s): C, C++ + +@item Purpose: Require parser instrumentation for tracing. +In C/C++, define the macro @code{YYDEBUG} to 1 in the parser file if it +is not already defined, so that the debugging facilities are compiled. +@xref{Tracing, ,Tracing Your Parser}. + +@item Accepted Values: Boolean + +@item Default Value: @code{false} @end itemize +@c parse.trace +@end table @end deffn +@c ---------------------------------------------------------- %define @deffn {Directive} %defines Write a header file containing macro definitions for the token type @@ -5083,7 +5352,7 @@ is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs}, @code{yypstate_new} and @code{yypstate_delete} will also be renamed. For example, if you use @samp{%name-prefix "c_"}, the names become @code{c_parse}, @code{c_lex}, and so on. -For C++ parsers, see the @code{%define namespace} documentation in this +For C++ parsers, see the @samp{%define api.namespace} documentation in this section. @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. @end deffn @@ -5110,7 +5379,7 @@ Specify @var{file} for the parser file. @end deffn @deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{Decl Summary, ,%define}), +Deprecated version of @samp{%define api.pure} (@pxref{Decl Summary, ,%define}), for which Bison is more careful to warn about unreasonable usage. @end deffn @@ -5229,19 +5498,17 @@ identifier (aside from those in this manual) in an action or in epilogue in the grammar file, you are likely to run into trouble. @menu -* Parser Function:: How to call @code{yyparse} and what it returns. -* Push Parser Function:: How to call @code{yypush_parse} and what it returns. -* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. -* Parser Create Function:: How to call @code{yypstate_new} and what it - returns. -* Parser Delete Function:: How to call @code{yypstate_delete} and what it - returns. -* Lexical:: You must supply a function @code{yylex} - which reads tokens. -* Error Reporting:: You must supply a function @code{yyerror}. -* Action Features:: Special features for use in actions. -* Internationalization:: How to let the parser speak in the user's - native language. +* Parser Function:: How to call @code{yyparse} and what it returns. +* Push Parser Function:: How to call @code{yypush_parse} and what it returns. +* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. +* Parser Create Function:: How to call @code{yypstate_new} and what it returns. +* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. +* Lexical:: You must supply a function @code{yylex} + which reads tokens. +* Error Reporting:: You must supply a function @code{yyerror}. +* Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. @end menu @node Parser Function @@ -5326,8 +5593,8 @@ exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} More user feedback will help to stabilize it.) You call the function @code{yypush_parse} to parse a single token. This -function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +function is available if either the @samp{%define api.push-pull push} or +@samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun int yypush_parse (yypstate *yyps) @@ -5344,7 +5611,7 @@ is required to finish parsing the grammar. More user feedback will help to stabilize it.) You call the function @code{yypull_parse} to parse the rest of the input -stream. This function is available if the @code{%define api.push_pull "both"} +stream. This function is available if the @samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @@ -5360,8 +5627,8 @@ The value returned by @code{yypull_parse} is the same as for @code{yyparse}. More user feedback will help to stabilize it.) You call the function @code{yypstate_new} to create a new parser instance. -This function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +This function is available if either the @samp{%define api.push-pull push} or +@samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun yypstate *yypstate_new (void) @@ -5379,8 +5646,8 @@ allocated. More user feedback will help to stabilize it.) You call the function @code{yypstate_delete} to delete a parser instance. -function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +function is available if either the @samp{%define api.push-pull push} or +@samp{%define api.push-pull both} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun void yypstate_delete (yypstate *yyps) @@ -5408,13 +5675,13 @@ that need it. @xref{Invocation, ,Invoking Bison}. @menu * Calling Convention:: How @code{yyparse} calls @code{yylex}. -* Token Values:: How @code{yylex} must return the semantic value - of the token it has read. -* Token Locations:: How @code{yylex} must return the text location - (line number, etc.) of the token, if the - actions want that. -* Pure Calling:: How the calling convention differs - in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +* Token Values:: How @code{yylex} must return the semantic value + of the token it has read. +* Token Locations:: How @code{yylex} must return the text location + (line number, etc.) of the token, if the + actions want that. +* Pure Calling:: How the calling convention differs in a pure parser + (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). @end menu @node Calling Convention @@ -5568,7 +5835,7 @@ The data type of @code{yylloc} has the name @code{YYLTYPE}. @node Pure Calling @subsection Calling Conventions for Pure Parsers -When you use the Bison declaration @code{%define api.pure} to request a +When you use the Bison declaration @samp{%define api.pure} to request a pure, reentrant parser, the global communication variables @code{yylval} and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the two global variables are replaced by @@ -5619,7 +5886,7 @@ int yylex (int *nastiness); int yyparse (int *nastiness, int *randomness); @end example -If @code{%define api.pure} is added: +If @samp{%define api.pure} is added: @example int yylex (YYSTYPE *lvalp, int *nastiness); @@ -5627,7 +5894,7 @@ int yyparse (int *nastiness, int *randomness); @end example @noindent -and finally, if both @code{%define api.pure} and @code{%locations} are used: +and finally, if both @samp{%define api.pure} and @code{%locations} are used: @example int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); @@ -5641,7 +5908,7 @@ int yyparse (int *nastiness, int *randomness); @cindex parse error @cindex syntax error -The Bison parser detects a @dfn{syntax error} or @dfn{parse error} +The Bison parser detects a @dfn{syntax error} (or @dfn{parse error}) whenever it reads a token which cannot satisfy any syntax rule. An action in the grammar can also explicitly proclaim an error, using the macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use @@ -5653,8 +5920,8 @@ called by @code{yyparse} whenever a syntax error is found, and it receives one argument. For a syntax error, the string is normally @w{@code{"syntax error"}}. -@findex %error-verbose -If you invoke the directive @code{%error-verbose} in the Bison +@findex %define parse.error +If you invoke @samp{%define parse.error verbose} in the Bison declarations section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose and specific error message string instead of just plain @w{@code{"syntax error"}}. @@ -5711,7 +5978,7 @@ void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ Finally, @acronym{GLR} and Yacc parsers share the same @code{yyerror} calling convention for absolutely pure parsers, i.e., when the calling convention of @code{yylex} @emph{and} the calling convention of -@code{%define api.pure} are pure. +@samp{%define api.pure} are pure. I.e.: @example @@ -6057,7 +6324,7 @@ This kind of parser is known in the literature as a bottom-up parser. * Contextual Precedence:: When an operator's precedence depends on context. * Parser States:: The parser is a finite-state-machine with stack. * Reduce/Reduce:: When two rules are applicable in the same situation. -* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. +* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. * Generalized LR Parsing:: Parsing arbitrary context-free grammars. * Memory Management:: What happens when memory is exhausted. How to avoid it. @end menu @@ -6668,12 +6935,13 @@ a @code{name} if a comma or colon follows, or a @code{type} if another @cindex @acronym{LR}(1) @cindex @acronym{LALR}(1) -However, Bison, like most parser generators, cannot actually handle all -@acronym{LR}(1) grammars. In this grammar, two contexts, that after -an @code{ID} -at the beginning of a @code{param_spec} and likewise at the beginning of -a @code{return_spec}, are similar enough that Bison assumes they are the -same. They appear similar because the same set of rules would be +However, for historical reasons, Bison cannot by default handle all +@acronym{LR}(1) grammars. +In this grammar, two contexts, that after an @code{ID} at the beginning +of a @code{param_spec} and likewise at the beginning of a +@code{return_spec}, are similar enough that Bison assumes they are the +same. +They appear similar because the same set of rules would be active---the rule for reducing to a @code{name} and that for reducing to a @code{type}. Bison is unable to determine at that stage of processing that the rules would require different lookahead tokens in the two @@ -6681,16 +6949,22 @@ contexts, so it makes a single parser state for them both. Combining the two contexts causes a conflict later. In parser terminology, this occurrence means that the grammar is not @acronym{LALR}(1). -In general, it is better to fix deficiencies than to document them. But -this particular deficiency is intrinsically hard to fix; parser -generators that can handle @acronym{LR}(1) grammars are hard to write -and tend to -produce parsers that are very large. In practice, Bison is more useful -as it is now. - -When the problem arises, you can often fix it by identifying the two -parser states that are being confused, and adding something to make them -look distinct. In the above example, adding one rule to +For many practical grammars (specifically those that fall into the +non-@acronym{LR}(1) class), the limitations of @acronym{LALR}(1) result in +difficulties beyond just mysterious reduce/reduce conflicts. +The best way to fix all these problems is to select a different parser +table generation algorithm. +Either @acronym{IELR}(1) or canonical @acronym{LR}(1) would suffice, but +the former is more efficient and easier to debug during development. +@xref{Decl Summary,,lr.type}, for details. +(Bison's @acronym{IELR}(1) and canonical @acronym{LR}(1) implementations +are experimental. +More user feedback will help to stabilize them.) + +If you instead wish to work around @acronym{LALR}(1)'s limitations, you +can often fix a mysterious conflict by identifying the two parser states +that are being confused, and adding something to make them look +distinct. In the above example, adding one rule to @code{return_spec} as follows makes the problem go away: @example @@ -6758,7 +7032,7 @@ The same is true of languages that require more than one symbol of lookahead, since the parser lacks the information necessary to make a decision at the point it must be made in a shift-reduce parser. Finally, as previously mentioned (@pxref{Mystery Conflicts}), -there are languages where Bison's particular choice of how to +there are languages where Bison's default choice of how to summarize the input seen so far loses necessary information. When you use the @samp{%glr-parser} declaration in your grammar file, @@ -6790,7 +7064,7 @@ grammar symbol that produces the same segment of the input token stream. Whenever the parser makes a transition from having multiple -states to having one, it reverts to the normal @acronym{LALR}(1) parsing +states to having one, it reverts to the normal deterministic parsing algorithm, after resolving and executing the saved-up actions. At this transition, some of the states on the stack will have semantic values that are sets (actually multisets) of possible actions. The @@ -6803,9 +7077,9 @@ Bison resolves and evaluates both and then calls the merge function on the result. Otherwise, it reports an ambiguity. It is possible to use a data structure for the @acronym{GLR} parsing tree that -permits the processing of any @acronym{LALR}(1) grammar in linear time (in the +permits the processing of any @acronym{LR}(1) grammar in linear time (in the size of the input), any unambiguous (not necessarily -@acronym{LALR}(1)) grammar in +@acronym{LR}(1)) grammar in quadratic worst-case time, and any general (possibly ambiguous) context-free grammar in cubic worst-case time. However, Bison currently uses a simpler data structure that requires time proportional to the @@ -6815,9 +7089,9 @@ grammars can require exponential time and space to process. Such badly behaving examples, however, are not generally of practical interest. Usually, nondeterminism in a grammar is local---the parser is ``in doubt'' only for a few tokens at a time. Therefore, the current data -structure should generally be adequate. On @acronym{LALR}(1) portions of a -grammar, in particular, it is only slightly slower than with the default -Bison parser. +structure should generally be adequate. On @acronym{LR}(1) portions of a +grammar, in particular, it is only slightly slower than with the +deterministic @acronym{LR}(1) Bison parser. For a more detailed exposition of @acronym{GLR} parsers, please see: Elizabeth Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style @@ -6866,16 +7140,16 @@ The default value of @code{YYMAXDEPTH}, if you do not define it, is @vindex YYINITDEPTH You can control how much stack is allocated initially by defining the -macro @code{YYINITDEPTH} to a positive integer. For the C -@acronym{LALR}(1) parser, this value must be a compile-time constant +macro @code{YYINITDEPTH} to a positive integer. For the deterministic +parser in C, this value must be a compile-time constant unless you are assuming C99 or some other target language or compiler that allows variable-length arrays. The default is 200. Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. @c FIXME: C++ output. -Because of semantical differences between C and C++, the -@acronym{LALR}(1) parsers in C produced by Bison cannot grow when compiled +Because of semantical differences between C and C++, the deterministic +parsers in C produced by Bison cannot grow when compiled by C++ compilers. In this precise case (compiling a C parser as C++) you are suggested to grow @code{YYINITDEPTH}. The Bison maintainers hope to fix this deficiency in a future release. @@ -7263,7 +7537,8 @@ useless: STR; @command{bison} reports: @example -calc.y: warning: 1 nonterminal and 1 rule useless in grammar +calc.y: warning: 1 nonterminal useless in grammar +calc.y: warning: 1 rule useless in grammar calc.y:11.1-7: warning: nonterminal useless in grammar: useless calc.y:11.10-12: warning: rule useless in grammar: useless: STR calc.y: conflicts: 7 shift/reduce @@ -7451,6 +7726,7 @@ control will jump to state 4, corresponding to the item @samp{exp -> exp '+' . exp}. Since there is no default action, any other token than those listed above will trigger a syntax error. +@cindex accepting state The state 3 is named the @dfn{final state}, or the @dfn{accepting state}: @@ -7531,7 +7807,7 @@ sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM / NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) / NUM}, which corresponds to reducing rule 1. -Because in @acronym{LALR}(1) parsing a single decision can be made, Bison +Because in deterministic parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, , Shift/Reduce Conflicts}. Discarded actions are reported in between square brackets. @@ -7648,15 +7924,21 @@ Use the @samp{-t} option when you run Bison (@pxref{Invocation, @item the directive @samp{%debug} @findex %debug -Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison -Declaration Summary}). This is a Bison extension, which will prove -useful when Bison will output parsers for languages that don't use a -preprocessor. Unless @acronym{POSIX} and Yacc portability matter to -you, this is -the preferred solution. +Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison Declaration +Summary}). This Bison extension is maintained for backward +compatibility with previous versions of Bison. + +@item the variable @samp{parse.trace} +@findex %define parse.trace +Add the @samp{%define parse.trace} directive (@pxref{Decl Summary, +,Bison Declaration Summary}), or pass the @option{-Dparse.trace} option +(@pxref{Bison Options}). This is a Bison extension, which is especially +useful for languages that don't use a preprocessor. Unless +@acronym{POSIX} and Yacc portability matter to you, this is the +preferred solution. @end table -We suggest that you always enable the debug option so that debugging is +We suggest that you always enable the trace option so that debugging is always possible. The trace facility outputs messages with macro calls of the form @@ -7713,7 +7995,7 @@ standard I/O stream, the numeric code for the token type, and the token value (from @code{yylval}). Here is an example of @code{YYPRINT} suitable for the multi-function -calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}): +calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}): @smallexample %@{ @@ -7825,7 +8107,7 @@ other minor ways. Most importantly, imitate Yacc's output file name conventions, so that the parser output file is called @file{y.tab.c}, and the other outputs are called @file{y.output} and @file{y.tab.h}. -Also, if generating an @acronym{LALR}(1) parser in C, generate @code{#define} +Also, if generating a deterministic parser in C, generate @code{#define} statements in addition to an @code{enum} to associate token numbers with token names. Thus, the following shell script can substitute for Yacc, and the Bison @@ -7841,8 +8123,8 @@ traditional Yacc grammars. If your grammar uses a Bison extension like @samp{%glr-parser}, Bison might not be Yacc-compatible even if this option is specified. -@item -W -@itemx --warnings +@item -W [@var{category}] +@itemx --warnings[=@var{category}] Output warnings falling in @var{category}. @var{category} can be one of: @table @code @@ -7895,8 +8177,32 @@ already defined, so that the debugging facilities are compiled. @item -D @var{name}[=@var{value}] @itemx --define=@var{name}[=@var{value}] -Same as running @samp{%define @var{name} "@var{value}"} (@pxref{Decl -Summary, ,%define}). +@itemx -F @var{name}[=@var{value}] +@itemx --force-define=@var{name}[=@var{value}] +Each of these is equivalent to @samp{%define @var{name} "@var{value}"} +(@pxref{Decl Summary, ,%define}) except that Bison processes multiple +definitions for the same @var{name} as follows: + +@itemize +@item +Bison quietly ignores all command-line definitions for @var{name} except +the last. +@item +If that command-line definition is specified by a @code{-D} or +@code{--define}, Bison reports an error for any @code{%define} +definition for @var{name}. +@item +If that command-line definition is specified by a @code{-F} or +@code{--force-define} instead, Bison quietly ignores all @code{%define} +definitions for @var{name}. +@item +Otherwise, Bison reports an error if there are multiple @code{%define} +definitions for @var{name}. +@end itemize + +You should avoid using @code{-F} and @code{--force-define} in your +makefiles unless you are confident that it is safe to quietly ignore any +conflicting @code{%define} that may be added to the grammar file. @item -L @var{language} @itemx --language=@var{language} @@ -7972,7 +8278,7 @@ separated list of @var{things} among: @table @code @item state Description of the grammar, conflicts (resolved and unresolved), and -@acronym{LALR} automaton. +parser's automaton. @item lookahead Implies @code{state} and augments the description of the automaton with @@ -7999,18 +8305,18 @@ Specify the @var{file} for the parser file. The other output files' names are constructed from @var{file} as described under the @samp{-v} and @samp{-d} options. -@item -g[@var{file}] +@item -g [@var{file}] @itemx --graph[=@var{file}] -Output a graphical representation of the @acronym{LALR}(1) grammar +Output a graphical representation of the parser's automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz} @uref{http://www.graphviz.org/doc/info/lang.html, @acronym{DOT}} format. @code{@var{file}} is optional. If omitted and the grammar file is @file{foo.y}, the output file will be @file{foo.dot}. -@item -x[@var{file}] +@item -x [@var{file}] @itemx --xml[=@var{file}] -Output an XML report of the @acronym{LALR}(1) automaton computed by Bison. +Output an XML report of the parser's automaton computed by Bison. @code{@var{file}} is optional. If omitted and the grammar file is @file{foo.y}, the output file will be @file{foo.xml}. @@ -8021,12 +8327,11 @@ More user feedback will help to stabilize it.) @node Option Cross Key @section Option Cross Key -@c FIXME: How about putting the directives too? Here is a list of options, alphabetized by long option, to help you find -the corresponding short option. +the corresponding short option and directive. -@multitable {@option{--defines=@var{defines-file}}} {@option{-b @var{file-prefix}XXX}} -@headitem Long Option @tab Short Option +@multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}} +@headitem Long Option @tab Short Option @tab Bison Directive @include cross-options.texi @end multitable @@ -8084,15 +8389,16 @@ int yyparse (void); @c - Always pure @c - initial action -The C++ @acronym{LALR}(1) parser is selected using the skeleton directive, +The C++ deterministic parser is selected using the skeleton directive, @samp{%skeleton "lalr1.c"}, or the synonymous command-line option @option{--skeleton=lalr1.c}. @xref{Decl Summary}. When run, @command{bison} will create several entities in the @samp{yy} namespace. -@findex %define namespace -Use the @samp{%define namespace} directive to change the namespace name, see +@findex %define api.namespace +Use the @samp{%define api.namespace} directive to change the namespace +name, see @ref{Decl Summary}. The various classes are generated in the following files: @@ -8276,7 +8582,7 @@ described by @var{m}. The parser invokes the scanner by calling @code{yylex}. Contrary to C parsers, C++ parsers are always pure: there is no point in using the -@code{%define api.pure} directive. Therefore the interface is as follows. +@samp{%define api.pure} directive. Therefore the interface is as follows. @deftypemethod {parser} {int} yylex (semantic_value_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...) Return the next token. Its type is the return value, its semantic @@ -8473,10 +8779,10 @@ calcxx_driver::error (const std::string& m) @subsubsection Calc++ Parser The parser definition file @file{calc++-parser.yy} starts by asking for -the C++ LALR(1) skeleton, the creation of the parser header file, and -specifies the name of the parser class. Because the C++ skeleton -changed several times, it is safer to require the version you designed -the grammar for. +the C++ deterministic parser skeleton, the creation of the parser header +file, and specifies the name of the parser class. +Because the C++ skeleton changed several times, it is safer to require +the version you designed the grammar for. @comment file: calc++-parser.yy @example @@ -8538,8 +8844,8 @@ error messages. @comment file: calc++-parser.yy @example -%debug -%error-verbose +%define parse.trace +%define parse.error verbose @end example @noindent @@ -8571,13 +8877,14 @@ The code between @samp{%code @{} and @samp{@}} is output in the @noindent The token numbered as 0 corresponds to end of file; the following line -allows for nicer error messages referring to ``end of file'' instead -of ``$end''. Similarly user friendly named are provided for each -symbol. Note that the tokens names are prefixed by @code{TOKEN_} to -avoid name clashes. +allows for nicer error messages referring to ``end of file'' instead of +``$end''. Similarly user friendly names are provided for each symbol. +To avoid name clashes in the generated files (@pxref{Calc++ Scanner}), +prefix tokens with @code{TOK_} (@pxref{Decl Summary,, api.tokens.prefix}). @comment file: calc++-parser.yy @example +%define api.tokens.prefix "TOK_" %token END 0 "end of file" %token ASSIGN ":=" %token IDENTIFIER "identifier" @@ -8607,22 +8914,24 @@ The grammar itself is straightforward. %start unit; unit: assignments exp @{ driver.result = $2; @}; -assignments: assignments assignment @{@} - | /* Nothing. */ @{@}; +assignments: + assignments assignment @{@} +| /* Nothing. */ @{@}; assignment: - "identifier" ":=" exp + "identifier" ":=" exp @{ driver.variables[*$1] = $3; delete $1; @}; %left '+' '-'; %left '*' '/'; -exp: exp '+' exp @{ $$ = $1 + $3; @} - | exp '-' exp @{ $$ = $1 - $3; @} - | exp '*' exp @{ $$ = $1 * $3; @} - | exp '/' exp @{ $$ = $1 / $3; @} - | '(' exp ')' @{ $$ = $2; @} - | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} - | "number" @{ $$ = $1; @}; +exp: + exp '+' exp @{ $$ = $1 + $3; @} +| exp '-' exp @{ $$ = $1 - $3; @} +| exp '*' exp @{ $$ = $1 * $3; @} +| exp '/' exp @{ $$ = $1 / $3; @} +| '(' exp ')' @{ $$ = $2; @} +| "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} +| "number" @{ $$ = $1; @}; %% @end example @@ -8650,8 +8959,8 @@ parser's to get the set of defined tokens. @example %@{ /* -*- C++ -*- */ # include -# include -# include +# include +# include # include # include "calc++-driver.hh" # include "calc++-parser.hh" @@ -8663,10 +8972,10 @@ parser's to get the set of defined tokens. # undef yywrap # define yywrap() 1 -/* By default yylex returns int, we use token_type. - Unfortunately yyterminate by default returns 0, which is +/* By default yylex returns an int; we use token_type. + The default yyterminate implementation returns 0, which is not of token_type. */ -#define yyterminate() return token::END +#define yyterminate() return TOKEN(END) %@} @end example @@ -8714,28 +9023,32 @@ preceding tokens. Comments would be treated equally. @end example @noindent -The rules are simple, just note the use of the driver to report errors. -It is convenient to use a typedef to shorten -@code{yy::calcxx_parser::token::identifier} into -@code{token::identifier} for instance. +The rules are simple. The driver is used to report errors. It is +convenient to use a macro to shorten +@code{yy::calcxx_parser::token::TOK_@var{Name}} into +@code{TOKEN(@var{Name})}; note the token prefix, @code{TOK_}. @comment file: calc++-scanner.ll @example %@{ - typedef yy::calcxx_parser::token token; +# define TOKEN(Name) \ + yy::calcxx_parser::token::TOK_ ## Name %@} /* Convert ints to the actual type of tokens. */ [-+*/()] return yy::calcxx_parser::token_type (yytext[0]); -":=" return token::ASSIGN; +":=" return TOKEN(ASSIGN); @{int@} @{ errno = 0; long n = strtol (yytext, NULL, 10); if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) driver.error (*yylloc, "integer is out of range"); yylval->ival = n; - return token::NUMBER; + return TOKEN(NUMBER); +@} +@{id@} @{ + yylval->sval = new std::string (yytext); + return TOKEN(IDENTIFIER); @} -@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; . driver.error (*yylloc, "invalid character"); %% @end example @@ -8798,14 +9111,14 @@ main (int argc, char *argv[]) @section Java Parsers @menu -* Java Bison Interface:: Asking for Java parser generation -* Java Semantic Values:: %type and %token vs. Java -* Java Location Values:: The position and location classes -* Java Parser Interface:: Instantiating and running the parser -* Java Scanner Interface:: Specifying the scanner for the parser -* Java Action Features:: Special features for use in actions. -* Java Differences:: Differences between C/C++ and Java Grammars -* Java Declarations Summary:: List of Bison declarations used with Java +* Java Bison Interface:: Asking for Java parser generation +* Java Semantic Values:: %type and %token vs. Java +* Java Location Values:: The position and location classes +* Java Parser Interface:: Instantiating and running the parser +* Java Scanner Interface:: Specifying the scanner for the parser +* Java Action Features:: Special features for use in actions +* Java Differences:: Differences between C/C++ and Java Grammars +* Java Declarations Summary:: List of Bison declarations used with Java @end menu @node Java Bison Interface @@ -8833,11 +9146,11 @@ You can create documentation for generated parsers using Javadoc. Contrary to C parsers, Java parsers do not use global variables; the state of the parser is always local to an instance of the parser class. Therefore, all Java parsers are ``pure'', and the @code{%pure-parser} -and @code{%define api.pure} directives does not do anything when used in +and @samp{%define api.pure} directives does not do anything when used in Java. Push parsers are currently unsupported in Java and @code{%define -api.push_pull} have no effect. +api.push-pull} have no effect. @acronym{GLR} parsers are currently unsupported in Java. Do not use the @code{glr-parser} directive. @@ -8846,11 +9159,13 @@ No header file can be generated for Java parsers. Do not use the @code{%defines} directive or the @option{-d}/@option{--defines} options. @c FIXME: Possible code change. -Currently, support for debugging is always compiled -in. Thus the @code{%debug} and @code{%token-table} directives and the +Currently, support for tracing is always compiled +in. Thus the @samp{%define parse.trace} and @samp{%token-table} +directives and the @option{-t}/@option{--debug} and @option{-k}/@option{--token-table} options have no effect. This may change in the future to eliminate -unused code in the generated parser, so use @code{%debug} explicitly +unused code in the generated parser, so use @samp{%define parse.trace} +explicitly if needed. Also, in the future the @code{%token-table} directive might enable a public interface to access the token names and codes. @@ -8879,7 +9194,7 @@ semantic values' types (class names) should be specified in the By default, the semantic stack is declared to have @code{Object} members, which means that the class types you specify can be of any class. To improve the type safety of the parser, you can declare the common -superclass of all the semantic values using the @code{%define stype} +superclass of all the semantic values using the @samp{%define stype} directive. For example, after the following declaration: @example @@ -8919,11 +9234,11 @@ in a file; Bison itself defines a class representing a @dfn{location}, a range composed of a pair of positions (possibly spanning several files). The location class is an inner class of the parser; the name is @code{Location} by default, and may also be renamed using -@code{%define location_type "@var{class-name}}. +@samp{%define location_type "@var{class-name}"}. The location class treats the position as a completely opaque value. By default, the class name is @code{Position}, but this can be changed -with @code{%define position_type "@var{class-name}"}. This class must +with @samp{%define position_type "@var{class-name}"}. This class must be supplied by the user. @@ -8933,7 +9248,7 @@ The first, inclusive, position of the range, and the first beyond. @end deftypeivar @deftypeop {Constructor} {Location} {} Location (Position @var{loc}) -Create a @code{Location} denoting an empty range located at a given point. +Create a @code{Location} denoting an empty range located at a given point. @end deftypeop @deftypeop {Constructor} {Location} {} Location (Position @var{begin}, Position @var{end}) @@ -8958,22 +9273,22 @@ properly, the position class should override the @code{equals} and The name of the generated parser class defaults to @code{YYParser}. The @code{YY} prefix may be changed using the @code{%name-prefix} directive or the @option{-p}/@option{--name-prefix} option. Alternatively, use -@code{%define parser_class_name "@var{name}"} to give a custom name to +@samp{%define parser_class_name "@var{name}"} to give a custom name to the class. The interface of this class is detailed below. By default, the parser class has package visibility. A declaration -@code{%define public} will change to public visibility. Remember that, +@samp{%define public} will change to public visibility. Remember that, according to the Java language specification, the name of the @file{.java} file should match the name of the class in this case. Similarly, you can use @code{abstract}, @code{final} and @code{strictfp} with the @code{%define} declaration to add other modifiers to the parser class. -A single @code{%define annotations "@var{annotations}"} directive can +A single @samp{%define annotations "@var{annotations}"} directive can be used to add any number of annotations to the parser class. The Java package name of the parser class can be specified using the -@code{%define package} directive. The superclass and the implemented +@samp{%define package} directive. The superclass and the implemented interfaces of the parser class can be specified with the @code{%define -extends} and @code{%define implements} directives. +extends} and @samp{%define implements} directives. The parser class defines an inner class, @code{Location}, that is used for location tracking (see @ref{Java Location Values}), and a inner @@ -8994,7 +9309,7 @@ used. Use @code{%code init} for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use -@code{%define init_throws} to specify any uncatch exceptions. +@samp{%define init_throws} to specify any uncatch exceptions. @end deftypeop @deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{}) @@ -9007,7 +9322,7 @@ created with the correct @code{%lex-param}s. Use @code{%code init} for code added to the start of the constructor body. This is especially useful to initialize superclasses. Use -@code{%define init_throws} to specify any uncatch exceptions. +@samp{%define init_throws} to specify any uncatch exceptions. @end deftypeop @deftypemethod {YYParser} {boolean} parse () @@ -9018,7 +9333,7 @@ Run the syntactic analysis, and return @code{true} on success, @deftypemethod {YYParser} {boolean} getErrorVerbose () @deftypemethodx {YYParser} {void} setErrorVerbose (boolean @var{verbose}) Get or set the option to produce verbose error messages. These are only -available with the @code{%error-verbose} directive, which also turn on +available with @samp{%define parse.error verbose}, which also turns on verbose error messages. @end deftypemethod @@ -9084,7 +9399,7 @@ In both cases, the scanner has to implement the following methods. @deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be -changed using @code{%define location_type "@var{class-name}".} +changed using @samp{%define location_type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {int} yylex () @@ -9092,7 +9407,7 @@ Return the next token. Its type is the return value, its semantic value and location are saved and returned by the ther methods in the interface. -Use @code{%define lex_throws} to specify any uncaught exceptions. +Use @samp{%define lex_throws} to specify any uncaught exceptions. Default is @code{java.io.IOException}. @end deftypemethod @@ -9102,14 +9417,14 @@ Return respectively the first position of the last token that @code{yylex} returned, and the first position beyond it. These methods are not needed unless location tracking is active. -The return type can be changed using @code{%define position_type +The return type can be changed using @samp{%define position_type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {Object} getLVal () Return the semantical value of the last token that yylex returned. -The return type can be changed using @code{%define stype +The return type can be changed using @samp{%define stype "@var{class-name}".} @end deftypemethod @@ -9120,7 +9435,7 @@ The return type can be changed using @code{%define stype The following special constructs can be uses in Java actions. Other analogous C action features are currently unavailable for Java. -Use @code{%define throws} to specify any uncaught exceptions from parser +Use @samp{%define throws} to specify any uncaught exceptions from parser actions, and initial actions specified by @code{%initial-action}. @defvar $@var{n} @@ -9137,7 +9452,7 @@ Like @code{$@var{n}} but specifies a alternative type @var{typealt}. @defvar $$ The semantic value for the grouping made by the current rule. As a value, this is in the base type (@code{Object} or as specified by -@code{%define stype}) as in not cast to the declared subtype because +@samp{%define stype}) as in not cast to the declared subtype because casts are not allowed on the left-hand side of Java assignments. Use an explicit Java cast if the correct subtype is needed. @xref{Java Semantic Values}. @@ -9173,12 +9488,12 @@ Return immediately from the parser, indicating success. @end deffn @deffn {Statement} {return YYERROR;} -Start error recovery without printing an error message. +Start error recovery without printing an error message. @xref{Error Recovery}. @end deffn @deffn {Statement} {return YYFAIL;} -Print an error message and start error recovery. +Print an error message and start error recovery. @xref{Error Recovery}. @end deffn @@ -9224,7 +9539,7 @@ corresponds to these C macros.}. @item Java lacks unions, so @code{%union} has no effect. Instead, semantic values have a common base type: @code{Object} or as specified by -@code{%define stype}. Angle backets on @code{%token}, @code{type}, +@samp{%define stype}. Angle backets on @code{%token}, @code{type}, @code{$@var{n}} and @code{$$} specify subtypes rather than fields of an union. The type of @code{$$}, even with angle brackets, is the base type since Java casts are not allow on the left-hand side of assignments. @@ -9238,7 +9553,7 @@ The prolog declarations have a different meaning than in C/C++ code. @item @code{%code imports} blocks are placed at the beginning of the Java source code. They may include copyright notices. For a @code{package} declarations, it is -suggested to use @code{%define package} instead. +suggested to use @samp{%define package} instead. @item unqualified @code{%code} blocks are placed inside the parser class. @@ -9279,7 +9594,7 @@ constructor that @emph{creates} a lexer. Default is none. @deffn {Directive} %name-prefix "@var{prefix}" The prefix of the parser class name @code{@var{prefix}Parser} if -@code{%define parser_class_name} is not used. Default is @code{YY}. +@samp{%define parser_class_name} is not used. Default is @code{YY}. @xref{Java Bison Interface}. @end deffn @@ -9469,7 +9784,7 @@ or @display My parser includes support for an @samp{#include}-like feature, in which case I run @code{yyparse} from @code{yyparse}. This fails -although I did specify @code{%define api.pure}. +although I did specify @samp{%define api.pure}. @end display These problems typically come not from Bison itself, but from @@ -9895,10 +10210,6 @@ Insert @var{code} verbatim into output parser source. Equip the parser for debugging. @xref{Decl Summary}. @end deffn -@deffn {Directive} %debug -Equip the parser for debugging. @xref{Decl Summary}. -@end deffn - @ifset defaultprec @deffn {Directive} %default-prec Assign a precedence to rules that lack an explicit @samp{%prec} @@ -9909,6 +10220,7 @@ Precedence}. @deffn {Directive} %define @var{define-variable} @deffnx {Directive} %define @var{define-variable} @var{value} +@deffnx {Directive} %define @var{define-variable} "@var{value}" Define a variable to adjust Bison's behavior. @xref{Decl Summary,,%define}. @end deffn @@ -9951,8 +10263,7 @@ token is reset to the token that originally caused the violation. @end deffn @deffn {Directive} %error-verbose -Bison declaration to request verbose, specific error message strings -when @code{yyerror} is called. +An obsolete directive standing for @samp{%define parse.error verbose}. @end deffn @deffn {Directive} %file-prefix "@var{prefix}" @@ -10036,7 +10347,7 @@ Bison declaration to assign precedence to token(s), but no associativity @end deffn @deffn {Directive} %pure-parser -Deprecated version of @code{%define api.pure} (@pxref{Decl Summary, ,%define}), +Deprecated version of @samp{%define api.pure} (@pxref{Decl Summary, ,%define}), for which Bison is more careful to warn about unreasonable usage. @end deffn @@ -10150,16 +10461,16 @@ instead. @deffn {Function} yyerror User-supplied function to be called by @code{yyparse} on error. -@xref{Error Reporting, ,The Error -Reporting Function @code{yyerror}}. +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @deffn {Macro} YYERROR_VERBOSE -An obsolete macro that you define with @code{#define} in the prologue -to request verbose, specific error message strings -when @code{yyerror} is called. It doesn't matter what definition you -use for @code{YYERROR_VERBOSE}, just whether you define it. Using -@code{%error-verbose} is preferred. +An obsolete macro used in the @file{yacc.c} skeleton, that you define +with @code{#define} in the prologue to request verbose, specific error +message strings when @code{yyerror} is called. It doesn't matter what +definition you use for @code{YYERROR_VERBOSE}, just whether you define +it. Using @samp{%define parse.error verbose} is preferred +(@pxref{Error Reporting, ,The Error Reporting Function @code{yyerror}}). @end deffn @deffn {Macro} YYINITDEPTH @@ -10273,8 +10584,8 @@ is recovering from a syntax error, and 0 otherwise. @end deffn @deffn {Macro} YYSTACK_USE_ALLOCA -Macro used to control the use of @code{alloca} when the C -@acronym{LALR}(1) parser needs to extend its stacks. If defined to 0, +Macro used to control the use of @code{alloca} when the +deterministic parser in C needs to extend its stacks. If defined to 0, the parser will use @code{malloc} to extend its stacks. If defined to 1, the parser will use @code{alloca}. Values other than 0 and 1 are reserved for future Bison extensions. If not defined, @@ -10299,12 +10610,21 @@ Data type of semantic values; @code{int} by default. @cindex glossary @table @asis +@item Accepting State +A state whose only action is the accept action. +The accepting state is thus a consistent state. +@xref{Understanding,,}. + @item Backus-Naur Form (@acronym{BNF}; also called ``Backus Normal Form'') Formal method of specifying context-free grammars originally proposed by John Backus, and slightly improved by Peter Naur in his 1960-01-02 committee document contributing to what became the Algol 60 report. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item Consistent State +A state containing only one possible action. +@xref{Decl Summary,,lr.default-reductions}. + @item Context-free grammars Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an @@ -10312,6 +10632,14 @@ expression, integers are allowed @emph{anywhere} an expression is permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item Default Reduction +The reduction that a parser should perform if the current parser state +contains no other action for the lookahead token. +In permitted parser states, Bison declares the reduction with the +largest lookahead set to be the default reduction and removes that +lookahead set. +@xref{Decl Summary,,lr.default-reductions}. + @item Dynamic allocation Allocation of memory that occurs during execution, rather than at compile time or on entry to a function. @@ -10330,8 +10658,8 @@ rules. @xref{Algorithm, ,The Bison Parser Algorithm}. @item Generalized @acronym{LR} (@acronym{GLR}) A parsing algorithm that can handle all context-free grammars, including those -that are not @acronym{LALR}(1). It resolves situations that Bison's -usual @acronym{LALR}(1) +that are not @acronym{LR}(1). It resolves situations that Bison's +deterministic parsing algorithm cannot by effectively splitting off multiple parsers, trying all possible parsers, and discarding those that fail in the light of additional right context. @xref{Generalized LR Parsing, ,Generalized @@ -10342,6 +10670,20 @@ A language construct that is (in general) grammatically divisible; for example, `expression' or `declaration' in C@. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item @acronym{IELR}(1) +A minimal @acronym{LR}(1) parser table generation algorithm. +That is, given any context-free grammar, @acronym{IELR}(1) generates +parser tables with the full language recognition power of canonical +@acronym{LR}(1) but with nearly the same number of parser states as +@acronym{LALR}(1). +This reduction in parser states is often an order of magnitude. +More importantly, because canonical @acronym{LR}(1)'s extra parser +states may contain duplicate conflicts in the case of +non-@acronym{LR}(1) grammars, the number of conflicts for +@acronym{IELR}(1) is often an order of magnitude less as well. +This can significantly reduce the complexity of developing of a grammar. +@xref{Decl Summary,,lr.type}. + @item Infix operator An arithmetic operator that is placed between the operands on which it performs some operation. @@ -10385,8 +10727,8 @@ Tokens}. @item @acronym{LALR}(1) The class of context-free grammars that Bison (like most other parser -generators) can handle; a subset of @acronym{LR}(1). @xref{Mystery -Conflicts, ,Mysterious Reduce/Reduce Conflicts}. +generators) can handle by default; a subset of @acronym{LR}(1). +@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}. @item @acronym{LR}(1) The class of context-free grammars in which at most one token of @@ -10476,12 +10818,16 @@ grammatically indivisible. The piece of text it represents is a token. @bye +@c Local Variables: +@c fill-column: 76 +@c End: + @c LocalWords: texinfo setfilename settitle setchapternewpage finalout @c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex @c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry @c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa @c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc -@c LocalWords: rpcalc Lexer Gen Comp Expr ltcalc mfcalc Decl Symtab yylex +@c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex @c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref @c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex @c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge @@ -10504,4 +10850,4 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: infile ypp yxx outfile itemx tex leaderfill @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST -@c LocalWords: YYSTACK DVI fdl printindex +@c LocalWords: YYSTACK DVI fdl printindex IELR