X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/09ccae9b18a7c09ebf7bb8df2a18c8c4a6def248..99c08fb6626f4aca4a7eb2e5d53dae43bc40771b:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index ae3dc4b0..176cc7ff 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -34,8 +34,8 @@ This manual (@value{UPDATED}) is for @acronym{GNU} Bison (version @value{VERSION}), the @acronym{GNU} parser generator. Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, -1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008 Free Software -Foundation, Inc. +1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free +Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document @@ -89,76 +89,76 @@ Cover art by Etienne Suvasa. @menu * Introduction:: * Conditions:: -* Copying:: The @acronym{GNU} General Public License says - how you can copy and share Bison +* Copying:: The @acronym{GNU} General Public License says + how you can copy and share Bison. Tutorial sections: -* Concepts:: Basic concepts for understanding Bison. -* Examples:: Three simple explained examples of using Bison. +* Concepts:: Basic concepts for understanding Bison. +* Examples:: Three simple explained examples of using Bison. Reference sections: -* Grammar File:: Writing Bison declarations and rules. -* Interface:: C-language interface to the parser function @code{yyparse}. -* Algorithm:: How the Bison parser works at run-time. -* Error Recovery:: Writing rules for error recovery. +* Grammar File:: Writing Bison declarations and rules. +* Interface:: C-language interface to the parser function @code{yyparse}. +* Algorithm:: How the Bison parser works at run-time. +* Error Recovery:: Writing rules for error recovery. * Context Dependency:: What to do if your language syntax is too - messy for Bison to handle straightforwardly. -* Debugging:: Understanding or debugging Bison parsers. -* Invocation:: How to run Bison (to produce the parser source file). -* Other Languages:: Creating C++ and Java parsers. -* FAQ:: Frequently Asked Questions -* Table of Symbols:: All the keywords of the Bison language are explained. -* Glossary:: Basic concepts are explained. -* Copying This Manual:: License for copying this manual. -* Index:: Cross-references to the text. + messy for Bison to handle straightforwardly. +* Debugging:: Understanding or debugging Bison parsers. +* Invocation:: How to run Bison (to produce the parser source file). +* Other Languages:: Creating C++ and Java parsers. +* FAQ:: Frequently Asked Questions +* Table of Symbols:: All the keywords of the Bison language are explained. +* Glossary:: Basic concepts are explained. +* Copying This Manual:: License for copying this manual. +* Index:: Cross-references to the text. @detailmenu --- The Detailed Node Listing --- The Concepts of Bison -* Language and Grammar:: Languages and context-free grammars, - as mathematical ideas. -* Grammar in Bison:: How we represent grammars for Bison's sake. -* Semantic Values:: Each token or syntactic grouping can have - a semantic value (the value of an integer, - the name of an identifier, etc.). -* Semantic Actions:: Each rule can have an action containing C code. -* GLR Parsers:: Writing parsers for general context-free languages. -* Locations Overview:: Tracking Locations. -* Bison Parser:: What are Bison's input and output, - how is the output used? -* Stages:: Stages in writing and running Bison grammars. -* Grammar Layout:: Overall structure of a Bison grammar file. +* Language and Grammar:: Languages and context-free grammars, + as mathematical ideas. +* Grammar in Bison:: How we represent grammars for Bison's sake. +* Semantic Values:: Each token or syntactic grouping can have + a semantic value (the value of an integer, + the name of an identifier, etc.). +* Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages. +* Locations Overview:: Tracking Locations. +* Bison Parser:: What are Bison's input and output, + how is the output used? +* Stages:: Stages in writing and running Bison grammars. +* Grammar Layout:: Overall structure of a Bison grammar file. Writing @acronym{GLR} Parsers -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. Examples -* RPN Calc:: Reverse polish notation calculator; - a first example with no operator precedence. -* Infix Calc:: Infix (algebraic) notation calculator. - Operator precedence is introduced. +* RPN Calc:: Reverse polish notation calculator; + a first example with no operator precedence. +* Infix Calc:: Infix (algebraic) notation calculator. + Operator precedence is introduced. * Simple Error Recovery:: Continuing after syntax errors. * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. -* Exercises:: Ideas for improving the multi-function calculator. +* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. +* Exercises:: Ideas for improving the multi-function calculator. Reverse Polish Notation Calculator -* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc. -* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation. -* Lexer: Rpcalc Lexer. The lexical analyzer. -* Main: Rpcalc Main. The controlling function. -* Error: Rpcalc Error. The error reporting function. -* Gen: Rpcalc Gen. Running Bison on the grammar file. -* Comp: Rpcalc Compile. Run the C compiler on the output code. +* Rpcalc Declarations:: Prologue (declarations) for rpcalc. +* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. +* Rpcalc Lexer:: The lexical analyzer. +* Rpcalc Main:: The controlling function. +* Rpcalc Error:: The error reporting function. +* Rpcalc Generate:: Running Bison on the grammar file. +* Rpcalc Compile:: Run the C compiler on the output code. Grammar Rules for @code{rpcalc} @@ -168,15 +168,15 @@ Grammar Rules for @code{rpcalc} Location Tracking Calculator: @code{ltcalc} -* Decls: Ltcalc Decls. Bison and C declarations for ltcalc. -* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations. -* Lexer: Ltcalc Lexer. The lexical analyzer. +* Ltcalc Declarations:: Bison and C declarations for ltcalc. +* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. +* Ltcalc Lexer:: The lexical analyzer. Multi-Function Calculator: @code{mfcalc} -* Decl: Mfcalc Decl. Bison declarations for multi-function calculator. -* Rules: Mfcalc Rules. Grammar rules for the calculator. -* Symtab: Mfcalc Symtab. Symbol table management subroutines. +* Mfcalc Declarations:: Bison declarations for multi-function calculator. +* Mfcalc Rules:: Grammar rules for the calculator. +* Mfcalc Symbol Table:: Symbol table management subroutines. Bison Grammar Files @@ -191,11 +191,11 @@ Bison Grammar Files Outline of a Bison Grammar -* Prologue:: Syntax and usage of the prologue. +* Prologue:: Syntax and usage of the prologue. * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. -* Bison Declarations:: Syntax and usage of the Bison declarations section. -* Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* Epilogue:: Syntax and usage of the epilogue. Defining Language Semantics @@ -230,24 +230,28 @@ Bison Declarations Parser C-Language Interface -* Parser Function:: How to call @code{yyparse} and what it returns. -* Lexical:: You must supply a function @code{yylex} - which reads tokens. -* Error Reporting:: You must supply a function @code{yyerror}. -* Action Features:: Special features for use in actions. -* Internationalization:: How to let the parser speak in the user's - native language. +* Parser Function:: How to call @code{yyparse} and what it returns. +* Push Parser Function:: How to call @code{yypush_parse} and what it returns. +* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. +* Parser Create Function:: How to call @code{yypstate_new} and what it returns. +* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. +* Lexical:: You must supply a function @code{yylex} + which reads tokens. +* Error Reporting:: You must supply a function @code{yyerror}. +* Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. The Lexical Analyzer Function @code{yylex} * Calling Convention:: How @code{yyparse} calls @code{yylex}. -* Token Values:: How @code{yylex} must return the semantic value - of the token it has read. -* Token Locations:: How @code{yylex} must return the text location - (line number, etc.) of the token, if the - actions want that. -* Pure Calling:: How the calling convention differs - in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +* Token Values:: How @code{yylex} must return the semantic value + of the token it has read. +* Token Locations:: How @code{yylex} must return the text location + (line number, etc.) of the token, if the + actions want that. +* Pure Calling:: How the calling convention differs in a pure parser + (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The Bison Parser Algorithm @@ -257,7 +261,7 @@ The Bison Parser Algorithm * Contextual Precedence:: When an operator's precedence depends on context. * Parser States:: The parser is a finite-state-machine with stack. * Reduce/Reduce:: When two rules are applicable in the same situation. -* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. +* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. * Generalized LR Parsing:: Parsing arbitrary context-free grammars. * Memory Management:: What happens when memory is exhausted. How to avoid it. @@ -312,33 +316,33 @@ A Complete C++ Example Java Parsers -* Java Bison Interface:: Asking for Java parser generation -* Java Semantic Values:: %type and %token vs. Java -* Java Location Values:: The position and location classes -* Java Parser Interface:: Instantiating and running the parser -* Java Scanner Interface:: Specifying the scanner for the parser -* Java Action Features:: Special features for use in actions. -* Java Differences:: Differences between C/C++ and Java Grammars -* Java Declarations Summary:: List of Bison declarations used with Java +* Java Bison Interface:: Asking for Java parser generation +* Java Semantic Values:: %type and %token vs. Java +* Java Location Values:: The position and location classes +* Java Parser Interface:: Instantiating and running the parser +* Java Scanner Interface:: Specifying the scanner for the parser +* Java Action Features:: Special features for use in actions +* Java Differences:: Differences between C/C++ and Java Grammars +* Java Declarations Summary:: List of Bison declarations used with Java Frequently Asked Questions -* Memory Exhausted:: Breaking the Stack Limits -* How Can I Reset the Parser:: @code{yyparse} Keeps some State -* Strings are Destroyed:: @code{yylval} Loses Track of Strings -* Implementing Gotos/Loops:: Control Flow in the Calculator -* Multiple start-symbols:: Factoring closely related grammars -* Secure? Conform?:: Is Bison @acronym{POSIX} safe? -* I can't build Bison:: Troubleshooting -* Where can I find help?:: Troubleshouting -* Bug Reports:: Troublereporting -* Other Languages:: Parsers in Java and others -* Beta Testing:: Experimenting development versions -* Mailing Lists:: Meeting other Bison users +* Memory Exhausted:: Breaking the Stack Limits +* How Can I Reset the Parser:: @code{yyparse} Keeps some State +* Strings are Destroyed:: @code{yylval} Loses Track of Strings +* Implementing Gotos/Loops:: Control Flow in the Calculator +* Multiple start-symbols:: Factoring closely related grammars +* Secure? Conform?:: Is Bison @acronym{POSIX} safe? +* I can't build Bison:: Troubleshooting +* Where can I find help?:: Troubleshouting +* Bug Reports:: Troublereporting +* More Languages:: Parsers in C++, Java, and so on +* Beta Testing:: Experimenting development versions +* Mailing Lists:: Meeting other Bison users Copying This Manual -* Copying This Manual:: License for copying this manual. +* Copying This Manual:: License for copying this manual. @end detailmenu @end menu @@ -348,10 +352,12 @@ Copying This Manual @cindex introduction @dfn{Bison} is a general-purpose parser generator that converts an -annotated context-free grammar into an @acronym{LALR}(1) or -@acronym{GLR} parser for that grammar. Once you are proficient with -Bison, you can use it to develop a wide range of language parsers, from those -used in simple desk calculators to complex programming languages. +annotated context-free grammar into a deterministic or @acronym{GLR} +parser employing @acronym{LALR}(1), @acronym{IELR}(1), or canonical +@acronym{LR}(1) parser tables. +Once you are proficient with Bison, you can use it to develop a wide +range of language parsers, from those used in simple desk calculators to +complex programming languages. Bison is upward compatible with Yacc: all properly-written Yacc grammars ought to work with Bison with no change. Anyone familiar with Yacc @@ -418,19 +424,19 @@ details of Bison will not make sense. If you do not already know how to use Bison or Yacc, we suggest you start by reading this chapter carefully. @menu -* Language and Grammar:: Languages and context-free grammars, - as mathematical ideas. -* Grammar in Bison:: How we represent grammars for Bison's sake. -* Semantic Values:: Each token or syntactic grouping can have - a semantic value (the value of an integer, - the name of an identifier, etc.). -* Semantic Actions:: Each rule can have an action containing C code. -* GLR Parsers:: Writing parsers for general context-free languages. -* Locations Overview:: Tracking Locations. -* Bison Parser:: What are Bison's input and output, - how is the output used? -* Stages:: Stages in writing and running Bison grammars. -* Grammar Layout:: Overall structure of a Bison grammar file. +* Language and Grammar:: Languages and context-free grammars, + as mathematical ideas. +* Grammar in Bison:: How we represent grammars for Bison's sake. +* Semantic Values:: Each token or syntactic grouping can have + a semantic value (the value of an integer, + the name of an identifier, etc.). +* Semantic Actions:: Each rule can have an action containing C code. +* GLR Parsers:: Writing parsers for general context-free languages. +* Locations Overview:: Tracking Locations. +* Bison Parser:: What are Bison's input and output, + how is the output used? +* Stages:: Stages in writing and running Bison grammars. +* Grammar Layout:: Overall structure of a Bison grammar file. @end menu @node Language and Grammar @@ -457,26 +463,27 @@ order to specify the language Algol 60. Any grammar expressed in essentially machine-readable @acronym{BNF}. @cindex @acronym{LALR}(1) grammars +@cindex @acronym{IELR}(1) grammars @cindex @acronym{LR}(1) grammars -There are various important subclasses of context-free grammar. Although it -can handle almost all context-free grammars, Bison is optimized for what -are called @acronym{LALR}(1) grammars. -In brief, in these grammars, it must be possible to -tell how to parse any portion of an input string with just a single -token of lookahead. Strictly speaking, that is a description of an -@acronym{LR}(1) grammar, and @acronym{LALR}(1) involves additional -restrictions that are -hard to explain simply; but it is rare in actual practice to find an -@acronym{LR}(1) grammar that fails to be @acronym{LALR}(1). +There are various important subclasses of context-free grammars. +Although it can handle almost all context-free grammars, Bison is +optimized for what are called @acronym{LR}(1) grammars. +In brief, in these grammars, it must be possible to tell how to parse +any portion of an input string with just a single token of lookahead. +For historical reasons, Bison by default is limited by the additional +restrictions of @acronym{LALR}(1), which is hard to explain simply. @xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, for more information on this. +To escape these additional restrictions, you can request +@acronym{IELR}(1) or canonical @acronym{LR}(1) parser tables. +@xref{Decl Summary,,lr.type}, to learn how. @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars @cindex nondeterministic parsing -Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning +Parsers for @acronym{LR}(1) grammars are @dfn{deterministic}, meaning roughly that the next grammar rule to apply at any point in the input is uniquely determined by the preceding input and a fixed, finite portion (called a @dfn{lookahead}) of the remaining input. A context-free @@ -705,8 +712,8 @@ from the values of the two subexpressions. @cindex shift/reduce conflicts @cindex reduce/reduce conflicts -In some grammars, Bison's standard -@acronym{LALR}(1) parsing algorithm cannot decide whether to apply a +In some grammars, Bison's deterministic +@acronym{LR}(1) parsing algorithm cannot decide whether to apply a certain grammar rule at a given point. That is, it may not be able to decide (on the basis of the input read so far) which of two possible reductions (applications of a grammar rule) applies, or whether to apply @@ -715,13 +722,13 @@ input. These are known respectively as @dfn{reduce/reduce} conflicts (@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts (@pxref{Shift/Reduce}). -To use a grammar that is not easily modified to be @acronym{LALR}(1), a +To use a grammar that is not easily modified to be @acronym{LR}(1), a more general parsing algorithm is sometimes necessary. If you include @code{%glr-parser} among the Bison declarations in your file (@pxref{Grammar Outline}), the result is a Generalized @acronym{LR} (@acronym{GLR}) parser. These parsers handle Bison grammars that contain no unresolved conflicts (i.e., after applying precedence -declarations) identically to @acronym{LALR}(1) parsers. However, when +declarations) identically to deterministic parsers. However, when faced with unresolved shift/reduce and reduce/reduce conflicts, @acronym{GLR} parsers use the simple expedient of doing both, effectively cloning the parser to follow both possibilities. Each of @@ -746,10 +753,10 @@ user-defined function on the resulting values to produce an arbitrary merged result. @menu -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. -* GLR Semantic Actions:: Deferred semantic actions have special concerns. -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. @end menu @node Simple GLR Parsers @@ -763,11 +770,8 @@ merged result. @cindex shift/reduce conflicts In the simplest cases, you can use the @acronym{GLR} algorithm -to parse grammars that are unambiguous, but fail to be @acronym{LALR}(1). -Such grammars typically require more than one symbol of lookahead, -or (in rare cases) fall into the category of grammars in which the -@acronym{LALR}(1) algorithm throws away too much information (they are in -@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}). +to parse grammars that are unambiguous but fail to be @acronym{LR}(1). +Such grammars typically require more than one symbol of lookahead. Consider a problem that arises in the declaration of enumerated and subrange types in the @@ -804,7 +808,7 @@ type enum = (a); valid, and more-complicated cases can come up in practical programs.) These two declarations look identical until the @samp{..} token. -With normal @acronym{LALR}(1) one-token lookahead it is not +With normal @acronym{LR}(1) one-token lookahead it is not possible to decide between the two forms when the identifier @samp{a} is parsed. It is, however, desirable for a parser to decide this, since in the latter case @@ -843,9 +847,9 @@ reports a syntax error as usual. The effect of all this is that the parser seems to ``guess'' the correct branch to take, or in other words, it seems to use more -lookahead than the underlying @acronym{LALR}(1) algorithm actually allows -for. In this example, @acronym{LALR}(2) would suffice, but also some cases -that are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way. +lookahead than the underlying @acronym{LR}(1) algorithm actually allows +for. In this example, @acronym{LR}(2) would suffice, but also some cases +that are not @acronym{LR}(@math{k}) for any @math{k} can be handled this way. In general, a @acronym{GLR} parser can take quadratic or cubic worst-case time, and the current Bison parser even takes exponential time and space @@ -898,7 +902,7 @@ expr : '(' expr ')' @end group @end example -When used as a normal @acronym{LALR}(1) grammar, Bison correctly complains +When used as a normal @acronym{LR}(1) grammar, Bison correctly complains about one reduce/reduce conflict. In the conflicting situation the parser chooses one of the alternatives, arbitrarily the one declared first. Therefore the following correct input is not @@ -930,7 +934,7 @@ there are at least two potential problems to beware. First, always analyze the conflicts reported by Bison to make sure that @acronym{GLR} splitting is only done where it is intended. A @acronym{GLR} parser splitting inadvertently may cause problems less obvious than an -@acronym{LALR} parser statically choosing the wrong alternative in a +@acronym{LR} parser statically choosing the wrong alternative in a conflict. Second, consider interactions with the lexer (@pxref{Semantic Tokens}) with great care. Since a split parser consumes tokens without performing any actions during the split, the lexer cannot obtain @@ -1150,7 +1154,7 @@ Another Bison feature requiring special consideration is @code{YYERROR} (@pxref{Action Features}), which you can invoke in a semantic action to initiate error recovery. During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is -the same as its effect in an @acronym{LALR}(1) parser. +the same as its effect in a deterministic parser. In a deferred semantic action, its effect is undefined. @c The effect is probably a syntax error at the split point. @@ -1377,15 +1381,15 @@ languages are written the same way. You can copy these examples into a source file to try them. @menu -* RPN Calc:: Reverse polish notation calculator; - a first example with no operator precedence. -* Infix Calc:: Infix (algebraic) notation calculator. - Operator precedence is introduced. +* RPN Calc:: Reverse polish notation calculator; + a first example with no operator precedence. +* Infix Calc:: Infix (algebraic) notation calculator. + Operator precedence is introduced. * Simple Error Recovery:: Continuing after syntax errors. * Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$. -* Multi-function Calc:: Calculator with memory and trig functions. - It uses multiple data-types for semantic values. -* Exercises:: Ideas for improving the multi-function calculator. +* Multi-function Calc:: Calculator with memory and trig functions. + It uses multiple data-types for semantic values. +* Exercises:: Ideas for improving the multi-function calculator. @end menu @node RPN Calc @@ -1404,16 +1408,16 @@ The source code for this calculator is named @file{rpcalc.y}. The @samp{.y} extension is a convention used for Bison input files. @menu -* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc. -* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation. -* Lexer: Rpcalc Lexer. The lexical analyzer. -* Main: Rpcalc Main. The controlling function. -* Error: Rpcalc Error. The error reporting function. -* Gen: Rpcalc Gen. Running Bison on the grammar file. -* Comp: Rpcalc Compile. Run the C compiler on the output code. +* Rpcalc Declarations:: Prologue (declarations) for rpcalc. +* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation. +* Rpcalc Lexer:: The lexical analyzer. +* Rpcalc Main:: The controlling function. +* Rpcalc Error:: The error reporting function. +* Rpcalc Generate:: Running Bison on the grammar file. +* Rpcalc Compile:: Run the C compiler on the output code. @end menu -@node Rpcalc Decls +@node Rpcalc Declarations @subsection Declarations for @code{rpcalc} Here are the C and Bison declarations for the reverse polish notation @@ -1663,7 +1667,7 @@ therefore, @code{NUM} becomes a macro for @code{yylex} to use. The semantic value of the token (if it has one) is stored into the global variable @code{yylval}, which is where the Bison parser will look for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was -defined at the beginning of the grammar; @pxref{Rpcalc Decls, +defined at the beginning of the grammar; @pxref{Rpcalc Declarations, ,Declarations for @code{rpcalc}}.) A token type code of zero is returned if the end-of-input is encountered. @@ -1759,7 +1763,7 @@ have not written any error rules in this example, so any invalid input will cause the calculator program to exit. This is not clean behavior for a real calculator, but it is adequate for the first example. -@node Rpcalc Gen +@node Rpcalc Generate @subsection Running Bison to Make the Parser @cindex running Bison (introduction) @@ -1979,12 +1983,12 @@ most of the work needed to use locations will be done in the lexical analyzer. @menu -* Decls: Ltcalc Decls. Bison and C declarations for ltcalc. -* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations. -* Lexer: Ltcalc Lexer. The lexical analyzer. +* Ltcalc Declarations:: Bison and C declarations for ltcalc. +* Ltcalc Rules:: Grammar rules for ltcalc, with explanations. +* Ltcalc Lexer:: The lexical analyzer. @end menu -@node Ltcalc Decls +@node Ltcalc Declarations @subsection Declarations for @code{ltcalc} The C and Bison declarations for the location tracking calculator are @@ -2220,12 +2224,12 @@ $ Note that multiple assignment and nested function calls are permitted. @menu -* Decl: Mfcalc Decl. Bison declarations for multi-function calculator. -* Rules: Mfcalc Rules. Grammar rules for the calculator. -* Symtab: Mfcalc Symtab. Symbol table management subroutines. +* Mfcalc Declarations:: Bison declarations for multi-function calculator. +* Mfcalc Rules:: Grammar rules for the calculator. +* Mfcalc Symbol Table:: Symbol table management subroutines. @end menu -@node Mfcalc Decl +@node Mfcalc Declarations @subsection Declarations for @code{mfcalc} Here are the C and Bison declarations for the multi-function calculator. @@ -2321,7 +2325,7 @@ exp: NUM @{ $$ = $1; @} %% @end smallexample -@node Mfcalc Symtab +@node Mfcalc Symbol Table @subsection The @code{mfcalc} Symbol Table @cindex symbol table example @@ -2634,11 +2638,11 @@ As a @acronym{GNU} extension, @samp{//} introduces a comment that continues until end of line. @menu -* Prologue:: Syntax and usage of the prologue. +* Prologue:: Syntax and usage of the prologue. * Prologue Alternatives:: Syntax and usage of alternatives to the prologue. -* Bison Declarations:: Syntax and usage of the Bison declarations section. -* Grammar Rules:: Syntax and usage of the grammar rules section. -* Epilogue:: Syntax and usage of the epilogue. +* Bison Declarations:: Syntax and usage of the Bison declarations section. +* Grammar Rules:: Syntax and usage of the grammar rules section. +* Epilogue:: Syntax and usage of the epilogue. @end menu @node Prologue @@ -2702,9 +2706,6 @@ feature test macros can affect the behavior of Bison-generated @findex %code requires @findex %code provides @findex %code top -(The prologue alternatives described here are experimental. -More user feedback will help to determine whether they should become permanent -features.) The functionality of @var{Prologue} sections can often be subtle and inflexible. @@ -3049,8 +3050,12 @@ A @dfn{nonterminal symbol} stands for a class of syntactically equivalent groupings. The symbol name is used in writing grammar rules. By convention, it should be all lower case. -Symbol names can contain letters, digits (not at the beginning), -underscores and periods. Periods make sense only in nonterminals. +Symbol names can contain letters, underscores, periods, dashes, and (not +at the beginning) digits. Dashes in symbol names are a GNU +extension, incompatible with @acronym{POSIX} Yacc. Terminal symbols +that contain periods or dashes make little sense: since they are not +valid symbols (in most programming languages) they are not exported as +token names. There are three ways of writing terminal symbols in the grammar: @@ -4467,7 +4472,7 @@ be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. Bison reports an error if the number of shift/reduce conflicts differs from @var{n}, or if there are any reduce/reduce conflicts. -For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more +For deterministic parsers, reduce/reduce conflicts are more serious, and should be eliminated entirely. Bison will always report reduce/reduce conflicts for these parsers. With @acronym{GLR} parsers, however, both kinds of conflicts are routine; otherwise, @@ -4561,7 +4566,7 @@ valid grammar. @subsection A Push Parser @cindex push parser @cindex push parser -@findex %define api.push_pull +@findex %define api.push-pull (The current push parsing interface is experimental and may evolve. More user feedback will help to stabilize it.) @@ -4577,10 +4582,10 @@ within a certain time period. Normally, Bison generates a pull parser. The following Bison declaration says that you want the parser to be a push -parser (@pxref{Decl Summary,,%define api.push_pull}): +parser (@pxref{Decl Summary,,%define api.push-pull}): @example -%define api.push_pull "push" +%define api.push-pull "push" @end example In almost all cases, you want to ensure that your push parser is also @@ -4591,7 +4596,7 @@ what you are doing, your declarations should look like this: @example %define api.pure -%define api.push_pull "push" +%define api.push-pull "push" @end example There is a major notable functional difference between the pure push parser @@ -4640,14 +4645,14 @@ for use by the next invocation of the @code{yypush_parse} function. Bison also supports both the push parser interface along with the pull parser interface in the same generated parser. In order to get this functionality, -you should replace the @code{%define api.push_pull "push"} declaration with the -@code{%define api.push_pull "both"} declaration. Doing this will create all of +you should replace the @code{%define api.push-pull "push"} declaration with the +@code{%define api.push-pull "both"} declaration. Doing this will create all of the symbols mentioned earlier along with the two extra symbols, @code{yyparse} and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally would be used. However, the user should note that it is implemented in the generated parser by calling @code{yypull_parse}. This makes the @code{yyparse} function that is generated with the -@code{%define api.push_pull "both"} declaration slower than the normal +@code{%define api.push-pull "both"} declaration slower than the normal @code{yyparse} function. If the user calls the @code{yypull_parse} function it will parse the rest of the input stream. It is possible to @code{yypush_parse} tokens to select a subgrammar @@ -4664,8 +4669,8 @@ yypstate_delete (ps); @end example Adding the @code{%define api.pure} declaration does exactly the same thing to -the generated parser with @code{%define api.push_pull "both"} as it did for -@code{%define api.push_pull "push"}. +the generated parser with @code{%define api.push-pull "both"} as it did for +@code{%define api.push-pull "push"}. @node Decl Summary @subsection Bison Declaration Summary @@ -4745,10 +4750,6 @@ Thus, @code{%code} replaces the traditional Yacc prologue, For a detailed discussion, see @ref{Prologue Alternatives}. For Java, the default location is inside the parser class. - -(Like all the Yacc prologue alternatives, this directive is experimental. -More user feedback will help to determine whether it should become a permanent -feature.) @end deffn @deffn {Directive} %code @var{qualifier} @{@var{code}@} @@ -4826,20 +4827,16 @@ before any class definitions. @end itemize @end itemize -(Like all the Yacc prologue alternatives, this directive is experimental. -More user feedback will help to determine whether it should become a permanent -feature.) - @cindex Prologue For a detailed discussion of how to use @code{%code} in place of the traditional Yacc prologue for C/C++, see @ref{Prologue Alternatives}. @end deffn @deffn {Directive} %debug -In the parser file, define the macro @code{YYDEBUG} to 1 if it is not -already defined, so that the debugging facilities are compiled. -@end deffn +Instrument the output parser for traces. Obsoleted by @samp{%define +parse.trace}. @xref{Tracing, ,Tracing Your Parser}. +@end deffn @deffn {Directive} %define @var{variable} @deffnx {Directive} %define @var{variable} "@var{value}" @@ -4872,7 +4869,7 @@ target language and/or parser skeleton. Some of the accepted @var{variable}s are: -@itemize @bullet +@table @code @item api.pure @findex %define api.pure @@ -4886,12 +4883,13 @@ Some of the accepted @var{variable}s are: @item Default Value: @code{"false"} @end itemize +@c api.pure -@item api.push_pull -@findex %define api.push_pull +@item api.push-pull +@findex %define api.push-pull @itemize @bullet -@item Language(s): C (LALR(1) only) +@item Language(s): C (deterministic parsers only) @item Purpose: Requests a pull parser, a push parser, or both. @xref{Push Decl, ,A Push Parser}. @@ -4902,9 +4900,92 @@ More user feedback will help to stabilize it.) @item Default Value: @code{"pull"} @end itemize +@c api.push-pull -@item lr.keep_unreachable_states -@findex %define lr.keep_unreachable_states +@item error-verbose +@findex %define error-verbose +@itemize +@item Languages(s): +all. +@item Purpose: +Enable the generation of more verbose error messages than a instead of +just plain @w{@code{"syntax error"}}. @xref{Error Reporting, ,The Error +Reporting Function @code{yyerror}}. +@item Accepted Values: +Boolean +@item Default Value: +@code{false} +@end itemize +@c error-verbose + + +@item lr.default-reductions +@cindex default reductions +@findex %define lr.default-reductions +@cindex delayed syntax errors +@cindex syntax errors delayed + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specifies the kind of states that are permitted to +contain default reductions. +That is, in such a state, Bison declares the reduction with the largest +lookahead set to be the default reduction and then removes that +lookahead set. +The advantages of default reductions are discussed below. +The disadvantage is that, when the generated parser encounters a +syntactically unacceptable token, the parser might then perform +unnecessary default reductions before it can detect the syntax error. + +(This feature is experimental. +More user feedback will help to stabilize it.) + +@item Accepted Values: +@itemize +@item @code{"all"}. +For @acronym{LALR} and @acronym{IELR} parsers (@pxref{Decl +Summary,,lr.type}) by default, all states are permitted to contain +default reductions. +The advantage is that parser table sizes can be significantly reduced. +The reason Bison does not by default attempt to address the disadvantage +of delayed syntax error detection is that this disadvantage is already +inherent in @acronym{LALR} and @acronym{IELR} parser tables. +That is, unlike in a canonical @acronym{LR} state, the lookahead sets of +reductions in an @acronym{LALR} or @acronym{IELR} state can contain +tokens that are syntactically incorrect for some left contexts. + +@item @code{"consistent"}. +@cindex consistent states +A consistent state is a state that has only one possible action. +If that action is a reduction, then the parser does not need to request +a lookahead token from the scanner before performing that action. +However, the parser only recognizes the ability to ignore the lookahead +token when such a reduction is encoded as a default reduction. +Thus, if default reductions are permitted in and only in consistent +states, then a canonical @acronym{LR} parser reports a syntax error as +soon as it @emph{needs} the syntactically unacceptable token from the +scanner. + +@item @code{"accepting"}. +@cindex accepting state +By default, the only default reduction permitted in a canonical +@acronym{LR} parser is the accept action in the accepting state, which +the parser reaches only after reading all tokens from the input. +Thus, the default canonical @acronym{LR} parser reports a syntax error +as soon as it @emph{reaches} the syntactically unacceptable token +without performing any extra reductions. +@end itemize + +@item Default Value: +@itemize +@item @code{"accepting"} if @code{lr.type} is @code{"canonical LR"}. +@item @code{"all"} otherwise. +@end itemize +@end itemize + +@item lr.keep-unreachable-states +@findex %define lr.keep-unreachable-states @itemize @bullet @item Language(s): all @@ -4945,6 +5026,85 @@ states. However, Bison does not compute which goto actions are useless. @end itemize @end itemize +@c lr.keep-unreachable-states + +@item lr.type +@findex %define lr.type +@cindex @acronym{LALR} +@cindex @acronym{IELR} +@cindex @acronym{LR} + +@itemize @bullet +@item Language(s): all + +@item Purpose: Specifies the type of parser tables within the +@acronym{LR}(1) family. +(This feature is experimental. +More user feedback will help to stabilize it.) + +@item Accepted Values: +@itemize +@item @code{"LALR"}. +While Bison generates @acronym{LALR} parser tables by default for +historical reasons, @acronym{IELR} or canonical @acronym{LR} is almost +always preferable for deterministic parsers. +The trouble is that @acronym{LALR} parser tables can suffer from +mysterious conflicts and thus may not accept the full set of sentences +that @acronym{IELR} and canonical @acronym{LR} accept. +@xref{Mystery Conflicts}, for details. +However, there are at least two scenarios where @acronym{LALR} may be +worthwhile: +@itemize +@cindex @acronym{GLR} with @acronym{LALR} +@item When employing @acronym{GLR} parsers (@pxref{GLR Parsers}), if you +do not resolve any conflicts statically (for example, with @code{%left} +or @code{%prec}), then the parser explores all potential parses of any +given input. +In this case, the use of @acronym{LALR} parser tables is guaranteed not +to alter the language accepted by the parser. +@acronym{LALR} parser tables are the smallest parser tables Bison can +currently generate, so they may be preferable. + +@item Occasionally during development, an especially malformed grammar +with a major recurring flaw may severely impede the @acronym{IELR} or +canonical @acronym{LR} parser table generation algorithm. +@acronym{LALR} can be a quick way to generate parser tables in order to +investigate such problems while ignoring the more subtle differences +from @acronym{IELR} and canonical @acronym{LR}. +@end itemize + +@item @code{"IELR"}. +@acronym{IELR} is a minimal @acronym{LR} algorithm. +That is, given any grammar (@acronym{LR} or non-@acronym{LR}), +@acronym{IELR} and canonical @acronym{LR} always accept exactly the same +set of sentences. +However, as for @acronym{LALR}, the number of parser states is often an +order of magnitude less for @acronym{IELR} than for canonical +@acronym{LR}. +More importantly, because canonical @acronym{LR}'s extra parser states +may contain duplicate conflicts in the case of non-@acronym{LR} +grammars, the number of conflicts for @acronym{IELR} is often an order +of magnitude less as well. +This can significantly reduce the complexity of developing of a grammar. + +@item @code{"canonical LR"}. +@cindex delayed syntax errors +@cindex syntax errors delayed +The only advantage of canonical @acronym{LR} over @acronym{IELR} is +that, for every left context of every canonical @acronym{LR} state, the +set of tokens accepted by that state is the exact set of tokens that is +syntactically acceptable in that left context. +Thus, the only difference in parsing behavior is that the canonical +@acronym{LR} parser can report a syntax error as soon as possible +without performing any unnecessary reductions. +@xref{Decl Summary,,lr.default-reductions}, for further details. +Even when canonical @acronym{LR} behavior is ultimately desired, +@acronym{IELR}'s elimination of duplicate conflicts should still +facilitate the development of a grammar. +@end itemize + +@item Default Value: @code{"LALR"} +@end itemize @item namespace @findex %define namespace @@ -4997,9 +5157,80 @@ For example, if you specify: The parser namespace is @code{foo} and @code{yylex} is referenced as @code{bar::lex}. @end itemize +@c namespace + +@item parse.assert +@findex %define parse.assert + +@itemize +@item Languages(s): C++ + +@item Purpose: Issue runtime assertions to catch invalid uses. +In C++, when variants are used, symbols must be constructed and +destroyed properly. This option checks these constraints. + +@item Accepted Values: Boolean + +@item Default Value: @code{false} +@end itemize +@c parse.assert + +@item parse.trace +@findex %define parse.trace + +@itemize +@item Languages(s): C, C++ + +@item Purpose: Require parser instrumentation for tracing. +In C/C++, define the macro @code{YYDEBUG} to 1 in the parser file if it +is not already defined, so that the debugging facilities are compiled. +@xref{Tracing, ,Tracing Your Parser}. + +@item Accepted Values: Boolean + +@item Default Value: @code{false} @end itemize +@c parse.trace + +@item token.prefix +@findex %define token.prefix +@itemize +@item Languages(s): all + +@item Purpose: +Add a prefix to the token names when generating their definition in the +target language. For instance + +@example +%token FILE for ERROR +%define token.prefix "TOK_" +%% +start: FILE for ERROR; +@end example + +@noindent +generates the definition of the symbols @code{TOK_FILE}, @code{TOK_for}, +and @code{TOK_ERROR} in the generated source files. In particular, the +scanner must use these prefixed token names, while the grammar itself +may still use the short names (as in the sample rule given above). The +generated informational files (@file{*.output}, @file{*.xml}, +@file{*.dot}) are not modified by this prefix. See @ref{Calc++ Parser} +and @ref{Calc++ Scanner}, for a complete example. + +@item Accepted Values: +Any string. Should be a valid identifier prefix in the target language, +in other words, it should typically be an identifier itself (sequence of +letters, underscores, and ---not at the beginning--- digits). + +@item Default Value: +empty +@end itemize +@c token.prefix + +@end table @end deffn +@c ---------------------------------------------------------- %define @deffn {Directive} %defines Write a header file containing macro definitions for the token type @@ -5229,19 +5460,17 @@ identifier (aside from those in this manual) in an action or in epilogue in the grammar file, you are likely to run into trouble. @menu -* Parser Function:: How to call @code{yyparse} and what it returns. -* Push Parser Function:: How to call @code{yypush_parse} and what it returns. -* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. -* Parser Create Function:: How to call @code{yypstate_new} and what it - returns. -* Parser Delete Function:: How to call @code{yypstate_delete} and what it - returns. -* Lexical:: You must supply a function @code{yylex} - which reads tokens. -* Error Reporting:: You must supply a function @code{yyerror}. -* Action Features:: Special features for use in actions. -* Internationalization:: How to let the parser speak in the user's - native language. +* Parser Function:: How to call @code{yyparse} and what it returns. +* Push Parser Function:: How to call @code{yypush_parse} and what it returns. +* Pull Parser Function:: How to call @code{yypull_parse} and what it returns. +* Parser Create Function:: How to call @code{yypstate_new} and what it returns. +* Parser Delete Function:: How to call @code{yypstate_delete} and what it returns. +* Lexical:: You must supply a function @code{yylex} + which reads tokens. +* Error Reporting:: You must supply a function @code{yyerror}. +* Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. @end menu @node Parser Function @@ -5326,8 +5555,8 @@ exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} More user feedback will help to stabilize it.) You call the function @code{yypush_parse} to parse a single token. This -function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +function is available if either the @code{%define api.push-pull "push"} or +@code{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun int yypush_parse (yypstate *yyps) @@ -5344,7 +5573,7 @@ is required to finish parsing the grammar. More user feedback will help to stabilize it.) You call the function @code{yypull_parse} to parse the rest of the input -stream. This function is available if the @code{%define api.push_pull "both"} +stream. This function is available if the @code{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @@ -5360,8 +5589,8 @@ The value returned by @code{yypull_parse} is the same as for @code{yyparse}. More user feedback will help to stabilize it.) You call the function @code{yypstate_new} to create a new parser instance. -This function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +This function is available if either the @code{%define api.push-pull "push"} or +@code{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun yypstate *yypstate_new (void) @@ -5379,8 +5608,8 @@ allocated. More user feedback will help to stabilize it.) You call the function @code{yypstate_delete} to delete a parser instance. -function is available if either the @code{%define api.push_pull "push"} or -@code{%define api.push_pull "both"} declaration is used. +function is available if either the @code{%define api.push-pull "push"} or +@code{%define api.push-pull "both"} declaration is used. @xref{Push Decl, ,A Push Parser}. @deftypefun void yypstate_delete (yypstate *yyps) @@ -5408,13 +5637,13 @@ that need it. @xref{Invocation, ,Invoking Bison}. @menu * Calling Convention:: How @code{yyparse} calls @code{yylex}. -* Token Values:: How @code{yylex} must return the semantic value - of the token it has read. -* Token Locations:: How @code{yylex} must return the text location - (line number, etc.) of the token, if the - actions want that. -* Pure Calling:: How the calling convention differs - in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). +* Token Values:: How @code{yylex} must return the semantic value + of the token it has read. +* Token Locations:: How @code{yylex} must return the text location + (line number, etc.) of the token, if the + actions want that. +* Pure Calling:: How the calling convention differs in a pure parser + (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). @end menu @node Calling Convention @@ -5653,8 +5882,8 @@ called by @code{yyparse} whenever a syntax error is found, and it receives one argument. For a syntax error, the string is normally @w{@code{"syntax error"}}. -@findex %error-verbose -If you invoke the directive @code{%error-verbose} in the Bison +@findex %define error-verbose +If you invoke the directive @code{%define error-verbose} in the Bison declarations section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose and specific error message string instead of just plain @w{@code{"syntax error"}}. @@ -6057,7 +6286,7 @@ This kind of parser is known in the literature as a bottom-up parser. * Contextual Precedence:: When an operator's precedence depends on context. * Parser States:: The parser is a finite-state-machine with stack. * Reduce/Reduce:: When two rules are applicable in the same situation. -* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. +* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. * Generalized LR Parsing:: Parsing arbitrary context-free grammars. * Memory Management:: What happens when memory is exhausted. How to avoid it. @end menu @@ -6668,12 +6897,13 @@ a @code{name} if a comma or colon follows, or a @code{type} if another @cindex @acronym{LR}(1) @cindex @acronym{LALR}(1) -However, Bison, like most parser generators, cannot actually handle all -@acronym{LR}(1) grammars. In this grammar, two contexts, that after -an @code{ID} -at the beginning of a @code{param_spec} and likewise at the beginning of -a @code{return_spec}, are similar enough that Bison assumes they are the -same. They appear similar because the same set of rules would be +However, for historical reasons, Bison cannot by default handle all +@acronym{LR}(1) grammars. +In this grammar, two contexts, that after an @code{ID} at the beginning +of a @code{param_spec} and likewise at the beginning of a +@code{return_spec}, are similar enough that Bison assumes they are the +same. +They appear similar because the same set of rules would be active---the rule for reducing to a @code{name} and that for reducing to a @code{type}. Bison is unable to determine at that stage of processing that the rules would require different lookahead tokens in the two @@ -6681,16 +6911,22 @@ contexts, so it makes a single parser state for them both. Combining the two contexts causes a conflict later. In parser terminology, this occurrence means that the grammar is not @acronym{LALR}(1). -In general, it is better to fix deficiencies than to document them. But -this particular deficiency is intrinsically hard to fix; parser -generators that can handle @acronym{LR}(1) grammars are hard to write -and tend to -produce parsers that are very large. In practice, Bison is more useful -as it is now. - -When the problem arises, you can often fix it by identifying the two -parser states that are being confused, and adding something to make them -look distinct. In the above example, adding one rule to +For many practical grammars (specifically those that fall into the +non-@acronym{LR}(1) class), the limitations of @acronym{LALR}(1) result in +difficulties beyond just mysterious reduce/reduce conflicts. +The best way to fix all these problems is to select a different parser +table generation algorithm. +Either @acronym{IELR}(1) or canonical @acronym{LR}(1) would suffice, but +the former is more efficient and easier to debug during development. +@xref{Decl Summary,,lr.type}, for details. +(Bison's @acronym{IELR}(1) and canonical @acronym{LR}(1) implementations +are experimental. +More user feedback will help to stabilize them.) + +If you instead wish to work around @acronym{LALR}(1)'s limitations, you +can often fix a mysterious conflict by identifying the two parser states +that are being confused, and adding something to make them look +distinct. In the above example, adding one rule to @code{return_spec} as follows makes the problem go away: @example @@ -6758,7 +6994,7 @@ The same is true of languages that require more than one symbol of lookahead, since the parser lacks the information necessary to make a decision at the point it must be made in a shift-reduce parser. Finally, as previously mentioned (@pxref{Mystery Conflicts}), -there are languages where Bison's particular choice of how to +there are languages where Bison's default choice of how to summarize the input seen so far loses necessary information. When you use the @samp{%glr-parser} declaration in your grammar file, @@ -6790,7 +7026,7 @@ grammar symbol that produces the same segment of the input token stream. Whenever the parser makes a transition from having multiple -states to having one, it reverts to the normal @acronym{LALR}(1) parsing +states to having one, it reverts to the normal deterministic parsing algorithm, after resolving and executing the saved-up actions. At this transition, some of the states on the stack will have semantic values that are sets (actually multisets) of possible actions. The @@ -6803,9 +7039,9 @@ Bison resolves and evaluates both and then calls the merge function on the result. Otherwise, it reports an ambiguity. It is possible to use a data structure for the @acronym{GLR} parsing tree that -permits the processing of any @acronym{LALR}(1) grammar in linear time (in the +permits the processing of any @acronym{LR}(1) grammar in linear time (in the size of the input), any unambiguous (not necessarily -@acronym{LALR}(1)) grammar in +@acronym{LR}(1)) grammar in quadratic worst-case time, and any general (possibly ambiguous) context-free grammar in cubic worst-case time. However, Bison currently uses a simpler data structure that requires time proportional to the @@ -6815,9 +7051,9 @@ grammars can require exponential time and space to process. Such badly behaving examples, however, are not generally of practical interest. Usually, nondeterminism in a grammar is local---the parser is ``in doubt'' only for a few tokens at a time. Therefore, the current data -structure should generally be adequate. On @acronym{LALR}(1) portions of a -grammar, in particular, it is only slightly slower than with the default -Bison parser. +structure should generally be adequate. On @acronym{LR}(1) portions of a +grammar, in particular, it is only slightly slower than with the +deterministic @acronym{LR}(1) Bison parser. For a more detailed exposition of @acronym{GLR} parsers, please see: Elizabeth Scott, Adrian Johnstone and Shamsa Sadaf Hussain, Tomita-Style @@ -6866,16 +7102,16 @@ The default value of @code{YYMAXDEPTH}, if you do not define it, is @vindex YYINITDEPTH You can control how much stack is allocated initially by defining the -macro @code{YYINITDEPTH} to a positive integer. For the C -@acronym{LALR}(1) parser, this value must be a compile-time constant +macro @code{YYINITDEPTH} to a positive integer. For the deterministic +parser in C, this value must be a compile-time constant unless you are assuming C99 or some other target language or compiler that allows variable-length arrays. The default is 200. Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. @c FIXME: C++ output. -Because of semantical differences between C and C++, the -@acronym{LALR}(1) parsers in C produced by Bison cannot grow when compiled +Because of semantical differences between C and C++, the deterministic +parsers in C produced by Bison cannot grow when compiled by C++ compilers. In this precise case (compiling a C parser as C++) you are suggested to grow @code{YYINITDEPTH}. The Bison maintainers hope to fix this deficiency in a future release. @@ -7263,7 +7499,8 @@ useless: STR; @command{bison} reports: @example -calc.y: warning: 1 nonterminal and 1 rule useless in grammar +calc.y: warning: 1 nonterminal useless in grammar +calc.y: warning: 1 rule useless in grammar calc.y:11.1-7: warning: nonterminal useless in grammar: useless calc.y:11.10-12: warning: rule useless in grammar: useless: STR calc.y: conflicts: 7 shift/reduce @@ -7451,6 +7688,7 @@ control will jump to state 4, corresponding to the item @samp{exp -> exp '+' . exp}. Since there is no default action, any other token than those listed above will trigger a syntax error. +@cindex accepting state The state 3 is named the @dfn{final state}, or the @dfn{accepting state}: @@ -7531,7 +7769,7 @@ sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM / NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) / NUM}, which corresponds to reducing rule 1. -Because in @acronym{LALR}(1) parsing a single decision can be made, Bison +Because in deterministic parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, , Shift/Reduce Conflicts}. Discarded actions are reported in between square brackets. @@ -7648,15 +7886,21 @@ Use the @samp{-t} option when you run Bison (@pxref{Invocation, @item the directive @samp{%debug} @findex %debug -Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison -Declaration Summary}). This is a Bison extension, which will prove -useful when Bison will output parsers for languages that don't use a -preprocessor. Unless @acronym{POSIX} and Yacc portability matter to -you, this is -the preferred solution. +Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison Declaration +Summary}). This Bison extension is maintained for backward +compatibility with previous versions of Bison. + +@item the variable @samp{parse.trace} +@findex %define parse.trace +Add the @samp{%define parse.trace} directive (@pxref{Decl Summary, +,Bison Declaration Summary}), or pass the @option{-Dparse.trace} option +(@pxref{Bison Options}). This is a Bison extension, which is especially +useful for languages that don't use a preprocessor. Unless +@acronym{POSIX} and Yacc portability matter to you, this is the +preferred solution. @end table -We suggest that you always enable the debug option so that debugging is +We suggest that you always enable the trace option so that debugging is always possible. The trace facility outputs messages with macro calls of the form @@ -7713,7 +7957,7 @@ standard I/O stream, the numeric code for the token type, and the token value (from @code{yylval}). Here is an example of @code{YYPRINT} suitable for the multi-function -calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}): +calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}): @smallexample %@{ @@ -7825,7 +8069,7 @@ other minor ways. Most importantly, imitate Yacc's output file name conventions, so that the parser output file is called @file{y.tab.c}, and the other outputs are called @file{y.output} and @file{y.tab.h}. -Also, if generating an @acronym{LALR}(1) parser in C, generate @code{#define} +Also, if generating a deterministic parser in C, generate @code{#define} statements in addition to an @code{enum} to associate token numbers with token names. Thus, the following shell script can substitute for Yacc, and the Bison @@ -7841,8 +8085,8 @@ traditional Yacc grammars. If your grammar uses a Bison extension like @samp{%glr-parser}, Bison might not be Yacc-compatible even if this option is specified. -@item -W -@itemx --warnings +@item -W [@var{category}] +@itemx --warnings[=@var{category}] Output warnings falling in @var{category}. @var{category} can be one of: @table @code @@ -7972,7 +8216,7 @@ separated list of @var{things} among: @table @code @item state Description of the grammar, conflicts (resolved and unresolved), and -@acronym{LALR} automaton. +parser's automaton. @item lookahead Implies @code{state} and augments the description of the automaton with @@ -7999,18 +8243,18 @@ Specify the @var{file} for the parser file. The other output files' names are constructed from @var{file} as described under the @samp{-v} and @samp{-d} options. -@item -g[@var{file}] +@item -g [@var{file}] @itemx --graph[=@var{file}] -Output a graphical representation of the @acronym{LALR}(1) grammar +Output a graphical representation of the parser's automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz} @uref{http://www.graphviz.org/doc/info/lang.html, @acronym{DOT}} format. @code{@var{file}} is optional. If omitted and the grammar file is @file{foo.y}, the output file will be @file{foo.dot}. -@item -x[@var{file}] +@item -x [@var{file}] @itemx --xml[=@var{file}] -Output an XML report of the @acronym{LALR}(1) automaton computed by Bison. +Output an XML report of the parser's automaton computed by Bison. @code{@var{file}} is optional. If omitted and the grammar file is @file{foo.y}, the output file will be @file{foo.xml}. @@ -8021,12 +8265,11 @@ More user feedback will help to stabilize it.) @node Option Cross Key @section Option Cross Key -@c FIXME: How about putting the directives too? Here is a list of options, alphabetized by long option, to help you find the corresponding short option. -@multitable {@option{--defines=@var{defines-file}}} {@option{-b @var{file-prefix}XXX}} -@headitem Long Option @tab Short Option +@multitable {@option{--defines=@var{defines-file}}} {@option{-D @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}} +@headitem Long Option @tab Short Option @tab Bison Directive @include cross-options.texi @end multitable @@ -8084,7 +8327,7 @@ int yyparse (void); @c - Always pure @c - initial action -The C++ @acronym{LALR}(1) parser is selected using the skeleton directive, +The C++ deterministic parser is selected using the skeleton directive, @samp{%skeleton "lalr1.c"}, or the synonymous command-line option @option{--skeleton=lalr1.c}. @xref{Decl Summary}. @@ -8473,10 +8716,10 @@ calcxx_driver::error (const std::string& m) @subsubsection Calc++ Parser The parser definition file @file{calc++-parser.yy} starts by asking for -the C++ LALR(1) skeleton, the creation of the parser header file, and -specifies the name of the parser class. Because the C++ skeleton -changed several times, it is safer to require the version you designed -the grammar for. +the C++ deterministic parser skeleton, the creation of the parser header +file, and specifies the name of the parser class. +Because the C++ skeleton changed several times, it is safer to require +the version you designed the grammar for. @comment file: calc++-parser.yy @example @@ -8538,8 +8781,8 @@ error messages. @comment file: calc++-parser.yy @example -%debug -%error-verbose +%define parse.trace +%define error-verbose @end example @noindent @@ -8571,13 +8814,14 @@ The code between @samp{%code @{} and @samp{@}} is output in the @noindent The token numbered as 0 corresponds to end of file; the following line -allows for nicer error messages referring to ``end of file'' instead -of ``$end''. Similarly user friendly named are provided for each -symbol. Note that the tokens names are prefixed by @code{TOKEN_} to -avoid name clashes. +allows for nicer error messages referring to ``end of file'' instead of +``$end''. Similarly user friendly names are provided for each symbol. +To avoid name clashes in the generated files (@pxref{Calc++ Scanner}), +prefix tokens with @code{TOK_} (@pxref{Decl Summary,, token.prefix}). @comment file: calc++-parser.yy @example +%define token.prefix "TOK_" %token END 0 "end of file" %token ASSIGN ":=" %token IDENTIFIER "identifier" @@ -8607,21 +8851,24 @@ The grammar itself is straightforward. %start unit; unit: assignments exp @{ driver.result = $2; @}; -assignments: assignments assignment @{@} - | /* Nothing. */ @{@}; +assignments: + assignments assignment @{@} +| /* Nothing. */ @{@}; assignment: - "identifier" ":=" exp + "identifier" ":=" exp @{ driver.variables[*$1] = $3; delete $1; @}; %left '+' '-'; %left '*' '/'; -exp: exp '+' exp @{ $$ = $1 + $3; @} - | exp '-' exp @{ $$ = $1 - $3; @} - | exp '*' exp @{ $$ = $1 * $3; @} - | exp '/' exp @{ $$ = $1 / $3; @} - | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} - | "number" @{ $$ = $1; @}; +exp: + exp '+' exp @{ $$ = $1 + $3; @} +| exp '-' exp @{ $$ = $1 - $3; @} +| exp '*' exp @{ $$ = $1 * $3; @} +| exp '/' exp @{ $$ = $1 / $3; @} +| '(' exp ')' @{ $$ = $2; @} +| "identifier" @{ $$ = driver.variables[*$1]; delete $1; @} +| "number" @{ $$ = $1; @}; %% @end example @@ -8662,10 +8909,10 @@ parser's to get the set of defined tokens. # undef yywrap # define yywrap() 1 -/* By default yylex returns int, we use token_type. - Unfortunately yyterminate by default returns 0, which is +/* By default yylex returns an int; we use token_type. + The default yyterminate implementation returns 0, which is not of token_type. */ -#define yyterminate() return token::END +#define yyterminate() return TOKEN(END) %@} @end example @@ -8713,28 +8960,32 @@ preceding tokens. Comments would be treated equally. @end example @noindent -The rules are simple, just note the use of the driver to report errors. -It is convenient to use a typedef to shorten -@code{yy::calcxx_parser::token::identifier} into -@code{token::identifier} for instance. +The rules are simple. The driver is used to report errors. It is +convenient to use a macro to shorten +@code{yy::calcxx_parser::token::TOK_@var{Name}} into +@code{TOKEN(@var{Name})}; note the token prefix, @code{TOK_}. @comment file: calc++-scanner.ll @example %@{ - typedef yy::calcxx_parser::token token; +# define TOKEN(Name) \ + yy::calcxx_parser::token::TOK_ ## Name %@} /* Convert ints to the actual type of tokens. */ -[-+*/] return yy::calcxx_parser::token_type (yytext[0]); -":=" return token::ASSIGN; +[-+*/()] return yy::calcxx_parser::token_type (yytext[0]); +":=" return TOKEN(ASSIGN); @{int@} @{ errno = 0; long n = strtol (yytext, NULL, 10); if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) driver.error (*yylloc, "integer is out of range"); yylval->ival = n; - return token::NUMBER; + return TOKEN(NUMBER); +@} +@{id@} @{ + yylval->sval = new std::string (yytext); + return TOKEN(IDENTIFIER); @} -@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; . driver.error (*yylloc, "invalid character"); %% @end example @@ -8797,14 +9048,14 @@ main (int argc, char *argv[]) @section Java Parsers @menu -* Java Bison Interface:: Asking for Java parser generation -* Java Semantic Values:: %type and %token vs. Java -* Java Location Values:: The position and location classes -* Java Parser Interface:: Instantiating and running the parser -* Java Scanner Interface:: Specifying the scanner for the parser -* Java Action Features:: Special features for use in actions. -* Java Differences:: Differences between C/C++ and Java Grammars -* Java Declarations Summary:: List of Bison declarations used with Java +* Java Bison Interface:: Asking for Java parser generation +* Java Semantic Values:: %type and %token vs. Java +* Java Location Values:: The position and location classes +* Java Parser Interface:: Instantiating and running the parser +* Java Scanner Interface:: Specifying the scanner for the parser +* Java Action Features:: Special features for use in actions +* Java Differences:: Differences between C/C++ and Java Grammars +* Java Declarations Summary:: List of Bison declarations used with Java @end menu @node Java Bison Interface @@ -8836,7 +9087,7 @@ and @code{%define api.pure} directives does not do anything when used in Java. Push parsers are currently unsupported in Java and @code{%define -api.push_pull} have no effect. +api.push-pull} have no effect. @acronym{GLR} parsers are currently unsupported in Java. Do not use the @code{glr-parser} directive. @@ -8845,11 +9096,13 @@ No header file can be generated for Java parsers. Do not use the @code{%defines} directive or the @option{-d}/@option{--defines} options. @c FIXME: Possible code change. -Currently, support for debugging is always compiled -in. Thus the @code{%debug} and @code{%token-table} directives and the +Currently, support for tracing is always compiled +in. Thus the @samp{%define parse.trace} and @samp{%token-table} +directives and the @option{-t}/@option{--debug} and @option{-k}/@option{--token-table} options have no effect. This may change in the future to eliminate -unused code in the generated parser, so use @code{%debug} explicitly +unused code in the generated parser, so use @samp{%define parse.trace} +explicitly if needed. Also, in the future the @code{%token-table} directive might enable a public interface to access the token names and codes. @@ -8932,7 +9185,7 @@ The first, inclusive, position of the range, and the first beyond. @end deftypeivar @deftypeop {Constructor} {Location} {} Location (Position @var{loc}) -Create a @code{Location} denoting an empty range located at a given point. +Create a @code{Location} denoting an empty range located at a given point. @end deftypeop @deftypeop {Constructor} {Location} {} Location (Position @var{begin}, Position @var{end}) @@ -9017,7 +9270,7 @@ Run the syntactic analysis, and return @code{true} on success, @deftypemethod {YYParser} {boolean} getErrorVerbose () @deftypemethodx {YYParser} {void} setErrorVerbose (boolean @var{verbose}) Get or set the option to produce verbose error messages. These are only -available with the @code{%error-verbose} directive, which also turn on +available with the @code{%define error-verbose} directive, which also turn on verbose error messages. @end deftypemethod @@ -9172,12 +9425,12 @@ Return immediately from the parser, indicating success. @end deffn @deffn {Statement} {return YYERROR;} -Start error recovery without printing an error message. +Start error recovery without printing an error message. @xref{Error Recovery}. @end deffn @deffn {Statement} {return YYFAIL;} -Print an error message and start error recovery. +Print an error message and start error recovery. @xref{Error Recovery}. @end deffn @@ -9894,10 +10147,6 @@ Insert @var{code} verbatim into output parser source. Equip the parser for debugging. @xref{Decl Summary}. @end deffn -@deffn {Directive} %debug -Equip the parser for debugging. @xref{Decl Summary}. -@end deffn - @ifset defaultprec @deffn {Directive} %default-prec Assign a precedence to rules that lack an explicit @samp{%prec} @@ -9950,8 +10199,7 @@ token is reset to the token that originally caused the violation. @end deffn @deffn {Directive} %error-verbose -Bison declaration to request verbose, specific error message strings -when @code{yyerror} is called. +An obsolete directive standing for @samp{%define error-verbose}. @end deffn @deffn {Directive} %file-prefix "@var{prefix}" @@ -10149,16 +10397,16 @@ instead. @deffn {Function} yyerror User-supplied function to be called by @code{yyparse} on error. -@xref{Error Reporting, ,The Error -Reporting Function @code{yyerror}}. +@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @deffn {Macro} YYERROR_VERBOSE -An obsolete macro that you define with @code{#define} in the prologue -to request verbose, specific error message strings -when @code{yyerror} is called. It doesn't matter what definition you -use for @code{YYERROR_VERBOSE}, just whether you define it. Using -@code{%error-verbose} is preferred. +An obsolete macro used in the @file{yacc.c} skeleton, that you define +with @code{#define} in the prologue to request verbose, specific error +message strings when @code{yyerror} is called. It doesn't matter what +definition you use for @code{YYERROR_VERBOSE}, just whether you define +it. Using @code{%define error-verbose} is preferred (@pxref{Error +Reporting, ,The Error Reporting Function @code{yyerror}}). @end deffn @deffn {Macro} YYINITDEPTH @@ -10272,8 +10520,8 @@ is recovering from a syntax error, and 0 otherwise. @end deffn @deffn {Macro} YYSTACK_USE_ALLOCA -Macro used to control the use of @code{alloca} when the C -@acronym{LALR}(1) parser needs to extend its stacks. If defined to 0, +Macro used to control the use of @code{alloca} when the +deterministic parser in C needs to extend its stacks. If defined to 0, the parser will use @code{malloc} to extend its stacks. If defined to 1, the parser will use @code{alloca}. Values other than 0 and 1 are reserved for future Bison extensions. If not defined, @@ -10298,12 +10546,21 @@ Data type of semantic values; @code{int} by default. @cindex glossary @table @asis +@item Accepting State +A state whose only action is the accept action. +The accepting state is thus a consistent state. +@xref{Understanding,,}. + @item Backus-Naur Form (@acronym{BNF}; also called ``Backus Normal Form'') Formal method of specifying context-free grammars originally proposed by John Backus, and slightly improved by Peter Naur in his 1960-01-02 committee document contributing to what became the Algol 60 report. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item Consistent State +A state containing only one possible action. +@xref{Decl Summary,,lr.default-reductions}. + @item Context-free grammars Grammars specified as rules that can be applied regardless of context. Thus, if there is a rule which says that an integer can be used as an @@ -10311,6 +10568,14 @@ expression, integers are allowed @emph{anywhere} an expression is permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item Default Reduction +The reduction that a parser should perform if the current parser state +contains no other action for the lookahead token. +In permitted parser states, Bison declares the reduction with the +largest lookahead set to be the default reduction and removes that +lookahead set. +@xref{Decl Summary,,lr.default-reductions}. + @item Dynamic allocation Allocation of memory that occurs during execution, rather than at compile time or on entry to a function. @@ -10329,8 +10594,8 @@ rules. @xref{Algorithm, ,The Bison Parser Algorithm}. @item Generalized @acronym{LR} (@acronym{GLR}) A parsing algorithm that can handle all context-free grammars, including those -that are not @acronym{LALR}(1). It resolves situations that Bison's -usual @acronym{LALR}(1) +that are not @acronym{LR}(1). It resolves situations that Bison's +deterministic parsing algorithm cannot by effectively splitting off multiple parsers, trying all possible parsers, and discarding those that fail in the light of additional right context. @xref{Generalized LR Parsing, ,Generalized @@ -10341,6 +10606,20 @@ A language construct that is (in general) grammatically divisible; for example, `expression' or `declaration' in C@. @xref{Language and Grammar, ,Languages and Context-Free Grammars}. +@item @acronym{IELR}(1) +A minimal @acronym{LR}(1) parser table generation algorithm. +That is, given any context-free grammar, @acronym{IELR}(1) generates +parser tables with the full language recognition power of canonical +@acronym{LR}(1) but with nearly the same number of parser states as +@acronym{LALR}(1). +This reduction in parser states is often an order of magnitude. +More importantly, because canonical @acronym{LR}(1)'s extra parser +states may contain duplicate conflicts in the case of +non-@acronym{LR}(1) grammars, the number of conflicts for +@acronym{IELR}(1) is often an order of magnitude less as well. +This can significantly reduce the complexity of developing of a grammar. +@xref{Decl Summary,,lr.type}. + @item Infix operator An arithmetic operator that is placed between the operands on which it performs some operation. @@ -10384,8 +10663,8 @@ Tokens}. @item @acronym{LALR}(1) The class of context-free grammars that Bison (like most other parser -generators) can handle; a subset of @acronym{LR}(1). @xref{Mystery -Conflicts, ,Mysterious Reduce/Reduce Conflicts}. +generators) can handle by default; a subset of @acronym{LR}(1). +@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}. @item @acronym{LR}(1) The class of context-free grammars in which at most one token of @@ -10480,7 +10759,7 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry @c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa @c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc -@c LocalWords: rpcalc Lexer Gen Comp Expr ltcalc mfcalc Decl Symtab yylex +@c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex @c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref @c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex @c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge @@ -10503,4 +10782,4 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: infile ypp yxx outfile itemx tex leaderfill @c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll @c LocalWords: nbar yytext fst snd osplit ntwo strdup AST -@c LocalWords: YYSTACK DVI fdl printindex +@c LocalWords: YYSTACK DVI fdl printindex IELR