X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/766de5eb7cda8eac2c23912f854bdc6c27f90e34..927b425baae16fc1d50e092b78b944e281b521f6:/doc/bison.texinfo?ds=sidebyside diff --git a/doc/bison.texinfo b/doc/bison.texinfo index 2cbfaad5..c004a842 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -44,12 +44,12 @@ This manual is for @acronym{GNU} Bison (version @value{VERSION}, @value{UPDATED}), the @acronym{GNU} parser generator. Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, -1999, 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc. +1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc. @quotation Permission is granted to copy, distribute and/or modify this document under the terms of the @acronym{GNU} Free Documentation License, -Version 1.1 or any later version published by the Free Software +Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover texts being ``A @acronym{GNU} Manual,'' and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled @@ -62,7 +62,7 @@ Copies published by the Free Software Foundation raise funds for @end quotation @end copying -@dircategory GNU programming tools +@dircategory Software development @direntry * bison: (bison). @acronym{GNU} parser generator (Yacc replacement). @end direntry @@ -82,8 +82,8 @@ Copies published by the Free Software Foundation raise funds for @insertcopying @sp 2 Published by the Free Software Foundation @* -59 Temple Place, Suite 330 @* -Boston, MA 02111-1307 USA @* +51 Franklin Street, Fifth Floor @* +Boston, MA 02110-1301 USA @* Printed copies are available from the Free Software Foundation.@* @acronym{ISBN} 1-882114-44-2 @sp 2 @@ -117,9 +117,10 @@ Reference sections: messy for Bison to handle straightforwardly. * Debugging:: Understanding or debugging Bison parsers. 
* Invocation:: How to run Bison (to produce the parser source file). +* C++ Language Interface:: Creating C++ parser objects. +* FAQ:: Frequently Asked Questions * Table of Symbols:: All the keywords of the Bison language are explained. * Glossary:: Basic concepts are explained. -* FAQ:: Frequently Asked Questions * Copying This Manual:: License for copying this manual. * Index:: Cross-references to the text. @@ -144,9 +145,10 @@ The Concepts of Bison Writing @acronym{GLR} Parsers -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. Examples @@ -224,6 +226,7 @@ Tracking Locations Bison Declarations +* Require Decl:: Requiring a Bison version. * Token Decl:: Declaring terminal symbols. * Precedence Decl:: Declaring terminals with precedence and associativity. * Union Decl:: Declaring the set of all semantic value types. @@ -242,6 +245,8 @@ Parser C-Language Interface which reads tokens. * Error Reporting:: You must supply a function @code{yyerror}. * Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. The Lexical Analyzer Function @code{yylex} @@ -264,7 +269,7 @@ The Bison Parser Algorithm * Reduce/Reduce:: When two rules are applicable in the same situation. * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. * Generalized LR Parsing:: Parsing arbitrary context-free grammars. -* Stack Overflow:: What happens when stack gets full. How to avoid it. 
+* Memory Management:: What happens when memory is exhausted. How to avoid it. Operator Precedence @@ -292,12 +297,32 @@ Invoking Bison * Option Cross Key:: Alphabetical list of long options. * Yacc Library:: Yacc-compatible @code{yylex} and @code{main}. +C++ Language Interface + +* C++ Parsers:: The interface to generate C++ parser classes +* A Complete C++ Example:: Demonstrating their use + +C++ Parsers + +* C++ Bison Interface:: Asking for C++ parser generation +* C++ Semantic Values:: %union vs. C++ +* C++ Location Values:: The position and location classes +* C++ Parser Interface:: Instantiating and running the parser +* C++ Scanner Interface:: Exchanges between yylex and parse + +A Complete C++ Example + +* Calc++ --- C++ Calculator:: The specifications +* Calc++ Parsing Driver:: An active parsing context +* Calc++ Parser:: A parser class +* Calc++ Scanner:: A pure C++ Flex scanner +* Calc++ Top Level:: Conducting the band + Frequently Asked Questions -* Parser Stack Overflow:: Breaking the Stack Limits +* Memory Exhausted:: Breaking the Stack Limits * How Can I Reset the Parser:: @code{yyparse} Keeps some State * Strings are Destroyed:: @code{yylval} Loses Track of Strings -* C++ Parsers:: Compiling Parsers with C++ Compilers * Implementing Gotos/Loops:: Control Flow in the Calculator Copying This Manual @@ -437,15 +462,15 @@ more information on this. @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars -@cindex non-deterministic parsing +@cindex nondeterministic parsing Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning roughly that the next grammar rule to apply at any point in the input is uniquely determined by the preceding input and a fixed, finite portion (called a @dfn{look-ahead}) of the remaining input. A context-free grammar can be @dfn{ambiguous}, meaning that there are multiple ways to -apply the grammar rules to get the some inputs. 
Even unambiguous -grammars can be @dfn{non-deterministic}, meaning that no fixed +apply the grammar rules to get the same inputs. Even unambiguous +grammars can be @dfn{nondeterministic}, meaning that no fixed look-ahead always suffices to determine the next grammar rule to apply. With the proper declarations, Bison is also able to parse these more general context-free grammars, using a technique known as @acronym{GLR} @@ -709,9 +734,10 @@ user-defined function on the resulting values to produce an arbitrary merged result. @menu -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars. +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities. +* GLR Semantic Actions:: Deferred semantic actions have special concerns. +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler. @end menu @node Simple GLR Parsers @@ -886,29 +912,27 @@ parser recognizes all valid declarations, according to the limited syntax above, transparently. In fact, the user does not even notice when the parser splits. -So here we have a case where we can use the benefits of @acronym{GLR}, almost -without disadvantages. Even in simple cases like this, however, there -are at least two potential problems to beware. -First, always analyze the conflicts reported by -Bison to make sure that @acronym{GLR} splitting is only done where it is -intended. A @acronym{GLR} parser splitting inadvertently may cause -problems less obvious than an @acronym{LALR} parser statically choosing the -wrong alternative in a conflict. -Second, consider interactions with the lexer (@pxref{Semantic Tokens}) -with great care. 
Since a split parser consumes tokens -without performing any actions during the split, the lexer cannot -obtain information via parser actions. Some cases of -lexer interactions can be eliminated by using @acronym{GLR} to -shift the complications from the lexer to the parser. You must check -the remaining cases for correctness. - -In our example, it would be safe for the lexer to return tokens -based on their current meanings in some symbol table, because no new -symbols are defined in the middle of a type declaration. Though it -is possible for a parser to define the enumeration -constants as they are parsed, before the type declaration is -completed, it actually makes no difference since they cannot be used -within the same enumerated type declaration. +So here we have a case where we can use the benefits of @acronym{GLR}, +almost without disadvantages. Even in simple cases like this, however, +there are at least two potential problems to beware. First, always +analyze the conflicts reported by Bison to make sure that @acronym{GLR} +splitting is only done where it is intended. A @acronym{GLR} parser +splitting inadvertently may cause problems less obvious than an +@acronym{LALR} parser statically choosing the wrong alternative in a +conflict. Second, consider interactions with the lexer (@pxref{Semantic +Tokens}) with great care. Since a split parser consumes tokens without +performing any actions during the split, the lexer cannot obtain +information via parser actions. Some cases of lexer interactions can be +eliminated by using @acronym{GLR} to shift the complications from the +lexer to the parser. You must check the remaining cases for +correctness. + +In our example, it would be safe for the lexer to return tokens based on +their current meanings in some symbol table, because no new symbols are +defined in the middle of a type declaration. 
Though it is possible for +a parser to define the enumeration constants as they are parsed, before +the type declaration is completed, it actually makes no difference since +they cannot be used within the same enumerated type declaration. @node Merging GLR Parses @subsection Using @acronym{GLR} to Resolve Ambiguities @@ -1072,6 +1096,52 @@ productions that participate in any particular merge have identical and the parser will report an error during any parse that results in the offending merge. +@node GLR Semantic Actions +@subsection GLR Semantic Actions + +@cindex deferred semantic actions +By definition, a deferred semantic action is not performed at the same time as +the associated reduction. +This raises caveats for several Bison features you might use in a semantic +action in a @acronym{GLR} parser. + +@vindex yychar +@cindex @acronym{GLR} parsers and @code{yychar} +@vindex yylval +@cindex @acronym{GLR} parsers and @code{yylval} +@vindex yylloc +@cindex @acronym{GLR} parsers and @code{yylloc} +In any semantic action, you can examine @code{yychar} to determine the type of +the look-ahead token present at the time of the associated reduction. +After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF}, +you can then examine @code{yylval} and @code{yylloc} to determine the +look-ahead token's semantic value and location, if any. +In a nondeferred semantic action, you can also modify any of these variables to +influence syntax analysis. +@xref{Look-Ahead, ,Look-Ahead Tokens}. + +@findex yyclearin +@cindex @acronym{GLR} parsers and @code{yyclearin} +In a deferred semantic action, it's too late to influence syntax analysis. +In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to +shallow copies of the values they had at the time of the associated reduction. +For this reason alone, modifying them is dangerous. +Moreover, the result of modifying them is undefined and subject to change with +future versions of Bison. 
+For example, if a semantic action might be deferred, you should never write it +to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free +memory referenced by @code{yylval}. + +@findex YYERROR +@cindex @acronym{GLR} parsers and @code{YYERROR} +Another Bison feature requiring special consideration is @code{YYERROR} +(@pxref{Action Features}), which you can invoke in any semantic action to +initiate error recovery. +During deterministic @acronym{GLR} operation, the effect of @code{YYERROR} is +the same as its effect in an @acronym{LALR}(1) parser. +In a deferred semantic action, its effect is undefined. +@c The effect is probably a syntax error at the split point. + @node Compiler Requirements @subsection Considerations when Compiling @acronym{GLR} Parsers @cindex @code{inline} @@ -1174,13 +1244,17 @@ function @code{yyerror} and the parser function @code{yyparse} itself. This also includes numerous identifiers used for internal purposes. Therefore, you should avoid using C identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar file except for the ones defined in -this manual. +this manual. Also, you should avoid using the C identifiers +@samp{malloc} and @samp{free} for anything other than their usual +meanings. In some cases the Bison parser file includes system headers, and in those cases your code should respect the identifiers reserved by those -headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>}, +headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>}, @code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to -declare memory allocators and related types. Other system headers may +declare memory allocators and related types. @code{<libintl.h>} is +included if message translation is in use +(@pxref{Internationalization}). Other system headers may be included if you define @code{YYDEBUG} to a nonzero value (@pxref{Tracing, ,Tracing Your Parser}).
@@ -1691,12 +1765,12 @@ With all the source in a single file, you use the following command to convert it into a parser file: @example -bison @var{file_name}.y +bison @var{file}.y @end example @noindent In this example the file was called @file{rpcalc.y} (for ``Reverse Polish -@sc{calc}ulator''). Bison produces a file named @file{@var{file_name}.tab.c}, +@sc{calc}ulator''). Bison produces a file named @file{@var{file}.tab.c}, removing the @samp{.y} from the original file name. The file output by Bison contains the source code for @code{yyparse}. The additional functions in the input file (@code{yylex}, @code{yyerror} and @code{main}) @@ -2098,7 +2172,7 @@ as @code{sin}, @code{cos}, etc. It is easy to add new operators to the infix calculator as long as they are only single-character literals. The lexical analyzer @code{yylex} passes -back all nonnumber characters as tokens, so new grammar rules suffice for +back all nonnumeric characters as tokens, so new grammar rules suffice for adding a new operator. But we want something more flexible: built-in functions whose syntax has this form: @@ -2272,7 +2346,7 @@ typedef struct symrec symrec; /* The symbol table: a chain of `struct symrec'. */ extern symrec *sym_table; -symrec *putsym (char const *, func_t); +symrec *putsym (char const *, int); symrec *getsym (char const *); @end group @end smallexample @@ -2383,7 +2457,7 @@ getsym (char const *sym_name) The function @code{yylex} must now recognize variables, numeric values, and the single-character arithmetic operators. Strings of alphanumeric -characters with a leading non-digit are recognized as either variables or +characters with a leading letter are recognized as either variables or functions depending on what the symbol table says about them. The string is passed to @code{getsym} for look up in the symbol table. If @@ -2557,13 +2631,17 @@ continues until end of line. 
@cindex Prologue @cindex declarations -The @var{Prologue} section contains macro definitions and -declarations of functions and variables that are used in the actions in the -grammar rules. These are copied to the beginning of the parser file so -that they precede the definition of @code{yyparse}. You can use -@samp{#include} to get the declarations from a header file. If you don't -need any C declarations, you may omit the @samp{%@{} and @samp{%@}} -delimiters that bracket this section. +The @var{Prologue} section contains macro definitions and declarations +of functions and variables that are used in the actions in the grammar +rules. These are copied to the beginning of the parser file so that +they precede the definition of @code{yyparse}. You can use +@samp{#include} to get the declarations from a header file. If you +don't need any C declarations, you may omit the @samp{%@{} and +@samp{%@}} delimiters that bracket this section. + +The @var{Prologue} section is terminated by the the first occurrence +of @samp{%@}} that is outside a comment, a string literal, or a +character constant. You may have more than one @var{Prologue} section, intermixed with the @var{Bison declarations}. This allows you to have C and Bison @@ -2627,16 +2705,16 @@ not come before the definition of @code{yyparse}. For example, the definitions of @code{yylex} and @code{yyerror} often go here. Because C requires functions to be declared before being used, you often need to declare functions like @code{yylex} and @code{yyerror} in the Prologue, -even if you define them int he Epilogue. +even if you define them in the Epilogue. @xref{Interface, ,Parser C-Language Interface}. If the last section is empty, you may omit the @samp{%%} that separates it from the grammar rules. 
-The Bison parser itself contains many macros and identifiers whose -names start with @samp{yy} or @samp{YY}, so it is a -good idea to avoid using any such names (except those documented in this -manual) in the epilogue of the grammar file. +The Bison parser itself contains many macros and identifiers whose names +start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using +any such names (except those documented in this manual) in the epilogue +of the grammar file. @node Symbols @section Symbols, Terminal and Nonterminal @@ -2652,13 +2730,13 @@ A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a class of syntactically equivalent tokens. You use the symbol in grammar rules to mean that a token in that class is allowed. The symbol is represented in the Bison parser by a numeric code, and the @code{yylex} -function returns a token type code to indicate what kind of token has been -read. You don't need to know what the code value is; you can use the -symbol to stand for it. +function returns a token type code to indicate what kind of token has +been read. You don't need to know what the code value is; you can use +the symbol to stand for it. -A @dfn{nonterminal symbol} stands for a class of syntactically equivalent -groupings. The symbol name is used in writing grammar rules. By convention, -it should be all lower case. +A @dfn{nonterminal symbol} stands for a class of syntactically +equivalent groupings. The symbol name is used in writing grammar rules. +By convention, it should be all lower case. Symbol names can contain letters, digits (not at the beginning), underscores and periods. Periods make sense only in nonterminals. @@ -2754,7 +2832,7 @@ into a separate header file @file{@var{name}.tab.h} which you can include in the other source files that need it. @xref{Invocation, ,Invoking Bison}. 
If you want to write a grammar that is portable to any Standard C -host, you must use only non-null character tokens taken from the basic +host, you must use only nonnull character tokens taken from the basic execution character set of Standard C@. This set consists of the ten digits, the 52 lower- and upper-case English letters, and the characters in the following C-language string: @@ -2763,17 +2841,17 @@ characters in the following C-language string: "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~" @end example -The @code{yylex} function and Bison must use a consistent character -set and encoding for character tokens. For example, if you run Bison in an -@acronym{ASCII} environment, but then compile and run the resulting program -in an environment that uses an incompatible character set like -@acronym{EBCDIC}, the resulting program may not work because the -tables generated by Bison will assume @acronym{ASCII} numeric values for -character tokens. It is standard -practice for software distributions to contain C source files that -were generated by Bison in an @acronym{ASCII} environment, so installers on -platforms that are incompatible with @acronym{ASCII} must rebuild those -files before compiling them. +The @code{yylex} function and Bison must use a consistent character set +and encoding for character tokens. For example, if you run Bison in an +@acronym{ASCII} environment, but then compile and run the resulting +program in an environment that uses an incompatible character set like +@acronym{EBCDIC}, the resulting program may not work because the tables +generated by Bison will assume @acronym{ASCII} numeric values for +character tokens. It is standard practice for software distributions to +contain C source files that were generated by Bison in an +@acronym{ASCII} environment, so installers on platforms that are +incompatible with @acronym{ASCII} must rebuild those files before +compiling them. 
The symbol @code{error} is a terminal symbol reserved for error recovery (@pxref{Error Recovery}); you shouldn't use it for any other purpose. @@ -2825,6 +2903,22 @@ the semantics of the rule. An action looks like this: @end example @noindent +@cindex braced code +This is an example of @dfn{braced code}, that is, C code surrounded by +braces, much like a compound statement in C@. Braced code can contain +any sequence of C tokens, so long as its braces are balanced. Bison +does not check the braced code for correctness directly; it merely +copies the code to the output file, where the C compiler can check it. + +Within braced code, the balanced-brace count is not affected by braces +within comments, string literals, or character constants, but it is +affected by the C digraphs @samp{<%} and @samp{%>} that represent +braces. At the top level braced code must be terminated by @samp{@}} +and not by a digraph. Bison does not look for trigraphs, so if braced +code uses trigraphs you should ensure that they do not affect the +nesting of braces or the boundaries of comments, string literals, or +character constants. + Usually there is only one action and it follows the components. @xref{Actions}. @@ -2880,10 +2974,10 @@ with no components. @section Recursive Rules @cindex recursive rule -A rule is called @dfn{recursive} when its @var{result} nonterminal appears -also on its right hand side. Nearly all Bison grammars need to use -recursion, because that is the only way to define a sequence of any number -of a particular thing. Consider this recursive definition of a +A rule is called @dfn{recursive} when its @var{result} nonterminal +appears also on its right hand side. Nearly all Bison grammars need to +use recursion, because that is the only way to define a sequence of any +number of a particular thing. 
Consider this recursive definition of a comma-separated sequence of one or more expressions: @example @@ -2997,8 +3091,9 @@ This macro definition must go in the prologue of the grammar file In most programs, you will need different data types for different kinds of tokens and groupings. For example, a numeric constant may need type -@code{int} or @code{long int}, while a string constant needs type @code{char *}, -and an identifier might need a pointer to an entry in the symbol table. +@code{int} or @code{long int}, while a string constant needs type +@code{char *}, and an identifier might need a pointer to an entry in the +symbol table. To use more than one data type for semantic values in one parser, Bison requires you to do two things: @@ -3028,14 +3123,8 @@ each time an instance of that rule is recognized. The task of most actions is to compute a semantic value for the grouping built by the rule from the semantic values associated with tokens or smaller groupings. -An action consists of C statements surrounded by braces, much like a -compound statement in C@. An action can contain any sequence of C -statements. Bison does not look for trigraphs, though, so if your C -code uses trigraphs you should ensure that they do not affect the -nesting of braces or the boundaries of comments, strings, or character -literals. - -An action can be placed at any position in the rule; +An action consists of braced code containing C statements, and can be +placed at any position in the rule; it is executed at that position. Most rules have just one action at the end of the rule, following all the components. Actions in the middle of a rule are tricky and used only for special purposes (@pxref{Mid-Rule @@ -3113,6 +3202,12 @@ As long as @code{bar} is used only in the fashion shown here, @code{$0} always refers to the @code{expr} which precedes @code{bar} in the definition of @code{foo}. 
+@vindex yylval +It is also possible to access the semantic value of the look-ahead token, if +any, from a semantic action. +This semantic value is stored in @code{yylval}. +@xref{Action Features, ,Special Features for Use in Actions}. + @node Action Types @subsection Data Types of Values in Actions @cindex action data types @@ -3430,6 +3525,12 @@ exp: @dots{} @end group @end example +@vindex yylloc +It is also possible to access the location of the look-ahead token, if any, +from a semantic action. +This location is stored in @code{yylloc}. +@xref{Action Features, ,Special Features for Use in Actions}. + @node Location Default Action @subsection Default Action for Locations @vindex YYLLOC_DEFAULT @@ -3479,7 +3580,7 @@ By default, @code{YYLLOC_DEFAULT} is defined this way: where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol in @var{rhs} when @var{k} is positive, and the location of the symbol -just before the reduction when @var{k} is zero. +just before the reduction when @var{k} and @var{n} are both zero. When defining @code{YYLLOC_DEFAULT}, you should consider that: @@ -3521,6 +3622,7 @@ it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}). @menu +* Require Decl:: Requiring a Bison version. * Token Decl:: Declaring terminal symbols. * Precedence Decl:: Declaring terminals with precedence and associativity. * Union Decl:: Declaring the set of all semantic value types. @@ -3533,6 +3635,20 @@ Grammars}). * Decl Summary:: Table of all Bison declarations. @end menu +@node Require Decl +@subsection Require a Version of Bison +@cindex version requirement +@cindex requiring a version of Bison +@findex %require + +You may require the minimum version of Bison to process the grammar. If +the requirement is not met, @command{bison} exits with an error (exit +status 63). 
+ +@example +%require "@var{version}" +@end example + @node Token Decl @subsection Token Type Names @cindex declaring token type names @@ -3666,10 +3782,10 @@ the one declared later has the higher precedence and is grouped first. @cindex value types, declaring @findex %union -The @code{%union} declaration specifies the entire collection of possible -data types for semantic values. The keyword @code{%union} is followed by a -pair of braces containing the same thing that goes inside a @code{union} in -C. +The @code{%union} declaration specifies the entire collection of +possible data types for semantic values. The keyword @code{%union} is +followed by braced code containing the same thing that goes inside a +@code{union} in C@. For example: @@ -3700,10 +3816,15 @@ As an extension to @acronym{POSIX}, a tag is allowed after the @end group @end example +@noindent specifies the union tag @code{value}, so the corresponding C type is @code{union value}. If you do not specify a tag, it defaults to @code{YYSTYPE}. +As another extension to @acronym{POSIX}, you may specify multiple +@code{%union} declarations; their contents are concatenated. However, +only the first @code{%union} declaration can specify a tag. + Note that, unlike making a @code{union} declaration in C, you need not write a semicolon after the closing brace. @@ -3745,7 +3866,7 @@ code. @deffn {Directive} %initial-action @{ @var{code} @} @findex %initial-action -Declare that the @var{code} must be invoked before parsing each time +Declare that the braced @var{code} must be invoked before parsing each time @code{yyparse} is called. The @var{code} may use @code{$$} and @code{@@$} --- initial value and location of the look-ahead --- and the @code{%parse-param}. 
@@ -3754,10 +3875,10 @@ Declare that the @var{code} must be invoked before parsing each time For instance, if your locations use a file name, you may use @example -%parse-param @{ const char *filename @}; +%parse-param @{ char const *file_name @}; %initial-action @{ - @@$.begin.filename = @@$.end.filename = filename; + @@$.initialize (file_name); @}; @end example @@ -3767,29 +3888,29 @@ For instance, if your locations use a file name, you may use @cindex freeing discarded symbols @findex %destructor -Some symbols can be discarded by the parser. For instance, during error -recovery (@pxref{Error Recovery}), embarrassing symbols already pushed -on the stack, and embarrassing tokens coming from the rest of the file -are thrown away until the parser falls on its feet. If these symbols -convey heap based information, this memory is lost. While this behavior -can be tolerable for batch parsers, such as in compilers, it is not for -possibly ``never ending'' parsers such as shells, or implementations of -communication protocols. +During error recovery (@pxref{Error Recovery}), symbols already pushed +on the stack and tokens coming from the rest of the file are discarded +until the parser falls on its feet. If the parser runs out of memory, +or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the +symbols on the stack must be discarded. Even if the parser succeeds, it +must discard the start symbol. -The @code{%destructor} directive allows for the definition of code that -is called when a symbol is thrown away. +When discarded symbols convey heap based information, this memory is +lost. While this behavior can be tolerable for batch parsers, such as +in traditional compilers, it is unacceptable for programs like shells or +protocol implementations that may parse and execute indefinitely. + +The @code{%destructor} directive defines code that is called when a +symbol is automatically discarded. 
@deffn {Directive} %destructor @{ @var{code} @} @var{symbols} @findex %destructor -Declare that the @var{code} must be invoked for each of the -@var{symbols} that will be discarded by the parser. The @var{code} -should use @code{$$} to designate the semantic value associated to the -@var{symbols}. The additional parser parameters are also available -(@pxref{Parser Function, , The Parser Function @code{yyparse}}). - -@strong{Warning:} as of Bison 1.875, this feature is still considered as -experimental, as there was not enough user feedback. In particular, -the syntax might still change. +Invoke the braced @var{code} whenever the parser discards one of the +@var{symbols}. +Within @var{code}, @code{$$} designates the semantic value associated +with the discarded symbol. The additional parser parameters are also +available (@pxref{Parser Function, , The Parser Function +@code{yyparse}}). @end deffn For instance: @@ -3805,27 +3926,9 @@ For instance: @end smallexample @noindent -guarantees that when a @code{STRING} or a @code{string} will be discarded, +guarantees that when a @code{STRING} or a @code{string} is discarded, its associated memory will be freed. -Note that in the future, Bison might also consider that right hand side -members that are not mentioned in the action can be destroyed. For -instance, in: - -@smallexample -comment: "/*" STRING "*/"; -@end smallexample - -@noindent -the parser is entitled to destroy the semantic value of the -@code{string}. Of course, this will not apply to the default action; -compare: - -@smallexample -typeless: string; // $$ = $1 does not apply; $1 is destroyed. -typefull: string; // $$ = $1 applies, $1 is not destroyed. 
-@end smallexample - @sp 1 @cindex discarded symbols @@ -3837,10 +3940,20 @@ stacked symbols popped during the first phase of error recovery, @item incoming terminals during the second phase of error recovery, @item -the current look-ahead when the parser aborts (either via an explicit -call to @code{YYABORT}, or as a consequence of a failed error recovery). +the current look-ahead and the entire stack (except the current +right-hand side symbols) when the parser returns immediately, and +@item +the start symbol, when the parser succeeds. @end itemize +The parser can @dfn{return immediately} because of an explicit call to +@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory +exhaustion. + +Right-hand size symbols of a rule that explicitly triggers a syntax +error via @code{YYERROR} are not discarded automatically. As a rule +of thumb, destructors are invoked only when user actions cannot manage +the memory. @node Expect Decl @subsection Suppressing Conflict Warnings @@ -3864,19 +3977,18 @@ The declaration looks like this: %expect @var{n} @end example -Here @var{n} is a decimal integer. The declaration says there should be -no warning if there are @var{n} shift/reduce conflicts and no -reduce/reduce conflicts. The usual warning is -given if there are either more or fewer conflicts, or if there are any -reduce/reduce conflicts. +Here @var{n} is a decimal integer. The declaration says there should +be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. +Bison reports an error if the number of shift/reduce conflicts differs +from @var{n}, or if there are any reduce/reduce conflicts. -For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more serious, -and should be eliminated entirely. Bison will always report -reduce/reduce conflicts for these parsers. With @acronym{GLR} parsers, however, -both shift/reduce and reduce/reduce are routine (otherwise, there -would be no need to use @acronym{GLR} parsing). 
Therefore, it is also possible -to specify an expected number of reduce/reduce conflicts in @acronym{GLR} -parsers, using the declaration: +For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more +serious, and should be eliminated entirely. Bison will always report +reduce/reduce conflicts for these parsers. With @acronym{GLR} +parsers, however, both kinds of conflicts are routine; otherwise, +there would be no need to use @acronym{GLR} parsing. Therefore, it is +also possible to specify an expected number of reduce/reduce conflicts +in @acronym{GLR} parsers, using the declaration: @example %expect-rr @var{n} @@ -3897,12 +4009,12 @@ go back to the beginning. @item Add an @code{%expect} declaration, copying the number @var{n} from the -number which Bison printed. +number which Bison printed. With @acronym{GLR} parsers, add an +@code{%expect-rr} declaration as well. @end itemize -Now Bison will stop annoying you if you do not change the number of -conflicts, but it will warn you again if changes in the grammar result -in more or fewer conflicts. +Now Bison will warn you if you introduce an unexpected conflict, but +will keep silent otherwise. @node Start Decl @subsection The Start-Symbol @@ -3928,8 +4040,8 @@ may override this restriction with the @code{%start} declaration as follows: A @dfn{reentrant} program is one which does not alter in the course of execution; in other words, it consists entirely of @dfn{pure} (read-only) code. Reentrancy is important whenever asynchronous execution is possible; -for example, a non-reentrant program may not be safe to call from a signal -handler. In systems with multiple threads of control, a non-reentrant +for example, a nonreentrant program may not be safe to call from a signal +handler. In systems with multiple threads of control, a nonreentrant program must be called only within interlocks. Normally, Bison generates a parser which is not reentrant. This is @@ -4035,13 +4147,12 @@ is named @file{@var{name}.h}. 
Unless @code{YYSTYPE} is already defined as a macro, the output header declares @code{YYSTYPE}. Therefore, if you are using a @code{%union} -(@pxref{Multiple Types, ,More Than One Value Type}) with components -that require other definitions, or if you have defined a -@code{YYSTYPE} macro (@pxref{Value Type, ,Data Types of Semantic -Values}), you need to arrange for these definitions to be propagated to -all modules, e.g., by putting them in a -prerequisite header that is included both by your parser and by any -other module that needs @code{YYSTYPE}. +(@pxref{Multiple Types, ,More Than One Value Type}) with components that +require other definitions, or if you have defined a @code{YYSTYPE} macro +(@pxref{Value Type, ,Data Types of Semantic Values}), you need to +arrange for these definitions to be propagated to all modules, e.g., by +putting them in a prerequisite header that is included both by your +parser and by any other module that needs @code{YYSTYPE}. Unless your parser is pure, the output header declares @code{yylval} as an external variable. @xref{Pure Decl, ,A Pure (Reentrant) @@ -4052,15 +4163,15 @@ If you have also used locations, the output header declares @code{YYSTYPE} and @code{yylval}. @xref{Locations, ,Tracking Locations}. -This output file is normally essential if you wish to put the -definition of @code{yylex} in a separate source file, because -@code{yylex} typically needs to be able to refer to the -above-mentioned declarations and to the token type codes. -@xref{Token Values, ,Semantic Values of Tokens}. +This output file is normally essential if you wish to put the definition +of @code{yylex} in a separate source file, because @code{yylex} +typically needs to be able to refer to the above-mentioned declarations +and to the token type codes. @xref{Token Values, ,Semantic Values of +Tokens}. 
@end deffn @deffn {Directive} %destructor -Specifying how the parser should reclaim the memory associated to +Specify how the parser should reclaim the memory associated to discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}. @end deffn @@ -4102,7 +4213,7 @@ parser file contains just @code{#define} directives and static variable declarations. This option also tells Bison to write the C code for the grammar actions -into a file named @file{@var{filename}.act}, in the form of a +into a file named @file{@var{file}.act}, in the form of a brace-surrounded body fit for a @code{switch} statement. @end deffn @@ -4115,8 +4226,8 @@ associate errors with the parser file, treating it an independent source file in its own right. @end deffn -@deffn {Directive} %output="@var{filename}" -Specify the @var{filename} for the parser file. +@deffn {Directive} %output="@var{file}" +Specify @var{file} for the parser file. @end deffn @deffn {Directive} %pure-parser @@ -4124,6 +4235,11 @@ Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). @end deffn +@deffn {Directive} %require "@var{version}" +Require version @var{version} or higher of Bison. @xref{Require Decl, , +Require a Version of Bison}. +@end deffn + @deffn {Directive} %token-table Generate an array of token names in the parser file. The name of the array is @code{yytname}; @code{yytname[@var{i}]} is the name of the @@ -4133,15 +4249,14 @@ three elements of @code{yytname} correspond to the predefined tokens @code{"error"}, and @code{"$undefined"}; after these come the symbols defined in the grammar file. -For single-character literal tokens and literal string tokens, the name -in the table includes the single-quote or double-quote characters: for -example, @code{"'+'"} is a single-character literal and @code{"\"<=\""} -is a literal string token. 
All the characters of the literal string -token appear verbatim in the string found in the table; even -double-quote characters are not escaped. For example, if the token -consists of three characters @samp{*"*}, its string in @code{yytname} -contains @samp{"*"*"}. (In C, that would be written as -@code{"\"*\"*\""}). +The name in the table includes all the characters needed to represent +the token in Bison. For single-character literals and literal +strings, this includes the surrounding quoting characters and any +escape sequences. For example, the Bison single-character literal +@code{'+'} corresponds to a three-character name, represented in C as +@code{"'+'"}; and the Bison two-character literal string @code{"\\/"} +corresponds to a five-character name, represented in C as +@code{"\"\\\\/\""}. When you specify @code{%token-table}, Bison also generates macro definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, and @@ -4222,6 +4337,8 @@ in the grammar file, you are likely to run into trouble. which reads tokens. * Error Reporting:: You must supply a function @code{yyerror}. * Action Features:: Special features for use in actions. +* Internationalization:: How to let the parser speak in the user's + native language. @end menu @node Parser Function @@ -4239,7 +4356,11 @@ without reading further. The value returned by @code{yyparse} is 0 if parsing was successful (return is due to end-of-input). -The value is 1 if parsing failed (return is due to a syntax error). +The value is 1 if parsing failed because of invalid input, i.e., input +that contains a syntax error or that causes @code{YYABORT} to be +invoked. + +The value is 2 if parsing failed due to memory exhaustion. 
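As a rough illustration of the three return values listed above, a caller might translate the status into a message before deciding what to do next. This is only a sketch: the helper name @code{describe_yyparse_status} is hypothetical and is not provided by Bison, and the wording of the messages is an assumption.

```cpp
#include <string>

// Hypothetical helper (not part of Bison): turn the value returned by
// yyparse into a short description, following the three cases above.
std::string describe_yyparse_status(int status)
{
  switch (status)
    {
    case 0:
      return "success";           // end of input reached
    case 1:
      return "invalid input";     // syntax error or YYABORT
    case 2:
      return "memory exhausted";  // parser ran out of memory
    default:
      return "unexpected status"; // not a documented return value
    }
}
```

A caller would typically write something like @code{describe_yyparse_status (yyparse ())} and report the result on its own error channel.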
@end deftypefun In an action, you can cause immediate return from @code{yyparse} by using @@ -4261,8 +4382,8 @@ declaration @code{%parse-param}: @deffn {Directive} %parse-param @{@var{argument-declaration}@} @findex %parse-param -Declare that an argument declared by @code{argument-declaration} is an -additional @code{yyparse} argument. +Declare that an argument declared by the braced-code +@var{argument-declaration} is an additional @code{yyparse} argument. The @var{argument-declaration} is used when declaring functions or prototypes. The last identifier in @var{argument-declaration} must be the argument name. @@ -4380,11 +4501,13 @@ the grammar file has no effect on @code{yylex}. table. The index of the token in the table is the token type's code. The name of a multicharacter token is recorded in @code{yytname} with a double-quote, the token's characters, and another double-quote. The -token's characters are not escaped in any way; they appear verbatim in -the contents of the string in the table. +token's characters are escaped as necessary to be suitable as input +to Bison. -Here's code for looking up a token in @code{yytname}, assuming that the -characters of the token are stored in @code{token_buffer}. +Here's code for looking up a multicharacter token in @code{yytname}, +assuming that the characters of the token are stored in +@code{token_buffer}, and assuming that the token does not contain any +characters like @samp{"} that require escaping. @smallexample for (i = 0; i < YYNTOKENS; i++) @@ -4407,7 +4530,7 @@ The @code{yytname} table is generated only if you use the @subsection Semantic Values of Tokens @vindex yylval -In an ordinary (non-reentrant) parser, the semantic value of the token must +In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable @code{yylval}. When you are using just one data type for semantic values, @code{yylval} has that type. 
Thus, if the type is @code{int} (the default), you might write this in @@ -4455,12 +4578,11 @@ then the code in @code{yylex} might look like this: @vindex yylloc If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , -Tracking Locations}) in actions to keep track of the -textual locations of tokens and groupings, then you must provide this -information in @code{yylex}. The function @code{yyparse} expects to -find the textual location of a token just parsed in the global variable -@code{yylloc}. So @code{yylex} must store the proper data in that -variable. +Tracking Locations}) in actions to keep track of the textual locations +of tokens and groupings, then you must provide this information in +@code{yylex}. The function @code{yyparse} expects to find the textual +location of a token just parsed in the global variable @code{yylloc}. +So @code{yylex} must store the proper data in that variable. By default, the value of @code{yylloc} is a structure and you need only initialize the members that are going to be used by the actions. The @@ -4505,8 +4627,8 @@ Function}). @deffn {Directive} lex-param @{@var{argument-declaration}@} @findex %lex-param -Declare that @code{argument-declaration} is an additional @code{yylex} -argument declaration. +Declare that the braced-code @var{argument-declaration} is an +additional @code{yylex} argument declaration. @end deffn For instance: @@ -4565,13 +4687,16 @@ declarations section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then Bison provides a more verbose and specific error message string instead of just plain @w{@code{"syntax error"}}. -The parser can detect one other kind of error: stack overflow. This -happens when the input contains constructions that are very deeply +The parser can detect one other kind of error: memory exhaustion. This +can happen when the input contains constructions that are very deeply nested. 
It isn't likely you will encounter this, since the Bison -parser extends its stack automatically up to a very large limit. But -if overflow happens, @code{yyparse} calls @code{yyerror} in the usual -fashion, except that the argument string is @w{@code{"parser stack -overflow"}}. +parser normally extends its stack automatically up to a very large limit. But +if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual +fashion, except that the argument string is @w{@code{"memory exhausted"}}. + +In some cases diagnostics like @w{@code{"syntax error"}} are +translated automatically from English to some other language before +they are passed to @code{yyerror}. @xref{Internationalization}. The following definition suffices in simple programs: @@ -4652,7 +4777,7 @@ preferable since it more accurately describes the return type for @vindex yynerrs The variable @code{yynerrs} contains the number of syntax errors -encountered so far. Normally this variable is global; but if you +reported so far. Normally this variable is global; but if you request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable which only the actions can access. @@ -4718,6 +4843,12 @@ In either case, the rest of the action is not executed. Value stored in @code{yychar} when there is no look-ahead token. @end deffn +@deffn {Macro} YYEOF +@vindex YYEOF +Value stored in @code{yychar} when the look-ahead is the end of the input +stream. +@end deffn + @deffn {Macro} YYERROR; @findex YYERROR Cause an immediate syntax error. This statement initiates error @@ -4734,15 +4865,20 @@ is recovering from a syntax error, and 0 the rest of the time. @end deffn @deffn {Variable} yychar -Variable containing the current look-ahead token. (In a pure parser, -this is actually a local variable within @code{yyparse}.) When there is -no look-ahead token, the value @code{YYEMPTY} is stored in the variable. 
+Variable containing either the look-ahead token, or @code{YYEOF} when the +look-ahead is the end of the input stream, or @code{YYEMPTY} when no look-ahead +has been performed so the next token is not yet known. +Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic +Actions}). @xref{Look-Ahead, ,Look-Ahead Tokens}. @end deffn @deffn {Macro} yyclearin; Discard the current look-ahead token. This is useful primarily in -error rules. @xref{Error Recovery}. +error rules. +Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR +Semantic Actions}). +@xref{Error Recovery}. @end deffn @deffn {Macro} yyerrok; @@ -4751,6 +4887,22 @@ errors. This is useful primarily in error rules. @xref{Error Recovery}. @end deffn +@deffn {Variable} yylloc +Variable containing the look-ahead token location when @code{yychar} is not set +to @code{YYEMPTY} or @code{YYEOF}. +Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic +Actions}). +@xref{Actions and Locations, ,Actions and Locations}. +@end deffn + +@deffn {Variable} yylval +Variable containing the look-ahead token semantic value when @code{yychar} is +not set to @code{YYEMPTY} or @code{YYEOF}. +Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic +Actions}). +@xref{Actions, ,Actions}. +@end deffn + @deffn {Value} @@$ @findex @@$ Acts like a structure variable containing information on the textual location @@ -4784,6 +4936,89 @@ of the @var{n}th component of the current rule. @xref{Locations, , Tracking Locations}. @end deffn +@node Internationalization +@section Parser Internationalization +@cindex internationalization +@cindex i18n +@cindex NLS +@cindex gettext +@cindex bison-po + +A Bison-generated parser can print diagnostics, including error and +tracing messages. By default, they appear in English. However, Bison +also supports outputting diagnostics in the user's native language. 
To +make this work, the user should set the usual environment variables. +@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}. +For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might +set the user's locale to French Canadian using the @acronym{UTF}-8 +encoding. The exact set of available locales depends on the user's +installation. + +The maintainer of a package that uses a Bison-generated parser enables +the internationalization of the parser's output through the following +steps. Here we assume a package that uses @acronym{GNU} Autoconf and +@acronym{GNU} Automake. + +@enumerate +@item +@cindex bison-i18n.m4 +Into the directory containing the @acronym{GNU} Autoconf macros used +by the package---often called @file{m4}---copy the +@file{bison-i18n.m4} file installed by Bison under +@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory. +For example: + +@example +cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4 +@end example + +@item +@findex BISON_I18N +@vindex BISON_LOCALEDIR +@vindex YYENABLE_NLS +In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT} +invocation, add an invocation of @code{BISON_I18N}. This macro is +defined in the file @file{bison-i18n.m4} that you copied earlier. It +causes @samp{configure} to find the value of the +@code{BISON_LOCALEDIR} variable, and it defines the source-language +symbol @code{YYENABLE_NLS} to enable translations in the +Bison-generated parser. + +@item +In the @code{main} function of your program, designate the directory +containing Bison's runtime message catalog, through a call to +@samp{bindtextdomain} with domain name @samp{bison-runtime}. +For example: + +@example +bindtextdomain ("bison-runtime", BISON_LOCALEDIR); +@end example + +Typically this appears after any other call @code{bindtextdomain +(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on +@samp{BISON_LOCALEDIR} to be defined as a string through the +@file{Makefile}. 
+ +@item +In the @file{Makefile.am} that controls the compilation of the @code{main} +function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro, +either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example: + +@example +DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' +@end example + +or: + +@example +AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"' +@end example + +@item +Finally, invoke the command @command{autoreconf} to generate the build +infrastructure. +@end enumerate + @node Algorithm @chapter The Bison Parser Algorithm @@ -4850,7 +5085,7 @@ This kind of parser is known in the literature as a bottom-up parser. * Reduce/Reduce:: When two rules are applicable in the same situation. * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. * Generalized LR Parsing:: Parsing arbitrary context-free grammars. -* Stack Overflow:: What happens when stack gets full. How to avoid it. +* Memory Management:: What happens when memory is exhausted. How to avoid it. @end menu @node Look-Ahead @@ -4905,7 +5140,11 @@ doing so would produce on the stack the sequence of symbols @code{expr '!'}. No rule allows that sequence. @vindex yychar -The current look-ahead token is stored in the variable @code{yychar}. +@vindex yylval +@vindex yylloc +The look-ahead token is stored in the variable @code{yychar}. +Its semantic value and location, if any, are stored in the variables +@code{yylval} and @code{yylloc}. @xref{Action Features, ,Special Features for Use in Actions}. @node Shift/Reduce @@ -5468,12 +5707,19 @@ return_spec: ; @end example +For a more detailed exposition of @acronym{LALR}(1) parsers and parser +generators, please see: +Frank DeRemer and Thomas Pennello, Efficient Computation of +@acronym{LALR}(1) Look-Ahead Sets, @cite{@acronym{ACM} Transactions on +Programming Languages and Systems}, Vol.@: 4, No.@: 4 (October 1982), +pp.@: 615--649 @uref{http://doi.acm.org/10.1145/69622.357187}. 
+ @node Generalized LR Parsing @section Generalized @acronym{LR} (@acronym{GLR}) Parsing @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars -@cindex non-deterministic parsing +@cindex nondeterministic parsing Bison produces @emph{deterministic} parsers that choose uniquely when to reduce and which reduction to apply @@ -5538,10 +5784,10 @@ quadratic worst-case time, and any general (possibly ambiguous) context-free grammar in cubic worst-case time. However, Bison currently uses a simpler data structure that requires time proportional to the length of the input times the maximum number of stacks required for any -prefix of the input. Thus, really ambiguous or non-deterministic +prefix of the input. Thus, really ambiguous or nondeterministic grammars can require exponential time and space to process. Such badly behaving examples, however, are not generally of practical interest. -Usually, non-determinism in a grammar is local---the parser is ``in +Usually, nondeterminism in a grammar is local---the parser is ``in doubt'' only for a few tokens at a time. Therefore, the current data structure should generally be adequate. On @acronym{LALR}(1) portions of a grammar, in particular, it is only slightly slower than with the default @@ -5554,16 +5800,17 @@ London, Department of Computer Science, TR-00-12, @uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}, (2000-12-24). -@node Stack Overflow -@section Stack Overflow, and How to Avoid It +@node Memory Management +@section Memory Management, and How to Avoid Memory Exhaustion +@cindex memory exhaustion +@cindex memory management @cindex stack overflow @cindex parser stack overflow @cindex overflow of parser stack -The Bison parser stack can overflow if too many tokens are shifted and +The Bison parser stack can run out of memory if too many tokens are shifted and not reduced. 
When this happens, the parser function @code{yyparse} -returns a nonzero value, pausing only to call @code{yyerror} to report -the overflow. +calls @code{yyerror} and then returns 2. Because Bison parsers have growing stacks, hitting the upper limit usually results from using a right recursion instead of a left @@ -5571,33 +5818,41 @@ recursion, @xref{Recursion, ,Recursive Rules}. @vindex YYMAXDEPTH By defining the macro @code{YYMAXDEPTH}, you can control how deep the -parser stack can become before a stack overflow occurs. Define the +parser stack can become before memory is exhausted. Define the macro with a value that is an integer. This value is the maximum number of tokens that can be shifted (and not reduced) before overflow. -It must be a constant expression whose value is known at compile time. The stack space allowed is not necessarily allocated. If you specify a -large value for @code{YYMAXDEPTH}, the parser actually allocates a small +large value for @code{YYMAXDEPTH}, the parser normally allocates a small stack at first, and then makes it bigger by stages as needed. This increasing allocation happens automatically and silently. Therefore, you do not need to make @code{YYMAXDEPTH} painfully small merely to save space for ordinary inputs that do not need much stack. +However, do not allow @code{YYMAXDEPTH} to be a value so large that +arithmetic overflow could occur when calculating the size of the stack +space. Also, do not allow @code{YYMAXDEPTH} to be less than +@code{YYINITDEPTH}. + @cindex default stack limit The default value of @code{YYMAXDEPTH}, if you do not define it, is 10000. @vindex YYINITDEPTH You can control how much stack is allocated initially by defining the -macro @code{YYINITDEPTH}. This value too must be a compile-time -constant integer. The default is 200. +macro @code{YYINITDEPTH} to a positive integer. 
For the C +@acronym{LALR}(1) parser, this value must be a compile-time constant +unless you are assuming C99 or some other target language or compiler +that allows variable-length arrays. The default is 200. + +Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}. @c FIXME: C++ output. Because of semantical differences between C and C++, the -@acronym{LALR}(1) parsers in C produced by Bison by compiled as C++ -cannot grow. In this precise case (compiling a C parser as C++) you are -suggested to grow @code{YYINITDEPTH}. In the near future, a C++ output -output will be provided which addresses this issue. +@acronym{LALR}(1) parsers in C produced by Bison cannot grow when compiled +by C++ compilers. In this precise case (compiling a C parser as C++) you are +advised to increase @code{YYINITDEPTH}. The Bison maintainers hope to fix +this deficiency in a future release. @node Error Recovery @chapter Error Recovery @@ -5706,6 +5961,7 @@ The previous look-ahead token is reanalyzed immediately after an error. If this is unacceptable, then the macro @code{yyclearin} may be used to clear this token. Write the statement @samp{yyclearin;} in the error rule's action. +@xref{Action Features, ,Special Features for Use in Actions}. 
For example, suppose that on a syntax error, an error handling routine is called that advances the input stream to some point where parsing should @@ -5773,9 +6029,13 @@ redeclare a typedef name provided an explicit type has been specified earlier: @example -typedef int foo, bar, lose; -static foo (bar); /* @r{redeclare @code{bar} as static variable} */ -static int foo (lose); /* @r{redeclare @code{foo} as function} */ +typedef int foo, bar; +int baz (void) +@{ + static bar (bar); /* @r{redeclare @code{bar} as static variable} */ + extern foo foo (foo); /* @r{redeclare @code{foo} as function} */ + return foo (bar); +@} @end example Unfortunately, the name being declared is separated from the declaration @@ -6464,14 +6724,15 @@ bison @var{infile} Here @var{infile} is the grammar file name, which usually ends in @samp{.y}. The parser file's name is made by replacing the @samp{.y} -with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields -@file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields -@file{hack/foo.tab.c}. It's also possible, in case you are writing +with @samp{.tab.c} and removing any leading directory. Thus, the +@samp{bison foo.y} file name yields +@file{foo.tab.c}, and the @samp{bison hack/foo.y} file name yields +@file{foo.tab.c}. It's also possible, in case you are writing C++ code instead of C in your grammar file, to name it @file{foo.ypp} or @file{foo.y++}. Then, the output files will take an extension like the given one as input (respectively @file{foo.tab.cpp} and @file{foo.tab.c++}). -This feature takes effect with all options that manipulate filenames like +This feature takes effect with all options that manipulate file names like @samp{-o} or @samp{-d}. For example : @@ -6525,13 +6786,17 @@ Print a summary of the command-line options to Bison and exit. @itemx --version Print the version number of Bison and exit. -@need 1750 +@item --print-localedir +Print the name of the directory containing locale-dependent data. 
+ @item -y @itemx --yacc -Equivalent to @samp{-o y.tab.c}; the parser output file is called +Act more like the traditional Yacc command. This can cause +different diagnostics to be generated, and may change behavior in +other minor ways. Most importantly, imitate Yacc's output +file name conventions, so that the parser output file is called @file{y.tab.c}, and the other outputs are called @file{y.output} and -@file{y.tab.h}. The purpose of this option is to imitate Yacc's output -file name conventions. Thus, the following shell script can substitute +@file{y.tab.h}. Thus, the following shell script can substitute for Yacc, and the Bison distribution contains such a script for compatibility with @acronym{POSIX}: @@ -6539,6 +6804,12 @@ compatibility with @acronym{POSIX}: #! /bin/sh bison -y "$@@" @end example + +The @option{-y}/@option{--yacc} option is intended for use with +traditional Yacc grammars. If your grammar uses a Bison extension +like @samp{%glr-parser}, Bison might not be Yacc-compatible even if +this option is specified. + @end table @noindent @@ -6618,19 +6889,17 @@ Implies @code{state} and augments the description of the automaton with the full set of items for each state, instead of its core only. @end table -For instance, on the following grammar - @item -v @itemx --verbose Pretend that @code{%verbose} was specified, i.e, write an extra output file containing verbose descriptions of the grammar and parser. @xref{Decl Summary}. -@item -o @var{filename} -@itemx --output=@var{filename} -Specify the @var{filename} for the parser file. +@item -o @var{file} +@itemx --output=@var{file} +Specify the @var{file} for the parser file. -The other output files' names are constructed from @var{filename} as +The other output files' names are constructed from @var{file} as described under the @samp{-v} and @samp{-d} options. @item -g @@ -6642,7 +6911,7 @@ be @file{foo.vcg}. @item --graph=@var{graph-file} The behavior of @var{--graph} is the same than @samp{-g}. 
The only difference is that it has an optional argument which is the name of -the output graph filename. +the output graph file. @end table @node Option Cross Key @@ -6664,6 +6933,7 @@ the corresponding short option. \line{ --no-lines \leaderfill -l} \line{ --no-parser \leaderfill -n} \line{ --output \leaderfill -o} +\line{ --print-localedir} \line{ --token-table \leaderfill -k} \line{ --verbose \leaderfill -v} \line{ --version \leaderfill -V} @@ -6682,6 +6952,7 @@ the corresponding short option. --no-lines -l --no-parser -n --output=@var{outfile} -o @var{outfile} +--print-localedir --token-table -k --verbose -v --version -V @@ -6715,7 +6986,713 @@ If you use the Yacc library's @code{main} function, your int yyparse (void); @end example -@c ================================================= Invoking Bison +@c ================================================= C++ Bison + +@node C++ Language Interface +@chapter C++ Language Interface + +@menu +* C++ Parsers:: The interface to generate C++ parser classes +* A Complete C++ Example:: Demonstrating their use +@end menu + +@node C++ Parsers +@section C++ Parsers + +@menu +* C++ Bison Interface:: Asking for C++ parser generation +* C++ Semantic Values:: %union vs. C++ +* C++ Location Values:: The position and location classes +* C++ Parser Interface:: Instantiating and running the parser +* C++ Scanner Interface:: Exchanges between yylex and parse +@end menu + +@node C++ Bison Interface +@subsection C++ Bison Interface +@c - %skeleton "lalr1.cc" +@c - Always pure +@c - initial action + +The C++ parser @acronym{LALR}(1) skeleton is named @file{lalr1.cc}. To select +it, you may either pass the option @option{--skeleton=lalr1.cc} to +Bison, or include the directive @samp{%skeleton "lalr1.cc"} in the +grammar preamble. When run, @command{bison} will create several +files: +@table @file +@item position.hh +@itemx location.hh +The definition of the classes @code{position} and @code{location}, +used for location tracking. 
@xref{C++ Location Values}. + +@item stack.hh +An auxiliary class @code{stack} used by the parser. + +@item @var{file}.hh +@itemx @var{file}.cc +The declaration and implementation of the C++ parser class. +@var{file} is the name of the output file. It follows the same +rules as with regular C parsers. + +Note that @file{@var{file}.hh} is @emph{mandatory}: the C++ parser cannot +work without the parser class declaration. Therefore, you must either +pass @option{-d}/@option{--defines} to @command{bison}, or use the +@samp{%defines} directive. +@end table + +All these files are documented using Doxygen; run @command{doxygen} +for complete and accurate documentation. + +@node C++ Semantic Values +@subsection C++ Semantic Values +@c - No objects in unions +@c - YYSTYPE +@c - Printer and destructor + +The @code{%union} directive works as for C, see @ref{Union Decl, ,The +Collection of Value Types}. In particular it produces a genuine +@code{union}@footnote{In the future, techniques to allow complex types +within pseudo-unions (similar to Boost variants) might be implemented to +alleviate these issues.}, which has a few specific features in C++. +@itemize @minus +@item +The type @code{YYSTYPE} is defined but its use is discouraged: rather +you should refer to the parser's encapsulated type +@code{yy::parser::semantic_type}. +@item +Non-POD (Plain Old Data) types cannot be used. C++ forbids any +instance of classes with constructors in unions: only @emph{pointers} +to such objects are allowed. +@end itemize + +Because objects have to be stored via pointers, memory is not +reclaimed automatically: using the @code{%destructor} directive is the +only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded +Symbols}. 
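The pointer-in-union pattern described above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code Bison generates: the union members @code{ival} and @code{sval}, the token name @code{STRING}, and the helper functions are all hypothetical.

```cpp
#include <string>

// Hypothetical stand-in for the union produced from
//   %union { int ival; std::string *sval; }
// A std::string object could not be a member, but a pointer to one can.
union semantic_type
{
  int ival;
  std::string *sval;
};

// What a scanner might do when it recognizes a string token:
// allocate the object and store only the pointer in the union.
semantic_type make_string_value(const char *text)
{
  semantic_type value;
  value.sval = new std::string(text);
  return value;
}

// Mirrors the effect of a directive such as
//   %destructor { delete $$; } STRING
// (STRING being a hypothetical token whose value is held in sval):
// the pointed-to object is released when the symbol is discarded.
void discard_string_value(semantic_type &value)
{
  delete value.sval;
  value.sval = 0;
}
```

Without a matching @code{%destructor}, a value created as in @code{make_string_value} would leak whenever the parser discards the symbol, for instance during error recovery.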
+ + +@node C++ Location Values +@subsection C++ Location Values +@c - %locations +@c - class Position +@c - class Location +@c - %define "filename_type" "const symbol::Symbol" + +When the directive @code{%locations} is used, the C++ parser supports +location tracking, see @ref{Locations, , Locations Overview}. Two +auxiliary classes define a @code{position}, a single point in a file, +and a @code{location}, a range composed of a pair of +@code{position}s (possibly spanning several files). + +@deftypemethod {position} {std::string*} file +The name of the file. It will always be handled as a pointer; the +parser will never duplicate nor deallocate it. As an experimental +feature you may change it to @samp{@var{type}*} using @samp{%define +"filename_type" "@var{type}"}. +@end deftypemethod + +@deftypemethod {position} {unsigned int} line +The line, starting at 1. +@end deftypemethod + +@deftypemethod {position} {unsigned int} lines (int @var{height} = 1) +Advance by @var{height} lines, resetting the column number. +@end deftypemethod + +@deftypemethod {position} {unsigned int} column +The column, starting at 0. +@end deftypemethod + +@deftypemethod {position} {unsigned int} columns (int @var{width} = 1) +Advance by @var{width} columns, without changing the line number. +@end deftypemethod + +@deftypemethod {position} {position&} operator+= (position& @var{pos}, int @var{width}) +@deftypemethodx {position} {position} operator+ (const position& @var{pos}, int @var{width}) +@deftypemethodx {position} {position&} operator-= (position& @var{pos}, int @var{width}) +@deftypemethodx {position} {position} operator- (const position& @var{pos}, int @var{width}) +Various forms of syntactic sugar for @code{columns}. +@end deftypemethod + +@deftypemethod {position} {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p}) +Report @var{p} on @var{o} like this: +@samp{@var{file}:@var{line}.@var{column}}, or +@samp{@var{line}.@var{column}} if @var{file} is null. 
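To make the output format concrete, here is a minimal sketch of a printing operator that follows the description above. The type @code{position_sketch} and the helper @code{format_position} are hypothetical stand-ins written for this illustration; they are not the classes found in the generated @file{position.hh}.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the generated position class.
struct position_sketch
{
  std::string *file;   // may be null; never copied or freed here
  unsigned int line;   // starts at 1
  unsigned int column; // starts at 0
};

// Print "file:line.column", or "line.column" when file is null.
std::ostream &operator<<(std::ostream &o, const position_sketch &p)
{
  if (p.file)
    o << *p.file << ':';
  return o << p.line << '.' << p.column;
}

// Convenience helper for the sketch: format a position as a string.
std::string format_position(const position_sketch &p)
{
  std::ostringstream s;
  s << p;
  return s.str();
}
```

Returning the stream by reference is what allows the operator to be chained with other insertions in a diagnostic message.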
+@end deftypemethod + +@deftypemethod {location} {position} begin +@deftypemethodx {location} {position} end +The first, inclusive, position of the range, and the first beyond. +@end deftypemethod + +@deftypemethod {location} {unsigned int} columns (int @var{width} = 1) +@deftypemethodx {location} {unsigned int} lines (int @var{height} = 1) +Advance the @code{end} position. +@end deftypemethod + +@deftypemethod {location} {location} operator+ (const location& @var{begin}, const location& @var{end}) +@deftypemethodx {location} {location} operator+ (const location& @var{begin}, int @var{width}) +@deftypemethodx {location} {location} operator+= (const location& @var{loc}, int @var{width}) +Various forms of syntactic sugar. +@end deftypemethod + +@deftypemethod {location} {void} step () +Move @code{begin} onto @code{end}. +@end deftypemethod + + +@node C++ Parser Interface +@subsection C++ Parser Interface +@c - define parser_class_name +@c - Ctor +@c - parse, error, set_debug_level, debug_level, set_debug_stream, +@c debug_stream. +@c - Reporting errors + +The output files @file{@var{output}.hh} and @file{@var{output}.cc} +declare and define the parser class in the namespace @code{yy}. The +class name defaults to @code{parser}, but may be changed using +@samp{%define "parser_class_name" "@var{name}"}. The interface of +this class is detailed below. It can be extended using the +@code{%parse-param} feature: its semantics is slightly changed since +it describes an additional member of the parser class, and an +additional argument for its constructor. + +@defcv {Type} {parser} {semantic_value_type} +@defcvx {Type} {parser} {location_value_type} +The types for semantics value and locations. +@end defcv + +@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...) +Build a new parser object. There are no arguments by default, unless +@samp{%parse-param @{@var{type1} @var{arg1}@}} was used. 
+@end deftypemethod + +@deftypemethod {parser} {int} parse () +Run the syntactic analysis, and return 0 on success, 1 otherwise. +@end deftypemethod + +@deftypemethod {parser} {std::ostream&} debug_stream () +@deftypemethodx {parser} {void} set_debug_stream (std::ostream& @var{o}) +Get or set the stream used for tracing the parsing. It defaults to +@code{std::cerr}. +@end deftypemethod + +@deftypemethod {parser} {debug_level_type} debug_level () +@deftypemethodx {parser} {void} set_debug_level (debug_level @var{l}) +Get or set the tracing level. Currently its value is either 0, no trace, +or nonzero, full tracing. +@end deftypemethod + +@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m}) +The definition for this member function must be supplied by the user: +the parser uses it to report a parser error occurring at @var{l}, +described by @var{m}. +@end deftypemethod + + +@node C++ Scanner Interface +@subsection C++ Scanner Interface +@c - prefix for yylex. +@c - Pure interface to yylex +@c - %lex-param + +The parser invokes the scanner by calling @code{yylex}. Contrary to C +parsers, C++ parsers are always pure: there is no point in using the +@code{%pure-parser} directive. Therefore the interface is as follows. + +@deftypemethod {parser} {int} yylex (semantic_value_type& @var{yylval}, location_type& @var{yylloc}, @var{type1} @var{arg1}, ...) +Return the next token. Its type is the return value, its semantic +value and location being @var{yylval} and @var{yylloc}. Invocations of +@samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments. +@end deftypemethod + + +@node A Complete C++ Example +@section A Complete C++ Example + +This section demonstrates the use of a C++ parser with a simple but +complete example. This example should be available on your system, +ready to compile, in the directory @dfn{../bison/examples/calc++}. 
It +focuses on the use of Bison, therefore the design of the various C++ +classes is very naive: no accessors, no encapsulation of members etc. +We will use a Lex scanner, and more precisely, a Flex scanner, to +demonstrate the various interactions. A hand-written scanner is +actually easier to interface with. + +@menu +* Calc++ --- C++ Calculator:: The specifications +* Calc++ Parsing Driver:: An active parsing context +* Calc++ Parser:: A parser class +* Calc++ Scanner:: A pure C++ Flex scanner +* Calc++ Top Level:: Conducting the band +@end menu + +@node Calc++ --- C++ Calculator +@subsection Calc++ --- C++ Calculator + +Of course the grammar is dedicated to arithmetic: a single +expression, possibly preceded by variable assignments. An +environment, possibly containing predefined variables such as +@code{one} and @code{two}, is exchanged with the parser. An example +of valid input follows. + +@example +three := 3 +seven := one + two * three +seven * seven +@end example + +@node Calc++ Parsing Driver +@subsection Calc++ Parsing Driver +@c - An env +@c - A place to store error messages +@c - A place for the result + +To support a pure interface with the parser (and the scanner) the +technique of the ``parsing context'' is convenient: a structure +containing all the data to exchange. Since, in addition to simply +launching the parsing, there are several auxiliary tasks to execute (open +the file for parsing, instantiate the parser etc.), we recommend +transforming the simple parsing context structure into a full-blown +@dfn{parsing driver} class. + +The declaration of this driver class, @file{calc++-driver.hh}, is as +follows. The first part includes the CPP guard and imports the +required standard library components, and the declaration of the parser +class.
+ +@comment file: calc++-driver.hh +@example +#ifndef CALCXX_DRIVER_HH +# define CALCXX_DRIVER_HH +# include <string> +# include <map> +# include "calc++-parser.hh" +@end example + + +@noindent +Then comes the declaration of the scanning function. Flex expects +the signature of @code{yylex} to be defined in the macro +@code{YY_DECL}, and the C++ parser expects it to be declared. We can +factor both as follows. + +@comment file: calc++-driver.hh +@example +// Announce to Flex the prototype we want for the lexing function, ... +# define YY_DECL \ + int yylex (yy::calcxx_parser::semantic_type* yylval, \ + yy::calcxx_parser::location_type* yylloc, \ + calcxx_driver& driver) +// ... and declare it for the parser's sake. +YY_DECL; +@end example + +@noindent +The @code{calcxx_driver} class is then declared with its most obvious +members. + +@comment file: calc++-driver.hh +@example +// Conducting the whole scanning and parsing of Calc++. +class calcxx_driver +@{ +public: + calcxx_driver (); + virtual ~calcxx_driver (); + + std::map<std::string, int> variables; + + int result; +@end example + +@noindent +To encapsulate the coordination with the Flex scanner, it is useful to +have two member functions to open and close the scanning phase. + +@comment file: calc++-driver.hh +@example + // Handling the scanner. + void scan_begin (); + void scan_end (); + bool trace_scanning; +@end example + +@noindent +Similarly for the parser itself. + +@comment file: calc++-driver.hh +@example + // Handling the parser. + void parse (const std::string& f); + std::string file; + bool trace_parsing; +@end example + +@noindent +To demonstrate pure handling of parse errors, instead of simply +dumping them on the standard error output, we will pass them to the +compiler driver using the following two member functions. Finally, we +close the class declaration and CPP guard. + +@comment file: calc++-driver.hh +@example + // Error handling.
+ void error (const yy::location& l, const std::string& m); + void error (const std::string& m); +@}; +#endif // ! CALCXX_DRIVER_HH +@end example + +The implementation of the driver is straightforward. The @code{parse} +member function deserves some attention. The @code{error} functions +are simple stubs, they should actually register the located error +messages and set error state. + +@comment file: calc++-driver.cc +@example +#include "calc++-driver.hh" +#include "calc++-parser.hh" + +calcxx_driver::calcxx_driver () + : trace_scanning (false), trace_parsing (false) +@{ + variables["one"] = 1; + variables["two"] = 2; +@} + +calcxx_driver::~calcxx_driver () +@{ +@} + +void +calcxx_driver::parse (const std::string &f) +@{ + file = f; + scan_begin (); + yy::calcxx_parser parser (*this); + parser.set_debug_level (trace_parsing); + parser.parse (); + scan_end (); +@} + +void +calcxx_driver::error (const yy::location& l, const std::string& m) +@{ + std::cerr << l << ": " << m << std::endl; +@} + +void +calcxx_driver::error (const std::string& m) +@{ + std::cerr << m << std::endl; +@} +@end example + +@node Calc++ Parser +@subsection Calc++ Parser + +The parser definition file @file{calc++-parser.yy} starts by asking for +the C++ LALR(1) skeleton, the creation of the parser header file, and +specifies the name of the parser class. Because the C++ skeleton +changed several times, it is safer to require the version you designed +the grammar for. + +@comment file: calc++-parser.yy +@example +%skeleton "lalr1.cc" /* -*- C++ -*- */ +%require "2.1a" +%defines +%define "parser_class_name" "calcxx_parser" +@end example + +@noindent +Then come the declarations/inclusions needed to define the +@code{%union}. Because the parser uses the parsing driver and +reciprocally, both cannot include the header of the other. 
Because the +driver's header needs detailed knowledge about the parser class (in +particular its inner types), it is the parser's header which will simply +use a forward declaration of the driver. + +@comment file: calc++-parser.yy +@example +%@{ +# include <string> +class calcxx_driver; +%@} +@end example + +@noindent +The driver is passed by reference to the parser and to the scanner. +This provides a simple but effective pure interface, not relying on +global variables. + +@comment file: calc++-parser.yy +@example +// The parsing context. +%parse-param @{ calcxx_driver& driver @} +%lex-param @{ calcxx_driver& driver @} +@end example + +@noindent +Then we request the location tracking feature, and initialize the +first location's file name. Afterwards new locations are computed +relative to the previous locations: the file name will be +automatically propagated. + +@comment file: calc++-parser.yy +@example +%locations +%initial-action +@{ + // Initialize the initial location. + @@$.begin.filename = @@$.end.filename = &driver.file; +@}; +@end example + +@noindent +Use the two following directives to enable parser tracing and verbose +error messages. + +@comment file: calc++-parser.yy +@example +%debug +%error-verbose +@end example + +@noindent +Semantic values cannot use ``real'' objects, but only pointers to +them. + +@comment file: calc++-parser.yy +@example +// Symbols. +%union +@{ + int ival; + std::string *sval; +@}; +@end example + +@noindent +The code between @samp{%@{} and @samp{%@}} after the introduction of the +@samp{%union} is output in the @file{*.cc} file; it needs detailed +knowledge about the driver. + +@comment file: calc++-parser.yy +@example +%@{ +# include "calc++-driver.hh" +%@} +@end example + + +@noindent +The token numbered as 0 corresponds to end of file; the following line +allows for nicer error messages referring to ``end of file'' instead +of ``$end''. Similarly, user-friendly names are provided for each +symbol.
Note that the token names are prefixed by @code{TOKEN_} to +avoid name clashes. + +@comment file: calc++-parser.yy +@example +%token END 0 "end of file" +%token ASSIGN ":=" +%token <sval> IDENTIFIER "identifier" +%token <ival> NUMBER "number" +%type <ival> exp "expression" +@end example + +@noindent +To enable memory deallocation during error recovery, use +@code{%destructor}. + +@c FIXME: Document %printer, and mention that it takes a braced-code operand. +@comment file: calc++-parser.yy +@example +%printer @{ debug_stream () << *$$; @} "identifier" +%destructor @{ delete $$; @} "identifier" + +%printer @{ debug_stream () << $$; @} "number" "expression" +@end example + +@noindent +The grammar itself is straightforward. + +@comment file: calc++-parser.yy +@example +%% +%start unit; +unit: assignments exp @{ driver.result = $2; @}; + +assignments: assignments assignment @{@} + | /* Nothing. */ @{@}; + +assignment: "identifier" ":=" exp @{ driver.variables[*$1] = $3; @}; + +%left '+' '-'; +%left '*' '/'; +exp: exp '+' exp @{ $$ = $1 + $3; @} + | exp '-' exp @{ $$ = $1 - $3; @} + | exp '*' exp @{ $$ = $1 * $3; @} + | exp '/' exp @{ $$ = $1 / $3; @} + | "identifier" @{ $$ = driver.variables[*$1]; @} + | "number" @{ $$ = $1; @}; +%% +@end example + +@noindent +Finally the @code{error} member function registers the errors to the +driver. + +@comment file: calc++-parser.yy +@example +void +yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l, + const std::string& m) +@{ + driver.error (l, m); +@} +@end example + +@node Calc++ Scanner +@subsection Calc++ Scanner + +The Flex scanner first includes the driver declaration, then the +parser's to get the set of defined tokens.
+ +@comment file: calc++-scanner.ll +@example +%@{ /* -*- C++ -*- */ +# include <cstdlib> +# include <errno.h> +# include <limits.h> +# include <string> +# include "calc++-driver.hh" +# include "calc++-parser.hh" +%@} +@end example + +@noindent +Because there is no @code{#include}-like feature, we don't need +@code{yywrap}; we don't need @code{unput} either; and because we parse an +actual file, this is not an interactive session with the user. +Finally we enable the scanner tracing features. + +@comment file: calc++-scanner.ll +@example +%option noyywrap nounput batch debug +@end example + +@noindent +Abbreviations allow for more readable rules. + +@comment file: calc++-scanner.ll +@example +id [a-zA-Z][a-zA-Z_0-9]* +int [0-9]+ +blank [ \t] +@end example + +@noindent +The following paragraph suffices to track locations accurately. Each +time @code{yylex} is invoked, the begin position is moved onto the end +position. Then when a pattern is matched, the end position is +advanced by its width. In case it matched ends of lines, the end +cursor is adjusted, and each time blanks are matched, the begin cursor +is moved onto the end cursor to effectively ignore the blanks +preceding tokens. Comments would be treated the same way. + +@comment file: calc++-scanner.ll +@example +%@{ +# define YY_USER_ACTION yylloc->columns (yyleng); +%@} +%% +%@{ + yylloc->step (); +%@} +@{blank@}+ yylloc->step (); +[\n]+ yylloc->lines (yyleng); yylloc->step (); +@end example + +@noindent +The rules are simple, just note the use of the driver to report errors. +It is convenient to use a typedef to shorten +@code{yy::calcxx_parser::token::identifier} into +@code{token::identifier} for instance. + +@comment file: calc++-scanner.ll +@example +%@{ + typedef yy::calcxx_parser::token token; +%@} + +[-+*/] return yytext[0]; +":=" return token::ASSIGN; +@{int@} @{ + errno = 0; + long n = strtol (yytext, NULL, 10); + if (!
(INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) + driver.error (*yylloc, "integer is out of range"); + yylval->ival = n; + return token::NUMBER; +@} +@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; +. driver.error (*yylloc, "invalid character"); +%% +@end example + +@noindent +Finally, because the driver's scanner-related member functions depend +on the scanner's data, it is simpler to implement them in this file. + +@comment file: calc++-scanner.ll +@example +void +calcxx_driver::scan_begin () +@{ + yy_flex_debug = trace_scanning; + if (!(yyin = fopen (file.c_str (), "r"))) + error (std::string ("cannot open ") + file); +@} + +void +calcxx_driver::scan_end () +@{ + fclose (yyin); +@} +@end example + +@node Calc++ Top Level +@subsection Calc++ Top Level + +The top level file, @file{calc++.cc}, poses no problem. + +@comment file: calc++.cc +@example +#include <iostream> +#include "calc++-driver.hh" + +int +main (int argc, char *argv[]) +@{ + calcxx_driver driver; + for (++argv; argv[0]; ++argv) + if (*argv == std::string ("-p")) + driver.trace_parsing = true; + else if (*argv == std::string ("-s")) + driver.trace_scanning = true; + else + @{ + driver.parse (*argv); + std::cout << driver.result << std::endl; + @} +@} +@end example + +@c ================================================= FAQ @node FAQ @chapter Frequently Asked Questions @@ -6726,18 +7703,17 @@ Several questions about Bison come up occasionally. Here some of them are addressed.
@menu -* Parser Stack Overflow:: Breaking the Stack Limits +* Memory Exhausted:: Breaking the Stack Limits * How Can I Reset the Parser:: @code{yyparse} Keeps some State * Strings are Destroyed:: @code{yylval} Loses Track of Strings -* C++ Parsers:: Compiling Parsers with C++ Compilers * Implementing Gotos/Loops:: Control Flow in the Calculator @end menu -@node Parser Stack Overflow -@section Parser Stack Overflow +@node Memory Exhausted +@section Memory Exhausted @display -My parser returns with error with a @samp{parser stack overflow} +My parser returns with error with a @samp{memory exhausted} message. What can I do? @end display @@ -6894,27 +7870,6 @@ $ @kbd{printf 'one\ntwo\n' | ./split-lines} @end example -@node C++ Parsers -@section C++ Parsers - -@display -How can I generate parsers in C++? -@end display - -We are working on a C++ output for Bison, but unfortunately, for lack of -time, the skeleton is not finished. It is functional, but in numerous -respects, it will require additional work which @emph{might} break -backward compatibility. Since the skeleton for C++ is not documented, -we do not consider ourselves bound to this interface, nevertheless, as -much as possible we will try to keep compatibility. - -Another possibility is to use the regular C parsers, and to compile them -with a C++ compiler. This works properly, provided that you bear some -simple C++ rules in mind, such as not including ``real classes'' (i.e., -structure with constructors) in unions. Therefore, in the -@code{%union}, use pointers to classes. - - @node Implementing Gotos/Loops @section Implementing Gotos/Loops @@ -7029,7 +7984,7 @@ Bison declaration to create a header file meant for the scanner. @end deffn @deffn {Directive} %destructor -Specifying how the parser should reclaim the memory associated to +Specify how the parser should reclaim the memory associated to discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}. 
@end deffn @@ -7110,11 +8065,11 @@ parser file. @xref{Decl Summary}. @end deffn @deffn {Directive} %nonassoc -Bison declaration to assign non-associativity to token(s). +Bison declaration to assign nonassociativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn -@deffn {Directive} %output="@var{filename}" +@deffn {Directive} %output="@var{file}" Bison declaration to set the name of the parser file. @xref{Decl Summary}. @end deffn @@ -7135,6 +8090,11 @@ Bison declaration to request a pure (reentrant) parser. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. @end deffn +@deffn {Directive} %require "@var{version}" +Require version @var{version} or higher of Bison. @xref{Require Decl, , +Require a Version of Bison}. +@end deffn + @deffn {Directive} %right Bison declaration to assign right associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @@ -7190,7 +8150,7 @@ token. @xref{Action Features, ,Special Features for Use in Actions}. @end deffn @deffn {Variable} yychar -External integer variable that contains the integer value of the current +External integer variable that contains the integer value of the look-ahead token. (In a pure parser, it is a local variable within @code{yyparse}.) Error-recovery rule actions may examine this variable. @xref{Action Features, ,Special Features for Use in Actions}. @@ -7240,7 +8200,7 @@ use for @code{YYERROR_VERBOSE}, just whether you define it. Using @deffn {Macro} YYINITDEPTH Macro for specifying the initial size of the parser stack. -@xref{Stack Overflow}. +@xref{Memory Management}. @end deffn @deffn {Function} yylex @@ -7251,7 +8211,7 @@ the next token. @xref{Lexical, ,The Lexical Analyzer Function @deffn {Macro} YYLEX_PARAM An obsolete macro for specifying an extra argument (or list of extra -arguments) for @code{yyparse} to pass to @code{yylex}. he use of this +arguments) for @code{yyparse} to pass to @code{yylex}. 
The use of this macro is deprecated, and is supported only for Yacc like parsers. @xref{Pure Calling,, Calling Conventions for Pure Parsers}. @end deffn @@ -7260,9 +8220,12 @@ macro is deprecated, and is supported only for Yacc like parsers. External variable in which @code{yylex} should place the line and column numbers associated with a token. (In a pure parser, it is a local variable within @code{yyparse}, and its address is passed to -@code{yylex}.) You can ignore this variable if you don't use the -@samp{@@} feature in the grammar actions. @xref{Token Locations, -,Textual Locations of Tokens}. +@code{yylex}.) +You can ignore this variable if you don't use the @samp{@@} feature in the +grammar actions. +@xref{Token Locations, ,Textual Locations of Tokens}. +In semantic actions, it stores the location of the look-ahead token. +@xref{Actions and Locations, ,Actions and Locations}. @end deffn @deffn {Type} YYLTYPE @@ -7274,16 +8237,19 @@ members. @xref{Location Type, , Data Types of Locations}. External variable in which @code{yylex} should place the semantic value associated with a token. (In a pure parser, it is a local variable within @code{yyparse}, and its address is passed to -@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}. +@code{yylex}.) +@xref{Token Values, ,Semantic Values of Tokens}. +In semantic actions, it stores the semantic value of the look-ahead token. +@xref{Actions, ,Actions}. @end deffn @deffn {Macro} YYMAXDEPTH -Macro for specifying the maximum size of the parser stack. @xref{Stack -Overflow}. +Macro for specifying the maximum size of the parser stack. @xref{Memory +Management}. @end deffn @deffn {Variable} yynerrs -Global variable which Bison increments each time there is a syntax error. +Global variable which Bison increments each time it reports a syntax error. (In a pure parser, it is a local variable within @code{yyparse}.) @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. 
@end deffn @@ -7306,10 +8272,20 @@ syntax error. @xref{Action Features, ,Special Features for Use in Actions}. @end deffn @deffn {Macro} YYSTACK_USE_ALLOCA -Macro used to control the use of @code{alloca}. If defined to @samp{0}, -the parser will not use @code{alloca} but @code{malloc} when trying to -grow its internal stacks. Do @emph{not} define @code{YYSTACK_USE_ALLOCA} -to anything else. +Macro used to control the use of @code{alloca} when the C +@acronym{LALR}(1) parser needs to extend its stacks. If defined to 0, +the parser will use @code{malloc} to extend its stacks. If defined to +1, the parser will use @code{alloca}. Values other than 0 and 1 are +reserved for future Bison extensions. If not defined, +@code{YYSTACK_USE_ALLOCA} defaults to 0. + +In the all-too-common case where your code may run on a host with a +limited stack and with unreliable stack-overflow checking, you should +set @code{YYMAXDEPTH} to a value that cannot possibly result in +unchecked stack overflow on any of your target hosts when +@code{alloca} is called. You can inspect the code that Bison +generates in order to determine the proper numeric values. This will +require some expertise in low-level implementation details. @end deffn @deffn {Type} YYSTYPE @@ -7525,7 +8501,7 @@ grammatically indivisible. The piece of text it represents is a token. @c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES @c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param @c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP -@c LocalWords: YYEMPTY YYRECOVERING yyclearin GE def UMINUS maybeword +@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword @c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH @c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm VCG notype @c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args