X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/040984073a54b4c603172be3c3f44b908ea5deb9..668c5d192776326107790e8d622dfa7ddd015ea3:/doc/bison.texinfo diff --git a/doc/bison.texinfo b/doc/bison.texinfo index d2264621..671d1c4e 100644 --- a/doc/bison.texinfo +++ b/doc/bison.texinfo @@ -145,9 +145,9 @@ The Concepts of Bison Writing @acronym{GLR} Parsers -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler Examples @@ -225,6 +225,7 @@ Tracking Locations Bison Declarations +* Require Decl:: Requiring a Bison version. * Token Decl:: Declaring terminal symbols. * Precedence Decl:: Declaring terminals with precedence and associativity. * Union Decl:: Declaring the set of all semantic value types. @@ -460,7 +461,7 @@ more information on this. @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars -@cindex non-deterministic parsing +@cindex nondeterministic parsing Parsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaning roughly that the next grammar rule to apply at any point in the input is @@ -468,7 +469,7 @@ uniquely determined by the preceding input and a fixed, finite portion (called a @dfn{look-ahead}) of the remaining input. A context-free grammar can be @dfn{ambiguous}, meaning that there are multiple ways to apply the grammar rules to get the same inputs. Even unambiguous -grammars can be @dfn{non-deterministic}, meaning that no fixed +grammars can be @dfn{nondeterministic}, meaning that no fixed look-ahead always suffices to determine the next grammar rule to apply. 
With the proper declarations, Bison is also able to parse these more general context-free grammars, using a technique known as @acronym{GLR} @@ -732,9 +733,9 @@ user-defined function on the resulting values to produce an arbitrary merged result. @menu -* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars -* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities -* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler +* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars +* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities +* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler @end menu @node Simple GLR Parsers @@ -909,29 +910,27 @@ parser recognizes all valid declarations, according to the limited syntax above, transparently. In fact, the user does not even notice when the parser splits. -So here we have a case where we can use the benefits of @acronym{GLR}, almost -without disadvantages. Even in simple cases like this, however, there -are at least two potential problems to beware. -First, always analyze the conflicts reported by -Bison to make sure that @acronym{GLR} splitting is only done where it is -intended. A @acronym{GLR} parser splitting inadvertently may cause -problems less obvious than an @acronym{LALR} parser statically choosing the -wrong alternative in a conflict. -Second, consider interactions with the lexer (@pxref{Semantic Tokens}) -with great care. Since a split parser consumes tokens -without performing any actions during the split, the lexer cannot -obtain information via parser actions. Some cases of -lexer interactions can be eliminated by using @acronym{GLR} to -shift the complications from the lexer to the parser. You must check -the remaining cases for correctness. 
- -In our example, it would be safe for the lexer to return tokens -based on their current meanings in some symbol table, because no new -symbols are defined in the middle of a type declaration. Though it -is possible for a parser to define the enumeration -constants as they are parsed, before the type declaration is -completed, it actually makes no difference since they cannot be used -within the same enumerated type declaration. +So here we have a case where we can use the benefits of @acronym{GLR}, +almost without disadvantages. Even in simple cases like this, however, +there are at least two potential problems to beware. First, always +analyze the conflicts reported by Bison to make sure that @acronym{GLR} +splitting is only done where it is intended. A @acronym{GLR} parser +splitting inadvertently may cause problems less obvious than an +@acronym{LALR} parser statically choosing the wrong alternative in a +conflict. Second, consider interactions with the lexer (@pxref{Semantic +Tokens}) with great care. Since a split parser consumes tokens without +performing any actions during the split, the lexer cannot obtain +information via parser actions. Some cases of lexer interactions can be +eliminated by using @acronym{GLR} to shift the complications from the +lexer to the parser. You must check the remaining cases for +correctness. + +In our example, it would be safe for the lexer to return tokens based on +their current meanings in some symbol table, because no new symbols are +defined in the middle of a type declaration. Though it is possible for +a parser to define the enumeration constants as they are parsed, before +the type declaration is completed, it actually makes no difference since +they cannot be used within the same enumerated type declaration. @node Merging GLR Parses @subsection Using @acronym{GLR} to Resolve Ambiguities @@ -1197,11 +1196,13 @@ function @code{yyerror} and the parser function @code{yyparse} itself. 
This also includes numerous identifiers used for internal purposes. Therefore, you should avoid using C identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar file except for the ones defined in -this manual. +this manual. Also, you should avoid using the C identifiers +@samp{malloc} and @samp{free} for anything other than their usual +meanings. In some cases the Bison parser file includes system headers, and in those cases your code should respect the identifiers reserved by those -headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>}, +headers. On some non-@acronym{GNU} hosts, @code{<alloca.h>}, @code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to declare memory allocators and related types. @code{<libintl.h>} is included if message translation is in use @@ -1716,12 +1717,12 @@ With all the source in a single file, you use the following command to convert it into a parser file: @example -bison @var{file_name}.y +bison @var{file}.y @end example @noindent In this example the file was called @file{rpcalc.y} (for ``Reverse Polish -@sc{calc}ulator''). Bison produces a file named @file{@var{file_name}.tab.c}, +@sc{calc}ulator''). Bison produces a file named @file{@var{file}.tab.c}, removing the @samp{.y} from the original file name. The file output by Bison contains the source code for @code{yyparse}. The additional functions in the input file (@code{yylex}, @code{yyerror} and @code{main}) @@ -2123,7 +2124,7 @@ as @code{sin}, @code{cos}, etc. It is easy to add new operators to the infix calculator as long as they are only single-character literals. The lexical analyzer @code{yylex} passes -back all nonnumber characters as tokens, so new grammar rules suffice for +back all nonnumeric characters as tokens, so new grammar rules suffice for adding a new operator. 
But we want something more flexible: built-in functions whose syntax has this form: @@ -2408,7 +2409,7 @@ getsym (char const *sym_name) The function @code{yylex} must now recognize variables, numeric values, and the single-character arithmetic operators. Strings of alphanumeric -characters with a leading non-digit are recognized as either variables or +characters with a leading letter are recognized as either variables or functions depending on what the symbol table says about them. The string is passed to @code{getsym} for look up in the symbol table. If @@ -2582,13 +2583,13 @@ continues until end of line. @cindex Prologue @cindex declarations -The @var{Prologue} section contains macro definitions and -declarations of functions and variables that are used in the actions in the -grammar rules. These are copied to the beginning of the parser file so -that they precede the definition of @code{yyparse}. You can use -@samp{#include} to get the declarations from a header file. If you don't -need any C declarations, you may omit the @samp{%@{} and @samp{%@}} -delimiters that bracket this section. +The @var{Prologue} section contains macro definitions and declarations +of functions and variables that are used in the actions in the grammar +rules. These are copied to the beginning of the parser file so that +they precede the definition of @code{yyparse}. You can use +@samp{#include} to get the declarations from a header file. If you +don't need any C declarations, you may omit the @samp{%@{} and +@samp{%@}} delimiters that bracket this section. You may have more than one @var{Prologue} section, intermixed with the @var{Bison declarations}. This allows you to have C and Bison @@ -2658,10 +2659,10 @@ even if you define them in the Epilogue. If the last section is empty, you may omit the @samp{%%} that separates it from the grammar rules. 
-The Bison parser itself contains many macros and identifiers whose -names start with @samp{yy} or @samp{YY}, so it is a -good idea to avoid using any such names (except those documented in this -manual) in the epilogue of the grammar file. +The Bison parser itself contains many macros and identifiers whose names +start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using +any such names (except those documented in this manual) in the epilogue +of the grammar file. @node Symbols @section Symbols, Terminal and Nonterminal @@ -2677,13 +2678,13 @@ A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a class of syntactically equivalent tokens. You use the symbol in grammar rules to mean that a token in that class is allowed. The symbol is represented in the Bison parser by a numeric code, and the @code{yylex} -function returns a token type code to indicate what kind of token has been -read. You don't need to know what the code value is; you can use the -symbol to stand for it. +function returns a token type code to indicate what kind of token has +been read. You don't need to know what the code value is; you can use +the symbol to stand for it. -A @dfn{nonterminal symbol} stands for a class of syntactically equivalent -groupings. The symbol name is used in writing grammar rules. By convention, -it should be all lower case. +A @dfn{nonterminal symbol} stands for a class of syntactically +equivalent groupings. The symbol name is used in writing grammar rules. +By convention, it should be all lower case. Symbol names can contain letters, digits (not at the beginning), underscores and periods. Periods make sense only in nonterminals. @@ -2779,7 +2780,7 @@ into a separate header file @file{@var{name}.tab.h} which you can include in the other source files that need it. @xref{Invocation, ,Invoking Bison}. 
If you want to write a grammar that is portable to any Standard C -host, you must use only non-null character tokens taken from the basic +host, you must use only nonnull character tokens taken from the basic execution character set of Standard C@. This set consists of the ten digits, the 52 lower- and upper-case English letters, and the characters in the following C-language string: @@ -2788,17 +2789,17 @@ characters in the following C-language string: "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~" @end example -The @code{yylex} function and Bison must use a consistent character -set and encoding for character tokens. For example, if you run Bison in an -@acronym{ASCII} environment, but then compile and run the resulting program -in an environment that uses an incompatible character set like -@acronym{EBCDIC}, the resulting program may not work because the -tables generated by Bison will assume @acronym{ASCII} numeric values for -character tokens. It is standard -practice for software distributions to contain C source files that -were generated by Bison in an @acronym{ASCII} environment, so installers on -platforms that are incompatible with @acronym{ASCII} must rebuild those -files before compiling them. +The @code{yylex} function and Bison must use a consistent character set +and encoding for character tokens. For example, if you run Bison in an +@acronym{ASCII} environment, but then compile and run the resulting +program in an environment that uses an incompatible character set like +@acronym{EBCDIC}, the resulting program may not work because the tables +generated by Bison will assume @acronym{ASCII} numeric values for +character tokens. It is standard practice for software distributions to +contain C source files that were generated by Bison in an +@acronym{ASCII} environment, so installers on platforms that are +incompatible with @acronym{ASCII} must rebuild those files before +compiling them. 
The symbol @code{error} is a terminal symbol reserved for error recovery (@pxref{Error Recovery}); you shouldn't use it for any other purpose. @@ -2905,10 +2906,10 @@ with no components. @section Recursive Rules @cindex recursive rule -A rule is called @dfn{recursive} when its @var{result} nonterminal appears -also on its right hand side. Nearly all Bison grammars need to use -recursion, because that is the only way to define a sequence of any number -of a particular thing. Consider this recursive definition of a +A rule is called @dfn{recursive} when its @var{result} nonterminal +appears also on its right hand side. Nearly all Bison grammars need to +use recursion, because that is the only way to define a sequence of any +number of a particular thing. Consider this recursive definition of a comma-separated sequence of one or more expressions: @example @@ -3022,8 +3023,9 @@ This macro definition must go in the prologue of the grammar file In most programs, you will need different data types for different kinds of tokens and groupings. For example, a numeric constant may need type -@code{int} or @code{long int}, while a string constant needs type @code{char *}, -and an identifier might need a pointer to an entry in the symbol table. +@code{int} or @code{long int}, while a string constant needs type +@code{char *}, and an identifier might need a pointer to an entry in the +symbol table. To use more than one data type for semantic values in one parser, Bison requires you to do two things: @@ -3546,6 +3548,7 @@ it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}). @menu +* Require Decl:: Requiring a Bison version. * Token Decl:: Declaring terminal symbols. * Precedence Decl:: Declaring terminals with precedence and associativity. * Union Decl:: Declaring the set of all semantic value types. @@ -3558,6 +3561,20 @@ Grammars}). * Decl Summary:: Table of all Bison declarations. 
@end menu +@node Require Decl +@subsection Require a Version of Bison +@cindex version requirement +@cindex requiring a version of Bison +@findex %require + +You may require the minimum version of Bison to process the grammar. If +the requirement is not met, @command{bison} exits with an error (exit +status 63). + +@example +%require "@var{version}" +@end example + @node Token Decl @subsection Token Type Names @cindex declaring token type names @@ -3779,10 +3796,10 @@ Declare that the @var{code} must be invoked before parsing each time For instance, if your locations use a file name, you may use @example -%parse-param @{ const char *filename @}; +%parse-param @{ char const *file_name @}; %initial-action @{ - @@$.begin.filename = @@$.end.filename = filename; + @@$.initialize (file_name); @}; @end example @@ -3792,32 +3809,28 @@ For instance, if your locations use a file name, you may use @cindex freeing discarded symbols @findex %destructor -Some symbols can be discarded by the parser. During error -recovery (@pxref{Error Recovery}), symbols already pushed -on the stack and tokens coming from the rest of the file -are discarded until the parser falls on its feet. If the parser -runs out of memory, all the symbols on the stack must be discarded. -Even if the parser succeeds, it must discard the start symbol. +During error recovery (@pxref{Error Recovery}), symbols already pushed +on the stack and tokens coming from the rest of the file are discarded +until the parser falls on its feet. If the parser runs out of memory, +or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the +symbols on the stack must be discarded. Even if the parser succeeds, it +must discard the start symbol. When discarded symbols convey heap based information, this memory is lost. While this behavior can be tolerable for batch parsers, such as -in traditional compilers, it is unacceptable for programs like shells -or protocol implementations that may parse and execute indefinitely. 
+in traditional compilers, it is unacceptable for programs like shells or +protocol implementations that may parse and execute indefinitely. -The @code{%destructor} directive defines code that -is called when a symbol is discarded. +The @code{%destructor} directive defines code that is called when a +symbol is automatically discarded. @deffn {Directive} %destructor @{ @var{code} @} @var{symbols} @findex %destructor -Invoke @var{code} whenever the parser discards one of the -@var{symbols}. Within @var{code}, @code{$$} designates the semantic -value associated with the discarded symbol. The additional -parser parameters are also available -(@pxref{Parser Function, , The Parser Function @code{yyparse}}). - -@strong{Warning:} as of Bison 2.1, this feature is still -experimental, as there has not been enough user feedback. In particular, -the syntax might still change. +Invoke @var{code} whenever the parser discards one of the @var{symbols}. +Within @var{code}, @code{$$} designates the semantic value associated +with the discarded symbol. The additional parser parameters are also +available (@pxref{Parser Function, , The Parser Function +@code{yyparse}}). @end deffn For instance: @@ -3836,24 +3849,6 @@ For instance: guarantees that when a @code{STRING} or a @code{string} is discarded, its associated memory will be freed. -Note that in the future, Bison might also consider that right hand side -members that are not mentioned in the action can be destroyed. For -instance, in: - -@smallexample -comment: "/*" STRING "*/"; -@end smallexample - -@noindent -the parser is entitled to destroy the semantic value of the -@code{string}. Of course, this will not apply to the default action; -compare: - -@smallexample -typeless: string; // $$ = $1 does not apply; $1 is destroyed. -typefull: string; // $$ = $1 applies, $1 is not destroyed. 
-@end smallexample - @sp 1 @cindex discarded symbols @@ -3865,13 +3860,20 @@ stacked symbols popped during the first phase of error recovery, @item incoming terminals during the second phase of error recovery, @item -the current look-ahead and the entire stack when the parser aborts -(either via an explicit call to @code{YYABORT}, or as a consequence of -a failed error recovery or of memory exhaustion), and +the current look-ahead and the entire stack (except the current +right-hand side symbols) when the parser returns immediately, and @item the start symbol, when the parser succeeds. @end itemize +The parser can @dfn{return immediately} because of an explicit call to +@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory +exhaustion. + +Right-hand side symbols of a rule that explicitly triggers a syntax +error via @code{YYERROR} are not discarded automatically. As a rule +of thumb, destructors are invoked only when user actions cannot manage +the memory. @node Expect Decl @subsection Suppressing Conflict Warnings @@ -3895,19 +3897,18 @@ The declaration looks like this: %expect @var{n} @end example -Here @var{n} is a decimal integer. The declaration says there should be -no warning if there are @var{n} shift/reduce conflicts and no -reduce/reduce conflicts. The usual warning is -given if there are either more or fewer conflicts, or if there are any -reduce/reduce conflicts. +Here @var{n} is a decimal integer. The declaration says there should +be @var{n} shift/reduce conflicts and no reduce/reduce conflicts. +Bison reports an error if the number of shift/reduce conflicts differs +from @var{n}, or if there are any reduce/reduce conflicts. -For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more serious, -and should be eliminated entirely. Bison will always report -reduce/reduce conflicts for these parsers. 
With @acronym{GLR} parsers, however, -both shift/reduce and reduce/reduce are routine (otherwise, there -would be no need to use @acronym{GLR} parsing). Therefore, it is also possible -to specify an expected number of reduce/reduce conflicts in @acronym{GLR} -parsers, using the declaration: +For normal @acronym{LALR}(1) parsers, reduce/reduce conflicts are more +serious, and should be eliminated entirely. Bison will always report +reduce/reduce conflicts for these parsers. With @acronym{GLR} +parsers, however, both kinds of conflicts are routine; otherwise, +there would be no need to use @acronym{GLR} parsing. Therefore, it is +also possible to specify an expected number of reduce/reduce conflicts +in @acronym{GLR} parsers, using the declaration: @example %expect-rr @var{n} @@ -3928,12 +3929,12 @@ go back to the beginning. @item Add an @code{%expect} declaration, copying the number @var{n} from the -number which Bison printed. +number which Bison printed. With @acronym{GLR} parsers, add an +@code{%expect-rr} declaration as well. @end itemize -Now Bison will stop annoying you if you do not change the number of -conflicts, but it will warn you again if changes in the grammar result -in more or fewer conflicts. +Now Bison will warn you if you introduce an unexpected conflict, but +will keep silent otherwise. @node Start Decl @subsection The Start-Symbol @@ -3959,8 +3960,8 @@ may override this restriction with the @code{%start} declaration as follows: A @dfn{reentrant} program is one which does not alter in the course of execution; in other words, it consists entirely of @dfn{pure} (read-only) code. Reentrancy is important whenever asynchronous execution is possible; -for example, a non-reentrant program may not be safe to call from a signal -handler. In systems with multiple threads of control, a non-reentrant +for example, a nonreentrant program may not be safe to call from a signal +handler. 
In systems with multiple threads of control, a nonreentrant program must be called only within interlocks. Normally, Bison generates a parser which is not reentrant. This is @@ -4066,13 +4067,12 @@ is named @file{@var{name}.h}. Unless @code{YYSTYPE} is already defined as a macro, the output header declares @code{YYSTYPE}. Therefore, if you are using a @code{%union} -(@pxref{Multiple Types, ,More Than One Value Type}) with components -that require other definitions, or if you have defined a -@code{YYSTYPE} macro (@pxref{Value Type, ,Data Types of Semantic -Values}), you need to arrange for these definitions to be propagated to -all modules, e.g., by putting them in a -prerequisite header that is included both by your parser and by any -other module that needs @code{YYSTYPE}. +(@pxref{Multiple Types, ,More Than One Value Type}) with components that +require other definitions, or if you have defined a @code{YYSTYPE} macro +(@pxref{Value Type, ,Data Types of Semantic Values}), you need to +arrange for these definitions to be propagated to all modules, e.g., by +putting them in a prerequisite header that is included both by your +parser and by any other module that needs @code{YYSTYPE}. Unless your parser is pure, the output header declares @code{yylval} as an external variable. @xref{Pure Decl, ,A Pure (Reentrant) @@ -4083,11 +4083,11 @@ If you have also used locations, the output header declares @code{YYSTYPE} and @code{yylval}. @xref{Locations, ,Tracking Locations}. -This output file is normally essential if you wish to put the -definition of @code{yylex} in a separate source file, because -@code{yylex} typically needs to be able to refer to the -above-mentioned declarations and to the token type codes. -@xref{Token Values, ,Semantic Values of Tokens}. 
+This output file is normally essential if you wish to put the definition +of @code{yylex} in a separate source file, because @code{yylex} +typically needs to be able to refer to the above-mentioned declarations +and to the token type codes. @xref{Token Values, ,Semantic Values of +Tokens}. @end deffn @deffn {Directive} %destructor @@ -4133,7 +4133,7 @@ parser file contains just @code{#define} directives and static variable declarations. This option also tells Bison to write the C code for the grammar actions -into a file named @file{@var{filename}.act}, in the form of a +into a file named @file{@var{file}.act}, in the form of a brace-surrounded body fit for a @code{switch} statement. @end deffn @@ -4146,8 +4146,8 @@ associate errors with the parser file, treating it an independent source file in its own right. @end deffn -@deffn {Directive} %output="@var{filename}" -Specify the @var{filename} for the parser file. +@deffn {Directive} %output="@var{file}" +Specify @var{file} for the parser file. @end deffn @deffn {Directive} %pure-parser @@ -4155,6 +4155,11 @@ Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). @end deffn +@deffn {Directive} %require "@var{version}" +Require version @var{version} or higher of Bison. @xref{Require Decl, , +Require a Version of Bison}. +@end deffn + @deffn {Directive} %token-table Generate an array of token names in the parser file. The name of the array is @code{yytname}; @code{yytname[@var{i}]} is the name of the @@ -4271,7 +4276,11 @@ without reading further. The value returned by @code{yyparse} is 0 if parsing was successful (return is due to end-of-input). -The value is 1 if parsing failed (return is due to a syntax error). +The value is 1 if parsing failed because of invalid input, i.e., input +that contains a syntax error or that causes @code{YYABORT} to be +invoked. + +The value is 2 if parsing failed due to memory exhaustion. 
@end deftypefun In an action, you can cause immediate return from @code{yyparse} by using @@ -4441,7 +4450,7 @@ The @code{yytname} table is generated only if you use the @subsection Semantic Values of Tokens @vindex yylval -In an ordinary (non-reentrant) parser, the semantic value of the token must +In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable @code{yylval}. When you are using just one data type for semantic values, @code{yylval} has that type. Thus, if the type is @code{int} (the default), you might write this in @@ -4489,12 +4498,11 @@ then the code in @code{yylex} might look like this: @vindex yylloc If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, , -Tracking Locations}) in actions to keep track of the -textual locations of tokens and groupings, then you must provide this -information in @code{yylex}. The function @code{yyparse} expects to -find the textual location of a token just parsed in the global variable -@code{yylloc}. So @code{yylex} must store the proper data in that -variable. +Tracking Locations}) in actions to keep track of the textual locations +of tokens and groupings, then you must provide this information in +@code{yylex}. The function @code{yyparse} expects to find the textual +location of a token just parsed in the global variable @code{yylloc}. +So @code{yylex} must store the proper data in that variable. By default, the value of @code{yylloc} is a structure and you need only initialize the members that are going to be used by the actions. The @@ -4689,7 +4697,7 @@ preferable since it more accurately describes the return type for @vindex yynerrs The variable @code{yynerrs} contains the number of syntax errors -encountered so far. Normally this variable is global; but if you +reported so far. 
Normally this variable is global; but if you request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable which only the actions can access. @@ -4831,12 +4839,11 @@ Tracking Locations}. A Bison-generated parser can print diagnostics, including error and tracing messages. By default, they appear in English. However, Bison -also supports outputting diagnostics in the user's native language. -To make this work, the user should set the usual environment -variables. @xref{Users, , The User's View, gettext, GNU -@code{gettext} utilities}. For -example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might set -the user's locale to French Canadian using the @acronym{UTF}-8 +also supports outputting diagnostics in the user's native language. To +make this work, the user should set the usual environment variables. +@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}. +For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might +set the user's locale to French Canadian using the @acronym{UTF}-8 encoding. The exact set of available locales depends on the user's installation. @@ -5601,7 +5608,7 @@ pp.@: 615--649 @uref{http://doi.acm.org/10.1145/69622.357187}. @cindex @acronym{GLR} parsing @cindex generalized @acronym{LR} (@acronym{GLR}) parsing @cindex ambiguous grammars -@cindex non-deterministic parsing +@cindex nondeterministic parsing Bison produces @emph{deterministic} parsers that choose uniquely when to reduce and which reduction to apply @@ -5666,10 +5673,10 @@ quadratic worst-case time, and any general (possibly ambiguous) context-free grammar in cubic worst-case time. However, Bison currently uses a simpler data structure that requires time proportional to the length of the input times the maximum number of stacks required for any -prefix of the input. Thus, really ambiguous or non-deterministic +prefix of the input. 
Thus, really ambiguous or nondeterministic grammars can require exponential time and space to process. Such badly behaving examples, however, are not generally of practical interest. -Usually, non-determinism in a grammar is local---the parser is ``in +Usually, nondeterminism in a grammar is local---the parser is ``in doubt'' only for a few tokens at a time. Therefore, the current data structure should generally be adequate. On @acronym{LALR}(1) portions of a grammar, in particular, it is only slightly slower than with the default @@ -6605,14 +6612,15 @@ bison @var{infile} Here @var{infile} is the grammar file name, which usually ends in @samp{.y}. The parser file's name is made by replacing the @samp{.y} -with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields -@file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields -@file{hack/foo.tab.c}. It's also possible, in case you are writing +with @samp{.tab.c} and removing any leading directory. Thus, the +@samp{bison foo.y} file name yields +@file{foo.tab.c}, and the @samp{bison hack/foo.y} file name yields +@file{foo.tab.c}. It's also possible, in case you are writing C++ code instead of C in your grammar file, to name it @file{foo.ypp} or @file{foo.y++}. Then, the output files will take an extension like the given one as input (respectively @file{foo.tab.cpp} and @file{foo.tab.c++}). -This feature takes effect with all options that manipulate filenames like +This feature takes effect with all options that manipulate file names like @samp{-o} or @samp{-d}. For example : @@ -6770,11 +6778,11 @@ Pretend that @code{%verbose} was specified, i.e, write an extra output file containing verbose descriptions of the grammar and parser. @xref{Decl Summary}. -@item -o @var{filename} -@itemx --output=@var{filename} -Specify the @var{filename} for the parser file. +@item -o @var{file} +@itemx --output=@var{file} +Specify the @var{file} for the parser file. 
-The other output files' names are constructed from @var{filename} as +The other output files' names are constructed from @var{file} as described under the @samp{-v} and @samp{-d} options. @item -g @@ -6786,7 +6794,7 @@ be @file{foo.vcg}. @item --graph=@var{graph-file} The behavior of @var{--graph} is the same than @samp{-g}. The only difference is that it has an optional argument which is the name of -the output graph filename. +the output graph file. @end table @node Option Cross Key @@ -6902,13 +6910,13 @@ used for location tracking. @xref{C++ Location Values}. @item stack.hh An auxiliary class @code{stack} used by the parser. -@item @var{filename}.hh -@itemx @var{filename}.cc +@item @var{file}.hh +@itemx @var{file}.cc The declaration and implementation of the C++ parser class. -@var{filename} is the name of the output file. It follows the same +@var{file} is the name of the output file. It follows the same rules as with regular C parsers. -Note that @file{@var{filename}.hh} is @emph{mandatory}, the C++ cannot +Note that @file{@var{file}.hh} is @emph{mandatory}, the C++ cannot work without the parser class declaration. Therefore, you must either pass @option{-d}/@option{--defines} to @command{bison}, or use the @samp{%defines} directive. @@ -6926,12 +6934,13 @@ for a complete and accurate documentation. The @code{%union} directive works as for C, see @ref{Union Decl, ,The Collection of Value Types}. In particular it produces a genuine @code{union}@footnote{In the future techniques to allow complex types -within pseudo-unions (variants) might be implemented to alleviate -these issues.}, which have a few specific features in C++. +within pseudo-unions (similar to Boost variants) might be implemented to +alleviate these issues.}, which have a few specific features in C++. @itemize @minus @item -The name @code{YYSTYPE} also denotes @samp{union YYSTYPE}. You may -forward declare it just with @samp{union YYSTYPE;}. 
+The type @code{YYSTYPE} is defined but its use is discouraged: rather +you should refer to the parser's encapsulated type +@code{yy::parser::semantic_type}. @item Non POD (Plain Old Data) types cannot be used. C++ forbids any instance of classes with constructors in unions: only @emph{pointers} @@ -6957,7 +6966,7 @@ auxiliary classes define a @code{position}, a single point in a file, and a @code{location}, a range composed of a pair of @code{position}s (possibly spanning several files). -@deftypemethod {position} {std::string*} filename +@deftypemethod {position} {std::string*} file The name of the file. It will always be handled as a pointer, the parser will never duplicate nor deallocate it. As an experimental feature you may change it to @samp{@var{type}*} using @samp{%define @@ -6989,8 +6998,8 @@ Various forms of syntactic sugar for @code{columns}. @deftypemethod {position} {position} operator<< (std::ostream @var{o}, const position& @var{p}) Report @var{p} on @var{o} like this: -@samp{@var{filename}:@var{line}.@var{column}}, or -@samp{@var{line}.@var{column}} if @var{filename} is null. +@samp{@var{file}:@var{line}.@var{column}}, or +@samp{@var{line}.@var{column}} if @var{file} is null. @end deftypemethod @deftypemethod {location} {position} begin @@ -7026,7 +7035,7 @@ The output files @file{@var{output}.hh} and @file{@var{output}.cc} declare and define the parser class in the namespace @code{yy}. The class name defaults to @code{parser}, but may be changed using @samp{%define "parser_class_name" "@var{name}"}. The interface of -this class is detailled below. It can be extended using the +this class is detailed below. It can be extended using the @code{%parse-param} feature: its semantics is slightly changed since it describes an additional member of the parser class, and an additional argument for its constructor. @@ -7054,7 +7063,7 @@ Get or set the stream used for tracing the parsing. 
It defaults to @deftypemethod {parser} {debug_level_type} debug_level () @deftypemethodx {parser} {void} set_debug_level (debug_level @var{l}) Get or set the tracing level. Currently its value is either 0, no trace, -or non-zero, full tracing. +or nonzero, full tracing. @end deftypemethod @deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m}) @@ -7105,7 +7114,7 @@ actually easier to interface with. @subsection Calc++ --- C++ Calculator Of course the grammar is dedicated to arithmetics, a single -expression, possibily preceded by variable assignments. An +expression, possibly preceded by variable assignments. An environment containing possibly predefined variables such as @code{one} and @code{two}, is exchanged with the parser. An example of valid input follows. @@ -7132,7 +7141,8 @@ transforming the simple parsing context structure into a fully blown The declaration of this driver class, @file{calc++-driver.hh}, is as follows. The first part includes the CPP guard and imports the -required standard library components. +required standard library components, and the declaration of the parser +class. @comment file: calc++-driver.hh @example @@ -7140,26 +7150,9 @@ required standard library components. # define CALCXX_DRIVER_HH # include # include +# include "calc++-parser.hh" @end example -@noindent -Then come forward declarations. Because the parser uses the parsing -driver and reciprocally, simple inclusions of header files will not -do. Because the driver's declaration is the one that will be imported -by the rest of the project, it is saner to forward declare the -parser's information here. - -@comment file: calc++-driver.hh -@example -// Forward declarations. -union YYSTYPE; -namespace yy -@{ - class location; - class calcxx_parser; -@} -class calcxx_driver; -@end example @noindent Then comes the declaration of the scanning function. Flex expects @@ -7171,7 +7164,9 @@ factor both as follows. 
@example // Announce to Flex the prototype we want for lexing function, ... # define YY_DECL \ - int yylex (YYSTYPE* yylval, yy::location* yylloc, calcxx_driver& driver) + int yylex (yy::calcxx_parser::semantic_type* yylval, \ + yy::calcxx_parser::location_type* yylloc, \ + calcxx_driver& driver) // ... and declare it for the parser's sake. YY_DECL; @end example @@ -7281,19 +7276,33 @@ calcxx_driver::error (const std::string& m) @node Calc++ Parser @subsection Calc++ Parser -The parser definition file @file{calc++-parser.yy} starts by asking -for the C++ skeleton, the creation of the parser header file, and -specifies the name of the parser class. It then includes the required -headers. +The parser definition file @file{calc++-parser.yy} starts by asking for +the C++ LALR(1) skeleton, the creation of the parser header file, and +specifies the name of the parser class. Because the C++ skeleton +changed several times, it is safer to require the version you designed +the grammar for. @comment file: calc++-parser.yy @example %skeleton "lalr1.cc" /* -*- C++ -*- */ -%define "parser_class_name" "calcxx_parser" +%require "2.1a" %defines +%define "parser_class_name" "calcxx_parser" +@end example + +@noindent +Then come the declarations/inclusions needed to define the +@code{%union}. Because the parser uses the parsing driver and +reciprocally, both cannot include the header of the other. Because the +driver's header needs detailed knowledge about the parser class (in +particular its inner types), it is the parser's header which will simply +use a forward declaration of the driver. + +@comment file: calc++-parser.yy +@example %@{ # include -# include "calc++-driver.hh" +class calcxx_driver; %@} @end example @@ -7349,6 +7358,19 @@ them. @}; @end example +@noindent +The code between @samp{%@{} and @samp{%@}} after the introduction of the +@samp{%union} is output in the @file{*.cc} file; it needs detailed +knowledge about the driver. 
+ +@comment file: calc++-parser.yy +@example +%@{ +# include "calc++-driver.hh" +%@} +@end example + + @noindent The token numbered as 0 corresponds to end of file; the following line allows for nicer error messages referring to ``end of file'' instead @@ -7358,11 +7380,11 @@ avoid name clashes. @comment file: calc++-parser.yy @example -%token YYEOF 0 "end of file" -%token TOKEN_ASSIGN ":=" -%token TOKEN_IDENTIFIER "identifier" -%token TOKEN_NUMBER "number" -%type exp "expression" +%token END 0 "end of file" +%token ASSIGN ":=" +%token IDENTIFIER "identifier" +%token NUMBER "number" +%type exp "expression" @end example @noindent @@ -7387,9 +7409,9 @@ The grammar itself is straightforward. unit: assignments exp @{ driver.result = $2; @}; assignments: assignments assignment @{@} - | /* Nothing. */ @{@}; + | /* Nothing. */ @{@}; -assignment: TOKEN_IDENTIFIER ":=" exp @{ driver.variables[*$1] = $3; @}; +assignment: "identifier" ":=" exp @{ driver.variables[*$1] = $3; @}; %left '+' '-'; %left '*' '/'; @@ -7397,8 +7419,8 @@ exp: exp '+' exp @{ $$ = $1 + $3; @} | exp '-' exp @{ $$ = $1 - $3; @} | exp '*' exp @{ $$ = $1 * $3; @} | exp '/' exp @{ $$ = $1 / $3; @} - | TOKEN_IDENTIFIER @{ $$ = driver.variables[*$1]; @} - | TOKEN_NUMBER @{ $$ = $1; @}; + | "identifier" @{ $$ = driver.variables[*$1]; @} + | "number" @{ $$ = $1; @}; %% @end example @@ -7456,7 +7478,7 @@ blank [ \t] @end example @noindent -The following paragraph suffices to track locations acurately. Each +The following paragraph suffices to track locations accurately. Each time @code{yylex} is invoked, the begin position is moved onto the end position. Then when a pattern is matched, the end position is advanced of its width. In case it matched ends of lines, the end @@ -7478,22 +7500,28 @@ preceding tokens. Comments would be treated equally. @end example @noindent -The rules are simple, just note the use of the driver to report -errors. +The rules are simple, just note the use of the driver to report errors. 
+It is convenient to use a typedef to shorten +@code{yy::calcxx_parser::token::identifier} into +@code{token::identifier} for instance. @comment file: calc++-scanner.ll @example +%@{ + typedef yy::calcxx_parser::token token; +%@} + [-+*/] return yytext[0]; -":=" return TOKEN_ASSIGN; +":=" return token::ASSIGN; @{int@} @{ errno = 0; long n = strtol (yytext, NULL, 10); if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE)) driver.error (*yylloc, "integer is out of range"); yylval->ival = n; - return TOKEN_NUMBER; + return token::NUMBER; @} -@{id@} yylval->sval = new std::string (yytext); return TOKEN_IDENTIFIER; +@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER; . driver.error (*yylloc, "invalid character"); %% @end example @@ -7530,7 +7558,7 @@ The top level file, @file{calc++.cc}, poses no problem. #include "calc++-driver.hh" int -main (int argc, const char* argv[]) +main (int argc, char *argv[]) @{ calcxx_driver driver; for (++argv; argv[0]; ++argv) @@ -7919,11 +7947,11 @@ parser file. @xref{Decl Summary}. @end deffn @deffn {Directive} %nonassoc -Bison declaration to assign non-associativity to token(s). +Bison declaration to assign nonassociativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @end deffn -@deffn {Directive} %output="@var{filename}" +@deffn {Directive} %output="@var{file}" Bison declaration to set the name of the parser file. @xref{Decl Summary}. @end deffn @@ -7944,6 +7972,11 @@ Bison declaration to request a pure (reentrant) parser. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. @end deffn +@deffn {Directive} %require "@var{version}" +Require version @var{version} or higher of Bison. @xref{Require Decl, , +Require a Version of Bison}. +@end deffn + @deffn {Directive} %right Bison declaration to assign right associativity to token(s). @xref{Precedence Decl, ,Operator Precedence}. @@ -8092,7 +8125,7 @@ Management}. 
@end deffn @deffn {Variable} yynerrs -Global variable which Bison increments each time there is a syntax error. +Global variable which Bison increments each time it reports a syntax error. (In a pure parser, it is a local variable within @code{yyparse}.) @xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}. @end deffn @@ -8122,10 +8155,7 @@ the parser will use @code{malloc} to extend its stacks. If defined to reserved for future Bison extensions. If not defined, @code{YYSTACK_USE_ALLOCA} defaults to 0. -If you define @code{YYSTACK_USE_ALLOCA} to 1, it is your -responsibility to make sure that @code{alloca} is visible, e.g., by -using @acronym{GCC} or by including @code{}. Furthermore, -in the all-too-common case where your code may run on a host with a +In the all-too-common case where your code may run on a host with a limited stack and with unreliable stack-overflow checking, you should set @code{YYMAXDEPTH} to a value that cannot possibly result in unchecked stack overflow on any of your target hosts when