X-Git-Url: https://git.saurik.com/bison.git/blobdiff_plain/c28cd5dc5aca9ef27aaa606855be891788ffd750..1f1bd5729743e155e71767810eaef22bd84e340b:/doc/bison.texi diff --git a/doc/bison.texi b/doc/bison.texi index bc3237ae..32396a89 100644 --- a/doc/bison.texi +++ b/doc/bison.texi @@ -295,6 +295,8 @@ Handling Context Dependencies Debugging Your Parser * Understanding:: Understanding the structure of your parser. +* Graphviz:: Getting a visual representation of the parser. +* Xml:: Getting a markup representation of the parser. * Tracing:: Tracing the execution of your parser. Tracing Your Parser @@ -328,6 +330,7 @@ C++ Location Values * C++ position:: One point in the source file * C++ location:: Two points in the source file +* User Defined Location Type:: Required interface for locations A Complete C++ Example @@ -4702,6 +4705,10 @@ incoming terminals during the second phase of error recovery, the current lookahead and the entire stack (except the current right-hand side symbols) when the parser returns immediately, and @item +the current lookahead and the entire stack (including the current right-hand +side symbols) when the C++ parser (@file{lalr1.cc}) catches an exception in +@code{parse}, +@item the start symbol, when the parser succeeds. @end itemize @@ -4859,6 +4866,7 @@ may override this restriction with the @code{%start} declaration as follows: @cindex reentrant parser @cindex pure parser @findex %define api.pure +@findex %define api.pure full A @dfn{reentrant} program is one which does not alter in the course of execution; in other words, it consists entirely of @dfn{pure} (read-only) @@ -4878,7 +4886,7 @@ declaration @code{%define api.pure} says that you want the parser to be reentrant. It looks like this: @example -%define api.pure +%define api.pure full @end example The result is that the communication variables @code{yylval} and @@ -4928,7 +4936,7 @@ compatibility with the impure Yacc pull mode interface. Unless you know what you are doing, your declarations should look like this: @example -%define api.pure +%define api.pure full %define api.push-pull push @end example @@ -5001,8 +5009,8 @@ yypull_parse (ps); /* Will call the lexer */ yypstate_delete (ps); @end example -Adding the @code{%define api.pure} declaration does exactly the same thing to -the generated parser with @code{%define api.push-pull both} as it did for +Adding the @code{%define api.pure full} declaration does exactly the same thing +to the generated parser with @code{%define api.push-pull both} as it did for @code{%define api.push-pull push}. @node Decl Summary @@ -5322,6 +5330,23 @@ Unaccepted @var{variable}s produce an error. Some of the accepted @var{variable}s are: @itemize @bullet +@c ================================================== api.location.type +@item @code{api.location.type} +@findex %define api.location.type + +@itemize @bullet +@item Language(s): C++, Java + +@item Purpose: Define the location type. +@xref{User Defined Location Type}. + +@item Accepted Values: String + +@item Default Value: none + +@item History: introduced in Bison 2.7 +@end itemize + @c ================================================== api.prefix @item @code{api.prefix} @findex %define api.prefix @@ -5329,7 +5354,7 @@ Some of the accepted @var{variable}s are: @itemize @bullet @item Language(s): All -@item Purpose: Rename exported symbols +@item Purpose: Rename exported symbols. @xref{Multiple Parsers, ,Multiple Parsers in the Same Program}. @item Accepted Values: String @@ -5349,9 +5374,40 @@ Some of the accepted @var{variable}s are: @item Purpose: Request a pure (reentrant) parser program. @xref{Pure Decl, ,A Pure (Reentrant) Parser}. -@item Accepted Values: Boolean +@item Accepted Values: @code{true}, @code{false}, @code{full} + +The value may be omitted: this is equivalent to specifying @code{true}, as is +the case for Boolean values. + +When @code{%define api.pure full} is used, the parser is made reentrant. This +changes the signature for yylex (@pxref{Pure Calling}), and also that of +yyerror when the tracking of locations has been activated, as shown below. + +The @code{true} value is very similar to the @code{full} value, the only +difference is in the signature of @code{yyerror} on Yacc parsers without +@code{%parse-param}, for historical reasons. + +I.e., if @samp{%locations %define api.pure} is passed then the prototypes for +@code{yyerror} are: + +@example +void yyerror (char const *msg); /* Yacc parsers. */ +void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ +@end example + +But if @samp{%locations %define api.pure %parse-param @{int *nastiness@}} is +used, then both parsers have the same signature: + +@example +void yyerror (YYLTYPE *llocp, int *nastiness, char const *msg); +@end example + +(@pxref{Error Reporting, ,The Error +Reporting Function @code{yyerror}}) @item Default Value: @code{false} + +@item History: the @code{full} value was introduced in Bison 2.7 @end itemize @c ================================================== api.push-pull @@ -5796,6 +5852,27 @@ In the grammar actions, use expressions like this to refer to the data: exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @} @end example +@noindent +Using the following: +@example +%parse-param @{int *randomness@} +@end example + +Results in these signatures: +@example +void yyerror (int *randomness, const char *msg); +int yyparse (int *randomness); +@end example + +@noindent +Or, if both @code{%define api.pure full} (or just @code{%define api.pure}) +and @code{%locations} are used: + +@example +void yyerror (YYLTYPE *llocp, int *randomness, const char *msg); +int yyparse (int *randomness); +@end example + @node Push Parser Function @section The Push Parser Function @code{yypush_parse} @findex yypush_parse @@ -6047,7 +6124,7 @@ The data type of @code{yylloc} has the name @code{YYLTYPE}. @node Pure Calling @subsection Calling Conventions for Pure Parsers -When you use the Bison declaration @code{%define api.pure} to request a +When you use the Bison declaration @code{%define api.pure full} to request a pure, reentrant parser, the global communication variables @code{yylval} and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant) Parser}.) In such parsers the two global variables are replaced by @@ -6082,35 +6159,25 @@ Declare that the braced-code @var{argument-declaration} is an additional @code{yylex} argument declaration. @end deffn +@noindent For instance: @example -%parse-param @{int *nastiness@} %lex-param @{int *nastiness@} -%parse-param @{int *randomness@} @end example @noindent -results in the following signatures: +results in the following signature: @example -int yylex (int *nastiness); -int yyparse (int *nastiness, int *randomness); -@end example - -If @code{%define api.pure} is added: - -@example -int yylex (YYSTYPE *lvalp, int *nastiness); -int yyparse (int *nastiness, int *randomness); +int yylex (int *nastiness); @end example @noindent -and finally, if both @code{%define api.pure} and @code{%locations} are used: +If @code{%define api.pure full} (or just @code{%define api.pure}) is added: @example -int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); -int yyparse (int *nastiness, int *randomness); +int yylex (YYSTYPE *lvalp, int *nastiness); @end example @node Error Reporting @@ -6170,50 +6237,16 @@ error recovery if you have written suitable error recovery grammar rules immediately return 1. Obviously, in location tracking pure parsers, @code{yyerror} should have -an access to the current location. -This is indeed the case for the GLR -parsers, but not for the Yacc parser, for historical reasons. I.e., if -@samp{%locations %define api.pure} is passed then the prototypes for -@code{yyerror} are: +an access to the current location. With @code{%define api.pure}, this is +indeed the case for the GLR parsers, but not for the Yacc parser, for +historical reasons, and this is the why @code{%define api.pure full} should be +prefered over @code{%define api.pure}. -@example -void yyerror (char const *msg); /* Yacc parsers. */ -void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */ -@end example - -If @samp{%parse-param @{int *nastiness@}} is used, then: - -@example -void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */ -void yyerror (int *nastiness, char const *msg); /* GLR parsers. */ -@end example - -Finally, GLR and Yacc parsers share the same @code{yyerror} calling -convention for absolutely pure parsers, i.e., when the calling -convention of @code{yylex} @emph{and} the calling convention of -@code{%define api.pure} are pure. -I.e.: +When @code{%locations %define api.pure full} is used, @code{yyerror} has the +following signature: @example -/* Location tracking. */ -%locations -/* Pure yylex. */ -%define api.pure -%lex-param @{int *nastiness@} -/* Pure yyparse. */ -%parse-param @{int *nastiness@} -%parse-param @{int *randomness@} -@end example - -@noindent -results in the following signatures for all the parser kinds: - -@example -int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness); -int yyparse (int *nastiness, int *randomness); -void yyerror (YYLTYPE *locp, - int *nastiness, int *randomness, - char const *msg); +void yyerror (YYLTYPE *locp, char const *msg); @end example @noindent @@ -7117,6 +7150,58 @@ redirects: @end group @end example +Yet this proposal introduces another kind of ambiguity! The input +@samp{word word} can be parsed as a single @code{words} composed of two +@samp{word}s, or as two one-@code{word} @code{words} (and likewise for +@code{redirect}/@code{redirects}). However this ambiguity is now a +shift/reduce conflict, and therefore it can now be addressed with precedence +directives. + +To simplify the matter, we will proceed with @code{word} and @code{redirect} +being tokens: @code{"word"} and @code{"redirect"}. + +To prefer the longest @code{words}, the conflict between the token +@code{"word"} and the rule @samp{sequence: sequence words} must be resolved +as a shift. To this end, we use the same techniques as exposed above, see +@ref{Non Operators,, Using Precedence For Non Operators}. One solution +relies on precedences: use @code{%prec} to give a lower precedence to the +rule: + +@example +%nonassoc "word" +%nonassoc "sequence" +%% +@group +sequence: + /* empty */ +| sequence word %prec "sequence" +| sequence redirect %prec "sequence" +; +@end group + +@group +words: + word +| words "word" +; +@end group +@end example + +Another solution relies on associativity: provide both the token and the +rule with the same precedence, but make them right-associative: + +@example +%right "word" "redirect" +%% +@group +sequence: + /* empty */ +| sequence word %prec "word" +| sequence redirect %prec "redirect" +; +@end group +@end example + @node Mysterious Conflicts @section Mysterious Conflicts @cindex Mysterious Conflicts @@ -8113,6 +8198,8 @@ automaton, and how to enable and understand the parser run-time traces. @menu * Understanding:: Understanding the structure of your parser. +* Graphviz:: Getting a visual representation of the parser. +* Xml:: Getting a markup representation of the parser. * Tracing:: Tracing the execution of your parser. @end menu @@ -8529,6 +8616,165 @@ precedence of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but also because the associativity of @samp{/} is not specified. +Note that Bison may also produce an HTML version of this output, via an XML +file and XSLT processing (@pxref{Xml}). + +@c ================================================= Graphical Representation + +@node Graphviz +@section Visualizing Your Parser +@cindex dot + +As another means to gain better understanding of the shift/reduce +automaton corresponding to the Bison parser, a DOT file can be generated. Note +that debugging a real grammar with this is tedious at best, and impractical +most of the times, because the generated files are huge (the generation of +a PDF or PNG file from it will take very long, and more often than not it will +fail due to memory exhaustion). This option was rather designed for beginners, +to help them understand LR parsers. + +This file is generated when the @option{--graph} option is specified +(@pxref{Invocation, , Invoking Bison}). Its name is made by removing +@samp{.tab.c} or @samp{.c} from the parser implementation file name, and +adding @samp{.dot} instead. If the grammar file is @file{foo.y}, the +Graphviz output file is called @file{foo.dot}. + +The following grammar file, @file{rr.y}, will be used in the sequel: + +@example +%% +@group +exp: a ";" | b "."; +a: "0"; +b: "0"; +@end group +@end example + +The graphical output is very similar to the textual one, and as such it is +easier understood by making direct comparisons between them. See +@ref{Debugging, , Debugging Your Parser} for a detailled analysis of the +textual report. + +@subheading Graphical Representation of States + +The items (pointed rules) for each state are grouped together in graph nodes. +Their numbering is the same as in the verbose file. See the following points, +about transitions, for examples + +When invoked with @option{--report=lookaheads}, the lookahead tokens, when +needed, are shown next to the relevant rule between square brackets as a +comma separated list. This is the case in the figure for the representation of +reductions, below. + +@sp 1 + +The transitions are represented as directed edges between the current and +the target states. + +@subheading Graphical Representation of Shifts + +Shifts are shown as solid arrows, labelled with the lookahead token for that +shift. The following describes a reduction in the @file{rr.output} file: + +@example +@group +state 3 + + 1 exp: a . ";" + + ";" shift, and go to state 6 +@end group +@end example + +A Graphviz rendering of this portion of the graph could be: + +@center @image{figs/example-shift, 100pt} + +@subheading Graphical Representation of Reductions + +Reductions are shown as solid arrows, leading to a diamond-shaped node +bearing the number of the reduction rule. The arrow is labelled with the +appropriate comma separated lookahead tokens. If the reduction is the default +action for the given state, there is no such label. + +This is how reductions are represented in the verbose file @file{rr.output}: +@example +state 1 + + 3 a: "0" . [";"] + 4 b: "0" . ["."] + + "." reduce using rule 4 (b) + $default reduce using rule 3 (a) +@end example + +A Graphviz rendering of this portion of the graph could be: + +@center @image{figs/example-reduce, 120pt} + +When unresolved conflicts are present, because in deterministic parsing +a single decision can be made, Bison can arbitrarily choose to disable a +reduction, see @ref{Shift/Reduce, , Shift/Reduce Conflicts}. Discarded actions +are distinguished by a red filling color on these nodes, just like how they are +reported between square brackets in the verbose file. + +The reduction corresponding to the rule number 0 is the acceptation state. It +is shown as a blue diamond, labelled "Acc". + +@subheading Graphical representation of go tos + +The @samp{go to} jump transitions are represented as dotted lines bearing +the name of the rule being jumped to. + +Note that a DOT file may also be produced via an XML file and XSLT +processing (@pxref{Xml}). + +@c ================================================= XML + +@node Xml +@section Visualizing your parser in multiple formats +@cindex xml + +Bison supports two major report formats: textual output +(@pxref{Understanding}) when invoked with option @option{--verbose}, and DOT +(@pxref{Graphviz}) when invoked with option @option{--graph}. However, +another alternative is to output an XML file that may then be, with +@command{xsltproc}, rendered as either a raw text format equivalent to the +verbose file, or as an HTML version of the same file, with clickable +transitions, or even as a DOT. The @file{.output} and DOT files obtained via +XSLT have no difference whatsoever with those obtained by invoking +@command{bison} with options @option{--verbose} or @option{--graph}. + +The textual file is generated when the options @option{-x} or +@option{--xml[=FILE]} are specified, see @ref{Invocation,,Invoking Bison}. +If not specified, its name is made by removing @samp{.tab.c} or @samp{.c} +from the parser implementation file name, and adding @samp{.xml} instead. +For instance, if the grammar file is @file{foo.y}, the default XML output +file is @file{foo.xml}. + +Bison ships with a @file{data/xslt} directory, containing XSL Transformation +files to apply to the XML file. Their names are non-ambiguous: + +@table @file +@item xml2dot.xsl +Used to output a copy of the DOT visualization of the automaton. +@item xml2text.xsl +Used to output a copy of the .output file. +@item xml2xhtml.xsl +Used to output an xhtml enhancement of the .output file. +@end table + +Sample usage (requires @code{xsltproc}): +@example +$ bison -x input.y +@group +$ bison --print-datadir +/usr/local/share/bison +@end group +$ xsltproc /usr/local/share/bison/xslt/xml2xhtml.xsl input.xml > input.html +@end example + +@c ================================================= Tracing @node Tracing @section Tracing Your Parser @@ -9248,8 +9494,9 @@ generated in the following files: @table @file @item position.hh @itemx location.hh -The definition of the classes @code{position} and @code{location}, -used for location tracking. @xref{C++ Location Values}. +The definition of the classes @code{position} and @code{location}, used for +location tracking. These files are not generated if the @code{%define} +variable @code{api.location.type} is defined. @xref{C++ Location Values}. @item stack.hh An auxiliary class @code{stack} used by the parser. @@ -9305,18 +9552,22 @@ Symbols}. @c - %define filename_type "const symbol::Symbol" When the directive @code{%locations} is used, the C++ parser supports -location tracking, see @ref{Tracking Locations}. Two auxiliary classes -define a @code{position}, a single point in a file, and a @code{location}, a -range composed of a pair of @code{position}s (possibly spanning several -files). +location tracking, see @ref{Tracking Locations}. + +By default, two auxiliary classes define a @code{position}, a single point +in a file, and a @code{location}, a range composed of a pair of +@code{position}s (possibly spanning several files). But if the +@code{%define} variable @code{api.location.type} is defined, then these +classes will not be generated, and the user defined type will be used. @tindex uint In this section @code{uint} is an abbreviation for @code{unsigned int}: in genuine code only the latter is used. @menu -* C++ position:: One point in the source file -* C++ location:: Two points in the source file +* C++ position:: One point in the source file +* C++ location:: Two points in the source file +* User Defined Location Type:: Required interface for locations @end menu @node C++ position @@ -9420,6 +9671,63 @@ Report @var{p} on @var{o}, taking care of special cases such as: no @code{filename} defined, or equal filename/line or column. @end deftypefun +@node User Defined Location Type +@subsubsection User Defined Location Type +@findex %define api.location.type + +Instead of using the built-in types you may use the @code{%define} variable +@code{api.location.type} to specify your own type: + +@example +%define api.location.type @var{LocationType} +@end example + +The requirements over your @var{LocationType} are: +@itemize +@item +it must be copyable; + +@item +in order to compute the (default) value of @code{@@$} in a reduction, the +parser basically runs +@example +@@$.begin = @@$1.begin; +@@$.end = @@$@var{N}.end; // The location of last right-hand side symbol. +@end example +@noindent +so there must be copyable @code{begin} and @code{end} members; + +@item +alternatively you may redefine the computation of the default location, in +which case these members are not required (@pxref{Location Default Action}); + +@item +if traces are enabled, then there must exist an @samp{std::ostream& + operator<< (std::ostream& o, const @var{LocationType}& s)} function. +@end itemize + +@sp 1 + +In programs with several C++ parsers, you may also use the @code{%define} +variable @code{api.location.type} to share a common set of built-in +definitions for @code{position} and @code{location}. For instance, one +parser @file{master/parser.yy} might use: + +@example +%defines +%locations +%define namespace "master::" +@end example + +@noindent +to generate the @file{master/position.hh} and @file{master/location.hh} +files, reused by other parsers as follows: + +@example +%define api.location.type "master::location" +%code requires @{ #include @} +@end example + @node C++ Parser Interface @subsection C++ Parser Interface @c - define parser_class_name @@ -9457,6 +9765,11 @@ Build a new parser object. There are no arguments by default, unless @deftypemethod {parser} {int} parse () Run the syntactic analysis, and return 0 on success, 1 otherwise. + +@cindex exceptions +The whole function is wrapped in a @code{try}/@code{catch} block, so that +when an exception is thrown, the @code{%destructor}s are called to release +the lookahead symbol, and the symbols pushed on the stack. @end deftypemethod @deftypemethod {parser} {std::ostream&} debug_stream () @@ -9486,7 +9799,7 @@ described by @var{m}. The parser invokes the scanner by calling @code{yylex}. Contrary to C parsers, C++ parsers are always pure: there is no point in using the -@code{%define api.pure} directive. Therefore the interface is as follows. +@code{%define api.pure full} directive. Therefore the interface is as follows. @deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...) Return the next token. Its type is the return value, its semantic @@ -9570,8 +9883,8 @@ factor both as follows. // Tell Flex the lexer's prototype ... # define YY_DECL \ yy::calcxx_parser::token_type \ - yylex (yy::calcxx_parser::semantic_type *yylval, \ - yy::calcxx_parser::location_type *yylloc, \ + yylex (yy::calcxx_parser::semantic_type* yylval, \ + yy::calcxx_parser::location_type* yylloc, \ calcxx_driver& driver) // ... and declare it for the parser's sake. YY_DECL; @@ -10049,7 +10362,7 @@ You can create documentation for generated parsers using Javadoc. Contrary to C parsers, Java parsers do not use global variables; the state of the parser is always local to an instance of the parser class. Therefore, all Java parsers are ``pure'', and the @code{%pure-parser} -and @code{%define api.pure} directives does not do anything when used in +and @code{%define api.pure full} directives does not do anything when used in Java. Push parsers are currently unsupported in Java and @code{%define @@ -10128,11 +10441,11 @@ class defines a @dfn{position}, a single point in a file; Bison itself defines a class representing a @dfn{location}, a range composed of a pair of positions (possibly spanning several files). The location class is an inner class of the parser; the name is @code{Location} by default, and may also be -renamed using @code{%define location_type "@var{class-name}"}. +renamed using @code{%define api.location.type "@var{class-name}"}. The location class treats the position as a completely opaque value. By default, the class name is @code{Position}, but this can be changed -with @code{%define position_type "@var{class-name}"}. This class must +with @code{%define api.position.type "@var{class-name}"}. This class must be supplied by the user. @@ -10267,7 +10580,7 @@ In both cases, the scanner has to implement the following methods. @deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg}) This method is defined by the user to emit an error message. The first parameter is omitted if location tracking is not active. Its type can be -changed using @code{%define location_type "@var{class-name}".} +changed using @code{%define api.location.type "@var{class-name}".} @end deftypemethod @deftypemethod {Lexer} {int} yylex () @@ -10285,7 +10598,7 @@ Return respectively the first position of the last token that @code{yylex} returned, and the first position beyond it. These methods are not needed unless location tracking is active. -The return type can be changed using @code{%define position_type +The return type can be changed using @code{%define api.position.type "@var{class-name}".} @end deftypemethod @@ -10530,10 +10843,11 @@ comma-separated list. Default is @code{java.io.IOException}. @xref{Java Scanner Interface}. @end deffn -@deffn {Directive} {%define location_type} "@var{class}" +@deffn {Directive} {%define api.location.type} "@var{class}" The name of the class used for locations (a range between two positions). This class is generated as an inner class of the parser class by @command{bison}. Default is @code{Location}. +Formerly named @code{location_type}. @xref{Java Location Values}. @end deffn @@ -10548,9 +10862,10 @@ The name of the parser class. Default is @code{YYParser} or @xref{Java Bison Interface}. @end deffn -@deffn {Directive} {%define position_type} "@var{class}" +@deffn {Directive} {%define api.position.type} "@var{class}" The name of the class used for positions. This class must be supplied by the user. Default is @code{Position}. +Formerly named @code{position_type}. @xref{Java Location Values}. @end deffn @@ -10630,7 +10945,7 @@ or @quotation My parser includes support for an @samp{#include}-like feature, in which case I run @code{yyparse} from @code{yyparse}. This fails -although I did specify @samp{%define api.pure}. +although I did specify @samp{%define api.pure full}. @end quotation These problems typically come not from Bison itself, but from @@ -11834,7 +12149,11 @@ London, Department of Computer Science, TR-00-12 (December 2000). @c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's @c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy ints @c LocalWords: Scannerless ispell american ChangeLog smallexample CSTYPE CLTYPE -@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate +@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate LocationType +@c LocalWords: parsers parser's +@c LocalWords: associativity subclasses precedences unresolvable runnable +@c LocalWords: allocators subunit initializations unreferenced untyped +@c LocalWords: errorVerbose subtype subtypes @c Local Variables: @c ispell-dictionary: "american"