doc/bison.info-4

   1 This is bison.info, produced by makeinfo version 4.0 from bison.texinfo.
   2
   3 START-INFO-DIR-ENTRY
   4 * bison: (bison).       GNU Project parser generator (yacc replacement).
   5 END-INFO-DIR-ENTRY
   6
   7    This file documents the Bison parser generator.
   8
   9    Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
  10 2000 Free Software Foundation, Inc.
  11
  12    Permission is granted to make and distribute verbatim copies of this
  13 manual provided the copyright notice and this permission notice are
  14 preserved on all copies.
  15
  16    Permission is granted to copy and distribute modified versions of
  17 this manual under the conditions for verbatim copying, provided also
  18 that the sections entitled "GNU General Public License" and "Conditions
  19 for Using Bison" are included exactly as in the original, and provided
  20 that the entire resulting derived work is distributed under the terms
  21 of a permission notice identical to this one.
  22
  23    Permission is granted to copy and distribute translations of this
  24 manual into another language, under the above conditions for modified
  25 versions, except that the sections entitled "GNU General Public
  26 License", "Conditions for Using Bison" and this permission notice may be
  27 included in translations approved by the Free Software Foundation
  28 instead of in the original English.
  29
  30 \1f
  31 File: bison.info,  Node: Using Precedence,  Next: Precedence Examples,  Prev: Why Precedence,  Up: Precedence
  32
  33 Specifying Operator Precedence
  34 ------------------------------
  35
  36    Bison allows you to specify these choices with the operator
  37 precedence declarations `%left' and `%right'.  Each such declaration
  38 contains a list of tokens, which are operators whose precedence and
   39 associativity are being declared.  The `%left' declaration makes all
  40 those operators left-associative and the `%right' declaration makes
  41 them right-associative.  A third alternative is `%nonassoc', which
  42 declares that it is a syntax error to find the same operator twice "in a
  43 row".
  44
  45    The relative precedence of different operators is controlled by the
   46 order in which they are declared.  The first `%left', `%right' or
   47 `%nonassoc' declaration in the file declares the operators whose
   48 precedence is lowest, the next such declaration declares the operators
   49 whose precedence is a little higher, and so on.
  50
  51 \1f
  52 File: bison.info,  Node: Precedence Examples,  Next: How Precedence,  Prev: Using Precedence,  Up: Precedence
  53
  54 Precedence Examples
  55 -------------------
  56
  57    In our example, we would want the following declarations:
  58
  59      %left '<'
  60      %left '-'
  61      %left '*'
  62
  63    In a more complete example, which supports other operators as well,
  64 we would declare them in groups of equal precedence.  For example,
  65 `'+'' is declared with `'-'':
  66
  67      %left '<' '>' '=' NE LE GE
  68      %left '+' '-'
  69      %left '*' '/'
  70
  71 (Here `NE' and so on stand for the operators for "not equal" and so on.
  72 We assume that these tokens are more than one character long and
  73 therefore are represented by names, not character literals.)
  74
  75 \1f
  76 File: bison.info,  Node: How Precedence,  Prev: Precedence Examples,  Up: Precedence
  77
  78 How Precedence Works
  79 --------------------
  80
  81    The first effect of the precedence declarations is to assign
  82 precedence levels to the terminal symbols declared.  The second effect
  83 is to assign precedence levels to certain rules: each rule gets its
  84 precedence from the last terminal symbol mentioned in the components.
  85 (You can also specify explicitly the precedence of a rule.  *Note
  86 Context-Dependent Precedence: Contextual Precedence.)
  87
  88    Finally, the resolution of conflicts works by comparing the
  89 precedence of the rule being considered with that of the look-ahead
  90 token.  If the token's precedence is higher, the choice is to shift.
  91 If the rule's precedence is higher, the choice is to reduce.  If they
  92 have equal precedence, the choice is made based on the associativity of
  93 that precedence level.  The verbose output file made by `-v' (*note
  94 Invoking Bison: Invocation.) says how each conflict was resolved.
  95
  96    Not all rules and not all tokens have precedence.  If either the
  97 rule or the look-ahead token has no precedence, then the default is to
  98 shift.
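
   The decision procedure described in this node can be sketched in C.
This is only an illustration under stated assumptions, not Bison's own
code: the names `resolve_conflict', `struct prec' and the `ACT_' and
`ASSOC_' constants are hypothetical, a level of 0 stands for "no
precedence", and larger levels stand for later declarations.

```c
#include <assert.h>

/* Hypothetical names for this sketch; Bison's internal tables differ.  */
enum assoc { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Level 0 means "no precedence declared"; higher levels bind tighter,
   mirroring declarations that appear later in the grammar file.  */
struct prec
{
  int level;
  enum assoc assoc;
};

/* Decide a shift/reduce conflict between a rule and the look-ahead
   token, following the comparison described in the text.  */
enum action
resolve_conflict (struct prec rule, struct prec token)
{
  assert (rule.level >= 0 && token.level >= 0);

  if (rule.level == 0 || token.level == 0)
    return ACT_SHIFT;           /* one side has no precedence: shift */
  if (token.level > rule.level)
    return ACT_SHIFT;
  if (rule.level > token.level)
    return ACT_REDUCE;

  /* Equal levels: the associativity of that level decides.  */
  switch (token.assoc)
    {
    case ASSOC_LEFT:
      return ACT_REDUCE;
    case ASSOC_RIGHT:
      return ACT_SHIFT;
    case ASSOC_NONASSOC:
      return ACT_ERROR;
    default:
      return ACT_SHIFT;
    }
}
```

   For instance, with `'-'' at level 2 and `'*'' at level 3, both
left-associative, a rule ending in `'-'' that faces the look-ahead token
`'*'' yields a shift, so `1 - 2 * 3' groups as `1 - (2 * 3)'.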
  99
 100 \1f
 101 File: bison.info,  Node: Contextual Precedence,  Next: Parser States,  Prev: Precedence,  Up: Algorithm
 102
 103 Context-Dependent Precedence
 104 ============================
 105
 106    Often the precedence of an operator depends on the context.  This
 107 sounds outlandish at first, but it is really very common.  For example,
 108 a minus sign typically has a very high precedence as a unary operator,
 109 and a somewhat lower precedence (lower than multiplication) as a binary
 110 operator.
 111
 112    The Bison precedence declarations, `%left', `%right' and
 113 `%nonassoc', can only be used once for a given token; so a token has
 114 only one precedence declared in this way.  For context-dependent
 115 precedence, you need to use an additional mechanism: the `%prec'
 116 modifier for rules.
 117
 118    The `%prec' modifier declares the precedence of a particular rule by
 119 specifying a terminal symbol whose precedence should be used for that
 120 rule.  It's not necessary for that symbol to appear otherwise in the
 121 rule.  The modifier's syntax is:
 122
 123      %prec TERMINAL-SYMBOL
 124
 125 and it is written after the components of the rule.  Its effect is to
 126 assign the rule the precedence of TERMINAL-SYMBOL, overriding the
 127 precedence that would be deduced for it in the ordinary way.  The
 128 altered rule precedence then affects how conflicts involving that rule
 129 are resolved (*note Operator Precedence: Precedence.).
 130
 131    Here is how `%prec' solves the problem of unary minus.  First,
 132 declare a precedence for a fictitious terminal symbol named `UMINUS'.
 133 There are no tokens of this type, but the symbol serves to stand for its
 134 precedence:
 135
 136      ...
 137      %left '+' '-'
 138      %left '*'
 139      %left UMINUS
 140
 141    Now the precedence of `UMINUS' can be used in specific rules:
 142
 143      exp:    ...
 144              | exp '-' exp
 145              ...
 146              | '-' exp %prec UMINUS
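
   The effect of giving the unary-minus rule the precedence of `UMINUS'
can be seen with a small hand-written parser.  The sketch below is not
Bison output and does not use Bison's parser tables; it is a
precedence-climbing parser in C, written only for illustration, that
turns an expression into postfix notation so that the grouping becomes
visible.  The names `to_postfix', `PREC_ADD', `PREC_MUL' and
`PREC_UMINUS' are hypothetical, operands are single letters or digits,
and `~' in the output stands for unary minus.

```c
#include <assert.h>
#include <ctype.h>

/* Levels in the declaration order shown above (hypothetical constants):
   '+' and '-' lowest, then '*', then UMINUS.  */
enum { PREC_ADD = 1, PREC_MUL = 2, PREC_UMINUS = 3 };

static const char *cursor;      /* next unread input character */
static char *out;               /* next free slot in the output buffer */

static int
binary_prec (char op)
{
  switch (op)
    {
    case '+': case '-': return PREC_ADD;
    case '*':           return PREC_MUL;
    default:            return 0;       /* not a binary operator */
    }
}

static void parse_exp (int min_prec, int uminus_prec);

/* Parse one operand: a single letter or digit, or a unary minus
   applied to whatever binds tighter than the unary minus itself.  */
static void
parse_operand (int uminus_prec)
{
  if (*cursor == '-')
    {
      cursor++;
      parse_exp (uminus_prec + 1, uminus_prec);
      *out++ = '~';
    }
  else if (isalnum ((unsigned char) *cursor))
    *out++ = *cursor++;
}

/* Parse binary operators of level MIN_PREC or higher; all of them are
   treated as left-associative, as in the `%left' declarations.  */
static void
parse_exp (int min_prec, int uminus_prec)
{
  int prec;

  parse_operand (uminus_prec);
  while ((prec = binary_prec (*cursor)) != 0 && prec >= min_prec)
    {
      char op = *cursor++;
      parse_exp (prec + 1, uminus_prec);
      *out++ = op;
    }
}

/* Store the postfix form of TEXT in BUFFER, which must have room for
   at least as many characters as TEXT, plus the terminating null.  */
void
to_postfix (const char *text, int uminus_prec, char *buffer)
{
  assert (text != 0 && buffer != 0);
  cursor = text;
  out = buffer;
  parse_exp (1, uminus_prec);
  *out = '\0';
}
```

   Under these assumptions, `to_postfix ("-a*b", PREC_UMINUS, buf)'
stores `a~b*': the negation is applied before the multiplication.
Passing `PREC_ADD' instead, which models the rule without `%prec',
stores `ab*~'.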
 147
 148 \1f
 149 File: bison.info,  Node: Parser States,  Next: Reduce/Reduce,  Prev: Contextual Precedence,  Up: Algorithm
 150
 151 Parser States
 152 =============
 153
 154    The function `yyparse' is implemented using a finite-state machine.
 155 The values pushed on the parser stack are not simply token type codes;
 156 they represent the entire sequence of terminal and nonterminal symbols
 157 at or near the top of the stack.  The current state collects all the
 158 information about previous input which is relevant to deciding what to
 159 do next.
 160
 161    Each time a look-ahead token is read, the current parser state
 162 together with the type of look-ahead token are looked up in a table.
 163 This table entry can say, "Shift the look-ahead token."  In this case,
 164 it also specifies the new parser state, which is pushed onto the top of
 165 the parser stack.  Or it can say, "Reduce using rule number N."  This
 166 means that a certain number of tokens or groupings are taken off the
 167 top of the stack, and replaced by one grouping.  In other words, that
 168 number of states are popped from the stack, and one new state is pushed.
 169
 170    There is one other alternative: the table can say that the
 171 look-ahead token is erroneous in the current state.  This causes error
 172 processing to begin (*note Error Recovery::).
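
   The table lookup described above can be sketched as a small C
program.  This is a minimal illustration, not the code Bison generates:
the tables were written by hand for the toy grammar
`s: '(' s ')' | 'x';', and the names `parse', `action', `goto_s' and
`rule_length' are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Token codes; they index the columns of the action table.  */
enum { T_LPAREN, T_RPAREN, T_X, T_END, NTOKENS };

/* Entry encoding: 0 = error, positive N = shift and push state N,
   negative N = reduce using rule -N, ACCEPT = input recognized.  */
enum { ACCEPT = 100, STACK_DEPTH = 64 };

/* Rule 1 is  s: '(' s ')'  and rule 2 is  s: 'x'.  */
static const int action[6][NTOKENS] = {
  /* state     (    )    x   end */
  /* 0 */   {  2,   0,   3,   0 },
  /* 1 */   {  0,   0,   0,   ACCEPT },
  /* 2 */   {  2,   0,   3,   0 },
  /* 3 */   {  0,  -2,   0,  -2 },
  /* 4 */   {  0,   5,   0,   0 },
  /* 5 */   {  0,  -1,   0,  -1 },
};

/* State to push after reducing to `s', indexed by the exposed state.  */
static const int goto_s[6] = { 1, 0, 4, 0, 0, 0 };

/* Number of symbols on the right-hand side of each rule.  */
static const int rule_length[3] = { 0, 3, 1 };

static int
token_code (char c)
{
  switch (c)
    {
    case '(':  return T_LPAREN;
    case ')':  return T_RPAREN;
    case 'x':  return T_X;
    case '\0': return T_END;
    default:   return -1;
    }
}

/* Return 0 if TEXT is a sentence of the grammar, 1 otherwise.  */
int
parse (const char *text)
{
  int stack[STACK_DEPTH];
  int top = 0;

  assert (text != NULL);
  stack[0] = 0;
  for (;;)
    {
      int tok = token_code (*text);
      int act = tok < 0 ? 0 : action[stack[top]][tok];

      if (act == ACCEPT)
        return 0;
      if (act > 0)
        {
          if (top + 1 == STACK_DEPTH)
            return 1;           /* stack overflow */
          stack[++top] = act;   /* shift: push the new state */
          text++;
        }
      else if (act < 0)
        {
          int next;
          top -= rule_length[-act];     /* pop one state per symbol */
          next = goto_s[stack[top]];
          stack[++top] = next;          /* push the state for `s' */
        }
      else
        return 1;               /* erroneous look-ahead token */
    }
}
```

   With these tables, `parse ("((x))")' returns 0, while `parse ("(x")'
returns 1 because the end of input is erroneous in the state reached
after `( s'.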
 173
 174 \1f
 175 File: bison.info,  Node: Reduce/Reduce,  Next: Mystery Conflicts,  Prev: Parser States,  Up: Algorithm
 176
 177 Reduce/Reduce Conflicts
 178 =======================
 179
 180    A reduce/reduce conflict occurs if there are two or more rules that
 181 apply to the same sequence of input.  This usually indicates a serious
 182 error in the grammar.
 183
 184    For example, here is an erroneous attempt to define a sequence of
 185 zero or more `word' groupings.
 186
 187      sequence: /* empty */
 188                      { printf ("empty sequence\n"); }
 189              | maybeword
 190              | sequence word
 191                      { printf ("added word %s\n", $2); }
 192              ;
 193
 194      maybeword: /* empty */
 195                      { printf ("empty maybeword\n"); }
 196              | word
 197                      { printf ("single word %s\n", $1); }
 198              ;
 199
 200 The error is an ambiguity: there is more than one way to parse a single
 201 `word' into a `sequence'.  It could be reduced to a `maybeword' and
 202 then into a `sequence' via the second rule.  Alternatively,
 203 nothing-at-all could be reduced into a `sequence' via the first rule,
 204 and this could be combined with the `word' using the third rule for
 205 `sequence'.
 206
 207    There is also more than one way to reduce nothing-at-all into a
 208 `sequence'.  This can be done directly via the first rule, or
 209 indirectly via `maybeword' and then the second rule.
 210
 211    You might think that this is a distinction without a difference,
 212 because it does not change whether any particular input is valid or
 213 not.  But it does affect which actions are run.  One parsing order runs
 214 the second rule's action; the other runs the first rule's action and
 215 the third rule's action.  In this example, the output of the program
 216 changes.
 217
 218    Bison resolves a reduce/reduce conflict by choosing to use the rule
 219 that appears first in the grammar, but it is very risky to rely on
 220 this.  Every reduce/reduce conflict must be studied and usually
 221 eliminated.  Here is the proper way to define `sequence':
 222
 223      sequence: /* empty */
 224                      { printf ("empty sequence\n"); }
 225              | sequence word
 226                      { printf ("added word %s\n", $2); }
 227              ;
 228
 229    Here is another common error that yields a reduce/reduce conflict:
 230
 231      sequence: /* empty */
 232              | sequence words
 233              | sequence redirects
 234              ;
 235
 236      words:    /* empty */
 237              | words word
 238              ;
 239
 240      redirects:/* empty */
 241              | redirects redirect
 242              ;
 243
 244 The intention here is to define a sequence which can contain either
 245 `word' or `redirect' groupings.  The individual definitions of
 246 `sequence', `words' and `redirects' are error-free, but the three
 247 together make a subtle ambiguity: even an empty input can be parsed in
 248 infinitely many ways!
 249
 250    Consider: nothing-at-all could be a `words'.  Or it could be two
 251 `words' in a row, or three, or any number.  It could equally well be a
 252 `redirects', or two, or any number.  Or it could be a `words' followed
 253 by three `redirects' and another `words'.  And so on.
 254
 255    Here are two ways to correct these rules.  First, to make it a
 256 single level of sequence:
 257
 258      sequence: /* empty */
 259              | sequence word
 260              | sequence redirect
 261              ;
 262
 263    Second, to prevent either a `words' or a `redirects' from being
 264 empty:
 265
 266      sequence: /* empty */
 267              | sequence words
 268              | sequence redirects
 269              ;
 270
 271      words:    word
 272              | words word
 273              ;
 274
 275      redirects:redirect
 276              | redirects redirect
 277              ;
 278
 279 \1f
 280 File: bison.info,  Node: Mystery Conflicts,  Next: Stack Overflow,  Prev: Reduce/Reduce,  Up: Algorithm
 281
 282 Mysterious Reduce/Reduce Conflicts
 283 ==================================
 284
 285    Sometimes reduce/reduce conflicts can occur that don't look
 286 warranted.  Here is an example:
 287
 288      %token ID
 289
 290      %%
 291      def:    param_spec return_spec ','
 292              ;
 293      param_spec:
 294                   type
 295              |    name_list ':' type
 296              ;
 297      return_spec:
 298                   type
 299              |    name ':' type
 300              ;
 301      type:        ID
 302              ;
 303      name:        ID
 304              ;
 305      name_list:
 306                   name
 307              |    name ',' name_list
 308              ;
 309
 310    It would seem that this grammar can be parsed with only a single
 311 token of look-ahead: when a `param_spec' is being read, an `ID' is a
 312 `name' if a comma or colon follows, or a `type' if another `ID'
 313 follows.  In other words, this grammar is LR(1).
 314
 315    However, Bison, like most parser generators, cannot actually handle
 316 all LR(1) grammars.  In this grammar, two contexts, that after an `ID'
 317 at the beginning of a `param_spec' and likewise at the beginning of a
 318 `return_spec', are similar enough that Bison assumes they are the same.
 319 They appear similar because the same set of rules would be active--the
 320 rule for reducing to a `name' and that for reducing to a `type'.  Bison
 321 is unable to determine at that stage of processing that the rules would
 322 require different look-ahead tokens in the two contexts, so it makes a
 323 single parser state for them both.  Combining the two contexts causes a
 324 conflict later.  In parser terminology, this occurrence means that the
 325 grammar is not LALR(1).
 326
 327    In general, it is better to fix deficiencies than to document them.
 328 But this particular deficiency is intrinsically hard to fix; parser
 329 generators that can handle LR(1) grammars are hard to write and tend to
 330 produce parsers that are very large.  In practice, Bison is more useful
 331 as it is now.
 332
 333    When the problem arises, you can often fix it by identifying the two
 334 parser states that are being confused, and adding something to make them
 335 look distinct.  In the above example, adding one rule to `return_spec'
 336 as follows makes the problem go away:
 337
 338      %token BOGUS
 339      ...
 340      %%
 341      ...
 342      return_spec:
 343                   type
 344              |    name ':' type
 345              /* This rule is never used.  */
 346              |    ID BOGUS
 347              ;
 348
 349    This corrects the problem because it introduces the possibility of an
 350 additional active rule in the context after the `ID' at the beginning of
 351 `return_spec'.  This rule is not active in the corresponding context in
 352 a `param_spec', so the two contexts receive distinct parser states.  As
 353 long as the token `BOGUS' is never generated by `yylex', the added rule
 354 cannot alter the way actual input is parsed.
 355
 356    In this particular example, there is another way to solve the
 357 problem: rewrite the rule for `return_spec' to use `ID' directly
 358 instead of via `name'.  This also causes the two confusing contexts to
 359 have different sets of active rules, because the one for `return_spec'
 360 activates the altered rule for `return_spec' rather than the one for
 361 `name'.
 362
 363      param_spec:
 364                   type
 365              |    name_list ':' type
 366              ;
 367      return_spec:
 368                   type
 369              |    ID ':' type
 370              ;
 371
 372 \1f
 373 File: bison.info,  Node: Stack Overflow,  Prev: Mystery Conflicts,  Up: Algorithm
 374
 375 Stack Overflow, and How to Avoid It
 376 ===================================
 377
 378    The Bison parser stack can overflow if too many tokens are shifted
 379 and not reduced.  When this happens, the parser function `yyparse'
 380 returns a nonzero value, pausing only to call `yyerror' to report the
 381 overflow.
 382
 383    By defining the macro `YYMAXDEPTH', you can control how deep the
 384 parser stack can become before a stack overflow occurs.  Define the
 385 macro with a value that is an integer.  This value is the maximum number
 386 of tokens that can be shifted (and not reduced) before overflow.  It
 387 must be a constant expression whose value is known at compile time.
 388
 389    The stack space allowed is not necessarily allocated.  If you
 390 specify a large value for `YYMAXDEPTH', the parser actually allocates a
 391 small stack at first, and then makes it bigger by stages as needed.
 392 This increasing allocation happens automatically and silently.
 393 Therefore, you do not need to make `YYMAXDEPTH' painfully small merely
 394 to save space for ordinary inputs that do not need much stack.
 395
 396    The default value of `YYMAXDEPTH', if you do not define it, is 10000.
 397
 398    You can control how much stack is allocated initially by defining the
 399 macro `YYINITDEPTH'.  This value too must be a compile-time constant
 400 integer.  The default is 200.
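
   The staged allocation can be pictured with the following C sketch.
It is not the code of the Bison parser: `SKETCH_INITDEPTH' and
`SKETCH_MAXDEPTH' are hypothetical stand-ins for `YYINITDEPTH' and
`YYMAXDEPTH' with deliberately tiny values, and growing by doubling is
an assumption made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for YYINITDEPTH and YYMAXDEPTH.  */
#define SKETCH_INITDEPTH 4
#define SKETCH_MAXDEPTH 16

struct state_stack
{
  int *base;            /* allocated storage, NULL before first push */
  size_t depth;         /* number of states currently on the stack */
  size_t capacity;      /* number of states that fit in BASE */
};

/* Push STATE, enlarging the storage by stages when it is full.
   Return 0 on success, 1 on overflow or memory exhaustion.  */
int
stack_push (struct state_stack *s, int state)
{
  assert (s != NULL);

  if (s->depth == s->capacity)
    {
      size_t new_capacity;
      int *new_base;

      if (s->capacity >= SKETCH_MAXDEPTH)
        return 1;               /* the limit is reached: overflow */

      /* Assumption of this sketch: start small, then double.  */
      new_capacity = s->capacity ? s->capacity * 2 : SKETCH_INITDEPTH;
      if (new_capacity > SKETCH_MAXDEPTH)
        new_capacity = SKETCH_MAXDEPTH;

      new_base = realloc (s->base, new_capacity * sizeof *new_base);
      if (new_base == NULL)
        return 1;
      s->base = new_base;
      s->capacity = new_capacity;
    }

  s->base[s->depth++] = state;
  return 0;
}
```

   Starting from a zero-initialized `struct state_stack', the first push
allocates room for 4 states, the fifth push grows the storage to 8, and
the seventeenth push fails because the limit of 16 has been reached.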
 401
 402 \1f
 403 File: bison.info,  Node: Error Recovery,  Next: Context Dependency,  Prev: Algorithm,  Up: Top
 404
 405 Error Recovery
 406 **************
 407
 408    It is not usually acceptable to have a program terminate on a parse
 409 error.  For example, a compiler should recover sufficiently to parse the
 410 rest of the input file and check it for errors; a calculator should
 411 accept another expression.
 412
 413    In a simple interactive command parser where each input is one line,
 414 it may be sufficient to allow `yyparse' to return 1 on error and have
 415 the caller ignore the rest of the input line when that happens (and
 416 then call `yyparse' again).  But this is inadequate for a compiler,
 417 because it forgets all the syntactic context leading up to the error.
 418 A syntax error deep within a function in the compiler input should not
 419 cause the compiler to treat the following line like the beginning of a
 420 source file.
 421
 422    You can define how to recover from a syntax error by writing rules to
 423 recognize the special token `error'.  This is a terminal symbol that is
 424 always defined (you need not declare it) and reserved for error
 425 handling.  The Bison parser generates an `error' token whenever a
 426 syntax error happens; if you have provided a rule to recognize this
 427 token in the current context, the parse can continue.
 428
 429    For example:
 430
 431      stmnts:  /* empty string */
 432              | stmnts '\n'
 433              | stmnts exp '\n'
 434              | stmnts error '\n'
 435
 436    The fourth rule in this example says that an error followed by a
 437 newline makes a valid addition to any `stmnts'.
 438
 439    What happens if a syntax error occurs in the middle of an `exp'?  The
 440 error recovery rule, interpreted strictly, applies to the precise
 441 sequence of a `stmnts', an `error' and a newline.  If an error occurs in
 442 the middle of an `exp', there will probably be some additional tokens
 443 and subexpressions on the stack after the last `stmnts', and there will
 444 be tokens to read before the next newline.  So the rule is not
 445 applicable in the ordinary way.
 446
 447    But Bison can force the situation to fit the rule, by discarding
 448 part of the semantic context and part of the input.  First it discards
 449 states and objects from the stack until it gets back to a state in
 450 which the `error' token is acceptable.  (This means that the
 451 subexpressions already parsed are discarded, back to the last complete
 452 `stmnts'.)  At this point the `error' token can be shifted.  Then, if
 453 the old look-ahead token is not acceptable to be shifted next, the
 454 parser reads tokens and discards them until it finds a token which is
 455 acceptable.  In this example, Bison reads and discards input until the
 456 next newline so that the fourth rule can apply.
 457
 458    The choice of error rules in the grammar is a choice of strategies
 459 for error recovery.  A simple and useful strategy is simply to skip the
 460 rest of the current input line or current statement if an error is
 461 detected:
 462
 463      stmnt: error ';'  /* on error, skip until ';' is read */
 464
 465    It is also useful to recover to the matching close-delimiter of an
 466 opening-delimiter that has already been parsed.  Otherwise the
 467 close-delimiter will probably appear to be unmatched, and generate
 468 another, spurious error message:
 469
 470      primary:  '(' expr ')'
 471              | '(' error ')'
 472              ...
 473              ;
 474
 475    Error recovery strategies are necessarily guesses.  When they guess
  476 wrong, one syntax error often leads to another.  In the `stmnt' example
  477 above, the error recovery rule guesses that an error is due to bad
  478 input within one `stmnt'.  Suppose that instead a spurious semicolon is
 479 inserted in the middle of a valid `stmnt'.  After the error recovery
 480 rule recovers from the first error, another syntax error will be found
 481 straightaway, since the text following the spurious semicolon is also
 482 an invalid `stmnt'.
 483
 484    To prevent an outpouring of error messages, the parser will output
 485 no error message for another syntax error that happens shortly after
 486 the first; only after three consecutive input tokens have been
 487 successfully shifted will error messages resume.
 488
 489    Note that rules which accept the `error' token may have actions, just
 490 as any other rules can.
 491
 492    You can make error messages resume immediately by using the macro
 493 `yyerrok' in an action.  If you do this in the error rule's action, no
 494 error messages will be suppressed.  This macro requires no arguments;
 495 `yyerrok;' is a valid C statement.
 496
 497    The previous look-ahead token is reanalyzed immediately after an
 498 error.  If this is unacceptable, then the macro `yyclearin' may be used
 499 to clear this token.  Write the statement `yyclearin;' in the error
 500 rule's action.
 501
 502    For example, suppose that on a parse error, an error handling
 503 routine is called that advances the input stream to some point where
 504 parsing should once again commence.  The next symbol returned by the
 505 lexical scanner is probably correct.  The previous look-ahead token
 506 ought to be discarded with `yyclearin;'.
 507
 508    The macro `YYRECOVERING' stands for an expression that has the value
 509 1 when the parser is recovering from a syntax error, and 0 the rest of
 510 the time.  A value of 1 indicates that error messages are currently
 511 suppressed for new syntax errors.
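
   The suppression of repeated error messages can be modeled with a
simple counter.  The C sketch below is an illustration under
assumptions, not the parser's actual code: `struct recovery' and the
functions `on_syntax_error', `on_shift', `error_ok' and `recovering' are
hypothetical names, and `on_shift' is meant to be called only for input
tokens, not for the `error' token itself.

```c
#include <assert.h>

/* Hypothetical model of the message-suppression counter.  */
struct recovery
{
  int shifts_needed;    /* input tokens to shift before messages resume */
  int reported;         /* number of messages actually reported */
};

/* A syntax error was detected: report it unless messages are
   currently suppressed, then require three clean shifts again.  */
void
on_syntax_error (struct recovery *r)
{
  assert (r != 0);
  if (r->shifts_needed == 0)
    r->reported++;
  r->shifts_needed = 3;
}

/* An input token was shifted successfully.  */
void
on_shift (struct recovery *r)
{
  assert (r != 0);
  if (r->shifts_needed > 0)
    r->shifts_needed--;
}

/* Counterpart of `yyerrok;': let messages resume immediately.  */
void
error_ok (struct recovery *r)
{
  assert (r != 0);
  r->shifts_needed = 0;
}

/* Counterpart of `YYRECOVERING': 1 while messages are suppressed.  */
int
recovering (const struct recovery *r)
{
  assert (r != 0);
  return r->shifts_needed != 0;
}
```

   In this model, a second error that arrives after only one shifted
token is counted but not reported, whereas an error that follows a call
to `error_ok' is reported at once.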
 512
 513 \1f
 514 File: bison.info,  Node: Context Dependency,  Next: Debugging,  Prev: Error Recovery,  Up: Top
 515
 516 Handling Context Dependencies
 517 *****************************
 518
 519    The Bison paradigm is to parse tokens first, then group them into
 520 larger syntactic units.  In many languages, the meaning of a token is
 521 affected by its context.  Although this violates the Bison paradigm,
 522 certain techniques (known as "kludges") may enable you to write Bison
 523 parsers for such languages.
 524
 525 * Menu:
 526
 527 * Semantic Tokens::   Token parsing can depend on the semantic context.
 528 * Lexical Tie-ins::   Token parsing can depend on the syntactic context.
 529 * Tie-in Recovery::   Lexical tie-ins have implications for how
 530                         error recovery rules must be written.
 531
 532    (Actually, "kludge" means any technique that gets its job done but is
 533 neither clean nor robust.)
 534
 535 \1f
 536 File: bison.info,  Node: Semantic Tokens,  Next: Lexical Tie-ins,  Up: Context Dependency
 537
 538 Semantic Info in Token Types
 539 ============================
 540
 541    The C language has a context dependency: the way an identifier is
 542 used depends on what its current meaning is.  For example, consider
 543 this:
 544
 545      foo (x);
 546
 547    This looks like a function call statement, but if `foo' is a typedef
 548 name, then this is actually a declaration of `x'.  How can a Bison
 549 parser for C decide how to parse this input?
 550
 551    The method used in GNU C is to have two different token types,
 552 `IDENTIFIER' and `TYPENAME'.  When `yylex' finds an identifier, it
 553 looks up the current declaration of the identifier in order to decide
 554 which token type to return: `TYPENAME' if the identifier is declared as
 555 a typedef, `IDENTIFIER' otherwise.
 556
 557    The grammar rules can then express the context dependency by the
 558 choice of token type to recognize.  `IDENTIFIER' is accepted as an
 559 expression, but `TYPENAME' is not.  `TYPENAME' can start a declaration,
 560 but `IDENTIFIER' cannot.  In contexts where the meaning of the
 561 identifier is _not_ significant, such as in declarations that can
 562 shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is
 563 accepted--there is one rule for each of the two token types.
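
   The lookup that `yylex' performs can be sketched as follows.  This is
a toy illustration, not the code of GNU C: the token codes, the table
`typedef_names' and the functions `declare_typedef' and
`classify_identifier' are hypothetical, and a real compiler would
consult its full symbol table instead of a flat array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; a real parser takes them from the
   definitions that Bison generates.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* A toy table of names currently declared as typedefs.  */
#define MAX_TYPEDEFS 32
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Record NAME as a typedef name.  NAME must remain valid as long as
   the table is in use.  Return 0 on success, 1 if the table is full.  */
int
declare_typedef (const char *name)
{
  assert (name != 0);
  if (typedef_count == MAX_TYPEDEFS)
    return 1;
  typedef_names[typedef_count++] = name;
  return 0;
}

/* Choose the token type for the identifier NAME, as the lexical
   analyzer would after scanning it.  */
int
classify_identifier (const char *name)
{
  int i;

  assert (name != 0);
  for (i = 0; i < typedef_count; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

   In this sketch, an action for a typedef declaration would call
`declare_typedef ("foo")'; from then on `classify_identifier ("foo")'
returns `TYPENAME', so `foo (x);' reaches the parser as the start of a
declaration.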
 564
 565    This technique is simple to use if the decision of which kinds of
 566 identifiers to allow is made at a place close to where the identifier is
 567 parsed.  But in C this is not always so: C allows a declaration to
 568 redeclare a typedef name provided an explicit type has been specified
 569 earlier:
 570
 571      typedef int foo, bar, lose;
 572      static foo (bar);        /* redeclare `bar' as static variable */
 573      static int foo (lose);   /* redeclare `foo' as function */
 574
 575    Unfortunately, the name being declared is separated from the
 576 declaration construct itself by a complicated syntactic structure--the
 577 "declarator".
 578
 579    As a result, part of the Bison parser for C needs to be duplicated,
 580 with all the nonterminal names changed: once for parsing a declaration
 581 in which a typedef name can be redefined, and once for parsing a
 582 declaration in which that can't be done.  Here is a part of the
 583 duplication, with actions omitted for brevity:
 584
 585      initdcl:
 586                declarator maybeasm '='
 587                init
 588              | declarator maybeasm
 589              ;
 590
 591      notype_initdcl:
 592                notype_declarator maybeasm '='
 593                init
 594              | notype_declarator maybeasm
 595              ;
 596
 597 Here `initdcl' can redeclare a typedef name, but `notype_initdcl'
 598 cannot.  The distinction between `declarator' and `notype_declarator'
 599 is the same sort of thing.
 600
 601    There is some similarity between this technique and a lexical tie-in
 602 (described next), in that information which alters the lexical analysis
 603 is changed during parsing by other parts of the program.  The
  604 difference is that here the information is global and is used for other
 605 purposes in the program.  A true lexical tie-in has a special-purpose
 606 flag controlled by the syntactic context.
 607
 608 \1f
 609 File: bison.info,  Node: Lexical Tie-ins,  Next: Tie-in Recovery,  Prev: Semantic Tokens,  Up: Context Dependency
 610
 611 Lexical Tie-ins
 612 ===============
 613
 614    One way to handle context-dependency is the "lexical tie-in": a flag
 615 which is set by Bison actions, whose purpose is to alter the way tokens
 616 are parsed.
 617
 618    For example, suppose we have a language vaguely like C, but with a
 619 special construct `hex (HEX-EXPR)'.  After the keyword `hex' comes an
 620 expression in parentheses in which all integers are hexadecimal.  In
 621 particular, the token `a1b' must be treated as an integer rather than
 622 as an identifier if it appears in that context.  Here is how you can do
 623 it:
 624
 625      %{
 626      int hexflag;
 627      %}
 628      %%
 629      ...
 630      expr:   IDENTIFIER
 631              | constant
 632              | HEX '('
 633                      { hexflag = 1; }
 634                expr ')'
 635                      { hexflag = 0;
 636                         $$ = $4; }
 637              | expr '+' expr
 638                      { $$ = make_sum ($1, $3); }
 639              ...
 640              ;
 641
 642      constant:
 643                INTEGER
 644              | STRING
 645              ;
 646
 647 Here we assume that `yylex' looks at the value of `hexflag'; when it is
 648 nonzero, all integers are parsed in hexadecimal, and tokens starting
 649 with letters are parsed as integers if possible.
 650
 651    The declaration of `hexflag' shown in the C declarations section of
 652 the parser file is needed to make it accessible to the actions (*note
 653 The C Declarations Section: C Declarations.).  You must also write the
 654 code in `yylex' to obey the flag.
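
   The part of `yylex' that obeys the flag might look like the
following C sketch.  It is only an illustration: the function
`classify_token', which receives a token whose text has already been
isolated, and the token codes are hypothetical, and a real `yylex' would
also read the input and handle the other token types.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes for this sketch.  */
enum { IDENTIFIER = 258, INTEGER = 259 };

/* Set and cleared by the parser's actions, as in the grammar above.  */
int hexflag;

/* Classify the token TEXT.  When the result is INTEGER, the number is
   stored in *VALUE; otherwise *VALUE is not meaningful.  */
int
classify_token (const char *text, long *value)
{
  char *end;

  assert (text != NULL && value != NULL);

  if (hexflag && isxdigit ((unsigned char) text[0]))
    {
      /* Inside `hex (...)': try to read the whole token in base 16.  */
      *value = strtol (text, &end, 16);
      if (*end == '\0')
        return INTEGER;
    }
  else if (isdigit ((unsigned char) text[0]))
    {
      /* Elsewhere: only tokens starting with a digit are integers.  */
      *value = strtol (text, &end, 10);
      if (*end == '\0')
        return INTEGER;
    }
  return IDENTIFIER;
}
```

   With `hexflag' set to 1, the token `a1b' is classified as `INTEGER'
with the value 2587; with `hexflag' set to 0, the same token is an
`IDENTIFIER'.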
 655
 656 \1f
 657 File: bison.info,  Node: Tie-in Recovery,  Prev: Lexical Tie-ins,  Up: Context Dependency
 658
 659 Lexical Tie-ins and Error Recovery
 660 ==================================
 661
 662    Lexical tie-ins make strict demands on any error recovery rules you
 663 have.  *Note Error Recovery::.
 664
 665    The reason for this is that the purpose of an error recovery rule is
 666 to abort the parsing of one construct and resume in some larger
 667 construct.  For example, in C-like languages, a typical error recovery
 668 rule is to skip tokens until the next semicolon, and then start a new
 669 statement, like this:
 670
 671      stmt:   expr ';'
 672              | IF '(' expr ')' stmt { ... }
 673              ...
  674              | error ';'
 675                      { hexflag = 0; }
 676              ;
 677
 678    If there is a syntax error in the middle of a `hex (EXPR)'
 679 construct, this error rule will apply, and then the action for the
 680 completed `hex (EXPR)' will never run.  So `hexflag' would remain set
 681 for the entire rest of the input, or until the next `hex' keyword,
 682 causing identifiers to be misinterpreted as integers.
 683
 684    To avoid this problem the error recovery rule itself clears
 685 `hexflag'.
 686
 687    There may also be an error recovery rule that works within
 688 expressions.  For example, there could be a rule which applies within
 689 parentheses and skips to the close-parenthesis:
 690
 691      expr:   ...
 692              | '(' expr ')'
 693                      { $$ = $2; }
 694              | '(' error ')'
 695              ...
 696
 697    If this rule acts within the `hex' construct, it is not going to
 698 abort that construct (since it applies to an inner level of parentheses
 699 within the construct).  Therefore, it should not clear the flag: the
 700 rest of the `hex' construct should be parsed with the flag still in
 701 effect.
 702
 703    What if there is an error recovery rule which might abort out of the
 704 `hex' construct or might not, depending on circumstances?  There is no
 705 way you can write the action to determine whether a `hex' construct is
 706 being aborted or not.  So if you are using a lexical tie-in, you had
 707 better make sure your error recovery rules are not of this kind.  Each
 708 rule must be such that you can be sure that it always will, or always
 709 won't, have to clear the flag.
 710
 711 \1f
 712 File: bison.info,  Node: Debugging,  Next: Invocation,  Prev: Context Dependency,  Up: Top
 713
 714 Debugging Your Parser
 715 *********************
 716
 717    If a Bison grammar compiles properly but doesn't do what you want
 718 when it runs, the `yydebug' parser-trace feature can help you figure
 719 out why.
 720
 721    To enable compilation of trace facilities, you must define the macro
 722 `YYDEBUG' when you compile the parser.  You could use `-DYYDEBUG=1' as
 723 a compiler option or you could put `#define YYDEBUG 1' in the C
 724 declarations section of the grammar file (*note The C Declarations
 725 Section: C Declarations.).  Alternatively, use the `-t' option when you
run Bison (*note Invoking Bison: Invocation.).  We suggest that you
always define `YYDEBUG' so that debugging is always possible.
 728
 729    The trace facility uses `stderr', so you must add
 730 `#include <stdio.h>' to the C declarations section unless it is already
 731 there.
 732
 733    Once you have compiled the program with trace facilities, the way to
 734 request a trace is to store a nonzero value in the variable `yydebug'.
 735 You can do this by making the C code do it (in `main', perhaps), or you
 736 can alter the value with a C debugger.
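
   As a minimal sketch of the first approach, the helper below stores a
nonzero value in `yydebug' when the program is started with a `--trace'
argument.  The helper name `set_trace_from_args' and the `--trace' flag
are assumptions made for illustration and are not part of Bison.  The
definition of `yydebug' shown here is only a stand-in for the one the
generated parser supplies when `YYDEBUG' is nonzero.

```c
#include <string.h>

/* Stand-in so that the sketch compiles on its own.  In a real program
   the Bison-generated parser defines `yydebug' (when compiled with
   YYDEBUG nonzero) and this file would only declare it `extern'.  */
int yydebug = 0;

/* Hypothetical helper: store a nonzero value in `yydebug' if any
   command-line argument is "--trace".  Returns the resulting value.  */
int
set_trace_from_args (int argc, char **argv)
{
  int i;

  for (i = 1; i < argc; i++)
    if (strcmp (argv[i], "--trace") == 0)
      yydebug = 1;
  return yydebug;
}
```

   A `main' function would typically call such a helper before it calls
`yyparse', so that the trace covers the whole parse.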
 737
 738    Each step taken by the parser when `yydebug' is nonzero produces a
 739 line or two of trace information, written on `stderr'.  The trace
 740 messages tell you these things:
 741
 742    * Each time the parser calls `yylex', what kind of token was read.
 743
 744    * Each time a token is shifted, the depth and complete contents of
 745      the state stack (*note Parser States::).
 746
 747    * Each time a rule is reduced, which rule it is, and the complete
 748      contents of the state stack afterward.
 749
 750    To make sense of this information, it helps to refer to the listing
 751 file produced by the Bison `-v' option (*note Invoking Bison:
 752 Invocation.).  This file shows the meaning of each state in terms of
 753 positions in various rules, and also what each state will do with each
 754 possible input token.  As you read the successive trace messages, you
 755 can see that the parser is functioning according to its specification
 756 in the listing file.  Eventually you will arrive at the place where
 757 something undesirable happens, and you will see which parts of the
 758 grammar are to blame.
 759
 760    The parser file is a C program and you can use C debuggers on it,
 761 but it's not easy to interpret what it is doing.  The parser function
 762 is a finite-state machine interpreter, and aside from the actions it
 763 executes the same code over and over.  Only the values of variables
 764 show where in the grammar it is working.
 765
 766    The debugging information normally gives the token type of each token
 767 read, but not its semantic value.  You can optionally define a macro
 768 named `YYPRINT' to provide a way to print the value.  If you define
 769 `YYPRINT', it should take three arguments.  The parser will pass a
 770 standard I/O stream, the numeric code for the token type, and the token
 771 value (from `yylval').
 772
 773    Here is an example of `YYPRINT' suitable for the multi-function
 774 calculator (*note Declarations for `mfcalc': Mfcalc Decl.):
 775
 776      #define YYPRINT(file, type, value)   yyprint (file, type, value)
 777
 778      static void
 779      yyprint (FILE *file, int type, YYSTYPE value)
 780      {
 781        if (type == VAR)
 782          fprintf (file, " %s", value.tptr->name);
 783        else if (type == NUM)
         fprintf (file, " %g", value.val);
 785      }
 786
 787 \1f
 788 File: bison.info,  Node: Invocation,  Next: Table of Symbols,  Prev: Debugging,  Up: Top
 789
 790 Invoking Bison
 791 **************
 792
 793    The usual way to invoke Bison is as follows:
 794
 795      bison INFILE
 796
 797    Here INFILE is the grammar file name, which usually ends in `.y'.
 798 The parser file's name is made by replacing the `.y' with `.tab.c'.
Thus, `bison foo.y' produces `foo.tab.c', and `bison hack/foo.y'
produces `hack/foo.tab.c'.
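
   The naming rule can be sketched in C as follows.  This is an
illustration of the rule only, not Bison's own code.  The function name
`parser_file_name' is hypothetical, and the handling of input names
that do not end in `.y' (the suffix is simply appended) is an
assumption.

```c
#include <string.h>

/* Hypothetical helper: write into OUT (which holds OUTSIZE bytes) the
   parser file name for the grammar file INFILE, replacing a trailing
   ".y" with ".tab.c".  Returns 0 on success, or -1 if the result does
   not fit in OUT.  */
int
parser_file_name (const char *infile, char *out, size_t outsize)
{
  size_t len = strlen (infile);

  /* Drop the ".y" suffix when present.  Assumption: other names are
     kept whole and ".tab.c" is appended to them.  */
  if (len >= 2 && strcmp (infile + len - 2, ".y") == 0)
    len -= 2;
  if (len + sizeof ".tab.c" > outsize)  /* sizeof counts the NUL */
    return -1;
  memcpy (out, infile, len);
  strcpy (out + len, ".tab.c");
  return 0;
}
```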
 801
 802 * Menu:
 803
 804 * Bison Options::     All the options described in detail,
 805                         in alphabetical order by short options.
 806 * Environment Variables::  Variables which affect Bison execution.
 807 * Option Cross Key::  Alphabetical list of long options.
 808 * VMS Invocation::    Bison command syntax on VMS.
 809
 810 \1f
 811 File: bison.info,  Node: Bison Options,  Next: Environment Variables,  Up: Invocation
 812
 813 Bison Options
 814 =============
 815
 816    Bison supports both traditional single-letter options and mnemonic
 817 long option names.  Long option names are indicated with `--' instead of
 818 `-'.  Abbreviations for option names are allowed as long as they are
 819 unique.  When a long option takes an argument, like `--file-prefix',
 820 connect the option name and the argument with `='.
 821
 822    Here is a list of options that can be used with Bison, alphabetized
 823 by short option.  It is followed by a cross key alphabetized by long
 824 option.
 825
Operation modes:
 827 `-h'
 828 `--help'
 829      Print a summary of the command-line options to Bison and exit.
 830
 831 `-V'
 832 `--version'
 833      Print the version number of Bison and exit.
 834
 835 `-y'
 836 `--yacc'
 837 `--fixed-output-files'
 838      Equivalent to `-o y.tab.c'; the parser output file is called
 839      `y.tab.c', and the other outputs are called `y.output' and
 840      `y.tab.h'.  The purpose of this option is to imitate Yacc's output
 841      file name conventions.  Thus, the following shell script can
 842      substitute for Yacc:
 843
 844           bison -y $*
 845
 846 Tuning the parser:
 847
 848 `-S FILE'
 849 `--skeleton=FILE'
 850      Specify the skeleton to use.  You probably don't need this option
 851      unless you are developing Bison.
 852
 853 `-t'
 854 `--debug'
 855      Output a definition of the macro `YYDEBUG' into the parser file, so
 856      that the debugging facilities are compiled.  *Note Debugging Your
 857      Parser: Debugging.
 858
 859 `--locations'
     Pretend that `%locations' was specified.  *Note Decl Summary::.
 861
 862 `-p PREFIX'
 863 `--name-prefix=PREFIX'
 864      Rename the external symbols used in the parser so that they start
 865      with PREFIX instead of `yy'.  The precise list of symbols renamed
 866      is `yyparse', `yylex', `yyerror', `yynerrs', `yylval', `yychar'
 867      and `yydebug'.
 868
 869      For example, if you use `-p c', the names become `cparse', `clex',
 870      and so on.
 871
 872      *Note Multiple Parsers in the Same Program: Multiple Parsers.
 873
 874 `-l'
 875 `--no-lines'
 876      Don't put any `#line' preprocessor commands in the parser file.
 877      Ordinarily Bison puts them in the parser file so that the C
 878      compiler and debuggers will associate errors with your source
 879      file, the grammar file.  This option causes them to associate
 880      errors with the parser file, treating it as an independent source
 881      file in its own right.
 882
 883 `-n'
 884 `--no-parser'
 885      Pretend that `%no_parser' was specified.  *Note Decl Summary::.
 886
 887 `-r'
 888 `--raw'
 889      Pretend that `%raw' was specified.  *Note Decl Summary::.
 890
 891 `-k'
 892 `--token-table'
 893      Pretend that `%token_table' was specified.  *Note Decl Summary::.
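
   The renaming done by `-p' can be pictured as a set of preprocessor
definitions placed ahead of the parser code.  The following is a
hand-written sketch of that idea for `-p c', not a copy of Bison's
output.  The stub body of `yyparse' is a stand-in for the generated
parser.

```c
/* Renamings corresponding to `-p c': every later use of one of these
   `yy' names in this file refers to the `c' name instead.  */
#define yyparse cparse
#define yylex   clex
#define yyerror cerror
#define yynerrs cnerrs
#define yylval  clval
#define yychar  cchar
#define yydebug cdebug

/* Stand-ins for two of the parser's external symbols.  They are
   written with their usual names, but after preprocessing the
   compiler and linker see `cnerrs' and `cparse'.  */
int yynerrs = 0;

int
yyparse (void)
{
  return yynerrs != 0;          /* 0 for success, 1 if errors counted */
}
```

   Code elsewhere in the program then calls `cparse', which leaves the
name `yyparse' free for a second parser built from another grammar.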
 894
 895 Adjust the output:
 896
 897 `-d'
 898 `--defines'
     Pretend that `%defines' was specified, i.e., write an extra output
 900      file containing macro definitions for the token type names defined
 901      in the grammar and the semantic value type `YYSTYPE', as well as a
 902      few `extern' variable declarations.  *Note Decl Summary::.
 903
 904 `-b FILE-PREFIX'
 905 `--file-prefix=PREFIX'
 906      Specify a prefix to use for all Bison output file names.  The
 907      names are chosen as if the input file were named `PREFIX.c'.
 908
 909 `-v'
 910 `--verbose'
     Pretend that `%verbose' was specified, i.e., write an extra output
 912      file containing verbose descriptions of the grammar and parser.
 913      *Note Decl Summary::, for more.
 914
 915 `-o OUTFILE'
 916 `--output-file=OUTFILE'
 917      Specify the name OUTFILE for the parser file.
 918
 919      The other output files' names are constructed from OUTFILE as
 920      described under the `-v' and `-d' options.
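
   To see what the `#line' commands mentioned under `--no-lines' do,
consider the short C sketch below.  The file name `calc.y', the line
number 100 and both function names are made up for illustration.  After
the `#line' command, the compiler reports positions as if the code came
from the grammar file.

```c
/* From here on the compiler numbers lines as if this text began at
   line 100 of a file named "calc.y" (a made-up grammar file name).  */
#line 100 "calc.y"
int action_line (void) { return __LINE__; }          /* "line 100" */
const char *action_file (void) { return __FILE__; }  /* "calc.y" */
```

   With `--no-lines', no such commands are written, so the same
functions would report positions in the parser file itself.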
 921
 922 \1f
 923 File: bison.info,  Node: Environment Variables,  Next: Option Cross Key,  Prev: Bison Options,  Up: Invocation
 924
 925 Environment Variables
 926 =====================
 927
 928    Here is a list of environment variables which affect the way Bison
 929 runs.
 930
 931 `BISON_SIMPLE'
 932 `BISON_HAIRY'
 933      Much of the parser generated by Bison is copied verbatim from a
 934      file called `bison.simple'.  If Bison cannot find that file, or if
 935      you would like to direct Bison to use a different copy, setting the
 936      environment variable `BISON_SIMPLE' to the path of the file will
 937      cause Bison to use that copy instead.
 938
 939      When the `%semantic_parser' declaration is used, Bison copies from
 940      a file called `bison.hairy' instead.  The location of this file can
 941      also be specified or overridden in a similar fashion, with the
 942      `BISON_HAIRY' environment variable.
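
   The lookup described above amounts to preferring the environment
variable over a built-in location.  Here is a minimal sketch, assuming
a hypothetical helper `skeleton_path' and a default path supplied by
the caller (the built-in location Bison actually uses is not shown
here):

```c
#include <stdlib.h>

/* Hypothetical helper, not Bison's own code: return the value of the
   environment variable ENVVAR if it is set, otherwise DEFAULT_PATH.  */
const char *
skeleton_path (const char *envvar, const char *default_path)
{
  const char *value = getenv (envvar);

  return value != NULL ? value : default_path;
}
```

   A call such as `skeleton_path ("BISON_SIMPLE", fallback)' returns
the overriding path when the variable is set and `fallback' otherwise.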
 943
 944 \1f
 945 File: bison.info,  Node: Option Cross Key,  Next: VMS Invocation,  Prev: Environment Variables,  Up: Invocation
 946
 947 Option Cross Key
 948 ================
 949
 950    Here is a list of options, alphabetized by long option, to help you
 951 find the corresponding short option.
 952
     --debug                               -t
     --defines                             -d
     --file-prefix=PREFIX                  -b FILE-PREFIX
     --fixed-output-files --yacc           -y
     --help                                -h
     --locations                           (no short option)
     --name-prefix=PREFIX                  -p PREFIX
     --no-lines                            -l
     --no-parser                           -n
     --output-file=OUTFILE                 -o OUTFILE
     --raw                                 -r
     --skeleton=FILE                       -S FILE
     --token-table                         -k
     --verbose                             -v
     --version                             -V
 966
 967 \1f
 968 File: bison.info,  Node: VMS Invocation,  Prev: Option Cross Key,  Up: Invocation
 969
 970 Invoking Bison under VMS
 971 ========================
 972
 973    The command line syntax for Bison on VMS is a variant of the usual
 974 Bison command syntax--adapted to fit VMS conventions.
 975
 976    To find the VMS equivalent for any Bison option, start with the long
 977 option, and substitute a `/' for the leading `--', and substitute a `_'
 978 for each `-' in the name of the long option.  For example, the
 979 following invocation under VMS:
 980
 981      bison /debug/name_prefix=bar foo.y
 982
is equivalent to the following command under POSIX:
 984
 985      bison --debug --name-prefix=bar foo.y
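
   The translation rule can be sketched in C as follows.  The function
name `vms_option' is hypothetical.  The sketch applies the rule to one
long option at a time and leaves any argument after `=' unchanged,
since the rule speaks only of the option name.

```c
#include <string.h>

/* Hypothetical helper: translate the long option LONGOPT (such as
   "--name-prefix=bar") into VMS form in OUT, which holds OUTSIZE
   bytes.  Returns 0 on success, or -1 if LONGOPT does not start with
   "--" or the result does not fit.  */
int
vms_option (const char *longopt, char *out, size_t outsize)
{
  size_t i, len;
  int in_name = 1;              /* still inside the option name?  */

  if (strncmp (longopt, "--", 2) != 0)
    return -1;
  longopt += 2;
  len = strlen (longopt);
  if (len + 2 > outsize)        /* '/' + LEN characters + NUL */
    return -1;

  out[0] = '/';
  for (i = 0; i < len; i++)
    {
      char c = longopt[i];

      if (c == '=')
        in_name = 0;            /* the argument is copied unchanged */
      out[i + 1] = (in_name && c == '-') ? '_' : c;
    }
  out[len + 1] = '\0';
  return 0;
}
```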
 986
 987    The VMS file system does not permit filenames such as `foo.tab.c'.
 988 In the above example, the output file would instead be named
 989 `foo_tab.c'.
 990
 991 \1f
 992 File: bison.info,  Node: Table of Symbols,  Next: Glossary,  Prev: Invocation,  Up: Top
 993
 994 Bison Symbols
 995 *************
 996
 997 `error'
 998      A token name reserved for error recovery.  This token may be used
 999      in grammar rules so as to allow the Bison parser to recognize an
1000      error in the grammar without halting the process.  In effect, a
1001      sentence containing an error may be recognized as valid.  On a
1002      parse error, the token `error' becomes the current look-ahead
1003      token.  Actions corresponding to `error' are then executed, and
1004      the look-ahead token is reset to the token that originally caused
1005      the violation.  *Note Error Recovery::.
1006
1007 `YYABORT'
1008      Macro to pretend that an unrecoverable syntax error has occurred,
1009      by making `yyparse' return 1 immediately.  The error reporting
1010      function `yyerror' is not called.  *Note The Parser Function
1011      `yyparse': Parser Function.
1012
1013 `YYACCEPT'
1014      Macro to pretend that a complete utterance of the language has been
1015      read, by making `yyparse' return 0 immediately.  *Note The Parser
1016      Function `yyparse': Parser Function.
1017
1018 `YYBACKUP'
1019      Macro to discard a value from the parser stack and fake a
1020      look-ahead token.  *Note Special Features for Use in Actions:
1021      Action Features.
1022
1023 `YYERROR'
1024      Macro to pretend that a syntax error has just been detected: call
1025      `yyerror' and then perform normal error recovery if possible
1026      (*note Error Recovery::), or (if recovery is impossible) make
1027      `yyparse' return 1.  *Note Error Recovery::.
1028
1029 `YYERROR_VERBOSE'
     Macro that you define with `#define' in the C declarations
     section to request verbose, specific error message strings when
1032      `yyerror' is called.
1033
1034 `YYINITDEPTH'
1035      Macro for specifying the initial size of the parser stack.  *Note
1036      Stack Overflow::.
1037
1038 `YYLEX_PARAM'
1039      Macro for specifying an extra argument (or list of extra
1040      arguments) for `yyparse' to pass to `yylex'.  *Note Calling
1041      Conventions for Pure Parsers: Pure Calling.
1042
1043 `YYLTYPE'
1044      Macro for the data type of `yylloc'; a structure with four
1045      members.  *Note Textual Positions of Tokens: Token Positions.
1046
1047 `yyltype'
     Default value for `YYLTYPE'.
1049
1050 `YYMAXDEPTH'
1051      Macro for specifying the maximum size of the parser stack.  *Note
1052      Stack Overflow::.
1053
1054 `YYPARSE_PARAM'
1055      Macro for specifying the name of a parameter that `yyparse' should
1056      accept.  *Note Calling Conventions for Pure Parsers: Pure Calling.
1057
1058 `YYRECOVERING'
1059      Macro whose value indicates whether the parser is recovering from a
1060      syntax error.  *Note Special Features for Use in Actions: Action
1061      Features.
1062
1063 `YYSTYPE'
1064      Macro for the data type of semantic values; `int' by default.
1065      *Note Data Types of Semantic Values: Value Type.
1066
1067 `yychar'
1068      External integer variable that contains the integer value of the
1069      current look-ahead token.  (In a pure parser, it is a local
1070      variable within `yyparse'.)  Error-recovery rule actions may
1071      examine this variable.  *Note Special Features for Use in Actions:
1072      Action Features.
1073
1074 `yyclearin'
1075      Macro used in error-recovery rule actions.  It clears the previous
1076      look-ahead token.  *Note Error Recovery::.
1077
1078 `yydebug'
1079      External integer variable set to zero by default.  If `yydebug' is
1080      given a nonzero value, the parser will output information on input
1081      symbols and parser action.  *Note Debugging Your Parser: Debugging.
1082
1083 `yyerrok'
1084      Macro to cause parser to recover immediately to its normal mode
1085      after a parse error.  *Note Error Recovery::.
1086
1087 `yyerror'
1088      User-supplied function to be called by `yyparse' on error.  The
1089      function receives one argument, a pointer to a character string
1090      containing an error message.  *Note The Error Reporting Function
1091      `yyerror': Error Reporting.
1092
1093 `yylex'
1094      User-supplied lexical analyzer function, called with no arguments
1095      to get the next token.  *Note The Lexical Analyzer Function
1096      `yylex': Lexical.
1097
1098 `yylval'
1099      External variable in which `yylex' should place the semantic value
1100      associated with a token.  (In a pure parser, it is a local
1101      variable within `yyparse', and its address is passed to `yylex'.)
1102      *Note Semantic Values of Tokens: Token Values.
1103
1104 `yylloc'
1105      External variable in which `yylex' should place the line and column
1106      numbers associated with a token.  (In a pure parser, it is a local
1107      variable within `yyparse', and its address is passed to `yylex'.)
1108      You can ignore this variable if you don't use the `@' feature in
1109      the grammar actions.  *Note Textual Positions of Tokens: Token
1110      Positions.
1111
1112 `yynerrs'
1113      Global variable which Bison increments each time there is a parse
1114      error.  (In a pure parser, it is a local variable within
1115      `yyparse'.)  *Note The Error Reporting Function `yyerror': Error
1116      Reporting.
1117
1118 `yyparse'
1119      The parser function produced by Bison; call this function to start
1120      parsing.  *Note The Parser Function `yyparse': Parser Function.
1121
1122 `%debug'
1123      Equip the parser for debugging.  *Note Decl Summary::.
1124
1125 `%defines'
1126      Bison declaration to create a header file meant for the scanner.
1127      *Note Decl Summary::.
1128
1129 `%left'
1130      Bison declaration to assign left associativity to token(s).  *Note
1131      Operator Precedence: Precedence Decl.
1132
1133 `%no_lines'
1134      Bison declaration to avoid generating `#line' directives in the
1135      parser file.  *Note Decl Summary::.
1136
1137 `%nonassoc'
1138      Bison declaration to assign non-associativity to token(s).  *Note
1139      Operator Precedence: Precedence Decl.
1140
1141 `%prec'
1142      Bison declaration to assign a precedence to a specific rule.
1143      *Note Context-Dependent Precedence: Contextual Precedence.
1144
1145 `%pure_parser'
1146      Bison declaration to request a pure (reentrant) parser.  *Note A
1147      Pure (Reentrant) Parser: Pure Decl.
1148
1149 `%raw'
1150      Bison declaration to use Bison internal token code numbers in token
1151      tables instead of the usual Yacc-compatible token code numbers.
1152      *Note Decl Summary::.
1153
1154 `%right'
1155      Bison declaration to assign right associativity to token(s).
1156      *Note Operator Precedence: Precedence Decl.
1157
1158 `%start'
1159      Bison declaration to specify the start symbol.  *Note The
1160      Start-Symbol: Start Decl.
1161
1162 `%token'
1163      Bison declaration to declare token(s) without specifying
1164      precedence.  *Note Token Type Names: Token Decl.
1165
1166 `%token_table'
1167      Bison declaration to include a token name table in the parser file.
1168      *Note Decl Summary::.
1169
1170 `%type'
1171      Bison declaration to declare nonterminals.  *Note Nonterminal
1172      Symbols: Type Decl.
1173
1174 `%union'
1175      Bison declaration to specify several possible data types for
1176      semantic values.  *Note The Collection of Value Types: Union Decl.
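
   Several of the symbols above are supplied by the user rather than by
Bison.  As a rough sketch of how `yylex', `yylval' and `yyerror' fit
together, the C code below reads tokens from a string.  The token name
`NUM' and its code 258, the helper `set_input', the counter
`error_count' and the `void' return type of `yyerror' are all
assumptions made for this sketch.  The definitions of `YYSTYPE' and
`yylval' are stand-ins for those a generated parser provides.

```c
#include <ctype.h>
#include <stdio.h>

/* Stand-ins for definitions that normally come from the generated
   parser; they appear here only so the sketch compiles by itself.
   The token name NUM and its code 258 are assumptions.  */
#define NUM 258
typedef int YYSTYPE;            /* `int' is the default value type */
YYSTYPE yylval;

/* Hypothetical input buffer and error counter for this sketch.  */
static const char *input = "";
int error_count = 0;

void
set_input (const char *text)
{
  input = text;
}

/* Lexical analyzer: called with no arguments, returns the code of
   the next token and leaves its semantic value in `yylval'.  A
   return value of zero reports end of input.  */
int
yylex (void)
{
  while (*input == ' ' || *input == '\t')
    input++;
  if (*input == '\0')
    return 0;
  if (isdigit ((unsigned char) *input))
    {
      yylval = 0;
      while (isdigit ((unsigned char) *input))
        yylval = yylval * 10 + (*input++ - '0');
      return NUM;
    }
  return (unsigned char) *input++;      /* single-character token */
}

/* Error reporting function: receives the message string, counts the
   error and prints the message on `stderr'.  */
void
yyerror (const char *message)
{
  error_count++;
  fprintf (stderr, "%s\n", message);
}
```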
1177
1178    These are the punctuation and delimiters used in Bison input:
1179
1180 `%%'
1181      Delimiter used to separate the grammar rule section from the Bison
1182      declarations section or the additional C code section.  *Note The
1183      Overall Layout of a Bison Grammar: Grammar Layout.
1184
1185 `%{ %}'
1186      All code listed between `%{' and `%}' is copied directly to the
1187      output file uninterpreted.  Such code forms the "C declarations"
1188      section of the input file.  *Note Outline of a Bison Grammar:
1189      Grammar Outline.
1190
1191 `/*...*/'
1192      Comment delimiters, as in C.
1193
1194 `:'
1195      Separates a rule's result from its components.  *Note Syntax of
1196      Grammar Rules: Rules.
1197
1198 `;'
1199      Terminates a rule.  *Note Syntax of Grammar Rules: Rules.
1200
1201 `|'
1202      Separates alternate rules for the same result nonterminal.  *Note
1203      Syntax of Grammar Rules: Rules.
1204