]>
Commit | Line | Data |
---|---|---|
1 | This is bison.info, produced by makeinfo version 4.0 from bison.texinfo. | |
2 | ||
3 | START-INFO-DIR-ENTRY | |
4 | * bison: (bison). GNU Project parser generator (yacc replacement). | |
5 | END-INFO-DIR-ENTRY | |
6 | ||
7 | This file documents the Bison parser generator. | |
8 | ||
9 | Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, | |
10 | 2000 Free Software Foundation, Inc. | |
11 | ||
12 | Permission is granted to make and distribute verbatim copies of this | |
13 | manual provided the copyright notice and this permission notice are | |
14 | preserved on all copies. | |
15 | ||
16 | Permission is granted to copy and distribute modified versions of | |
17 | this manual under the conditions for verbatim copying, provided also | |
18 | that the sections entitled "GNU General Public License" and "Conditions | |
19 | for Using Bison" are included exactly as in the original, and provided | |
20 | that the entire resulting derived work is distributed under the terms | |
21 | of a permission notice identical to this one. | |
22 | ||
23 | Permission is granted to copy and distribute translations of this | |
24 | manual into another language, under the above conditions for modified | |
25 | versions, except that the sections entitled "GNU General Public | |
26 | License", "Conditions for Using Bison" and this permission notice may be | |
27 | included in translations approved by the Free Software Foundation | |
28 | instead of in the original English. | |
29 | ||
30 | \1f | |
31 | File: bison.info, Node: Mid-Rule Actions, Prev: Action Types, Up: Semantics | |
32 | ||
33 | Actions in Mid-Rule | |
34 | ------------------- | |
35 | ||
36 | Occasionally it is useful to put an action in the middle of a rule. | |
37 | These actions are written just like usual end-of-rule actions, but they | |
38 | are executed before the parser even recognizes the following components. | |
39 | ||
40 | A mid-rule action may refer to the components preceding it using | |
41 | `$N', but it may not refer to subsequent components because it is run | |
42 | before they are parsed. | |
43 | ||
44 | The mid-rule action itself counts as one of the components of the | |
45 | rule. This makes a difference when there is another action later in | |
46 | the same rule (and usually there is another at the end): you have to | |
47 | count the actions along with the symbols when working out which number | |
48 | N to use in `$N'. | |
49 | ||
50 | The mid-rule action can also have a semantic value. The action can | |
51 | set its value with an assignment to `$$', and actions later in the rule | |
52 | can refer to the value using `$N'. Since there is no symbol to name | |
53 | the action, there is no way to declare a data type for the value in | |
54 | advance, so you must use the `$<...>' construct to specify a data type | |
55 | each time you refer to this value. | |
56 | ||
57 | There is no way to set the value of the entire rule with a mid-rule | |
58 | action, because assignments to `$$' do not have that effect. The only | |
59 | way to set the value for the entire rule is with an ordinary action at | |
60 | the end of the rule. | |
61 | ||
62 | Here is an example from a hypothetical compiler, handling a `let' | |
63 | statement that looks like `let (VARIABLE) STATEMENT' and serves to | |
64 | create a variable named VARIABLE temporarily for the duration of | |
65 | STATEMENT. To parse this construct, we must put VARIABLE into the | |
66 | symbol table while STATEMENT is parsed, then remove it afterward. Here | |
67 | is how it is done: | |
68 | ||
69 | stmt: LET '(' var ')' | |
70 | { $<context>$ = push_context (); | |
71 | declare_variable ($3); } | |
72 | stmt { $$ = $6; | |
73 | pop_context ($<context>5); } | |
74 | ||
75 | As soon as `let (VARIABLE)' has been recognized, the first action is | |
76 | run. It saves a copy of the current semantic context (the list of | |
77 | accessible variables) as its semantic value, using alternative | |
78 | `context' in the data-type union. Then it calls `declare_variable' to | |
79 | add the new variable to that list. Once the first action is finished, | |
80 | the embedded statement `stmt' can be parsed. Note that the mid-rule | |
81 | action is component number 5, so the `stmt' is component number 6. | |
82 | ||
83 | After the embedded statement is parsed, its semantic value becomes | |
84 | the value of the entire `let'-statement. Then the semantic value from | |
85 | the earlier action is used to restore the prior list of variables. This | |
86 | removes the temporary `let'-variable from the list so that it won't | |
87 | appear to exist while the rest of the program is parsed. | |
88 | ||
89 | Taking action before a rule is completely recognized often leads to | |
90 | conflicts since the parser must commit to a parse in order to execute | |
91 | the action. For example, the following two rules, without mid-rule | |
92 | actions, can coexist in a working parser because the parser can shift | |
93 | the open-brace token and look at what follows before deciding whether | |
94 | there is a declaration or not: | |
95 | ||
96 | compound: '{' declarations statements '}' | |
97 | | '{' statements '}' | |
98 | ; | |
99 | ||
100 | But when we add a mid-rule action as follows, the rules become | |
101 | nonfunctional: | |
102 | ||
103 | compound: { prepare_for_local_variables (); } | |
104 | '{' declarations statements '}' | |
105 | | '{' statements '}' | |
106 | ; | |
107 | ||
108 | Now the parser is forced to decide whether to run the mid-rule action | |
109 | when it has read no farther than the open-brace. In other words, it | |
110 | must commit to using one rule or the other, without sufficient | |
111 | information to do it correctly. (The open-brace token is what is called | |
112 | the "look-ahead" token at this time, since the parser is still deciding | |
113 | what to do about it. *Note Look-Ahead Tokens: Look-Ahead.) | |
114 | ||
115 | You might think that you could correct the problem by putting | |
116 | identical actions into the two rules, like this: | |
117 | ||
118 | compound: { prepare_for_local_variables (); } | |
119 | '{' declarations statements '}' | |
120 | | { prepare_for_local_variables (); } | |
121 | '{' statements '}' | |
122 | ; | |
123 | ||
124 | But this does not help, because Bison does not realize that the two | |
125 | actions are identical. (Bison never tries to understand the C code in | |
126 | an action.) | |
127 | ||
128 | If the grammar is such that a declaration can be distinguished from a | |
129 | statement by the first token (which is true in C), then one solution | |
130 | which does work is to put the action after the open-brace, like this: | |
131 | ||
132 | compound: '{' { prepare_for_local_variables (); } | |
133 | declarations statements '}' | |
134 | | '{' statements '}' | |
135 | ; | |
136 | ||
137 | Now the first token of the following declaration or statement, which | |
138 | would in any case tell Bison which rule to use, can still do so. | |
139 | ||
140 | Another solution is to bury the action inside a nonterminal symbol | |
141 | which serves as a subroutine: | |
142 | ||
143 | subroutine: /* empty */ | |
144 | { prepare_for_local_variables (); } | |
145 | ; | |
146 | ||
147 | compound: subroutine | |
148 | '{' declarations statements '}' | |
149 | | subroutine | |
150 | '{' statements '}' | |
151 | ; | |
152 | ||
153 | Now Bison can execute the action in the rule for `subroutine' without | |
154 | deciding which rule for `compound' it will eventually use. Note that | |
155 | the action is now at the end of its rule. Any mid-rule action can be | |
156 | converted to an end-of-rule action in this way, and this is what Bison | |
157 | actually does to implement mid-rule actions. | |
158 | ||
159 | \1f | |
160 | File: bison.info, Node: Declarations, Next: Multiple Parsers, Prev: Semantics, Up: Grammar File | |
161 | ||
162 | Bison Declarations | |
163 | ================== | |
164 | ||
165 | The "Bison declarations" section of a Bison grammar defines the | |
166 | symbols used in formulating the grammar and the data types of semantic | |
167 | values. *Note Symbols::. | |
168 | ||
169 | All token type names (but not single-character literal tokens such as | |
170 | `'+'' and `'*'') must be declared. Nonterminal symbols must be | |
171 | declared if you need to specify which data type to use for the semantic | |
172 | value (*note More Than One Value Type: Multiple Types.). | |
173 | ||
174 | The first rule in the file also specifies the start symbol, by | |
175 | default. If you want some other symbol to be the start symbol, you | |
176 | must declare it explicitly (*note Languages and Context-Free Grammars: | |
177 | Language and Grammar.). | |
178 | ||
179 | * Menu: | |
180 | ||
181 | * Token Decl:: Declaring terminal symbols. | |
182 | * Precedence Decl:: Declaring terminals with precedence and associativity. | |
183 | * Union Decl:: Declaring the set of all semantic value types. | |
184 | * Type Decl:: Declaring the choice of type for a nonterminal symbol. | |
185 | * Expect Decl:: Suppressing warnings about shift/reduce conflicts. | |
186 | * Start Decl:: Specifying the start symbol. | |
187 | * Pure Decl:: Requesting a reentrant parser. | |
188 | * Decl Summary:: Table of all Bison declarations. | |
189 | ||
190 | \1f | |
191 | File: bison.info, Node: Token Decl, Next: Precedence Decl, Up: Declarations | |
192 | ||
193 | Token Type Names | |
194 | ---------------- | |
195 | ||
196 | The basic way to declare a token type name (terminal symbol) is as | |
197 | follows: | |
198 | ||
199 | %token NAME | |
200 | ||
201 | Bison will convert this into a `#define' directive in the parser, so | |
202 | that the function `yylex' (if it is in this file) can use the name NAME | |
203 | to stand for this token type's code. | |
204 | ||
205 | Alternatively, you can use `%left', `%right', or `%nonassoc' instead | |
206 | of `%token', if you wish to specify associativity and precedence. | |
207 | *Note Operator Precedence: Precedence Decl. | |
208 | ||
209 | You can explicitly specify the numeric code for a token type by | |
210 | appending an integer value in the field immediately following the token | |
211 | name: | |
212 | ||
213 | %token NUM 300 | |
214 | ||
215 | It is generally best, however, to let Bison choose the numeric codes for | |
216 | all token types. Bison will automatically select codes that don't | |
217 | conflict with each other or with ASCII characters. | |
218 | ||
219 | In the event that the stack type is a union, you must augment the | |
220 | `%token' or other token declaration to include the data type | |
221 | alternative delimited by angle-brackets (*note More Than One Value | |
222 | Type: Multiple Types.). | |
223 | ||
224 | For example: | |
225 | ||
226 | %union { /* define stack type */ | |
227 | double val; | |
228 | symrec *tptr; | |
229 | } | |
230 | %token <val> NUM /* define token NUM and its type */ | |
231 | ||
232 | You can associate a literal string token with a token type name by | |
233 | writing the literal string at the end of a `%token' declaration which | |
234 | declares the name. For example: | |
235 | ||
236 | %token arrow "=>" | |
237 | ||
238 | For example, a grammar for the C language might specify these names with | |
239 | equivalent literal string tokens: | |
240 | ||
241 | %token <operator> OR "||" | |
242 | %token <operator> LE 134 "<=" | |
243 | %left OR "<=" | |
244 | ||
245 | Once you equate the literal string and the token name, you can use them | |
246 | interchangeably in further declarations or the grammar rules. The | |
247 | `yylex' function can use the token name or the literal string to obtain | |
248 | the token type code number (*note Calling Convention::). | |
249 | ||
250 | \1f | |
251 | File: bison.info, Node: Precedence Decl, Next: Union Decl, Prev: Token Decl, Up: Declarations | |
252 | ||
253 | Operator Precedence | |
254 | ------------------- | |
255 | ||
256 | Use the `%left', `%right' or `%nonassoc' declaration to declare a | |
257 | token and specify its precedence and associativity, all at once. These | |
258 | are called "precedence declarations". *Note Operator Precedence: | |
259 | Precedence, for general information on operator precedence. | |
260 | ||
261 | The syntax of a precedence declaration is the same as that of | |
262 | `%token': either | |
263 | ||
264 | %left SYMBOLS... | |
265 | ||
266 | or | |
267 | ||
268 | %left <TYPE> SYMBOLS... | |
269 | ||
270 | And indeed any of these declarations serves the purposes of `%token'. | |
271 | But in addition, they specify the associativity and relative precedence | |
272 | for all the SYMBOLS: | |
273 | ||
274 | * The associativity of an operator OP determines how repeated uses | |
275 | of the operator nest: whether `X OP Y OP Z' is parsed by grouping | |
276 | X with Y first or by grouping Y with Z first. `%left' specifies | |
277 | left-associativity (grouping X with Y first) and `%right' | |
278 | specifies right-associativity (grouping Y with Z first). | |
279 | `%nonassoc' specifies no associativity, which means that `X OP Y | |
280 | OP Z' is considered a syntax error. | |
281 | ||
282 | * The precedence of an operator determines how it nests with other | |
283 | operators. All the tokens declared in a single precedence | |
284 | declaration have equal precedence and nest together according to | |
285 | their associativity. When two tokens declared in different | |
286 | precedence declarations associate, the one declared later has the | |
287 | higher precedence and is grouped first. | |
288 | ||
289 | \1f | |
290 | File: bison.info, Node: Union Decl, Next: Type Decl, Prev: Precedence Decl, Up: Declarations | |
291 | ||
292 | The Collection of Value Types | |
293 | ----------------------------- | |
294 | ||
295 | The `%union' declaration specifies the entire collection of possible | |
296 | data types for semantic values. The keyword `%union' is followed by a | |
297 | pair of braces containing the same thing that goes inside a `union' in | |
298 | C. | |
299 | ||
300 | For example: | |
301 | ||
302 | %union { | |
303 | double val; | |
304 | symrec *tptr; | |
305 | } | |
306 | ||
307 | This says that the two alternative types are `double' and `symrec *'. | |
308 | They are given names `val' and `tptr'; these names are used in the | |
309 | `%token' and `%type' declarations to pick one of the types for a | |
310 | terminal or nonterminal symbol (*note Nonterminal Symbols: Type Decl.). | |
311 | ||
312 | Note that, unlike making a `union' declaration in C, you do not write | |
313 | a semicolon after the closing brace. | |
314 | ||
315 | \1f | |
316 | File: bison.info, Node: Type Decl, Next: Expect Decl, Prev: Union Decl, Up: Declarations | |
317 | ||
318 | Nonterminal Symbols | |
319 | ------------------- | |
320 | ||
321 | When you use `%union' to specify multiple value types, you must declare | |
322 | the value type of each nonterminal symbol for which values are used. | |
323 | This is done with a `%type' declaration, like this: | |
324 | ||
325 | %type <TYPE> NONTERMINAL... | |
326 | ||
327 | Here NONTERMINAL is the name of a nonterminal symbol, and TYPE is the | |
328 | name given in the `%union' to the alternative that you want (*note The | |
329 | Collection of Value Types: Union Decl.). You can give any number of | |
330 | nonterminal symbols in the same `%type' declaration, if they have the | |
331 | same value type. Use spaces to separate the symbol names. | |
332 | ||
333 | You can also declare the value type of a terminal symbol. To do | |
334 | this, use the same `<TYPE>' construction in a declaration for the | |
335 | terminal symbol. All kinds of token declarations allow `<TYPE>'. | |
336 | ||
337 | \1f | |
338 | File: bison.info, Node: Expect Decl, Next: Start Decl, Prev: Type Decl, Up: Declarations | |
339 | ||
340 | Suppressing Conflict Warnings | |
341 | ----------------------------- | |
342 | ||
343 | Bison normally warns if there are any conflicts in the grammar | |
344 | (*note Shift/Reduce Conflicts: Shift/Reduce.), but most real grammars | |
345 | have harmless shift/reduce conflicts which are resolved in a | |
346 | predictable way and would be difficult to eliminate. It is desirable | |
347 | to suppress the warning about these conflicts unless the number of | |
348 | conflicts changes. You can do this with the `%expect' declaration. | |
349 | ||
350 | The declaration looks like this: | |
351 | ||
352 | %expect N | |
353 | ||
354 | Here N is a decimal integer. The declaration says there should be no | |
355 | warning if there are N shift/reduce conflicts and no reduce/reduce | |
356 | conflicts. The usual warning is given if there are either more or fewer | |
357 | conflicts, or if there are any reduce/reduce conflicts. | |
358 | ||
359 | In general, using `%expect' involves these steps: | |
360 | ||
361 | * Compile your grammar without `%expect'. Use the `-v' option to | |
362 | get a verbose list of where the conflicts occur. Bison will also | |
363 | print the number of conflicts. | |
364 | ||
365 | * Check each of the conflicts to make sure that Bison's default | |
366 | resolution is what you really want. If not, rewrite the grammar | |
367 | and go back to the beginning. | |
368 | ||
369 | * Add an `%expect' declaration, copying the number N from the number | |
370 | which Bison printed. | |
371 | ||
372 | Now Bison will stop annoying you about the conflicts you have | |
373 | checked, but it will warn you again if changes in the grammar result in | |
374 | additional conflicts. | |
375 | ||
376 | \1f | |
377 | File: bison.info, Node: Start Decl, Next: Pure Decl, Prev: Expect Decl, Up: Declarations | |
378 | ||
379 | The Start-Symbol | |
380 | ---------------- | |
381 | ||
382 | Bison assumes by default that the start symbol for the grammar is | |
383 | the first nonterminal specified in the grammar specification section. | |
384 | The programmer may override this restriction with the `%start' | |
385 | declaration as follows: | |
386 | ||
387 | %start SYMBOL | |
388 | ||
389 | \1f | |
390 | File: bison.info, Node: Pure Decl, Next: Decl Summary, Prev: Start Decl, Up: Declarations | |
391 | ||
392 | A Pure (Reentrant) Parser | |
393 | ------------------------- | |
394 | ||
395 | A "reentrant" program is one which does not alter in the course of | |
396 | execution; in other words, it consists entirely of "pure" (read-only) | |
397 | code. Reentrancy is important whenever asynchronous execution is | |
398 | possible; for example, a non-reentrant program may not be safe to call | |
399 | from a signal handler. In systems with multiple threads of control, a | |
400 | non-reentrant program must be called only within interlocks. | |
401 | ||
402 | Normally, Bison generates a parser which is not reentrant. This is | |
403 | suitable for most uses, and it permits compatibility with YACC. (The | |
404 | standard YACC interfaces are inherently nonreentrant, because they use | |
405 | statically allocated variables for communication with `yylex', | |
406 | including `yylval' and `yylloc'.) | |
407 | ||
408 | Alternatively, you can generate a pure, reentrant parser. The Bison | |
409 | declaration `%pure_parser' says that you want the parser to be | |
410 | reentrant. It looks like this: | |
411 | ||
412 | %pure_parser | |
413 | ||
414 | The result is that the communication variables `yylval' and `yylloc' | |
415 | become local variables in `yyparse', and a different calling convention | |
416 | is used for the lexical analyzer function `yylex'. *Note Calling | |
417 | Conventions for Pure Parsers: Pure Calling, for the details of this. | |
418 | The variable `yynerrs' also becomes local in `yyparse' (*note The Error | |
419 | Reporting Function `yyerror': Error Reporting.). The convention for | |
420 | calling `yyparse' itself is unchanged. | |
421 | ||
422 | Whether the parser is pure has nothing to do with the grammar rules. | |
423 | You can generate either a pure parser or a nonreentrant parser from any | |
424 | valid grammar. | |
425 | ||
426 | \1f | |
427 | File: bison.info, Node: Decl Summary, Prev: Pure Decl, Up: Declarations | |
428 | ||
429 | Bison Declaration Summary | |
430 | ------------------------- | |
431 | ||
432 | Here is a summary of all Bison declarations: | |
433 | ||
434 | `%union' | |
435 | Declare the collection of data types that semantic values may have | |
436 | (*note The Collection of Value Types: Union Decl.). | |
437 | ||
438 | `%token' | |
439 | Declare a terminal symbol (token type name) with no precedence or | |
440 | associativity specified (*note Token Type Names: Token Decl.). | |
441 | ||
442 | `%right' | |
443 | Declare a terminal symbol (token type name) that is | |
444 | right-associative (*note Operator Precedence: Precedence Decl.). | |
445 | ||
446 | `%left' | |
447 | Declare a terminal symbol (token type name) that is | |
448 | left-associative (*note Operator Precedence: Precedence Decl.). | |
449 | ||
450 | `%nonassoc' | |
451 | Declare a terminal symbol (token type name) that is nonassociative | |
452 | (using it in a way that would be associative is a syntax error) | |
453 | (*note Operator Precedence: Precedence Decl.). | |
454 | ||
455 | `%type' | |
456 | Declare the type of semantic values for a nonterminal symbol | |
457 | (*note Nonterminal Symbols: Type Decl.). | |
458 | ||
459 | `%start' | |
460 | Specify the grammar's start symbol (*note The Start-Symbol: Start | |
461 | Decl.). | |
462 | ||
463 | `%expect' | |
464 | Declare the expected number of shift-reduce conflicts (*note | |
465 | Suppressing Conflict Warnings: Expect Decl.). | |
466 | ||
467 | `%yacc' | |
468 | `%fixed_output_files' | |
469 | Pretend the option `--yacc' was given, i.e., imitate Yacc, | |
470 | including its naming conventions. *Note Bison Options::, for more. | |
471 | ||
472 | `%locations' | |
473 | Generate the code processing the locations (*note Special Features | |
474 | for Use in Actions: Action Features.). This mode is enabled as | |
475 | soon as the grammar uses the special `@N' tokens, but if your | |
476 | grammar does not use it, using `%locations' allows for more | |
477 | accurate parse error messages. | |
478 | ||
479 | `%pure_parser' | |
480 | Request a pure (reentrant) parser program (*note A Pure | |
481 | (Reentrant) Parser: Pure Decl.). | |
482 | ||
483 | `%no_parser' | |
484 | Do not include any C code in the parser file; generate tables | |
485 | only. The parser file contains just `#define' directives and | |
486 | static variable declarations. | |
487 | ||
488 | This option also tells Bison to write the C code for the grammar | |
489 | actions into a file named `FILENAME.act', in the form of a | |
490 | brace-surrounded body fit for a `switch' statement. | |
491 | ||
492 | `%no_lines' | |
493 | Don't generate any `#line' preprocessor commands in the parser | |
494 | file. Ordinarily Bison writes these commands in the parser file | |
495 | so that the C compiler and debuggers will associate errors and | |
496 | object code with your source file (the grammar file). This | |
497 | directive causes them to associate errors with the parser file, | |
498 | treating it an independent source file in its own right. | |
499 | ||
500 | `%debug' | |
501 | Output a definition of the macro `YYDEBUG' into the parser file, so | |
502 | that the debugging facilities are compiled. *Note Debugging Your | |
503 | Parser: Debugging. | |
504 | ||
505 | `%defines' | |
506 | Write an extra output file containing macro definitions for the | |
507 | token type names defined in the grammar and the semantic value type | |
508 | `YYSTYPE', as well as a few `extern' variable declarations. | |
509 | ||
510 | If the parser output file is named `NAME.c' then this file is | |
511 | named `NAME.h'. | |
512 | ||
513 | This output file is essential if you wish to put the definition of | |
514 | `yylex' in a separate source file, because `yylex' needs to be | |
515 | able to refer to token type codes and the variable `yylval'. | |
516 | *Note Semantic Values of Tokens: Token Values. | |
517 | ||
518 | `%verbose' | |
519 | Write an extra output file containing verbose descriptions of the | |
520 | parser states and what is done for each type of look-ahead token in | |
521 | that state. | |
522 | ||
523 | This file also describes all the conflicts, both those resolved by | |
524 | operator precedence and the unresolved ones. | |
525 | ||
526 | The file's name is made by removing `.tab.c' or `.c' from the | |
527 | parser output file name, and adding `.output' instead. | |
528 | ||
529 | Therefore, if the input file is `foo.y', then the parser file is | |
530 | called `foo.tab.c' by default. As a consequence, the verbose | |
531 | output file is called `foo.output'. | |
532 | ||
533 | `%raw' | |
534 | The output file `NAME.h' normally defines the tokens with | |
535 | Yacc-compatible token numbers. If this option is specified, the | |
536 | internal Bison numbers are used instead. (Yacc-compatible numbers | |
537 | start at 257 except for single-character tokens; Bison assigns | |
538 | token numbers sequentially for all tokens starting at 3.) | |
539 | ||
540 | `%token_table' | |
541 | Generate an array of token names in the parser file. The name of | |
542 | the array is `yytname'; `yytname[I]' is the name of the token | |
543 | whose internal Bison token code number is I. The first three | |
544 | elements of `yytname' are always `"$"', `"error"', and | |
545 | `"$illegal"'; after these come the symbols defined in the grammar | |
546 | file. | |
547 | ||
548 | For single-character literal tokens and literal string tokens, the | |
549 | name in the table includes the single-quote or double-quote | |
550 | characters: for example, `"'+'"' is a single-character literal and | |
551 | `"\"<=\""' is a literal string token. All the characters of the | |
552 | literal string token appear verbatim in the string found in the | |
553 | table; even double-quote characters are not escaped. For example, | |
554 | if the token consists of three characters `*"*', its string in | |
555 | `yytname' contains `"*"*"'. (In C, that would be written as | |
556 | `"\"*\"*\""'). | |
557 | ||
558 | When you specify `%token_table', Bison also generates macro | |
559 | definitions for macros `YYNTOKENS', `YYNNTS', and `YYNRULES', and | |
560 | `YYNSTATES': | |
561 | ||
562 | `YYNTOKENS' | |
563 | The highest token number, plus one. | |
564 | ||
565 | `YYNNTS' | |
566 | The number of nonterminal symbols. | |
567 | ||
568 | `YYNRULES' | |
569 | The number of grammar rules, | |
570 | ||
571 | `YYNSTATES' | |
572 | The number of parser states (*note Parser States::). | |
573 | ||
574 | \1f | |
575 | File: bison.info, Node: Multiple Parsers, Prev: Declarations, Up: Grammar File | |
576 | ||
577 | Multiple Parsers in the Same Program | |
578 | ==================================== | |
579 | ||
580 | Most programs that use Bison parse only one language and therefore | |
581 | contain only one Bison parser. But what if you want to parse more than | |
582 | one language with the same program? Then you need to avoid a name | |
583 | conflict between different definitions of `yyparse', `yylval', and so | |
584 | on. | |
585 | ||
586 | The easy way to do this is to use the option `-p PREFIX' (*note | |
587 | Invoking Bison: Invocation.). This renames the interface functions and | |
588 | variables of the Bison parser to start with PREFIX instead of `yy'. | |
589 | You can use this to give each parser distinct names that do not | |
590 | conflict. | |
591 | ||
592 | The precise list of symbols renamed is `yyparse', `yylex', | |
593 | `yyerror', `yynerrs', `yylval', `yychar' and `yydebug'. For example, | |
594 | if you use `-p c', the names become `cparse', `clex', and so on. | |
595 | ||
596 | *All the other variables and macros associated with Bison are not | |
597 | renamed.* These others are not global; there is no conflict if the same | |
598 | name is used in different parsers. For example, `YYSTYPE' is not | |
599 | renamed, but defining this in different ways in different parsers causes | |
600 | no trouble (*note Data Types of Semantic Values: Value Type.). | |
601 | ||
602 | The `-p' option works by adding macro definitions to the beginning | |
603 | of the parser source file, defining `yyparse' as `PREFIXparse', and so | |
604 | on. This effectively substitutes one name for the other in the entire | |
605 | parser file. | |
606 | ||
607 | \1f | |
608 | File: bison.info, Node: Interface, Next: Algorithm, Prev: Grammar File, Up: Top | |
609 | ||
610 | Parser C-Language Interface | |
611 | *************************** | |
612 | ||
613 | The Bison parser is actually a C function named `yyparse'. Here we | |
614 | describe the interface conventions of `yyparse' and the other functions | |
615 | that it needs to use. | |
616 | ||
617 | Keep in mind that the parser uses many C identifiers starting with | |
618 | `yy' and `YY' for internal purposes. If you use such an identifier | |
619 | (aside from those in this manual) in an action or in additional C code | |
620 | in the grammar file, you are likely to run into trouble. | |
621 | ||
622 | * Menu: | |
623 | ||
624 | * Parser Function:: How to call `yyparse' and what it returns. | |
625 | * Lexical:: You must supply a function `yylex' | |
626 | which reads tokens. | |
627 | * Error Reporting:: You must supply a function `yyerror'. | |
628 | * Action Features:: Special features for use in actions. | |
629 | ||
630 | \1f | |
631 | File: bison.info, Node: Parser Function, Next: Lexical, Up: Interface | |
632 | ||
633 | The Parser Function `yyparse' | |
634 | ============================= | |
635 | ||
636 | You call the function `yyparse' to cause parsing to occur. This | |
637 | function reads tokens, executes actions, and ultimately returns when it | |
638 | encounters end-of-input or an unrecoverable syntax error. You can also | |
639 | write an action which directs `yyparse' to return immediately without | |
640 | reading further. | |
641 | ||
642 | The value returned by `yyparse' is 0 if parsing was successful | |
643 | (return is due to end-of-input). | |
644 | ||
645 | The value is 1 if parsing failed (return is due to a syntax error). | |
646 | ||
647 | In an action, you can cause immediate return from `yyparse' by using | |
648 | these macros: | |
649 | ||
650 | `YYACCEPT' | |
651 | Return immediately with value 0 (to report success). | |
652 | ||
653 | `YYABORT' | |
654 | Return immediately with value 1 (to report failure). | |
655 | ||
656 | \1f | |
657 | File: bison.info, Node: Lexical, Next: Error Reporting, Prev: Parser Function, Up: Interface | |
658 | ||
659 | The Lexical Analyzer Function `yylex' | |
660 | ===================================== | |
661 | ||
662 | The "lexical analyzer" function, `yylex', recognizes tokens from the | |
663 | input stream and returns them to the parser. Bison does not create | |
664 | this function automatically; you must write it so that `yyparse' can | |
665 | call it. The function is sometimes referred to as a lexical scanner. | |
666 | ||
667 | In simple programs, `yylex' is often defined at the end of the Bison | |
668 | grammar file. If `yylex' is defined in a separate source file, you | |
669 | need to arrange for the token-type macro definitions to be available | |
670 | there. To do this, use the `-d' option when you run Bison, so that it | |
671 | will write these macro definitions into a separate header file | |
672 | `NAME.tab.h' which you can include in the other source files that need | |
673 | it. *Note Invoking Bison: Invocation. | |
674 | ||
675 | * Menu: | |
676 | ||
677 | * Calling Convention:: How `yyparse' calls `yylex'. | |
678 | * Token Values:: How `yylex' must return the semantic value | |
679 | of the token it has read. | |
680 | * Token Positions:: How `yylex' must return the text position | |
681 | (line number, etc.) of the token, if the | |
682 | actions want that. | |
683 | * Pure Calling:: How the calling convention differs | |
684 | in a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.). | |
685 | ||
686 | \1f | |
687 | File: bison.info, Node: Calling Convention, Next: Token Values, Up: Lexical | |
688 | ||
689 | Calling Convention for `yylex' | |
690 | ------------------------------ | |
691 | ||
692 | The value that `yylex' returns must be the numeric code for the type | |
693 | of token it has just found, or 0 for end-of-input. | |
694 | ||
695 | When a token is referred to in the grammar rules by a name, that name | |
696 | in the parser file becomes a C macro whose definition is the proper | |
697 | numeric code for that token type. So `yylex' can use the name to | |
698 | indicate that type. *Note Symbols::. | |
699 | ||
700 | When a token is referred to in the grammar rules by a character | |
701 | literal, the numeric code for that character is also the code for the | |
702 | token type. So `yylex' can simply return that character code. The | |
703 | null character must not be used this way, because its code is zero and | |
704 | that is what signifies end-of-input. | |
705 | ||
706 | Here is an example showing these things: | |
707 | ||
708 | int | |
709 | yylex (void) | |
710 | { | |
711 | ... | |
712 | if (c == EOF) /* Detect end of file. */ | |
713 | return 0; | |
714 | ... | |
715 | if (c == '+' || c == '-') | |
716 | return c; /* Assume token type for `+' is '+'. */ | |
717 | ... | |
718 | return INT; /* Return the type of the token. */ | |
719 | ... | |
720 | } | |
721 | ||
722 | This interface has been designed so that the output from the `lex' | |
723 | utility can be used without change as the definition of `yylex'. | |
724 | ||
725 | If the grammar uses literal string tokens, there are two ways that | |
726 | `yylex' can determine the token type codes for them: | |
727 | ||
728 | * If the grammar defines symbolic token names as aliases for the | |
729 | literal string tokens, `yylex' can use these symbolic names like | |
730 | all others. In this case, the use of the literal string tokens in | |
731 | the grammar file has no effect on `yylex'. | |
732 | ||
733 | * `yylex' can find the multicharacter token in the `yytname' table. | |
734 | The index of the token in the table is the token type's code. The | |
735 | name of a multicharacter token is recorded in `yytname' with a | |
736 | double-quote, the token's characters, and another double-quote. | |
737 | The token's characters are not escaped in any way; they appear | |
738 | verbatim in the contents of the string in the table. | |
739 | ||
740 | Here's code for looking up a token in `yytname', assuming that the | |
741 | characters of the token are stored in `token_buffer'. | |
742 | ||
743 | for (i = 0; i < YYNTOKENS; i++) | |
744 | { | |
745 | if (yytname[i] != 0 | |
746 | && yytname[i][0] == '"' | |
747 | && strncmp (yytname[i] + 1, token_buffer, | |
748 | strlen (token_buffer)) | |
749 | && yytname[i][strlen (token_buffer) + 1] == '"' | |
750 | && yytname[i][strlen (token_buffer) + 2] == 0) | |
751 | break; | |
752 | } | |
753 | ||
754 | The `yytname' table is generated only if you use the | |
755 | `%token_table' declaration. *Note Decl Summary::. | |
756 | ||
757 | \1f | |
758 | File: bison.info, Node: Token Values, Next: Token Positions, Prev: Calling Convention, Up: Lexical | |
759 | ||
760 | Semantic Values of Tokens | |
761 | ------------------------- | |
762 | ||
763 | In an ordinary (non-reentrant) parser, the semantic value of the | |
764 | token must be stored into the global variable `yylval'. When you are | |
765 | using just one data type for semantic values, `yylval' has that type. | |
766 | Thus, if the type is `int' (the default), you might write this in | |
767 | `yylex': | |
768 | ||
769 | ... | |
770 | yylval = value; /* Put value onto Bison stack. */ | |
771 | return INT; /* Return the type of the token. */ | |
772 | ... | |
773 | ||
774 | When you are using multiple data types, `yylval''s type is a union | |
775 | made from the `%union' declaration (*note The Collection of Value | |
776 | Types: Union Decl.). So when you store a token's value, you must use | |
777 | the proper member of the union. If the `%union' declaration looks like | |
778 | this: | |
779 | ||
780 | %union { | |
781 | int intval; | |
782 | double val; | |
783 | symrec *tptr; | |
784 | } | |
785 | ||
786 | then the code in `yylex' might look like this: | |
787 | ||
788 | ... | |
789 | yylval.intval = value; /* Put value onto Bison stack. */ | |
790 | return INT; /* Return the type of the token. */ | |
791 | ... | |
792 | ||
793 | \1f | |
794 | File: bison.info, Node: Token Positions, Next: Pure Calling, Prev: Token Values, Up: Lexical | |
795 | ||
796 | Textual Positions of Tokens | |
797 | --------------------------- | |
798 | ||
799 | If you are using the `@N'-feature (*note Special Features for Use in | |
800 | Actions: Action Features.) in actions to keep track of the textual | |
801 | locations of tokens and groupings, then you must provide this | |
802 | information in `yylex'. The function `yyparse' expects to find the | |
803 | textual location of a token just parsed in the global variable | |
804 | `yylloc'. So `yylex' must store the proper data in that variable. The | |
805 | value of `yylloc' is a structure and you need only initialize the | |
806 | members that are going to be used by the actions. The four members are | |
807 | called `first_line', `first_column', `last_line' and `last_column'. | |
808 | Note that the use of this feature makes the parser noticeably slower. | |
809 | ||
810 | The data type of `yylloc' has the name `YYLTYPE'. | |
811 | ||
812 | \1f | |
813 | File: bison.info, Node: Pure Calling, Prev: Token Positions, Up: Lexical | |
814 | ||
815 | Calling Conventions for Pure Parsers | |
816 | ------------------------------------ | |
817 | ||
818 | When you use the Bison declaration `%pure_parser' to request a pure, | |
819 | reentrant parser, the global communication variables `yylval' and | |
820 | `yylloc' cannot be used. (*Note A Pure (Reentrant) Parser: Pure Decl.) | |
821 | In such parsers the two global variables are replaced by pointers | |
822 | passed as arguments to `yylex'. You must declare them as shown here, | |
823 | and pass the information back by storing it through those pointers. | |
824 | ||
825 | int | |
826 | yylex (YYSTYPE *lvalp, YYLTYPE *llocp) | |
827 | { | |
828 | ... | |
829 | *lvalp = value; /* Put value onto Bison stack. */ | |
830 | return INT; /* Return the type of the token. */ | |
831 | ... | |
832 | } | |
833 | ||
834 | If the grammar file does not use the `@' constructs to refer to | |
835 | textual positions, then the type `YYLTYPE' will not be defined. In | |
836 | this case, omit the second argument; `yylex' will be called with only | |
837 | one argument. | |
838 | ||
839 | If you use a reentrant parser, you can optionally pass additional | |
840 | parameter information to it in a reentrant way. To do so, define the | |
841 | macro `YYPARSE_PARAM' as a variable name. This modifies the `yyparse' | |
842 | function to accept one argument, of type `void *', with that name. | |
843 | ||
844 | When you call `yyparse', pass the address of an object, casting the | |
845 | address to `void *'. The grammar actions can refer to the contents of | |
846 | the object by casting the pointer value back to its proper type and | |
847 | then dereferencing it. Here's an example. Write this in the parser: | |
848 | ||
849 | %{ | |
850 | struct parser_control | |
851 | { | |
852 | int nastiness; | |
853 | int randomness; | |
854 | }; | |
855 | ||
856 | #define YYPARSE_PARAM parm | |
857 | %} | |
858 | ||
859 | Then call the parser like this: | |
860 | ||
861 | struct parser_control | |
862 | { | |
863 | int nastiness; | |
864 | int randomness; | |
865 | }; | |
866 | ||
867 | ... | |
868 | ||
869 | { | |
870 | struct parser_control foo; | |
871 | ... /* Store proper data in `foo'. */ | |
872 | value = yyparse ((void *) &foo); | |
873 | ... | |
874 | } | |
875 | ||
876 | In the grammar actions, use expressions like this to refer to the data: | |
877 | ||
878 | ((struct parser_control *) parm)->randomness | |
879 | ||
880 | If you wish to pass the additional parameter data to `yylex', define | |
881 | the macro `YYLEX_PARAM' just like `YYPARSE_PARAM', as shown here: | |
882 | ||
883 | %{ | |
884 | struct parser_control | |
885 | { | |
886 | int nastiness; | |
887 | int randomness; | |
888 | }; | |
889 | ||
890 | #define YYPARSE_PARAM parm | |
891 | #define YYLEX_PARAM parm | |
892 | %} | |
893 | ||
894 | You should then define `yylex' to accept one additional | |
895 | argument--the value of `parm'. (This makes either two or three | |
896 | arguments in total, depending on whether an argument of type `YYLTYPE' | |
897 | is passed.) You can declare the argument as a pointer to the proper | |
898 | object type, or you can declare it as `void *' and access the contents | |
899 | as shown above. | |
900 | ||
901 | You can use `%pure_parser' to request a reentrant parser without | |
902 | also using `YYPARSE_PARAM'. Then you should call `yyparse' with no | |
903 | arguments, as usual. | |
904 | ||
905 | \1f | |
906 | File: bison.info, Node: Error Reporting, Next: Action Features, Prev: Lexical, Up: Interface | |
907 | ||
908 | The Error Reporting Function `yyerror' | |
909 | ====================================== | |
910 | ||
911 | The Bison parser detects a "parse error" or "syntax error" whenever | |
912 | it reads a token which cannot satisfy any syntax rule. An action in | |
913 | the grammar can also explicitly proclaim an error, using the macro | |
914 | `YYERROR' (*note Special Features for Use in Actions: Action Features.). | |
915 | ||
916 | The Bison parser expects to report the error by calling an error | |
917 | reporting function named `yyerror', which you must supply. It is | |
918 | called by `yyparse' whenever a syntax error is found, and it receives | |
919 | one argument. For a parse error, the string is normally | |
920 | `"parse error"'. | |
921 | ||
922 | If you define the macro `YYERROR_VERBOSE' in the Bison declarations | |
923 | section (*note The Bison Declarations Section: Bison Declarations.), | |
924 | then Bison provides a more verbose and specific error message string | |
925 | instead of just plain `"parse error"'. It doesn't matter what | |
926 | definition you use for `YYERROR_VERBOSE', just whether you define it. | |
927 | ||
928 | The parser can detect one other kind of error: stack overflow. This | |
929 | happens when the input contains constructions that are very deeply | |
930 | nested. It isn't likely you will encounter this, since the Bison | |
931 | parser extends its stack automatically up to a very large limit. But | |
932 | if overflow happens, `yyparse' calls `yyerror' in the usual fashion, | |
933 | except that the argument string is `"parser stack overflow"'. | |
934 | ||
935 | The following definition suffices in simple programs: | |
936 | ||
937 | void | |
938 | yyerror (char *s) | |
939 | { | |
940 | fprintf (stderr, "%s\n", s); | |
941 | } | |
942 | ||
943 | After `yyerror' returns to `yyparse', the latter will attempt error | |
944 | recovery if you have written suitable error recovery grammar rules | |
945 | (*note Error Recovery::). If recovery is impossible, `yyparse' will | |
946 | immediately return 1. | |
947 | ||
948 | The variable `yynerrs' contains the number of syntax errors | |
949 | encountered so far. Normally this variable is global; but if you | |
950 | request a pure parser (*note A Pure (Reentrant) Parser: Pure Decl.) | |
951 | then it is a local variable which only the actions can access. | |
952 | ||
953 | \1f | |
954 | File: bison.info, Node: Action Features, Prev: Error Reporting, Up: Interface | |
955 | ||
956 | Special Features for Use in Actions | |
957 | =================================== | |
958 | ||
959 | Here is a table of Bison constructs, variables and macros that are | |
960 | useful in actions. | |
961 | ||
962 | `$$' | |
963 | Acts like a variable that contains the semantic value for the | |
964 | grouping made by the current rule. *Note Actions::. | |
965 | ||
966 | `$N' | |
967 | Acts like a variable that contains the semantic value for the Nth | |
968 | component of the current rule. *Note Actions::. | |
969 | ||
970 | `$<TYPEALT>$' | |
971 | Like `$$' but specifies alternative TYPEALT in the union specified | |
972 | by the `%union' declaration. *Note Data Types of Values in | |
973 | Actions: Action Types. | |
974 | ||
975 | `$<TYPEALT>N' | |
976 | Like `$N' but specifies alternative TYPEALT in the union specified | |
977 | by the `%union' declaration. *Note Data Types of Values in | |
978 | Actions: Action Types. | |
979 | ||
980 | `YYABORT;' | |
981 | Return immediately from `yyparse', indicating failure. *Note The | |
982 | Parser Function `yyparse': Parser Function. | |
983 | ||
984 | `YYACCEPT;' | |
985 | Return immediately from `yyparse', indicating success. *Note The | |
986 | Parser Function `yyparse': Parser Function. | |
987 | ||
988 | `YYBACKUP (TOKEN, VALUE);' | |
989 | Unshift a token. This macro is allowed only for rules that reduce | |
990 | a single value, and only when there is no look-ahead token. It | |
991 | installs a look-ahead token with token type TOKEN and semantic | |
992 | value VALUE; then it discards the value that was going to be | |
993 | reduced by this rule. | |
994 | ||
995 | If the macro is used when it is not valid, such as when there is a | |
996 | look-ahead token already, then it reports a syntax error with a | |
997 | message `cannot back up' and performs ordinary error recovery. | |
998 | ||
999 | In either case, the rest of the action is not executed. | |
1000 | ||
1001 | `YYEMPTY' | |
1002 | Value stored in `yychar' when there is no look-ahead token. | |
1003 | ||
1004 | `YYERROR;' | |
1005 | Cause an immediate syntax error. This statement initiates error | |
1006 | recovery just as if the parser itself had detected an error; | |
1007 | however, it does not call `yyerror', and does not print any | |
1008 | message. If you want to print an error message, call `yyerror' | |
1009 | explicitly before the `YYERROR;' statement. *Note Error | |
1010 | Recovery::. | |
1011 | ||
1012 | `YYRECOVERING' | |
1013 | This macro stands for an expression that has the value 1 when the | |
1014 | parser is recovering from a syntax error, and 0 the rest of the | |
1015 | time. *Note Error Recovery::. | |
1016 | ||
1017 | `yychar' | |
1018 | Variable containing the current look-ahead token. (In a pure | |
1019 | parser, this is actually a local variable within `yyparse'.) When | |
1020 | there is no look-ahead token, the value `YYEMPTY' is stored in the | |
1021 | variable. *Note Look-Ahead Tokens: Look-Ahead. | |
1022 | ||
1023 | `yyclearin;' | |
1024 | Discard the current look-ahead token. This is useful primarily in | |
1025 | error rules. *Note Error Recovery::. | |
1026 | ||
1027 | `yyerrok;' | |
1028 | Resume generating error messages immediately for subsequent syntax | |
1029 | errors. This is useful primarily in error rules. *Note Error | |
1030 | Recovery::. | |
1031 | ||
1032 | `@N' | |
1033 | Acts like a structure variable containing information on the line | |
1034 | numbers and column numbers of the Nth component of the current | |
1035 | rule. The structure has four members, like this: | |
1036 | ||
1037 | struct { | |
1038 | int first_line, last_line; | |
1039 | int first_column, last_column; | |
1040 | }; | |
1041 | ||
1042 | Thus, to get the starting line number of the third component, you | |
1043 | would use `@3.first_line'. | |
1044 | ||
1045 | In order for the members of this structure to contain valid | |
1046 | information, you must make `yylex' supply this information about | |
1047 | each token. If you need only certain members, then `yylex' need | |
1048 | only fill in those members. | |
1049 | ||
1050 | The use of this feature makes the parser noticeably slower. | |
1051 | ||
1052 | \1f | |
1053 | File: bison.info, Node: Algorithm, Next: Error Recovery, Prev: Interface, Up: Top | |
1054 | ||
1055 | The Bison Parser Algorithm | |
1056 | ************************** | |
1057 | ||
1058 | As Bison reads tokens, it pushes them onto a stack along with their | |
1059 | semantic values. The stack is called the "parser stack". Pushing a | |
1060 | token is traditionally called "shifting". | |
1061 | ||
1062 | For example, suppose the infix calculator has read `1 + 5 *', with a | |
1063 | `3' to come. The stack will have four elements, one for each token | |
1064 | that was shifted. | |
1065 | ||
1066 | But the stack does not always have an element for each token read. | |
1067 | When the last N tokens and groupings shifted match the components of a | |
1068 | grammar rule, they can be combined according to that rule. This is | |
1069 | called "reduction". Those tokens and groupings are replaced on the | |
1070 | stack by a single grouping whose symbol is the result (left hand side) | |
1071 | of that rule. Running the rule's action is part of the process of | |
1072 | reduction, because this is what computes the semantic value of the | |
1073 | resulting grouping. | |
1074 | ||
1075 | For example, if the infix calculator's parser stack contains this: | |
1076 | ||
1077 | 1 + 5 * 3 | |
1078 | ||
1079 | and the next input token is a newline character, then the last three | |
1080 | elements can be reduced to 15 via the rule: | |
1081 | ||
1082 | expr: expr '*' expr; | |
1083 | ||
1084 | Then the stack contains just these three elements: | |
1085 | ||
1086 | 1 + 15 | |
1087 | ||
1088 | At this point, another reduction can be made, resulting in the single | |
1089 | value 16. Then the newline token can be shifted. | |
1090 | ||
1091 | The parser tries, by shifts and reductions, to reduce the entire | |
1092 | input down to a single grouping whose symbol is the grammar's | |
1093 | start-symbol (*note Languages and Context-Free Grammars: Language and | |
1094 | Grammar.). | |
1095 | ||
1096 | This kind of parser is known in the literature as a bottom-up parser. | |
1097 | ||
1098 | * Menu: | |
1099 | ||
1100 | * Look-Ahead:: Parser looks one token ahead when deciding what to do. | |
1101 | * Shift/Reduce:: Conflicts: when either shifting or reduction is valid. | |
1102 | * Precedence:: Operator precedence works by resolving conflicts. | |
1103 | * Contextual Precedence:: When an operator's precedence depends on context. | |
1104 | * Parser States:: The parser is a finite-state-machine with stack. | |
1105 | * Reduce/Reduce:: When two rules are applicable in the same situation. | |
1106 | * Mystery Conflicts:: Reduce/reduce conflicts that look unjustified. | |
1107 | * Stack Overflow:: What happens when stack gets full. How to avoid it. | |
1108 | ||
1109 | \1f | |
1110 | File: bison.info, Node: Look-Ahead, Next: Shift/Reduce, Up: Algorithm | |
1111 | ||
1112 | Look-Ahead Tokens | |
1113 | ================= | |
1114 | ||
1115 | The Bison parser does _not_ always reduce immediately as soon as the | |
1116 | last N tokens and groupings match a rule. This is because such a | |
1117 | simple strategy is inadequate to handle most languages. Instead, when a | |
1118 | reduction is possible, the parser sometimes "looks ahead" at the next | |
1119 | token in order to decide what to do. | |
1120 | ||
1121 | When a token is read, it is not immediately shifted; first it | |
1122 | becomes the "look-ahead token", which is not on the stack. Now the | |
1123 | parser can perform one or more reductions of tokens and groupings on | |
1124 | the stack, while the look-ahead token remains off to the side. When no | |
1125 | more reductions should take place, the look-ahead token is shifted onto | |
1126 | the stack. This does not mean that all possible reductions have been | |
1127 | done; depending on the token type of the look-ahead token, some rules | |
1128 | may choose to delay their application. | |
1129 | ||
1130 | Here is a simple case where look-ahead is needed. These three rules | |
1131 | define expressions which contain binary addition operators and postfix | |
1132 | unary factorial operators (`!'), and allow parentheses for grouping. | |
1133 | ||
1134 | expr: term '+' expr | |
1135 | | term | |
1136 | ; | |
1137 | ||
1138 | term: '(' expr ')' | |
1139 | | term '!' | |
1140 | | NUMBER | |
1141 | ; | |
1142 | ||
1143 | Suppose that the tokens `1 + 2' have been read and shifted; what | |
1144 | should be done? If the following token is `)', then the first three | |
1145 | tokens must be reduced to form an `expr'. This is the only valid | |
1146 | course, because shifting the `)' would produce a sequence of symbols | |
1147 | `term ')'', and no rule allows this. | |
1148 | ||
1149 | If the following token is `!', then it must be shifted immediately so | |
1150 | that `2 !' can be reduced to make a `term'. If instead the parser were | |
1151 | to reduce before shifting, `1 + 2' would become an `expr'. It would | |
1152 | then be impossible to shift the `!' because doing so would produce on | |
1153 | the stack the sequence of symbols `expr '!''. No rule allows that | |
1154 | sequence. | |
1155 | ||
1156 | The current look-ahead token is stored in the variable `yychar'. | |
1157 | *Note Special Features for Use in Actions: Action Features. | |
1158 | ||
1159 | \1f | |
1160 | File: bison.info, Node: Shift/Reduce, Next: Precedence, Prev: Look-Ahead, Up: Algorithm | |
1161 | ||
1162 | Shift/Reduce Conflicts | |
1163 | ====================== | |
1164 | ||
1165 | Suppose we are parsing a language which has if-then and if-then-else | |
1166 | statements, with a pair of rules like this: | |
1167 | ||
1168 | if_stmt: | |
1169 | IF expr THEN stmt | |
1170 | | IF expr THEN stmt ELSE stmt | |
1171 | ; | |
1172 | ||
1173 | Here we assume that `IF', `THEN' and `ELSE' are terminal symbols for | |
1174 | specific keyword tokens. | |
1175 | ||
1176 | When the `ELSE' token is read and becomes the look-ahead token, the | |
1177 | contents of the stack (assuming the input is valid) are just right for | |
1178 | reduction by the first rule. But it is also legitimate to shift the | |
1179 | `ELSE', because that would lead to eventual reduction by the second | |
1180 | rule. | |
1181 | ||
1182 | This situation, where either a shift or a reduction would be valid, | |
1183 | is called a "shift/reduce conflict". Bison is designed to resolve | |
1184 | these conflicts by choosing to shift, unless otherwise directed by | |
1185 | operator precedence declarations. To see the reason for this, let's | |
1186 | contrast it with the other alternative. | |
1187 | ||
1188 | Since the parser prefers to shift the `ELSE', the result is to attach | |
1189 | the else-clause to the innermost if-statement, making these two inputs | |
1190 | equivalent: | |
1191 | ||
1192 | if x then if y then win (); else lose; | |
1193 | ||
1194 | if x then do; if y then win (); else lose; end; | |
1195 | ||
1196 | But if the parser chose to reduce when possible rather than shift, | |
1197 | the result would be to attach the else-clause to the outermost | |
1198 | if-statement, making these two inputs equivalent: | |
1199 | ||
1200 | if x then if y then win (); else lose; | |
1201 | ||
1202 | if x then do; if y then win (); end; else lose; | |
1203 | ||
1204 | The conflict exists because the grammar as written is ambiguous: | |
1205 | either parsing of the simple nested if-statement is legitimate. The | |
1206 | established convention is that these ambiguities are resolved by | |
1207 | attaching the else-clause to the innermost if-statement; this is what | |
1208 | Bison accomplishes by choosing to shift rather than reduce. (It would | |
1209 | ideally be cleaner to write an unambiguous grammar, but that is very | |
1210 | hard to do in this case.) This particular ambiguity was first | |
1211 | encountered in the specifications of Algol 60 and is called the | |
1212 | "dangling `else'" ambiguity. | |
1213 | ||
1214 | To avoid warnings from Bison about predictable, legitimate | |
1215 | shift/reduce conflicts, use the `%expect N' declaration. There will be | |
1216 | no warning as long as the number of shift/reduce conflicts is exactly N. | |
1217 | *Note Suppressing Conflict Warnings: Expect Decl. | |
1218 | ||
1219 | The definition of `if_stmt' above is solely to blame for the | |
1220 | conflict, but the conflict does not actually appear without additional | |
1221 | rules. Here is a complete Bison input file that actually manifests the | |
1222 | conflict: | |
1223 | ||
1224 | %token IF THEN ELSE variable | |
1225 | %% | |
1226 | stmt: expr | |
1227 | | if_stmt | |
1228 | ; | |
1229 | ||
1230 | if_stmt: | |
1231 | IF expr THEN stmt | |
1232 | | IF expr THEN stmt ELSE stmt | |
1233 | ; | |
1234 | ||
1235 | expr: variable | |
1236 | ; | |
1237 | ||
1238 | \1f | |
1239 | File: bison.info, Node: Precedence, Next: Contextual Precedence, Prev: Shift/Reduce, Up: Algorithm | |
1240 | ||
1241 | Operator Precedence | |
1242 | =================== | |
1243 | ||
1244 | Another situation where shift/reduce conflicts appear is in | |
1245 | arithmetic expressions. Here shifting is not always the preferred | |
1246 | resolution; the Bison declarations for operator precedence allow you to | |
1247 | specify when to shift and when to reduce. | |
1248 | ||
1249 | * Menu: | |
1250 | ||
1251 | * Why Precedence:: An example showing why precedence is needed. | |
1252 | * Using Precedence:: How to specify precedence in Bison grammars. | |
1253 | * Precedence Examples:: How these features are used in the previous example. | |
1254 | * How Precedence:: How they work. | |
1255 | ||
1256 | \1f | |
1257 | File: bison.info, Node: Why Precedence, Next: Using Precedence, Up: Precedence | |
1258 | ||
1259 | When Precedence is Needed | |
1260 | ------------------------- | |
1261 | ||
1262 | Consider the following ambiguous grammar fragment (ambiguous because | |
1263 | the input `1 - 2 * 3' can be parsed in two different ways): | |
1264 | ||
1265 | expr: expr '-' expr | |
1266 | | expr '*' expr | |
1267 | | expr '<' expr | |
1268 | | '(' expr ')' | |
1269 | ... | |
1270 | ; | |
1271 | ||
1272 | Suppose the parser has seen the tokens `1', `-' and `2'; should it | |
1273 | reduce them via the rule for the subtraction operator? It depends on | |
1274 | the next token. Of course, if the next token is `)', we must reduce; | |
1275 | shifting is invalid because no single rule can reduce the token | |
1276 | sequence `- 2 )' or anything starting with that. But if the next token | |
1277 | is `*' or `<', we have a choice: either shifting or reduction would | |
1278 | allow the parse to complete, but with different results. | |
1279 | ||
1280 | To decide which one Bison should do, we must consider the results. | |
1281 | If the next operator token OP is shifted, then it must be reduced first | |
1282 | in order to permit another opportunity to reduce the difference. The | |
1283 | result is (in effect) `1 - (2 OP 3)'. On the other hand, if the | |
1284 | subtraction is reduced before shifting OP, the result is | |
1285 | `(1 - 2) OP 3'. Clearly, then, the choice of shift or reduce should | |
1286 | depend on the relative precedence of the operators `-' and OP: `*' | |
1287 | should be shifted first, but not `<'. | |
1288 | ||
1289 | What about input such as `1 - 2 - 5'; should this be `(1 - 2) - 5' | |
1290 | or should it be `1 - (2 - 5)'? For most operators we prefer the | |
1291 | former, which is called "left association". The latter alternative, | |
1292 | "right association", is desirable for assignment operators. The choice | |
1293 | of left or right association is a matter of whether the parser chooses | |
1294 | to shift or reduce when the stack contains `1 - 2' and the look-ahead | |
1295 | token is `-': shifting makes right-associativity. | |
1296 |