]>
Commit | Line | Data |
---|---|---|
1e24cc5b | 1 | This is bison.info, produced by makeinfo version 4.0 from bison.texinfo. |
705db0b5 AD |
2 | |
3 | START-INFO-DIR-ENTRY | |
4 | * bison: (bison). GNU Project parser generator (yacc replacement). | |
5 | END-INFO-DIR-ENTRY | |
6 | ||
7 | This file documents the Bison parser generator. | |
8 | ||
9 | Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, | |
10 | 2000 Free Software Foundation, Inc. | |
11 | ||
12 | Permission is granted to make and distribute verbatim copies of this | |
13 | manual provided the copyright notice and this permission notice are | |
14 | preserved on all copies. | |
15 | ||
16 | Permission is granted to copy and distribute modified versions of | |
17 | this manual under the conditions for verbatim copying, provided also | |
18 | that the sections entitled "GNU General Public License" and "Conditions | |
19 | for Using Bison" are included exactly as in the original, and provided | |
20 | that the entire resulting derived work is distributed under the terms | |
21 | of a permission notice identical to this one. | |
22 | ||
23 | Permission is granted to copy and distribute translations of this | |
24 | manual into another language, under the above conditions for modified | |
25 | versions, except that the sections entitled "GNU General Public | |
26 | License", "Conditions for Using Bison" and this permission notice may be | |
27 | included in translations approved by the Free Software Foundation | |
28 | instead of in the original English. | |
29 | ||
30 | \1f | |
31 | File: bison.info, Node: Rpcalc Rules, Next: Rpcalc Lexer, Prev: Rpcalc Decls, Up: RPN Calc | |
32 | ||
33 | Grammar Rules for `rpcalc' | |
34 | -------------------------- | |
35 | ||
36 | Here are the grammar rules for the reverse polish notation | |
37 | calculator. | |
38 | ||
39 | input: /* empty */ | |
40 | | input line | |
41 | ; | |
42 | ||
43 | line: '\n' | |
44 | | exp '\n' { printf ("\t%.10g\n", $1); } | |
45 | ; | |
46 | ||
47 | exp: NUM { $$ = $1; } | |
48 | | exp exp '+' { $$ = $1 + $2; } | |
49 | | exp exp '-' { $$ = $1 - $2; } | |
50 | | exp exp '*' { $$ = $1 * $2; } | |
51 | | exp exp '/' { $$ = $1 / $2; } | |
52 | /* Exponentiation */ | |
53 | | exp exp '^' { $$ = pow ($1, $2); } | |
54 | /* Unary minus */ | |
55 | | exp 'n' { $$ = -$1; } | |
56 | ; | |
57 | %% | |
58 | ||
59 | The groupings of the rpcalc "language" defined here are the | |
60 | expression (given the name `exp'), the line of input (`line'), and the | |
61 | complete input transcript (`input'). Each of these nonterminal symbols | |
62 | has several alternate rules, joined by the `|' punctuator which is read | |
63 | as "or". The following sections explain what these rules mean. | |
64 | ||
65 | The semantics of the language is determined by the actions taken | |
66 | when a grouping is recognized. The actions are the C code that appears | |
67 | inside braces. *Note Actions::. | |
68 | ||
69 | You must specify these actions in C, but Bison provides the means for | |
70 | passing semantic values between the rules. In each action, the | |
71 | pseudo-variable `$$' stands for the semantic value for the grouping | |
72 | that the rule is going to construct. Assigning a value to `$$' is the | |
73 | main job of most actions. The semantic values of the components of the | |
74 | rule are referred to as `$1', `$2', and so on. | |
75 | ||
76 | * Menu: | |
77 | ||
78 | * Rpcalc Input:: | |
79 | * Rpcalc Line:: | |
80 | * Rpcalc Expr:: | |
81 | ||
82 | \1f | |
83 | File: bison.info, Node: Rpcalc Input, Next: Rpcalc Line, Up: Rpcalc Rules | |
84 | ||
85 | Explanation of `input' | |
86 | ...................... | |
87 | ||
88 | Consider the definition of `input': | |
89 | ||
90 | input: /* empty */ | |
91 | | input line | |
92 | ; | |
93 | ||
94 | This definition reads as follows: "A complete input is either an | |
95 | empty string, or a complete input followed by an input line". Notice | |
96 | that "complete input" is defined in terms of itself. This definition | |
97 | is said to be "left recursive" since `input' appears always as the | |
98 | leftmost symbol in the sequence. *Note Recursive Rules: Recursion. | |
99 | ||
100 | The first alternative is empty because there are no symbols between | |
101 | the colon and the first `|'; this means that `input' can match an empty | |
102 | string of input (no tokens). We write the rules this way because it is | |
103 | legitimate to type `Ctrl-d' right after you start the calculator. It's | |
104 | conventional to put an empty alternative first and write the comment | |
105 | `/* empty */' in it. | |
106 | ||
107 | The second alternate rule (`input line') handles all nontrivial | |
108 | input. It means, "After reading any number of lines, read one more | |
109 | line if possible." The left recursion makes this rule into a loop. | |
110 | Since the first alternative matches empty input, the loop can be | |
111 | executed zero or more times. | |
112 | ||
113 | The parser function `yyparse' continues to process input until a | |
114 | grammatical error is seen or the lexical analyzer says there are no more | |
115 | input tokens; we will arrange for the latter to happen at end of file. | |
116 | ||
117 | \1f | |
118 | File: bison.info, Node: Rpcalc Line, Next: Rpcalc Expr, Prev: Rpcalc Input, Up: Rpcalc Rules | |
119 | ||
120 | Explanation of `line' | |
121 | ..................... | |
122 | ||
123 | Now consider the definition of `line': | |
124 | ||
125 | line: '\n' | |
126 | | exp '\n' { printf ("\t%.10g\n", $1); } | |
127 | ; | |
128 | ||
129 | The first alternative is a token which is a newline character; this | |
130 | means that rpcalc accepts a blank line (and ignores it, since there is | |
131 | no action). The second alternative is an expression followed by a | |
132 | newline. This is the alternative that makes rpcalc useful. The | |
133 | semantic value of the `exp' grouping is the value of `$1' because the | |
134 | `exp' in question is the first symbol in the alternative. The action | |
135 | prints this value, which is the result of the computation the user | |
136 | asked for. | |
137 | ||
138 | This action is unusual because it does not assign a value to `$$'. | |
139 | As a consequence, the semantic value associated with the `line' is | |
140 | uninitialized (its value will be unpredictable). This would be a bug if | |
141 | that value were ever used, but we don't use it: once rpcalc has printed | |
142 | the value of the user's input line, that value is no longer needed. | |
143 | ||
144 | \1f | |
145 | File: bison.info, Node: Rpcalc Expr, Prev: Rpcalc Line, Up: Rpcalc Rules | |
146 | ||
147 | Explanation of `expr' | |
148 | ..................... | |
149 | ||
150 | The `exp' grouping has several rules, one for each kind of | |
151 | expression. The first rule handles the simplest expressions: those | |
152 | that are just numbers. The second handles an addition-expression, | |
153 | which looks like two expressions followed by a plus-sign. The third | |
154 | handles subtraction, and so on. | |
155 | ||
156 | exp: NUM | |
157 | | exp exp '+' { $$ = $1 + $2; } | |
158 | | exp exp '-' { $$ = $1 - $2; } | |
159 | ... | |
160 | ; | |
161 | ||
162 | We have used `|' to join all the rules for `exp', but we could | |
163 | equally well have written them separately: | |
164 | ||
165 | exp: NUM ; | |
166 | exp: exp exp '+' { $$ = $1 + $2; } ; | |
167 | exp: exp exp '-' { $$ = $1 - $2; } ; | |
168 | ... | |
169 | ||
170 | Most of the rules have actions that compute the value of the | |
171 | expression in terms of the value of its parts. For example, in the | |
172 | rule for addition, `$1' refers to the first component `exp' and `$2' | |
173 | refers to the second one. The third component, `'+'', has no meaningful | |
174 | associated semantic value, but if it had one you could refer to it as | |
175 | `$3'. When `yyparse' recognizes a sum expression using this rule, the | |
176 | sum of the two subexpressions' values is produced as the value of the | |
177 | entire expression. *Note Actions::. | |
178 | ||
179 | You don't have to give an action for every rule. When a rule has no | |
180 | action, Bison by default copies the value of `$1' into `$$'. This is | |
181 | what happens in the first rule (the one that uses `NUM'). | |
182 | ||
183 | The formatting shown here is the recommended convention, but Bison | |
184 | does not require it. You can add or change whitespace as much as you | |
185 | wish. For example, this: | |
186 | ||
187 | exp : NUM | exp exp '+' {$$ = $1 + $2; } | ... | |
188 | ||
189 | means the same thing as this: | |
190 | ||
191 | exp: NUM | |
192 | | exp exp '+' { $$ = $1 + $2; } | |
193 | | ... | |
194 | ||
195 | The latter, however, is much more readable. | |
196 | ||
197 | \1f | |
198 | File: bison.info, Node: Rpcalc Lexer, Next: Rpcalc Main, Prev: Rpcalc Rules, Up: RPN Calc | |
199 | ||
200 | The `rpcalc' Lexical Analyzer | |
201 | ----------------------------- | |
202 | ||
203 | The lexical analyzer's job is low-level parsing: converting | |
204 | characters or sequences of characters into tokens. The Bison parser | |
205 | gets its tokens by calling the lexical analyzer. *Note The Lexical | |
206 | Analyzer Function `yylex': Lexical. | |
207 | ||
208 | Only a simple lexical analyzer is needed for the RPN calculator. | |
209 | This lexical analyzer skips blanks and tabs, then reads in numbers as | |
210 | `double' and returns them as `NUM' tokens. Any other character that | |
211 | isn't part of a number is a separate token. Note that the token-code | |
212 | for such a single-character token is the character itself. | |
213 | ||
214 | The return value of the lexical analyzer function is a numeric code | |
215 | which represents a token type. The same text used in Bison rules to | |
216 | stand for this token type is also a C expression for the numeric code | |
217 | for the type. This works in two ways. If the token type is a | |
218 | character literal, then its numeric code is the ASCII code for that | |
219 | character; you can use the same character literal in the lexical | |
220 | analyzer to express the number. If the token type is an identifier, | |
221 | that identifier is defined by Bison as a C macro whose definition is | |
222 | the appropriate number. In this example, therefore, `NUM' becomes a | |
223 | macro for `yylex' to use. | |
224 | ||
225 | The semantic value of the token (if it has one) is stored into the | |
226 | global variable `yylval', which is where the Bison parser will look for | |
227 | it. (The C data type of `yylval' is `YYSTYPE', which was defined at | |
228 | the beginning of the grammar; *note Declarations for `rpcalc': Rpcalc | |
229 | Decls..) | |
230 | ||
231 | A token type code of zero is returned if the end-of-file is | |
232 | encountered. (Bison recognizes any nonpositive value as indicating the | |
233 | end of the input.) | |
234 | ||
235 | Here is the code for the lexical analyzer: | |
236 | ||
237 | /* Lexical analyzer returns a double floating point | |
238 | number on the stack and the token NUM, or the ASCII | |
239 | character read if not a number. Skips all blanks | |
240 | and tabs, returns 0 for EOF. */ | |
241 | ||
242 | #include <ctype.h> | |
243 | ||
244 | int | |
245 | yylex (void) | |
246 | { | |
247 | int c; | |
248 | ||
249 | /* skip white space */ | |
250 | while ((c = getchar ()) == ' ' || c == '\t') | |
251 | ; | |
252 | /* process numbers */ | |
253 | if (c == '.' || isdigit (c)) | |
254 | { | |
255 | ungetc (c, stdin); | |
256 | scanf ("%lf", &yylval); | |
257 | return NUM; | |
258 | } | |
259 | /* return end-of-file */ | |
260 | if (c == EOF) | |
261 | return 0; | |
262 | /* return single chars */ | |
263 | return c; | |
264 | } | |
265 | ||
266 | \1f | |
267 | File: bison.info, Node: Rpcalc Main, Next: Rpcalc Error, Prev: Rpcalc Lexer, Up: RPN Calc | |
268 | ||
269 | The Controlling Function | |
270 | ------------------------ | |
271 | ||
272 | In keeping with the spirit of this example, the controlling function | |
273 | is kept to the bare minimum. The only requirement is that it call | |
274 | `yyparse' to start the process of parsing. | |
275 | ||
276 | int | |
277 | main (void) | |
278 | { | |
279 | return yyparse (); | |
280 | } | |
281 | ||
282 | \1f | |
283 | File: bison.info, Node: Rpcalc Error, Next: Rpcalc Gen, Prev: Rpcalc Main, Up: RPN Calc | |
284 | ||
285 | The Error Reporting Routine | |
286 | --------------------------- | |
287 | ||
288 | When `yyparse' detects a syntax error, it calls the error reporting | |
289 | function `yyerror' to print an error message (usually but not always | |
290 | `"parse error"'). It is up to the programmer to supply `yyerror' | |
291 | (*note Parser C-Language Interface: Interface.), so here is the | |
292 | definition we will use: | |
293 | ||
294 | #include <stdio.h> | |
295 | ||
296 | void | |
297 | yyerror (const char *s) /* Called by yyparse on error */ | |
298 | { | |
299 | printf ("%s\n", s); | |
300 | } | |
301 | ||
302 | After `yyerror' returns, the Bison parser may recover from the error | |
303 | and continue parsing if the grammar contains a suitable error rule | |
304 | (*note Error Recovery::). Otherwise, `yyparse' returns nonzero. We | |
305 | have not written any error rules in this example, so any invalid input | |
306 | will cause the calculator program to exit. This is not clean behavior | |
307 | for a real calculator, but it is adequate for the first example. | |
308 | ||
309 | \1f | |
310 | File: bison.info, Node: Rpcalc Gen, Next: Rpcalc Compile, Prev: Rpcalc Error, Up: RPN Calc | |
311 | ||
312 | Running Bison to Make the Parser | |
313 | -------------------------------- | |
314 | ||
315 | Before running Bison to produce a parser, we need to decide how to | |
316 | arrange all the source code in one or more source files. For such a | |
317 | simple example, the easiest thing is to put everything in one file. The | |
318 | definitions of `yylex', `yyerror' and `main' go at the end, in the | |
319 | "additional C code" section of the file (*note The Overall Layout of a | |
320 | Bison Grammar: Grammar Layout.). | |
321 | ||
322 | For a large project, you would probably have several source files, | |
323 | and use `make' to arrange to recompile them. | |
324 | ||
325 | With all the source in a single file, you use the following command | |
326 | to convert it into a parser file: | |
327 | ||
328 | bison FILE_NAME.y | |
329 | ||
330 | In this example the file was called `rpcalc.y' (for "Reverse Polish | |
331 | CALCulator"). Bison produces a file named `FILE_NAME.tab.c', removing | |
332 | the `.y' from the original file name. The file output by Bison contains | |
333 | the source code for `yyparse'. The additional functions in the input | |
334 | file (`yylex', `yyerror' and `main') are copied verbatim to the output. | |
335 | ||
336 | \1f | |
337 | File: bison.info, Node: Rpcalc Compile, Prev: Rpcalc Gen, Up: RPN Calc | |
338 | ||
339 | Compiling the Parser File | |
340 | ------------------------- | |
341 | ||
342 | Here is how to compile and run the parser file: | |
343 | ||
344 | # List files in current directory. | |
345 | % ls | |
346 | rpcalc.tab.c rpcalc.y | |
347 | ||
348 | # Compile the Bison parser. | |
349 | # `-lm' tells compiler to search math library for `pow'. | |
350 | % cc rpcalc.tab.c -lm -o rpcalc | |
351 | ||
352 | # List files again. | |
353 | % ls | |
354 | rpcalc rpcalc.tab.c rpcalc.y | |
355 | ||
356 | The file `rpcalc' now contains the executable code. Here is an | |
357 | example session using `rpcalc'. | |
358 | ||
359 | % rpcalc | |
360 | 4 9 + | |
361 | 13 | |
362 | 3 7 + 3 4 5 *+- | |
363 | -13 | |
364 | 3 7 + 3 4 5 * + - n Note the unary minus, `n' | |
365 | 13 | |
366 | 5 6 / 4 n + | |
367 | -3.166666667 | |
368 | 3 4 ^ Exponentiation | |
369 | 81 | |
370 | ^D End-of-file indicator | |
371 | % | |
372 | ||
373 | \1f | |
374 | File: bison.info, Node: Infix Calc, Next: Simple Error Recovery, Prev: RPN Calc, Up: Examples | |
375 | ||
376 | Infix Notation Calculator: `calc' | |
377 | ================================= | |
378 | ||
379 | We now modify rpcalc to handle infix operators instead of postfix. | |
380 | Infix notation involves the concept of operator precedence and the need | |
381 | for parentheses nested to arbitrary depth. Here is the Bison code for | |
382 | `calc.y', an infix desk-top calculator. | |
383 | ||
384 | /* Infix notation calculator--calc */ | |
385 | ||
386 | %{ | |
387 | #define YYSTYPE double | |
388 | #include <math.h> | |
389 | %} | |
390 | ||
391 | /* BISON Declarations */ | |
392 | %token NUM | |
393 | %left '-' '+' | |
394 | %left '*' '/' | |
395 | %left NEG /* negation--unary minus */ | |
396 | %right '^' /* exponentiation */ | |
397 | ||
398 | /* Grammar follows */ | |
399 | %% | |
400 | input: /* empty string */ | |
401 | | input line | |
402 | ; | |
403 | ||
404 | line: '\n' | |
405 | | exp '\n' { printf ("\t%.10g\n", $1); } | |
406 | ; | |
407 | ||
408 | exp: NUM { $$ = $1; } | |
409 | | exp '+' exp { $$ = $1 + $3; } | |
410 | | exp '-' exp { $$ = $1 - $3; } | |
411 | | exp '*' exp { $$ = $1 * $3; } | |
412 | | exp '/' exp { $$ = $1 / $3; } | |
413 | | '-' exp %prec NEG { $$ = -$2; } | |
414 | | exp '^' exp { $$ = pow ($1, $3); } | |
415 | | '(' exp ')' { $$ = $2; } | |
416 | ; | |
417 | %% | |
418 | ||
419 | The functions `yylex', `yyerror' and `main' can be the same as before. | |
420 | ||
421 | There are two important new features shown in this code. | |
422 | ||
423 | In the second section (Bison declarations), `%left' declares token | |
424 | types and says they are left-associative operators. The declarations | |
425 | `%left' and `%right' (right associativity) take the place of `%token' | |
426 | which is used to declare a token type name without associativity. | |
427 | (These tokens are single-character literals, which ordinarily don't | |
428 | need to be declared. We declare them here to specify the | |
429 | associativity.) | |
430 | ||
431 | Operator precedence is determined by the line ordering of the | |
432 | declarations; the higher the line number of the declaration (lower on | |
433 | the page or screen), the higher the precedence. Hence, exponentiation | |
434 | has the highest precedence, unary minus (`NEG') is next, followed by | |
435 | `*' and `/', and so on. *Note Operator Precedence: Precedence. | |
436 | ||
437 | The other important new feature is the `%prec' in the grammar section | |
438 | for the unary minus operator. The `%prec' simply instructs Bison that | |
439 | the rule `| '-' exp' has the same precedence as `NEG'--in this case the | |
440 | next-to-highest. *Note Context-Dependent Precedence: Contextual | |
441 | Precedence. | |
442 | ||
443 | Here is a sample run of `calc.y': | |
444 | ||
445 | % calc | |
446 | 4 + 4.5 - (34/(8*3+-3)) | |
447 | 6.880952381 | |
448 | -56 + 2 | |
449 | -54 | |
450 | 3 ^ 2 | |
451 | 9 | |
452 | ||
453 | \1f | |
454 | File: bison.info, Node: Simple Error Recovery, Next: Multi-function Calc, Prev: Infix Calc, Up: Examples | |
455 | ||
456 | Simple Error Recovery | |
457 | ===================== | |
458 | ||
459 | Up to this point, this manual has not addressed the issue of "error | |
460 | recovery"--how to continue parsing after the parser detects a syntax | |
461 | error. All we have handled is error reporting with `yyerror'. Recall | |
462 | that by default `yyparse' returns after calling `yyerror'. This means | |
463 | that an erroneous input line causes the calculator program to exit. | |
464 | Now we show how to rectify this deficiency. | |
465 | ||
466 | The Bison language itself includes the reserved word `error', which | |
467 | may be included in the grammar rules. In the example below it has been | |
468 | added to one of the alternatives for `line': | |
469 | ||
470 | line: '\n' | |
471 | | exp '\n' { printf ("\t%.10g\n", $1); } | |
472 | | error '\n' { yyerrok; } | |
473 | ; | |
474 | ||
475 | This addition to the grammar allows for simple error recovery in the | |
476 | event of a parse error. If an expression that cannot be evaluated is | |
477 | read, the error will be recognized by the third rule for `line', and | |
478 | parsing will continue. (The `yyerror' function is still called upon to | |
479 | print its message as well.) The action executes the statement | |
480 | `yyerrok', a macro defined automatically by Bison; its meaning is that | |
481 | error recovery is complete (*note Error Recovery::). Note the | |
482 | difference between `yyerrok' and `yyerror'; neither one is a misprint. | |
483 | ||
484 | This form of error recovery deals with syntax errors. There are | |
485 | other kinds of errors; for example, division by zero, which raises an | |
486 | exception signal that is normally fatal. A real calculator program | |
487 | must handle this signal and use `longjmp' to return to `main' and | |
488 | resume parsing input lines; it would also have to discard the rest of | |
489 | the current line of input. We won't discuss this issue further because | |
490 | it is not specific to Bison programs. | |
491 | ||
492 | \1f | |
493 | File: bison.info, Node: Multi-function Calc, Next: Exercises, Prev: Simple Error Recovery, Up: Examples | |
494 | ||
495 | Multi-Function Calculator: `mfcalc' | |
496 | =================================== | |
497 | ||
498 | Now that the basics of Bison have been discussed, it is time to move | |
499 | on to a more advanced problem. The above calculators provided only five | |
500 | functions, `+', `-', `*', `/' and `^'. It would be nice to have a | |
501 | calculator that provides other mathematical functions such as `sin', | |
502 | `cos', etc. | |
503 | ||
504 | It is easy to add new operators to the infix calculator as long as | |
505 | they are only single-character literals. The lexical analyzer `yylex' | |
506 | passes back all nonnumber characters as tokens, so new grammar rules | |
507 | suffice for adding a new operator. But we want something more | |
508 | flexible: built-in functions whose syntax has this form: | |
509 | ||
510 | FUNCTION_NAME (ARGUMENT) | |
511 | ||
512 | At the same time, we will add memory to the calculator, by allowing you | |
513 | to create named variables, store values in them, and use them later. | |
514 | Here is a sample session with the multi-function calculator: | |
515 | ||
516 | % mfcalc | |
517 | pi = 3.141592653589 | |
518 | 3.1415926536 | |
519 | sin(pi) | |
520 | 0.0000000000 | |
521 | alpha = beta1 = 2.3 | |
522 | 2.3000000000 | |
523 | alpha | |
524 | 2.3000000000 | |
525 | ln(alpha) | |
526 | 0.8329091229 | |
527 | exp(ln(beta1)) | |
528 | 2.3000000000 | |
529 | % | |
530 | ||
531 | Note that multiple assignment and nested function calls are | |
532 | permitted. | |
533 | ||
534 | * Menu: | |
535 | ||
536 | * Decl: Mfcalc Decl. Bison declarations for multi-function calculator. | |
537 | * Rules: Mfcalc Rules. Grammar rules for the calculator. | |
538 | * Symtab: Mfcalc Symtab. Symbol table management subroutines. | |
539 | ||
540 | \1f | |
541 | File: bison.info, Node: Mfcalc Decl, Next: Mfcalc Rules, Up: Multi-function Calc | |
542 | ||
543 | Declarations for `mfcalc' | |
544 | ------------------------- | |
545 | ||
546 | Here are the C and Bison declarations for the multi-function | |
547 | calculator. | |
548 | ||
549 | %{ | |
550 | #include <math.h> /* For math functions, cos(), sin(), etc. */ | |
551 | #include "calc.h" /* Contains definition of `symrec' */ | |
552 | %} | |
553 | %union { | |
554 | double val; /* For returning numbers. */ | |
555 | symrec *tptr; /* For returning symbol-table pointers */ | |
556 | } | |
557 | ||
558 | %token <val> NUM /* Simple double precision number */ | |
559 | %token <tptr> VAR FNCT /* Variable and Function */ | |
560 | %type <val> exp | |
561 | ||
562 | %right '=' | |
563 | %left '-' '+' | |
564 | %left '*' '/' | |
565 | %left NEG /* Negation--unary minus */ | |
566 | %right '^' /* Exponentiation */ | |
567 | ||
568 | /* Grammar follows */ | |
569 | ||
570 | %% | |
571 | ||
572 | The above grammar introduces only two new features of the Bison | |
573 | language. These features allow semantic values to have various data | |
574 | types (*note More Than One Value Type: Multiple Types.). | |
575 | ||
576 | The `%union' declaration specifies the entire list of possible types; | |
577 | this is instead of defining `YYSTYPE'. The allowable types are now | |
578 | double-floats (for `exp' and `NUM') and pointers to entries in the | |
579 | symbol table. *Note The Collection of Value Types: Union Decl. | |
580 | ||
581 | Since values can now have various types, it is necessary to | |
582 | associate a type with each grammar symbol whose semantic value is used. | |
583 | These symbols are `NUM', `VAR', `FNCT', and `exp'. Their declarations | |
584 | are augmented with information about their data type (placed between | |
585 | angle brackets). | |
586 | ||
587 | The Bison construct `%type' is used for declaring nonterminal | |
588 | symbols, just as `%token' is used for declaring token types. We have | |
589 | not used `%type' before because nonterminal symbols are normally | |
590 | declared implicitly by the rules that define them. But `exp' must be | |
591 | declared explicitly so we can specify its value type. *Note | |
592 | Nonterminal Symbols: Type Decl. | |
593 | ||
594 | \1f | |
595 | File: bison.info, Node: Mfcalc Rules, Next: Mfcalc Symtab, Prev: Mfcalc Decl, Up: Multi-function Calc | |
596 | ||
597 | Grammar Rules for `mfcalc' | |
598 | -------------------------- | |
599 | ||
600 | Here are the grammar rules for the multi-function calculator. Most | |
601 | of them are copied directly from `calc'; three rules, those which | |
602 | mention `VAR' or `FNCT', are new. | |
603 | ||
604 | input: /* empty */ | |
605 | | input line | |
606 | ; | |
607 | ||
608 | line: | |
609 | '\n' | |
610 | | exp '\n' { printf ("\t%.10g\n", $1); } | |
611 | | error '\n' { yyerrok; } | |
612 | ; | |
613 | ||
614 | exp: NUM { $$ = $1; } | |
615 | | VAR { $$ = $1->value.var; } | |
616 | | VAR '=' exp { $$ = $3; $1->value.var = $3; } | |
617 | | FNCT '(' exp ')' { $$ = (*($1->value.fnctptr))($3); } | |
618 | | exp '+' exp { $$ = $1 + $3; } | |
619 | | exp '-' exp { $$ = $1 - $3; } | |
620 | | exp '*' exp { $$ = $1 * $3; } | |
621 | | exp '/' exp { $$ = $1 / $3; } | |
622 | | '-' exp %prec NEG { $$ = -$2; } | |
623 | | exp '^' exp { $$ = pow ($1, $3); } | |
624 | | '(' exp ')' { $$ = $2; } | |
625 | ; | |
626 | /* End of grammar */ | |
627 | %% | |
628 | ||
629 | \1f | |
630 | File: bison.info, Node: Mfcalc Symtab, Prev: Mfcalc Rules, Up: Multi-function Calc | |
631 | ||
632 | The `mfcalc' Symbol Table | |
633 | ------------------------- | |
634 | ||
635 | The multi-function calculator requires a symbol table to keep track | |
636 | of the names and meanings of variables and functions. This doesn't | |
637 | affect the grammar rules (except for the actions) or the Bison | |
638 | declarations, but it requires some additional C functions for support. | |
639 | ||
640 | The symbol table itself consists of a linked list of records. Its | |
641 | definition, which is kept in the header `calc.h', is as follows. It | |
642 | provides for either functions or variables to be placed in the table. | |
643 | ||
1e24cc5b AD |
644 | /* Fonctions type. */ |
645 | typedef double (*func_t) (double); | |
646 | ||
705db0b5 AD |
647 | /* Data type for links in the chain of symbols. */ |
648 | struct symrec | |
649 | { | |
650 | char *name; /* name of symbol */ | |
651 | int type; /* type of symbol: either VAR or FNCT */ | |
1e24cc5b AD |
652 | union |
653 | { | |
654 | double var; /* value of a VAR */ | |
655 | func_t fnctptr; /* value of a FNCT */ | |
705db0b5 AD |
656 | } value; |
657 | struct symrec *next; /* link field */ | |
658 | }; | |
659 | ||
660 | typedef struct symrec symrec; | |
661 | ||
662 | /* The symbol table: a chain of `struct symrec'. */ | |
663 | extern symrec *sym_table; | |
664 | ||
1e24cc5b AD |
665 | symrec *putsym (const char *, func_t); |
666 | symrec *getsym (const char *); | |
705db0b5 AD |
667 | |
668 | The new version of `main' includes a call to `init_table', a | |
669 | function that initializes the symbol table. Here it is, and | |
670 | `init_table' as well: | |
671 | ||
672 | #include <stdio.h> | |
673 | ||
674 | int | |
675 | main (void) | |
676 | { | |
677 | init_table (); | |
678 | return yyparse (); | |
679 | } | |
680 | ||
681 | void | |
682 | yyerror (const char *s) /* Called by yyparse on error */ | |
683 | { | |
684 | printf ("%s\n", s); | |
685 | } | |
686 | ||
687 | struct init | |
688 | { | |
689 | char *fname; | |
1e24cc5b | 690 | double (*fnct)(double); |
705db0b5 AD |
691 | }; |
692 | ||
693 | struct init arith_fncts[] = | |
694 | { | |
1e24cc5b AD |
695 | "sin", sin, |
696 | "cos", cos, | |
705db0b5 | 697 | "atan", atan, |
1e24cc5b AD |
698 | "ln", log, |
699 | "exp", exp, | |
705db0b5 AD |
700 | "sqrt", sqrt, |
701 | 0, 0 | |
702 | }; | |
703 | ||
704 | /* The symbol table: a chain of `struct symrec'. */ | |
1e24cc5b | 705 | symrec *sym_table = (symrec *) 0; |
705db0b5 AD |
706 | |
707 | /* Put arithmetic functions in table. */ | |
708 | void | |
709 | init_table (void) | |
710 | { | |
711 | int i; | |
712 | symrec *ptr; | |
713 | for (i = 0; arith_fncts[i].fname != 0; i++) | |
714 | { | |
715 | ptr = putsym (arith_fncts[i].fname, FNCT); | |
716 | ptr->value.fnctptr = arith_fncts[i].fnct; | |
717 | } | |
718 | } | |
719 | ||
720 | By simply editing the initialization list and adding the necessary | |
721 | include files, you can add additional functions to the calculator. | |
722 | ||
723 | Two important functions allow look-up and installation of symbols in | |
724 | the symbol table. The function `putsym' is passed a name and the type | |
725 | (`VAR' or `FNCT') of the object to be installed. The object is linked | |
726 | to the front of the list, and a pointer to the object is returned. The | |
727 | function `getsym' is passed the name of the symbol to look up. If | |
728 | found, a pointer to that symbol is returned; otherwise zero is returned. | |
729 | ||
730 | symrec * | |
731 | putsym (char *sym_name, int sym_type) | |
732 | { | |
733 | symrec *ptr; | |
734 | ptr = (symrec *) malloc (sizeof (symrec)); | |
735 | ptr->name = (char *) malloc (strlen (sym_name) + 1); | |
736 | strcpy (ptr->name,sym_name); | |
737 | ptr->type = sym_type; | |
738 | ptr->value.var = 0; /* set value to 0 even if fctn. */ | |
739 | ptr->next = (struct symrec *)sym_table; | |
740 | sym_table = ptr; | |
741 | return ptr; | |
742 | } | |
743 | ||
744 | symrec * | |
745 | getsym (const char *sym_name) | |
746 | { | |
747 | symrec *ptr; | |
748 | for (ptr = sym_table; ptr != (symrec *) 0; | |
749 | ptr = (symrec *)ptr->next) | |
750 | if (strcmp (ptr->name,sym_name) == 0) | |
751 | return ptr; | |
752 | return 0; | |
753 | } | |
754 | ||
755 | The function `yylex' must now recognize variables, numeric values, | |
756 | and the single-character arithmetic operators. Strings of alphanumeric | |
757 | characters with a leading non-digit are recognized as either variables | |
758 | or functions depending on what the symbol table says about them. | |
759 | ||
760 | The string is passed to `getsym' for look up in the symbol table. If | |
761 | the name appears in the table, a pointer to its location and its type | |
762 | (`VAR' or `FNCT') is returned to `yyparse'. If it is not already in | |
763 | the table, then it is installed as a `VAR' using `putsym'. Again, a | |
764 | pointer and its type (which must be `VAR') is returned to `yyparse'. | |
765 | ||
766 | No change is needed in the handling of numeric values and arithmetic | |
767 | operators in `yylex'. | |
768 | ||
769 | #include <ctype.h> | |
770 | ||
771 | int | |
772 | yylex (void) | |
773 | { | |
774 | int c; | |
775 | ||
776 | /* Ignore whitespace, get first nonwhite character. */ | |
777 | while ((c = getchar ()) == ' ' || c == '\t'); | |
778 | ||
779 | if (c == EOF) | |
780 | return 0; | |
781 | ||
782 | /* Char starts a number => parse the number. */ | |
783 | if (c == '.' || isdigit (c)) | |
784 | { | |
785 | ungetc (c, stdin); | |
786 | scanf ("%lf", &yylval.val); | |
787 | return NUM; | |
788 | } | |
789 | ||
790 | /* Char starts an identifier => read the name. */ | |
791 | if (isalpha (c)) | |
792 | { | |
793 | symrec *s; | |
794 | static char *symbuf = 0; | |
795 | static int length = 0; | |
796 | int i; | |
797 | ||
798 | /* Initially make the buffer long enough | |
799 | for a 40-character symbol name. */ | |
800 | if (length == 0) | |
801 | length = 40, symbuf = (char *)malloc (length + 1); | |
802 | ||
803 | i = 0; | |
804 | do | |
805 | { | |
806 | /* If buffer is full, make it bigger. */ | |
807 | if (i == length) | |
808 | { | |
809 | length *= 2; | |
810 | symbuf = (char *)realloc (symbuf, length + 1); | |
811 | } | |
812 | /* Add this character to the buffer. */ | |
813 | symbuf[i++] = c; | |
814 | /* Get another character. */ | |
815 | c = getchar (); | |
816 | } | |
817 | while (c != EOF && isalnum (c)); | |
818 | ||
819 | ungetc (c, stdin); | |
820 | symbuf[i] = '\0'; | |
821 | ||
822 | s = getsym (symbuf); | |
823 | if (s == 0) | |
824 | s = putsym (symbuf, VAR); | |
825 | yylval.tptr = s; | |
826 | return s->type; | |
827 | } | |
828 | ||
829 | /* Any other character is a token by itself. */ | |
830 | return c; | |
831 | } | |
832 | ||
833 | This program is both powerful and flexible. You may easily add new | |
834 | functions, and it is a simple job to modify this code to install | |
835 | predefined variables such as `pi' or `e' as well. | |
836 | ||
837 | \1f | |
838 | File: bison.info, Node: Exercises, Prev: Multi-function Calc, Up: Examples | |
839 | ||
840 | Exercises | |
841 | ========= | |
842 | ||
843 | 1. Add some new functions from `math.h' to the initialization list. | |
844 | ||
845 | 2. Add another array that contains constants and their values. Then | |
846 | modify `init_table' to add these constants to the symbol table. | |
847 | It will be easiest to give the constants type `VAR'. | |
848 | ||
849 | 3. Make the program report an error if the user refers to an | |
850 | uninitialized variable in any way except to store a value in it. | |
851 | ||
852 | \1f | |
853 | File: bison.info, Node: Grammar File, Next: Interface, Prev: Examples, Up: Top | |
854 | ||
855 | Bison Grammar Files | |
856 | ******************* | |
857 | ||
858 | Bison takes as input a context-free grammar specification and | |
859 | produces a C-language function that recognizes correct instances of the | |
860 | grammar. | |
861 | ||
862 | The Bison grammar input file conventionally has a name ending in | |
863 | `.y'. | |
864 | ||
865 | * Menu: | |
866 | ||
867 | * Grammar Outline:: Overall layout of the grammar file. | |
868 | * Symbols:: Terminal and nonterminal symbols. | |
869 | * Rules:: How to write grammar rules. | |
870 | * Recursion:: Writing recursive rules. | |
871 | * Semantics:: Semantic values and actions. | |
872 | * Declarations:: All kinds of Bison declarations are described here. | |
873 | * Multiple Parsers:: Putting more than one Bison parser in one program. | |
874 | ||
875 | \1f | |
876 | File: bison.info, Node: Grammar Outline, Next: Symbols, Up: Grammar File | |
877 | ||
878 | Outline of a Bison Grammar | |
879 | ========================== | |
880 | ||
881 | A Bison grammar file has four main sections, shown here with the | |
882 | appropriate delimiters: | |
883 | ||
884 | %{ | |
885 | C DECLARATIONS | |
886 | %} | |
887 | ||
888 | BISON DECLARATIONS | |
889 | ||
890 | %% | |
891 | GRAMMAR RULES | |
892 | %% | |
893 | ||
894 | ADDITIONAL C CODE | |
895 | ||
896 | Comments enclosed in `/* ... */' may appear in any of the sections. | |
897 | ||
898 | * Menu: | |
899 | ||
900 | * C Declarations:: Syntax and usage of the C declarations section. | |
901 | * Bison Declarations:: Syntax and usage of the Bison declarations section. | |
902 | * Grammar Rules:: Syntax and usage of the grammar rules section. | |
903 | * C Code:: Syntax and usage of the additional C code section. | |
904 | ||
905 | \1f | |
906 | File: bison.info, Node: C Declarations, Next: Bison Declarations, Up: Grammar Outline | |
907 | ||
908 | The C Declarations Section | |
909 | -------------------------- | |
910 | ||
911 | The C DECLARATIONS section contains macro definitions and | |
912 | declarations of functions and variables that are used in the actions in | |
913 | the grammar rules. These are copied to the beginning of the parser | |
914 | file so that they precede the definition of `yyparse'. You can use | |
915 | `#include' to get the declarations from a header file. If you don't | |
916 | need any C declarations, you may omit the `%{' and `%}' delimiters that | |
917 | bracket this section. | |
918 | ||
919 | \1f | |
920 | File: bison.info, Node: Bison Declarations, Next: Grammar Rules, Prev: C Declarations, Up: Grammar Outline | |
921 | ||
922 | The Bison Declarations Section | |
923 | ------------------------------ | |
924 | ||
925 | The BISON DECLARATIONS section contains declarations that define | |
926 | terminal and nonterminal symbols, specify precedence, and so on. In | |
927 | some simple grammars you may not need any declarations. *Note Bison | |
928 | Declarations: Declarations. | |
929 | ||
930 | \1f | |
931 | File: bison.info, Node: Grammar Rules, Next: C Code, Prev: Bison Declarations, Up: Grammar Outline | |
932 | ||
933 | The Grammar Rules Section | |
934 | ------------------------- | |
935 | ||
936 | The "grammar rules" section contains one or more Bison grammar | |
937 | rules, and nothing else. *Note Syntax of Grammar Rules: Rules. | |
938 | ||
939 | There must always be at least one grammar rule, and the first `%%' | |
940 | (which precedes the grammar rules) may never be omitted even if it is | |
941 | the first thing in the file. | |
942 | ||
943 | \1f | |
944 | File: bison.info, Node: C Code, Prev: Grammar Rules, Up: Grammar Outline | |
945 | ||
946 | The Additional C Code Section | |
947 | ----------------------------- | |
948 | ||
949 | The ADDITIONAL C CODE section is copied verbatim to the end of the | |
950 | parser file, just as the C DECLARATIONS section is copied to the | |
951 | beginning. This is the most convenient place to put anything that you | |
952 | want to have in the parser file but which need not come before the | |
953 | definition of `yyparse'. For example, the definitions of `yylex' and | |
954 | `yyerror' often go here. *Note Parser C-Language Interface: Interface. | |
955 | ||
956 | If the last section is empty, you may omit the `%%' that separates it | |
957 | from the grammar rules. | |
958 | ||
959 | The Bison parser itself contains many static variables whose names | |
960 | start with `yy' and many macros whose names start with `YY'. It is a | |
961 | good idea to avoid using any such names (except those documented in this | |
962 | manual) in the additional C code section of the grammar file. | |
963 | ||
964 | \1f | |
965 | File: bison.info, Node: Symbols, Next: Rules, Prev: Grammar Outline, Up: Grammar File | |
966 | ||
967 | Symbols, Terminal and Nonterminal | |
968 | ================================= | |
969 | ||
970 | "Symbols" in Bison grammars represent the grammatical classifications | |
971 | of the language. | |
972 | ||
973 | A "terminal symbol" (also known as a "token type") represents a | |
974 | class of syntactically equivalent tokens. You use the symbol in grammar | |
975 | rules to mean that a token in that class is allowed. The symbol is | |
976 | represented in the Bison parser by a numeric code, and the `yylex' | |
977 | function returns a token type code to indicate what kind of token has | |
978 | been read. You don't need to know what the code value is; you can use | |
979 | the symbol to stand for it. | |
980 | ||
981 | A "nonterminal symbol" stands for a class of syntactically equivalent | |
982 | groupings. The symbol name is used in writing grammar rules. By | |
983 | convention, it should be all lower case. | |
984 | ||
985 | Symbol names can contain letters, digits (not at the beginning), | |
986 | underscores and periods. Periods make sense only in nonterminals. | |
987 | ||
988 | There are three ways of writing terminal symbols in the grammar: | |
989 | ||
990 | * A "named token type" is written with an identifier, like an | |
991 | identifier in C. By convention, it should be all upper case. Each | |
992 | such name must be defined with a Bison declaration such as | |
993 | `%token'. *Note Token Type Names: Token Decl. | |
994 | ||
995 | * A "character token type" (or "literal character token") is written | |
996 | in the grammar using the same syntax used in C for character | |
997 | constants; for example, `'+'' is a character token type. A | |
998 | character token type doesn't need to be declared unless you need to | |
999 | specify its semantic value data type (*note Data Types of Semantic | |
1000 | Values: Value Type.), associativity, or precedence (*note Operator | |
1001 | Precedence: Precedence.). | |
1002 | ||
1003 | By convention, a character token type is used only to represent a | |
1004 | token that consists of that particular character. Thus, the token | |
1005 | type `'+'' is used to represent the character `+' as a token. | |
1006 | Nothing enforces this convention, but if you depart from it, your | |
1007 | program will confuse other readers. | |
1008 | ||
1009 | All the usual escape sequences used in character literals in C can | |
1010 | be used in Bison as well, but you must not use the null character | |
1011 | as a character literal because its ASCII code, zero, is the code | |
1012 | `yylex' returns for end-of-input (*note Calling Convention for | |
1013 | `yylex': Calling Convention.). | |
1014 | ||
1015 | * A "literal string token" is written like a C string constant; for | |
1016 | example, `"<="' is a literal string token. A literal string token | |
1017 | doesn't need to be declared unless you need to specify its semantic | |
1018 | value data type (*note Value Type::), associativity, or precedence | |
1019 | (*note Precedence::). | |
1020 | ||
1021 | You can associate the literal string token with a symbolic name as | |
1022 | an alias, using the `%token' declaration (*note Token | |
1023 | Declarations: Token Decl.). If you don't do that, the lexical | |
1024 | analyzer has to retrieve the token number for the literal string | |
1025 | token from the `yytname' table (*note Calling Convention::). | |
1026 | ||
1027 | *WARNING*: literal string tokens do not work in Yacc. | |
1028 | ||
1029 | By convention, a literal string token is used only to represent a | |
1030 | token that consists of that particular string. Thus, you should | |
1031 | use the token type `"<="' to represent the string `<=' as a token. | |
1032 | Bison does not enforce this convention, but if you depart from | |
1033 | it, people who read your program will be confused. | |
1034 | ||
1035 | All the escape sequences used in string literals in C can be used | |
1036 | in Bison as well. A literal string token must contain two or more | |
1037 | characters; for a token containing just one character, use a | |
1038 | character token (see above). | |
1039 | ||
1040 | How you choose to write a terminal symbol has no effect on its | |
1041 | grammatical meaning. That depends only on where it appears in rules and | |
1042 | on when the parser function returns that symbol. | |
1043 | ||
1044 | The value returned by `yylex' is always one of the terminal symbols | |
1045 | (or 0 for end-of-input). Whichever way you write the token type in the | |
1046 | grammar rules, you write it the same way in the definition of `yylex'. | |
1047 | The numeric code for a character token type is simply the ASCII code for | |
1048 | the character, so `yylex' can use the identical character constant to | |
1049 | generate the requisite code. Each named token type becomes a C macro in | |
1050 | the parser file, so `yylex' can use the name to stand for the code. | |
1051 | (This is why periods don't make sense in terminal symbols.) *Note | |
1052 | Calling Convention for `yylex': Calling Convention. | |
1053 | ||
1054 | If `yylex' is defined in a separate file, you need to arrange for the | |
1055 | token-type macro definitions to be available there. Use the `-d' | |
1056 | option when you run Bison, so that it will write these macro definitions | |
1057 | into a separate header file `NAME.tab.h' which you can include in the | |
1058 | other source files that need it. *Note Invoking Bison: Invocation. | |
1059 | ||
1060 | The symbol `error' is a terminal symbol reserved for error recovery | |
1061 | (*note Error Recovery::); you shouldn't use it for any other purpose. | |
1062 | In particular, `yylex' should never return this value. | |
1063 | ||
1064 | \1f | |
1065 | File: bison.info, Node: Rules, Next: Recursion, Prev: Symbols, Up: Grammar File | |
1066 | ||
1067 | Syntax of Grammar Rules | |
1068 | ======================= | |
1069 | ||
1070 | A Bison grammar rule has the following general form: | |
1071 | ||
1072 | RESULT: COMPONENTS... | |
1073 | ; | |
1074 | ||
1075 | where RESULT is the nonterminal symbol that this rule describes, and | |
1076 | COMPONENTS are various terminal and nonterminal symbols that are put | |
1077 | together by this rule (*note Symbols::). | |
1078 | ||
1079 | For example, | |
1080 | ||
1081 | exp: exp '+' exp | |
1082 | ; | |
1083 | ||
1084 | says that two groupings of type `exp', with a `+' token in between, can | |
1085 | be combined into a larger grouping of type `exp'. | |
1086 | ||
1087 | Whitespace in rules is significant only to separate symbols. You | |
1088 | can add extra whitespace as you wish. | |
1089 | ||
1090 | Scattered among the components can be ACTIONS that determine the | |
1091 | semantics of the rule. An action looks like this: | |
1092 | ||
1093 | {C STATEMENTS} | |
1094 | ||
1095 | Usually there is only one action and it follows the components. *Note | |
1096 | Actions::. | |
1097 | ||
1098 | Multiple rules for the same RESULT can be written separately or can | |
1099 | be joined with the vertical-bar character `|' as follows: | |
1100 | ||
1101 | RESULT: RULE1-COMPONENTS... | |
1102 | | RULE2-COMPONENTS... | |
1103 | ... | |
1104 | ; | |
1105 | ||
1106 | They are still considered distinct rules even when joined in this way. | |
1107 | ||
1108 | If COMPONENTS in a rule is empty, it means that RESULT can match the | |
1109 | empty string. For example, here is how to define a comma-separated | |
1110 | sequence of zero or more `exp' groupings: | |
1111 | ||
1112 | expseq: /* empty */ | |
1113 | | expseq1 | |
1114 | ; | |
1115 | ||
1116 | expseq1: exp | |
1117 | | expseq1 ',' exp | |
1118 | ; | |
1119 | ||
1120 | It is customary to write a comment `/* empty */' in each rule with no | |
1121 | components. | |
1122 | ||
1123 | \1f | |
1124 | File: bison.info, Node: Recursion, Next: Semantics, Prev: Rules, Up: Grammar File | |
1125 | ||
1126 | Recursive Rules | |
1127 | =============== | |
1128 | ||
1129 | A rule is called "recursive" when its RESULT nonterminal appears | |
1130 | also on its right hand side. Nearly all Bison grammars need to use | |
1131 | recursion, because that is the only way to define a sequence of any | |
1132 | number of a particular thing. Consider this recursive definition of a | |
1133 | comma-separated sequence of one or more expressions: | |
1134 | ||
1135 | expseq1: exp | |
1136 | | expseq1 ',' exp | |
1137 | ; | |
1138 | ||
1139 | Since the recursive use of `expseq1' is the leftmost symbol in the | |
1140 | right hand side, we call this "left recursion". By contrast, here the | |
1141 | same construct is defined using "right recursion": | |
1142 | ||
1143 | expseq1: exp | |
1144 | | exp ',' expseq1 | |
1145 | ; | |
1146 | ||
1147 | Any kind of sequence can be defined using either left recursion or | |
1148 | right recursion, but you should always use left recursion, because it | |
1149 | can parse a sequence of any number of elements with bounded stack | |
1150 | space. Right recursion uses up space on the Bison stack in proportion | |
1151 | to the number of elements in the sequence, because all the elements | |
1152 | must be shifted onto the stack before the rule can be applied even | |
1153 | once. *Note The Bison Parser Algorithm: Algorithm, for further | |
1154 | explanation of this. | |
1155 | ||
1156 | "Indirect" or "mutual" recursion occurs when the result of the rule | |
1157 | does not appear directly on its right hand side, but does appear in | |
1158 | rules for other nonterminals which do appear on its right hand side. | |
1159 | ||
1160 | For example: | |
1161 | ||
1162 | expr: primary | |
1163 | | primary '+' primary | |
1164 | ; | |
1165 | ||
1166 | primary: constant | |
1167 | | '(' expr ')' | |
1168 | ; | |
1169 | ||
1170 | defines two mutually-recursive nonterminals, since each refers to the | |
1171 | other. | |
1172 | ||
1173 | \1f | |
1174 | File: bison.info, Node: Semantics, Next: Declarations, Prev: Recursion, Up: Grammar File | |
1175 | ||
1176 | Defining Language Semantics | |
1177 | =========================== | |
1178 | ||
1179 | The grammar rules for a language determine only the syntax. The | |
1180 | semantics are determined by the semantic values associated with various | |
1181 | tokens and groupings, and by the actions taken when various groupings | |
1182 | are recognized. | |
1183 | ||
1184 | For example, the calculator calculates properly because the value | |
1185 | associated with each expression is the proper number; it adds properly | |
1186 | because the action for the grouping `X + Y' is to add the numbers | |
1187 | associated with X and Y. | |
1188 | ||
1189 | * Menu: | |
1190 | ||
1191 | * Value Type:: Specifying one data type for all semantic values. | |
1192 | * Multiple Types:: Specifying several alternative data types. | |
1193 | * Actions:: An action is the semantic definition of a grammar rule. | |
1194 | * Action Types:: Specifying data types for actions to operate on. | |
1195 | * Mid-Rule Actions:: Most actions go at the end of a rule. | |
1196 | This says when, why and how to use the exceptional | |
1197 | action in the middle of a rule. | |
1198 | ||
1199 | \1f | |
1200 | File: bison.info, Node: Value Type, Next: Multiple Types, Up: Semantics | |
1201 | ||
1202 | Data Types of Semantic Values | |
1203 | ----------------------------- | |
1204 | ||
1205 | In a simple program it may be sufficient to use the same data type | |
1206 | for the semantic values of all language constructs. This was true in | |
1207 | the RPN and infix calculator examples (*note Reverse Polish Notation | |
1208 | Calculator: RPN Calc.). | |
1209 | ||
1210 | Bison's default is to use type `int' for all semantic values. To | |
1211 | specify some other type, define `YYSTYPE' as a macro, like this: | |
1212 | ||
1213 | #define YYSTYPE double | |
1214 | ||
1215 | This macro definition must go in the C declarations section of the | |
1216 | grammar file (*note Outline of a Bison Grammar: Grammar Outline.). | |
1217 | ||
1218 | \1f | |
1219 | File: bison.info, Node: Multiple Types, Next: Actions, Prev: Value Type, Up: Semantics | |
1220 | ||
1221 | More Than One Value Type | |
1222 | ------------------------ | |
1223 | ||
1224 | In most programs, you will need different data types for different | |
1225 | kinds of tokens and groupings. For example, a numeric constant may | |
1226 | need type `int' or `long', while a string constant needs type `char *', | |
1227 | and an identifier might need a pointer to an entry in the symbol table. | |
1228 | ||
1229 | To use more than one data type for semantic values in one parser, | |
1230 | Bison requires you to do two things: | |
1231 | ||
1232 | * Specify the entire collection of possible data types, with the | |
1233 | `%union' Bison declaration (*note The Collection of Value Types: | |
1234 | Union Decl.). | |
1235 | ||
1236 | * Choose one of those types for each symbol (terminal or | |
1237 | nonterminal) for which semantic values are used. This is done for | |
1238 | tokens with the `%token' Bison declaration (*note Token Type | |
1239 | Names: Token Decl.) and for groupings with the `%type' Bison | |
1240 | declaration (*note Nonterminal Symbols: Type Decl.). | |
1241 | ||
1242 | \1f | |
1243 | File: bison.info, Node: Actions, Next: Action Types, Prev: Multiple Types, Up: Semantics | |
1244 | ||
1245 | Actions | |
1246 | ------- | |
1247 | ||
1248 | An action accompanies a syntactic rule and contains C code to be | |
1249 | executed each time an instance of that rule is recognized. The task of | |
1250 | most actions is to compute a semantic value for the grouping built by | |
1251 | the rule from the semantic values associated with tokens or smaller | |
1252 | groupings. | |
1253 | ||
1254 | An action consists of C statements surrounded by braces, much like a | |
1255 | compound statement in C. It can be placed at any position in the rule; | |
1256 | it is executed at that position. Most rules have just one action at | |
1257 | the end of the rule, following all the components. Actions in the | |
1258 | middle of a rule are tricky and used only for special purposes (*note | |
1259 | Actions in Mid-Rule: Mid-Rule Actions.). | |
1260 | ||
1261 | The C code in an action can refer to the semantic values of the | |
1262 | components matched by the rule with the construct `$N', which stands for | |
1263 | the value of the Nth component. The semantic value for the grouping | |
1264 | being constructed is `$$'. (Bison translates both of these constructs | |
1265 | into array element references when it copies the actions into the parser | |
1266 | file.) | |
1267 | ||
1268 | Here is a typical example: | |
1269 | ||
1270 | exp: ... | |
1271 | | exp '+' exp | |
1272 | { $$ = $1 + $3; } | |
1273 | ||
1274 | This rule constructs an `exp' from two smaller `exp' groupings | |
1275 | connected by a plus-sign token. In the action, `$1' and `$3' refer to | |
1276 | the semantic values of the two component `exp' groupings, which are the | |
1277 | first and third symbols on the right hand side of the rule. The sum is | |
1278 | stored into `$$' so that it becomes the semantic value of the | |
1279 | addition-expression just recognized by the rule. If there were a | |
1280 | useful semantic value associated with the `+' token, it could be | |
1281 | referred to as `$2'. | |
1282 | ||
1283 | If you don't specify an action for a rule, Bison supplies a default: | |
1284 | `$$ = $1'. Thus, the value of the first symbol in the rule becomes the | |
1285 | value of the whole rule. Of course, the default rule is valid only if | |
1286 | the two data types match. There is no meaningful default action for an | |
1287 | empty rule; every empty rule must have an explicit action unless the | |
1288 | rule's value does not matter. | |
1289 | ||
1290 | `$N' with N zero or negative is allowed for reference to tokens and | |
1291 | groupings on the stack _before_ those that match the current rule. | |
1292 | This is a very risky practice, and to use it reliably you must be | |
1293 | certain of the context in which the rule is applied. Here is a case in | |
1294 | which you can use this reliably: | |
1295 | ||
1296 | foo: expr bar '+' expr { ... } | |
1297 | | expr bar '-' expr { ... } | |
1298 | ; | |
1299 | ||
1300 | bar: /* empty */ | |
1301 | { previous_expr = $0; } | |
1302 | ; | |
1303 | ||
1304 | As long as `bar' is used only in the fashion shown here, `$0' always | |
1305 | refers to the `expr' which precedes `bar' in the definition of `foo'. | |
1306 | ||
1307 | \1f | |
1308 | File: bison.info, Node: Action Types, Next: Mid-Rule Actions, Prev: Actions, Up: Semantics | |
1309 | ||
1310 | Data Types of Values in Actions | |
1311 | ------------------------------- | |
1312 | ||
1313 | If you have chosen a single data type for semantic values, the `$$' | |
1314 | and `$N' constructs always have that data type. | |
1315 | ||
1316 | If you have used `%union' to specify a variety of data types, then | |
1317 | you must declare a choice among these types for each terminal or | |
1318 | nonterminal symbol that can have a semantic value. Then each time you | |
1319 | use `$$' or `$N', its data type is determined by which symbol it refers | |
1320 | to in the rule. In this example, | |
1321 | ||
1322 | exp: ... | |
1323 | | exp '+' exp | |
1324 | { $$ = $1 + $3; } | |
1325 | ||
1326 | `$1' and `$3' refer to instances of `exp', so they all have the data | |
1327 | type declared for the nonterminal symbol `exp'. If `$2' were used, it | |
1328 | would have the data type declared for the terminal symbol `'+'', | |
1329 | whatever that might be. | |
1330 | ||
1331 | Alternatively, you can specify the data type when you refer to the | |
1332 | value, by inserting `<TYPE>' after the `$' at the beginning of the | |
1333 | reference. For example, if you have defined types as shown here: | |
1334 | ||
1335 | %union { | |
1336 | int itype; | |
1337 | double dtype; | |
1338 | } | |
1339 | ||
1340 | then you can write `$<itype>1' to refer to the first subunit of the | |
1341 | rule as an integer, or `$<dtype>1' to refer to it as a double. | |
1342 |