]> git.saurik.com Git - bison.git/blame - doc/bison.texinfo
* src/output.c (output_gram): Rely on nritems, not the
[bison.git] / doc / bison.texinfo
CommitLineData
bfa74976
RS
1\input texinfo @c -*-texinfo-*-
2@comment %**start of header
3@setfilename bison.info
df1af54c
JT
4@include version.texi
5@settitle Bison @value{VERSION}
bfa74976
RS
6@setchapternewpage odd
7
5378c3e7
DM
8@iftex
9@finalout
10@end iftex
11
13863333 12@c SMALL BOOK version
bfa74976 13@c This edition has been formatted so that you can format and print it in
13863333 14@c the smallbook format.
bfa74976
RS
15@c @smallbook
16
bfa74976
RS
17@c Set following if you have the new `shorttitlepage' command
18@c @clear shorttitlepage-enabled
19@c @set shorttitlepage-enabled
20
21@c ISPELL CHECK: done, 14 Jan 1993 --bob
22
23@c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo
24@c titlepage; should NOT be changed in the GPL. --mew
25
26@iftex
27@syncodeindex fn cp
28@syncodeindex vr cp
29@syncodeindex tp cp
30@end iftex
31@ifinfo
32@synindex fn cp
33@synindex vr cp
34@synindex tp cp
35@end ifinfo
36@comment %**end of header
37
bd773d73
JT
38@ifinfo
39@format
40START-INFO-DIR-ENTRY
41* bison: (bison). GNU Project parser generator (yacc replacement).
42END-INFO-DIR-ENTRY
43@end format
44@end ifinfo
45
bfa74976
RS
46@ifinfo
47This file documents the Bison parser generator.
48
79282c6c
AD
49Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
502000, 2001
13863333 51Free Software Foundation, Inc.
bfa74976
RS
52
53Permission is granted to make and distribute verbatim copies of
54this manual provided the copyright notice and this permission notice
55are preserved on all copies.
56
57@ignore
58Permission is granted to process this file through Tex and print the
59results, provided the printed document carries copying permission
60notice identical to this one except for the removal of this paragraph
61(this paragraph not being relevant to the printed manual).
62
63@end ignore
64Permission is granted to copy and distribute modified versions of this
65manual under the conditions for verbatim copying, provided also that the
66sections entitled ``GNU General Public License'' and ``Conditions for
67Using Bison'' are included exactly as in the original, and provided that
68the entire resulting derived work is distributed under the terms of a
69permission notice identical to this one.
70
71Permission is granted to copy and distribute translations of this manual
72into another language, under the above conditions for modified versions,
73except that the sections entitled ``GNU General Public License'',
74``Conditions for Using Bison'' and this permission notice may be
75included in translations approved by the Free Software Foundation
76instead of in the original English.
77@end ifinfo
78
79@ifset shorttitlepage-enabled
80@shorttitlepage Bison
81@end ifset
82@titlepage
83@title Bison
84@subtitle The YACC-compatible Parser Generator
df1af54c 85@subtitle @value{UPDATED}, Bison Version @value{VERSION}
bfa74976
RS
86
87@author by Charles Donnelly and Richard Stallman
88
89@page
90@vskip 0pt plus 1filll
13863333 91Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
79282c6c 921999, 2000, 2001
13863333 93Free Software Foundation, Inc.
bfa74976
RS
94
95@sp 2
96Published by the Free Software Foundation @*
931c7513
RS
9759 Temple Place, Suite 330 @*
98Boston, MA 02111-1307 USA @*
9ecbd125
JT
99Printed copies are available from the Free Software Foundation.@*
100ISBN 1-882114-44-2
bfa74976
RS
101
102Permission is granted to make and distribute verbatim copies of
103this manual provided the copyright notice and this permission notice
104are preserved on all copies.
105
106@ignore
107Permission is granted to process this file through TeX and print the
108results, provided the printed document carries copying permission
109notice identical to this one except for the removal of this paragraph
110(this paragraph not being relevant to the printed manual).
111
112@end ignore
113Permission is granted to copy and distribute modified versions of this
114manual under the conditions for verbatim copying, provided also that the
115sections entitled ``GNU General Public License'' and ``Conditions for
116Using Bison'' are included exactly as in the original, and provided that
117the entire resulting derived work is distributed under the terms of a
118permission notice identical to this one.
119
120Permission is granted to copy and distribute translations of this manual
121into another language, under the above conditions for modified versions,
122except that the sections entitled ``GNU General Public License'',
123``Conditions for Using Bison'' and this permission notice may be
124included in translations approved by the Free Software Foundation
125instead of in the original English.
126@sp 2
127Cover art by Etienne Suvasa.
128@end titlepage
d5796688
JT
129
130@contents
bfa74976 131
342b8b6e
AD
132@ifnottex
133@node Top
134@top Bison
bfa74976 135
342b8b6e
AD
136This manual documents version @value{VERSION} of Bison, updated
137@value{UPDATED}.
138@end ifnottex
bfa74976
RS
139
140@menu
13863333
AD
141* Introduction::
142* Conditions::
bfa74976
RS
143* Copying:: The GNU General Public License says
144 how you can copy and share Bison
145
146Tutorial sections:
147* Concepts:: Basic concepts for understanding Bison.
148* Examples:: Three simple explained examples of using Bison.
149
150Reference sections:
151* Grammar File:: Writing Bison declarations and rules.
152* Interface:: C-language interface to the parser function @code{yyparse}.
153* Algorithm:: How the Bison parser works at run-time.
154* Error Recovery:: Writing rules for error recovery.
155* Context Dependency:: What to do if your language syntax is too
156 messy for Bison to handle straightforwardly.
157* Debugging:: Debugging Bison parsers that parse wrong.
158* Invocation:: How to run Bison (to produce the parser source file).
159* Table of Symbols:: All the keywords of the Bison language are explained.
160* Glossary:: Basic concepts are explained.
f2b5126e 161* Copying This Manual:: License for copying this manual.
bfa74976
RS
162* Index:: Cross-references to the text.
163
342b8b6e 164@detailmenu --- The Detailed Node Listing ---
bfa74976
RS
165
166The Concepts of Bison
167
168* Language and Grammar:: Languages and context-free grammars,
169 as mathematical ideas.
170* Grammar in Bison:: How we represent grammars for Bison's sake.
171* Semantic Values:: Each token or syntactic grouping can have
172 a semantic value (the value of an integer,
173 the name of an identifier, etc.).
174* Semantic Actions:: Each rule can have an action containing C code.
175* Bison Parser:: What are Bison's input and output,
176 how is the output used?
177* Stages:: Stages in writing and running Bison grammars.
178* Grammar Layout:: Overall structure of a Bison grammar file.
179
180Examples
181
182* RPN Calc:: Reverse polish notation calculator;
183 a first example with no operator precedence.
184* Infix Calc:: Infix (algebraic) notation calculator.
185 Operator precedence is introduced.
186* Simple Error Recovery:: Continuing after syntax errors.
342b8b6e 187* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
bfa74976
RS
188* Multi-function Calc:: Calculator with memory and trig functions.
189 It uses multiple data-types for semantic values.
190* Exercises:: Ideas for improving the multi-function calculator.
191
192Reverse Polish Notation Calculator
193
75f5aaea 194* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc.
bfa74976
RS
195* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
196* Lexer: Rpcalc Lexer. The lexical analyzer.
197* Main: Rpcalc Main. The controlling function.
198* Error: Rpcalc Error. The error reporting function.
199* Gen: Rpcalc Gen. Running Bison on the grammar file.
200* Comp: Rpcalc Compile. Run the C compiler on the output code.
201
202Grammar Rules for @code{rpcalc}
203
13863333
AD
204* Rpcalc Input::
205* Rpcalc Line::
206* Rpcalc Expr::
bfa74976 207
342b8b6e
AD
208Location Tracking Calculator: @code{ltcalc}
209
210* Decls: Ltcalc Decls. Bison and C declarations for ltcalc.
211* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations.
212* Lexer: Ltcalc Lexer. The lexical analyzer.
213
bfa74976
RS
214Multi-Function Calculator: @code{mfcalc}
215
216* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
217* Rules: Mfcalc Rules. Grammar rules for the calculator.
218* Symtab: Mfcalc Symtab. Symbol table management subroutines.
219
220Bison Grammar Files
221
222* Grammar Outline:: Overall layout of the grammar file.
223* Symbols:: Terminal and nonterminal symbols.
224* Rules:: How to write grammar rules.
225* Recursion:: Writing recursive rules.
226* Semantics:: Semantic values and actions.
227* Declarations:: All kinds of Bison declarations are described here.
228* Multiple Parsers:: Putting more than one Bison parser in one program.
229
230Outline of a Bison Grammar
231
75f5aaea 232* Prologue:: Syntax and usage of the prologue (declarations section).
bfa74976
RS
233* Bison Declarations:: Syntax and usage of the Bison declarations section.
234* Grammar Rules:: Syntax and usage of the grammar rules section.
75f5aaea 235* Epilogue:: Syntax and usage of the epilogue (additional code section).
bfa74976
RS
236
237Defining Language Semantics
238
239* Value Type:: Specifying one data type for all semantic values.
240* Multiple Types:: Specifying several alternative data types.
241* Actions:: An action is the semantic definition of a grammar rule.
242* Action Types:: Specifying data types for actions to operate on.
243* Mid-Rule Actions:: Most actions go at the end of a rule.
244 This says when, why and how to use the exceptional
245 action in the middle of a rule.
246
247Bison Declarations
248
249* Token Decl:: Declaring terminal symbols.
250* Precedence Decl:: Declaring terminals with precedence and associativity.
251* Union Decl:: Declaring the set of all semantic value types.
252* Type Decl:: Declaring the choice of type for a nonterminal symbol.
253* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
254* Start Decl:: Specifying the start symbol.
255* Pure Decl:: Requesting a reentrant parser.
256* Decl Summary:: Table of all Bison declarations.
257
258Parser C-Language Interface
259
260* Parser Function:: How to call @code{yyparse} and what it returns.
13863333 261* Lexical:: You must supply a function @code{yylex}
bfa74976
RS
262 which reads tokens.
263* Error Reporting:: You must supply a function @code{yyerror}.
264* Action Features:: Special features for use in actions.
265
266The Lexical Analyzer Function @code{yylex}
267
268* Calling Convention:: How @code{yyparse} calls @code{yylex}.
269* Token Values:: How @code{yylex} must return the semantic value
270 of the token it has read.
271* Token Positions:: How @code{yylex} must return the text position
272 (line number, etc.) of the token, if the
273 actions want that.
274* Pure Calling:: How the calling convention differs
275 in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
276
13863333 277The Bison Parser Algorithm
bfa74976
RS
278
279* Look-Ahead:: Parser looks one token ahead when deciding what to do.
280* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
281* Precedence:: Operator precedence works by resolving conflicts.
282* Contextual Precedence:: When an operator's precedence depends on context.
283* Parser States:: The parser is a finite-state-machine with stack.
284* Reduce/Reduce:: When two rules are applicable in the same situation.
285* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
286* Stack Overflow:: What happens when stack gets full. How to avoid it.
287
288Operator Precedence
289
290* Why Precedence:: An example showing why precedence is needed.
291* Using Precedence:: How to specify precedence in Bison grammars.
292* Precedence Examples:: How these features are used in the previous example.
293* How Precedence:: How they work.
294
295Handling Context Dependencies
296
297* Semantic Tokens:: Token parsing can depend on the semantic context.
298* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
299* Tie-in Recovery:: Lexical tie-ins have implications for how
300 error recovery rules must be written.
301
302Invoking Bison
303
13863333 304* Bison Options:: All the options described in detail,
bfa74976
RS
305 in alphabetical order by short options.
306* Option Cross Key:: Alphabetical list of long options.
307* VMS Invocation:: Bison command syntax on VMS.
f2b5126e
PB
308
309Copying This Manual
310
311* GNU Free Documentation License:: License for copying this manual.
312
342b8b6e 313@end detailmenu
bfa74976
RS
314@end menu
315
342b8b6e 316@node Introduction
bfa74976
RS
317@unnumbered Introduction
318@cindex introduction
319
320@dfn{Bison} is a general-purpose parser generator that converts a
321grammar description for an LALR(1) context-free grammar into a C
322program to parse that grammar. Once you are proficient with Bison,
323you may use it to develop a wide range of language parsers, from those
324used in simple desk calculators to complex programming languages.
325
326Bison is upward compatible with Yacc: all properly-written Yacc grammars
327ought to work with Bison with no change. Anyone familiar with Yacc
328should be able to use Bison with little trouble. You need to be fluent in
329C programming in order to use Bison or to understand this manual.
330
331We begin with tutorial chapters that explain the basic concepts of using
332Bison and show three explained examples, each building on the last. If you
333don't know Bison or Yacc, start by reading these chapters. Reference
334chapters follow which describe specific aspects of Bison in detail.
335
931c7513
RS
336Bison was written primarily by Robert Corbett; Richard Stallman made it
337Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
14ded682 338multi-character string literals and other features.
931c7513 339
df1af54c 340This edition corresponds to version @value{VERSION} of Bison.
bfa74976 341
342b8b6e 342@node Conditions
bfa74976
RS
343@unnumbered Conditions for Using Bison
344
a31239f1 345As of Bison version 1.24, we have changed the distribution terms for
9ecbd125 346@code{yyparse} to permit using Bison's output in nonfree programs.
a31239f1
RS
347Formerly, Bison parsers could be used only in programs that were free
348software.
349
350The other GNU programming tools, such as the GNU C compiler, have never
9ecbd125 351had such a requirement. They could always be used for nonfree
a31239f1
RS
352software. The reason Bison was different was not due to a special
353policy decision; it resulted from applying the usual General Public
354License to all of the Bison source code.
355
356The output of the Bison utility---the Bison parser file---contains a
357verbatim copy of a sizable piece of Bison, which is the code for the
358@code{yyparse} function. (The actions from your grammar are inserted
359into this function at one point, but the rest of the function is not
360changed.) When we applied the GPL terms to the code for @code{yyparse},
361the effect was to restrict the use of Bison output to free software.
362
363We didn't change the terms because of sympathy for people who want to
364make software proprietary. @strong{Software should be free.} But we
365concluded that limiting Bison's use to free software was doing little to
366encourage people to make other software free. So we decided to make the
367practical conditions for using Bison match the practical conditions for
368using the other GNU tools.
bfa74976 369
c67a198d 370@include gpl.texi
bfa74976 371
342b8b6e 372@node Concepts
bfa74976
RS
373@chapter The Concepts of Bison
374
375This chapter introduces many of the basic concepts without which the
376details of Bison will not make sense. If you do not already know how to
377use Bison or Yacc, we suggest you start by reading this chapter carefully.
378
379@menu
380* Language and Grammar:: Languages and context-free grammars,
381 as mathematical ideas.
382* Grammar in Bison:: How we represent grammars for Bison's sake.
383* Semantic Values:: Each token or syntactic grouping can have
384 a semantic value (the value of an integer,
385 the name of an identifier, etc.).
386* Semantic Actions:: Each rule can have an action containing C code.
847bf1f5 387* Locations Overview:: Tracking Locations.
bfa74976
RS
388* Bison Parser:: What are Bison's input and output,
389 how is the output used?
390* Stages:: Stages in writing and running Bison grammars.
391* Grammar Layout:: Overall structure of a Bison grammar file.
392@end menu
393
342b8b6e 394@node Language and Grammar
bfa74976
RS
395@section Languages and Context-Free Grammars
396
bfa74976
RS
397@cindex context-free grammar
398@cindex grammar, context-free
399In order for Bison to parse a language, it must be described by a
400@dfn{context-free grammar}. This means that you specify one or more
401@dfn{syntactic groupings} and give rules for constructing them from their
402parts. For example, in the C language, one kind of grouping is called an
403`expression'. One rule for making an expression might be, ``An expression
404can be made of a minus sign and another expression''. Another would be,
405``An expression can be an integer''. As you can see, rules are often
406recursive, but there must be at least one rule which leads out of the
407recursion.
408
409@cindex BNF
410@cindex Backus-Naur form
411The most common formal system for presenting such rules for humans to read
412is @dfn{Backus-Naur Form} or ``BNF'', which was developed in order to
413specify the language Algol 60. Any grammar expressed in BNF is a
414context-free grammar. The input to Bison is essentially machine-readable
415BNF.
416
417Not all context-free languages can be handled by Bison, only those
418that are LALR(1). In brief, this means that it must be possible to
419tell how to parse any portion of an input string with just a single
420token of look-ahead. Strictly speaking, that is a description of an
421LR(1) grammar, and LALR(1) involves additional restrictions that are
422hard to explain simply; but it is rare in actual practice to find an
423LR(1) grammar that fails to be LALR(1). @xref{Mystery Conflicts, ,
424Mysterious Reduce/Reduce Conflicts}, for more information on this.
425
426@cindex symbols (abstract)
427@cindex token
428@cindex syntactic grouping
429@cindex grouping, syntactic
430In the formal grammatical rules for a language, each kind of syntactic unit
431or grouping is named by a @dfn{symbol}. Those which are built by grouping
432smaller constructs according to grammatical rules are called
433@dfn{nonterminal symbols}; those which can't be subdivided are called
434@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
435corresponding to a single terminal symbol a @dfn{token}, and a piece
436corresponding to a single nonterminal symbol a @dfn{grouping}.@refill
437
438We can use the C language as an example of what symbols, terminal and
439nonterminal, mean. The tokens of C are identifiers, constants (numeric and
440string), and the various keywords, arithmetic operators and punctuation
441marks. So the terminal symbols of a grammar for C include `identifier',
442`number', `string', plus one symbol for each keyword, operator or
443punctuation mark: `if', `return', `const', `static', `int', `char',
444`plus-sign', `open-brace', `close-brace', `comma' and many more. (These
445tokens can be subdivided into characters, but that is a matter of
446lexicography, not grammar.)
447
448Here is a simple C function subdivided into tokens:
449
9edcd895
AD
450@ifinfo
451@example
452int /* @r{keyword `int'} */
453square (int x) /* @r{identifier, open-paren, identifier,}
454 @r{identifier, close-paren} */
455@{ /* @r{open-brace} */
456 return x * x; /* @r{keyword `return', identifier, asterisk,
457 identifier, semicolon} */
458@} /* @r{close-brace} */
459@end example
460@end ifinfo
461@ifnotinfo
bfa74976
RS
462@example
463int /* @r{keyword `int'} */
9edcd895 464square (int x) /* @r{identifier, open-paren, identifier, identifier, close-paren} */
bfa74976 465@{ /* @r{open-brace} */
9edcd895 466 return x * x; /* @r{keyword `return', identifier, asterisk, identifier, semicolon} */
bfa74976
RS
467@} /* @r{close-brace} */
468@end example
9edcd895 469@end ifnotinfo
bfa74976
RS
470
471The syntactic groupings of C include the expression, the statement, the
472declaration, and the function definition. These are represented in the
473grammar of C by nonterminal symbols `expression', `statement',
474`declaration' and `function definition'. The full grammar uses dozens of
475additional language constructs, each with its own nonterminal symbol, in
476order to express the meanings of these four. The example above is a
477function definition; it contains one declaration, and one statement. In
478the statement, each @samp{x} is an expression and so is @samp{x * x}.
479
480Each nonterminal symbol must have grammatical rules showing how it is made
481out of simpler constructs. For example, one kind of C statement is the
482@code{return} statement; this would be described with a grammar rule which
483reads informally as follows:
484
485@quotation
486A `statement' can be made of a `return' keyword, an `expression' and a
487`semicolon'.
488@end quotation
489
490@noindent
491There would be many other rules for `statement', one for each kind of
492statement in C.
493
494@cindex start symbol
495One nonterminal symbol must be distinguished as the special one which
496defines a complete utterance in the language. It is called the @dfn{start
497symbol}. In a compiler, this means a complete input program. In the C
498language, the nonterminal symbol `sequence of definitions and declarations'
499plays this role.
500
501For example, @samp{1 + 2} is a valid C expression---a valid part of a C
502program---but it is not valid as an @emph{entire} C program. In the
503context-free grammar of C, this follows from the fact that `expression' is
504not the start symbol.
505
506The Bison parser reads a sequence of tokens as its input, and groups the
507tokens using the grammar rules. If the input is valid, the end result is
508that the entire token sequence reduces to a single grouping whose symbol is
509the grammar's start symbol. If we use a grammar for C, the entire input
510must be a `sequence of definitions and declarations'. If not, the parser
511reports a syntax error.
512
342b8b6e 513@node Grammar in Bison
bfa74976
RS
514@section From Formal Rules to Bison Input
515@cindex Bison grammar
516@cindex grammar, Bison
517@cindex formal grammar
518
519A formal grammar is a mathematical construct. To define the language
520for Bison, you must write a file expressing the grammar in Bison syntax:
521a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.
522
523A nonterminal symbol in the formal grammar is represented in Bison input
524as an identifier, like an identifier in C. By convention, it should be
525in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
526
527The Bison representation for a terminal symbol is also called a @dfn{token
528type}. Token types as well can be represented as C-like identifiers. By
529convention, these identifiers should be upper case to distinguish them from
530nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
531@code{RETURN}. A terminal symbol that stands for a particular keyword in
532the language should be named after that keyword converted to upper case.
533The terminal symbol @code{error} is reserved for error recovery.
931c7513 534@xref{Symbols}.
bfa74976
RS
535
536A terminal symbol can also be represented as a character literal, just like
537a C character constant. You should do this whenever a token is just a
538single character (parenthesis, plus-sign, etc.): use that same character in
539a literal as the terminal symbol for that token.
540
931c7513
RS
541A third way to represent a terminal symbol is with a C string constant
542containing several characters. @xref{Symbols}, for more information.
543
bfa74976
RS
544The grammar rules also have an expression in Bison syntax. For example,
545here is the Bison rule for a C @code{return} statement. The semicolon in
546quotes is a literal character token, representing part of the C syntax for
547the statement; the naked semicolon, and the colon, are Bison punctuation
548used in every rule.
549
550@example
551stmt: RETURN expr ';'
552 ;
553@end example
554
555@noindent
556@xref{Rules, ,Syntax of Grammar Rules}.
557
342b8b6e 558@node Semantic Values
bfa74976
RS
559@section Semantic Values
560@cindex semantic value
561@cindex value, semantic
562
563A formal grammar selects tokens only by their classifications: for example,
564if a rule mentions the terminal symbol `integer constant', it means that
565@emph{any} integer constant is grammatically valid in that position. The
566precise value of the constant is irrelevant to how to parse the input: if
567@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
568grammatical.@refill
569
570But the precise value is very important for what the input means once it is
571parsed. A compiler is useless if it fails to distinguish between 4, 1 and
5723989 as constants in the program! Therefore, each token in a Bison grammar
573has both a token type and a @dfn{semantic value}. @xref{Semantics, ,Defining Language Semantics},
574for details.
575
576The token type is a terminal symbol defined in the grammar, such as
577@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
578you need to know to decide where the token may validly appear and how to
579group it with other tokens. The grammar rules know nothing about tokens
580except their types.@refill
581
582The semantic value has all the rest of the information about the
583meaning of the token, such as the value of an integer, or the name of an
584identifier. (A token such as @code{','} which is just punctuation doesn't
585need to have any semantic value.)
586
587For example, an input token might be classified as token type
588@code{INTEGER} and have the semantic value 4. Another input token might
589have the same token type @code{INTEGER} but value 3989. When a grammar
590rule says that @code{INTEGER} is allowed, either of these tokens is
591acceptable because each is an @code{INTEGER}. When the parser accepts the
592token, it keeps track of the token's semantic value.
593
594Each grouping can also have a semantic value as well as its nonterminal
595symbol. For example, in a calculator, an expression typically has a
596semantic value that is a number. In a compiler for a programming
597language, an expression typically has a semantic value that is a tree
598structure describing the meaning of the expression.
599
342b8b6e 600@node Semantic Actions
bfa74976
RS
601@section Semantic Actions
602@cindex semantic actions
603@cindex actions, semantic
604
605In order to be useful, a program must do more than parse input; it must
606also produce some output based on the input. In a Bison grammar, a grammar
607rule can have an @dfn{action} made up of C statements. Each time the
608parser recognizes a match for that rule, the action is executed.
609@xref{Actions}.
13863333 610
bfa74976
RS
611Most of the time, the purpose of an action is to compute the semantic value
612of the whole construct from the semantic values of its parts. For example,
613suppose we have a rule which says an expression can be the sum of two
614expressions. When the parser recognizes such a sum, each of the
615subexpressions has a semantic value which describes how it was built up.
616The action for this rule should create a similar sort of value for the
617newly recognized larger expression.
618
619For example, here is a rule that says an expression can be the sum of
620two subexpressions:
621
622@example
623expr: expr '+' expr @{ $$ = $1 + $3; @}
624 ;
625@end example
626
627@noindent
628The action says how to produce the semantic value of the sum expression
629from the values of the two subexpressions.
630
342b8b6e 631@node Locations Overview
847bf1f5
AD
632@section Locations
633@cindex location
634@cindex textual position
635@cindex position, textual
636
637Many applications, like interpreters or compilers, have to produce verbose
638and useful error messages. To achieve this, one must be able to keep track of
639the @dfn{textual position}, or @dfn{location}, of each syntactic construct.
640Bison provides a mechanism for handling these locations.
641
642Each token has a semantic value. In a similar fashion, each token has an
643associated location, but the type of locations is the same for all tokens and
644groupings. Moreover, the output parser is equipped with a default data
645structure for storing locations (@pxref{Locations}, for more details).
646
647Like semantic values, locations can be reached in actions using a dedicated
648set of constructs. In the example above, the location of the whole grouping
649is @code{@@$}, while the locations of the subexpressions are @code{@@1} and
650@code{@@3}.
651
652When a rule is matched, a default action is used to compute the semantic value
653of its left hand side (@pxref{Actions}). In the same way, another default
654action is used for locations. However, the action for locations is general
655enough for most cases, meaning there is usually no need to describe for each
656rule how @code{@@$} should be formed. When building a new location for a given
657grouping, the default behavior of the output parser is to take the beginning
658of the first symbol, and the end of the last symbol.
659
342b8b6e 660@node Bison Parser
bfa74976
RS
661@section Bison Output: the Parser File
662@cindex Bison parser
663@cindex Bison utility
664@cindex lexical analyzer, purpose
665@cindex parser
666
667When you run Bison, you give it a Bison grammar file as input. The output
668is a C source file that parses the language described by the grammar.
669This file is called a @dfn{Bison parser}. Keep in mind that the Bison
670utility and the Bison parser are two distinct programs: the Bison utility
671is a program whose output is the Bison parser that becomes part of your
672program.
673
674The job of the Bison parser is to group tokens into groupings according to
675the grammar rules---for example, to build identifiers and operators into
676expressions. As it does this, it runs the actions for the grammar rules it
677uses.
678
679The tokens come from a function called the @dfn{lexical analyzer} that you
680must supply in some fashion (such as by writing it in C). The Bison parser
681calls the lexical analyzer each time it wants a new token. It doesn't know
682what is ``inside'' the tokens (though their semantic values may reflect
683this). Typically the lexical analyzer makes the tokens by parsing
684characters of text, but Bison does not depend on this. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
685
686The Bison parser file is C code which defines a function named
687@code{yyparse} which implements that grammar. This function does not make
688a complete C program: you must supply some additional functions. One is
689the lexical analyzer. Another is an error-reporting function which the
690parser calls to report an error. In addition, a complete C program must
691start with a function called @code{main}; you have to provide this, and
692arrange for it to call @code{yyparse} or the parser will never run.
693@xref{Interface, ,Parser C-Language Interface}.
694
695Aside from the token type names and the symbols in the actions you
7093d0f5 696write, all symbols defined in the Bison parser file itself
bfa74976
RS
697begin with @samp{yy} or @samp{YY}. This includes interface functions
698such as the lexical analyzer function @code{yylex}, the error reporting
699function @code{yyerror} and the parser function @code{yyparse} itself.
700This also includes numerous identifiers used for internal purposes.
701Therefore, you should avoid using C identifiers starting with @samp{yy}
702or @samp{YY} in the Bison grammar file except for the ones defined in
703this manual.
704
7093d0f5
AD
705In some cases the Bison parser file includes system headers, and in
706those cases your code should respect the identifiers reserved by those
707headers. On some non-@sc{gnu} hosts, @code{<alloca.h>},
708@code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to
02a81e05
PE
709declare memory allocators and related types. In the same situation,
710C++ parsers may include @code{<cstddef>} and @code{<cstdlib>} instead.
4947ebdb
PE
711Other system headers may be included if you define @code{YYDEBUG} to a
712nonzero value (@pxref{Debugging, ,Debugging Your Parser}).
7093d0f5 713
342b8b6e 714@node Stages
bfa74976
RS
715@section Stages in Using Bison
716@cindex stages in using Bison
717@cindex using Bison
718
719The actual language-design process using Bison, from grammar specification
720to a working compiler or interpreter, has these parts:
721
722@enumerate
723@item
724Formally specify the grammar in a form recognized by Bison
725(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule in the language,
726describe the action that is to be taken when an instance of that rule
727is recognized. The action is described by a sequence of C statements.
728
729@item
730Write a lexical analyzer to process input and pass tokens to the
731parser. The lexical analyzer may be written by hand in C
732(@pxref{Lexical, ,The Lexical Analyzer Function @code{yylex}}). It could also be produced using Lex, but the use
733of Lex is not discussed in this manual.
734
735@item
736Write a controlling function that calls the Bison-produced parser.
737
738@item
739Write error-reporting routines.
740@end enumerate
741
742To turn this source code as written into a runnable program, you
743must follow these steps:
744
745@enumerate
746@item
747Run Bison on the grammar to produce the parser.
748
749@item
750Compile the code output by Bison, as well as any other source files.
751
752@item
753Link the object files to produce the finished product.
754@end enumerate
755
342b8b6e 756@node Grammar Layout
bfa74976
RS
757@section The Overall Layout of a Bison Grammar
758@cindex grammar file
759@cindex file format
760@cindex format of grammar file
761@cindex layout of Bison grammar
762
763The input file for the Bison utility is a @dfn{Bison grammar file}. The
764general form of a Bison grammar file is as follows:
765
766@example
767%@{
75f5aaea 768@var{Prologue (declarations)}
bfa74976
RS
769%@}
770
771@var{Bison declarations}
772
773%%
774@var{Grammar rules}
775%%
75f5aaea 776@var{Epilogue (additional code)}
bfa74976
RS
777@end example
778
779@noindent
780The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
781in every Bison grammar file to separate the sections.
782
342b8b6e
AD
783The prologue may define types and variables used in the actions. You can
784also use preprocessor commands to define macros used there, and use
bfa74976
RS
785@code{#include} to include header files that do any of these things.
786
787The Bison declarations declare the names of the terminal and nonterminal
788symbols, and may also describe operator precedence and the data types of
789semantic values of various symbols.
790
791The grammar rules define how to construct each nonterminal symbol from its
792parts.
793
342b8b6e
AD
794The epilogue can contain any code you want to use. Often the definition of
795the lexical analyzer @code{yylex} goes here, plus subroutines called by the
796actions in the grammar rules. In a simple program, all the rest of the
75f5aaea 797program can go here.
bfa74976 798
342b8b6e 799@node Examples
bfa74976
RS
800@chapter Examples
801@cindex simple examples
802@cindex examples, simple
803
804Now we show and explain three sample programs written using Bison: a
805reverse polish notation calculator, an algebraic (infix) notation
806calculator, and a multi-function calculator. All three have been tested
807under BSD Unix 4.3; each produces a usable, though limited, interactive
808desk-top calculator.
809
810These examples are simple, but Bison grammars for real programming
811languages are written the same way.
812@ifinfo
813You can copy these examples out of the Info file and into a source file
814to try them.
815@end ifinfo
816
817@menu
818* RPN Calc:: Reverse polish notation calculator;
819 a first example with no operator precedence.
820* Infix Calc:: Infix (algebraic) notation calculator.
821 Operator precedence is introduced.
822* Simple Error Recovery:: Continuing after syntax errors.
342b8b6e 823* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
bfa74976
RS
824* Multi-function Calc:: Calculator with memory and trig functions.
825 It uses multiple data-types for semantic values.
826* Exercises:: Ideas for improving the multi-function calculator.
827@end menu
828
342b8b6e 829@node RPN Calc
bfa74976
RS
830@section Reverse Polish Notation Calculator
831@cindex reverse polish notation
832@cindex polish notation calculator
833@cindex @code{rpcalc}
834@cindex calculator, simple
835
836The first example is that of a simple double-precision @dfn{reverse polish
837notation} calculator (a calculator using postfix operators). This example
838provides a good starting point, since operator precedence is not an issue.
839The second example will illustrate how operator precedence is handled.
840
841The source code for this calculator is named @file{rpcalc.y}. The
842@samp{.y} extension is a convention used for Bison input files.
843
844@menu
75f5aaea 845* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc.
bfa74976
RS
846* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
847* Lexer: Rpcalc Lexer. The lexical analyzer.
848* Main: Rpcalc Main. The controlling function.
849* Error: Rpcalc Error. The error reporting function.
850* Gen: Rpcalc Gen. Running Bison on the grammar file.
851* Comp: Rpcalc Compile. Run the C compiler on the output code.
852@end menu
853
342b8b6e 854@node Rpcalc Decls
bfa74976
RS
855@subsection Declarations for @code{rpcalc}
856
857Here are the C and Bison declarations for the reverse polish notation
858calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
859
860@example
861/* Reverse polish notation calculator. */
862
863%@{
864#define YYSTYPE double
865#include <math.h>
866%@}
867
868%token NUM
869
870%% /* Grammar rules and actions follow */
871@end example
872
75f5aaea 873The declarations section (@pxref{Prologue, , The prologue}) contains two
bfa74976
RS
874preprocessor directives.
875
876The @code{#define} directive defines the macro @code{YYSTYPE}, thus
1964ad8c
AD
877specifying the C data type for semantic values of both tokens and
878groupings (@pxref{Value Type, ,Data Types of Semantic Values}). The
879Bison parser will use whatever type @code{YYSTYPE} is defined as; if you
880don't define it, @code{int} is the default. Because we specify
881@code{double}, each token and each expression has an associated value,
882which is a floating point number.
bfa74976
RS
883
884The @code{#include} directive is used to declare the exponentiation
885function @code{pow}.
886
887The second section, Bison declarations, provides information to Bison about
888the token types (@pxref{Bison Declarations, ,The Bison Declarations Section}). Each terminal symbol that is
889not a single-character literal must be declared here. (Single-character
890literals normally don't need to be declared.) In this example, all the
891arithmetic operators are designated by single-character literals, so the
892only terminal symbol that needs to be declared is @code{NUM}, the token
893type for numeric constants.
894
342b8b6e 895@node Rpcalc Rules
bfa74976
RS
896@subsection Grammar Rules for @code{rpcalc}
897
898Here are the grammar rules for the reverse polish notation calculator.
899
900@example
901input: /* empty */
902 | input line
903;
904
905line: '\n'
906 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
907;
908
909exp: NUM @{ $$ = $1; @}
910 | exp exp '+' @{ $$ = $1 + $2; @}
911 | exp exp '-' @{ $$ = $1 - $2; @}
912 | exp exp '*' @{ $$ = $1 * $2; @}
913 | exp exp '/' @{ $$ = $1 / $2; @}
914 /* Exponentiation */
915 | exp exp '^' @{ $$ = pow ($1, $2); @}
916 /* Unary minus */
917 | exp 'n' @{ $$ = -$1; @}
918;
919%%
920@end example
921
922The groupings of the rpcalc ``language'' defined here are the expression
923(given the name @code{exp}), the line of input (@code{line}), and the
924complete input transcript (@code{input}). Each of these nonterminal
925symbols has several alternate rules, joined by the @samp{|} punctuator
926which is read as ``or''. The following sections explain what these rules
927mean.
928
929The semantics of the language is determined by the actions taken when a
930grouping is recognized. The actions are the C code that appears inside
931braces. @xref{Actions}.
932
933You must specify these actions in C, but Bison provides the means for
934passing semantic values between the rules. In each action, the
935pseudo-variable @code{$$} stands for the semantic value for the grouping
936that the rule is going to construct. Assigning a value to @code{$$} is the
937main job of most actions. The semantic values of the components of the
938rule are referred to as @code{$1}, @code{$2}, and so on.
939
940@menu
13863333
AD
941* Rpcalc Input::
942* Rpcalc Line::
943* Rpcalc Expr::
bfa74976
RS
944@end menu
945
342b8b6e 946@node Rpcalc Input
bfa74976
RS
947@subsubsection Explanation of @code{input}
948
949Consider the definition of @code{input}:
950
951@example
952input: /* empty */
953 | input line
954;
955@end example
956
957This definition reads as follows: ``A complete input is either an empty
958string, or a complete input followed by an input line''. Notice that
959``complete input'' is defined in terms of itself. This definition is said
960to be @dfn{left recursive} since @code{input} appears always as the
961leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}.
962
963The first alternative is empty because there are no symbols between the
964colon and the first @samp{|}; this means that @code{input} can match an
965empty string of input (no tokens). We write the rules this way because it
966is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
967It's conventional to put an empty alternative first and write the comment
968@samp{/* empty */} in it.
969
970The second alternate rule (@code{input line}) handles all nontrivial input.
971It means, ``After reading any number of lines, read one more line if
972possible.'' The left recursion makes this rule into a loop. Since the
973first alternative matches empty input, the loop can be executed zero or
974more times.
975
976The parser function @code{yyparse} continues to process input until a
977grammatical error is seen or the lexical analyzer says there are no more
978input tokens; we will arrange for the latter to happen at end of file.
979
342b8b6e 980@node Rpcalc Line
bfa74976
RS
981@subsubsection Explanation of @code{line}
982
983Now consider the definition of @code{line}:
984
985@example
986line: '\n'
987 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
988;
989@end example
990
991The first alternative is a token which is a newline character; this means
992that rpcalc accepts a blank line (and ignores it, since there is no
993action). The second alternative is an expression followed by a newline.
994This is the alternative that makes rpcalc useful. The semantic value of
995the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
996question is the first symbol in the alternative. The action prints this
997value, which is the result of the computation the user asked for.
998
999This action is unusual because it does not assign a value to @code{$$}. As
1000a consequence, the semantic value associated with the @code{line} is
1001uninitialized (its value will be unpredictable). This would be a bug if
1002that value were ever used, but we don't use it: once rpcalc has printed the
1003value of the user's input line, that value is no longer needed.
1004
342b8b6e 1005@node Rpcalc Expr
bfa74976
RS
1006@subsubsection Explanation of @code{expr}
1007
1008The @code{exp} grouping has several rules, one for each kind of expression.
1009The first rule handles the simplest expressions: those that are just numbers.
1010The second handles an addition-expression, which looks like two expressions
1011followed by a plus-sign. The third handles subtraction, and so on.
1012
1013@example
1014exp: NUM
1015 | exp exp '+' @{ $$ = $1 + $2; @}
1016 | exp exp '-' @{ $$ = $1 - $2; @}
1017 @dots{}
1018 ;
1019@end example
1020
1021We have used @samp{|} to join all the rules for @code{exp}, but we could
1022equally well have written them separately:
1023
1024@example
1025exp: NUM ;
1026exp: exp exp '+' @{ $$ = $1 + $2; @} ;
1027exp: exp exp '-' @{ $$ = $1 - $2; @} ;
1028 @dots{}
1029@end example
1030
1031Most of the rules have actions that compute the value of the expression in
1032terms of the value of its parts. For example, in the rule for addition,
1033@code{$1} refers to the first component @code{exp} and @code{$2} refers to
1034the second one. The third component, @code{'+'}, has no meaningful
1035associated semantic value, but if it had one you could refer to it as
1036@code{$3}. When @code{yyparse} recognizes a sum expression using this
1037rule, the sum of the two subexpressions' values is produced as the value of
1038the entire expression. @xref{Actions}.
1039
1040You don't have to give an action for every rule. When a rule has no
1041action, Bison by default copies the value of @code{$1} into @code{$$}.
1042This is what happens in the first rule (the one that uses @code{NUM}).
1043
1044The formatting shown here is the recommended convention, but Bison does
1045not require it. You can add or change whitespace as much as you wish.
1046For example, this:
1047
1048@example
1049exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{}
1050@end example
1051
1052@noindent
1053means the same thing as this:
1054
1055@example
1056exp: NUM
1057 | exp exp '+' @{ $$ = $1 + $2; @}
1058 | @dots{}
1059@end example
1060
1061@noindent
1062The latter, however, is much more readable.
1063
342b8b6e 1064@node Rpcalc Lexer
bfa74976
RS
1065@subsection The @code{rpcalc} Lexical Analyzer
1066@cindex writing a lexical analyzer
1067@cindex lexical analyzer, writing
1068
1069The lexical analyzer's job is low-level parsing: converting characters or
1070sequences of characters into tokens. The Bison parser gets its tokens by
1071calling the lexical analyzer. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
1072
1073Only a simple lexical analyzer is needed for the RPN calculator. This
1074lexical analyzer skips blanks and tabs, then reads in numbers as
1075@code{double} and returns them as @code{NUM} tokens. Any other character
1076that isn't part of a number is a separate token. Note that the token-code
1077for such a single-character token is the character itself.
1078
1079The return value of the lexical analyzer function is a numeric code which
1080represents a token type. The same text used in Bison rules to stand for
1081this token type is also a C expression for the numeric code for the type.
1082This works in two ways. If the token type is a character literal, then its
1083numeric code is the ASCII code for that character; you can use the same
1084character literal in the lexical analyzer to express the number. If the
1085token type is an identifier, that identifier is defined by Bison as a C
1086macro whose definition is the appropriate number. In this example,
1087therefore, @code{NUM} becomes a macro for @code{yylex} to use.
1088
1964ad8c
AD
1089The semantic value of the token (if it has one) is stored into the
1090global variable @code{yylval}, which is where the Bison parser will look
1091for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was
1092defined at the beginning of the grammar; @pxref{Rpcalc Decls,
1093,Declarations for @code{rpcalc}}.)
bfa74976
RS
1094
1095A token type code of zero is returned if the end-of-file is encountered.
1096(Bison recognizes any nonpositive value as indicating the end of the
1097input.)
1098
1099Here is the code for the lexical analyzer:
1100
1101@example
1102@group
13863333 1103/* Lexical analyzer returns a double floating point
bfa74976
RS
1104 number on the stack and the token NUM, or the ASCII
1105 character read if not a number. Skips all blanks
1106 and tabs, returns 0 for EOF. */
1107
1108#include <ctype.h>
1109@end group
1110
1111@group
13863333
AD
1112int
1113yylex (void)
bfa74976
RS
1114@{
1115 int c;
1116
1117 /* skip white space */
13863333 1118 while ((c = getchar ()) == ' ' || c == '\t')
bfa74976
RS
1119 ;
1120@end group
1121@group
1122 /* process numbers */
13863333 1123 if (c == '.' || isdigit (c))
bfa74976
RS
1124 @{
1125 ungetc (c, stdin);
1126 scanf ("%lf", &yylval);
1127 return NUM;
1128 @}
1129@end group
1130@group
1131 /* return end-of-file */
13863333 1132 if (c == EOF)
bfa74976
RS
1133 return 0;
1134 /* return single chars */
13863333 1135 return c;
bfa74976
RS
1136@}
1137@end group
1138@end example
1139
342b8b6e 1140@node Rpcalc Main
bfa74976
RS
1141@subsection The Controlling Function
1142@cindex controlling function
1143@cindex main function in simple example
1144
1145In keeping with the spirit of this example, the controlling function is
1146kept to the bare minimum. The only requirement is that it call
1147@code{yyparse} to start the process of parsing.
1148
1149@example
1150@group
13863333
AD
1151int
1152main (void)
bfa74976 1153@{
13863333 1154 return yyparse ();
bfa74976
RS
1155@}
1156@end group
1157@end example
1158
342b8b6e 1159@node Rpcalc Error
bfa74976
RS
1160@subsection The Error Reporting Routine
1161@cindex error reporting routine
1162
1163When @code{yyparse} detects a syntax error, it calls the error reporting
13863333
AD
1164function @code{yyerror} to print an error message (usually but not
1165always @code{"parse error"}). It is up to the programmer to supply
1166@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
1167here is the definition we will use:
bfa74976
RS
1168
1169@example
1170@group
1171#include <stdio.h>
1172
13863333
AD
1173void
1174yyerror (const char *s) /* Called by yyparse on error */
bfa74976
RS
1175@{
1176 printf ("%s\n", s);
1177@}
1178@end group
1179@end example
1180
1181After @code{yyerror} returns, the Bison parser may recover from the error
1182and continue parsing if the grammar contains a suitable error rule
1183(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
1184have not written any error rules in this example, so any invalid input will
1185cause the calculator program to exit. This is not clean behavior for a
9ecbd125 1186real calculator, but it is adequate for the first example.
bfa74976 1187
342b8b6e 1188@node Rpcalc Gen
bfa74976
RS
1189@subsection Running Bison to Make the Parser
1190@cindex running Bison (introduction)
1191
ceed8467
AD
1192Before running Bison to produce a parser, we need to decide how to
1193arrange all the source code in one or more source files. For such a
1194simple example, the easiest thing is to put everything in one file. The
1195definitions of @code{yylex}, @code{yyerror} and @code{main} go at the
342b8b6e 1196end, in the epilogue of the file
75f5aaea 1197(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).
bfa74976
RS
1198
1199For a large project, you would probably have several source files, and use
1200@code{make} to arrange to recompile them.
1201
1202With all the source in a single file, you use the following command to
1203convert it into a parser file:
1204
1205@example
1206bison @var{file_name}.y
1207@end example
1208
1209@noindent
1210In this example the file was called @file{rpcalc.y} (for ``Reverse Polish
1211CALCulator''). Bison produces a file named @file{@var{file_name}.tab.c},
1212removing the @samp{.y} from the original file name. The file output by
1213Bison contains the source code for @code{yyparse}. The additional
1214functions in the input file (@code{yylex}, @code{yyerror} and @code{main})
1215are copied verbatim to the output.
1216
342b8b6e 1217@node Rpcalc Compile
bfa74976
RS
1218@subsection Compiling the Parser File
1219@cindex compiling the parser
1220
1221Here is how to compile and run the parser file:
1222
1223@example
1224@group
1225# @r{List files in current directory.}
9edcd895 1226$ @kbd{ls}
bfa74976
RS
1227rpcalc.tab.c rpcalc.y
1228@end group
1229
1230@group
1231# @r{Compile the Bison parser.}
1232# @r{@samp{-lm} tells compiler to search math library for @code{pow}.}
9edcd895 1233$ @kbd{cc rpcalc.tab.c -lm -o rpcalc}
bfa74976
RS
1234@end group
1235
1236@group
1237# @r{List files again.}
9edcd895 1238$ @kbd{ls}
bfa74976
RS
1239rpcalc rpcalc.tab.c rpcalc.y
1240@end group
1241@end example
1242
1243The file @file{rpcalc} now contains the executable code. Here is an
1244example session using @code{rpcalc}.
1245
1246@example
9edcd895
AD
1247$ @kbd{rpcalc}
1248@kbd{4 9 +}
bfa74976 124913
9edcd895 1250@kbd{3 7 + 3 4 5 *+-}
bfa74976 1251-13
9edcd895 1252@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}}
bfa74976 125313
9edcd895 1254@kbd{5 6 / 4 n +}
bfa74976 1255-3.166666667
9edcd895 1256@kbd{3 4 ^} @r{Exponentiation}
bfa74976 125781
9edcd895
AD
1258@kbd{^D} @r{End-of-file indicator}
1259$
bfa74976
RS
1260@end example
1261
342b8b6e 1262@node Infix Calc
bfa74976
RS
1263@section Infix Notation Calculator: @code{calc}
1264@cindex infix notation calculator
1265@cindex @code{calc}
1266@cindex calculator, infix notation
1267
1268We now modify rpcalc to handle infix operators instead of postfix. Infix
1269notation involves the concept of operator precedence and the need for
1270parentheses nested to arbitrary depth. Here is the Bison code for
1271@file{calc.y}, an infix desk-top calculator.
1272
1273@example
1274/* Infix notation calculator--calc */
1275
1276%@{
1277#define YYSTYPE double
1278#include <math.h>
1279%@}
1280
1281/* BISON Declarations */
1282%token NUM
1283%left '-' '+'
1284%left '*' '/'
1285%left NEG /* negation--unary minus */
1286%right '^' /* exponentiation */
1287
1288/* Grammar follows */
1289%%
1290input: /* empty string */
1291 | input line
1292;
1293
1294line: '\n'
1295 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1296;
1297
1298exp: NUM @{ $$ = $1; @}
1299 | exp '+' exp @{ $$ = $1 + $3; @}
1300 | exp '-' exp @{ $$ = $1 - $3; @}
1301 | exp '*' exp @{ $$ = $1 * $3; @}
1302 | exp '/' exp @{ $$ = $1 / $3; @}
1303 | '-' exp %prec NEG @{ $$ = -$2; @}
1304 | exp '^' exp @{ $$ = pow ($1, $3); @}
1305 | '(' exp ')' @{ $$ = $2; @}
1306;
1307%%
1308@end example
1309
1310@noindent
ceed8467
AD
1311The functions @code{yylex}, @code{yyerror} and @code{main} can be the
1312same as before.
bfa74976
RS
1313
1314There are two important new features shown in this code.
1315
1316In the second section (Bison declarations), @code{%left} declares token
1317types and says they are left-associative operators. The declarations
1318@code{%left} and @code{%right} (right associativity) take the place of
1319@code{%token} which is used to declare a token type name without
1320associativity. (These tokens are single-character literals, which
1321ordinarily don't need to be declared. We declare them here to specify
1322the associativity.)
1323
1324Operator precedence is determined by the line ordering of the
1325declarations; the higher the line number of the declaration (lower on
1326the page or screen), the higher the precedence. Hence, exponentiation
1327has the highest precedence, unary minus (@code{NEG}) is next, followed
1328by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator Precedence}.
1329
1330The other important new feature is the @code{%prec} in the grammar section
1331for the unary minus operator. The @code{%prec} simply instructs Bison that
1332the rule @samp{| '-' exp} has the same precedence as @code{NEG}---in this
1333case the next-to-highest. @xref{Contextual Precedence, ,Context-Dependent Precedence}.
1334
1335Here is a sample run of @file{calc.y}:
1336
1337@need 500
1338@example
9edcd895
AD
1339$ @kbd{calc}
1340@kbd{4 + 4.5 - (34/(8*3+-3))}
bfa74976 13416.880952381
9edcd895 1342@kbd{-56 + 2}
bfa74976 1343-54
9edcd895 1344@kbd{3 ^ 2}
bfa74976
RS
13459
1346@end example
1347
342b8b6e 1348@node Simple Error Recovery
bfa74976
RS
1349@section Simple Error Recovery
1350@cindex error recovery, simple
1351
1352Up to this point, this manual has not addressed the issue of @dfn{error
1353recovery}---how to continue parsing after the parser detects a syntax
ceed8467
AD
1354error. All we have handled is error reporting with @code{yyerror}.
1355Recall that by default @code{yyparse} returns after calling
1356@code{yyerror}. This means that an erroneous input line causes the
1357calculator program to exit. Now we show how to rectify this deficiency.
bfa74976
RS
1358
1359The Bison language itself includes the reserved word @code{error}, which
1360may be included in the grammar rules. In the example below it has
1361been added to one of the alternatives for @code{line}:
1362
1363@example
1364@group
1365line: '\n'
1366 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1367 | error '\n' @{ yyerrok; @}
1368;
1369@end group
1370@end example
1371
ceed8467
AD
1372This addition to the grammar allows for simple error recovery in the
1373event of a parse error. If an expression that cannot be evaluated is
1374read, the error will be recognized by the third rule for @code{line},
1375and parsing will continue. (The @code{yyerror} function is still called
1376upon to print its message as well.) The action executes the statement
1377@code{yyerrok}, a macro defined automatically by Bison; its meaning is
1378that error recovery is complete (@pxref{Error Recovery}). Note the
1379difference between @code{yyerrok} and @code{yyerror}; neither one is a
1380misprint.@refill
bfa74976
RS
1381
1382This form of error recovery deals with syntax errors. There are other
1383kinds of errors; for example, division by zero, which raises an exception
1384signal that is normally fatal. A real calculator program must handle this
1385signal and use @code{longjmp} to return to @code{main} and resume parsing
1386input lines; it would also have to discard the rest of the current line of
1387input. We won't discuss this issue further because it is not specific to
1388Bison programs.
1389
342b8b6e
AD
1390@node Location Tracking Calc
1391@section Location Tracking Calculator: @code{ltcalc}
1392@cindex location tracking calculator
1393@cindex @code{ltcalc}
1394@cindex calculator, location tracking
1395
9edcd895
AD
1396This example extends the infix notation calculator with location
1397tracking. This feature will be used to improve the error messages. For
1398the sake of clarity, this example is a simple integer calculator, since
1399most of the work needed to use locations will be done in the lexical
1400analyser.
342b8b6e
AD
1401
1402@menu
1403* Decls: Ltcalc Decls. Bison and C declarations for ltcalc.
1404* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations.
1405* Lexer: Ltcalc Lexer. The lexical analyzer.
1406@end menu
1407
1408@node Ltcalc Decls
1409@subsection Declarations for @code{ltcalc}
1410
9edcd895
AD
1411The C and Bison declarations for the location tracking calculator are
1412the same as the declarations for the infix notation calculator.
342b8b6e
AD
1413
1414@example
1415/* Location tracking calculator. */
1416
1417%@{
1418#define YYSTYPE int
1419#include <math.h>
1420%@}
1421
1422/* Bison declarations. */
1423%token NUM
1424
1425%left '-' '+'
1426%left '*' '/'
1427%left NEG
1428%right '^'
1429
1430%% /* Grammar follows */
1431@end example
1432
9edcd895
AD
1433@noindent
1434Note there are no declarations specific to locations. Defining a data
1435type for storing locations is not needed: we will use the type provided
1436by default (@pxref{Location Type, ,Data Types of Locations}), which is a
1437four member structure with the following integer fields:
1438@code{first_line}, @code{first_column}, @code{last_line} and
1439@code{last_column}.
342b8b6e
AD
1440
1441@node Ltcalc Rules
1442@subsection Grammar Rules for @code{ltcalc}
1443
9edcd895
AD
1444Whether handling locations or not has no effect on the syntax of your
1445language. Therefore, grammar rules for this example will be very close
1446to those of the previous example: we will only modify them to benefit
1447from the new information.
342b8b6e 1448
9edcd895
AD
1449Here, we will use locations to report divisions by zero, and locate the
1450wrong expressions or subexpressions.
342b8b6e
AD
1451
1452@example
1453@group
1454input : /* empty */
1455 | input line
1456;
1457@end group
1458
1459@group
1460line : '\n'
1461 | exp '\n' @{ printf ("%d\n", $1); @}
1462;
1463@end group
1464
1465@group
1466exp : NUM @{ $$ = $1; @}
1467 | exp '+' exp @{ $$ = $1 + $3; @}
1468 | exp '-' exp @{ $$ = $1 - $3; @}
1469 | exp '*' exp @{ $$ = $1 * $3; @}
1470@end group
342b8b6e 1471@group
9edcd895 1472 | exp '/' exp
342b8b6e
AD
1473 @{
1474 if ($3)
1475 $$ = $1 / $3;
1476 else
1477 @{
1478 $$ = 1;
9edcd895
AD
1479 fprintf (stderr, "%d.%d-%d.%d: division by zero",
1480 @@3.first_line, @@3.first_column,
1481 @@3.last_line, @@3.last_column);
342b8b6e
AD
1482 @}
1483 @}
1484@end group
1485@group
1486 | '-' exp %preg NEG @{ $$ = -$2; @}
1487 | exp '^' exp @{ $$ = pow ($1, $3); @}
1488 | '(' exp ')' @{ $$ = $2; @}
1489@end group
1490@end example
1491
1492This code shows how to reach locations inside of semantic actions, by
1493using the pseudo-variables @code{@@@var{n}} for rule components, and the
1494pseudo-variable @code{@@$} for groupings.
1495
9edcd895
AD
1496We don't need to assign a value to @code{@@$}: the output parser does it
1497automatically. By default, before executing the C code of each action,
1498@code{@@$} is set to range from the beginning of @code{@@1} to the end
1499of @code{@@@var{n}}, for a rule with @var{n} components. This behavior
1500can be redefined (@pxref{Location Default Action, , Default Action for
1501Locations}), and for very specific rules, @code{@@$} can be computed by
1502hand.
342b8b6e
AD
1503
1504@node Ltcalc Lexer
1505@subsection The @code{ltcalc} Lexical Analyzer.
1506
9edcd895
AD
1507Until now, we relied on Bison's defaults to enable location
1508tracking. The next step is to rewrite the lexical analyser, and make it
1509able to feed the parser with the token locations, as it already does for
1510semantic values.
342b8b6e 1511
9edcd895
AD
1512To this end, we must take into account every single character of the
1513input text, to avoid the computed locations of being fuzzy or wrong:
342b8b6e
AD
1514
1515@example
1516@group
1517int
1518yylex (void)
1519@{
1520 int c;
1521
1522 /* skip white space */
1523 while ((c = getchar ()) == ' ' || c == '\t')
1524 ++yylloc.last_column;
1525
1526 /* step */
1527 yylloc.first_line = yylloc.last_line;
1528 yylloc.first_column = yylloc.last_column;
1529@end group
1530
1531@group
1532 /* process numbers */
1533 if (isdigit (c))
1534 @{
1535 yylval = c - '0';
1536 ++yylloc.last_column;
1537 while (isdigit (c = getchar ()))
1538 @{
1539 ++yylloc.last_column;
1540 yylval = yylval * 10 + c - '0';
1541 @}
1542 ungetc (c, stdin);
1543 return NUM;
1544 @}
1545@end group
1546
1547 /* return end-of-file */
1548 if (c == EOF)
1549 return 0;
1550
1551 /* return single chars and update location */
1552 if (c == '\n')
1553 @{
1554 ++yylloc.last_line;
1555 yylloc.last_column = 0;
1556 @}
1557 else
1558 ++yylloc.last_column;
1559 return c;
1560@}
1561@end example
1562
9edcd895
AD
1563Basically, the lexical analyzer performs the same processing as before:
1564it skips blanks and tabs, and reads numbers or single-character tokens.
1565In addition, it updates @code{yylloc}, the global variable (of type
1566@code{YYLTYPE}) containing the token's location.
342b8b6e 1567
9edcd895
AD
1568Now, each time this function returns a token, the parser has its number
1569as well as its semantic value, and its location in the text. The last
1570needed change is to initialize @code{yylloc}, for example in the
1571controlling function:
342b8b6e
AD
1572
1573@example
9edcd895 1574@group
342b8b6e
AD
1575int
1576main (void)
1577@{
1578 yylloc.first_line = yylloc.last_line = 1;
1579 yylloc.first_column = yylloc.last_column = 0;
1580 return yyparse ();
1581@}
9edcd895 1582@end group
342b8b6e
AD
1583@end example
1584
9edcd895
AD
1585Remember that computing locations is not a matter of syntax. Every
1586character must be associated to a location update, whether it is in
1587valid input, in comments, in literal strings, and so on.
342b8b6e
AD
1588
1589@node Multi-function Calc
bfa74976
RS
1590@section Multi-Function Calculator: @code{mfcalc}
1591@cindex multi-function calculator
1592@cindex @code{mfcalc}
1593@cindex calculator, multi-function
1594
1595Now that the basics of Bison have been discussed, it is time to move on to
1596a more advanced problem. The above calculators provided only five
1597functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would
1598be nice to have a calculator that provides other mathematical functions such
1599as @code{sin}, @code{cos}, etc.
1600
1601It is easy to add new operators to the infix calculator as long as they are
1602only single-character literals. The lexical analyzer @code{yylex} passes
9ecbd125 1603back all nonnumber characters as tokens, so new grammar rules suffice for
bfa74976
RS
1604adding a new operator. But we want something more flexible: built-in
1605functions whose syntax has this form:
1606
1607@example
1608@var{function_name} (@var{argument})
1609@end example
1610
1611@noindent
1612At the same time, we will add memory to the calculator, by allowing you
1613to create named variables, store values in them, and use them later.
1614Here is a sample session with the multi-function calculator:
1615
1616@example
9edcd895
AD
1617$ @kbd{mfcalc}
1618@kbd{pi = 3.141592653589}
bfa74976 16193.1415926536
9edcd895 1620@kbd{sin(pi)}
bfa74976 16210.0000000000
9edcd895 1622@kbd{alpha = beta1 = 2.3}
bfa74976 16232.3000000000
9edcd895 1624@kbd{alpha}
bfa74976 16252.3000000000
9edcd895 1626@kbd{ln(alpha)}
bfa74976 16270.8329091229
9edcd895 1628@kbd{exp(ln(beta1))}
bfa74976 16292.3000000000
9edcd895 1630$
bfa74976
RS
1631@end example
1632
1633Note that multiple assignment and nested function calls are permitted.
1634
1635@menu
1636* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
1637* Rules: Mfcalc Rules. Grammar rules for the calculator.
1638* Symtab: Mfcalc Symtab. Symbol table management subroutines.
1639@end menu
1640
342b8b6e 1641@node Mfcalc Decl
bfa74976
RS
1642@subsection Declarations for @code{mfcalc}
1643
1644Here are the C and Bison declarations for the multi-function calculator.
1645
1646@smallexample
1647%@{
1648#include <math.h> /* For math functions, cos(), sin(), etc. */
1649#include "calc.h" /* Contains definition of `symrec' */
1650%@}
1651%union @{
1652double val; /* For returning numbers. */
1653symrec *tptr; /* For returning symbol-table pointers */
1654@}
1655
1656%token <val> NUM /* Simple double precision number */
1657%token <tptr> VAR FNCT /* Variable and Function */
1658%type <val> exp
1659
1660%right '='
1661%left '-' '+'
1662%left '*' '/'
1663%left NEG /* Negation--unary minus */
1664%right '^' /* Exponentiation */
1665
1666/* Grammar follows */
1667
1668%%
1669@end smallexample
1670
1671The above grammar introduces only two new features of the Bison language.
1672These features allow semantic values to have various data types
1673(@pxref{Multiple Types, ,More Than One Value Type}).
1674
1675The @code{%union} declaration specifies the entire list of possible types;
1676this is instead of defining @code{YYSTYPE}. The allowable types are now
1677double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
1678the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
1679
1680Since values can now have various types, it is necessary to associate a
1681type with each grammar symbol whose semantic value is used. These symbols
1682are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
1683declarations are augmented with information about their data type (placed
1684between angle brackets).
1685
1686The Bison construct @code{%type} is used for declaring nonterminal symbols,
1687just as @code{%token} is used for declaring token types. We have not used
1688@code{%type} before because nonterminal symbols are normally declared
1689implicitly by the rules that define them. But @code{exp} must be declared
1690explicitly so we can specify its value type. @xref{Type Decl, ,Nonterminal Symbols}.
1691
342b8b6e 1692@node Mfcalc Rules
bfa74976
RS
1693@subsection Grammar Rules for @code{mfcalc}
1694
1695Here are the grammar rules for the multi-function calculator.
1696Most of them are copied directly from @code{calc}; three rules,
1697those which mention @code{VAR} or @code{FNCT}, are new.
1698
1699@smallexample
1700input: /* empty */
1701 | input line
1702;
1703
1704line:
1705 '\n'
1706 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1707 | error '\n' @{ yyerrok; @}
1708;
1709
1710exp: NUM @{ $$ = $1; @}
1711 | VAR @{ $$ = $1->value.var; @}
1712 | VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
1713 | FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
1714 | exp '+' exp @{ $$ = $1 + $3; @}
1715 | exp '-' exp @{ $$ = $1 - $3; @}
1716 | exp '*' exp @{ $$ = $1 * $3; @}
1717 | exp '/' exp @{ $$ = $1 / $3; @}
1718 | '-' exp %prec NEG @{ $$ = -$2; @}
1719 | exp '^' exp @{ $$ = pow ($1, $3); @}
1720 | '(' exp ')' @{ $$ = $2; @}
1721;
1722/* End of grammar */
1723%%
1724@end smallexample
1725
342b8b6e 1726@node Mfcalc Symtab
bfa74976
RS
1727@subsection The @code{mfcalc} Symbol Table
1728@cindex symbol table example
1729
1730The multi-function calculator requires a symbol table to keep track of the
1731names and meanings of variables and functions. This doesn't affect the
1732grammar rules (except for the actions) or the Bison declarations, but it
1733requires some additional C functions for support.
1734
1735The symbol table itself consists of a linked list of records. Its
1736definition, which is kept in the header @file{calc.h}, is as follows. It
1737provides for either functions or variables to be placed in the table.
1738
1739@smallexample
1740@group
32dfccf8
AD
1741/* Fonctions type. */
1742typedef double (*func_t) (double);
1743
bfa74976
RS
1744/* Data type for links in the chain of symbols. */
1745struct symrec
1746@{
1747 char *name; /* name of symbol */
1748 int type; /* type of symbol: either VAR or FNCT */
32dfccf8
AD
1749 union
1750 @{
1751 double var; /* value of a VAR */
1752 func_t fnctptr; /* value of a FNCT */
bfa74976
RS
1753 @} value;
1754 struct symrec *next; /* link field */
1755@};
1756@end group
1757
1758@group
1759typedef struct symrec symrec;
1760
1761/* The symbol table: a chain of `struct symrec'. */
1762extern symrec *sym_table;
1763
32dfccf8
AD
1764symrec *putsym (const char *, func_t);
1765symrec *getsym (const char *);
bfa74976
RS
1766@end group
1767@end smallexample
1768
1769The new version of @code{main} includes a call to @code{init_table}, a
1770function that initializes the symbol table. Here it is, and
1771@code{init_table} as well:
1772
1773@smallexample
1774@group
1775#include <stdio.h>
1776
13863333
AD
1777int
1778main (void)
bfa74976
RS
1779@{
1780 init_table ();
13863333 1781 return yyparse ();
bfa74976
RS
1782@}
1783@end group
1784
1785@group
13863333
AD
1786void
1787yyerror (const char *s) /* Called by yyparse on error */
bfa74976
RS
1788@{
1789 printf ("%s\n", s);
1790@}
1791
1792struct init
1793@{
1794 char *fname;
32dfccf8 1795 double (*fnct)(double);
bfa74976
RS
1796@};
1797@end group
1798
1799@group
13863333
AD
1800struct init arith_fncts[] =
1801@{
32dfccf8
AD
1802 "sin", sin,
1803 "cos", cos,
13863333 1804 "atan", atan,
32dfccf8
AD
1805 "ln", log,
1806 "exp", exp,
13863333
AD
1807 "sqrt", sqrt,
1808 0, 0
1809@};
bfa74976
RS
1810
1811/* The symbol table: a chain of `struct symrec'. */
32dfccf8 1812symrec *sym_table = (symrec *) 0;
bfa74976
RS
1813@end group
1814
1815@group
13863333
AD
1816/* Put arithmetic functions in table. */
1817void
1818init_table (void)
bfa74976
RS
1819@{
1820 int i;
1821 symrec *ptr;
1822 for (i = 0; arith_fncts[i].fname != 0; i++)
1823 @{
1824 ptr = putsym (arith_fncts[i].fname, FNCT);
1825 ptr->value.fnctptr = arith_fncts[i].fnct;
1826 @}
1827@}
1828@end group
1829@end smallexample
1830
1831By simply editing the initialization list and adding the necessary include
1832files, you can add additional functions to the calculator.
1833
1834Two important functions allow look-up and installation of symbols in the
1835symbol table. The function @code{putsym} is passed a name and the type
1836(@code{VAR} or @code{FNCT}) of the object to be installed. The object is
1837linked to the front of the list, and a pointer to the object is returned.
1838The function @code{getsym} is passed the name of the symbol to look up. If
1839found, a pointer to that symbol is returned; otherwise zero is returned.
1840
1841@smallexample
1842symrec *
13863333 1843putsym (char *sym_name, int sym_type)
bfa74976
RS
1844@{
1845 symrec *ptr;
1846 ptr = (symrec *) malloc (sizeof (symrec));
1847 ptr->name = (char *) malloc (strlen (sym_name) + 1);
1848 strcpy (ptr->name,sym_name);
1849 ptr->type = sym_type;
1850 ptr->value.var = 0; /* set value to 0 even if fctn. */
1851 ptr->next = (struct symrec *)sym_table;
1852 sym_table = ptr;
1853 return ptr;
1854@}
1855
1856symrec *
13863333 1857getsym (const char *sym_name)
bfa74976
RS
1858@{
1859 symrec *ptr;
1860 for (ptr = sym_table; ptr != (symrec *) 0;
1861 ptr = (symrec *)ptr->next)
1862 if (strcmp (ptr->name,sym_name) == 0)
1863 return ptr;
1864 return 0;
1865@}
1866@end smallexample
1867
1868The function @code{yylex} must now recognize variables, numeric values, and
1869the single-character arithmetic operators. Strings of alphanumeric
14ded682 1870characters with a leading non-digit are recognized as either variables or
bfa74976
RS
1871functions depending on what the symbol table says about them.
1872
1873The string is passed to @code{getsym} for look up in the symbol table. If
1874the name appears in the table, a pointer to its location and its type
1875(@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not
1876already in the table, then it is installed as a @code{VAR} using
1877@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is
1878returned to @code{yyparse}.@refill
1879
1880No change is needed in the handling of numeric values and arithmetic
1881operators in @code{yylex}.
1882
1883@smallexample
1884@group
1885#include <ctype.h>
13863333
AD
1886
1887int
1888yylex (void)
bfa74976
RS
1889@{
1890 int c;
1891
1892 /* Ignore whitespace, get first nonwhite character. */
1893 while ((c = getchar ()) == ' ' || c == '\t');
1894
1895 if (c == EOF)
1896 return 0;
1897@end group
1898
1899@group
1900 /* Char starts a number => parse the number. */
1901 if (c == '.' || isdigit (c))
1902 @{
1903 ungetc (c, stdin);
1904 scanf ("%lf", &yylval.val);
1905 return NUM;
1906 @}
1907@end group
1908
1909@group
1910 /* Char starts an identifier => read the name. */
1911 if (isalpha (c))
1912 @{
1913 symrec *s;
1914 static char *symbuf = 0;
1915 static int length = 0;
1916 int i;
1917@end group
1918
1919@group
1920 /* Initially make the buffer long enough
1921 for a 40-character symbol name. */
1922 if (length == 0)
1923 length = 40, symbuf = (char *)malloc (length + 1);
1924
1925 i = 0;
1926 do
1927@end group
1928@group
1929 @{
1930 /* If buffer is full, make it bigger. */
1931 if (i == length)
1932 @{
1933 length *= 2;
1934 symbuf = (char *)realloc (symbuf, length + 1);
1935 @}
1936 /* Add this character to the buffer. */
1937 symbuf[i++] = c;
1938 /* Get another character. */
1939 c = getchar ();
1940 @}
1941@end group
1942@group
1943 while (c != EOF && isalnum (c));
1944
1945 ungetc (c, stdin);
1946 symbuf[i] = '\0';
1947@end group
1948
1949@group
1950 s = getsym (symbuf);
1951 if (s == 0)
1952 s = putsym (symbuf, VAR);
1953 yylval.tptr = s;
1954 return s->type;
1955 @}
1956
1957 /* Any other character is a token by itself. */
1958 return c;
1959@}
1960@end group
1961@end smallexample
1962
1963This program is both powerful and flexible. You may easily add new
1964functions, and it is a simple job to modify this code to install predefined
1965variables such as @code{pi} or @code{e} as well.
1966
342b8b6e 1967@node Exercises
bfa74976
RS
1968@section Exercises
1969@cindex exercises
1970
1971@enumerate
1972@item
1973Add some new functions from @file{math.h} to the initialization list.
1974
1975@item
1976Add another array that contains constants and their values. Then
1977modify @code{init_table} to add these constants to the symbol table.
1978It will be easiest to give the constants type @code{VAR}.
1979
1980@item
1981Make the program report an error if the user refers to an
1982uninitialized variable in any way except to store a value in it.
1983@end enumerate
1984
342b8b6e 1985@node Grammar File
bfa74976
RS
1986@chapter Bison Grammar Files
1987
1988Bison takes as input a context-free grammar specification and produces a
1989C-language function that recognizes correct instances of the grammar.
1990
1991The Bison grammar input file conventionally has a name ending in @samp{.y}.
234a3be3 1992@xref{Invocation, ,Invoking Bison}.
bfa74976
RS
1993
1994@menu
1995* Grammar Outline:: Overall layout of the grammar file.
1996* Symbols:: Terminal and nonterminal symbols.
1997* Rules:: How to write grammar rules.
1998* Recursion:: Writing recursive rules.
1999* Semantics:: Semantic values and actions.
847bf1f5 2000* Locations:: Locations and actions.
bfa74976
RS
2001* Declarations:: All kinds of Bison declarations are described here.
2002* Multiple Parsers:: Putting more than one Bison parser in one program.
2003@end menu
2004
342b8b6e 2005@node Grammar Outline
bfa74976
RS
2006@section Outline of a Bison Grammar
2007
2008A Bison grammar file has four main sections, shown here with the
2009appropriate delimiters:
2010
2011@example
2012%@{
75f5aaea 2013@var{Prologue}
bfa74976
RS
2014%@}
2015
2016@var{Bison declarations}
2017
2018%%
2019@var{Grammar rules}
2020%%
2021
75f5aaea 2022@var{Epilogue}
bfa74976
RS
2023@end example
2024
2025Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
2026
2027@menu
75f5aaea 2028* Prologue:: Syntax and usage of the prologue.
bfa74976
RS
2029* Bison Declarations:: Syntax and usage of the Bison declarations section.
2030* Grammar Rules:: Syntax and usage of the grammar rules section.
75f5aaea 2031* Epilogue:: Syntax and usage of the epilogue.
bfa74976
RS
2032@end menu
2033
75f5aaea
MA
2034@node Prologue, Bison Declarations, , Grammar Outline
2035@subsection The prologue
2036@cindex declarations section
2037@cindex Prologue
2038@cindex declarations
bfa74976 2039
75f5aaea 2040The @var{prologue} section contains macro definitions and
bfa74976
RS
2041declarations of functions and variables that are used in the actions in the
2042grammar rules. These are copied to the beginning of the parser file so
2043that they precede the definition of @code{yyparse}. You can use
2044@samp{#include} to get the declarations from a header file. If you don't
2045need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
2046delimiters that bracket this section.
2047
342b8b6e 2048@node Bison Declarations
bfa74976
RS
2049@subsection The Bison Declarations Section
2050@cindex Bison declarations (introduction)
2051@cindex declarations, Bison (introduction)
2052
2053The @var{Bison declarations} section contains declarations that define
2054terminal and nonterminal symbols, specify precedence, and so on.
2055In some simple grammars you may not need any declarations.
2056@xref{Declarations, ,Bison Declarations}.
2057
342b8b6e 2058@node Grammar Rules
bfa74976
RS
2059@subsection The Grammar Rules Section
2060@cindex grammar rules section
2061@cindex rules section for grammar
2062
2063The @dfn{grammar rules} section contains one or more Bison grammar
2064rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.
2065
2066There must always be at least one grammar rule, and the first
2067@samp{%%} (which precedes the grammar rules) may never be omitted even
2068if it is the first thing in the file.
2069
75f5aaea
MA
2070@node Epilogue, , Grammar Rules, Grammar Outline
2071@subsection The epilogue
bfa74976 2072@cindex additional C code section
75f5aaea 2073@cindex epilogue
bfa74976
RS
2074@cindex C code, section for additional
2075
342b8b6e
AD
2076The @var{epilogue} is copied verbatim to the end of the parser file, just as
2077the @var{prologue} is copied to the beginning. This is the most convenient
2078place to put anything that you want to have in the parser file but which need
2079not come before the definition of @code{yyparse}. For example, the
2080definitions of @code{yylex} and @code{yyerror} often go here.
75f5aaea 2081@xref{Interface, ,Parser C-Language Interface}.
bfa74976
RS
2082
2083If the last section is empty, you may omit the @samp{%%} that separates it
2084from the grammar rules.
2085
2086The Bison parser itself contains many static variables whose names start
2087with @samp{yy} and many macros whose names start with @samp{YY}. It is a
2088good idea to avoid using any such names (except those documented in this
75f5aaea 2089manual) in the epilogue of the grammar file.
bfa74976 2090
342b8b6e 2091@node Symbols
bfa74976
RS
2092@section Symbols, Terminal and Nonterminal
2093@cindex nonterminal symbol
2094@cindex terminal symbol
2095@cindex token type
2096@cindex symbol
2097
2098@dfn{Symbols} in Bison grammars represent the grammatical classifications
2099of the language.
2100
2101A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
2102class of syntactically equivalent tokens. You use the symbol in grammar
2103rules to mean that a token in that class is allowed. The symbol is
2104represented in the Bison parser by a numeric code, and the @code{yylex}
2105function returns a token type code to indicate what kind of token has been
2106read. You don't need to know what the code value is; you can use the
2107symbol to stand for it.
2108
2109A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
2110groupings. The symbol name is used in writing grammar rules. By convention,
2111it should be all lower case.
2112
2113Symbol names can contain letters, digits (not at the beginning),
2114underscores and periods. Periods make sense only in nonterminals.
2115
931c7513 2116There are three ways of writing terminal symbols in the grammar:
bfa74976
RS
2117
2118@itemize @bullet
2119@item
2120A @dfn{named token type} is written with an identifier, like an
2121identifier in C. By convention, it should be all upper case. Each
2122such name must be defined with a Bison declaration such as
2123@code{%token}. @xref{Token Decl, ,Token Type Names}.
2124
2125@item
2126@cindex character token
2127@cindex literal token
2128@cindex single-character literal
931c7513
RS
2129A @dfn{character token type} (or @dfn{literal character token}) is
2130written in the grammar using the same syntax used in C for character
2131constants; for example, @code{'+'} is a character token type. A
2132character token type doesn't need to be declared unless you need to
2133specify its semantic value data type (@pxref{Value Type, ,Data Types of
2134Semantic Values}), associativity, or precedence (@pxref{Precedence,
2135,Operator Precedence}).
bfa74976
RS
2136
2137By convention, a character token type is used only to represent a
2138token that consists of that particular character. Thus, the token
2139type @code{'+'} is used to represent the character @samp{+} as a
2140token. Nothing enforces this convention, but if you depart from it,
2141your program will confuse other readers.
2142
2143All the usual escape sequences used in character literals in C can be
2144used in Bison as well, but you must not use the null character as a
931c7513
RS
2145character literal because its ASCII code, zero, is the code @code{yylex}
2146returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
2147for @code{yylex}}).
2148
2149@item
2150@cindex string token
2151@cindex literal string token
9ecbd125 2152@cindex multicharacter literal
931c7513
RS
2153A @dfn{literal string token} is written like a C string constant; for
2154example, @code{"<="} is a literal string token. A literal string token
2155doesn't need to be declared unless you need to specify its semantic
14ded682 2156value data type (@pxref{Value Type}), associativity, or precedence
931c7513
RS
2157(@pxref{Precedence}).
2158
2159You can associate the literal string token with a symbolic name as an
2160alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
2161Declarations}). If you don't do that, the lexical analyzer has to
2162retrieve the token number for the literal string token from the
2163@code{yytname} table (@pxref{Calling Convention}).
2164
2165@strong{WARNING}: literal string tokens do not work in Yacc.
2166
2167By convention, a literal string token is used only to represent a token
2168that consists of that particular string. Thus, you should use the token
2169type @code{"<="} to represent the string @samp{<=} as a token. Bison
9ecbd125 2170does not enforce this convention, but if you depart from it, people who
931c7513
RS
2171read your program will be confused.
2172
2173All the escape sequences used in string literals in C can be used in
2174Bison as well. A literal string token must contain two or more
2175characters; for a token containing just one character, use a character
2176token (see above).
bfa74976
RS
2177@end itemize
2178
2179How you choose to write a terminal symbol has no effect on its
2180grammatical meaning. That depends only on where it appears in rules and
2181on when the parser function returns that symbol.
2182
2183The value returned by @code{yylex} is always one of the terminal symbols
2184(or 0 for end-of-input). Whichever way you write the token type in the
2185grammar rules, you write it the same way in the definition of @code{yylex}.
2186The numeric code for a character token type is simply the ASCII code for
2187the character, so @code{yylex} can use the identical character constant to
2188generate the requisite code. Each named token type becomes a C macro in
2189the parser file, so @code{yylex} can use the name to stand for the code.
13863333 2190(This is why periods don't make sense in terminal symbols.)
bfa74976
RS
2191@xref{Calling Convention, ,Calling Convention for @code{yylex}}.
2192
2193If @code{yylex} is defined in a separate file, you need to arrange for the
2194token-type macro definitions to be available there. Use the @samp{-d}
2195option when you run Bison, so that it will write these macro definitions
2196into a separate header file @file{@var{name}.tab.h} which you can include
2197in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
2198
2199The symbol @code{error} is a terminal symbol reserved for error recovery
2200(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
2201In particular, @code{yylex} should never return this value.
2202
342b8b6e 2203@node Rules
bfa74976
RS
2204@section Syntax of Grammar Rules
2205@cindex rule syntax
2206@cindex grammar rule syntax
2207@cindex syntax of grammar rules
2208
2209A Bison grammar rule has the following general form:
2210
2211@example
e425e872 2212@group
bfa74976
RS
2213@var{result}: @var{components}@dots{}
2214 ;
e425e872 2215@end group
bfa74976
RS
2216@end example
2217
2218@noindent
9ecbd125 2219where @var{result} is the nonterminal symbol that this rule describes,
bfa74976 2220and @var{components} are various terminal and nonterminal symbols that
13863333 2221are put together by this rule (@pxref{Symbols}).
bfa74976
RS
2222
2223For example,
2224
2225@example
2226@group
2227exp: exp '+' exp
2228 ;
2229@end group
2230@end example
2231
2232@noindent
2233says that two groupings of type @code{exp}, with a @samp{+} token in between,
2234can be combined into a larger grouping of type @code{exp}.
2235
2236Whitespace in rules is significant only to separate symbols. You can add
2237extra whitespace as you wish.
2238
2239Scattered among the components can be @var{actions} that determine
2240the semantics of the rule. An action looks like this:
2241
2242@example
2243@{@var{C statements}@}
2244@end example
2245
2246@noindent
2247Usually there is only one action and it follows the components.
2248@xref{Actions}.
2249
2250@findex |
2251Multiple rules for the same @var{result} can be written separately or can
2252be joined with the vertical-bar character @samp{|} as follows:
2253
2254@ifinfo
2255@example
2256@var{result}: @var{rule1-components}@dots{}
2257 | @var{rule2-components}@dots{}
2258 @dots{}
2259 ;
2260@end example
2261@end ifinfo
2262@iftex
2263@example
2264@group
2265@var{result}: @var{rule1-components}@dots{}
2266 | @var{rule2-components}@dots{}
2267 @dots{}
2268 ;
2269@end group
2270@end example
2271@end iftex
2272
2273@noindent
2274They are still considered distinct rules even when joined in this way.
2275
2276If @var{components} in a rule is empty, it means that @var{result} can
2277match the empty string. For example, here is how to define a
2278comma-separated sequence of zero or more @code{exp} groupings:
2279
2280@example
2281@group
2282expseq: /* empty */
2283 | expseq1
2284 ;
2285@end group
2286
2287@group
2288expseq1: exp
2289 | expseq1 ',' exp
2290 ;
2291@end group
2292@end example
2293
2294@noindent
2295It is customary to write a comment @samp{/* empty */} in each rule
2296with no components.
2297
342b8b6e 2298@node Recursion
bfa74976
RS
2299@section Recursive Rules
2300@cindex recursive rule
2301
2302A rule is called @dfn{recursive} when its @var{result} nonterminal appears
2303also on its right hand side. Nearly all Bison grammars need to use
2304recursion, because that is the only way to define a sequence of any number
9ecbd125
JT
2305of a particular thing. Consider this recursive definition of a
2306comma-separated sequence of one or more expressions:
bfa74976
RS
2307
2308@example
2309@group
2310expseq1: exp
2311 | expseq1 ',' exp
2312 ;
2313@end group
2314@end example
2315
2316@cindex left recursion
2317@cindex right recursion
2318@noindent
2319Since the recursive use of @code{expseq1} is the leftmost symbol in the
2320right hand side, we call this @dfn{left recursion}. By contrast, here
2321the same construct is defined using @dfn{right recursion}:
2322
2323@example
2324@group
2325expseq1: exp
2326 | exp ',' expseq1
2327 ;
2328@end group
2329@end example
2330
2331@noindent
2332Any kind of sequence can be defined using either left recursion or
2333right recursion, but you should always use left recursion, because it
2334can parse a sequence of any number of elements with bounded stack
2335space. Right recursion uses up space on the Bison stack in proportion
2336to the number of elements in the sequence, because all the elements
2337must be shifted onto the stack before the rule can be applied even
2338once. @xref{Algorithm, ,The Bison Parser Algorithm }, for
2339further explanation of this.
2340
2341@cindex mutual recursion
2342@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
2343rule does not appear directly on its right hand side, but does appear
2344in rules for other nonterminals which do appear on its right hand
13863333 2345side.
bfa74976
RS
2346
2347For example:
2348
2349@example
2350@group
2351expr: primary
2352 | primary '+' primary
2353 ;
2354@end group
2355
2356@group
2357primary: constant
2358 | '(' expr ')'
2359 ;
2360@end group
2361@end example
2362
2363@noindent
2364defines two mutually-recursive nonterminals, since each refers to the
2365other.
2366
342b8b6e 2367@node Semantics
bfa74976
RS
2368@section Defining Language Semantics
2369@cindex defining language semantics
13863333 2370@cindex language semantics, defining
bfa74976
RS
2371
2372The grammar rules for a language determine only the syntax. The semantics
2373are determined by the semantic values associated with various tokens and
2374groupings, and by the actions taken when various groupings are recognized.
2375
2376For example, the calculator calculates properly because the value
2377associated with each expression is the proper number; it adds properly
2378because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
2379the numbers associated with @var{x} and @var{y}.
2380
2381@menu
2382* Value Type:: Specifying one data type for all semantic values.
2383* Multiple Types:: Specifying several alternative data types.
2384* Actions:: An action is the semantic definition of a grammar rule.
2385* Action Types:: Specifying data types for actions to operate on.
2386* Mid-Rule Actions:: Most actions go at the end of a rule.
2387 This says when, why and how to use the exceptional
2388 action in the middle of a rule.
2389@end menu
2390
342b8b6e 2391@node Value Type
bfa74976
RS
2392@subsection Data Types of Semantic Values
2393@cindex semantic value type
2394@cindex value type, semantic
2395@cindex data types of semantic values
2396@cindex default data type
2397
2398In a simple program it may be sufficient to use the same data type for
2399the semantic values of all language constructs. This was true in the
1964ad8c
AD
2400RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish
2401Notation Calculator}).
bfa74976
RS
2402
2403Bison's default is to use type @code{int} for all semantic values. To
2404specify some other type, define @code{YYSTYPE} as a macro, like this:
2405
2406@example
2407#define YYSTYPE double
2408@end example
2409
2410@noindent
342b8b6e 2411This macro definition must go in the prologue of the grammar file
75f5aaea 2412(@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
bfa74976 2413
342b8b6e 2414@node Multiple Types
bfa74976
RS
2415@subsection More Than One Value Type
2416
2417In most programs, you will need different data types for different kinds
2418of tokens and groupings. For example, a numeric constant may need type
2419@code{int} or @code{long}, while a string constant needs type @code{char *},
2420and an identifier might need a pointer to an entry in the symbol table.
2421
2422To use more than one data type for semantic values in one parser, Bison
2423requires you to do two things:
2424
2425@itemize @bullet
2426@item
2427Specify the entire collection of possible data types, with the
2428@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}).
2429
2430@item
14ded682
AD
2431Choose one of those types for each symbol (terminal or nonterminal) for
2432which semantic values are used. This is done for tokens with the
2433@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names})
2434and for groupings with the @code{%type} Bison declaration (@pxref{Type
2435Decl, ,Nonterminal Symbols}).
bfa74976
RS
2436@end itemize
2437
342b8b6e 2438@node Actions
bfa74976
RS
2439@subsection Actions
2440@cindex action
2441@vindex $$
2442@vindex $@var{n}
2443
2444An action accompanies a syntactic rule and contains C code to be executed
2445each time an instance of that rule is recognized. The task of most actions
2446is to compute a semantic value for the grouping built by the rule from the
2447semantic values associated with tokens or smaller groupings.
2448
2449An action consists of C statements surrounded by braces, much like a
2450compound statement in C. It can be placed at any position in the rule; it
2451is executed at that position. Most rules have just one action at the end
2452of the rule, following all the components. Actions in the middle of a rule
2453are tricky and used only for special purposes (@pxref{Mid-Rule Actions, ,Actions in Mid-Rule}).
2454
2455The C code in an action can refer to the semantic values of the components
2456matched by the rule with the construct @code{$@var{n}}, which stands for
2457the value of the @var{n}th component. The semantic value for the grouping
2458being constructed is @code{$$}. (Bison translates both of these constructs
2459into array element references when it copies the actions into the parser
2460file.)
2461
2462Here is a typical example:
2463
2464@example
2465@group
2466exp: @dots{}
2467 | exp '+' exp
2468 @{ $$ = $1 + $3; @}
2469@end group
2470@end example
2471
2472@noindent
2473This rule constructs an @code{exp} from two smaller @code{exp} groupings
2474connected by a plus-sign token. In the action, @code{$1} and @code{$3}
2475refer to the semantic values of the two component @code{exp} groupings,
2476which are the first and third symbols on the right hand side of the rule.
2477The sum is stored into @code{$$} so that it becomes the semantic value of
2478the addition-expression just recognized by the rule. If there were a
2479useful semantic value associated with the @samp{+} token, it could be
2480referred to as @code{$2}.@refill
2481
2482@cindex default action
2483If you don't specify an action for a rule, Bison supplies a default:
2484@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule becomes
2485the value of the whole rule. Of course, the default rule is valid only
2486if the two data types match. There is no meaningful default action for
2487an empty rule; every empty rule must have an explicit action unless the
2488rule's value does not matter.
2489
2490@code{$@var{n}} with @var{n} zero or negative is allowed for reference
2491to tokens and groupings on the stack @emph{before} those that match the
2492current rule. This is a very risky practice, and to use it reliably
2493you must be certain of the context in which the rule is applied. Here
2494is a case in which you can use this reliably:
2495
2496@example
2497@group
2498foo: expr bar '+' expr @{ @dots{} @}
2499 | expr bar '-' expr @{ @dots{} @}
2500 ;
2501@end group
2502
2503@group
2504bar: /* empty */
2505 @{ previous_expr = $0; @}
2506 ;
2507@end group
2508@end example
2509
2510As long as @code{bar} is used only in the fashion shown here, @code{$0}
2511always refers to the @code{expr} which precedes @code{bar} in the
2512definition of @code{foo}.
2513
342b8b6e 2514@node Action Types
bfa74976
RS
2515@subsection Data Types of Values in Actions
2516@cindex action data types
2517@cindex data types in actions
2518
2519If you have chosen a single data type for semantic values, the @code{$$}
2520and @code{$@var{n}} constructs always have that data type.
2521
2522If you have used @code{%union} to specify a variety of data types, then you
2523must declare a choice among these types for each terminal or nonterminal
2524symbol that can have a semantic value. Then each time you use @code{$$} or
2525@code{$@var{n}}, its data type is determined by which symbol it refers to
2526in the rule. In this example,@refill
2527
2528@example
2529@group
2530exp: @dots{}
2531 | exp '+' exp
2532 @{ $$ = $1 + $3; @}
2533@end group
2534@end example
2535
2536@noindent
2537@code{$1} and @code{$3} refer to instances of @code{exp}, so they all
2538have the data type declared for the nonterminal symbol @code{exp}. If
2539@code{$2} were used, it would have the data type declared for the
2540terminal symbol @code{'+'}, whatever that might be.@refill
2541
2542Alternatively, you can specify the data type when you refer to the value,
2543by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
2544reference. For example, if you have defined types as shown here:
2545
2546@example
2547@group
2548%union @{
2549 int itype;
2550 double dtype;
2551@}
2552@end group
2553@end example
2554
2555@noindent
2556then you can write @code{$<itype>1} to refer to the first subunit of the
2557rule as an integer, or @code{$<dtype>1} to refer to it as a double.
2558
342b8b6e 2559@node Mid-Rule Actions
bfa74976
RS
2560@subsection Actions in Mid-Rule
2561@cindex actions in mid-rule
2562@cindex mid-rule actions
2563
2564Occasionally it is useful to put an action in the middle of a rule.
2565These actions are written just like usual end-of-rule actions, but they
2566are executed before the parser even recognizes the following components.
2567
2568A mid-rule action may refer to the components preceding it using
2569@code{$@var{n}}, but it may not refer to subsequent components because
2570it is run before they are parsed.
2571
2572The mid-rule action itself counts as one of the components of the rule.
2573This makes a difference when there is another action later in the same rule
2574(and usually there is another at the end): you have to count the actions
2575along with the symbols when working out which number @var{n} to use in
2576@code{$@var{n}}.
2577
2578The mid-rule action can also have a semantic value. The action can set
2579its value with an assignment to @code{$$}, and actions later in the rule
2580can refer to the value using @code{$@var{n}}. Since there is no symbol
2581to name the action, there is no way to declare a data type for the value
fdc6758b
MA
2582in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to
2583specify a data type each time you refer to this value.
bfa74976
RS
2584
2585There is no way to set the value of the entire rule with a mid-rule
2586action, because assignments to @code{$$} do not have that effect. The
2587only way to set the value for the entire rule is with an ordinary action
2588at the end of the rule.
2589
2590Here is an example from a hypothetical compiler, handling a @code{let}
2591statement that looks like @samp{let (@var{variable}) @var{statement}} and
2592serves to create a variable named @var{variable} temporarily for the
2593duration of @var{statement}. To parse this construct, we must put
2594@var{variable} into the symbol table while @var{statement} is parsed, then
2595remove it afterward. Here is how it is done:
2596
2597@example
2598@group
2599stmt: LET '(' var ')'
2600 @{ $<context>$ = push_context ();
2601 declare_variable ($3); @}
2602 stmt @{ $$ = $6;
2603 pop_context ($<context>5); @}
2604@end group
2605@end example
2606
2607@noindent
2608As soon as @samp{let (@var{variable})} has been recognized, the first
2609action is run. It saves a copy of the current semantic context (the
2610list of accessible variables) as its semantic value, using alternative
2611@code{context} in the data-type union. Then it calls
2612@code{declare_variable} to add the new variable to that list. Once the
2613first action is finished, the embedded statement @code{stmt} can be
2614parsed. Note that the mid-rule action is component number 5, so the
2615@samp{stmt} is component number 6.
2616
2617After the embedded statement is parsed, its semantic value becomes the
2618value of the entire @code{let}-statement. Then the semantic value from the
2619earlier action is used to restore the prior list of variables. This
2620removes the temporary @code{let}-variable from the list so that it won't
2621appear to exist while the rest of the program is parsed.
2622
2623Taking action before a rule is completely recognized often leads to
2624conflicts since the parser must commit to a parse in order to execute the
2625action. For example, the following two rules, without mid-rule actions,
2626can coexist in a working parser because the parser can shift the open-brace
2627token and look at what follows before deciding whether there is a
2628declaration or not:
2629
2630@example
2631@group
2632compound: '@{' declarations statements '@}'
2633 | '@{' statements '@}'
2634 ;
2635@end group
2636@end example
2637
2638@noindent
2639But when we add a mid-rule action as follows, the rules become nonfunctional:
2640
2641@example
2642@group
2643compound: @{ prepare_for_local_variables (); @}
2644 '@{' declarations statements '@}'
2645@end group
2646@group
2647 | '@{' statements '@}'
2648 ;
2649@end group
2650@end example
2651
2652@noindent
2653Now the parser is forced to decide whether to run the mid-rule action
2654when it has read no farther than the open-brace. In other words, it
2655must commit to using one rule or the other, without sufficient
2656information to do it correctly. (The open-brace token is what is called
2657the @dfn{look-ahead} token at this time, since the parser is still
2658deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.)
2659
2660You might think that you could correct the problem by putting identical
2661actions into the two rules, like this:
2662
2663@example
2664@group
2665compound: @{ prepare_for_local_variables (); @}
2666 '@{' declarations statements '@}'
2667 | @{ prepare_for_local_variables (); @}
2668 '@{' statements '@}'
2669 ;
2670@end group
2671@end example
2672
2673@noindent
2674But this does not help, because Bison does not realize that the two actions
2675are identical. (Bison never tries to understand the C code in an action.)
2676
2677If the grammar is such that a declaration can be distinguished from a
2678statement by the first token (which is true in C), then one solution which
2679does work is to put the action after the open-brace, like this:
2680
2681@example
2682@group
2683compound: '@{' @{ prepare_for_local_variables (); @}
2684 declarations statements '@}'
2685 | '@{' statements '@}'
2686 ;
2687@end group
2688@end example
2689
2690@noindent
2691Now the first token of the following declaration or statement,
2692which would in any case tell Bison which rule to use, can still do so.
2693
2694Another solution is to bury the action inside a nonterminal symbol which
2695serves as a subroutine:
2696
2697@example
2698@group
2699subroutine: /* empty */
2700 @{ prepare_for_local_variables (); @}
2701 ;
2702
2703@end group
2704
2705@group
2706compound: subroutine
2707 '@{' declarations statements '@}'
2708 | subroutine
2709 '@{' statements '@}'
2710 ;
2711@end group
2712@end example
2713
2714@noindent
2715Now Bison can execute the action in the rule for @code{subroutine} without
2716deciding which rule for @code{compound} it will eventually use. Note that
2717the action is now at the end of its rule. Any mid-rule action can be
2718converted to an end-of-rule action in this way, and this is what Bison
2719actually does to implement mid-rule actions.
2720
342b8b6e 2721@node Locations
847bf1f5
AD
2722@section Tracking Locations
2723@cindex location
2724@cindex textual position
2725@cindex position, textual
2726
2727Though grammar rules and semantic actions are enough to write a fully
2728functional parser, it can be useful to process some additionnal informations,
3e259915
MA
2729especially symbol locations.
2730
2731@c (terminal or not) ?
847bf1f5
AD
2732
2733The way locations are handled is defined by providing a data type, and actions
2734to take when rules are matched.
2735
2736@menu
2737* Location Type:: Specifying a data type for locations.
2738* Actions and Locations:: Using locations in actions.
2739* Location Default Action:: Defining a general way to compute locations.
2740@end menu
2741
342b8b6e 2742@node Location Type
847bf1f5
AD
2743@subsection Data Type of Locations
2744@cindex data type of locations
2745@cindex default location type
2746
2747Defining a data type for locations is much simpler than for semantic values,
2748since all tokens and groupings always use the same type.
2749
2750The type of locations is specified by defining a macro called @code{YYLTYPE}.
2751When @code{YYLTYPE} is not defined, Bison uses a default structure type with
2752four members:
2753
2754@example
2755struct
2756@{
2757 int first_line;
2758 int first_column;
2759 int last_line;
2760 int last_column;
2761@}
2762@end example
2763
342b8b6e 2764@node Actions and Locations
847bf1f5
AD
2765@subsection Actions and Locations
2766@cindex location actions
2767@cindex actions, location
2768@vindex @@$
2769@vindex @@@var{n}
2770
2771Actions are not only useful for defining language semantics, but also for
2772describing the behavior of the output parser with locations.
2773
2774The most obvious way for building locations of syntactic groupings is very
2775similar to the way semantic values are computed. In a given rule, several
2776constructs can be used to access the locations of the elements being matched.
2777The location of the @var{n}th component of the right hand side is
2778@code{@@@var{n}}, while the location of the left hand side grouping is
2779@code{@@$}.
2780
3e259915 2781Here is a basic example using the default data type for locations:
847bf1f5
AD
2782
2783@example
2784@group
2785exp: @dots{}
3e259915 2786 | exp '/' exp
847bf1f5 2787 @{
3e259915
MA
2788 @@$.first_column = @@1.first_column;
2789 @@$.first_line = @@1.first_line;
847bf1f5
AD
2790 @@$.last_column = @@3.last_column;
2791 @@$.last_line = @@3.last_line;
3e259915
MA
2792 if ($3)
2793 $$ = $1 / $3;
2794 else
2795 @{
2796 $$ = 1;
2797 printf("Division by zero, l%d,c%d-l%d,c%d",
2798 @@3.first_line, @@3.first_column,
2799 @@3.last_line, @@3.last_column);
2800 @}
847bf1f5
AD
2801 @}
2802@end group
2803@end example
2804
3e259915
MA
2805As for semantic values, there is a default action for locations that is
2806run each time a rule is matched. It sets the beginning of @code{@@$} to the
2807beginning of the first symbol, and the end of @code{@@$} to the end of the
79282c6c 2808last symbol.
3e259915
MA
2809
2810With this default action, the location tracking can be fully automatic. The
2811example above simply rewrites this way:
2812
2813@example
2814@group
2815exp: @dots{}
2816 | exp '/' exp
2817 @{
2818 if ($3)
2819 $$ = $1 / $3;
2820 else
2821 @{
2822 $$ = 1;
2823 printf("Division by zero, l%d,c%d-l%d,c%d",
2824 @@3.first_line, @@3.first_column,
2825 @@3.last_line, @@3.last_column);
2826 @}
2827 @}
2828@end group
2829@end example
847bf1f5 2830
342b8b6e 2831@node Location Default Action
847bf1f5
AD
2832@subsection Default Action for Locations
2833@vindex YYLLOC_DEFAULT
2834
2835Actually, actions are not the best place to compute locations. Since locations
2836are much more general than semantic values, there is room in the output parser
79282c6c 2837to redefine the default action to take for each rule. The
3e259915 2838@code{YYLLOC_DEFAULT} macro is called each time a rule is matched, before the
79282c6c 2839associated action is run.
847bf1f5 2840
3e259915 2841Most of the time, this macro is general enough to suppress location
79282c6c 2842dedicated code from semantic actions.
847bf1f5 2843
79282c6c
AD
2844The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
2845the location of the grouping (the result of the computation). The second one
2846is an array holding locations of all right hand side elements of the rule
3e259915 2847being matched. The last one is the size of the right hand side rule.
847bf1f5 2848
3e259915 2849By default, it is defined this way:
847bf1f5
AD
2850
2851@example
2852@group
3e259915
MA
2853#define YYLLOC_DEFAULT(Current, Rhs, N) \
2854 Current.last_line = Rhs[N].last_line; \
2855 Current.last_column = Rhs[N].last_column;
847bf1f5
AD
2856@end group
2857@end example
2858
3e259915 2859When defining @code{YYLLOC_DEFAULT}, you should consider that:
847bf1f5 2860
3e259915 2861@itemize @bullet
79282c6c 2862@item
3e259915
MA
2863All arguments are free of side-effects. However, only the first one (the
2864result) should be modified by @code{YYLLOC_DEFAULT}.
847bf1f5 2865
3e259915
MA
2866@item
2867Before @code{YYLLOC_DEFAULT} is executed, the output parser sets @code{@@$}
2868to @code{@@1}.
2869
2870@item
2871For consistency with semantic actions, valid indexes for the location array
2872range from 1 to @var{n}.
2873@end itemize
847bf1f5 2874
342b8b6e 2875@node Declarations
bfa74976
RS
2876@section Bison Declarations
2877@cindex declarations, Bison
2878@cindex Bison declarations
2879
2880The @dfn{Bison declarations} section of a Bison grammar defines the symbols
2881used in formulating the grammar and the data types of semantic values.
2882@xref{Symbols}.
2883
2884All token type names (but not single-character literal tokens such as
2885@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
2886declared if you need to specify which data type to use for the semantic
2887value (@pxref{Multiple Types, ,More Than One Value Type}).
2888
2889The first rule in the file also specifies the start symbol, by default.
2890If you want some other symbol to be the start symbol, you must declare
2891it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
2892
2893@menu
2894* Token Decl:: Declaring terminal symbols.
2895* Precedence Decl:: Declaring terminals with precedence and associativity.
2896* Union Decl:: Declaring the set of all semantic value types.
2897* Type Decl:: Declaring the choice of type for a nonterminal symbol.
2898* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
2899* Start Decl:: Specifying the start symbol.
2900* Pure Decl:: Requesting a reentrant parser.
2901* Decl Summary:: Table of all Bison declarations.
2902@end menu
2903
342b8b6e 2904@node Token Decl
bfa74976
RS
2905@subsection Token Type Names
2906@cindex declaring token type names
2907@cindex token type names, declaring
931c7513 2908@cindex declaring literal string tokens
bfa74976
RS
2909@findex %token
2910
2911The basic way to declare a token type name (terminal symbol) is as follows:
2912
2913@example
2914%token @var{name}
2915@end example
2916
2917Bison will convert this into a @code{#define} directive in
2918the parser, so that the function @code{yylex} (if it is in this file)
2919can use the name @var{name} to stand for this token type's code.
2920
14ded682
AD
2921Alternatively, you can use @code{%left}, @code{%right}, or
2922@code{%nonassoc} instead of @code{%token}, if you wish to specify
2923associativity and precedence. @xref{Precedence Decl, ,Operator
2924Precedence}.
bfa74976
RS
2925
2926You can explicitly specify the numeric code for a token type by appending
2927an integer value in the field immediately following the token name:
2928
2929@example
2930%token NUM 300
2931@end example
2932
2933@noindent
2934It is generally best, however, to let Bison choose the numeric codes for
2935all token types. Bison will automatically select codes that don't conflict
2936with each other or with ASCII characters.
2937
2938In the event that the stack type is a union, you must augment the
2939@code{%token} or other token declaration to include the data type
13863333 2940alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}).
bfa74976
RS
2941
2942For example:
2943
2944@example
2945@group
2946%union @{ /* define stack type */
2947 double val;
2948 symrec *tptr;
2949@}
2950%token <val> NUM /* define token NUM and its type */
2951@end group
2952@end example
2953
931c7513
RS
2954You can associate a literal string token with a token type name by
2955writing the literal string at the end of a @code{%token}
2956declaration which declares the name. For example:
2957
2958@example
2959%token arrow "=>"
2960@end example
2961
2962@noindent
2963For example, a grammar for the C language might specify these names with
2964equivalent literal string tokens:
2965
2966@example
2967%token <operator> OR "||"
2968%token <operator> LE 134 "<="
2969%left OR "<="
2970@end example
2971
2972@noindent
2973Once you equate the literal string and the token name, you can use them
2974interchangeably in further declarations or the grammar rules. The
2975@code{yylex} function can use the token name or the literal string to
2976obtain the token type code number (@pxref{Calling Convention}).
2977
342b8b6e 2978@node Precedence Decl
bfa74976
RS
2979@subsection Operator Precedence
2980@cindex precedence declarations
2981@cindex declaring operator precedence
2982@cindex operator precedence, declaring
2983
2984Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
2985declare a token and specify its precedence and associativity, all at
2986once. These are called @dfn{precedence declarations}.
2987@xref{Precedence, ,Operator Precedence}, for general information on operator precedence.
2988
2989The syntax of a precedence declaration is the same as that of
2990@code{%token}: either
2991
2992@example
2993%left @var{symbols}@dots{}
2994@end example
2995
2996@noindent
2997or
2998
2999@example
3000%left <@var{type}> @var{symbols}@dots{}
3001@end example
3002
3003And indeed any of these declarations serves the purposes of @code{%token}.
3004But in addition, they specify the associativity and relative precedence for
3005all the @var{symbols}:
3006
3007@itemize @bullet
3008@item
3009The associativity of an operator @var{op} determines how repeated uses
3010of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
3011@var{z}} is parsed by grouping @var{x} with @var{y} first or by
3012grouping @var{y} with @var{z} first. @code{%left} specifies
3013left-associativity (grouping @var{x} with @var{y} first) and
3014@code{%right} specifies right-associativity (grouping @var{y} with
3015@var{z} first). @code{%nonassoc} specifies no associativity, which
3016means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
3017considered a syntax error.
3018
3019@item
3020The precedence of an operator determines how it nests with other operators.
3021All the tokens declared in a single precedence declaration have equal
3022precedence and nest together according to their associativity.
3023When two tokens declared in different precedence declarations associate,
3024the one declared later has the higher precedence and is grouped first.
3025@end itemize
3026
342b8b6e 3027@node Union Decl
bfa74976
RS
3028@subsection The Collection of Value Types
3029@cindex declaring value types
3030@cindex value types, declaring
3031@findex %union
3032
3033The @code{%union} declaration specifies the entire collection of possible
3034data types for semantic values. The keyword @code{%union} is followed by a
3035pair of braces containing the same thing that goes inside a @code{union} in
13863333 3036C.
bfa74976
RS
3037
3038For example:
3039
3040@example
3041@group
3042%union @{
3043 double val;
3044 symrec *tptr;
3045@}
3046@end group
3047@end example
3048
3049@noindent
3050This says that the two alternative types are @code{double} and @code{symrec
3051*}. They are given names @code{val} and @code{tptr}; these names are used
3052in the @code{%token} and @code{%type} declarations to pick one of the types
3053for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
3054
3055Note that, unlike making a @code{union} declaration in C, you do not write
3056a semicolon after the closing brace.
3057
342b8b6e 3058@node Type Decl
bfa74976
RS
3059@subsection Nonterminal Symbols
3060@cindex declaring value types, nonterminals
3061@cindex value types, nonterminals, declaring
3062@findex %type
3063
3064@noindent
3065When you use @code{%union} to specify multiple value types, you must
3066declare the value type of each nonterminal symbol for which values are
3067used. This is done with a @code{%type} declaration, like this:
3068
3069@example
3070%type <@var{type}> @var{nonterminal}@dots{}
3071@end example
3072
3073@noindent
3074Here @var{nonterminal} is the name of a nonterminal symbol, and @var{type}
3075is the name given in the @code{%union} to the alternative that you want
3076(@pxref{Union Decl, ,The Collection of Value Types}). You can give any number of nonterminal symbols in
3077the same @code{%type} declaration, if they have the same value type. Use
3078spaces to separate the symbol names.
3079
931c7513
RS
3080You can also declare the value type of a terminal symbol. To do this,
3081use the same @code{<@var{type}>} construction in a declaration for the
3082terminal symbol. All kinds of token declarations allow
3083@code{<@var{type}>}.
3084
342b8b6e 3085@node Expect Decl
bfa74976
RS
3086@subsection Suppressing Conflict Warnings
3087@cindex suppressing conflict warnings
3088@cindex preventing warnings about conflicts
3089@cindex warnings, preventing
3090@cindex conflicts, suppressing warnings of
3091@findex %expect
3092
3093Bison normally warns if there are any conflicts in the grammar
7da99ede
AD
3094(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars
3095have harmless shift/reduce conflicts which are resolved in a predictable
3096way and would be difficult to eliminate. It is desirable to suppress
3097the warning about these conflicts unless the number of conflicts
3098changes. You can do this with the @code{%expect} declaration.
bfa74976
RS
3099
3100The declaration looks like this:
3101
3102@example
3103%expect @var{n}
3104@end example
3105
7da99ede
AD
3106Here @var{n} is a decimal integer. The declaration says there should be
3107no warning if there are @var{n} shift/reduce conflicts and no
3108reduce/reduce conflicts. An error, instead of the usual warning, is
3109given if there are either more or fewer conflicts, or if there are any
3110reduce/reduce conflicts.
bfa74976
RS
3111
3112In general, using @code{%expect} involves these steps:
3113
3114@itemize @bullet
3115@item
3116Compile your grammar without @code{%expect}. Use the @samp{-v} option
3117to get a verbose list of where the conflicts occur. Bison will also
3118print the number of conflicts.
3119
3120@item
3121Check each of the conflicts to make sure that Bison's default
3122resolution is what you really want. If not, rewrite the grammar and
3123go back to the beginning.
3124
3125@item
3126Add an @code{%expect} declaration, copying the number @var{n} from the
3127number which Bison printed.
3128@end itemize
3129
3130Now Bison will stop annoying you about the conflicts you have checked, but
3131it will warn you again if changes in the grammar result in additional
3132conflicts.
3133
342b8b6e 3134@node Start Decl
bfa74976
RS
3135@subsection The Start-Symbol
3136@cindex declaring the start symbol
3137@cindex start symbol, declaring
3138@cindex default start symbol
3139@findex %start
3140
3141Bison assumes by default that the start symbol for the grammar is the first
3142nonterminal specified in the grammar specification section. The programmer
3143may override this restriction with the @code{%start} declaration as follows:
3144
3145@example
3146%start @var{symbol}
3147@end example
3148
342b8b6e 3149@node Pure Decl
bfa74976
RS
3150@subsection A Pure (Reentrant) Parser
3151@cindex reentrant parser
3152@cindex pure parser
3153@findex %pure_parser
3154
3155A @dfn{reentrant} program is one which does not alter in the course of
3156execution; in other words, it consists entirely of @dfn{pure} (read-only)
3157code. Reentrancy is important whenever asynchronous execution is possible;
14ded682
AD
3158for example, a non-reentrant program may not be safe to call from a signal
3159handler. In systems with multiple threads of control, a non-reentrant
bfa74976
RS
3160program must be called only within interlocks.
3161
70811b85
RS
3162Normally, Bison generates a parser which is not reentrant. This is
3163suitable for most uses, and it permits compatibility with YACC. (The
3164standard YACC interfaces are inherently nonreentrant, because they use
3165statically allocated variables for communication with @code{yylex},
3166including @code{yylval} and @code{yylloc}.)
bfa74976 3167
70811b85
RS
3168Alternatively, you can generate a pure, reentrant parser. The Bison
3169declaration @code{%pure_parser} says that you want the parser to be
3170reentrant. It looks like this:
bfa74976
RS
3171
3172@example
3173%pure_parser
3174@end example
3175
70811b85
RS
3176The result is that the communication variables @code{yylval} and
3177@code{yylloc} become local variables in @code{yyparse}, and a different
3178calling convention is used for the lexical analyzer function
3179@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
3180Parsers}, for the details of this. The variable @code{yynerrs} also
3181becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error
3182Reporting Function @code{yyerror}}). The convention for calling
3183@code{yyparse} itself is unchanged.
3184
3185Whether the parser is pure has nothing to do with the grammar rules.
3186You can generate either a pure parser or a nonreentrant parser from any
3187valid grammar.
bfa74976 3188
342b8b6e 3189@node Decl Summary
bfa74976
RS
3190@subsection Bison Declaration Summary
3191@cindex Bison declaration summary
3192@cindex declaration summary
3193@cindex summary, Bison declaration
3194
d8988b2f 3195Here is a summary of the declarations used to define a grammar:
bfa74976
RS
3196
3197@table @code
3198@item %union
3199Declare the collection of data types that semantic values may have
3200(@pxref{Union Decl, ,The Collection of Value Types}).
3201
3202@item %token
3203Declare a terminal symbol (token type name) with no precedence
3204or associativity specified (@pxref{Token Decl, ,Token Type Names}).
3205
3206@item %right
3207Declare a terminal symbol (token type name) that is right-associative
3208(@pxref{Precedence Decl, ,Operator Precedence}).
3209
3210@item %left
3211Declare a terminal symbol (token type name) that is left-associative
3212(@pxref{Precedence Decl, ,Operator Precedence}).
3213
3214@item %nonassoc
3215Declare a terminal symbol (token type name) that is nonassociative
3216(using it in a way that would be associative is a syntax error)
3217(@pxref{Precedence Decl, ,Operator Precedence}).
3218
3219@item %type
3220Declare the type of semantic values for a nonterminal symbol
3221(@pxref{Type Decl, ,Nonterminal Symbols}).
3222
3223@item %start
89cab50d
AD
3224Specify the grammar's start symbol (@pxref{Start Decl, ,The
3225Start-Symbol}).
bfa74976
RS
3226
3227@item %expect
3228Declare the expected number of shift-reduce conflicts
3229(@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
d8988b2f 3230@end table
bfa74976 3231
d8988b2f
AD
3232@sp 1
3233@noindent
3234In order to change the behavior of @command{bison}, use the following
3235directives:
3236
3237@table @code
3238@item %debug
4947ebdb
PE
3239In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
3240already defined, so that the debugging facilities are compiled.
3241@xref{Debugging, ,Debugging Your Parser}.
d8988b2f
AD
3242
3243@item %defines
3244Write an extra output file containing macro definitions for the token
3245type names defined in the grammar and the semantic value type
3246@code{YYSTYPE}, as well as a few @code{extern} variable declarations.
3247
3248If the parser output file is named @file{@var{name}.c} then this file
3249is named @file{@var{name}.h}.@refill
3250
3251This output file is essential if you wish to put the definition of
3252@code{yylex} in a separate source file, because @code{yylex} needs to
3253be able to refer to token type codes and the variable
3254@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill
3255
3256@item %file-prefix="@var{prefix}"
3257Specify a prefix to use for all Bison output file names. The names are
3258chosen as if the input file were named @file{@var{prefix}.y}.
3259
3260@c @item %header_extension
3261@c Specify the extension of the parser header file generated when
3262@c @code{%define} or @samp{-d} are used.
3263@c
3264@c For example, a grammar file named @file{foo.ypp} and containing a
3265@c @code{%header_extension .hh} directive will produce a header file
3266@c named @file{foo.tab.hh}
6deb4447 3267
89cab50d
AD
3268@item %locations
3269Generate the code processing the locations (@pxref{Action Features,
3270,Special Features for Use in Actions}). This mode is enabled as soon as
3271the grammar uses the special @samp{@@@var{n}} tokens, but if your
3272grammar does not use it, using @samp{%locations} allows for more
3273accurate parse error messages.
3274
d8988b2f
AD
3275@item %name-prefix="@var{prefix}"
3276Rename the external symbols used in the parser so that they start with
3277@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
3278is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
3279@code{yylval}, @code{yychar} and @code{yydebug}. For example, if you
3280use @samp{%name-prefix="c_"}, the names become @code{c_parse},
3281@code{c_lex}, and so on. @xref{Multiple Parsers, ,Multiple Parsers in
3282the Same Program}.
931c7513 3283
d8988b2f 3284@item %no-parser
6deb4447
AD
3285Do not include any C code in the parser file; generate tables only. The
3286parser file contains just @code{#define} directives and static variable
3287declarations.
3288
3289This option also tells Bison to write the C code for the grammar actions
3290into a file named @file{@var{filename}.act}, in the form of a
3291brace-surrounded body fit for a @code{switch} statement.
3292
d8988b2f 3293@item %no-lines
931c7513
RS
3294Don't generate any @code{#line} preprocessor commands in the parser
3295file. Ordinarily Bison writes these commands in the parser file so that
3296the C compiler and debuggers will associate errors and object code with
3297your source file (the grammar file). This directive causes them to
3298associate errors with the parser file, treating it an independent source
3299file in its own right.
3300
d8988b2f
AD
3301@item %output="@var{filename}"
3302Specify the @var{filename} for the parser file.
6deb4447 3303
d8988b2f
AD
3304@item %pure-parser
3305Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure
3306(Reentrant) Parser}).
6deb4447 3307
f9a8293a
AD
3308@c @item %source_extension
3309@c Specify the extension of the parser output file.
3310@c
3311@c For example, a grammar file named @file{foo.yy} and containing a
3312@c @code{%source_extension .cpp} directive will produce a parser file
3313@c named @file{foo.tab.cpp}
6deb4447 3314
931c7513
RS
3315@item %token_table
3316Generate an array of token names in the parser file. The name of the
3317array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
3318token whose internal Bison token code number is @var{i}. The first three
3319elements of @code{yytname} are always @code{"$"}, @code{"error"}, and
3320@code{"$illegal"}; after these come the symbols defined in the grammar
3321file.
3322
3323For single-character literal tokens and literal string tokens, the name
3324in the table includes the single-quote or double-quote characters: for
3325example, @code{"'+'"} is a single-character literal and @code{"\"<=\""}
3326is a literal string token. All the characters of the literal string
3327token appear verbatim in the string found in the table; even
3328double-quote characters are not escaped. For example, if the token
3329consists of three characters @samp{*"*}, its string in @code{yytname}
3330contains @samp{"*"*"}. (In C, that would be written as
3331@code{"\"*\"*\""}).
3332
3333When you specify @code{%token_table}, Bison also generates macro
3334definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, and
3335@code{YYNRULES}, and @code{YYNSTATES}:
3336
3337@table @code
3338@item YYNTOKENS
3339The highest token number, plus one.
3340@item YYNNTS
9ecbd125 3341The number of nonterminal symbols.
931c7513
RS
3342@item YYNRULES
3343The number of grammar rules,
3344@item YYNSTATES
3345The number of parser states (@pxref{Parser States}).
3346@end table
d8988b2f
AD
3347
3348@item %verbose
3349Write an extra output file containing verbose descriptions of the
3350parser states and what is done for each type of look-ahead token in
3351that state.
3352
3353This file also describes all the conflicts, both those resolved by
3354operator precedence and the unresolved ones.
3355
3356The file's name is made by removing @samp{.tab.c} or @samp{.c} from
3357the parser output file name, and adding @samp{.output} instead.@refill
3358
3359Therefore, if the input file is @file{foo.y}, then the parser file is
3360called @file{foo.tab.c} by default. As a consequence, the verbose
3361output file is called @file{foo.output}.@refill
3362
3363@item %yacc
3364@itemx %fixed-output-files
3365Pretend the option @option{--yacc} was given, i.e., imitate Yacc,
3366including its naming conventions. @xref{Bison Options}, for more.
bfa74976
RS
3367@end table
3368
d8988b2f
AD
3369
3370
3371
342b8b6e 3372@node Multiple Parsers
bfa74976
RS
3373@section Multiple Parsers in the Same Program
3374
3375Most programs that use Bison parse only one language and therefore contain
3376only one Bison parser. But what if you want to parse more than one
3377language with the same program? Then you need to avoid a name conflict
3378between different definitions of @code{yyparse}, @code{yylval}, and so on.
3379
3380The easy way to do this is to use the option @samp{-p @var{prefix}}
3381(@pxref{Invocation, ,Invoking Bison}). This renames the interface functions and
3382variables of the Bison parser to start with @var{prefix} instead of
3383@samp{yy}. You can use this to give each parser distinct names that do
3384not conflict.
3385
3386The precise list of symbols renamed is @code{yyparse}, @code{yylex},
c656404a
RS
3387@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar} and
3388@code{yydebug}. For example, if you use @samp{-p c}, the names become
3389@code{cparse}, @code{clex}, and so on.
bfa74976
RS
3390
3391@strong{All the other variables and macros associated with Bison are not
3392renamed.} These others are not global; there is no conflict if the same
3393name is used in different parsers. For example, @code{YYSTYPE} is not
3394renamed, but defining this in different ways in different parsers causes
3395no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
3396
3397The @samp{-p} option works by adding macro definitions to the beginning
3398of the parser source file, defining @code{yyparse} as
3399@code{@var{prefix}parse}, and so on. This effectively substitutes one
3400name for the other in the entire parser file.
3401
342b8b6e 3402@node Interface
bfa74976
RS
3403@chapter Parser C-Language Interface
3404@cindex C-language interface
3405@cindex interface
3406
3407The Bison parser is actually a C function named @code{yyparse}. Here we
3408describe the interface conventions of @code{yyparse} and the other
3409functions that it needs to use.
3410
3411Keep in mind that the parser uses many C identifiers starting with
3412@samp{yy} and @samp{YY} for internal purposes. If you use such an
75f5aaea
MA
3413identifier (aside from those in this manual) in an action or in epilogue
3414in the grammar file, you are likely to run into trouble.
bfa74976
RS
3415
3416@menu
3417* Parser Function:: How to call @code{yyparse} and what it returns.
13863333 3418* Lexical:: You must supply a function @code{yylex}
bfa74976
RS
3419 which reads tokens.
3420* Error Reporting:: You must supply a function @code{yyerror}.
3421* Action Features:: Special features for use in actions.
3422@end menu
3423
342b8b6e 3424@node Parser Function
bfa74976
RS
3425@section The Parser Function @code{yyparse}
3426@findex yyparse
3427
3428You call the function @code{yyparse} to cause parsing to occur. This
3429function reads tokens, executes actions, and ultimately returns when it
3430encounters end-of-input or an unrecoverable syntax error. You can also
14ded682
AD
3431write an action which directs @code{yyparse} to return immediately
3432without reading further.
bfa74976
RS
3433
3434The value returned by @code{yyparse} is 0 if parsing was successful (return
3435is due to end-of-input).
3436
3437The value is 1 if parsing failed (return is due to a syntax error).
3438
3439In an action, you can cause immediate return from @code{yyparse} by using
3440these macros:
3441
3442@table @code
3443@item YYACCEPT
3444@findex YYACCEPT
3445Return immediately with value 0 (to report success).
3446
3447@item YYABORT
3448@findex YYABORT
3449Return immediately with value 1 (to report failure).
3450@end table
3451
342b8b6e 3452@node Lexical
bfa74976
RS
3453@section The Lexical Analyzer Function @code{yylex}
3454@findex yylex
3455@cindex lexical analyzer
3456
3457The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
3458the input stream and returns them to the parser. Bison does not create
3459this function automatically; you must write it so that @code{yyparse} can
3460call it. The function is sometimes referred to as a lexical scanner.
3461
3462In simple programs, @code{yylex} is often defined at the end of the Bison
3463grammar file. If @code{yylex} is defined in a separate source file, you
3464need to arrange for the token-type macro definitions to be available there.
3465To do this, use the @samp{-d} option when you run Bison, so that it will
3466write these macro definitions into a separate header file
3467@file{@var{name}.tab.h} which you can include in the other source files
3468that need it. @xref{Invocation, ,Invoking Bison}.@refill
3469
3470@menu
3471* Calling Convention:: How @code{yyparse} calls @code{yylex}.
3472* Token Values:: How @code{yylex} must return the semantic value
3473 of the token it has read.
3474* Token Positions:: How @code{yylex} must return the text position
3475 (line number, etc.) of the token, if the
3476 actions want that.
3477* Pure Calling:: How the calling convention differs
3478 in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
3479@end menu
3480
342b8b6e 3481@node Calling Convention
bfa74976
RS
3482@subsection Calling Convention for @code{yylex}
3483
3484The value that @code{yylex} returns must be the numeric code for the type
3485of token it has just found, or 0 for end-of-input.
3486
3487When a token is referred to in the grammar rules by a name, that name
3488in the parser file becomes a C macro whose definition is the proper
3489numeric code for that token type. So @code{yylex} can use the name
3490to indicate that type. @xref{Symbols}.
3491
3492When a token is referred to in the grammar rules by a character literal,
3493the numeric code for that character is also the code for the token type.
3494So @code{yylex} can simply return that character code. The null character
3495must not be used this way, because its code is zero and that is what
3496signifies end-of-input.
3497
3498Here is an example showing these things:
3499
3500@example
13863333
AD
3501int
3502yylex (void)
bfa74976
RS
3503@{
3504 @dots{}
3505 if (c == EOF) /* Detect end of file. */
3506 return 0;
3507 @dots{}
3508 if (c == '+' || c == '-')
3509 return c; /* Assume token type for `+' is '+'. */
3510 @dots{}
3511 return INT; /* Return the type of the token. */
3512 @dots{}
3513@}
3514@end example
3515
3516@noindent
3517This interface has been designed so that the output from the @code{lex}
3518utility can be used without change as the definition of @code{yylex}.
3519
931c7513
RS
3520If the grammar uses literal string tokens, there are two ways that
3521@code{yylex} can determine the token type codes for them:
3522
3523@itemize @bullet
3524@item
3525If the grammar defines symbolic token names as aliases for the
3526literal string tokens, @code{yylex} can use these symbolic names like
3527all others. In this case, the use of the literal string tokens in
3528the grammar file has no effect on @code{yylex}.
3529
3530@item
9ecbd125 3531@code{yylex} can find the multicharacter token in the @code{yytname}
931c7513 3532table. The index of the token in the table is the token type's code.
9ecbd125 3533The name of a multicharacter token is recorded in @code{yytname} with a
931c7513
RS
3534double-quote, the token's characters, and another double-quote. The
3535token's characters are not escaped in any way; they appear verbatim in
3536the contents of the string in the table.
3537
3538Here's code for looking up a token in @code{yytname}, assuming that the
3539characters of the token are stored in @code{token_buffer}.
3540
3541@smallexample
3542for (i = 0; i < YYNTOKENS; i++)
3543 @{
3544 if (yytname[i] != 0
3545 && yytname[i][0] == '"'
6f515a27
JT
3546 && strncmp (yytname[i] + 1, token_buffer,
3547 strlen (token_buffer))
931c7513
RS
3548 && yytname[i][strlen (token_buffer) + 1] == '"'
3549 && yytname[i][strlen (token_buffer) + 2] == 0)
3550 break;
3551 @}
3552@end smallexample
3553
3554The @code{yytname} table is generated only if you use the
3555@code{%token_table} declaration. @xref{Decl Summary}.
3556@end itemize
3557
342b8b6e 3558@node Token Values
bfa74976
RS
3559@subsection Semantic Values of Tokens
3560
3561@vindex yylval
14ded682 3562In an ordinary (non-reentrant) parser, the semantic value of the token must
bfa74976
RS
3563be stored into the global variable @code{yylval}. When you are using
3564just one data type for semantic values, @code{yylval} has that type.
3565Thus, if the type is @code{int} (the default), you might write this in
3566@code{yylex}:
3567
3568@example
3569@group
3570 @dots{}
3571 yylval = value; /* Put value onto Bison stack. */
3572 return INT; /* Return the type of the token. */
3573 @dots{}
3574@end group
3575@end example
3576
3577When you are using multiple data types, @code{yylval}'s type is a union
3578made from the @code{%union} declaration (@pxref{Union Decl, ,The Collection of Value Types}). So when
3579you store a token's value, you must use the proper member of the union.
3580If the @code{%union} declaration looks like this:
3581
3582@example
3583@group
3584%union @{
3585 int intval;
3586 double val;
3587 symrec *tptr;
3588@}
3589@end group
3590@end example
3591
3592@noindent
3593then the code in @code{yylex} might look like this:
3594
3595@example
3596@group
3597 @dots{}
3598 yylval.intval = value; /* Put value onto Bison stack. */
3599 return INT; /* Return the type of the token. */
3600 @dots{}
3601@end group
3602@end example
3603
342b8b6e 3604@node Token Positions
bfa74976
RS
3605@subsection Textual Positions of Tokens
3606
3607@vindex yylloc
847bf1f5
AD
3608If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
3609Tracking Locations}) in actions to keep track of the
89cab50d
AD
3610textual locations of tokens and groupings, then you must provide this
3611information in @code{yylex}. The function @code{yyparse} expects to
3612find the textual location of a token just parsed in the global variable
3613@code{yylloc}. So @code{yylex} must store the proper data in that
847bf1f5
AD
3614variable.
3615
3616By default, the value of @code{yylloc} is a structure and you need only
89cab50d
AD
3617initialize the members that are going to be used by the actions. The
3618four members are called @code{first_line}, @code{first_column},
3619@code{last_line} and @code{last_column}. Note that the use of this
3620feature makes the parser noticeably slower.
bfa74976
RS
3621
3622@tindex YYLTYPE
3623The data type of @code{yylloc} has the name @code{YYLTYPE}.
3624
342b8b6e 3625@node Pure Calling
c656404a 3626@subsection Calling Conventions for Pure Parsers
bfa74976 3627
e425e872
RS
3628When you use the Bison declaration @code{%pure_parser} to request a
3629pure, reentrant parser, the global communication variables @code{yylval}
3630and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
3631Parser}.) In such parsers the two global variables are replaced by
3632pointers passed as arguments to @code{yylex}. You must declare them as
3633shown here, and pass the information back by storing it through those
3634pointers.
bfa74976
RS
3635
3636@example
13863333
AD
3637int
3638yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
bfa74976
RS
3639@{
3640 @dots{}
3641 *lvalp = value; /* Put value onto Bison stack. */
3642 return INT; /* Return the type of the token. */
3643 @dots{}
3644@}
3645@end example
3646
3647If the grammar file does not use the @samp{@@} constructs to refer to
3648textual positions, then the type @code{YYLTYPE} will not be defined. In
3649this case, omit the second argument; @code{yylex} will be called with
3650only one argument.
3651
c656404a 3652@vindex YYPARSE_PARAM
931c7513
RS
3653If you use a reentrant parser, you can optionally pass additional
3654parameter information to it in a reentrant way. To do so, define the
3655macro @code{YYPARSE_PARAM} as a variable name. This modifies the
3656@code{yyparse} function to accept one argument, of type @code{void *},
3657with that name.
e425e872
RS
3658
3659When you call @code{yyparse}, pass the address of an object, casting the
3660address to @code{void *}. The grammar actions can refer to the contents
3661of the object by casting the pointer value back to its proper type and
3662then dereferencing it. Here's an example. Write this in the parser:
3663
3664@example
3665%@{
3666struct parser_control
3667@{
3668 int nastiness;
3669 int randomness;
3670@};
3671
3672#define YYPARSE_PARAM parm
3673%@}
3674@end example
3675
3676@noindent
3677Then call the parser like this:
3678
3679@example
3680struct parser_control
3681@{
3682 int nastiness;
3683 int randomness;
3684@};
3685
3686@dots{}
3687
3688@{
3689 struct parser_control foo;
3690 @dots{} /* @r{Store proper data in @code{foo}.} */
3691 value = yyparse ((void *) &foo);
3692 @dots{}
3693@}
3694@end example
3695
3696@noindent
3697In the grammar actions, use expressions like this to refer to the data:
3698
3699@example
3700((struct parser_control *) parm)->randomness
3701@end example
3702
c656404a
RS
3703@vindex YYLEX_PARAM
3704If you wish to pass the additional parameter data to @code{yylex},
3705define the macro @code{YYLEX_PARAM} just like @code{YYPARSE_PARAM}, as
3706shown here:
3707
3708@example
3709%@{
3710struct parser_control
3711@{
3712 int nastiness;
3713 int randomness;
3714@};
3715
3716#define YYPARSE_PARAM parm
3717#define YYLEX_PARAM parm
3718%@}
3719@end example
3720
3721You should then define @code{yylex} to accept one additional
3722argument---the value of @code{parm}. (This makes either two or three
3723arguments in total, depending on whether an argument of type
3724@code{YYLTYPE} is passed.) You can declare the argument as a pointer to
3725the proper object type, or you can declare it as @code{void *} and
3726access the contents as shown above.
3727
931c7513
RS
3728You can use @samp{%pure_parser} to request a reentrant parser without
3729also using @code{YYPARSE_PARAM}. Then you should call @code{yyparse}
3730with no arguments, as usual.
3731
342b8b6e 3732@node Error Reporting
bfa74976
RS
3733@section The Error Reporting Function @code{yyerror}
3734@cindex error reporting function
3735@findex yyerror
3736@cindex parse error
3737@cindex syntax error
3738
3739The Bison parser detects a @dfn{parse error} or @dfn{syntax error}
9ecbd125 3740whenever it reads a token which cannot satisfy any syntax rule. An
bfa74976 3741action in the grammar can also explicitly proclaim an error, using the
ceed8467
AD
3742macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use
3743in Actions}).
bfa74976
RS
3744
3745The Bison parser expects to report the error by calling an error
3746reporting function named @code{yyerror}, which you must supply. It is
3747called by @code{yyparse} whenever a syntax error is found, and it
3748receives one argument. For a parse error, the string is normally
3749@w{@code{"parse error"}}.
3750
3751@findex YYERROR_VERBOSE
3752If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations
ceed8467
AD
3753section (@pxref{Bison Declarations, ,The Bison Declarations Section}),
3754then Bison provides a more verbose and specific error message string
3755instead of just plain @w{@code{"parse error"}}. It doesn't matter what
3756definition you use for @code{YYERROR_VERBOSE}, just whether you define
3757it.
bfa74976
RS
3758
3759The parser can detect one other kind of error: stack overflow. This
3760happens when the input contains constructions that are very deeply
3761nested. It isn't likely you will encounter this, since the Bison
3762parser extends its stack automatically up to a very large limit. But
3763if overflow happens, @code{yyparse} calls @code{yyerror} in the usual
3764fashion, except that the argument string is @w{@code{"parser stack
3765overflow"}}.
3766
3767The following definition suffices in simple programs:
3768
3769@example
3770@group
13863333
AD
3771void
3772yyerror (char *s)
bfa74976
RS
3773@{
3774@end group
3775@group
3776 fprintf (stderr, "%s\n", s);
3777@}
3778@end group
3779@end example
3780
3781After @code{yyerror} returns to @code{yyparse}, the latter will attempt
3782error recovery if you have written suitable error recovery grammar rules
3783(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
3784immediately return 1.
3785
3786@vindex yynerrs
3787The variable @code{yynerrs} contains the number of syntax errors
3788encountered so far. Normally this variable is global; but if you
3789request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable
3790which only the actions can access.
3791
342b8b6e 3792@node Action Features
bfa74976
RS
3793@section Special Features for Use in Actions
3794@cindex summary, action features
3795@cindex action features summary
3796
3797Here is a table of Bison constructs, variables and macros that
3798are useful in actions.
3799
3800@table @samp
3801@item $$
3802Acts like a variable that contains the semantic value for the
3803grouping made by the current rule. @xref{Actions}.
3804
3805@item $@var{n}
3806Acts like a variable that contains the semantic value for the
3807@var{n}th component of the current rule. @xref{Actions}.
3808
3809@item $<@var{typealt}>$
3810Like @code{$$} but specifies alternative @var{typealt} in the union
3811specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of Values in Actions}.
3812
3813@item $<@var{typealt}>@var{n}
3814Like @code{$@var{n}} but specifies alternative @var{typealt} in the
13863333 3815union specified by the @code{%union} declaration.
bfa74976
RS
3816@xref{Action Types, ,Data Types of Values in Actions}.@refill
3817
3818@item YYABORT;
3819Return immediately from @code{yyparse}, indicating failure.
3820@xref{Parser Function, ,The Parser Function @code{yyparse}}.
3821
3822@item YYACCEPT;
3823Return immediately from @code{yyparse}, indicating success.
3824@xref{Parser Function, ,The Parser Function @code{yyparse}}.
3825
3826@item YYBACKUP (@var{token}, @var{value});
3827@findex YYBACKUP
3828Unshift a token. This macro is allowed only for rules that reduce
3829a single value, and only when there is no look-ahead token.
3830It installs a look-ahead token with token type @var{token} and
3831semantic value @var{value}; then it discards the value that was
3832going to be reduced by this rule.
3833
3834If the macro is used when it is not valid, such as when there is
3835a look-ahead token already, then it reports a syntax error with
3836a message @samp{cannot back up} and performs ordinary error
3837recovery.
3838
3839In either case, the rest of the action is not executed.
3840
3841@item YYEMPTY
3842@vindex YYEMPTY
3843Value stored in @code{yychar} when there is no look-ahead token.
3844
3845@item YYERROR;
3846@findex YYERROR
3847Cause an immediate syntax error. This statement initiates error
3848recovery just as if the parser itself had detected an error; however, it
3849does not call @code{yyerror}, and does not print any message. If you
3850want to print an error message, call @code{yyerror} explicitly before
3851the @samp{YYERROR;} statement. @xref{Error Recovery}.
3852
3853@item YYRECOVERING
3854This macro stands for an expression that has the value 1 when the parser
3855is recovering from a syntax error, and 0 the rest of the time.
3856@xref{Error Recovery}.
3857
3858@item yychar
3859Variable containing the current look-ahead token. (In a pure parser,
3860this is actually a local variable within @code{yyparse}.) When there is
3861no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
3862@xref{Look-Ahead, ,Look-Ahead Tokens}.
3863
3864@item yyclearin;
3865Discard the current look-ahead token. This is useful primarily in
3866error rules. @xref{Error Recovery}.
3867
3868@item yyerrok;
3869Resume generating error messages immediately for subsequent syntax
13863333 3870errors. This is useful primarily in error rules.
bfa74976
RS
3871@xref{Error Recovery}.
3872
847bf1f5
AD
3873@item @@$
3874@findex @@$
3875Acts like a structure variable containing information on the textual position
3876of the grouping made by the current rule. @xref{Locations, ,
3877Tracking Locations}.
bfa74976 3878
847bf1f5
AD
3879@c Check if those paragraphs are still useful or not.
3880
3881@c @example
3882@c struct @{
3883@c int first_line, last_line;
3884@c int first_column, last_column;
3885@c @};
3886@c @end example
3887
3888@c Thus, to get the starting line number of the third component, you would
3889@c use @samp{@@3.first_line}.
bfa74976 3890
847bf1f5
AD
3891@c In order for the members of this structure to contain valid information,
3892@c you must make @code{yylex} supply this information about each token.
3893@c If you need only certain members, then @code{yylex} need only fill in
3894@c those members.
bfa74976 3895
847bf1f5
AD
3896@c The use of this feature makes the parser noticeably slower.
3897
3898@item @@@var{n}
3899@findex @@@var{n}
3900Acts like a structure variable containing information on the textual position
3901of the @var{n}th component of the current rule. @xref{Locations, ,
3902Tracking Locations}.
bfa74976 3903
bfa74976
RS
3904@end table
3905
342b8b6e 3906@node Algorithm
13863333
AD
3907@chapter The Bison Parser Algorithm
3908@cindex Bison parser algorithm
bfa74976
RS
3909@cindex algorithm of parser
3910@cindex shifting
3911@cindex reduction
3912@cindex parser stack
3913@cindex stack, parser
3914
3915As Bison reads tokens, it pushes them onto a stack along with their
3916semantic values. The stack is called the @dfn{parser stack}. Pushing a
3917token is traditionally called @dfn{shifting}.
3918
3919For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
3920@samp{3} to come. The stack will have four elements, one for each token
3921that was shifted.
3922
3923But the stack does not always have an element for each token read. When
3924the last @var{n} tokens and groupings shifted match the components of a
3925grammar rule, they can be combined according to that rule. This is called
3926@dfn{reduction}. Those tokens and groupings are replaced on the stack by a
3927single grouping whose symbol is the result (left hand side) of that rule.
3928Running the rule's action is part of the process of reduction, because this
3929is what computes the semantic value of the resulting grouping.
3930
3931For example, if the infix calculator's parser stack contains this:
3932
3933@example
39341 + 5 * 3
3935@end example
3936
3937@noindent
3938and the next input token is a newline character, then the last three
3939elements can be reduced to 15 via the rule:
3940
3941@example
3942expr: expr '*' expr;
3943@end example
3944
3945@noindent
3946Then the stack contains just these three elements:
3947
3948@example
39491 + 15
3950@end example
3951
3952@noindent
3953At this point, another reduction can be made, resulting in the single value
395416. Then the newline token can be shifted.
3955
3956The parser tries, by shifts and reductions, to reduce the entire input down
3957to a single grouping whose symbol is the grammar's start-symbol
3958(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
3959
3960This kind of parser is known in the literature as a bottom-up parser.
3961
3962@menu
3963* Look-Ahead:: Parser looks one token ahead when deciding what to do.
3964* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
3965* Precedence:: Operator precedence works by resolving conflicts.
3966* Contextual Precedence:: When an operator's precedence depends on context.
3967* Parser States:: The parser is a finite-state-machine with stack.
3968* Reduce/Reduce:: When two rules are applicable in the same situation.
3969* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
3970* Stack Overflow:: What happens when stack gets full. How to avoid it.
3971@end menu
3972
342b8b6e 3973@node Look-Ahead
bfa74976
RS
3974@section Look-Ahead Tokens
3975@cindex look-ahead token
3976
3977The Bison parser does @emph{not} always reduce immediately as soon as the
3978last @var{n} tokens and groupings match a rule. This is because such a
3979simple strategy is inadequate to handle most languages. Instead, when a
3980reduction is possible, the parser sometimes ``looks ahead'' at the next
3981token in order to decide what to do.
3982
3983When a token is read, it is not immediately shifted; first it becomes the
3984@dfn{look-ahead token}, which is not on the stack. Now the parser can
3985perform one or more reductions of tokens and groupings on the stack, while
3986the look-ahead token remains off to the side. When no more reductions
3987should take place, the look-ahead token is shifted onto the stack. This
3988does not mean that all possible reductions have been done; depending on the
3989token type of the look-ahead token, some rules may choose to delay their
3990application.
3991
3992Here is a simple case where look-ahead is needed. These three rules define
3993expressions which contain binary addition operators and postfix unary
3994factorial operators (@samp{!}), and allow parentheses for grouping.
3995
3996@example
3997@group
3998expr: term '+' expr
3999 | term
4000 ;
4001@end group
4002
4003@group
4004term: '(' expr ')'
4005 | term '!'
4006 | NUMBER
4007 ;
4008@end group
4009@end example
4010
4011Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
4012should be done? If the following token is @samp{)}, then the first three
4013tokens must be reduced to form an @code{expr}. This is the only valid
4014course, because shifting the @samp{)} would produce a sequence of symbols
4015@w{@code{term ')'}}, and no rule allows this.
4016
4017If the following token is @samp{!}, then it must be shifted immediately so
4018that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
4019parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
4020@code{expr}. It would then be impossible to shift the @samp{!} because
4021doing so would produce on the stack the sequence of symbols @code{expr
4022'!'}. No rule allows that sequence.
4023
4024@vindex yychar
4025The current look-ahead token is stored in the variable @code{yychar}.
4026@xref{Action Features, ,Special Features for Use in Actions}.
4027
342b8b6e 4028@node Shift/Reduce
bfa74976
RS
4029@section Shift/Reduce Conflicts
4030@cindex conflicts
4031@cindex shift/reduce conflicts
4032@cindex dangling @code{else}
4033@cindex @code{else}, dangling
4034
4035Suppose we are parsing a language which has if-then and if-then-else
4036statements, with a pair of rules like this:
4037
4038@example
4039@group
4040if_stmt:
4041 IF expr THEN stmt
4042 | IF expr THEN stmt ELSE stmt
4043 ;
4044@end group
4045@end example
4046
4047@noindent
4048Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
4049terminal symbols for specific keyword tokens.
4050
4051When the @code{ELSE} token is read and becomes the look-ahead token, the
4052contents of the stack (assuming the input is valid) are just right for
4053reduction by the first rule. But it is also legitimate to shift the
4054@code{ELSE}, because that would lead to eventual reduction by the second
4055rule.
4056
4057This situation, where either a shift or a reduction would be valid, is
4058called a @dfn{shift/reduce conflict}. Bison is designed to resolve
4059these conflicts by choosing to shift, unless otherwise directed by
4060operator precedence declarations. To see the reason for this, let's
4061contrast it with the other alternative.
4062
4063Since the parser prefers to shift the @code{ELSE}, the result is to attach
4064the else-clause to the innermost if-statement, making these two inputs
4065equivalent:
4066
4067@example
4068if x then if y then win (); else lose;
4069
4070if x then do; if y then win (); else lose; end;
4071@end example
4072
4073But if the parser chose to reduce when possible rather than shift, the
4074result would be to attach the else-clause to the outermost if-statement,
4075making these two inputs equivalent:
4076
4077@example
4078if x then if y then win (); else lose;
4079
4080if x then do; if y then win (); end; else lose;
4081@end example
4082
4083The conflict exists because the grammar as written is ambiguous: either
4084parsing of the simple nested if-statement is legitimate. The established
4085convention is that these ambiguities are resolved by attaching the
4086else-clause to the innermost if-statement; this is what Bison accomplishes
4087by choosing to shift rather than reduce. (It would ideally be cleaner to
4088write an unambiguous grammar, but that is very hard to do in this case.)
4089This particular ambiguity was first encountered in the specifications of
4090Algol 60 and is called the ``dangling @code{else}'' ambiguity.
4091
4092To avoid warnings from Bison about predictable, legitimate shift/reduce
4093conflicts, use the @code{%expect @var{n}} declaration. There will be no
4094warning as long as the number of shift/reduce conflicts is exactly @var{n}.
4095@xref{Expect Decl, ,Suppressing Conflict Warnings}.
4096
4097The definition of @code{if_stmt} above is solely to blame for the
4098conflict, but the conflict does not actually appear without additional
4099rules. Here is a complete Bison input file that actually manifests the
4100conflict:
4101
4102@example
4103@group
4104%token IF THEN ELSE variable
4105%%
4106@end group
4107@group
4108stmt: expr
4109 | if_stmt
4110 ;
4111@end group
4112
4113@group
4114if_stmt:
4115 IF expr THEN stmt
4116 | IF expr THEN stmt ELSE stmt
4117 ;
4118@end group
4119
4120expr: variable
4121 ;
4122@end example
4123
342b8b6e 4124@node Precedence
bfa74976
RS
4125@section Operator Precedence
4126@cindex operator precedence
4127@cindex precedence of operators
4128
4129Another situation where shift/reduce conflicts appear is in arithmetic
4130expressions. Here shifting is not always the preferred resolution; the
4131Bison declarations for operator precedence allow you to specify when to
4132shift and when to reduce.
4133
4134@menu
4135* Why Precedence:: An example showing why precedence is needed.
4136* Using Precedence:: How to specify precedence in Bison grammars.
4137* Precedence Examples:: How these features are used in the previous example.
4138* How Precedence:: How they work.
4139@end menu
4140
342b8b6e 4141@node Why Precedence
bfa74976
RS
4142@subsection When Precedence is Needed
4143
4144Consider the following ambiguous grammar fragment (ambiguous because the
4145input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
4146
4147@example
4148@group
4149expr: expr '-' expr
4150 | expr '*' expr
4151 | expr '<' expr
4152 | '(' expr ')'
4153 @dots{}
4154 ;
4155@end group
4156@end example
4157
4158@noindent
4159Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
14ded682
AD
4160should it reduce them via the rule for the subtraction operator? It
4161depends on the next token. Of course, if the next token is @samp{)}, we
4162must reduce; shifting is invalid because no single rule can reduce the
4163token sequence @w{@samp{- 2 )}} or anything starting with that. But if
4164the next token is @samp{*} or @samp{<}, we have a choice: either
4165shifting or reduction would allow the parse to complete, but with
4166different results.
4167
4168To decide which one Bison should do, we must consider the results. If
4169the next operator token @var{op} is shifted, then it must be reduced
4170first in order to permit another opportunity to reduce the difference.
4171The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other
4172hand, if the subtraction is reduced before shifting @var{op}, the result
4173is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or
4174reduce should depend on the relative precedence of the operators
4175@samp{-} and @var{op}: @samp{*} should be shifted first, but not
4176@samp{<}.
bfa74976
RS
4177
4178@cindex associativity
4179What about input such as @w{@samp{1 - 2 - 5}}; should this be
14ded682
AD
4180@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most
4181operators we prefer the former, which is called @dfn{left association}.
4182The latter alternative, @dfn{right association}, is desirable for
4183assignment operators. The choice of left or right association is a
4184matter of whether the parser chooses to shift or reduce when the stack
4185contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting
4186makes right-associativity.
bfa74976 4187
342b8b6e 4188@node Using Precedence
bfa74976
RS
4189@subsection Specifying Operator Precedence
4190@findex %left
4191@findex %right
4192@findex %nonassoc
4193
4194Bison allows you to specify these choices with the operator precedence
4195declarations @code{%left} and @code{%right}. Each such declaration
4196contains a list of tokens, which are operators whose precedence and
4197associativity is being declared. The @code{%left} declaration makes all
4198those operators left-associative and the @code{%right} declaration makes
4199them right-associative. A third alternative is @code{%nonassoc}, which
4200declares that it is a syntax error to find the same operator twice ``in a
4201row''.
4202
4203The relative precedence of different operators is controlled by the
4204order in which they are declared. The first @code{%left} or
4205@code{%right} declaration in the file declares the operators whose
4206precedence is lowest, the next such declaration declares the operators
4207whose precedence is a little higher, and so on.
4208
342b8b6e 4209@node Precedence Examples
bfa74976
RS
4210@subsection Precedence Examples
4211
4212In our example, we would want the following declarations:
4213
4214@example
4215%left '<'
4216%left '-'
4217%left '*'
4218@end example
4219
4220In a more complete example, which supports other operators as well, we
4221would declare them in groups of equal precedence. For example, @code{'+'} is
4222declared with @code{'-'}:
4223
4224@example
4225%left '<' '>' '=' NE LE GE
4226%left '+' '-'
4227%left '*' '/'
4228@end example
4229
4230@noindent
4231(Here @code{NE} and so on stand for the operators for ``not equal''
4232and so on. We assume that these tokens are more than one character long
4233and therefore are represented by names, not character literals.)
4234
342b8b6e 4235@node How Precedence
bfa74976
RS
4236@subsection How Precedence Works
4237
4238The first effect of the precedence declarations is to assign precedence
4239levels to the terminal symbols declared. The second effect is to assign
4240precedence levels to certain rules: each rule gets its precedence from the
4241last terminal symbol mentioned in the components. (You can also specify
4242explicitly the precedence of a rule. @xref{Contextual Precedence, ,Context-Dependent Precedence}.)
4243
4244Finally, the resolution of conflicts works by comparing the
4245precedence of the rule being considered with that of the
4246look-ahead token. If the token's precedence is higher, the
4247choice is to shift. If the rule's precedence is higher, the
4248choice is to reduce. If they have equal precedence, the choice
4249is made based on the associativity of that precedence level. The
4250verbose output file made by @samp{-v} (@pxref{Invocation, ,Invoking Bison}) says
4251how each conflict was resolved.
4252
4253Not all rules and not all tokens have precedence. If either the rule or
4254the look-ahead token has no precedence, then the default is to shift.
4255
342b8b6e 4256@node Contextual Precedence
bfa74976
RS
4257@section Context-Dependent Precedence
4258@cindex context-dependent precedence
4259@cindex unary operator precedence
4260@cindex precedence, context-dependent
4261@cindex precedence, unary operator
4262@findex %prec
4263
4264Often the precedence of an operator depends on the context. This sounds
4265outlandish at first, but it is really very common. For example, a minus
4266sign typically has a very high precedence as a unary operator, and a
4267somewhat lower precedence (lower than multiplication) as a binary operator.
4268
4269The Bison precedence declarations, @code{%left}, @code{%right} and
4270@code{%nonassoc}, can only be used once for a given token; so a token has
4271only one precedence declared in this way. For context-dependent
4272precedence, you need to use an additional mechanism: the @code{%prec}
4273modifier for rules.@refill
4274
4275The @code{%prec} modifier declares the precedence of a particular rule by
4276specifying a terminal symbol whose precedence should be used for that rule.
4277It's not necessary for that symbol to appear otherwise in the rule. The
4278modifier's syntax is:
4279
4280@example
4281%prec @var{terminal-symbol}
4282@end example
4283
4284@noindent
4285and it is written after the components of the rule. Its effect is to
4286assign the rule the precedence of @var{terminal-symbol}, overriding
4287the precedence that would be deduced for it in the ordinary way. The
4288altered rule precedence then affects how conflicts involving that rule
4289are resolved (@pxref{Precedence, ,Operator Precedence}).
4290
4291Here is how @code{%prec} solves the problem of unary minus. First, declare
4292a precedence for a fictitious terminal symbol named @code{UMINUS}. There
4293are no tokens of this type, but the symbol serves to stand for its
4294precedence:
4295
4296@example
4297@dots{}
4298%left '+' '-'
4299%left '*'
4300%left UMINUS
4301@end example
4302
4303Now the precedence of @code{UMINUS} can be used in specific rules:
4304
4305@example
4306@group
4307exp: @dots{}
4308 | exp '-' exp
4309 @dots{}
4310 | '-' exp %prec UMINUS
4311@end group
4312@end example
4313
342b8b6e 4314@node Parser States
bfa74976
RS
4315@section Parser States
4316@cindex finite-state machine
4317@cindex parser state
4318@cindex state (of parser)
4319
4320The function @code{yyparse} is implemented using a finite-state machine.
4321The values pushed on the parser stack are not simply token type codes; they
4322represent the entire sequence of terminal and nonterminal symbols at or
4323near the top of the stack. The current state collects all the information
4324about previous input which is relevant to deciding what to do next.
4325
4326Each time a look-ahead token is read, the current parser state together
4327with the type of look-ahead token are looked up in a table. This table
4328entry can say, ``Shift the look-ahead token.'' In this case, it also
4329specifies the new parser state, which is pushed onto the top of the
4330parser stack. Or it can say, ``Reduce using rule number @var{n}.''
4331This means that a certain number of tokens or groupings are taken off
4332the top of the stack, and replaced by one grouping. In other words,
4333that number of states are popped from the stack, and one new state is
4334pushed.
4335
4336There is one other alternative: the table can say that the look-ahead token
4337is erroneous in the current state. This causes error processing to begin
4338(@pxref{Error Recovery}).
4339
342b8b6e 4340@node Reduce/Reduce
bfa74976
RS
4341@section Reduce/Reduce Conflicts
4342@cindex reduce/reduce conflict
4343@cindex conflicts, reduce/reduce
4344
4345A reduce/reduce conflict occurs if there are two or more rules that apply
4346to the same sequence of input. This usually indicates a serious error
4347in the grammar.
4348
4349For example, here is an erroneous attempt to define a sequence
4350of zero or more @code{word} groupings.
4351
4352@example
4353sequence: /* empty */
4354 @{ printf ("empty sequence\n"); @}
4355 | maybeword
4356 | sequence word
4357 @{ printf ("added word %s\n", $2); @}
4358 ;
4359
4360maybeword: /* empty */
4361 @{ printf ("empty maybeword\n"); @}
4362 | word
4363 @{ printf ("single word %s\n", $1); @}
4364 ;
4365@end example
4366
4367@noindent
4368The error is an ambiguity: there is more than one way to parse a single
4369@code{word} into a @code{sequence}. It could be reduced to a
4370@code{maybeword} and then into a @code{sequence} via the second rule.
4371Alternatively, nothing-at-all could be reduced into a @code{sequence}
4372via the first rule, and this could be combined with the @code{word}
4373using the third rule for @code{sequence}.
4374
4375There is also more than one way to reduce nothing-at-all into a
4376@code{sequence}. This can be done directly via the first rule,
4377or indirectly via @code{maybeword} and then the second rule.
4378
4379You might think that this is a distinction without a difference, because it
4380does not change whether any particular input is valid or not. But it does
4381affect which actions are run. One parsing order runs the second rule's
4382action; the other runs the first rule's action and the third rule's action.
4383In this example, the output of the program changes.
4384
4385Bison resolves a reduce/reduce conflict by choosing to use the rule that
4386appears first in the grammar, but it is very risky to rely on this. Every
4387reduce/reduce conflict must be studied and usually eliminated. Here is the
4388proper way to define @code{sequence}:
4389
4390@example
4391sequence: /* empty */
4392 @{ printf ("empty sequence\n"); @}
4393 | sequence word
4394 @{ printf ("added word %s\n", $2); @}
4395 ;
4396@end example
4397
4398Here is another common error that yields a reduce/reduce conflict:
4399
4400@example
4401sequence: /* empty */
4402 | sequence words
4403 | sequence redirects
4404 ;
4405
4406words: /* empty */
4407 | words word
4408 ;
4409
4410redirects:/* empty */
4411 | redirects redirect
4412 ;
4413@end example
4414
4415@noindent
4416The intention here is to define a sequence which can contain either
4417@code{word} or @code{redirect} groupings. The individual definitions of
4418@code{sequence}, @code{words} and @code{redirects} are error-free, but the
4419three together make a subtle ambiguity: even an empty input can be parsed
4420in infinitely many ways!
4421
4422Consider: nothing-at-all could be a @code{words}. Or it could be two
4423@code{words} in a row, or three, or any number. It could equally well be a
4424@code{redirects}, or two, or any number. Or it could be a @code{words}
4425followed by three @code{redirects} and another @code{words}. And so on.
4426
4427Here are two ways to correct these rules. First, to make it a single level
4428of sequence:
4429
4430@example
4431sequence: /* empty */
4432 | sequence word
4433 | sequence redirect
4434 ;
4435@end example
4436
4437Second, to prevent either a @code{words} or a @code{redirects}
4438from being empty:
4439
4440@example
4441sequence: /* empty */
4442 | sequence words
4443 | sequence redirects
4444 ;
4445
4446words: word
4447 | words word
4448 ;
4449
4450redirects:redirect
4451 | redirects redirect
4452 ;
4453@end example
4454
342b8b6e 4455@node Mystery Conflicts
bfa74976
RS
4456@section Mysterious Reduce/Reduce Conflicts
4457
4458Sometimes reduce/reduce conflicts can occur that don't look warranted.
4459Here is an example:
4460
4461@example
4462@group
4463%token ID
4464
4465%%
4466def: param_spec return_spec ','
4467 ;
4468param_spec:
4469 type
4470 | name_list ':' type
4471 ;
4472@end group
4473@group
4474return_spec:
4475 type
4476 | name ':' type
4477 ;
4478@end group
4479@group
4480type: ID
4481 ;
4482@end group
4483@group
4484name: ID
4485 ;
4486name_list:
4487 name
4488 | name ',' name_list
4489 ;
4490@end group
4491@end example
4492
4493It would seem that this grammar can be parsed with only a single token
13863333 4494of look-ahead: when a @code{param_spec} is being read, an @code{ID} is
bfa74976
RS
4495a @code{name} if a comma or colon follows, or a @code{type} if another
4496@code{ID} follows. In other words, this grammar is LR(1).
4497
4498@cindex LR(1)
4499@cindex LALR(1)
4500However, Bison, like most parser generators, cannot actually handle all
4501LR(1) grammars. In this grammar, two contexts, that after an @code{ID}
4502at the beginning of a @code{param_spec} and likewise at the beginning of
4503a @code{return_spec}, are similar enough that Bison assumes they are the
4504same. They appear similar because the same set of rules would be
4505active---the rule for reducing to a @code{name} and that for reducing to
4506a @code{type}. Bison is unable to determine at that stage of processing
4507that the rules would require different look-ahead tokens in the two
4508contexts, so it makes a single parser state for them both. Combining
4509the two contexts causes a conflict later. In parser terminology, this
4510occurrence means that the grammar is not LALR(1).
4511
4512In general, it is better to fix deficiencies than to document them. But
4513this particular deficiency is intrinsically hard to fix; parser
4514generators that can handle LR(1) grammars are hard to write and tend to
4515produce parsers that are very large. In practice, Bison is more useful
4516as it is now.
4517
4518When the problem arises, you can often fix it by identifying the two
a220f555
MA
4519parser states that are being confused, and adding something to make them
4520look distinct. In the above example, adding one rule to
bfa74976
RS
4521@code{return_spec} as follows makes the problem go away:
4522
4523@example
4524@group
4525%token BOGUS
4526@dots{}
4527%%
4528@dots{}
4529return_spec:
4530 type
4531 | name ':' type
4532 /* This rule is never used. */
4533 | ID BOGUS
4534 ;
4535@end group
4536@end example
4537
4538This corrects the problem because it introduces the possibility of an
4539additional active rule in the context after the @code{ID} at the beginning of
4540@code{return_spec}. This rule is not active in the corresponding context
4541in a @code{param_spec}, so the two contexts receive distinct parser states.
4542As long as the token @code{BOGUS} is never generated by @code{yylex},
4543the added rule cannot alter the way actual input is parsed.
4544
4545In this particular example, there is another way to solve the problem:
4546rewrite the rule for @code{return_spec} to use @code{ID} directly
4547instead of via @code{name}. This also causes the two confusing
4548contexts to have different sets of active rules, because the one for
4549@code{return_spec} activates the altered rule for @code{return_spec}
4550rather than the one for @code{name}.
4551
4552@example
4553param_spec:
4554 type
4555 | name_list ':' type
4556 ;
4557return_spec:
4558 type
4559 | ID ':' type
4560 ;
4561@end example
4562
342b8b6e 4563@node Stack Overflow
bfa74976
RS
4564@section Stack Overflow, and How to Avoid It
4565@cindex stack overflow
4566@cindex parser stack overflow
4567@cindex overflow of parser stack
4568
4569The Bison parser stack can overflow if too many tokens are shifted and
4570not reduced. When this happens, the parser function @code{yyparse}
4571returns a nonzero value, pausing only to call @code{yyerror} to report
4572the overflow.
4573
4574@vindex YYMAXDEPTH
4575By defining the macro @code{YYMAXDEPTH}, you can control how deep the
4576parser stack can become before a stack overflow occurs. Define the
4577macro with a value that is an integer. This value is the maximum number
4578of tokens that can be shifted (and not reduced) before overflow.
4579It must be a constant expression whose value is known at compile time.
4580
4581The stack space allowed is not necessarily allocated. If you specify a
4582large value for @code{YYMAXDEPTH}, the parser actually allocates a small
4583stack at first, and then makes it bigger by stages as needed. This
4584increasing allocation happens automatically and silently. Therefore,
4585you do not need to make @code{YYMAXDEPTH} painfully small merely to save
4586space for ordinary inputs that do not need much stack.
4587
4588@cindex default stack limit
4589The default value of @code{YYMAXDEPTH}, if you do not define it, is
459010000.
4591
4592@vindex YYINITDEPTH
4593You can control how much stack is allocated initially by defining the
4594macro @code{YYINITDEPTH}. This value too must be a compile-time
4595constant integer. The default is 200.
4596
342b8b6e 4597@node Error Recovery
bfa74976
RS
4598@chapter Error Recovery
4599@cindex error recovery
4600@cindex recovery from errors
4601
4602It is not usually acceptable to have a program terminate on a parse
4603error. For example, a compiler should recover sufficiently to parse the
4604rest of the input file and check it for errors; a calculator should accept
4605another expression.
4606
4607In a simple interactive command parser where each input is one line, it may
4608be sufficient to allow @code{yyparse} to return 1 on error and have the
4609caller ignore the rest of the input line when that happens (and then call
4610@code{yyparse} again). But this is inadequate for a compiler, because it
4611forgets all the syntactic context leading up to the error. A syntax error
4612deep within a function in the compiler input should not cause the compiler
4613to treat the following line like the beginning of a source file.
4614
4615@findex error
4616You can define how to recover from a syntax error by writing rules to
4617recognize the special token @code{error}. This is a terminal symbol that
4618is always defined (you need not declare it) and reserved for error
4619handling. The Bison parser generates an @code{error} token whenever a
4620syntax error happens; if you have provided a rule to recognize this token
13863333 4621in the current context, the parse can continue.
bfa74976
RS
4622
4623For example:
4624
4625@example
4626stmnts: /* empty string */
4627 | stmnts '\n'
4628 | stmnts exp '\n'
4629 | stmnts error '\n'
4630@end example
4631
4632The fourth rule in this example says that an error followed by a newline
4633makes a valid addition to any @code{stmnts}.
4634
4635What happens if a syntax error occurs in the middle of an @code{exp}? The
4636error recovery rule, interpreted strictly, applies to the precise sequence
4637of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
4638the middle of an @code{exp}, there will probably be some additional tokens
4639and subexpressions on the stack after the last @code{stmnts}, and there
4640will be tokens to read before the next newline. So the rule is not
4641applicable in the ordinary way.
4642
4643But Bison can force the situation to fit the rule, by discarding part of
4644the semantic context and part of the input. First it discards states and
4645objects from the stack until it gets back to a state in which the
4646@code{error} token is acceptable. (This means that the subexpressions
4647already parsed are discarded, back to the last complete @code{stmnts}.) At
4648this point the @code{error} token can be shifted. Then, if the old
4649look-ahead token is not acceptable to be shifted next, the parser reads
4650tokens and discards them until it finds a token which is acceptable. In
4651this example, Bison reads and discards input until the next newline
4652so that the fourth rule can apply.
4653
4654The choice of error rules in the grammar is a choice of strategies for
4655error recovery. A simple and useful strategy is simply to skip the rest of
4656the current input line or current statement if an error is detected:
4657
4658@example
4659stmnt: error ';' /* on error, skip until ';' is read */
4660@end example
4661
4662It is also useful to recover to the matching close-delimiter of an
4663opening-delimiter that has already been parsed. Otherwise the
4664close-delimiter will probably appear to be unmatched, and generate another,
4665spurious error message:
4666
4667@example
4668primary: '(' expr ')'
4669 | '(' error ')'
4670 @dots{}
4671 ;
4672@end example
4673
4674Error recovery strategies are necessarily guesses. When they guess wrong,
4675one syntax error often leads to another. In the above example, the error
4676recovery rule guesses that an error is due to bad input within one
4677@code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
4678middle of a valid @code{stmnt}. After the error recovery rule recovers
4679from the first error, another syntax error will be found straightaway,
4680since the text following the spurious semicolon is also an invalid
4681@code{stmnt}.
4682
4683To prevent an outpouring of error messages, the parser will output no error
4684message for another syntax error that happens shortly after the first; only
4685after three consecutive input tokens have been successfully shifted will
4686error messages resume.
4687
4688Note that rules which accept the @code{error} token may have actions, just
4689as any other rules can.
4690
4691@findex yyerrok
4692You can make error messages resume immediately by using the macro
4693@code{yyerrok} in an action. If you do this in the error rule's action, no
4694error messages will be suppressed. This macro requires no arguments;
4695@samp{yyerrok;} is a valid C statement.
4696
4697@findex yyclearin
4698The previous look-ahead token is reanalyzed immediately after an error. If
4699this is unacceptable, then the macro @code{yyclearin} may be used to clear
4700this token. Write the statement @samp{yyclearin;} in the error rule's
4701action.
4702
4703For example, suppose that on a parse error, an error handling routine is
4704called that advances the input stream to some point where parsing should
4705once again commence. The next symbol returned by the lexical scanner is
4706probably correct. The previous look-ahead token ought to be discarded
4707with @samp{yyclearin;}.
4708
4709@vindex YYRECOVERING
4710The macro @code{YYRECOVERING} stands for an expression that has the
4711value 1 when the parser is recovering from a syntax error, and 0 the
4712rest of the time. A value of 1 indicates that error messages are
4713currently suppressed for new syntax errors.
4714
342b8b6e 4715@node Context Dependency
bfa74976
RS
4716@chapter Handling Context Dependencies
4717
4718The Bison paradigm is to parse tokens first, then group them into larger
4719syntactic units. In many languages, the meaning of a token is affected by
4720its context. Although this violates the Bison paradigm, certain techniques
4721(known as @dfn{kludges}) may enable you to write Bison parsers for such
4722languages.
4723
4724@menu
4725* Semantic Tokens:: Token parsing can depend on the semantic context.
4726* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
4727* Tie-in Recovery:: Lexical tie-ins have implications for how
4728 error recovery rules must be written.
4729@end menu
4730
4731(Actually, ``kludge'' means any technique that gets its job done but is
4732neither clean nor robust.)
4733
342b8b6e 4734@node Semantic Tokens
bfa74976
RS
4735@section Semantic Info in Token Types
4736
4737The C language has a context dependency: the way an identifier is used
4738depends on what its current meaning is. For example, consider this:
4739
4740@example
4741foo (x);
4742@end example
4743
4744This looks like a function call statement, but if @code{foo} is a typedef
4745name, then this is actually a declaration of @code{x}. How can a Bison
4746parser for C decide how to parse this input?
4747
4748The method used in GNU C is to have two different token types,
4749@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
4750identifier, it looks up the current declaration of the identifier in order
4751to decide which token type to return: @code{TYPENAME} if the identifier is
4752declared as a typedef, @code{IDENTIFIER} otherwise.
4753
4754The grammar rules can then express the context dependency by the choice of
4755token type to recognize. @code{IDENTIFIER} is accepted as an expression,
4756but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
4757@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
4758is @emph{not} significant, such as in declarations that can shadow a
4759typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
4760accepted---there is one rule for each of the two token types.
4761
4762This technique is simple to use if the decision of which kinds of
4763identifiers to allow is made at a place close to where the identifier is
4764parsed. But in C this is not always so: C allows a declaration to
4765redeclare a typedef name provided an explicit type has been specified
4766earlier:
4767
4768@example
4769typedef int foo, bar, lose;
4770static foo (bar); /* @r{redeclare @code{bar} as static variable} */
4771static int foo (lose); /* @r{redeclare @code{foo} as function} */
4772@end example
4773
4774Unfortunately, the name being declared is separated from the declaration
4775construct itself by a complicated syntactic structure---the ``declarator''.
4776
9ecbd125 4777As a result, part of the Bison parser for C needs to be duplicated, with
14ded682
AD
4778all the nonterminal names changed: once for parsing a declaration in
4779which a typedef name can be redefined, and once for parsing a
4780declaration in which that can't be done. Here is a part of the
4781duplication, with actions omitted for brevity:
bfa74976
RS
4782
4783@example
4784initdcl:
4785 declarator maybeasm '='
4786 init
4787 | declarator maybeasm
4788 ;
4789
4790notype_initdcl:
4791 notype_declarator maybeasm '='
4792 init
4793 | notype_declarator maybeasm
4794 ;
4795@end example
4796
4797@noindent
4798Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
4799cannot. The distinction between @code{declarator} and
4800@code{notype_declarator} is the same sort of thing.
4801
4802There is some similarity between this technique and a lexical tie-in
4803(described next), in that information which alters the lexical analysis is
4804changed during parsing by other parts of the program. The difference is
4805here the information is global, and is used for other purposes in the
4806program. A true lexical tie-in has a special-purpose flag controlled by
4807the syntactic context.
4808
342b8b6e 4809@node Lexical Tie-ins
bfa74976
RS
4810@section Lexical Tie-ins
4811@cindex lexical tie-in
4812
4813One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
4814which is set by Bison actions, whose purpose is to alter the way tokens are
4815parsed.
4816
4817For example, suppose we have a language vaguely like C, but with a special
4818construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
4819an expression in parentheses in which all integers are hexadecimal. In
4820particular, the token @samp{a1b} must be treated as an integer rather than
4821as an identifier if it appears in that context. Here is how you can do it:
4822
4823@example
4824@group
4825%@{
4826int hexflag;
4827%@}
4828%%
4829@dots{}
4830@end group
4831@group
4832expr: IDENTIFIER
4833 | constant
4834 | HEX '('
4835 @{ hexflag = 1; @}
4836 expr ')'
4837 @{ hexflag = 0;
4838 $$ = $4; @}
4839 | expr '+' expr
4840 @{ $$ = make_sum ($1, $3); @}
4841 @dots{}
4842 ;
4843@end group
4844
4845@group
4846constant:
4847 INTEGER
4848 | STRING
4849 ;
4850@end group
4851@end example
4852
4853@noindent
4854Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
4855it is nonzero, all integers are parsed in hexadecimal, and tokens starting
4856with letters are parsed as integers if possible.
4857
342b8b6e
AD
4858The declaration of @code{hexflag} shown in the prologue of the parser file
4859is needed to make it accessible to the actions (@pxref{Prologue, ,The Prologue}).
75f5aaea 4860You must also write the code in @code{yylex} to obey the flag.
bfa74976 4861
342b8b6e 4862@node Tie-in Recovery
bfa74976
RS
4863@section Lexical Tie-ins and Error Recovery
4864
4865Lexical tie-ins make strict demands on any error recovery rules you have.
4866@xref{Error Recovery}.
4867
4868The reason for this is that the purpose of an error recovery rule is to
4869abort the parsing of one construct and resume in some larger construct.
4870For example, in C-like languages, a typical error recovery rule is to skip
4871tokens until the next semicolon, and then start a new statement, like this:
4872
4873@example
4874stmt: expr ';'
4875 | IF '(' expr ')' stmt @{ @dots{} @}
4876 @dots{}
4877 error ';'
4878 @{ hexflag = 0; @}
4879 ;
4880@end example
4881
4882If there is a syntax error in the middle of a @samp{hex (@var{expr})}
4883construct, this error rule will apply, and then the action for the
4884completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
4885remain set for the entire rest of the input, or until the next @code{hex}
4886keyword, causing identifiers to be misinterpreted as integers.
4887
4888To avoid this problem the error recovery rule itself clears @code{hexflag}.
4889
4890There may also be an error recovery rule that works within expressions.
4891For example, there could be a rule which applies within parentheses
4892and skips to the close-parenthesis:
4893
4894@example
4895@group
4896expr: @dots{}
4897 | '(' expr ')'
4898 @{ $$ = $2; @}
4899 | '(' error ')'
4900 @dots{}
4901@end group
4902@end example
4903
4904If this rule acts within the @code{hex} construct, it is not going to abort
4905that construct (since it applies to an inner level of parentheses within
4906the construct). Therefore, it should not clear the flag: the rest of
4907the @code{hex} construct should be parsed with the flag still in effect.
4908
4909What if there is an error recovery rule which might abort out of the
4910@code{hex} construct or might not, depending on circumstances? There is no
4911way you can write the action to determine whether a @code{hex} construct is
4912being aborted or not. So if you are using a lexical tie-in, you had better
4913make sure your error recovery rules are not of this kind. Each rule must
4914be such that you can be sure that it always will, or always won't, have to
4915clear the flag.
4916
342b8b6e 4917@node Debugging
bfa74976
RS
4918@chapter Debugging Your Parser
4919@findex YYDEBUG
4920@findex yydebug
4921@cindex debugging
4922@cindex tracing the parser
4923
4924If a Bison grammar compiles properly but doesn't do what you want when it
4925runs, the @code{yydebug} parser-trace feature can help you figure out why.
4926
4927To enable compilation of trace facilities, you must define the macro
4947ebdb
PE
4928@code{YYDEBUG} to a nonzero value when you compile the parser. You
4929could use @samp{-DYYDEBUG=1} as a compiler option or you could put
4930@samp{#define YYDEBUG 1} in the prologue of the grammar file
4931(@pxref{Prologue, , The Prologue}). Alternatively, use the @samp{-t}
4932option when you run Bison (@pxref{Invocation, ,Invoking Bison}) or the
4933@code{%debug} declaration (@pxref{Decl Summary, ,Bison Declaration
4934Summary}). We suggest that you always define @code{YYDEBUG} so that
4935debugging is always possible.
bfa74976 4936
02a81e05
PE
4937The trace facility outputs messages with macro calls of the form
4938@code{YYFPRINTF (YYSTDERR, @var{format}, @var{args})} where
4939@var{format} and @var{args} are the usual @code{printf} format and
4947ebdb
PE
4940arguments. If you define @code{YYDEBUG} to a nonzero value but do not
4941define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included
4942and the macros are defined to @code{fprintf} and @code{stderr}. In
4943the same situation, C++ parsers include @code{<cstdio.h>} instead, and
4944use @code{std::fprintf} and @code{std::stderr}.
bfa74976
RS
4945
4946Once you have compiled the program with trace facilities, the way to
4947request a trace is to store a nonzero value in the variable @code{yydebug}.
4948You can do this by making the C code do it (in @code{main}, perhaps), or
4949you can alter the value with a C debugger.
4950
4951Each step taken by the parser when @code{yydebug} is nonzero produces a
4952line or two of trace information, written on @code{stderr}. The trace
4953messages tell you these things:
4954
4955@itemize @bullet
4956@item
4957Each time the parser calls @code{yylex}, what kind of token was read.
4958
4959@item
4960Each time a token is shifted, the depth and complete contents of the
4961state stack (@pxref{Parser States}).
4962
4963@item
4964Each time a rule is reduced, which rule it is, and the complete contents
4965of the state stack afterward.
4966@end itemize
4967
4968To make sense of this information, it helps to refer to the listing file
4969produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking Bison}). This file
4970shows the meaning of each state in terms of positions in various rules, and
4971also what each state will do with each possible input token. As you read
4972the successive trace messages, you can see that the parser is functioning
4973according to its specification in the listing file. Eventually you will
4974arrive at the place where something undesirable happens, and you will see
4975which parts of the grammar are to blame.
4976
4977The parser file is a C program and you can use C debuggers on it, but it's
4978not easy to interpret what it is doing. The parser function is a
4979finite-state machine interpreter, and aside from the actions it executes
4980the same code over and over. Only the values of variables show where in
4981the grammar it is working.
4982
4983@findex YYPRINT
4984The debugging information normally gives the token type of each token
4985read, but not its semantic value. You can optionally define a macro
4986named @code{YYPRINT} to provide a way to print the value. If you define
4987@code{YYPRINT}, it should take three arguments. The parser will pass a
4988standard I/O stream, the numeric code for the token type, and the token
4989value (from @code{yylval}).
4990
4991Here is an example of @code{YYPRINT} suitable for the multi-function
4992calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}):
4993
4994@smallexample
4995#define YYPRINT(file, type, value) yyprint (file, type, value)
4996
4997static void
13863333 4998yyprint (FILE *file, int type, YYSTYPE value)
bfa74976
RS
4999@{
5000 if (type == VAR)
5001 fprintf (file, " %s", value.tptr->name);
5002 else if (type == NUM)
5003 fprintf (file, " %d", value.val);
5004@}
5005@end smallexample
5006
342b8b6e 5007@node Invocation
bfa74976
RS
5008@chapter Invoking Bison
5009@cindex invoking Bison
5010@cindex Bison invocation
5011@cindex options for invoking Bison
5012
5013The usual way to invoke Bison is as follows:
5014
5015@example
5016bison @var{infile}
5017@end example
5018
5019Here @var{infile} is the grammar file name, which usually ends in
5020@samp{.y}. The parser file's name is made by replacing the @samp{.y}
5021with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields
5022@file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields
02a81e05 5023@file{hack/foo.tab.c}. It's is also possible, in case you are writing
79282c6c 5024C++ code instead of C in your grammar file, to name it @file{foo.ypp}
234a3be3
AD
5025or @file{foo.y++}. Then, the output files will take an extention like
5026the given one as input (repectively @file{foo.tab.cpp} and @file{foo.tab.c++}).
5027This feature takes effect with all options that manipulate filenames like
5028@samp{-o} or @samp{-d}.
5029
5030For example :
5031
5032@example
5033bison -d @var{infile.yxx}
5034@end example
84163231 5035@noindent
234a3be3
AD
5036will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}. and
5037
5038@example
5039bison -d @var{infile.y} -o @var{output.c++}
5040@end example
84163231 5041@noindent
234a3be3
AD
5042will produce @file{output.c++} and @file{outfile.h++}.
5043
bfa74976
RS
5044
5045@menu
13863333 5046* Bison Options:: All the options described in detail,
bfa74976 5047 in alphabetical order by short options.
9ecbd125 5048* Environment Variables:: Variables which affect Bison execution.
bfa74976
RS
5049* Option Cross Key:: Alphabetical list of long options.
5050* VMS Invocation:: Bison command syntax on VMS.
5051@end menu
5052
342b8b6e 5053@node Bison Options
bfa74976
RS
5054@section Bison Options
5055
5056Bison supports both traditional single-letter options and mnemonic long
5057option names. Long option names are indicated with @samp{--} instead of
5058@samp{-}. Abbreviations for option names are allowed as long as they
5059are unique. When a long option takes an argument, like
5060@samp{--file-prefix}, connect the option name and the argument with
5061@samp{=}.
5062
5063Here is a list of options that can be used with Bison, alphabetized by
5064short option. It is followed by a cross key alphabetized by long
5065option.
5066
89cab50d
AD
5067@c Please, keep this ordered as in `bison --help'.
5068@noindent
5069Operations modes:
5070@table @option
5071@item -h
5072@itemx --help
5073Print a summary of the command-line options to Bison and exit.
bfa74976 5074
89cab50d
AD
5075@item -V
5076@itemx --version
5077Print the version number of Bison and exit.
bfa74976 5078
89cab50d
AD
5079@need 1750
5080@item -y
5081@itemx --yacc
5082@itemx --fixed-output-files
5083Equivalent to @samp{-o y.tab.c}; the parser output file is called
5084@file{y.tab.c}, and the other outputs are called @file{y.output} and
5085@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
5086file name conventions. Thus, the following shell script can substitute
5087for Yacc:@refill
bfa74976 5088
89cab50d
AD
5089@example
5090bison -y $*
5091@end example
5092@end table
5093
5094@noindent
5095Tuning the parser:
5096
5097@table @option
cd5bd6ac
AD
5098@item -S @var{file}
5099@itemx --skeleton=@var{file}
5100Specify the skeleton to use. You probably don't need this option unless
5101you are developing Bison.
5102
89cab50d
AD
5103@item -t
5104@itemx --debug
4947ebdb
PE
5105In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
5106already defined, so that the debugging facilities are compiled.
5107@xref{Debugging, ,Debugging Your Parser}.
89cab50d
AD
5108
5109@item --locations
d8988b2f 5110Pretend that @code{%locations} was specified. @xref{Decl Summary}.
89cab50d
AD
5111
5112@item -p @var{prefix}
5113@itemx --name-prefix=@var{prefix}
d8988b2f
AD
5114Pretend that @code{%name-prefix="@var{prefix}"} was specified.
5115@xref{Decl Summary}.
bfa74976
RS
5116
5117@item -l
5118@itemx --no-lines
5119Don't put any @code{#line} preprocessor commands in the parser file.
5120Ordinarily Bison puts them in the parser file so that the C compiler
5121and debuggers will associate errors with your source file, the
5122grammar file. This option causes them to associate errors with the
95e742f7 5123parser file, treating it as an independent source file in its own right.
bfa74976 5124
931c7513
RS
5125@item -n
5126@itemx --no-parser
d8988b2f 5127Pretend that @code{%no-parser} was specified. @xref{Decl Summary}.
931c7513 5128
89cab50d
AD
5129@item -k
5130@itemx --token-table
d8988b2f 5131Pretend that @code{%token-table} was specified. @xref{Decl Summary}.
89cab50d 5132@end table
bfa74976 5133
89cab50d
AD
5134@noindent
5135Adjust the output:
bfa74976 5136
89cab50d
AD
5137@table @option
5138@item -d
d8988b2f
AD
5139@itemx --defines
5140Pretend that @code{%defines} was specified, i.e., write an extra output
6deb4447
AD
5141file containing macro definitions for the token type names defined in
5142the grammar and the semantic value type @code{YYSTYPE}, as well as a few
5143@code{extern} variable declarations. @xref{Decl Summary}.
931c7513 5144
342b8b6e 5145@item --defines=@var{defines-file}
d8988b2f 5146Same as above, but save in the file @var{defines-file}.
342b8b6e 5147
89cab50d
AD
5148@item -b @var{file-prefix}
5149@itemx --file-prefix=@var{prefix}
d8988b2f
AD
5150Pretend that @code{%verbose} was specified, i.e, specify prefix to use
5151for all Bison output file names. @xref{Decl Summary}.
bfa74976
RS
5152
5153@item -v
5154@itemx --verbose
6deb4447
AD
5155Pretend that @code{%verbose} was specified, i.e, write an extra output
5156file containing verbose descriptions of the grammar and
d8988b2f 5157parser. @xref{Decl Summary}.
bfa74976 5158
d8988b2f
AD
5159@item -o @var{filename}
5160@itemx --output=@var{filename}
5161Specify the @var{filename} for the parser file.
bfa74976 5162
d8988b2f
AD
5163The other output files' names are constructed from @var{filename} as
5164described under the @samp{-v} and @samp{-d} options.
342b8b6e
AD
5165
5166@item -g
5167Output a VCG definition of the LALR(1) grammar automaton computed by
5168Bison. If the grammar file is @file{foo.y}, the VCG output file will
5169be @file{foo.vcg}.
5170
5171@item --graph=@var{graph-file}
5172The behaviour of @var{--graph} is the same than @samp{-g}. The only
5173difference is that it has an optionnal argument which is the name of
5174the output graph filename.
bfa74976
RS
5175@end table
5176
342b8b6e 5177@node Environment Variables
9ecbd125
JT
5178@section Environment Variables
5179@cindex environment variables
5180@cindex BISON_HAIRY
5181@cindex BISON_SIMPLE
5182
5183Here is a list of environment variables which affect the way Bison
5184runs.
5185
5186@table @samp
5187@item BISON_SIMPLE
5188@itemx BISON_HAIRY
5189Much of the parser generated by Bison is copied verbatim from a file
5190called @file{bison.simple}. If Bison cannot find that file, or if you
5191would like to direct Bison to use a different copy, setting the
5192environment variable @code{BISON_SIMPLE} to the path of the file will
5193cause Bison to use that copy instead.
5194
13863333 5195When the @samp{%semantic_parser} declaration is used, Bison copies from
9ecbd125
JT
5196a file called @file{bison.hairy} instead. The location of this file can
5197also be specified or overridden in a similar fashion, with the
5198@code{BISON_HAIRY} environment variable.
5199
5200@end table
5201
342b8b6e 5202@node Option Cross Key
bfa74976
RS
5203@section Option Cross Key
5204
5205Here is a list of options, alphabetized by long option, to help you find
5206the corresponding short option.
5207
5208@tex
5209\def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill}
5210
5211{\tt
5212\line{ --debug \leaderfill -t}
5213\line{ --defines \leaderfill -d}
5214\line{ --file-prefix \leaderfill -b}
5215\line{ --fixed-output-files \leaderfill -y}
342b8b6e 5216\line{ --graph \leaderfill -g}
ff51d159 5217\line{ --help \leaderfill -h}
bfa74976
RS
5218\line{ --name-prefix \leaderfill -p}
5219\line{ --no-lines \leaderfill -l}
931c7513 5220\line{ --no-parser \leaderfill -n}
d8988b2f 5221\line{ --output \leaderfill -o}
931c7513 5222\line{ --token-table \leaderfill -k}
bfa74976
RS
5223\line{ --verbose \leaderfill -v}
5224\line{ --version \leaderfill -V}
5225\line{ --yacc \leaderfill -y}
5226}
5227@end tex
5228
5229@ifinfo
5230@example
5231--debug -t
342b8b6e 5232--defines=@var{defines-file} -d
bfa74976
RS
5233--file-prefix=@var{prefix} -b @var{file-prefix}
5234--fixed-output-files --yacc -y
342b8b6e 5235--graph=@var{graph-file} -d
ff51d159 5236--help -h
931c7513 5237--name-prefix=@var{prefix} -p @var{name-prefix}
bfa74976 5238--no-lines -l
931c7513 5239--no-parser -n
d8988b2f 5240--output=@var{outfile} -o @var{outfile}
931c7513 5241--token-table -k
bfa74976
RS
5242--verbose -v
5243--version -V
5244@end example
5245@end ifinfo
5246
342b8b6e 5247@node VMS Invocation
bfa74976
RS
5248@section Invoking Bison under VMS
5249@cindex invoking Bison under VMS
5250@cindex VMS
5251
5252The command line syntax for Bison on VMS is a variant of the usual
5253Bison command syntax---adapted to fit VMS conventions.
5254
5255To find the VMS equivalent for any Bison option, start with the long
5256option, and substitute a @samp{/} for the leading @samp{--}, and
5257substitute a @samp{_} for each @samp{-} in the name of the long option.
5258For example, the following invocation under VMS:
5259
5260@example
5261bison /debug/name_prefix=bar foo.y
5262@end example
5263
5264@noindent
5265is equivalent to the following command under POSIX.
5266
5267@example
5268bison --debug --name-prefix=bar foo.y
5269@end example
5270
5271The VMS file system does not permit filenames such as
5272@file{foo.tab.c}. In the above example, the output file
5273would instead be named @file{foo_tab.c}.
5274
342b8b6e 5275@node Table of Symbols
bfa74976
RS
5276@appendix Bison Symbols
5277@cindex Bison symbols, table of
5278@cindex symbols in Bison, table of
5279
5280@table @code
5281@item error
5282A token name reserved for error recovery. This token may be used in
5283grammar rules so as to allow the Bison parser to recognize an error in
5284the grammar without halting the process. In effect, a sentence
5285containing an error may be recognized as valid. On a parse error, the
5286token @code{error} becomes the current look-ahead token. Actions
5287corresponding to @code{error} are then executed, and the look-ahead
5288token is reset to the token that originally caused the violation.
5289@xref{Error Recovery}.
5290
5291@item YYABORT
5292Macro to pretend that an unrecoverable syntax error has occurred, by
5293making @code{yyparse} return 1 immediately. The error reporting
ceed8467
AD
5294function @code{yyerror} is not called. @xref{Parser Function, ,The
5295Parser Function @code{yyparse}}.
bfa74976
RS
5296
5297@item YYACCEPT
5298Macro to pretend that a complete utterance of the language has been
13863333 5299read, by making @code{yyparse} return 0 immediately.
bfa74976
RS
5300@xref{Parser Function, ,The Parser Function @code{yyparse}}.
5301
5302@item YYBACKUP
5303Macro to discard a value from the parser stack and fake a look-ahead
5304token. @xref{Action Features, ,Special Features for Use in Actions}.
5305
5306@item YYERROR
5307Macro to pretend that a syntax error has just been detected: call
5308@code{yyerror} and then perform normal error recovery if possible
5309(@pxref{Error Recovery}), or (if recovery is impossible) make
5310@code{yyparse} return 1. @xref{Error Recovery}.
5311
5312@item YYERROR_VERBOSE
5313Macro that you define with @code{#define} in the Bison declarations
5314section to request verbose, specific error message strings when
5315@code{yyerror} is called.
5316
5317@item YYINITDEPTH
5318Macro for specifying the initial size of the parser stack.
5319@xref{Stack Overflow}.
5320
c656404a
RS
5321@item YYLEX_PARAM
5322Macro for specifying an extra argument (or list of extra arguments) for
5323@code{yyparse} to pass to @code{yylex}. @xref{Pure Calling,, Calling
5324Conventions for Pure Parsers}.
5325
bfa74976
RS
5326@item YYLTYPE
5327Macro for the data type of @code{yylloc}; a structure with four
847bf1f5 5328members. @xref{Location Type, , Data Types of Locations}.
bfa74976 5329
931c7513
RS
5330@item yyltype
5331Default value for YYLTYPE.
5332
bfa74976
RS
5333@item YYMAXDEPTH
5334Macro for specifying the maximum size of the parser stack.
5335@xref{Stack Overflow}.
5336
c656404a
RS
5337@item YYPARSE_PARAM
5338Macro for specifying the name of a parameter that @code{yyparse} should
5339accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}.
5340
bfa74976
RS
5341@item YYRECOVERING
5342Macro whose value indicates whether the parser is recovering from a
5343syntax error. @xref{Action Features, ,Special Features for Use in Actions}.
5344
f9a8293a
AD
5345@item YYSTACK_USE_ALLOCA
5346Macro used to control the use of @code{alloca}. If defined to @samp{0},
5347the parser will not use @code{alloca} but @code{malloc} when trying to
5348grow its internal stacks. Do @emph{not} define @code{YYSTACK_USE_ALLOCA}
5349to anything else.
5350
bfa74976
RS
5351@item YYSTYPE
5352Macro for the data type of semantic values; @code{int} by default.
5353@xref{Value Type, ,Data Types of Semantic Values}.
5354
5355@item yychar
13863333
AD
5356External integer variable that contains the integer value of the current
5357look-ahead token. (In a pure parser, it is a local variable within
5358@code{yyparse}.) Error-recovery rule actions may examine this variable.
5359@xref{Action Features, ,Special Features for Use in Actions}.
bfa74976
RS
5360
5361@item yyclearin
5362Macro used in error-recovery rule actions. It clears the previous
5363look-ahead token. @xref{Error Recovery}.
5364
5365@item yydebug
5366External integer variable set to zero by default. If @code{yydebug}
5367is given a nonzero value, the parser will output information on input
5368symbols and parser action. @xref{Debugging, ,Debugging Your Parser}.
5369
5370@item yyerrok
5371Macro to cause parser to recover immediately to its normal mode
5372after a parse error. @xref{Error Recovery}.
5373
5374@item yyerror
5375User-supplied function to be called by @code{yyparse} on error. The
5376function receives one argument, a pointer to a character string
13863333
AD
5377containing an error message. @xref{Error Reporting, ,The Error
5378Reporting Function @code{yyerror}}.
bfa74976
RS
5379
5380@item yylex
5381User-supplied lexical analyzer function, called with no arguments
5382to get the next token. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
5383
5384@item yylval
5385External variable in which @code{yylex} should place the semantic
5386value associated with a token. (In a pure parser, it is a local
5387variable within @code{yyparse}, and its address is passed to
5388@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
5389
5390@item yylloc
13863333
AD
5391External variable in which @code{yylex} should place the line and column
5392numbers associated with a token. (In a pure parser, it is a local
5393variable within @code{yyparse}, and its address is passed to
bfa74976 5394@code{yylex}.) You can ignore this variable if you don't use the
13863333
AD
5395@samp{@@} feature in the grammar actions. @xref{Token Positions,
5396,Textual Positions of Tokens}.
bfa74976
RS
5397
5398@item yynerrs
13863333
AD
5399Global variable which Bison increments each time there is a parse error.
5400(In a pure parser, it is a local variable within @code{yyparse}.)
5401@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
bfa74976
RS
5402
5403@item yyparse
5404The parser function produced by Bison; call this function to start
5405parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
5406
6deb4447
AD
5407@item %debug
5408Equip the parser for debugging. @xref{Decl Summary}.
5409
5410@item %defines
5411Bison declaration to create a header file meant for the scanner.
5412@xref{Decl Summary}.
5413
d8988b2f
AD
5414@item %file-prefix="@var{prefix}"
5415Bison declaration to set tge prefix of the output files. @xref{Decl
5416Summary}.
5417
f9a8293a
AD
5418@c @item %source_extension
5419@c Bison declaration to specify the generated parser output file extension.
5420@c @xref{Decl Summary}.
5421@c
5422@c @item %header_extension
5423@c Bison declaration to specify the generated parser header file extension
5424@c if required. @xref{Decl Summary}.
5425
bfa74976
RS
5426@item %left
5427Bison declaration to assign left associativity to token(s).
5428@xref{Precedence Decl, ,Operator Precedence}.
5429
d8988b2f
AD
5430@item %name-prefix="@var{prefix}"
5431Bison declaration to rename the external symbols. @xref{Decl Summary}.
5432
5433@item %no-lines
931c7513
RS
5434Bison declaration to avoid generating @code{#line} directives in the
5435parser file. @xref{Decl Summary}.
5436
bfa74976 5437@item %nonassoc
14ded682 5438Bison declaration to assign non-associativity to token(s).
bfa74976
RS
5439@xref{Precedence Decl, ,Operator Precedence}.
5440
d8988b2f
AD
5441@item %output="@var{filename}"
5442Bison declaration to set the name of the parser file. @xref{Decl
5443Summary}.
5444
bfa74976
RS
5445@item %prec
5446Bison declaration to assign a precedence to a specific rule.
5447@xref{Contextual Precedence, ,Context-Dependent Precedence}.
5448
d8988b2f 5449@item %pure-parser
bfa74976
RS
5450Bison declaration to request a pure (reentrant) parser.
5451@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
5452
5453@item %right
5454Bison declaration to assign right associativity to token(s).
5455@xref{Precedence Decl, ,Operator Precedence}.
5456
5457@item %start
5458Bison declaration to specify the start symbol. @xref{Start Decl, ,The Start-Symbol}.
5459
5460@item %token
5461Bison declaration to declare token(s) without specifying precedence.
5462@xref{Token Decl, ,Token Type Names}.
5463
d8988b2f 5464@item %token-table
931c7513
RS
5465Bison declaration to include a token name table in the parser file.
5466@xref{Decl Summary}.
5467
bfa74976
RS
5468@item %type
5469Bison declaration to declare nonterminals. @xref{Type Decl, ,Nonterminal Symbols}.
5470
5471@item %union
5472Bison declaration to specify several possible data types for semantic
5473values. @xref{Union Decl, ,The Collection of Value Types}.
5474@end table
5475
5476These are the punctuation and delimiters used in Bison input:
5477
5478@table @samp
5479@item %%
5480Delimiter used to separate the grammar rule section from the
75f5aaea 5481Bison declarations section or the epilogue.
bfa74976
RS
5482@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
5483
5484@item %@{ %@}
89cab50d 5485All code listed between @samp{%@{} and @samp{%@}} is copied directly to
342b8b6e 5486the output file uninterpreted. Such code forms the prologue of the input
75f5aaea 5487file. @xref{Grammar Outline, ,Outline of a Bison
89cab50d 5488Grammar}.
bfa74976
RS
5489
5490@item /*@dots{}*/
5491Comment delimiters, as in C.
5492
5493@item :
89cab50d
AD
5494Separates a rule's result from its components. @xref{Rules, ,Syntax of
5495Grammar Rules}.
bfa74976
RS
5496
5497@item ;
5498Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
5499
5500@item |
5501Separates alternate rules for the same result nonterminal.
5502@xref{Rules, ,Syntax of Grammar Rules}.
5503@end table
5504
342b8b6e 5505@node Glossary
bfa74976
RS
5506@appendix Glossary
5507@cindex glossary
5508
5509@table @asis
5510@item Backus-Naur Form (BNF)
5511Formal method of specifying context-free grammars. BNF was first used
89cab50d
AD
5512in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar,
5513,Languages and Context-Free Grammars}.
bfa74976
RS
5514
5515@item Context-free grammars
5516Grammars specified as rules that can be applied regardless of context.
5517Thus, if there is a rule which says that an integer can be used as an
5518expression, integers are allowed @emph{anywhere} an expression is
89cab50d
AD
5519permitted. @xref{Language and Grammar, ,Languages and Context-Free
5520Grammars}.
bfa74976
RS
5521
5522@item Dynamic allocation
5523Allocation of memory that occurs during execution, rather than at
5524compile time or on entry to a function.
5525
5526@item Empty string
5527Analogous to the empty set in set theory, the empty string is a
5528character string of length zero.
5529
5530@item Finite-state stack machine
5531A ``machine'' that has discrete states in which it is said to exist at
5532each instant in time. As input to the machine is processed, the
5533machine moves from state to state as specified by the logic of the
5534machine. In the case of the parser, the input is the language being
5535parsed, and the states correspond to various stages in the grammar
5536rules. @xref{Algorithm, ,The Bison Parser Algorithm }.
5537
5538@item Grouping
5539A language construct that is (in general) grammatically divisible;
13863333 5540for example, `expression' or `declaration' in C.
bfa74976
RS
5541@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5542
5543@item Infix operator
5544An arithmetic operator that is placed between the operands on which it
5545performs some operation.
5546
5547@item Input stream
5548A continuous flow of data between devices or programs.
5549
5550@item Language construct
5551One of the typical usage schemas of the language. For example, one of
5552the constructs of the C language is the @code{if} statement.
5553@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5554
5555@item Left associativity
5556Operators having left associativity are analyzed from left to right:
5557@samp{a+b+c} first computes @samp{a+b} and then combines with
5558@samp{c}. @xref{Precedence, ,Operator Precedence}.
5559
5560@item Left recursion
89cab50d
AD
5561A rule whose result symbol is also its first component symbol; for
5562example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive
5563Rules}.
bfa74976
RS
5564
5565@item Left-to-right parsing
5566Parsing a sentence of a language by analyzing it token by token from
5567left to right. @xref{Algorithm, ,The Bison Parser Algorithm }.
5568
5569@item Lexical analyzer (scanner)
5570A function that reads an input stream and returns tokens one by one.
5571@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
5572
5573@item Lexical tie-in
5574A flag, set by actions in the grammar rules, which alters the way
5575tokens are parsed. @xref{Lexical Tie-ins}.
5576
931c7513 5577@item Literal string token
14ded682 5578A token which consists of two or more fixed characters. @xref{Symbols}.
931c7513 5579
bfa74976 5580@item Look-ahead token
89cab50d
AD
5581A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead
5582Tokens}.
bfa74976
RS
5583
5584@item LALR(1)
5585The class of context-free grammars that Bison (like most other parser
5586generators) can handle; a subset of LR(1). @xref{Mystery Conflicts, ,
5587Mysterious Reduce/Reduce Conflicts}.
5588
5589@item LR(1)
5590The class of context-free grammars in which at most one token of
5591look-ahead is needed to disambiguate the parsing of any piece of input.
5592
5593@item Nonterminal symbol
5594A grammar symbol standing for a grammatical construct that can
5595be expressed through rules in terms of smaller constructs; in other
5596words, a construct that is not a token. @xref{Symbols}.
5597
5598@item Parse error
5599An error encountered during parsing of an input stream due to invalid
5600syntax. @xref{Error Recovery}.
5601
5602@item Parser
5603A function that recognizes valid sentences of a language by analyzing
5604the syntax structure of a set of tokens passed to it from a lexical
5605analyzer.
5606
5607@item Postfix operator
5608An arithmetic operator that is placed after the operands upon which it
5609performs some operation.
5610
5611@item Reduction
5612Replacing a string of nonterminals and/or terminals with a single
89cab50d
AD
5613nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison
5614Parser Algorithm }.
bfa74976
RS
5615
5616@item Reentrant
5617A reentrant subprogram is a subprogram which can be in invoked any
5618number of times in parallel, without interference between the various
5619invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
5620
5621@item Reverse polish notation
5622A language in which all operators are postfix operators.
5623
5624@item Right recursion
89cab50d
AD
5625A rule whose result symbol is also its last component symbol; for
5626example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive
5627Rules}.
bfa74976
RS
5628
5629@item Semantics
5630In computer languages, the semantics are specified by the actions
5631taken for each instance of the language, i.e., the meaning of
5632each statement. @xref{Semantics, ,Defining Language Semantics}.
5633
5634@item Shift
5635A parser is said to shift when it makes the choice of analyzing
5636further input from the stream rather than reducing immediately some
5637already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm }.
5638
5639@item Single-character literal
5640A single character that is recognized and interpreted as is.
5641@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
5642
5643@item Start symbol
5644The nonterminal symbol that stands for a complete valid utterance in
5645the language being parsed. The start symbol is usually listed as the
13863333 5646first nonterminal symbol in a language specification.
bfa74976
RS
5647@xref{Start Decl, ,The Start-Symbol}.
5648
5649@item Symbol table
5650A data structure where symbol names and associated data are stored
5651during parsing to allow for recognition and use of existing
5652information in repeated uses of a symbol. @xref{Multi-function Calc}.
5653
5654@item Token
5655A basic, grammatically indivisible unit of a language. The symbol
5656that describes a token in the grammar is a terminal symbol.
5657The input of the Bison parser is a stream of tokens which comes from
5658the lexical analyzer. @xref{Symbols}.
5659
5660@item Terminal symbol
89cab50d
AD
5661A grammar symbol that has no rules in the grammar and therefore is
5662grammatically indivisible. The piece of text it represents is a token.
5663@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
bfa74976
RS
5664@end table
5665
342b8b6e 5666@node Copying This Manual
f2b5126e 5667@appendix Copying This Manual
f9a8293a 5668
f2b5126e
PB
5669@menu
5670* GNU Free Documentation License:: License for copying this manual.
5671@end menu
f9a8293a 5672
f2b5126e
PB
5673@include fdl.texi
5674
342b8b6e 5675@node Index
bfa74976
RS
5676@unnumbered Index
5677
5678@printindex cp
5679
bfa74976 5680@bye