\input texinfo @c -*-texinfo-*-
@comment %**start of header
@setfilename bison.info
@include version.texi
@settitle Bison @value{VERSION}
@setchapternewpage odd

@finalout

@c SMALL BOOK version
@c This edition has been formatted so that you can format and print it in
@c the smallbook format.
@c @smallbook

@c Set following if you want to document %default-prec and %no-default-prec.
@c This feature is experimental and may change in future Bison versions.
@c @set defaultprec

@ifnotinfo
@syncodeindex fn cp
@syncodeindex vr cp
@syncodeindex tp cp
@end ifnotinfo
@ifinfo
@synindex fn cp
@synindex vr cp
@synindex tp cp
@end ifinfo
@comment %**end of header

@copying

This manual (@value{UPDATED}) is for GNU Bison (version
@value{VERSION}), the GNU parser generator.

Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software
Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with the Front-Cover texts
being ``A GNU Manual,'' and with the Back-Cover Texts as in
(a) below. A copy of the license is included in the section entitled
``GNU Free Documentation License.''

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software
freedom.''
@end quotation
@end copying

@dircategory Software development
@direntry
* bison: (bison).       GNU parser generator (Yacc replacement).
@end direntry

@titlepage
@title Bison
@subtitle The Yacc-compatible Parser Generator
@subtitle @value{UPDATED}, Bison Version @value{VERSION}

@author by Charles Donnelly and Richard Stallman

@page
@vskip 0pt plus 1filll
@insertcopying
@sp 2
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor @*
Boston, MA 02110-1301 USA @*
Printed copies are available from the Free Software Foundation.@*
ISBN 1-882114-44-2
@sp 2
Cover art by Etienne Suvasa.
@end titlepage

@contents

@ifnottex
@node Top
@top Bison
@insertcopying
@end ifnottex

@menu
* Introduction::
* Conditions::
* Copying::             The GNU General Public License says
                          how you can copy and share Bison.

Tutorial sections:
* Concepts::            Basic concepts for understanding Bison.
* Examples::            Three simple explained examples of using Bison.

Reference sections:
* Grammar File::        Writing Bison declarations and rules.
* Interface::           C-language interface to the parser function @code{yyparse}.
* Algorithm::           How the Bison parser works at run-time.
* Error Recovery::      Writing rules for error recovery.
* Context Dependency::  What to do if your language syntax is too
                          messy for Bison to handle straightforwardly.
* Debugging::           Understanding or debugging Bison parsers.
* Invocation::          How to run Bison (to produce the parser implementation).
* Other Languages::     Creating C++ and Java parsers.
* FAQ::                 Frequently Asked Questions
* Table of Symbols::    All the keywords of the Bison language are explained.
* Glossary::            Basic concepts are explained.
* Copying This Manual:: License for copying this manual.
* Bibliography::        Publications cited in this manual.
* Index::               Cross-references to the text.

@detailmenu
 --- The Detailed Node Listing ---

The Concepts of Bison

* Language and Grammar:: Languages and context-free grammars,
                           as mathematical ideas.
* Grammar in Bison::     How we represent grammars for Bison's sake.
* Semantic Values::      Each token or syntactic grouping can have
                           a semantic value (the value of an integer,
                           the name of an identifier, etc.).
* Semantic Actions::     Each rule can have an action containing C code.
* GLR Parsers::          Writing parsers for general context-free languages.
* Locations::            Overview of location tracking.
* Bison Parser::         What are Bison's input and output,
                           how is the output used?
* Stages::               Stages in writing and running Bison grammars.
* Grammar Layout::       Overall structure of a Bison grammar file.

Writing GLR Parsers

* Simple GLR Parsers::     Using GLR parsers on unambiguous grammars.
* Merging GLR Parses::     Using GLR parsers to resolve ambiguities.
* GLR Semantic Actions::   Deferred semantic actions have special concerns.
* Compiler Requirements::  GLR parsers require a modern C compiler.

Examples

* RPN Calc::               Reverse Polish notation calculator;
                             a first example with no operator precedence.
* Infix Calc::             Infix (algebraic) notation calculator.
                             Operator precedence is introduced.
* Simple Error Recovery::  Continuing after syntax errors.
* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
* Multi-function Calc::    Calculator with memory and trig functions.
                             It uses multiple data-types for semantic values.
* Exercises::              Ideas for improving the multi-function calculator.

Reverse Polish Notation Calculator

* Rpcalc Declarations::  Prologue (declarations) for rpcalc.
* Rpcalc Rules::         Grammar Rules for rpcalc, with explanation.
* Rpcalc Lexer::         The lexical analyzer.
* Rpcalc Main::          The controlling function.
* Rpcalc Error::         The error reporting function.
* Rpcalc Generate::      Running Bison on the grammar file.
* Rpcalc Compile::       Run the C compiler on the output code.

Grammar Rules for @code{rpcalc}

* Rpcalc Input::
* Rpcalc Line::
* Rpcalc Expr::

Location Tracking Calculator: @code{ltcalc}

* Ltcalc Declarations::  Bison and C declarations for ltcalc.
* Ltcalc Rules::         Grammar rules for ltcalc, with explanations.
* Ltcalc Lexer::         The lexical analyzer.

Multi-Function Calculator: @code{mfcalc}

* Mfcalc Declarations::  Bison declarations for multi-function calculator.
* Mfcalc Rules::         Grammar rules for the calculator.
* Mfcalc Symbol Table::  Symbol table management subroutines.

Bison Grammar Files

* Grammar Outline::    Overall layout of the grammar file.
* Symbols::            Terminal and nonterminal symbols.
* Rules::              How to write grammar rules.
* Recursion::          Writing recursive rules.
* Semantics::          Semantic values and actions.
* Tracking Locations:: Locations and actions.
* Named References::   Using named references in actions.
* Declarations::       All kinds of Bison declarations are described here.
* Multiple Parsers::   Putting more than one Bison parser in one program.

Outline of a Bison Grammar

* Prologue::              Syntax and usage of the prologue.
* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
* Bison Declarations::    Syntax and usage of the Bison declarations section.
* Grammar Rules::         Syntax and usage of the grammar rules section.
* Epilogue::              Syntax and usage of the epilogue.

Defining Language Semantics

* Value Type::        Specifying one data type for all semantic values.
* Multiple Types::    Specifying several alternative data types.
* Actions::           An action is the semantic definition of a grammar rule.
* Action Types::      Specifying data types for actions to operate on.
* Mid-Rule Actions::  Most actions go at the end of a rule.
                        This says when, why and how to use the exceptional
                        action in the middle of a rule.

Tracking Locations

* Location Type::               Specifying a data type for locations.
* Actions and Locations::       Using locations in actions.
* Location Default Action::     Defining a general way to compute locations.

Bison Declarations

* Require Decl::      Requiring a Bison version.
* Token Decl::        Declaring terminal symbols.
* Precedence Decl::   Declaring terminals with precedence and associativity.
* Union Decl::        Declaring the set of all semantic value types.
* Type Decl::         Declaring the choice of type for a nonterminal symbol.
* Initial Action Decl::  Code run before parsing starts.
* Destructor Decl::   Declaring how symbols are freed.
* Expect Decl::       Suppressing warnings about parsing conflicts.
* Start Decl::        Specifying the start symbol.
* Pure Decl::         Requesting a reentrant parser.
* Push Decl::         Requesting a push parser.
* Decl Summary::      Table of all Bison declarations.
* %define Summary::   Defining variables to adjust Bison's behavior.
* %code Summary::     Inserting code into the parser source.

Parser C-Language Interface

* Parser Function::         How to call @code{yyparse} and what it returns.
* Push Parser Function::    How to call @code{yypush_parse} and what it returns.
* Pull Parser Function::    How to call @code{yypull_parse} and what it returns.
* Parser Create Function::  How to call @code{yypstate_new} and what it returns.
* Parser Delete Function::  How to call @code{yypstate_delete} and what it returns.
* Lexical::                 You must supply a function @code{yylex}
                              which reads tokens.
* Error Reporting::         You must supply a function @code{yyerror}.
* Action Features::         Special features for use in actions.
* Internationalization::    How to let the parser speak in the user's
                              native language.

The Lexical Analyzer Function @code{yylex}

* Calling Convention::  How @code{yyparse} calls @code{yylex}.
* Token Values::        How @code{yylex} must return the semantic value
                          of the token it has read.
* Token Locations::     How @code{yylex} must return the text location
                          (line number, etc.) of the token, if the
                          actions want that.
* Pure Calling::        How the calling convention differs in a pure parser
                          (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).

The Bison Parser Algorithm

* Lookahead::               Parser looks one token ahead when deciding what to do.
* Shift/Reduce::            Conflicts: when either shifting or reduction is valid.
* Precedence::              Operator precedence works by resolving conflicts.
* Contextual Precedence::   When an operator's precedence depends on context.
* Parser States::           The parser is a finite-state-machine with stack.
* Reduce/Reduce::           When two rules are applicable in the same situation.
* Mysterious Conflicts::    Conflicts that look unjustified.
* Tuning LR::               How to tune fundamental aspects of LR-based parsing.
* Generalized LR Parsing::  Parsing arbitrary context-free grammars.
* Memory Management::       What happens when memory is exhausted. How to avoid it.

Operator Precedence

* Why Precedence::       An example showing why precedence is needed.
* Using Precedence::     How to specify precedence in Bison grammars.
* Precedence Examples::  How these features are used in the previous example.
* How Precedence::       How they work.

Tuning LR

* LR Table Construction:: Choose a different construction algorithm.
* Default Reductions::    Disable default reductions.
* LAC::                   Correct lookahead sets in the parser states.
* Unreachable States::    Keep unreachable parser states for debugging.

Handling Context Dependencies

* Semantic Tokens::   Token parsing can depend on the semantic context.
* Lexical Tie-ins::   Token parsing can depend on the syntactic context.
* Tie-in Recovery::   Lexical tie-ins have implications for how
                        error recovery rules must be written.

Debugging Your Parser

* Understanding::     Understanding the structure of your parser.
* Tracing::           Tracing the execution of your parser.

Invoking Bison

* Bison Options::     All the options described in detail,
                        in alphabetical order by short options.
* Option Cross Key::  Alphabetical list of long options.
* Yacc Library::      Yacc-compatible @code{yylex} and @code{main}.

Parsers Written In Other Languages

* C++ Parsers::                 The interface to generate C++ parser classes
* Java Parsers::                The interface to generate Java parser classes

C++ Parsers

* C++ Bison Interface::         Asking for C++ parser generation
* C++ Semantic Values::         %union vs. C++
* C++ Location Values::         The position and location classes
* C++ Parser Interface::        Instantiating and running the parser
* C++ Scanner Interface::       Exchanges between yylex and parse
* A Complete C++ Example::      Demonstrating their use

A Complete C++ Example

* Calc++ --- C++ Calculator::   The specifications
* Calc++ Parsing Driver::       An active parsing context
* Calc++ Parser::               A parser class
* Calc++ Scanner::              A pure C++ Flex scanner
* Calc++ Top Level::            Conducting the band

Java Parsers

* Java Bison Interface::        Asking for Java parser generation
* Java Semantic Values::        %type and %token vs. Java
* Java Location Values::        The position and location classes
* Java Parser Interface::       Instantiating and running the parser
* Java Scanner Interface::      Specifying the scanner for the parser
* Java Action Features::        Special features for use in actions
* Java Differences::            Differences between C/C++ and Java Grammars
* Java Declarations Summary::   List of Bison declarations used with Java

Frequently Asked Questions

* Memory Exhausted::            Breaking the Stack Limits
* How Can I Reset the Parser::  @code{yyparse} Keeps some State
* Strings are Destroyed::       @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops::    Control Flow in the Calculator
* Multiple start-symbols::      Factoring closely related grammars
* Secure? Conform?::            Is Bison POSIX safe?
* I can't build Bison::         Troubleshooting
* Where can I find help?::      Troubleshouting
* Bug Reports::                 Troublereporting
* More Languages::              Parsers in C++, Java, and so on
* Beta Testing::                Experimenting with development versions
* Mailing Lists::               Meeting other Bison users

Copying This Manual

* Copying This Manual::         License for copying this manual.

@end detailmenu
@end menu

@node Introduction
@unnumbered Introduction
@cindex introduction

@dfn{Bison} is a general-purpose parser generator that converts an
annotated context-free grammar into a deterministic LR or generalized
LR (GLR) parser employing LALR(1) parser tables. As an experimental
feature, Bison can also generate IELR(1) or canonical LR(1) parser
tables. Once you are proficient with Bison, you can use it to develop
a wide range of language parsers, from those used in simple desk
calculators to complex programming languages.

Bison is upward compatible with Yacc: all properly-written Yacc
grammars ought to work with Bison with no change. Anyone familiar
with Yacc should be able to use Bison with little trouble. You need
to be fluent in C or C++ programming in order to use Bison or to
understand this manual. Java is also supported as an experimental
feature.

We begin with tutorial chapters that explain the basic concepts of
using Bison and show three explained examples, each building on the
last. If you don't know Bison or Yacc, start by reading these
chapters. Reference chapters follow, which describe specific aspects
of Bison in detail.

Bison was written originally by Robert Corbett. Richard Stallman made
it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University
added multi-character string literals and other features. Since then,
Bison has grown more robust and evolved many other new features thanks
to the hard work of a long list of volunteers. For details, see the
@file{THANKS} and @file{ChangeLog} files included in the Bison
distribution.

This edition corresponds to version @value{VERSION} of Bison.

@node Conditions
@unnumbered Conditions for Using Bison

The distribution terms for Bison-generated parsers permit using the
parsers in nonfree programs. Before Bison version 2.2, these extra
permissions applied only when Bison was generating LALR(1)
parsers in C@. And before Bison version 1.24, Bison-generated
parsers could be used only in programs that were free software.

The other GNU programming tools, such as the GNU C compiler, have never
had such a requirement. They could always be used for nonfree
software. The reason Bison was different was not due to a special
policy decision; it resulted from applying the usual General Public
License to all of the Bison source code.

The main output of the Bison utility---the Bison parser implementation
file---contains a verbatim copy of a sizable piece of Bison, which is
the code for the parser's implementation. (The actions from your
grammar are inserted into this implementation at one point, but most
of the rest of the implementation is not changed.) When we applied
the GPL terms to the skeleton code for the parser's implementation,
the effect was to restrict the use of Bison output to free software.

We didn't change the terms because of sympathy for people who want to
make software proprietary. @strong{Software should be free.} But we
concluded that limiting Bison's use to free software was doing little to
encourage people to make other software free. So we decided to make the
practical conditions for using Bison match the practical conditions for
using the other GNU tools.

This exception applies when Bison is generating code for a parser.
You can tell whether the exception applies to a Bison output file by
inspecting the file for text beginning with ``As a special
exception@dots{}''. The text spells out the exact terms of the
exception.

@node Copying
@unnumbered GNU GENERAL PUBLIC LICENSE
@include gpl-3.0.texi

@node Concepts
@chapter The Concepts of Bison

This chapter introduces many of the basic concepts without which the
details of Bison will not make sense. If you do not already know how to
use Bison or Yacc, we suggest you start by reading this chapter carefully.

@menu
* Language and Grammar:: Languages and context-free grammars,
                           as mathematical ideas.
* Grammar in Bison::     How we represent grammars for Bison's sake.
* Semantic Values::      Each token or syntactic grouping can have
                           a semantic value (the value of an integer,
                           the name of an identifier, etc.).
* Semantic Actions::     Each rule can have an action containing C code.
* GLR Parsers::          Writing parsers for general context-free languages.
* Locations::            Overview of location tracking.
* Bison Parser::         What are Bison's input and output,
                           how is the output used?
* Stages::               Stages in writing and running Bison grammars.
* Grammar Layout::       Overall structure of a Bison grammar file.
@end menu

@node Language and Grammar
@section Languages and Context-Free Grammars

@cindex context-free grammar
@cindex grammar, context-free
In order for Bison to parse a language, it must be described by a
@dfn{context-free grammar}. This means that you specify one or more
@dfn{syntactic groupings} and give rules for constructing them from their
parts. For example, in the C language, one kind of grouping is called an
`expression'. One rule for making an expression might be, ``An expression
can be made of a minus sign and another expression''. Another would be,
``An expression can be an integer''. As you can see, rules are often
recursive, but there must be at least one rule which leads out of the
recursion.
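
To make the role of the rule that leads out of the recursion concrete, here is a minimal sketch in C of a hand-written recognizer for just the two informal rules above. The functions @code{parse_expression} and @code{is_expression} are hypothetical, and this is not how Bison-generated parsers work (those are table-driven LR parsers, not recursive functions). It does show why the integer rule matters: without it, no finite input could ever form a complete expression.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical recognizer for the two informal rules
     expression: '-' expression   (recursive rule)
     expression: integer          (rule that leads out of the recursion)
   On success it stores the value in *value, advances *p past the
   expression, and returns true.  */
static bool parse_expression(const char **p, long *value)
{
    if (**p == '-') {
        /* Recursive rule: a minus sign followed by another expression.  */
        ++*p;
        long operand;
        if (!parse_expression(p, &operand))
            return false;
        *value = -operand;
        return true;
    }
    if (isdigit((unsigned char)**p)) {
        /* Base rule: an integer ends the recursion.  */
        long n = 0;
        while (isdigit((unsigned char)**p)) {
            n = n * 10 + (**p - '0');
            ++*p;
        }
        *value = n;
        return true;
    }
    return false;  /* Neither rule applies.  */
}

/* The whole string must form exactly one expression.  */
bool is_expression(const char *s, long *value)
{
    return parse_expression(&s, value) && *s == '\0';
}
```

For instance, the input @samp{--7} is accepted with the value 7, while @samp{-} alone is rejected because the recursion never reaches an integer.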

@cindex BNF
@cindex Backus-Naur form
The most common formal system for presenting such rules for humans to read
is @dfn{Backus-Naur Form} or ``BNF'', which was developed in
order to specify the language Algol 60. Any grammar expressed in
BNF is a context-free grammar. The input to Bison is
essentially machine-readable BNF.

@cindex LALR grammars
@cindex IELR grammars
@cindex LR grammars
There are various important subclasses of context-free grammars. Although
it can handle almost all context-free grammars, Bison is optimized for what
are called LR(1) grammars. In brief, in these grammars, it must be possible
to tell how to parse any portion of an input string with just a single token
of lookahead. For historical reasons, Bison by default is limited by the
additional restrictions of LALR(1), which are hard to explain simply.
@xref{Mysterious Conflicts}, for more information on this. As an
experimental feature, you can escape these additional restrictions by
requesting IELR(1) or canonical LR(1) parser tables. @xref{LR Table
Construction}, to learn how.

@cindex GLR parsing
@cindex generalized LR (GLR) parsing
@cindex ambiguous grammars
@cindex nondeterministic parsing

Parsers for LR(1) grammars are @dfn{deterministic}, meaning
roughly that the next grammar rule to apply at any point in the input is
uniquely determined by the preceding input and a fixed, finite portion
(called a @dfn{lookahead}) of the remaining input. A context-free
grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
apply the grammar rules to derive the same input. Even unambiguous
grammars can be @dfn{nondeterministic}, meaning that no fixed
lookahead always suffices to determine the next grammar rule to apply.
With the proper declarations, Bison is also able to parse these more
general context-free grammars, using a technique known as GLR
parsing (for Generalized LR). Bison's GLR parsers
are able to handle any context-free grammar for which the number of
possible parses of any given string is finite.
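
As an illustration of ambiguity, consider a hypothetical grammar in which an expression is either a number or two expressions joined by a minus sign, with no precedence or associativity declared. The input @samp{1 - 2 - 3} then has two parses, @samp{(1 - 2) - 3} and @samp{1 - (2 - 3)}, which even yield different values. The C sketch below (the function @code{count_parses} is hypothetical) counts the parses of a chain of @var{n} numbers by trying each minus sign as the last one applied. The count grows quickly, but it is finite for every input, so this grammar stays within what a GLR parser can handle.

```c
#include <stddef.h>

/* Number of distinct parse trees for a chain of n NUM tokens joined
   by '-' under the ambiguous rules
     expr: expr '-' expr
     expr: NUM
   Any of the n - 1 minus signs may be the last one applied; the
   operands on either side of it are then parsed independently.
   This sketch caps n at 32, well within 64-bit range, and returns 0
   for empty or larger inputs.  */
unsigned long long count_parses(size_t n)
{
    unsigned long long count[33] = {0};   /* count[k]: parses of k NUMs.  */

    if (n == 0 || n > 32)
        return 0;
    count[1] = 1;                         /* A lone NUM parses one way.  */
    for (size_t len = 2; len <= n; ++len)
        for (size_t left = 1; left < len; ++left)
            count[len] += count[left] * count[len - left];
    return count[n];
}
```

Three numbers give 2 parses, four give 5, and five give 14.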
676385e2 514
bfa74976
RS
515@cindex symbols (abstract)
516@cindex token
517@cindex syntactic grouping
518@cindex grouping, syntactic
9501dc6e
AD
519In the formal grammatical rules for a language, each kind of syntactic
520unit or grouping is named by a @dfn{symbol}. Those which are built by
521grouping smaller constructs according to grammatical rules are called
bfa74976
RS
522@dfn{nonterminal symbols}; those which can't be subdivided are called
523@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
524corresponding to a single terminal symbol a @dfn{token}, and a piece
e0c471a9 525corresponding to a single nonterminal symbol a @dfn{grouping}.
bfa74976
RS
526
527We can use the C language as an example of what symbols, terminal and
9501dc6e
AD
528nonterminal, mean. The tokens of C are identifiers, constants (numeric
529and string), and the various keywords, arithmetic operators and
530punctuation marks. So the terminal symbols of a grammar for C include
531`identifier', `number', `string', plus one symbol for each keyword,
532operator or punctuation mark: `if', `return', `const', `static', `int',
533`char', `plus-sign', `open-brace', `close-brace', `comma' and many more.
534(These tokens can be subdivided into characters, but that is a matter of
bfa74976
RS
535lexicography, not grammar.)
536
537Here is a simple C function subdivided into tokens:
538
9edcd895
AD
539@ifinfo
540@example
541int /* @r{keyword `int'} */
14d4662b 542square (int x) /* @r{identifier, open-paren, keyword `int',}
9edcd895
AD
543 @r{identifier, close-paren} */
544@{ /* @r{open-brace} */
aa08666d
AD
545 return x * x; /* @r{keyword `return', identifier, asterisk,}
546 @r{identifier, semicolon} */
9edcd895
AD
547@} /* @r{close-brace} */
548@end example
549@end ifinfo
550@ifnotinfo
bfa74976
RS
551@example
552int /* @r{keyword `int'} */
14d4662b 553square (int x) /* @r{identifier, open-paren, keyword `int', identifier, close-paren} */
bfa74976 554@{ /* @r{open-brace} */
9edcd895 555 return x * x; /* @r{keyword `return', identifier, asterisk, identifier, semicolon} */
bfa74976
RS
556@} /* @r{close-brace} */
557@end example
9edcd895 558@end ifnotinfo

The syntactic groupings of C include the expression, the statement, the
declaration, and the function definition. These are represented in the
grammar of C by nonterminal symbols `expression', `statement',
`declaration' and `function definition'. The full grammar uses dozens of
additional language constructs, each with its own nonterminal symbol, in
order to express the meanings of these four. The example above is a
function definition; it contains one declaration and one statement. In
the statement, each @samp{x} is an expression and so is @samp{x * x}.
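
Splitting text into tokens, as in the @code{square} example earlier, is the job of a lexical analyzer rather than of the grammar. As a rough sketch only (this is not Bison's @code{yylex} interface; the function @code{next_token}, the @code{token_kind} enumeration, and the short keyword list are hypothetical), the following C code classifies each token as a keyword, identifier, number, or single-character punctuation mark. Comments, string constants, and multi-character operators are deliberately not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum token_kind { TK_END, TK_KEYWORD, TK_IDENTIFIER, TK_NUMBER, TK_PUNCT };

/* A tiny, hypothetical keyword list; a real C lexer knows many more.  */
static const char *const keywords[] = { "int", "char", "return", "if",
                                        "const", "static" };

static int is_keyword(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i)
        if (strlen(keywords[i]) == len && strncmp(keywords[i], s, len) == 0)
            return 1;
    return 0;
}

/* Skip white space, classify the token that starts at *p, and advance
   *p past it.  Words on the keyword list are reported as keywords;
   every other non-blank, non-alphanumeric character is treated as a
   one-character punctuation token.  */
enum token_kind next_token(const char **p)
{
    const char *s = *p;
    while (isspace((unsigned char)*s))
        ++s;
    if (*s == '\0') {
        *p = s;
        return TK_END;
    }
    const char *start = s;
    enum token_kind kind;
    if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            ++s;
        kind = is_keyword(start, (size_t)(s - start)) ? TK_KEYWORD
                                                      : TK_IDENTIFIER;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            ++s;
        kind = TK_NUMBER;
    } else {
        ++s;
        kind = TK_PUNCT;
    }
    *p = s;
    return kind;
}
```

Run over the text of @code{square}, repeated calls to @code{next_token} report 13 tokens (3 keywords, 4 identifiers and 6 punctuation marks) before returning @code{TK_END}, which matches the annotations in the example.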

Each nonterminal symbol must have grammatical rules showing how it is made
out of simpler constructs. For example, one kind of C statement is the
@code{return} statement; this would be described with a grammar rule which
reads informally as follows:

@quotation
A `statement' can be made of a `return' keyword, an `expression' and a
`semicolon'.
@end quotation

@noindent
There would be many other rules for `statement', one for each kind of
statement in C.

@cindex start symbol
One nonterminal symbol must be distinguished as the special one which
defines a complete utterance in the language. It is called the @dfn{start
symbol}. In a compiler, this means a complete input program. In the C
language, the nonterminal symbol `sequence of definitions and declarations'
plays this role.

For example, @samp{1 + 2} is a valid C expression---a valid part of a C
program---but it is not valid as an @emph{entire} C program. In the
context-free grammar of C, this follows from the fact that `expression' is
not the start symbol.

The Bison parser reads a sequence of tokens as its input, and groups the
tokens using the grammar rules. If the input is valid, the end result is
that the entire token sequence reduces to a single grouping whose symbol is
the grammar's start symbol. If we use a grammar for C, the entire input
must be a `sequence of definitions and declarations'. If not, the parser
reports a syntax error.
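
To picture how tokens are grouped until only the start symbol remains, here is a deliberately tiny shift-reduce recognizer in C for a hypothetical two-rule grammar, @samp{expr: expr '+' NUM} and @samp{expr: NUM}, where each decimal digit of the input stands for one @code{NUM} token. It is only a sketch: Bison-generated parsers decide when to reduce by consulting tables built from the grammar, whereas this code hard-codes the decisions for these two rules. The function @code{accepts} and the one-letter symbol encoding are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical symbol encoding: 'N' for a NUM token, '+' for the plus
   token, and 'E' for the nonterminal expr, the start symbol.  */
#define MAX_STACK 64

/* Return true if the whole input reduces to a single expr.  */
bool accepts(const char *input)
{
    char stack[MAX_STACK];
    size_t top = 0;  /* Number of symbols on the stack.  */

    for (const char *p = input; *p != '\0'; ++p) {
        if (top == MAX_STACK)
            return false;              /* Too long for this sketch.  */
        if (isdigit((unsigned char)*p))
            stack[top++] = 'N';        /* Shift a NUM token.  */
        else if (*p == '+')
            stack[top++] = '+';        /* Shift the '+' token.  */
        else
            return false;              /* Not a token of this grammar.  */

        /* Reduce when the top of the stack matches a right-hand side.  */
        if (top >= 3 && memcmp(stack + top - 3, "E+N", 3) == 0) {
            top -= 3;
            stack[top++] = 'E';        /* expr: expr '+' NUM */
        } else if (top == 1 && stack[0] == 'N') {
            stack[0] = 'E';            /* expr: NUM */
        }
    }
    /* Valid input leaves exactly the start symbol on the stack.  */
    return top == 1 && stack[0] == 'E';
}
```

With this sketch, @samp{1+2+3} is accepted, while @samp{1+}, @samp{+1} and @samp{12} are rejected because their stacks never shrink to the single start symbol.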

@node Grammar in Bison
@section From Formal Rules to Bison Input
@cindex Bison grammar
@cindex grammar, Bison
@cindex formal grammar

A formal grammar is a mathematical construct. To define the language
for Bison, you must write a file expressing the grammar in Bison syntax:
a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.

A nonterminal symbol in the formal grammar is represented in Bison input
as an identifier, like an identifier in C@. By convention, it should be
in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.

The Bison representation for a terminal symbol is also called a @dfn{token
type}. Token types as well can be represented as C-like identifiers. By
convention, these identifiers should be upper case to distinguish them from
nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
@code{RETURN}. A terminal symbol that stands for a particular keyword in
the language should be named after that keyword converted to upper case.
The terminal symbol @code{error} is reserved for error recovery.
@xref{Symbols}.

A terminal symbol can also be represented as a character literal, just like
a C character constant. You should do this whenever a token is just a
single character (parenthesis, plus-sign, etc.): use that same character in
a literal as the terminal symbol for that token.

A third way to represent a terminal symbol is with a C string constant
containing several characters. @xref{Symbols}, for more information.

The grammar rules also have an expression in Bison syntax. For example,
here is the Bison rule for a C @code{return} statement. The semicolon in
quotes is a literal character token, representing part of the C syntax for
the statement; the naked semicolon, and the colon, are Bison punctuation
used in every rule.

@example
stmt: RETURN expr ';'
    ;
@end example

@noindent
@xref{Rules, ,Syntax of Grammar Rules}.

@node Semantic Values
@section Semantic Values
@cindex semantic value
@cindex value, semantic

A formal grammar selects tokens only by their classifications: for example,
if a rule mentions the terminal symbol `integer constant', it means that
@emph{any} integer constant is grammatically valid in that position. The
precise value of the constant is irrelevant to how to parse the input: if
@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
grammatical.

But the precise value is very important for what the input means once it is
parsed. A compiler is useless if it fails to distinguish between 4, 1 and
3989 as constants in the program! Therefore, each token in a Bison grammar
has both a token type and a @dfn{semantic value}. @xref{Semantics,
,Defining Language Semantics}, for details.

The token type is a terminal symbol defined in the grammar, such as
@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
you need to know to decide where the token may validly appear and how to
group it with other tokens. The grammar rules know nothing about tokens
except their types.

The semantic value has all the rest of the information about the
meaning of the token, such as the value of an integer, or the name of an
identifier. (A token such as @code{','} which is just punctuation doesn't
need to have any semantic value.)

For example, an input token might be classified as token type
@code{INTEGER} and have the semantic value 4. Another input token might
have the same token type @code{INTEGER} but value 3989. When a grammar
rule says that @code{INTEGER} is allowed, either of these tokens is
acceptable because each is an @code{INTEGER}. When the parser accepts the
token, it keeps track of the token's semantic value.
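As an illustration only, here is a minimal hand-written scanner sketch in C. The token codes @code{INTEGER} and @code{COMMA}, the @code{struct token} type and the function @code{next_token} are hypothetical names invented for this sketch; a real Bison lexer instead returns the token type from @code{yylex} and stores the semantic value in @code{yylval} (@pxref{Lexical, ,The Lexical Analyzer Function @code{yylex}}). The point is that @samp{4} and @samp{3989} get the same token type and differ only in their semantic values, while a comma carries no meaningful value.

```c
#include <ctype.h>

/* Hypothetical token codes; in a real Bison parser the token types
   come from the generated parser, not from a hand-written enum.  */
enum token_type { END = 0, INTEGER = 258, COMMA = 259, INVALID = 260 };

struct token
{
  enum token_type type;   /* what the grammar rules see */
  long value;             /* semantic value; meaningful only for INTEGER */
};

/* Scan the next token from *P and advance *P past it.  */
static struct token
next_token (const char **p)
{
  struct token t = { END, 0 };
  while (**p == ' ')
    ++*p;                               /* skip blanks */
  if (isdigit ((unsigned char) **p))
    {
      t.type = INTEGER;                 /* same type for every integer */
      while (isdigit ((unsigned char) **p))
        t.value = t.value * 10 + (*(*p)++ - '0');
    }
  else if (**p == ',')
    {
      t.type = COMMA;                   /* punctuation: no value needed */
      ++*p;
    }
  else if (**p != '\0')
    {
      t.type = INVALID;
      ++*p;
    }
  return t;
}
```

Scanning @samp{4, 3989} with this sketch yields an @code{INTEGER} with value 4, a @code{COMMA}, and an @code{INTEGER} with value 3989; a grammar would see only the sequence of types.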

Each grouping can have a semantic value as well as its nonterminal
symbol. For example, in a calculator, an expression typically has a
semantic value that is a number. In a compiler for a programming
language, an expression typically has a semantic value that is a tree
structure describing the meaning of the expression.

@node Semantic Actions
@section Semantic Actions
@cindex semantic actions
@cindex actions, semantic

In order to be useful, a program must do more than parse input; it must
also produce some output based on the input. In a Bison grammar, a grammar
rule can have an @dfn{action} made up of C statements. Each time the
parser recognizes a match for that rule, the action is executed.
@xref{Actions}.

Most of the time, the purpose of an action is to compute the semantic value
of the whole construct from the semantic values of its parts. For example,
suppose we have a rule which says an expression can be the sum of two
expressions. When the parser recognizes such a sum, each of the
subexpressions has a semantic value which describes how it was built up.
The action for this rule should create a similar sort of value for the
newly recognized larger expression.

Here is such a rule, saying that an expression can be the sum of
two subexpressions:

@example
expr: expr '+' expr   @{ $$ = $1 + $3; @}
     ;
@end example

@noindent
The action says how to produce the semantic value of the sum expression
from the values of the two subexpressions.
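The following C sketch is a simplified model, not Bison's actual implementation, of what this action means on a stack of semantic values. When the parser reduces by @code{expr: expr '+' expr}, the values of the three right-hand-side symbols are on top of the stack; @code{$1} and @code{$3} name the first and third of them, and the value assigned to @code{$$} replaces all three. The names @code{value_stack}, @code{push} and @code{reduce_sum} are hypothetical.

```c
/* Hypothetical model of a parser's semantic-value stack.  */
enum { STACK_MAX = 100 };

struct value_stack
{
  double v[STACK_MAX];
  int top;                      /* number of values on the stack */
};

static void
push (struct value_stack *s, double x)
{
  s->v[s->top++] = x;
}

/* Reduce by "expr: expr '+' expr".  The rule has 3 right-hand-side
   symbols, so $1 sits at index top-3 and $3 at index top-1.  */
static void
reduce_sum (struct value_stack *s)
{
  double *rhs = &s->v[s->top - 3];  /* rhs[0] is $1, rhs[2] is $3 */
  double result = rhs[0] + rhs[2];  /* the action: $$ = $1 + $3; */
  s->top -= 3;                      /* pop the right-hand-side values */
  push (s, result);                 /* push $$ for the new grouping */
}
```

Pushing 4, a placeholder for @code{'+'}, and 3989, then calling @code{reduce_sum}, leaves a single value, 3993, on the stack. The @code{'+'} token only gets a placeholder because punctuation needs no semantic value.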

@node GLR Parsers
@section Writing GLR Parsers
@cindex GLR parsing
@cindex generalized LR (GLR) parsing
@findex %glr-parser
@cindex conflicts
@cindex shift/reduce conflicts
@cindex reduce/reduce conflicts

In some grammars, Bison's deterministic
LR(1) parsing algorithm cannot decide whether to apply a
certain grammar rule at a given point. That is, it may not be able to
decide (on the basis of the input read so far) which of two possible
reductions (applications of a grammar rule) applies, or whether to apply
a reduction or read more of the input and apply a reduction later in the
input. These are known respectively as @dfn{reduce/reduce} conflicts
(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts
(@pxref{Shift/Reduce}).

To use a grammar that is not easily modified to be LR(1), a
more general parsing algorithm is sometimes necessary. If you include
@code{%glr-parser} among the Bison declarations in your file
(@pxref{Grammar Outline}), the result is a Generalized LR
(GLR) parser. These parsers handle Bison grammars that
contain no unresolved conflicts (i.e., after applying precedence
declarations) identically to deterministic parsers. However, when
faced with unresolved shift/reduce and reduce/reduce conflicts,
GLR parsers use the simple expedient of doing both,
effectively cloning the parser to follow both possibilities. Each of
the resulting parsers can again split, so that at any given time, there
can be any number of possible parses being explored. The parsers
proceed in lockstep; that is, all of them consume (shift) a given input
symbol before any of them proceed to the next. Each of the cloned
parsers eventually meets one of two possible fates: either it runs into
a parsing error, in which case it simply vanishes, or it merges with
another parser, because the two of them have reduced the input to an
identical set of symbols.

During the time that there are multiple parsers, semantic actions are
recorded, but not performed. When a parser disappears, its recorded
semantic actions disappear as well, and are never performed. When a
reduction makes two parsers identical, causing them to merge, Bison
records both sets of semantic actions. Whenever the last two parsers
merge, reverting to the single-parser case, Bison resolves all the
outstanding actions either by precedences given to the grammar rules
involved, or by performing both actions, and then calling a designated
user-defined function on the resulting values to produce an arbitrary
merged result.
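The bookkeeping described above can be pictured with a deliberately simplified C model. It is not how Bison's GLR parser is implemented, and all names (@code{struct branch}, @code{defer}, @code{resolve}) are hypothetical: each branch records its actions instead of performing them, a branch that hits an error is marked dead and its record is ignored, and once exactly one branch survives, its recorded actions are performed in order. Merging and @code{%dprec} are left out of the model.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one GLR parser branch.  */
enum { MAX_DEFERRED = 16 };

struct branch
{
  const char *actions[MAX_DEFERRED]; /* recorded, not yet performed */
  int n;                             /* number of recorded actions */
  int alive;                         /* cleared when the branch errs */
};

/* Record an action instead of performing it.  */
static void
defer (struct branch *b, const char *action)
{
  if (b->n < MAX_DEFERRED)
    b->actions[b->n++] = action;
}

/* If exactly one branch is alive, "perform" its recorded actions in
   order (here: append their names to OUT, separated by spaces) and
   return 1.  Otherwise return 0; a real GLR parser would have to
   merge or keep waiting.  Dead branches' actions are never run.  */
static int
resolve (const struct branch *bs, int count, char *out, size_t size)
{
  const struct branch *survivor = NULL;
  for (int i = 0; i < count; i++)
    if (bs[i].alive)
      {
        if (survivor)
          return 0;             /* still more than one parse */
        survivor = &bs[i];
      }
  if (!survivor || size == 0)
    return 0;
  out[0] = '\0';
  for (int i = 0; i < survivor->n; i++)
    {
      if (i > 0)
        strncat (out, " ", size - strlen (out) - 1);
      strncat (out, survivor->actions[i], size - strlen (out) - 1);
    }
  return 1;
}
```

With two live branches, @code{resolve} declines to act; after the enumeration branch is marked dead (as when it meets @samp{..}), it performs the surviving branch's recorded actions, here producing @samp{expr subrange}.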

@menu
* Simple GLR Parsers::     Using GLR parsers on unambiguous grammars.
* Merging GLR Parses::     Using GLR parsers to resolve ambiguities.
* GLR Semantic Actions::   Deferred semantic actions have special concerns.
* Compiler Requirements::  GLR parsers require a modern C compiler.
@end menu

@node Simple GLR Parsers
@subsection Using GLR on Unambiguous Grammars
@cindex GLR parsing, unambiguous grammars
@cindex generalized LR (GLR) parsing, unambiguous grammars
@findex %glr-parser
@findex %expect-rr
@cindex conflicts
@cindex reduce/reduce conflicts
@cindex shift/reduce conflicts

In the simplest cases, you can use the GLR algorithm
to parse grammars that are unambiguous but fail to be LR(1).
Such grammars typically require more than one symbol of lookahead.

Consider a problem that
arises in the declaration of enumerated and subrange types in the
programming language Pascal. Here are some examples:

@example
type subrange = lo .. hi;
type enum = (a, b, c);
@end example

@noindent
The original language standard allows only numeric
literals and constant identifiers for the subrange bounds (@samp{lo}
and @samp{hi}), but Extended Pascal (ISO/IEC 10206) and many other
Pascal implementations allow arbitrary expressions there. This gives
rise to the following situation, containing a superfluous pair of
parentheses:

@example
type subrange = (a) .. b;
@end example

@noindent
Compare this to the following declaration of an enumerated
type with only one value:

@example
type enum = (a);
@end example

@noindent
(These declarations are contrived, but they are syntactically
valid, and more-complicated cases can come up in practical programs.)

These two declarations look identical until the @samp{..} token.
With normal LR(1) one-token lookahead it is not
possible to decide between the two forms when the identifier
@samp{a} is parsed. It is, however, desirable
for a parser to decide this, since in the latter case
@samp{a} must become a new identifier to represent the enumeration
value, while in the former case @samp{a} must be evaluated with its
current meaning, which may be a constant or even a function call.

You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',
to be resolved later, but this typically requires substantial
contortions in both semantic actions and large parts of the
grammar, where the parentheses are nested in the recursive rules for
expressions.

You might think of using the lexer to distinguish between the two
forms by returning different tokens for currently defined and
undefined identifiers. But if these declarations occur in a local
scope, and @samp{a} is defined in an outer scope, then both forms
are possible---either locally redefining @samp{a}, or using the
value of @samp{a} from the outer scope. So this approach cannot
work.

A simple solution to this problem is to declare the parser to
use the GLR algorithm.
When the GLR parser reaches the critical state, it
merely splits into two branches and pursues both syntax rules
simultaneously. Sooner or later, one of them runs into a parsing
error. If there is a @samp{..} token before the next
@samp{;}, the rule for enumerated types fails since it cannot
accept @samp{..} anywhere; otherwise, the subrange type rule
fails since it requires a @samp{..} token. So one of the branches
fails silently, and the other one continues normally, performing
all the intermediate actions that were postponed during the split.

If the input is syntactically incorrect, both branches fail and the parser
reports a syntax error as usual.

The effect of all this is that the parser seems to ``guess'' the
correct branch to take, or in other words, it seems to use more
lookahead than the underlying LR(1) algorithm actually allows
for. In this example, LR(2) would suffice, but some cases
that are not LR(@math{k}) for any @math{k} can also be handled this way.
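To see why two tokens of lookahead are enough here, consider the following hypothetical helper. It only illustrates the argument and is unrelated to how Bison or an LR(2) parser is implemented. Given the position of an identifier that directly follows @samp{(} in a type declaration, it decides between the two forms by peeking at most two tokens ahead. The token codes and the names @code{classify_id}, @code{enum tok} and @code{enum kind} are invented for this sketch, which assumes the token array holds a valid declaration.

```c
/* Hypothetical token codes for the simplified Pascal example.  */
enum tok { T_LPAREN, T_RPAREN, T_ID, T_COMMA, T_DOTDOT, T_PLUS, T_SEMI };

enum kind { ENUM_TYPE, SUBRANGE_TYPE };

/* TOKS[I] is an identifier just after '(' in a type declaration.
   Decide whether it names an enumeration value or starts an
   expression, peeking at most two tokens ahead.  Assumes the tokens
   form a valid declaration, so TOKS[I + 2] exists when needed.  */
static enum kind
classify_id (const enum tok *toks, int i)
{
  enum tok next = toks[i + 1];          /* one token of lookahead */
  if (next == T_COMMA)
    return ENUM_TYPE;                   /* "(a, b ..." */
  if (next != T_RPAREN)
    return SUBRANGE_TYPE;               /* "(a + 1) .. b" */
  /* "(a)" is ambiguous with one token; the second one decides.  */
  return toks[i + 2] == T_DOTDOT ? SUBRANGE_TYPE : ENUM_TYPE;
}
```

For @samp{(a) .. b} the helper needs the second token past @samp{a} to answer, whereas for @samp{(a, b)} one token is enough.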

In general, a GLR parser can take quadratic or cubic worst-case time,
and the current Bison parser even takes exponential time and space
for some grammars. In practice, this rarely happens, and for many
grammars it is possible to prove that it cannot happen.
The present example contains only one conflict between two
rules, and the type-declaration context containing the conflict
cannot be nested. So the number of
branches that can exist at any time is limited by the constant 2,
and the parsing time is still linear.

Here is a Bison grammar corresponding to the example above. It
parses a vastly simplified form of Pascal type declarations.

@example
%token TYPE DOTDOT ID

@group
%left '+' '-'
%left '*' '/'
@end group

%%

@group
type_decl : TYPE ID '=' type ';'
     ;
@end group

@group
type : '(' id_list ')'
     | expr DOTDOT expr
     ;
@end group

@group
id_list : ID
     | id_list ',' ID
     ;
@end group

@group
expr : '(' expr ')'
     | expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | ID
     ;
@end group
@end example

When used as a normal LR(1) grammar, Bison correctly complains
about one reduce/reduce conflict. In the conflicting situation the
parser chooses one of the alternatives, arbitrarily the one
declared first. Therefore the following correct input is not
recognized:

@example
type t = (a) .. b;
@end example

The parser can be turned into a GLR parser, while also telling Bison
to be silent about the one known reduce/reduce conflict, by adding
these two declarations to the Bison grammar file (before the first
@samp{%%}):

@example
%glr-parser
%expect-rr 1
@end example

@noindent
No change in the grammar itself is required. Now the
parser recognizes all valid declarations, according to the
limited syntax above, transparently. In fact, the user does not even
notice when the parser splits.

So here we have a case where we can use the benefits of GLR,
almost without disadvantages. Even in simple cases like this, however,
there are at least two potential problems to beware of. First, always
analyze the conflicts reported by Bison to make sure that GLR
splitting is only done where it is intended. A GLR parser
splitting inadvertently may cause problems less obvious than an
LR parser statically choosing the wrong alternative in a
conflict. Second, consider interactions with the lexer (@pxref{Semantic
Tokens}) with great care. Since a split parser consumes tokens without
performing any actions during the split, the lexer cannot obtain
information via parser actions. Some cases of lexer interactions can be
eliminated by using GLR to shift the complications from the
lexer to the parser. You must check the remaining cases for
correctness.

In our example, it would be safe for the lexer to return tokens based on
their current meanings in some symbol table, because no new symbols are
defined in the middle of a type declaration. Though it is possible for
a parser to define the enumeration constants as they are parsed, before
the type declaration is completed, it actually makes no difference since
they cannot be used within the same enumerated type declaration.

@node Merging GLR Parses
@subsection Using GLR to Resolve Ambiguities
@cindex GLR parsing, ambiguous grammars
@cindex generalized LR (GLR) parsing, ambiguous grammars
@findex %dprec
@findex %merge
@cindex conflicts
@cindex reduce/reduce conflicts

Let's consider an example, vastly simplified from a C++ grammar.

@example
%@{
  #include <stdio.h>
  #define YYSTYPE char const *
  int yylex (void);
  void yyerror (char const *);
%@}

%token TYPENAME ID

%right '='
%left '+'

%glr-parser

%%

prog :
     | prog stmt   @{ printf ("\n"); @}
     ;

stmt : expr ';'  %dprec 1
     | decl      %dprec 2
     ;

expr : ID               @{ printf ("%s ", $$); @}
     | TYPENAME '(' expr ')'
                        @{ printf ("%s <cast> ", $1); @}
     | expr '+' expr    @{ printf ("+ "); @}
     | expr '=' expr    @{ printf ("= "); @}
     ;

decl : TYPENAME declarator ';'
                        @{ printf ("%s <declare> ", $1); @}
     | TYPENAME declarator '=' expr ';'
                        @{ printf ("%s <init-declare> ", $1); @}
     ;

declarator : ID         @{ printf ("\"%s\" ", $1); @}
     | '(' declarator ')'
     ;
@end example

@noindent
This models a problematic part of the C++ grammar---the ambiguity between
certain declarations and statements. For example,

@example
T (x) = y+z;
@end example

@noindent
parses as either an @code{expr} or a @code{decl}
(assuming that @samp{T} is recognized as a @code{TYPENAME} and
@samp{x} as an @code{ID}).
Bison detects this as a reduce/reduce conflict between the rules
@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the
time it encounters @code{x} in the example above. Since this is a
GLR parser, it therefore splits the problem into two parses, one for
each choice of resolving the reduce/reduce conflict.
Unlike the example from the previous section (@pxref{Simple GLR Parsers}),
however, neither of these parses ``dies,'' because the grammar as it stands is
ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and
the other reduces @code{stmt : decl}, after which both parsers are in an
identical state: they've seen @samp{prog stmt} and have the same unprocessed
input remaining. We say that these parses have @dfn{merged}.

At this point, the GLR parser requires a specification in the
grammar of how to choose between the competing parses.
In the example above, the two @code{%dprec}
declarations specify that Bison is to give precedence
to the parse that interprets the example as a
@code{decl}, which implies that @code{x} is a declarator.
The parser therefore prints

@example
"x" y z + T <init-declare>
@end example

The @code{%dprec} declarations only come into play when more than one
parse survives. Consider a different input string for this parser:

@example
T (x) + y;
@end example

@noindent
This is another example of using GLR to parse an unambiguous
construct, as shown in the previous section (@pxref{Simple GLR Parsers}).
Here, there is no ambiguity (this cannot be parsed as a declaration).
However, at the time the Bison parser encounters @code{x}, it does not
have enough information to resolve the reduce/reduce conflict (again,
between @code{x} as an @code{expr} and as a @code{declarator}). In this
case, no precedence declaration is used. Again, the parser splits
into two, one assuming that @code{x} is an @code{expr}, and the other
assuming @code{x} is a @code{declarator}. The second of these parsers
then vanishes when it sees @code{+}, and the parser prints

@example
x T <cast> y +
@end example

Suppose that instead of resolving the ambiguity, you wanted to see all
the possibilities. For this purpose, you must merge the semantic
actions of the two possible parsers, rather than choosing one over the
other. To do so, you could change the declaration of @code{stmt} as
follows:

@example
stmt : expr ';'  %merge <stmtMerge>
     | decl      %merge <stmtMerge>
     ;
@end example

@noindent
and define the @code{stmtMerge} function as:

@example
static YYSTYPE
stmtMerge (YYSTYPE x0, YYSTYPE x1)
@{
  printf ("<OR> ");
  return "";
@}
@end example

@noindent
with an accompanying forward declaration
in the C declarations at the beginning of the file:

@example
%@{
  #define YYSTYPE char const *
  static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
%@}
@end example

@noindent
With these declarations, the resulting parser parses the first example
as both an @code{expr} and a @code{decl}, and prints

@example
"x" y z + T <init-declare> x T <cast> y z + = <OR>
@end example

Bison requires that all of the
productions that participate in any particular merge have identical
@samp{%merge} clauses. Otherwise, the ambiguity would be unresolvable,
and the parser reports an error during any parse that results in
the offending merge.
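The @code{stmtMerge} function shown earlier only prints @samp{<OR>} and discards both values. If you wanted the merged semantic value itself to record both interpretations, a merge function could build a new value from @var{x0} and @var{x1}. The following is a sketch under the same assumption that @code{YYSTYPE} is @code{char const *}; the name @code{stmtMergeBoth} and the output format are hypothetical, and the result is allocated with @code{malloc} and left for the caller to free, which a real parser would have to arrange.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define YYSTYPE char const *

/* Hypothetical merge function: instead of discarding the two
   competing values, return a new string that mentions both.
   The result is allocated with malloc; the caller must free it.  */
static YYSTYPE
stmtMergeBoth (YYSTYPE x0, YYSTYPE x1)
{
  /* sizeof counts the 8 fixed characters plus the terminating NUL.  */
  size_t len = strlen (x0) + strlen (x1) + sizeof "( <OR> )";
  char *res = malloc (len);
  if (!res)
    return "";                  /* fall back to an empty value */
  snprintf (res, len, "(%s <OR> %s)", x0, x1);
  return res;
}
```

For example, @code{stmtMergeBoth ("decl", "expr")} returns the string @samp{(decl <OR> expr)}.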

@node GLR Semantic Actions
@subsection GLR Semantic Actions

@cindex deferred semantic actions
By definition, a deferred semantic action is not performed at the same time as
the associated reduction.
This raises caveats for several Bison features you might use in a semantic
action in a GLR parser.

@vindex yychar
@cindex GLR parsers and @code{yychar}
@vindex yylval
@cindex GLR parsers and @code{yylval}
@vindex yylloc
@cindex GLR parsers and @code{yylloc}
In any semantic action, you can examine @code{yychar} to determine the type of
the lookahead token present at the time of the associated reduction.
After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF},
you can then examine @code{yylval} and @code{yylloc} to determine the
lookahead token's semantic value and location, if any.
In a nondeferred semantic action, you can also modify any of these variables to
influence syntax analysis.
@xref{Lookahead, ,Lookahead Tokens}.

@findex yyclearin
@cindex GLR parsers and @code{yyclearin}
In a deferred semantic action, it's too late to influence syntax analysis.
In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
shallow copies of the values they had at the time of the associated reduction.
For this reason alone, modifying them is dangerous.
Moreover, the result of modifying them is undefined and subject to change with
future versions of Bison.
For example, if a semantic action might be deferred, you should never write it
to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
memory referenced by @code{yylval}.

@findex YYERROR
@cindex GLR parsers and @code{YYERROR}
Another Bison feature requiring special consideration is @code{YYERROR}
(@pxref{Action Features}), which you can invoke in a semantic action to
initiate error recovery.
During deterministic GLR operation, the effect of @code{YYERROR} is
the same as its effect in a deterministic parser.
In a deferred semantic action, its effect is undefined.
@c The effect is probably a syntax error at the split point.

Also, see @ref{Location Default Action, ,Default Action for Locations}, which
describes a special usage of @code{YYLLOC_DEFAULT} in GLR parsers.

@node Compiler Requirements
@subsection Considerations when Compiling GLR Parsers
@cindex @code{inline}
@cindex GLR parsers and @code{inline}

The GLR parsers require a compiler for ISO C89 or
later. In addition, they use the @code{inline} keyword, which is not
C89, but is C99 and is a common extension in pre-C99 compilers. It is
up to the user of these parsers to handle
portability issues. For instance, if using Autoconf and the Autoconf
macro @code{AC_C_INLINE}, a mere

@example
%@{
  #include <config.h>
%@}
@end example

@noindent
will suffice. Otherwise, we suggest

@example
%@{
  #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \
       && ! defined inline)
  # define inline
  #endif
%@}
@end example

@node Locations
@section Locations
@cindex location
@cindex textual location
@cindex location, textual

Many applications, like interpreters or compilers, have to produce verbose
and useful error messages. To achieve this, one must be able to keep track of
the @dfn{textual location}, or @dfn{location}, of each syntactic construct.
Bison provides a mechanism for handling these locations.

Just as each token has a semantic value, each token also has an
associated location; unlike semantic values, however, locations have the
same type for all tokens and groupings. Moreover, the output parser is
equipped with a default data structure for storing locations
(@pxref{Tracking Locations}, for more details).

Like semantic values, locations can be reached in actions using a dedicated
set of constructs. In the rule @code{expr: expr '+' expr} shown earlier
(@pxref{Semantic Actions}), the location of the whole grouping
is @code{@@$}, while the locations of the subexpressions are @code{@@1} and
@code{@@3}.

When a rule is matched, a default action is used to compute the semantic value
of its left-hand side (@pxref{Actions}). In the same way, another default
action is used for locations. However, the default action for locations is
general enough for most cases, meaning there is usually no need to describe
for each rule how @code{@@$} should be formed. When building a new location
for a given grouping, the default behavior of the output parser is to start
at the beginning of the first symbol and end at the end of the last symbol.
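The default behavior can be sketched in C as follows. The structure mirrors the four fields of Bison's default location type (@code{first_line}, @code{first_column}, @code{last_line} and @code{last_column}; @pxref{Tracking Locations}), but @code{struct location} and @code{default_location} are hypothetical names, and the sketch only handles rules with at least one right-hand-side symbol. Bison's actual default is the @code{YYLLOC_DEFAULT} macro, which also covers empty rules.

```c
/* Mirrors the fields of Bison's default location type.  */
struct location
{
  int first_line, first_column;
  int last_line, last_column;
};

/* Compute the location of a grouping from the locations RHS[0..N-1]
   of its N >= 1 components: it starts where the first component
   starts and ends where the last component ends.  */
static struct location
default_location (const struct location *rhs, int n)
{
  struct location res;
  res.first_line = rhs[0].first_line;
  res.first_column = rhs[0].first_column;
  res.last_line = rhs[n - 1].last_line;
  res.last_column = rhs[n - 1].last_column;
  return res;
}
```

For a rule whose first symbol spans line 1, columns 1 to 3, and whose last symbol ends at line 2, column 4, the sketch yields a location running from line 1, column 1 to line 2, column 4.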

@node Bison Parser
@section Bison Output: the Parser Implementation File
@cindex Bison parser
@cindex Bison utility
@cindex lexical analyzer, purpose
@cindex parser

When you run Bison, you give it a Bison grammar file as input. The
most important output is a C source file that implements a parser for
the language described by the grammar. This parser is called a
@dfn{Bison parser}, and this file is called a @dfn{Bison parser
implementation file}. Keep in mind that the Bison utility and the
Bison parser are two distinct programs: the Bison utility is a program
whose output is the Bison parser implementation file that becomes part
of your program.

The job of the Bison parser is to group tokens into groupings according to
the grammar rules---for example, to build identifiers and operators into
expressions. As it does this, it runs the actions for the grammar rules it
uses.

The tokens come from a function called the @dfn{lexical analyzer} that
you must supply in some fashion (such as by writing it in C). The Bison
parser calls the lexical analyzer each time it wants a new token. It
doesn't know what is ``inside'' the tokens (though their semantic values
may reflect this). Typically the lexical analyzer makes the tokens by
parsing characters of text, but Bison does not depend on this.
@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.

The Bison parser implementation file is C code which defines a
function named @code{yyparse} which implements that grammar. This
function does not make a complete C program: you must supply some
additional functions. One is the lexical analyzer. Another is an
error-reporting function which the parser calls to report an error.
In addition, a complete C program must start with a function called
@code{main}; you have to provide this, and arrange for it to call
@code{yyparse} or the parser will never run. @xref{Interface, ,Parser
C-Language Interface}.

Aside from the token type names and the symbols in the actions you
write, all symbols defined in the Bison parser implementation file
itself begin with @samp{yy} or @samp{YY}. This includes interface
functions such as the lexical analyzer function @code{yylex}, the
error reporting function @code{yyerror} and the parser function
@code{yyparse} itself. This also includes numerous identifiers used
for internal purposes. Therefore, you should avoid using C
identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar
file except for the ones defined in this manual. Also, you should
avoid using the C identifiers @samp{malloc} and @samp{free} for
anything other than their usual meanings.

In some cases the Bison parser implementation file includes system
headers, and in those cases your code should respect the identifiers
reserved by those headers. On some non-GNU hosts, @code{<alloca.h>},
@code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are
included as needed to declare memory allocators and related types.
@code{<libintl.h>} is included if message translation is in use
(@pxref{Internationalization}). Other system headers may be included
if you define @code{YYDEBUG} to a nonzero value (@pxref{Tracing,
,Tracing Your Parser}).

@node Stages
@section Stages in Using Bison
@cindex stages in using Bison
@cindex using Bison

The actual language-design process using Bison, from grammar specification
to a working compiler or interpreter, has these parts:

@enumerate
@item
Formally specify the grammar in a form recognized by Bison
(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule
in the language, describe the action that is to be taken when an
instance of that rule is recognized. The action is described by a
sequence of C statements.

@item
Write a lexical analyzer to process input and pass tokens to the parser.
The lexical analyzer may be written by hand in C (@pxref{Lexical, ,The
Lexical Analyzer Function @code{yylex}}). It could also be produced
using Lex, but the use of Lex is not discussed in this manual.

@item
Write a controlling function that calls the Bison-produced parser.

@item
Write error-reporting routines.
@end enumerate

To turn this source code as written into a runnable program, you
must follow these steps:

@enumerate
@item
Run Bison on the grammar to produce the parser.

@item
Compile the code output by Bison, as well as any other source files.

@item
Link the object files to produce the finished product.
@end enumerate

@node Grammar Layout
@section The Overall Layout of a Bison Grammar
@cindex grammar file
@cindex file format
@cindex format of grammar file
@cindex layout of Bison grammar

The input file for the Bison utility is a @dfn{Bison grammar file}. The
general form of a Bison grammar file is as follows:

@example
%@{
@var{Prologue}
%@}

@var{Bison declarations}

%%
@var{Grammar rules}
%%
@var{Epilogue}
@end example

@noindent
The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
in every Bison grammar file to separate the sections.

The prologue may define types and variables used in the actions. You can
also use preprocessor commands to define macros used there, and use
@code{#include} to include header files that do any of these things.
You need to declare the lexical analyzer @code{yylex} and the
error-reporting function @code{yyerror} here, along with any other global
identifiers used by the actions in the grammar rules.

The Bison declarations declare the names of the terminal and nonterminal
symbols, and may also describe operator precedence and the data types of
semantic values of various symbols.

The grammar rules define how to construct each nonterminal symbol from its
parts.

The epilogue can contain any code you want to use. Often the
definitions of functions declared in the prologue go here. In a
simple program, all the rest of the program can go here.

@node Examples
@chapter Examples
@cindex simple examples
@cindex examples, simple

Now we show and explain several sample programs written using Bison: a
reverse polish notation calculator, an algebraic (infix) notation
calculator (later extended to track ``locations''),
and a multi-function calculator. All
produce usable, though limited, interactive desktop calculators.

These examples are simple, but Bison grammars for real programming
languages are written the same way. You can copy these examples into a
source file to try them.

@menu
* RPN Calc::               Reverse polish notation calculator;
                             a first example with no operator precedence.
* Infix Calc::             Infix (algebraic) notation calculator.
                             Operator precedence is introduced.
* Simple Error Recovery::  Continuing after syntax errors.
* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
* Multi-function Calc::    Calculator with memory and trig functions.
                             It uses multiple data-types for semantic values.
* Exercises::              Ideas for improving the multi-function calculator.
@end menu

@node RPN Calc
@section Reverse Polish Notation Calculator
@cindex reverse polish notation
@cindex polish notation calculator
@cindex @code{rpcalc}
@cindex calculator, simple

The first example is that of a simple double-precision @dfn{reverse polish
notation} calculator (a calculator using postfix operators).  This example
provides a good starting point, since operator precedence is not an issue.
The second example will illustrate how operator precedence is handled.

The source code for this calculator is named @file{rpcalc.y}.  The
@samp{.y} extension is a convention used for Bison grammar files.

@menu
* Rpcalc Declarations::    Prologue (declarations) for rpcalc.
* Rpcalc Rules::           Grammar Rules for rpcalc, with explanation.
* Rpcalc Lexer::           The lexical analyzer.
* Rpcalc Main::            The controlling function.
* Rpcalc Error::           The error reporting function.
* Rpcalc Generate::        Running Bison on the grammar file.
* Rpcalc Compile::         Run the C compiler on the output code.
@end menu

@node Rpcalc Declarations
@subsection Declarations for @code{rpcalc}

Here are the C and Bison declarations for the reverse polish notation
calculator.  As in C, comments are placed between @samp{/*@dots{}*/}.

@example
/* Reverse polish notation calculator.  */

%@{
  #define YYSTYPE double
  #include <math.h>
  #include <stdio.h>
  int yylex (void);
  void yyerror (char const *);
%@}

%token NUM

%% /* Grammar rules and actions follow.  */
@end example

The declarations section (@pxref{Prologue, , The prologue}) contains three
preprocessor directives and two forward declarations.

The @code{#define} directive defines the macro @code{YYSTYPE}, thus
specifying the C data type for semantic values of both tokens and
groupings (@pxref{Value Type, ,Data Types of Semantic Values}).  The
Bison parser will use whatever type @code{YYSTYPE} is defined as; if you
don't define it, @code{int} is the default.  Because we specify
@code{double}, each token and each expression has an associated value,
which is a floating point number.

The @code{#include} directives declare the exponentiation function
@code{pow} (from @file{math.h}) and the output function @code{printf}
(from @file{stdio.h}), both of which the grammar actions use.

The forward declarations for @code{yylex} and @code{yyerror} are
needed because the C language requires that functions be declared
before they are used.  These functions will be defined in the
epilogue, but the parser calls them so they must be declared in the
prologue.

The second section, Bison declarations, provides information to Bison
about the token types (@pxref{Bison Declarations, ,The Bison
Declarations Section}).  Each terminal symbol that is not a
single-character literal must be declared here.  (Single-character
literals normally don't need to be declared.)  In this example, all the
arithmetic operators are designated by single-character literals, so the
only terminal symbol that needs to be declared is @code{NUM}, the token
type for numeric constants.

@node Rpcalc Rules
@subsection Grammar Rules for @code{rpcalc}

Here are the grammar rules for the reverse polish notation calculator.

@example
@group
input:    /* empty */
        | input line
;
@end group

@group
line:     '\n'
        | exp '\n'      @{ printf ("\t%.10g\n", $1); @}
;
@end group

@group
exp:      NUM           @{ $$ = $1;           @}
        | exp exp '+'   @{ $$ = $1 + $2;      @}
        | exp exp '-'   @{ $$ = $1 - $2;      @}
        | exp exp '*'   @{ $$ = $1 * $2;      @}
        | exp exp '/'   @{ $$ = $1 / $2;      @}
        | exp exp '^'   @{ $$ = pow ($1, $2); @}  /* Exponentiation */
        | exp 'n'       @{ $$ = -$1;          @}  /* Unary minus    */
;
@end group
%%
@end example

The groupings of the rpcalc ``language'' defined here are the expression
(given the name @code{exp}), the line of input (@code{line}), and the
complete input transcript (@code{input}).  Each of these nonterminal
symbols has several alternate rules, joined by the vertical bar @samp{|}
which is read as ``or''.  The following sections explain what these rules
mean.

The semantics of the language is determined by the actions taken when a
grouping is recognized.  The actions are the C code that appears inside
braces.  @xref{Actions}.

You must specify these actions in C, but Bison provides the means for
passing semantic values between the rules.  In each action, the
pseudo-variable @code{$$} stands for the semantic value for the grouping
that the rule is going to construct.  Assigning a value to @code{$$} is the
main job of most actions.  The semantic values of the components of the
rule are referred to as @code{$1}, @code{$2}, and so on.

@menu
* Rpcalc Input::
* Rpcalc Line::
* Rpcalc Expr::
@end menu

@node Rpcalc Input
@subsubsection Explanation of @code{input}

Consider the definition of @code{input}:

@example
input:    /* empty */
        | input line
;
@end example

This definition reads as follows: ``A complete input is either an empty
string, or a complete input followed by an input line''.  Notice that
``complete input'' is defined in terms of itself.  This definition is said
to be @dfn{left recursive} since @code{input} always appears as the
leftmost symbol in the sequence.  @xref{Recursion, ,Recursive Rules}.

The first alternative is empty because there are no symbols between the
colon and the first @samp{|}; this means that @code{input} can match an
empty string of input (no tokens).  We write the rules this way because it
is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
It's conventional to put an empty alternative first and write the comment
@samp{/* empty */} in it.

The second alternate rule (@code{input line}) handles all nontrivial input.
It means, ``After reading any number of lines, read one more line if
possible.''  The left recursion makes this rule into a loop.  Since the
first alternative matches empty input, the loop can be executed zero or
more times.

The parser function @code{yyparse} continues to process input until a
grammatical error is seen or the lexical analyzer says there are no more
input tokens; we will arrange for the latter to happen at end-of-input.

@node Rpcalc Line
@subsubsection Explanation of @code{line}

Now consider the definition of @code{line}:

@example
line:     '\n'
        | exp '\n'  @{ printf ("\t%.10g\n", $1); @}
;
@end example

The first alternative is a token which is a newline character; this means
that rpcalc accepts a blank line (and ignores it, since there is no
action).  The second alternative is an expression followed by a newline.
This is the alternative that makes rpcalc useful.  The semantic value of
the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
question is the first symbol in the alternative.  The action prints this
value, which is the result of the computation the user asked for.
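
To see what the @samp{%.10g} conversion in this action produces, here is
a small standalone C sketch built on the standard @code{snprintf}.  The
helpers @code{format_result} and @code{prints_as} are hypothetical names
used only for illustration.  @samp{%.10g} prints at most ten significant
digits and drops trailing zeros, which is why a result such as 13 is
shown as @samp{13} rather than @samp{13.000000000}.

@example
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format VALUE into BUF with the conversion used
   by the rpcalc action, minus the leading tab.  */
static void
format_result (double value, char *buf, size_t size)
@{
  snprintf (buf, size, "%.10g", value);
@}

/* Hypothetical helper: return nonzero if VALUE is printed as EXPECTED.  */
static int
prints_as (double value, char const *expected)
@{
  char buf[64];
  format_result (value, buf, sizeof buf);
  return strcmp (buf, expected) == 0;
@}
@end example

For instance, @code{prints_as (5.0 / 6.0 - 4.0, "-3.166666667")} is
true, which matches what rpcalc prints for the input @samp{5 6 / 4 n +}.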

This action is unusual because it does not assign a value to @code{$$}.  As
a consequence, the semantic value associated with the @code{line} is
uninitialized (its value will be unpredictable).  This would be a bug if
that value were ever used, but we don't use it: once rpcalc has printed the
value of the user's input line, that value is no longer needed.

@node Rpcalc Expr
@subsubsection Explanation of @code{exp}

The @code{exp} grouping has several rules, one for each kind of expression.
The first rule handles the simplest expressions: those that are just numbers.
The second handles an addition-expression, which looks like two expressions
followed by a plus-sign.  The third handles subtraction, and so on.

@example
exp:      NUM
        | exp exp '+'     @{ $$ = $1 + $2;    @}
        | exp exp '-'     @{ $$ = $1 - $2;    @}
        @dots{}
        ;
@end example

We have used @samp{|} to join all the rules for @code{exp}, but we could
equally well have written them separately:

@example
exp:      NUM ;
exp:      exp exp '+'     @{ $$ = $1 + $2;    @} ;
exp:      exp exp '-'     @{ $$ = $1 - $2;    @} ;
        @dots{}
@end example

Most of the rules have actions that compute the value of the expression in
terms of the value of its parts.  For example, in the rule for addition,
@code{$1} refers to the first component @code{exp} and @code{$2} refers to
the second one.  The third component, @code{'+'}, has no meaningful
associated semantic value, but if it had one you could refer to it as
@code{$3}.  When @code{yyparse} recognizes a sum expression using this
rule, the sum of the two subexpressions' values is produced as the value of
the entire expression.  @xref{Actions}.
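
The way these actions combine values can be imitated in plain C with an
explicit stack: each number is pushed, and each operator pops its two
operands (playing the roles of @code{$1} and @code{$2}) and pushes the
result (playing the role of @code{$$}).  The following sketch is only an
analogy, not the code Bison generates, although the generated parser
does keep semantic values on a stack of its own.  The function name
@code{rpn_eval}, the fixed stack depth of 64, and the omission of
@samp{^} (so that no math library is needed) are assumptions of this
example.

@example
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: evaluate the postfix expression TEXT with the
   rpcalc operators except '^'.  Store the result in *RESULT and return
   1, or return 0 if TEXT is not exactly one well-formed expression.  */
static int
rpn_eval (char const *text, double *result)
@{
  double stack[64];
  int depth = 0;

  while (*text)
    @{
      unsigned char c = *text;
      if (c == ' ' || c == '\t')
        text++;
      else if (c == '.' || isdigit (c))
        @{
          /* Like the rule exp: NUM, push the number.  */
          char *end;
          if (depth == 64)
            return 0;
          stack[depth++] = strtod (text, &end);
          if (end == text)
            return 0;
          text = end;
        @}
      else if (c == 'n')
        @{
          /* Like exp: exp 'n', replace $1 by -$1.  */
          if (depth < 1)
            return 0;
          stack[depth - 1] = -stack[depth - 1];
          text++;
        @}
      else
        @{
          /* Like exp: exp exp OP, pop $2 and $1, then push $$.  */
          double lhs, rhs;
          if (depth < 2)
            return 0;
          rhs = stack[--depth];
          lhs = stack[depth - 1];
          switch (c)
            @{
            case '+': stack[depth - 1] = lhs + rhs; break;
            case '-': stack[depth - 1] = lhs - rhs; break;
            case '*': stack[depth - 1] = lhs * rhs; break;
            case '/': stack[depth - 1] = lhs / rhs; break;
            default:  return 0;
            @}
          text++;
        @}
    @}
  if (depth != 1)
    return 0;
  *result = stack[0];
  return 1;
@}
@end example

For example, @code{rpn_eval ("3 7 + 3 4 5 *+-", &v)} stores -13 in
@code{v}, the same answer rpcalc gives for that line.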

You don't have to give an action for every rule.  When a rule has no
action, Bison by default copies the value of @code{$1} into @code{$$}.
This is what happens in the first rule (the one that uses @code{NUM}).

The formatting shown here is the recommended convention, but Bison does
not require it.  You can add or change white space as much as you wish.
For example, this:

@example
exp   : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;
@end example

@noindent
means the same thing as this:

@example
exp:      NUM
        | exp exp '+'    @{ $$ = $1 + $2; @}
        | @dots{}
;
@end example

@noindent
The latter, however, is much more readable.

@node Rpcalc Lexer
@subsection The @code{rpcalc} Lexical Analyzer
@cindex writing a lexical analyzer
@cindex lexical analyzer, writing

The lexical analyzer's job is low-level parsing: converting characters
or sequences of characters into tokens.  The Bison parser gets its
tokens by calling the lexical analyzer.  @xref{Lexical, ,The Lexical
Analyzer Function @code{yylex}}.

Only a simple lexical analyzer is needed for the RPN calculator.  This
lexical analyzer skips blanks and tabs, then reads in numbers as
@code{double} and returns them as @code{NUM} tokens.  Any other character
that isn't part of a number is a separate token.  Note that the token code
for such a single-character token is the character itself.

The return value of the lexical analyzer function is a numeric code which
represents a token type.  The same text used in Bison rules to stand for
this token type is also a C expression for the numeric code for the type.
This works in two ways.  If the token type is a character literal, then its
numeric code is that of the character; you can use the same
character literal in the lexical analyzer to express the number.  If the
token type is an identifier, that identifier is defined by Bison as a C
macro whose definition is the appropriate number.  In this example,
therefore, @code{NUM} becomes a macro for @code{yylex} to use.

The semantic value of the token (if it has one) is stored into the
global variable @code{yylval}, which is where the Bison parser will look
for it.  (The C data type of @code{yylval} is @code{YYSTYPE}, which was
defined at the beginning of the grammar; @pxref{Rpcalc Declarations,
,Declarations for @code{rpcalc}}.)

A token type code of zero is returned if the end-of-input is encountered.
(Bison recognizes any nonpositive value as indicating end-of-input.)

Here is the code for the lexical analyzer:

@example
@group
/* The lexical analyzer stores a double floating point number
   in yylval and returns the token NUM, or returns the numeric code
   of the character read if not a number.  It skips all blanks
   and tabs, and returns 0 for end-of-input.  */

#include <ctype.h>
#include <stdio.h>
@end group

@group
int
yylex (void)
@{
  int c;

  /* Skip white space.  */
  while ((c = getchar ()) == ' ' || c == '\t')
    continue;
@end group
@group
  /* Process numbers.  */
  if (c == '.' || isdigit (c))
    @{
      ungetc (c, stdin);
      scanf ("%lf", &yylval);
      return NUM;
    @}
@end group
@group
  /* Return end-of-input.  */
  if (c == EOF)
    return 0;
  /* Return a single char.  */
  return c;
@}
@end group
@end example
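
To experiment with these token conventions without running Bison, the
following sketch applies the same rules to a string instead of standard
input.  It is a hypothetical variant, not part of rpcalc: the names
@code{next_token} and @code{TOKEN_NUM} are invented, the value is
returned through a pointer argument rather than through @code{yylval},
and the code 258 is only an assumption (Bison usually numbers named
tokens from 258 upward, but a real lexer should use the @code{NUM} macro
that Bison defines).

@example
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for the NUM macro defined by Bison.  */
enum @{ TOKEN_NUM = 258 @};

/* Return the next token from *INPUT and advance *INPUT past it.
   A number yields TOKEN_NUM and stores its value in *VALUE; any other
   character is its own token code; 0 means end of input.  */
static int
next_token (char const **input, double *value)
@{
  char const *p = *input;

  /* Skip blanks and tabs, as rpcalc's yylex does.  */
  while (*p == ' ' || *p == '\t')
    p++;

  if (*p == '\0')
    @{
      *input = p;
      return 0;
    @}

  if (*p == '.' || isdigit ((unsigned char) *p))
    @{
      char *end;
      double v = strtod (p, &end);
      if (end != p)
        @{
          *value = v;
          *input = end;
          return TOKEN_NUM;
        @}
    @}

  /* A single-character token: its code is the character itself.  */
  *input = p + 1;
  return (unsigned char) *p;
@}
@end example

Scanning @samp{4 9 +} followed by a newline yields @code{TOKEN_NUM}
(value 4), @code{TOKEN_NUM} (value 9), @code{'+'}, @code{'\n'} and
finally 0.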

@node Rpcalc Main
@subsection The Controlling Function
@cindex controlling function
@cindex main function in simple example

In keeping with the spirit of this example, the controlling function is
kept to the bare minimum.  The only requirement is that it call
@code{yyparse} to start the process of parsing.

@example
@group
int
main (void)
@{
  return yyparse ();
@}
@end group
@end example

@node Rpcalc Error
@subsection The Error Reporting Routine
@cindex error reporting routine

When @code{yyparse} detects a syntax error, it calls the error reporting
function @code{yyerror} to print an error message (usually but not
always @code{"syntax error"}).  It is up to the programmer to supply
@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
here is the definition we will use:

@example
@group
#include <stdio.h>
@end group

@group
/* Called by yyparse on error.  */
void
yyerror (char const *s)
@{
  fprintf (stderr, "%s\n", s);
@}
@end group
@end example

After @code{yyerror} returns, the Bison parser may recover from the error
and continue parsing if the grammar contains a suitable error rule
(@pxref{Error Recovery}).  Otherwise, @code{yyparse} returns nonzero.  We
have not written any error rules in this example, so any invalid input will
cause the calculator program to exit.  This is not clean behavior for a
real calculator, but it is adequate for the first example.

@node Rpcalc Generate
@subsection Running Bison to Make the Parser
@cindex running Bison (introduction)

Before running Bison to produce a parser, we need to decide how to
arrange all the source code in one or more source files.  For such a
simple example, the easiest thing is to put everything in one file,
the grammar file.  The definitions of @code{yylex}, @code{yyerror} and
@code{main} go at the end, in the epilogue of the grammar file
(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).

For a large project, you would probably have several source files, and use
@code{make} to arrange to recompile them.

With all the source in the grammar file, you use the following command
to convert it into a parser implementation file:

@example
bison @var{file}.y
@end example

@noindent
In this example, the grammar file is called @file{rpcalc.y} (for
``Reverse Polish @sc{calc}ulator'').  Bison produces a parser
implementation file named @file{@var{file}.tab.c} (here,
@file{rpcalc.tab.c}), removing the @samp{.y} from the grammar file name.
The parser implementation file contains the source code for
@code{yyparse}.  The additional functions in the grammar file
(@code{yylex}, @code{yyerror} and @code{main}) are copied verbatim to
the parser implementation file.

@node Rpcalc Compile
@subsection Compiling the Parser Implementation File
@cindex compiling the parser

Here is how to compile and run the parser implementation file:

@example
@group
# @r{List files in current directory.}
$ @kbd{ls}
rpcalc.tab.c  rpcalc.y
@end group

@group
# @r{Compile the Bison parser.}
# @r{@samp{-lm} tells the linker to search the math library for @code{pow}.}
$ @kbd{cc -o rpcalc rpcalc.tab.c -lm}
@end group

@group
# @r{List files again.}
$ @kbd{ls}
rpcalc  rpcalc.tab.c  rpcalc.y
@end group
@end example

The file @file{rpcalc} now contains the executable code.  Here is an
example session using @code{rpcalc}.

@example
$ @kbd{rpcalc}
@kbd{4 9 +}
13
@kbd{3 7 + 3 4 5 *+-}
-13
@kbd{3 7 + 3 4 5 * + - n}              @r{Note the unary minus, @samp{n}}
13
@kbd{5 6 / 4 n +}
-3.166666667
@kbd{3 4 ^}                            @r{Exponentiation}
81
@kbd{^D}                               @r{End-of-file indicator}
$
@end example

@node Infix Calc
@section Infix Notation Calculator: @code{calc}
@cindex infix notation calculator
@cindex @code{calc}
@cindex calculator, infix notation

We now modify rpcalc to handle infix operators instead of postfix.  Infix
notation involves the concept of operator precedence and the need for
parentheses nested to arbitrary depth.  Here is the Bison code for
@file{calc.y}, an infix desk-top calculator.

@example
/* Infix notation calculator.  */

@group
%@{
  #define YYSTYPE double
  #include <math.h>
  #include <stdio.h>
  int yylex (void);
  void yyerror (char const *);
%@}
@end group

@group
/* Bison declarations.  */
%token NUM
%left '-' '+'
%left '*' '/'
%left NEG     /* negation--unary minus */
%right '^'    /* exponentiation */
@end group

%% /* The grammar follows.  */
@group
input:    /* empty */
        | input line
;
@end group

@group
line:     '\n'
        | exp '\n'  @{ printf ("\t%.10g\n", $1); @}
;
@end group

@group
exp:      NUM                @{ $$ = $1;           @}
        | exp '+' exp        @{ $$ = $1 + $3;      @}
        | exp '-' exp        @{ $$ = $1 - $3;      @}
        | exp '*' exp        @{ $$ = $1 * $3;      @}
        | exp '/' exp        @{ $$ = $1 / $3;      @}
        | '-' exp  %prec NEG @{ $$ = -$2;          @}
        | exp '^' exp        @{ $$ = pow ($1, $3); @}
        | '(' exp ')'        @{ $$ = $2;           @}
;
@end group
%%
@end example

@noindent
The functions @code{yylex}, @code{yyerror} and @code{main} can be the
same as before.

There are two important new features shown in this code.

In the second section (Bison declarations), @code{%left} declares token
types and says they are left-associative operators.  The declarations
@code{%left} and @code{%right} (right associativity) take the place of
@code{%token} which is used to declare a token type name without
associativity.  (These tokens are single-character literals, which
ordinarily don't need to be declared.  We declare them here to specify
the associativity.)

Operator precedence is determined by the line ordering of the
declarations; the higher the line number of the declaration (lower on
the page or screen), the higher the precedence.  Hence, exponentiation
has the highest precedence, unary minus (@code{NEG}) is next, followed
by @samp{*} and @samp{/}, and so on.  @xref{Precedence, ,Operator
Precedence}.

The other important new feature is the @code{%prec} in the grammar
section for the unary minus operator.  The @code{%prec} simply instructs
Bison that the rule @samp{| '-' exp} has the same precedence as
@code{NEG}, in this case the next-to-highest.  @xref{Contextual
Precedence, ,Context-Dependent Precedence}.
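
Bison applies these declarations by resolving conflicts in the parser
tables it generates, which is quite different from code you would write
by hand.  Still, their effect can be illustrated with a hand-written
@dfn{precedence-climbing} evaluator in C.  This is a different technique
from Bison's, shown only because it obeys the same table: @samp{+} and
@samp{-} lowest, then @samp{*} and @samp{/}, then unary minus, then
@samp{^}, which groups to the right.  The names (@code{calc_eval},
@code{PREC_ADD} and so on), the lack of error checking, and the
restriction of @samp{^} to nonnegative integer exponents (so that no
math library is needed) are assumptions of this sketch.

@example
#include <stdlib.h>

/* Precedence levels in the order of the %left/%right lines:
   a later line binds more tightly.  */
enum @{ PREC_ADD = 1, PREC_MUL = 2, PREC_NEG = 3, PREC_POW = 4 @};

static char const *calc_cursor;   /* Current position in the input.  */
static double parse_expr (int min_prec);

static void
skip_blanks (void)
@{
  while (*calc_cursor == ' ' || *calc_cursor == '\t')
    calc_cursor++;
@}

/* Precedence of binary operator OP, or 0 if OP is not one.  */
static int
binary_prec (char op)
@{
  switch (op)
    @{
    case '+': case '-': return PREC_ADD;
    case '*': case '/': return PREC_MUL;
    case '^':           return PREC_POW;
    default:            return 0;
    @}
@}

/* Simplification: only nonnegative integer exponents.  */
static double
int_pow (double base, double exponent)
@{
  double result = 1;
  for (int i = 0; i < (int) exponent; i++)
    result *= base;
  return result;
@}

/* A number, a parenthesized expression, or a unary minus.  */
static double
parse_primary (void)
@{
  skip_blanks ();
  if (*calc_cursor == '-')
    @{
      /* Like '-' exp %prec NEG: the operand may only use '^'.  */
      calc_cursor++;
      return -parse_expr (PREC_NEG);
    @}
  if (*calc_cursor == '(')
    @{
      calc_cursor++;
      double value = parse_expr (PREC_ADD);
      skip_blanks ();
      if (*calc_cursor == ')')
        calc_cursor++;
      return value;
    @}
  char *end;
  double value = strtod (calc_cursor, &end);
  calc_cursor = end;
  return value;
@}

/* Parse binary operators whose precedence is at least MIN_PREC.  */
static double
parse_expr (int min_prec)
@{
  double lhs = parse_primary ();
  for (;;)
    @{
      skip_blanks ();
      char op = *calc_cursor;
      int prec = binary_prec (op);
      if (prec == 0 || prec < min_prec)
        return lhs;
      calc_cursor++;
      /* %left operators read their right operand one level higher;
         %right '^' stays at its own level, so it groups to the right.  */
      double rhs = parse_expr (op == '^' ? prec : prec + 1);
      switch (op)
        @{
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '^': lhs = int_pow (lhs, rhs); break;
        @}
    @}
@}

/* Hypothetical entry point: evaluate one infix expression.  */
static double
calc_eval (char const *text)
@{
  calc_cursor = text;
  return parse_expr (PREC_ADD);
@}
@end example

For example, @code{calc_eval ("2 ^ 3 ^ 2")} yields 512 because @samp{^}
groups to the right, while @code{calc_eval ("-2 ^ 2")} yields -4 because
unary minus binds less tightly than @samp{^}, just as the @code{%prec
NEG} rule arranges in @file{calc.y}.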

Here is a sample run of the compiled calculator, @code{calc}:

@need 500
@example
$ @kbd{calc}
@kbd{4 + 4.5 - (34/(8*3+-3))}
6.880952381
@kbd{-56 + 2}
-54
@kbd{3 ^ 2}
9
@end example

@node Simple Error Recovery
@section Simple Error Recovery
@cindex error recovery, simple

Up to this point, this manual has not addressed the issue of @dfn{error
recovery}: how to continue parsing after the parser detects a syntax
error.  All we have handled is error reporting with @code{yyerror}.
Recall that by default @code{yyparse} returns after calling
@code{yyerror}.  This means that an erroneous input line causes the
calculator program to exit.  Now we show how to rectify this deficiency.

The Bison language itself includes the reserved word @code{error}, which
may be included in the grammar rules.  In the example below it has
been added to one of the alternatives for @code{line}:

@example
@group
line:     '\n'
        | exp '\n'   @{ printf ("\t%.10g\n", $1); @}
        | error '\n' @{ yyerrok;                  @}
;
@end group
@end example

This addition to the grammar allows for simple error recovery in the
event of a syntax error.  If a line containing a syntax error is
read, the error will be recognized by the third rule for @code{line},
and parsing will continue.  (The @code{yyerror} function is still called
upon to print its message as well.)  The action executes the statement
@code{yyerrok}, a macro defined automatically by Bison; its meaning is
that error recovery is complete (@pxref{Error Recovery}).  Note the
difference between @code{yyerrok} and @code{yyerror}; neither one is a
misprint.

This form of error recovery deals with syntax errors.  There are other
kinds of errors; for example, division by zero, which on some systems
raises an exception signal that is normally fatal.  A real calculator
program must handle this signal and use @code{longjmp} to return to
@code{main} and resume parsing input lines; it would also have to discard
the rest of the current line of input.  We won't discuss this issue
further because it is not specific to Bison programs.

@node Location Tracking Calc
@section Location Tracking Calculator: @code{ltcalc}
@cindex location tracking calculator
@cindex @code{ltcalc}
@cindex calculator, location tracking

This example extends the infix notation calculator with location
tracking.  This feature will be used to improve the error messages.  For
the sake of clarity, this example is a simple integer calculator, since
most of the work needed to use locations will be done in the lexical
analyzer.

@menu
* Ltcalc Declarations::    Bison and C declarations for ltcalc.
* Ltcalc Rules::           Grammar rules for ltcalc, with explanations.
* Ltcalc Lexer::           The lexical analyzer.
@end menu

@node Ltcalc Declarations
@subsection Declarations for @code{ltcalc}

The C and Bison declarations for the location tracking calculator are
the same as the declarations for the infix notation calculator, except
that semantic values are integers.

@example
/* Location tracking calculator.  */

%@{
  #define YYSTYPE int
  #include <math.h>
  #include <stdio.h>
  int yylex (void);
  void yyerror (char const *);
%@}

/* Bison declarations.  */
%token NUM

%left '-' '+'
%left '*' '/'
%left NEG
%right '^'

%% /* The grammar follows.  */
@end example

@noindent
Note there are no declarations specific to locations.  Defining a data
type for storing locations is not needed: we will use the type provided
by default (@pxref{Location Type, ,Data Types of Locations}), which is a
four-member structure with the following integer fields:
@code{first_line}, @code{first_column}, @code{last_line} and
@code{last_column}.  By convention, and in accordance with the GNU
Coding Standards and common practice, the line and column count both
start at 1.

@node Ltcalc Rules
@subsection Grammar Rules for @code{ltcalc}

Whether or not you handle locations has no effect on the syntax of your
language.  Therefore, the grammar rules for this example will be very
close to those of the previous example: we will only modify them to
benefit from the new information.

Here, we will use locations to report divisions by zero, and to locate
the erroneous expressions or subexpressions.

@example
@group
input:    /* empty */
        | input line
;
@end group

@group
line:     '\n'
        | exp '\n'      @{ printf ("%d\n", $1); @}
;
@end group

@group
exp:      NUM           @{ $$ = $1; @}
        | exp '+' exp   @{ $$ = $1 + $3; @}
        | exp '-' exp   @{ $$ = $1 - $3; @}
        | exp '*' exp   @{ $$ = $1 * $3; @}
@end group
@group
        | exp '/' exp
            @{
              if ($3)
                $$ = $1 / $3;
              else
                @{
                  $$ = 1;
                  fprintf (stderr, "%d.%d-%d.%d: division by zero\n",
                           @@3.first_line, @@3.first_column,
                           @@3.last_line, @@3.last_column);
                @}
            @}
@end group
@group
        | '-' exp %prec NEG     @{ $$ = -$2; @}
        | exp '^' exp           @{ $$ = pow ($1, $3); @}
        | '(' exp ')'           @{ $$ = $2; @}
;
@end group
@end example

This code shows how to reach locations inside of semantic actions, by
using the pseudo-variables @code{@@@var{n}} for rule components, and the
pseudo-variable @code{@@$} for groupings.

We don't need to assign a value to @code{@@$}: the output parser does it
automatically.  By default, before executing the C code of each action,
@code{@@$} is set to range from the beginning of @code{@@1} to the end
of @code{@@@var{n}}, for a rule with @var{n} components.  This behavior
can be redefined (@pxref{Location Default Action, , Default Action for
Locations}), and for very specific rules, @code{@@$} can be computed by
hand.
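
The default computation of @code{@@$} just described amounts to copying
the start of the first component and the end of the last one.  Here is a
standalone C sketch of that rule, assuming a structure with the four
fields of the default location type.  The names @code{struct location}
and @code{span_locations} are hypothetical, and the empty-rule case
(@var{n} = 0) is deliberately left out.

@example
/* Hypothetical stand-in for the default location type.  */
struct location
@{
  int first_line, first_column;
  int last_line, last_column;
@};

/* Location of a grouping built from the N (N > 0) locations in
   COMPONENTS: from the beginning of @@1 to the end of @@N.  */
static struct location
span_locations (struct location const *components, int n)
@{
  struct location result;
  result.first_line = components[0].first_line;
  result.first_column = components[0].first_column;
  result.last_line = components[n - 1].last_line;
  result.last_column = components[n - 1].last_column;
  return result;
@}
@end example

For the rule @samp{exp '/' exp}, for instance, this gives @code{@@$} the
span from the first character of the dividend to the last character of
the divisor.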

@node Ltcalc Lexer
@subsection The @code{ltcalc} Lexical Analyzer

Until now, we relied on Bison's defaults to enable location
tracking.  The next step is to rewrite the lexical analyzer, and make it
able to feed the parser with the token locations, as it already does for
semantic values.

To this end, we must take into account every single character of the
input text, so that the computed locations are neither fuzzy nor wrong:

@example
@group
#include <ctype.h>
#include <stdio.h>

int
yylex (void)
@{
  int c;
@end group

@group
  /* Skip white space.  */
  while ((c = getchar ()) == ' ' || c == '\t')
    ++yylloc.last_column;
@end group

@group
  /* Step.  */
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;
@end group

@group
  /* Process numbers.  */
  if (isdigit (c))
    @{
      yylval = c - '0';
      ++yylloc.last_column;
      while (isdigit (c = getchar ()))
        @{
          ++yylloc.last_column;
          yylval = yylval * 10 + c - '0';
        @}
      ungetc (c, stdin);
      return NUM;
    @}
@end group

  /* Return end-of-input.  */
  if (c == EOF)
    return 0;

@group
  /* Return a single char, and update location.  */
  if (c == '\n')
    @{
      ++yylloc.last_line;
      yylloc.last_column = 0;
    @}
  else
    ++yylloc.last_column;
  return c;
@}
@end group
@end example

Basically, the lexical analyzer performs the same processing as before:
it skips blanks and tabs, and reads numbers or single-character tokens.
In addition, it updates @code{yylloc}, the global variable (of type
@code{YYLTYPE}) containing the token's location.

Now, each time this function returns a token, the parser has its token
number, its semantic value, and its location in the text.  The last
needed change is to initialize @code{yylloc}, for example in the
controlling function:

@example
@group
int
main (void)
@{
  yylloc.first_line = yylloc.last_line = 1;
  yylloc.first_column = yylloc.last_column = 0;
  return yyparse ();
@}
@end group
@end example

Remember that computing locations is not a matter of syntax.  Every
character must be associated with a location update, whether it is in
valid input, in comments, in literal strings, and so on.
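
The bookkeeping in this @code{yylex} can be tried out on a string.  The
sketch below is a hypothetical, standalone rendition of the same rules:
blanks and ordinary characters advance the column, a newline starts a
new line at column 0, and a ``step'' copies the end of the previous
token to the start of the next one.  The names @code{struct location},
@code{loc_advance}, @code{loc_step} and @code{nth_token_location} are
assumptions of this example; only the four field names come from the
default location type.

@example
#include <ctype.h>

/* Hypothetical stand-in for the default location type.  */
struct location
@{
  int first_line, first_column;
  int last_line, last_column;
@};

/* Account for one consumed character C.  */
static void
loc_advance (struct location *loc, int c)
@{
  if (c == '\n')
    @{
      loc->last_line++;
      loc->last_column = 0;
    @}
  else
    loc->last_column++;
@}

/* Start a new token where the previous one ended.  */
static void
loc_step (struct location *loc)
@{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
@}

/* Return the location of token number N (counting from 0) in TEXT,
   where a token is a run of digits or any other non-blank character,
   as in the ltcalc lexer.  If TEXT has fewer tokens, return the empty
   location at its end.  */
static struct location
nth_token_location (char const *text, int n)
@{
  struct location loc = @{ 1, 0, 1, 0 @};
  int i;

  for (i = 0; ; i++)
    @{
      while (*text == ' ' || *text == '\t')
        loc_advance (&loc, *text++);
      loc_step (&loc);
      if (*text == '\0')
        return loc;
      if (isdigit ((unsigned char) *text))
        while (isdigit ((unsigned char) *text))
          loc_advance (&loc, *text++);
      else
        loc_advance (&loc, *text++);
      if (i == n)
        return loc;
    @}
@}
@end example

In the text @samp{12 + 3} followed by a newline and @samp{45}, the
newline token thus spans from line 1, column 6 to line 2, column 0,
exactly as the ltcalc lexer would report it.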

@node Multi-function Calc
@section Multi-Function Calculator: @code{mfcalc}
@cindex multi-function calculator
@cindex @code{mfcalc}
@cindex calculator, multi-function

Now that the basics of Bison have been discussed, it is time to move on to
a more advanced problem.  The above calculators provided only five
functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}.  It would
be nice to have a calculator that provides other mathematical functions such
as @code{sin}, @code{cos}, etc.

It is easy to add new operators to the infix calculator as long as they are
only single-character literals.  The lexical analyzer @code{yylex} passes
back all nonnumeric characters as tokens, so new grammar rules suffice for
adding a new operator.  But we want something more flexible: built-in
functions whose syntax has this form:

@example
@var{function_name} (@var{argument})
@end example

@noindent
At the same time, we will add memory to the calculator, by allowing you
to create named variables, store values in them, and use them later.
Here is a sample session with the multi-function calculator:

@example
$ @kbd{mfcalc}
@kbd{pi = 3.141592653589}
3.1415926536
@kbd{sin(pi)}
0.0000000000
@kbd{alpha = beta1 = 2.3}
2.3000000000
@kbd{alpha}
2.3000000000
@kbd{ln(alpha)}
0.8329091229
@kbd{exp(ln(beta1))}
2.3000000000
$
@end example

Note that multiple assignment and nested function calls are permitted.

2263
2264@menu
f56274a8
DJ
2265* Mfcalc Declarations:: Bison declarations for multi-function calculator.
2266* Mfcalc Rules:: Grammar rules for the calculator.
2267* Mfcalc Symbol Table:: Symbol table management subroutines.
bfa74976
RS
2268@end menu
2269
f56274a8 2270@node Mfcalc Declarations
bfa74976
RS
2271@subsection Declarations for @code{mfcalc}
2272
2273Here are the C and Bison declarations for the multi-function calculator.
2274
2275@smallexample
18b519c0 2276@group
bfa74976 2277%@{
38a92d50
PE
2278 #include <math.h> /* For math functions, cos(), sin(), etc. */
2279 #include "calc.h" /* Contains definition of `symrec'. */
2280 int yylex (void);
2281 void yyerror (char const *);
bfa74976 2282%@}
18b519c0
AD
2283@end group
2284@group
bfa74976 2285%union @{
38a92d50
PE
2286 double val; /* For returning numbers. */
2287 symrec *tptr; /* For returning symbol-table pointers. */
bfa74976 2288@}
18b519c0 2289@end group
38a92d50
PE
2290%token <val> NUM /* Simple double precision number. */
2291%token <tptr> VAR FNCT /* Variable and Function. */
bfa74976
RS
2292%type <val> exp
2293
18b519c0 2294@group
bfa74976
RS
2295%right '='
2296%left '-' '+'
2297%left '*' '/'
38a92d50
PE
2298%left NEG /* negation--unary minus */
2299%right '^' /* exponentiation */
18b519c0 2300@end group
38a92d50 2301%% /* The grammar follows. */
bfa74976
RS
2302@end smallexample
2303
2304The above grammar introduces only two new features of the Bison language.
2305These features allow semantic values to have various data types
2306(@pxref{Multiple Types, ,More Than One Value Type}).
2307
2308The @code{%union} declaration specifies the entire list of possible types;
2309this is instead of defining @code{YYSTYPE}. The allowable types are now
2310double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
2311the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
2312
2313Since values can now have various types, it is necessary to associate a
2314type with each grammar symbol whose semantic value is used. These symbols
2315are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
2316declarations are augmented with information about their data type (placed
2317between angle brackets).
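
Roughly speaking, the @code{%union} declaration above gives the
semantic value type the shape of a C union with one member per type.
A tag such as @samp{<val>} or @samp{<tptr>} then tells Bison which
member to use for each symbol. The following hand-written sketch only
approximates that idea. It is not the code Bison generates, and the
reduced @code{symrec}, the @code{value_type} union, and the helper
@code{make_num} are hypothetical.

@example
@group
/* Reduced symbol record, for illustration only.  */
typedef struct symrec
@{
  char const *name;
  double var;
@} symrec;
@end group

@group
/* Approximately what the %union declaration describes.  */
typedef union
@{
  double val;    /* Used for NUM and exp (tag <val>).    */
  symrec *tptr;  /* Used for VAR and FNCT (tag <tptr>).  */
@} value_type;
@end group

@group
/* Store a number in the val member, as mfcalc's yylex does
   with yylval.val when it returns NUM.  */
value_type
make_num (double d)
@{
  value_type v;
  v.val = d;
  return v;
@}
@end group
@end example

@noindent
A union holds only one member at a time. Because of the type tags,
Bison selects the matching member whenever a grammar action refers to
a symbol's value, so the actions never name @code{val} or @code{tptr}
explicitly.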

The Bison construct @code{%type} is used for declaring nonterminal
symbols, just as @code{%token} is used for declaring token types. We
have not used @code{%type} before because nonterminal symbols are
normally declared implicitly by the rules that define them. But
@code{exp} must be declared explicitly so we can specify its value type.
@xref{Type Decl, ,Nonterminal Symbols}.

@node Mfcalc Rules
@subsection Grammar Rules for @code{mfcalc}

Here are the grammar rules for the multi-function calculator.
Most of them are copied directly from @code{calc}; three rules,
those which mention @code{VAR} or @code{FNCT}, are new.

@smallexample
@group
input:   /* empty */
        | input line
;
@end group

@group
line:
          '\n'
        | exp '\n'   @{ printf ("\t%.10g\n", $1); @}
        | error '\n' @{ yyerrok;                  @}
;
@end group

@group
exp:      NUM                @{ $$ = $1;                         @}
        | VAR                @{ $$ = $1->value.var;              @}
        | VAR '=' exp        @{ $$ = $3; $1->value.var = $3;     @}
        | FNCT '(' exp ')'   @{ $$ = (*($1->value.fnctptr))($3); @}
        | exp '+' exp        @{ $$ = $1 + $3;                    @}
        | exp '-' exp        @{ $$ = $1 - $3;                    @}
        | exp '*' exp        @{ $$ = $1 * $3;                    @}
        | exp '/' exp        @{ $$ = $1 / $3;                    @}
        | '-' exp  %prec NEG @{ $$ = -$2;                        @}
        | exp '^' exp        @{ $$ = pow ($1, $3);               @}
        | '(' exp ')'        @{ $$ = $2;                         @}
;
@end group
/* End of grammar.  */
%%
@end smallexample

@node Mfcalc Symbol Table
@subsection The @code{mfcalc} Symbol Table
@cindex symbol table example

The multi-function calculator requires a symbol table to keep track of the
names and meanings of variables and functions. This doesn't affect the
grammar rules (except for the actions) or the Bison declarations, but it
requires some additional C functions for support.

The symbol table itself consists of a linked list of records. Its
definition, which is kept in the header @file{calc.h}, is as follows. It
provides for either functions or variables to be placed in the table.

@smallexample
@group
/* Function type.  */
typedef double (*func_t) (double);
@end group

@group
/* Data type for links in the chain of symbols.  */
struct symrec
@{
  char *name;  /* name of symbol */
  int type;    /* type of symbol: either VAR or FNCT */
  union
  @{
    double var;      /* value of a VAR */
    func_t fnctptr;  /* value of a FNCT */
  @} value;
  struct symrec *next;  /* link field */
@};
@end group

@group
typedef struct symrec symrec;

/* The symbol table: a chain of `struct symrec'.  */
extern symrec *sym_table;

symrec *putsym (char const *, int);
symrec *getsym (char const *);
@end group
@end smallexample

The new version of @code{main} calls @code{init_table}, a function
that initializes the symbol table. Here they are, along with
@code{yyerror}:

@smallexample
#include <stdio.h>

@group
/* Called by yyparse on error.  */
void
yyerror (char const *s)
@{
  printf ("%s\n", s);
@}
@end group

@group
struct init
@{
  char const *fname;
  double (*fnct) (double);
@};
@end group

@group
struct init const arith_fncts[] =
@{
  @{ "sin",  sin  @},
  @{ "cos",  cos  @},
  @{ "atan", atan @},
  @{ "ln",   log  @},
  @{ "exp",  exp  @},
  @{ "sqrt", sqrt @},
  @{ 0, 0 @}
@};
@end group

@group
/* The symbol table: a chain of `struct symrec'.  */
symrec *sym_table;
@end group

@group
/* Put arithmetic functions in table.  */
void
init_table (void)
@{
  int i;
  for (i = 0; arith_fncts[i].fname != 0; i++)
    @{
      symrec *ptr = putsym (arith_fncts[i].fname, FNCT);
      ptr->value.fnctptr = arith_fncts[i].fnct;
    @}
@}
@end group

@group
int
main (void)
@{
  init_table ();
  return yyparse ();
@}
@end group
@end smallexample

By simply editing the initialization list and adding the necessary include
files, you can add additional functions to the calculator.
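
For instance, any function of type @code{double (*) (double)} can be
listed, not only those from @file{math.h}. The following sketch is
self-contained, so it repeats @code{struct init}. The user-defined
@code{square} function and the lookup helper @code{find_fnct} are
hypothetical additions. As in @code{init_table}, the list ends with a
null @code{fname}.

@example
@group
#include <math.h>
#include <string.h>

typedef double (*func_t) (double);
@end group

@group
/* A user-defined function, listed next to math.h ones.  */
static double
square (double x)
@{
  return x * x;
@}
@end group

@group
struct init
@{
  char const *fname;
  func_t fnct;
@};

struct init const arith_fncts[] =
@{
  @{ "sin",    sin    @},
  @{ "tan",    tan    @},
  @{ "square", square @},
  @{ 0, 0 @}
@};
@end group

@group
/* Return the function named FNAME, or 0 if it is not listed.  */
func_t
find_fnct (char const *fname)
@{
  int i;
  for (i = 0; arith_fncts[i].fname != 0; i++)
    if (strcmp (arith_fncts[i].fname, fname) == 0)
      return arith_fncts[i].fnct;
  return 0;
@}
@end group
@end example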

Two important functions allow look-up and installation of symbols in the
symbol table. The function @code{putsym} is passed a name and the type
(@code{VAR} or @code{FNCT}) of the object to be installed. The object is
linked to the front of the list, and a pointer to the object is returned.
The function @code{getsym} is passed the name of the symbol to look up. If
found, a pointer to that symbol is returned; otherwise zero is returned.

@smallexample
#include <stdlib.h> /* malloc.  */
#include <string.h> /* strlen, strcpy, strcmp.  */

@group
symrec *
putsym (char const *sym_name, int sym_type)
@{
  symrec *ptr = (symrec *) malloc (sizeof (symrec));
  ptr->name = (char *) malloc (strlen (sym_name) + 1);
  strcpy (ptr->name, sym_name);
  ptr->type = sym_type;
  ptr->value.var = 0; /* Set value to 0 even if fctn.  */
  ptr->next = sym_table;
  sym_table = ptr;
  return ptr;
@}
@end group

@group
symrec *
getsym (char const *sym_name)
@{
  symrec *ptr;
  for (ptr = sym_table; ptr != 0; ptr = ptr->next)
    if (strcmp (ptr->name, sym_name) == 0)
      return ptr;
  return 0;
@}
@end group
@end smallexample

The function @code{yylex} must now recognize variables, numeric values, and
the single-character arithmetic operators. Strings of alphanumeric
characters with a leading letter are recognized as either variables or
functions depending on what the symbol table says about them.

The string is passed to @code{getsym} for lookup in the symbol table. If
the name appears in the table, a pointer to its location and its type
(@code{VAR} or @code{FNCT}) are returned to @code{yyparse}. If it is not
already in the table, then it is installed as a @code{VAR} using
@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) are
returned to @code{yyparse}.

No change is needed in the handling of numeric values and arithmetic
operators in @code{yylex}.

@smallexample
@group
#include <ctype.h>
@end group

@group
int
yylex (void)
@{
  int c;

  /* Ignore white space, get first nonwhite character.  */
  while ((c = getchar ()) == ' ' || c == '\t')
    continue;

  if (c == EOF)
    return 0;
@end group

@group
  /* Char starts a number => parse the number.  */
  if (c == '.' || isdigit (c))
    @{
      ungetc (c, stdin);
      scanf ("%lf", &yylval.val);
      return NUM;
    @}
@end group

@group
  /* Char starts an identifier => read the name.  */
  if (isalpha (c))
    @{
      /* Initially make the buffer long enough
         for a 40-character symbol name.  */
      static size_t length = 40;
      static char *symbuf = 0;
      symrec *s;
      size_t i;
@end group

@group
      if (!symbuf)
        symbuf = (char *) malloc (length + 1);

      i = 0;
      do
@end group
@group
        @{
          /* If buffer is full, make it bigger.  */
          if (i == length)
            @{
              length *= 2;
              symbuf = (char *) realloc (symbuf, length + 1);
            @}
          /* Add this character to the buffer.  */
          symbuf[i++] = c;
          /* Get another character.  */
          c = getchar ();
        @}
@end group
@group
      while (isalnum (c));

      ungetc (c, stdin);
      symbuf[i] = '\0';
@end group

@group
      s = getsym (symbuf);
      if (s == 0)
        s = putsym (symbuf, VAR);
      yylval.tptr = s;
      return s->type;
    @}

  /* Any other character is a token by itself.  */
  return c;
@}
@end group
@end smallexample

This program is easy to extend. You may add new functions by editing
the initialization list, and it is a simple job to modify this code to
install predefined variables such as @code{pi} or @code{e} as well.
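
As a sketch of the second idea, the following code predefines
@code{pi} and @code{e}. It assumes only that a variable is an entry
with a name and a @code{double} value. To stand alone, it uses a
reduced, variables-only record and a hypothetical @code{putvar}
function. In @code{mfcalc} itself you would call
@code{putsym (name, VAR)} and set @code{value.var} instead. Standard C
does not define @code{M_PI}, so the values are computed with
@code{atan} and @code{exp}.

@example
@group
#include <math.h>
#include <stdlib.h>
#include <string.h>
@end group

@group
/* Reduced version of the calc.h record: variables only.  */
typedef struct symrec
@{
  char *name;
  double var;
  struct symrec *next;
@} symrec;

symrec *sym_table;
@end group

@group
/* Install NAME with VALUE at the front of the table.  */
symrec *
putvar (char const *name, double value)
@{
  symrec *ptr = (symrec *) malloc (sizeof (symrec));
  ptr->name = (char *) malloc (strlen (name) + 1);
  strcpy (ptr->name, name);
  ptr->var = value;
  ptr->next = sym_table;
  sym_table = ptr;
  return ptr;
@}
@end group

@group
/* Predefine pi and e.  */
void
init_constants (void)
@{
  putvar ("pi", 4 * atan (1.0));
  putvar ("e", exp (1.0));
@}
@end group
@end example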

@node Exercises
@section Exercises
@cindex exercises

@enumerate
@item
Add some new functions from @file{math.h} to the initialization list.

@item
Add another array that contains constants and their values. Then
modify @code{init_table} to add these constants to the symbol table.
It will be easiest to give the constants type @code{VAR}.

@item
Make the program report an error if the user refers to an
uninitialized variable in any way except to store a value in it.
@end enumerate

@node Grammar File
@chapter Bison Grammar Files

Bison takes as input a context-free grammar specification and produces a
C-language function that recognizes correct instances of the grammar.

The Bison grammar file conventionally has a name ending in @samp{.y}.
@xref{Invocation, ,Invoking Bison}.

@menu
* Grammar Outline::    Overall layout of the grammar file.
* Symbols::            Terminal and nonterminal symbols.
* Rules::              How to write grammar rules.
* Recursion::          Writing recursive rules.
* Semantics::          Semantic values and actions.
* Tracking Locations:: Locations and actions.
* Named References::   Using named references in actions.
* Declarations::       All kinds of Bison declarations are described here.
* Multiple Parsers::   Putting more than one Bison parser in one program.
@end menu

@node Grammar Outline
@section Outline of a Bison Grammar

A Bison grammar file has four main sections, shown here with the
appropriate delimiters:

@example
%@{
  @var{Prologue}
%@}

@var{Bison declarations}

%%
@var{Grammar rules}
%%

@var{Epilogue}
@end example

Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
As a GNU extension, @samp{//} introduces a comment that
continues until end of line.

@menu
* Prologue::              Syntax and usage of the prologue.
* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
* Bison Declarations::    Syntax and usage of the Bison declarations section.
* Grammar Rules::         Syntax and usage of the grammar rules section.
* Epilogue::              Syntax and usage of the epilogue.
@end menu

@node Prologue
@subsection The prologue
@cindex declarations section
@cindex Prologue
@cindex declarations

The @var{Prologue} section contains macro definitions and declarations
of functions and variables that are used in the actions in the grammar
rules. These are copied to the beginning of the parser implementation
file so that they precede the definition of @code{yyparse}. You can
use @samp{#include} to get the declarations from a header file. If
you don't need any C declarations, you may omit the @samp{%@{} and
@samp{%@}} delimiters that bracket this section.

The @var{Prologue} section is terminated by the first occurrence
of @samp{%@}} that is outside a comment, a string literal, or a
character constant.

You may have more than one @var{Prologue} section, intermixed with the
@var{Bison declarations}. This allows you to have C and Bison
declarations that refer to each other. For example, the @code{%union}
declaration may use types defined in a header file, and you may wish to
prototype functions that take arguments of type @code{YYSTYPE}. This
can be done with two @var{Prologue} blocks, one before and one after the
@code{%union} declaration.

@smallexample
%@{
  #define _GNU_SOURCE
  #include <stdio.h>
  #include "ptypes.h"
%@}

%union @{
  long int n;
  tree t;  /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}

%@{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(F, N, L) print_token_value (F, N, L)
%@}

@dots{}
@end smallexample

When in doubt, it is usually safer to put prologue code before all
Bison declarations, rather than after. For example, any definitions
of feature test macros like @code{_GNU_SOURCE} or
@code{_POSIX_C_SOURCE} should appear before all Bison declarations, as
feature test macros can affect the behavior of Bison-generated
@code{#include} directives.
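
The reason is that a feature test macro only takes effect if it is
defined before the first system header is included. As a minimal
sketch, with @code{strdup} chosen only as an example of a POSIX
function, the following file requests POSIX.1-2008 declarations before
including anything. In strict ISO C mode, moving the @code{#define}
below the @code{#include} lines could leave @code{strdup} undeclared.
The function @code{copy_name} is hypothetical.

@example
@group
/* Define this before any #include; in a grammar file,
   it belongs in the first Prologue.  */
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
@end group

@group
/* Return a freshly allocated copy of NAME; the caller frees it.  */
char *
copy_name (char const *name)
@{
  return strdup (name);
@}
@end group
@end example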

@node Prologue Alternatives
@subsection Prologue Alternatives
@cindex Prologue Alternatives

@findex %code
@findex %code requires
@findex %code provides
@findex %code top

The functionality of @var{Prologue} sections can often be subtle and
inflexible. As an alternative, Bison provides a @code{%code}
directive with an explicit qualifier field, which identifies the
purpose of the code and thus the location(s) where Bison should
generate it. For C/C++, the qualifier can be omitted for the default
location, or it can be one of @code{requires}, @code{provides}, or
@code{top}. @xref{%code Summary}.

Look again at the example of the previous section:

@smallexample
%@{
  #define _GNU_SOURCE
  #include <stdio.h>
  #include "ptypes.h"
%@}

%union @{
  long int n;
  tree t;  /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}

%@{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(F, N, L) print_token_value (F, N, L)
%@}

@dots{}
@end smallexample

@noindent
Notice that there are two @var{Prologue} sections here, but there's a
subtle distinction between their functionality. For example, if you
decide to override Bison's default definition for @code{YYLTYPE}, in
which @var{Prologue} section should you write your new definition?
You should write it in the first, since Bison will insert that code
into the parser implementation file @emph{before} the default
@code{YYLTYPE} definition. In which @var{Prologue} section should you
prototype an internal function, @code{trace_token}, that accepts
@code{YYLTYPE} and @code{yytokentype} as arguments? You should
prototype it in the second, since Bison will insert that code
@emph{after} the @code{YYLTYPE} and @code{yytokentype} definitions.

This distinction in functionality between the two @var{Prologue} sections is
established by the appearance of the @code{%union} between them.
This behavior raises a few questions.
First, why should the position of a @code{%union} affect definitions related to
@code{YYLTYPE} and @code{yytokentype}?
Second, what if there is no @code{%union}?
In that case, the second kind of @var{Prologue} section is not available.
This behavior is not intuitive.

To avoid this subtle @code{%union} dependency, rewrite the example using a
@code{%code top} and an unqualified @code{%code}.
Let's go ahead and add the new @code{YYLTYPE} definition and the
@code{trace_token} prototype at the same time:

@smallexample
%code top @{
  #define _GNU_SOURCE
  #include <stdio.h>

  /* WARNING: The following code really belongs
   * in a `%code requires'; see below.  */

  #include "ptypes.h"
  #define YYLTYPE YYLTYPE
  typedef struct YYLTYPE
  @{
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *filename;
  @} YYLTYPE;
@}

%union @{
  long int n;
  tree t;  /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}

%code @{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(F, N, L) print_token_value (F, N, L)
  static void trace_token (enum yytokentype token, YYLTYPE loc);
@}

@dots{}
@end smallexample

@noindent
In this way, @code{%code top} and the unqualified @code{%code} achieve the same
functionality as the two kinds of @var{Prologue} sections, but it's always
explicit which kind you intend.
Moreover, both kinds are always available even in the absence of @code{%union}.

The @code{%code top} block above logically contains two parts. The
first two lines before the warning need to appear near the top of the
parser implementation file. The first line after the warning is
required by @code{YYSTYPE} and thus also needs to appear in the parser
implementation file. However, if you've instructed Bison to generate
a parser header file (@pxref{Decl Summary, ,%defines}), you probably
want that line to appear before the @code{YYSTYPE} definition in that
header file as well. The @code{YYLTYPE} definition should also appear
in the parser header file to override the default @code{YYLTYPE}
definition there.

In other words, in the @code{%code top} block above, all but the first two
lines are dependency code required by the @code{YYSTYPE} and @code{YYLTYPE}
definitions.
Thus, they belong in one or more @code{%code requires}:

@smallexample
@group
%code top @{
  #define _GNU_SOURCE
  #include <stdio.h>
@}
@end group

@group
%code requires @{
  #include "ptypes.h"
@}
@end group
@group
%union @{
  long int n;
  tree t;  /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
@end group

@group
%code requires @{
  #define YYLTYPE YYLTYPE
  typedef struct YYLTYPE
  @{
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *filename;
  @} YYLTYPE;
@}
@end group

@group
%code @{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(F, N, L) print_token_value (F, N, L)
  static void trace_token (enum yytokentype token, YYLTYPE loc);
@}
@end group

@dots{}
@end smallexample

@noindent
Now Bison will insert @code{#include "ptypes.h"} and the new
@code{YYLTYPE} definition before the Bison-generated @code{YYSTYPE}
and @code{YYLTYPE} definitions in both the parser implementation file
and the parser header file. (By the same reasoning, @code{%code
requires} would also be the appropriate place to write your own
definition for @code{YYSTYPE}.)

When you are writing dependency code for @code{YYSTYPE} and
@code{YYLTYPE}, you should prefer @code{%code requires} over
@code{%code top} regardless of whether you instruct Bison to generate
a parser header file. When you are writing code that you need Bison
to insert only into the parser implementation file and that has no
special need to appear at the top of that file, you should prefer the
unqualified @code{%code} over @code{%code top}. These practices will
make the purpose of each block of your code explicit to Bison and to
other developers reading your grammar file. Following these
practices, we expect the unqualified @code{%code} and @code{%code
requires} to be the most important of the four @var{Prologue}
alternatives.

At some point while developing your parser, you might decide to
provide @code{trace_token} to modules that are external to your
parser. Thus, you might wish for Bison to insert the prototype into
both the parser header file and the parser implementation file. Since
this function is not a dependency required by @code{YYSTYPE} or
@code{YYLTYPE}, it doesn't make sense to move its prototype to a
@code{%code requires}. More importantly, since it depends upon
@code{YYLTYPE} and @code{yytokentype}, @code{%code requires} is not
sufficient. Instead, move its prototype from the unqualified
@code{%code} to a @code{%code provides}:

@smallexample
@group
%code top @{
  #define _GNU_SOURCE
  #include <stdio.h>
@}
@end group

@group
%code requires @{
  #include "ptypes.h"
@}
@end group
@group
%union @{
  long int n;
  tree t;  /* @r{@code{tree} is defined in @file{ptypes.h}.} */
@}
@end group

@group
%code requires @{
  #define YYLTYPE YYLTYPE
  typedef struct YYLTYPE
  @{
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char *filename;
  @} YYLTYPE;
@}
@end group

@group
%code provides @{
  void trace_token (enum yytokentype token, YYLTYPE loc);
@}
@end group

@group
%code @{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(F, N, L) print_token_value (F, N, L)
@}
@end group

@dots{}
@end smallexample

@noindent
Bison will insert the @code{trace_token} prototype into both the
parser header file and the parser implementation file after the
definitions for @code{yytokentype}, @code{YYLTYPE}, and
@code{YYSTYPE}.

The above examples are careful to write directives in an order that
reflects the layout of the generated parser implementation and header
files: @code{%code top}, @code{%code requires}, @code{%code provides},
and then @code{%code}. While your grammar files may generally be
easier to read if you also follow this order, Bison does not require
it. Instead, Bison lets you choose an organization that makes sense
to you.

You may declare any of these directives multiple times in the grammar file.
In that case, Bison concatenates the contained code in declaration order.
This is the only way in which the position of one of these directives within
the grammar file affects its functionality.

Together, these two properties give you greater flexibility in how you
organize your grammar file.
For example, you may organize semantic-type-related directives by semantic
type:

@smallexample
@group
%code requires @{ #include "type1.h" @}
%union @{ type1 field1; @}
%destructor @{ type1_free ($$); @} <field1>
%printer @{ type1_print ($$); @} <field1>
@end group

@group
%code requires @{ #include "type2.h" @}
%union @{ type2 field2; @}
%destructor @{ type2_free ($$); @} <field2>
%printer @{ type2_print ($$); @} <field2>
@end group
@end smallexample

@noindent
You could even place each of the above directive groups in the rules section of
the grammar file next to the set of rules that uses the associated semantic
type.
(In the rules section, you must terminate each of those directives with a
semicolon.)
And you don't have to worry that some directive (like a @code{%union}) in the
definitions section is going to adversely affect their functionality in some
counter-intuitive manner just because it comes first.
Such an organization is not possible using @var{Prologue} sections.

This section has been concerned with explaining the advantages of the four
@var{Prologue} alternatives over the original Yacc @var{Prologue}.
However, in most cases when using these directives, you shouldn't need to
think about all the low-level ordering issues discussed here.
Instead, you should simply use these directives to label each block of your
code according to its purpose and let Bison handle the ordering.
@code{%code} is the most generic label.
Move code to @code{%code requires}, @code{%code provides}, or @code{%code top}
as needed.

@node Bison Declarations
@subsection The Bison Declarations Section
@cindex Bison declarations (introduction)
@cindex declarations, Bison (introduction)

The @var{Bison declarations} section contains declarations that define
terminal and nonterminal symbols, specify precedence, and so on.
In some simple grammars you may not need any declarations.
@xref{Declarations, ,Bison Declarations}.

@node Grammar Rules
@subsection The Grammar Rules Section
@cindex grammar rules section
@cindex rules section for grammar

The @dfn{grammar rules} section contains one or more Bison grammar
rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.

There must always be at least one grammar rule, and the first
@samp{%%} (which precedes the grammar rules) may never be omitted even
if it is the first thing in the file.

@node Epilogue
@subsection The epilogue
@cindex additional C code section
@cindex epilogue
@cindex C code, section for additional

The @var{Epilogue} is copied verbatim to the end of the parser
implementation file, just as the @var{Prologue} is copied to the
beginning. This is the most convenient place to put anything that you
want to have in the parser implementation file but which need not come
before the definition of @code{yyparse}. For example, the definitions
of @code{yylex} and @code{yyerror} often go here. Because C requires
functions to be declared before being used, you often need to declare
functions like @code{yylex} and @code{yyerror} in the Prologue, even
if you define them in the Epilogue. @xref{Interface, ,Parser
C-Language Interface}.
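
The following sketch illustrates that ordering in plain C. The
function @code{parse_one} is a hypothetical stand-in for
@code{yyparse}. It calls @code{yylex} and @code{yyerror} before their
definitions, which come last as they would in the Epilogue, so their
prototypes must appear first, as they would in the Prologue. The
one-character input and the expected token @code{'x'} are assumptions
made for illustration.

@example
@group
#include <stdio.h>

/* "Prologue": declarations only.  */
int yylex (void);
void yyerror (char const *);
@end group

@group
/* Stand-in for yyparse: it uses yylex and yyerror
   before their definitions below.  */
int
parse_one (void)
@{
  int token = yylex ();
  if (token != 'x')
    @{
      yyerror ("syntax error");
      return 1;
    @}
  return 0;
@}
@end group

@group
/* "Epilogue": the definitions.  */
static char const *input = "x";

int
yylex (void)
@{
  return *input ? *input++ : 0;
@}

void
yyerror (char const *s)
@{
  fprintf (stderr, "%s\n", s);
@}
@end group
@end example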

If the last section is empty, you may omit the @samp{%%} that separates it
from the grammar rules.

The Bison parser itself contains many macros and identifiers whose names
start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
any such names (except those documented in this manual) in the epilogue
of the grammar file.

@node Symbols
@section Symbols, Terminal and Nonterminal
@cindex nonterminal symbol
@cindex terminal symbol
@cindex token type
@cindex symbol

@dfn{Symbols} in Bison grammars represent the grammatical classifications
of the language.

A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the @code{yylex}
function returns a token type code to indicate what kind of token has
been read. You don't need to know what the code value is; you can use
the symbol to stand for it.
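
For instance, a hand-written scanner might return the character code
itself for a single-character token and a named code for a
multi-character one. In the following sketch the value 258 for
@code{NUM} is an assumption. Bison numbers named tokens from 258
upward by default, but a real scanner should use the definitions Bison
generates instead of hard-coding them. The function @code{next_token}
is hypothetical; a real @code{yylex} takes no arguments and stores the
semantic value in @code{yylval}.

@example
@group
#include <ctype.h>

/* Hypothetical code for the named token NUM; a real program
   should use the definition Bison generates instead.  */
#define NUM 258
@end group

@group
/* Return the token type code of the token starting at *P and
   advance *P past it.  Return 0 at the end of the input.  */
int
next_token (char const **p)
@{
  while (**p == ' ')
    (*p)++;
  if (**p == '\0')
    return 0;
  if (isdigit ((unsigned char) **p))
    @{
      while (isdigit ((unsigned char) **p))
        (*p)++;
      return NUM;        /* A named token type.  */
    @}
  return *(*p)++;        /* The character is its own code.  */
@}
@end group
@end example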
bfa74976 3119
f8e1c9e5
AD
3120A @dfn{nonterminal symbol} stands for a class of syntactically
3121equivalent groupings. The symbol name is used in writing grammar rules.
3122By convention, it should be all lower case.
bfa74976 3123
eb8c66bb
JD
3124Symbol names can contain letters, underscores, periods, and non-initial
3125digits and dashes. Dashes in symbol names are a GNU extension, incompatible
3126with POSIX Yacc. Periods and dashes make symbol names less convenient to
3127use with named references, which require brackets around such names
3128(@pxref{Named References}). Terminal symbols that contain periods or dashes
3129make little sense: since they are not valid symbols (in most programming
3130languages) they are not exported as token names.

There are three ways of writing terminal symbols in the grammar:

@itemize @bullet
@item
A @dfn{named token type} is written with an identifier, like an
identifier in C@. By convention, it should be all upper case. Each
such name must be defined with a Bison declaration such as
@code{%token}. @xref{Token Decl, ,Token Type Names}.

@item
@cindex character token
@cindex literal token
@cindex single-character literal
A @dfn{character token type} (or @dfn{literal character token}) is
written in the grammar using the same syntax used in C for character
constants; for example, @code{'+'} is a character token type. A
character token type doesn't need to be declared unless you need to
specify its semantic value data type (@pxref{Value Type, ,Data Types of
Semantic Values}), associativity, or precedence (@pxref{Precedence,
,Operator Precedence}).

By convention, a character token type is used only to represent a
token that consists of that particular character. Thus, the token
type @code{'+'} is used to represent the character @samp{+} as a
token. Nothing enforces this convention, but if you depart from it,
your program will confuse other readers.

All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
character literal because its numeric code, zero, signifies
end-of-input (@pxref{Calling Convention, ,Calling Convention
for @code{yylex}}). Also, unlike Standard C, trigraphs have no
special meaning in Bison character literals, nor is backslash-newline
allowed.

@item
@cindex string token
@cindex literal string token
@cindex multicharacter literal
A @dfn{literal string token} is written like a C string constant; for
example, @code{"<="} is a literal string token. A literal string token
doesn't need to be declared unless you need to specify its semantic
value data type (@pxref{Value Type}), associativity, or precedence
(@pxref{Precedence}).

You can associate the literal string token with a symbolic name as an
alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
Type Names}). If you don't do that, the lexical analyzer has to
retrieve the token number for the literal string token from the
@code{yytname} table (@pxref{Calling Convention}).

@strong{Warning}: literal string tokens do not work in Yacc.

By convention, a literal string token is used only to represent a token
that consists of that particular string. Thus, you should use the token
type @code{"<="} to represent the string @samp{<=} as a token. Bison
does not enforce this convention, but if you depart from it, people who
read your program will be confused.

All the escape sequences used in string literals in C can be used in
Bison as well, except that you must not use a null character within a
string literal. Also, unlike Standard C, trigraphs have no special
meaning in Bison string literals, nor is backslash-newline allowed. A
literal string token must contain two or more characters; for a token
containing just one character, use a character token (see above).
@end itemize
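
The @code{yytname} lookup mentioned above can be sketched as a linear
search. In this hypothetical model, @code{mock_yytname} and
@code{mock_token_codes} stand in for the tables Bison generates
(@pxref{Calling Convention}), and their contents are made up; the one
property taken from Bison is that the table entry for a literal string
token includes the surrounding double quotes. The function name
@code{lookup_string_token} is also hypothetical.

@example
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the Bison-generated yytname table.
   Literal string tokens appear with their double quotes.  */
static const char *const mock_yytname[] =
  @{ "$end", "error", "$undefined", "NUM", "\"<=\"", "\">=\"", NULL @};

/* Hypothetical token codes, parallel to mock_yytname.  */
static const int mock_token_codes[] = @{ 0, 256, 257, 258, 259, 260 @};

/* Return the token code for the literal string TEXT (given without
   quotes), or -1 if no literal string token matches.  */
static int
lookup_string_token (const char *text)
@{
  size_t len = strlen (text);
  for (size_t i = 0; mock_yytname[i] != NULL; i++)
    @{
      const char *name = mock_yytname[i];
      /* Match '"', then TEXT, then a closing '"' that ends the entry.  */
      if (name[0] == '"'
          && strncmp (name + 1, text, len) == 0
          && name[len + 1] == '"'
          && name[len + 2] == '\0')
        return mock_token_codes[i];
    @}
  return -1;
@}
@end example

@noindent
Under these made-up codes, @code{lookup_string_token ("<=")} returns
259, while a string that is not a literal token, such as @code{"<"},
yields -1.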

How you choose to write a terminal symbol has no effect on its
grammatical meaning. That depends only on where it appears in rules and
on when the lexical analyzer function (@code{yylex}) returns that symbol.

The value returned by @code{yylex} is always one of the terminal
symbols, except that a zero or negative value signifies end-of-input.
Whichever way you write the token type in the grammar rules, you write
it the same way in the definition of @code{yylex}. The numeric code
for a character token type is simply the positive numeric code of the
character, so @code{yylex} can use the identical value to generate the
requisite code, though you may need to convert it to @code{unsigned
char} to avoid sign-extension on hosts where @code{char} is signed.
Each named token type becomes a C macro in the parser implementation
file, so @code{yylex} can use the name to stand for the code. (This
is why periods don't make sense in terminal symbols.) @xref{Calling
Convention, ,Calling Convention for @code{yylex}}.
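
A hand-written @code{yylex} following these rules might look like the C
sketch below. The token @code{NUM} and its code 258 are placeholders for
the macro Bison would generate from a @code{%token NUM} declaration, and
the input is read from a hypothetical string pointer @code{input} rather
than from a file.

@example
#include <ctype.h>

/* Placeholder for the macro Bison would generate for %token NUM.  */
#define NUM 258

/* Hypothetical input source for this sketch.  */
static const char *input = "";

int
yylex (void)
@{
  /* Skip blanks.  */
  while (*input == ' ' || *input == '\t')
    input++;

  /* End of input is reported as 0.  */
  if (*input == '\0')
    return 0;

  /* A run of digits is one NUM token, named by its macro.  */
  if (isdigit ((unsigned char) *input))
    @{
      while (isdigit ((unsigned char) *input))
        input++;
      return NUM;
    @}

  /* Any other character is its own character token.  Converting
     through unsigned char keeps the code positive even where char
     is signed.  */
  return (unsigned char) *input++;
@}
@end example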

If @code{yylex} is defined in a separate file, you need to arrange for the
token-type macro definitions to be available there. Use the @samp{-d}
option when you run Bison, so that it will write these macro definitions
into a separate header file @file{@var{name}.tab.h} which you can include
in the other source files that need it. @xref{Invocation, ,Invoking Bison}.

If you want to write a grammar that is portable to any Standard C
host, you must use only nonnull character tokens taken from the basic
execution character set of Standard C@. This set consists of the ten
digits, the 52 lower- and upper-case English letters, and the
characters in the following C-language string:

@example
"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~"
@end example
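
To check a grammar's character tokens against this set mechanically, a
small C helper like the following could be used. It is a sketch, not
part of Bison: the name @code{is_portable_char_token} is hypothetical,
and the function simply tests membership in the ten digits, the 52
English letters, and the string shown above. It avoids @code{isalnum}
because that function is locale-dependent and may accept additional
letters.

@example
#include <string.h>

/* Return nonzero if C belongs to the basic execution character set
   of Standard C, excluding the null character.  */
static int
is_portable_char_token (char c)
@{
  static const char basic[] =
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~";
  /* strchr would also find the terminating null, so reject it first.  */
  return c != '\0' && strchr (basic, c) != NULL;
@}
@end example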

The @code{yylex} function and Bison must use a consistent character set
and encoding for character tokens. For example, if you run Bison in an
ASCII environment, but then compile and run the resulting
program in an environment that uses an incompatible character set like
EBCDIC, the resulting program may not work because the tables
generated by Bison will assume ASCII numeric values for
character tokens. It is standard practice for software distributions to
contain C source files that were generated by Bison in an
ASCII environment, so installers on platforms that are
incompatible with ASCII must rebuild those files before
compiling them.

The symbol @code{error} is a terminal symbol reserved for error recovery
(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
In particular, @code{yylex} should never return this value. The default
value of the error token is 256, unless you explicitly assign 256 to
one of your tokens with a @code{%token} declaration.

@node Rules
@section Syntax of Grammar Rules
@cindex rule syntax
@cindex grammar rule syntax
@cindex syntax of grammar rules

A Bison grammar rule has the following general form:

@example
@group
@var{result}: @var{components}@dots{}
        ;
@end group
@end example

@noindent
where @var{result} is the nonterminal symbol that this rule describes,
and @var{components} are various terminal and nonterminal symbols that
are put together by this rule (@pxref{Symbols}).

For example,

@example
@group
exp: exp '+' exp
        ;
@end group
@end example

@noindent
says that two groupings of type @code{exp}, with a @samp{+} token in between,
can be combined into a larger grouping of type @code{exp}.

White space in rules is significant only to separate symbols. You can add
extra white space as you wish.

Scattered among the components can be @var{actions} that determine
the semantics of the rule. An action looks like this:

@example
@{@var{C statements}@}
@end example

@noindent
@cindex braced code
This is an example of @dfn{braced code}, that is, C code surrounded by
braces, much like a compound statement in C@. Braced code can contain
any sequence of C tokens, so long as its braces are balanced. Bison
does not check the braced code for correctness directly; it merely
copies the code to the parser implementation file, where the C
compiler can check it.

Within braced code, the balanced-brace count is not affected by braces
within comments, string literals, or character constants, but it is
affected by the C digraphs @samp{<%} and @samp{%>} that represent
braces. At the top level braced code must be terminated by @samp{@}}
and not by a digraph. Bison does not look for trigraphs, so if braced
code uses trigraphs you should ensure that they do not affect the
nesting of braces or the boundaries of comments, string literals, or
character constants.
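
The counting rules above can be modelled by a small scanner. The C
function below is a simplified, hypothetical sketch (the name
@code{brace_depth} is not part of Bison): it skips comments, string
literals, and character constants, counts @samp{<%} and @samp{%>} as
braces, and returns the final nesting depth, so balanced braced code
yields 0. Like Bison, it ignores trigraphs; it performs none of the
other checks a C compiler makes.

@example
/* Return the nesting depth after scanning CODE.  A negative result
   means a closing brace appeared before its opening brace.  */
static int
brace_depth (const char *code)
@{
  int depth = 0;
  const char *p = code;
  while (*p)
    @{
      if (p[0] == '/' && p[1] == '*')
        @{
          /* Skip a block comment.  */
          p += 2;
          while (*p && !(p[0] == '*' && p[1] == '/'))
            p++;
          if (*p)
            p += 2;
        @}
      else if (p[0] == '/' && p[1] == '/')
        @{
          /* Skip a line comment.  */
          while (*p && *p != '\n')
            p++;
        @}
      else if (*p == '"' || *p == '\'')
        @{
          /* Skip a string literal or character constant,
             honoring backslash escapes.  */
          char quote = *p++;
          while (*p && *p != quote)
            @{
              if (*p == '\\' && p[1])
                p++;
              p++;
            @}
          if (*p)
            p++;
        @}
      else if (*p == '@{' || (p[0] == '<' && p[1] == '%'))
        @{
          depth++;                     /* '@{' or the digraph <% */
          p += (*p == '@{') ? 1 : 2;
        @}
      else if (*p == '@}' || (p[0] == '%' && p[1] == '>'))
        @{
          depth--;                     /* '@}' or the digraph %> */
          p += (*p == '@}') ? 1 : 2;
        @}
      else
        p++;
      if (depth < 0)
        return depth;
    @}
  return depth;
@}
@end example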

Usually there is only one action and it follows the components.
@xref{Actions}.

@findex |
Multiple rules for the same @var{result} can be written separately or can
be joined with the vertical-bar character @samp{|} as follows:

@example
@group
@var{result}: @var{rule1-components}@dots{}
        | @var{rule2-components}@dots{}
        @dots{}
        ;
@end group
@end example

@noindent
They are still considered distinct rules even when joined in this way.

If @var{components} in a rule is empty, it means that @var{result} can
match the empty string. For example, here is how to define a
comma-separated sequence of zero or more @code{exp} groupings:

@example
@group
expseq: /* empty */
        | expseq1
        ;
@end group

@group
expseq1: exp
        | expseq1 ',' exp
        ;
@end group
@end example

@noindent
It is customary to write a comment @samp{/* empty */} in each rule
with no components.

@node Recursion
@section Recursive Rules
@cindex recursive rule

A rule is called @dfn{recursive} when its @var{result} nonterminal
also appears on its right hand side. Nearly all Bison grammars need to
use recursion, because that is the only way to define a sequence of any
number of a particular thing. Consider this recursive definition of a
comma-separated sequence of one or more expressions:

@example
@group
expseq1: exp
        | expseq1 ',' exp
        ;
@end group
@end example

@cindex left recursion
@cindex right recursion
@noindent
Since the recursive use of @code{expseq1} is the leftmost symbol in the
right hand side, we call this @dfn{left recursion}. By contrast, here
the same construct is defined using @dfn{right recursion}:

@example
@group
expseq1: exp
        | exp ',' expseq1
        ;
@end group
@end example

@noindent
Any kind of sequence can be defined using either left recursion or right
recursion, but you should always use left recursion, because it can
parse a sequence of any number of elements with bounded stack space.
Right recursion uses up space on the Bison stack in proportion to the
number of elements in the sequence, because all the elements must be
shifted onto the stack before the rule can be applied even once.
@xref{Algorithm, ,The Bison Parser Algorithm}, for further explanation
of this.
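
The difference in stack usage can be illustrated with a small C model
that replays, by hand, the shift and reduce steps a parser would take for
the two definitions of @code{expseq1} above on an input of @var{n}
expressions. This is a simplified sketch, not Bison's algorithm: it
assumes each @code{exp} arrives as a single symbol and counts only
grammar symbols on the stack. The helper names are hypothetical.

@example
/* Record DEPTH in *MAX if it is a new maximum.  */
static void
note_depth (int depth, int *max)
@{
  if (depth > *max)
    *max = depth;
@}

/* expseq1: exp | expseq1 ',' exp
   Each new element is reduced as soon as it is shifted.  */
static int
left_recursive_max_depth (int n)
@{
  int depth = 0, max = 0;
  for (int i = 0; i < n; i++)
    @{
      if (i > 0)
        note_depth (++depth, &max);   /* shift ',' */
      note_depth (++depth, &max);     /* shift exp */
      if (i > 0)
        depth -= 2;   /* reduce expseq1 ',' exp to expseq1 */
      /* For i == 0, reducing exp to expseq1 leaves the depth at 1.  */
    @}
  return max;
@}

/* expseq1: exp | exp ',' expseq1
   Nothing can be reduced until the last element is shifted.  */
static int
right_recursive_max_depth (int n)
@{
  int depth = 0, max = 0;
  for (int i = 0; i < n; i++)
    @{
      if (i > 0)
        note_depth (++depth, &max);   /* shift ',' */
      note_depth (++depth, &max);     /* shift exp */
    @}
  /* Only now: reduce the last exp to expseq1, then each
     exp ',' expseq1 in turn, unwinding the whole stack.  */
  for (int i = 1; i < n; i++)
    depth -= 2;
  return max;
@}
@end example

@noindent
For five elements, the model reports a maximum of 3 stack entries for the
left-recursive rule and 9 for the right-recursive one; the first stays at
3 however long the list is, while the second grows as 2@var{n} - 1.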

@cindex mutual recursion
@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
rule does not appear directly on its right hand side, but does appear
in rules for other nonterminals which do appear on its right hand
side.

For example:

@example
@group
expr: primary
        | primary '+' primary
        ;
@end group

@group
primary: constant
        | '(' expr ')'
        ;
@end group
@end example

@noindent
defines two mutually recursive nonterminals, since each refers to the
other.

@node Semantics
@section Defining Language Semantics
@cindex defining language semantics
@cindex language semantics, defining

The grammar rules for a language determine only the syntax. The semantics
are determined by the semantic values associated with various tokens and
groupings, and by the actions taken when various groupings are recognized.

For example, the calculator calculates properly because the value
associated with each expression is the proper number; it adds properly
because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
the numbers associated with @var{x} and @var{y}.

@menu
* Value Type::        Specifying one data type for all semantic values.
* Multiple Types::    Specifying several alternative data types.
* Actions::           An action is the semantic definition of a grammar rule.
* Action Types::      Specifying data types for actions to operate on.
* Mid-Rule Actions::  Most actions go at the end of a rule.
                      This says when, why and how to use the exceptional
                        action in the middle of a rule.
@end menu

@node Value Type
@subsection Data Types of Semantic Values
@cindex semantic value type
@cindex value type, semantic
@cindex data types of semantic values
@cindex default data type

In a simple program it may be sufficient to use the same data type for
the semantic values of all language constructs. This was true in the
RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish
Notation Calculator}).

Bison normally uses the type @code{int} for semantic values if your
program uses the same data type for all language constructs. To
specify some other type, define @code{YYSTYPE} as a macro, like this:

@example
#define YYSTYPE double
@end example

@noindent
@code{YYSTYPE}'s replacement list should be a type name
that does not contain parentheses or square brackets.
This macro definition must go in the prologue of the grammar file
(@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
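
With @code{YYSTYPE} defined as @code{double}, @code{yylval} has type
@code{double}, and the lexical analyzer stores each number's value there.
In the C sketch below, the token code @code{NUM} (258) is a placeholder
for a Bison-generated macro, @code{lex_input} is a hypothetical input
string, and @code{yylval} is defined only so that the example compiles on
its own; in a real program the generated parser normally provides it.

@example
#include <ctype.h>
#include <stdlib.h>

#define YYSTYPE double
#define NUM 258           /* Placeholder for the Bison-generated code.  */

YYSTYPE yylval;           /* Normally defined by the generated parser.  */
static const char *lex_input = "";   /* Hypothetical input source.  */

int
yylex (void)
@{
  while (*lex_input == ' ')
    lex_input++;
  if (*lex_input == '\0')
    return 0;                         /* End of input.  */
  if (isdigit ((unsigned char) *lex_input))
    @{
      char *end;
      /* The token's semantic value is a double.  */
      yylval = strtod (lex_input, &end);
      lex_input = end;
      return NUM;
    @}
  return (unsigned char) *lex_input++;  /* Character token.  */
@}
@end example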

@node Multiple Types
@subsection More Than One Value Type

In most programs, you will need different data types for different kinds
of tokens and groupings. For example, a numeric constant may need type
@code{int} or @code{long int}, while a string constant needs type
@code{char *}, and an identifier might need a pointer to an entry in the
symbol table.

To use more than one data type for semantic values in one parser, Bison
requires you to do two things:

@itemize @bullet
@item
Specify the entire collection of possible data types, either by using the
@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of
Value Types}), or by using a @code{typedef} or a @code{#define} to
define @code{YYSTYPE} to be a union type whose member names are
the type tags.

@item
Choose one of those types for each symbol (terminal or nonterminal) for
which semantic values are used. This is done for tokens with the
@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names})
and for groupings with the @code{%type} Bison declaration (@pxref{Type
Decl, ,Nonterminal Symbols}).
@end itemize
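
As an illustration of the @code{#define} approach, the prologue could
contain a fragment like the following. The union tag @code{value} and the
members @code{ival}, @code{dval}, and @code{sval} are hypothetical; a
declaration such as @code{%token <ival> INT} would then select the
@code{ival} member by its tag. The function @code{store_int_token} is
likewise hypothetical and only shows a lexical analyzer filling in the
member that matches a token's declared type.

@example
#include <stdlib.h>

/* The collection of value types; the member names are the type tags.  */
union value
@{
  long int ival;       /* numeric constants */
  double dval;         /* floating-point constants */
  char *sval;          /* string constants */
@};
#define YYSTYPE union value

YYSTYPE yylval;        /* Normally defined by the generated parser.  */

/* Store the value of an integer token in the ival member.  */
static void
store_int_token (const char *text)
@{
  yylval.ival = strtol (text, NULL, 10);
@}
@end example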

@node Actions
@subsection Actions
@cindex action
@vindex $$
@vindex $@var{n}
@vindex $@var{name}
@vindex $[@var{name}]

An action accompanies a syntactic rule and contains C code to be executed
each time an instance of that rule is recognized. The task of most actions
is to compute a semantic value for the grouping built by the rule from the
semantic values associated with tokens or smaller groupings.

An action consists of braced code containing C statements, and can be
placed at any position in the rule; it is executed at that position.
Most rules have just one action at the end of the rule, following all
the components. Actions in the middle of a rule are tricky and used
only for special purposes (@pxref{Mid-Rule Actions, ,Actions in
Mid-Rule}).

The C code in an action can refer to the semantic values of the
components matched by the rule with the construct @code{$@var{n}},
which stands for the value of the @var{n}th component. The semantic
value for the grouping being constructed is @code{$$}. In addition,
the semantic values of symbols can be accessed with the named
references construct @code{$@var{name}} or @code{$[@var{name}]}.
Bison translates both of these constructs into expressions of the
appropriate type when it copies the actions into the parser
implementation file. @code{$$} (or @code{$@var{name}}, when it stands
for the current grouping) is translated to a modifiable lvalue, so it
can be assigned to.

Here is a typical example:

@example
@group
exp:    @dots{}
        | exp '+' exp
            @{ $$ = $1 + $3; @}
@end group
@end example

Or, in terms of named references:

@example
@group
exp[result]: @dots{}
        | exp[left] '+' exp[right]
            @{ $result = $left + $right; @}
@end group
@end example

@noindent
This rule constructs an @code{exp} from two smaller @code{exp} groupings
connected by a plus-sign token. In the action, @code{$1} and @code{$3}
(@code{$left} and @code{$right}) refer to the semantic values of the two
component @code{exp} groupings, which are the first and third symbols on
the right hand side of the rule. The sum is stored into @code{$$}
(@code{$result}) so that it becomes the semantic value of the
addition-expression just recognized by the rule. If there were a
useful semantic value associated with the @samp{+} token, it could be
referred to as @code{$2}.

@xref{Named References}, for more information about using the named
references construct.

Note that the vertical-bar character @samp{|} is really a rule
separator, and actions are attached to a single rule. This differs
from tools like Flex, for which @samp{|} stands for either
``or'', or ``the same action as that of the next rule''. In the
following example, the action is triggered only when @samp{b} is found:

@example
@group
a-or-b: 'a'|'b'   @{ a_or_b_found = 1; @};
@end group
@end example

@cindex default action
If you don't specify an action for a rule, Bison supplies a default:
@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule
becomes the value of the whole rule. Of course, the default action is
valid only if the two data types match. There is no meaningful default
action for an empty rule; every empty rule must have an explicit action
unless the rule's value does not matter.

@code{$@var{n}} with @var{n} zero or negative is allowed for reference
to tokens and groupings on the stack @emph{before} those that match the
current rule. This is a very risky practice, and to use it reliably
you must be certain of the context in which the rule is applied. Here
is a case in which you can use this reliably:

@example
@group
foo:      expr bar '+' expr  @{ @dots{} @}
        | expr bar '-' expr  @{ @dots{} @}
        ;
@end group

@group
bar:      /* empty */
          @{ previous_expr = $0; @}
        ;
@end group
@end example

As long as @code{bar} is used only in the fashion shown here, @code{$0}
always refers to the @code{expr} which precedes @code{bar} in the
definition of @code{foo}.
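
One way to picture @code{$@var{n}} and @code{$0} is as offsets into the
parser's stack of semantic values. The C model below is a sketch under
stated assumptions, not Bison's generated code: it keeps a hypothetical
array @code{value_stack} whose top entry holds the value of the rule's
last component, so that for a rule with @var{len} right-hand-side symbols
@code{$@var{n}} sits @var{len} @minus{} @var{n} entries below the top.
With that mapping, @code{$0} in the empty rule for @code{bar} lands on
the entry just below the rule's (nonexistent) components, which is the
preceding @code{expr}. All names here are hypothetical.

@example
#define STACK_SIZE 64

/* Hypothetical semantic value stack; DEPTH counts its entries.  */
static int value_stack[STACK_SIZE];
static int depth = 0;

static void
push_value (int v)
@{
  value_stack[depth++] = v;
@}

/* The value written $N in a rule whose right hand side has LEN
   symbols; the top entry holds $LEN.  N may be zero or negative,
   which reaches below the rule's own components.  */
static int
rule_value (int n, int len)
@{
  return value_stack[depth - 1 + n - len];
@}

/* Model of reducing  exp: exp '+' exp  with the action $$ = $1 + $3.  */
static void
reduce_add (void)
@{
  int result = rule_value (1, 3) + rule_value (3, 3);
  depth -= 3;            /* Pop the three components.  */
  push_value (result);   /* Push $$ for the new grouping.  */
@}
@end example

@noindent
Pushing 7, a value for the @samp{+} token, and 5, then calling
@code{reduce_add}, leaves the single value 12 on the stack.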

@vindex yylval
It is also possible to access the semantic value of the lookahead token, if
any, from a semantic action. This semantic value is stored in
@code{yylval}. @xref{Action Features, ,Special Features for Use in
Actions}.

@node Action Types
@subsection Data Types of Values in Actions
@cindex action data types
@cindex data types in actions

If you have chosen a single data type for semantic values, the @code{$$}
and @code{$@var{n}} constructs always have that data type.

If you have used @code{%union} to specify a variety of data types, then you
must declare a choice among these types for each terminal or nonterminal
symbol that can have a semantic value. Then each time you use @code{$$} or
@code{$@var{n}}, its data type is determined by which symbol it refers to
in the rule. In this example,

@example
@group
exp:    @dots{}
        | exp '+' exp
            @{ $$ = $1 + $3; @}
@end group
@end example

@noindent
@code{$$}, @code{$1} and @code{$3} refer to instances of @code{exp}, so
they all have the data type declared for the nonterminal symbol
@code{exp}. If @code{$2} were used, it would have the data type declared
for the terminal symbol @code{'+'}, whatever that might be.

Alternatively, you can specify the data type when you refer to the value,
by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
reference. For example, if you have defined types as shown here:

@example
@group
%union @{
  int itype;
  double dtype;
@}
@end group
@end example

@noindent
then you can write @code{$<itype>1} to refer to the first subunit of the
rule as an integer, or @code{$<dtype>1} to refer to it as a double.
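
In C terms, @code{%union} makes each stack entry a union, and the
@samp{<@var{type}>} tag names the member to read. The sketch below is a
hand-written model rather than Bison output: the union tag
@code{semantic_value} and the two helper functions are hypothetical, and
only the member names @code{itype} and @code{dtype} come from the
@code{%union} above. Reading @code{$<dtype>1} where an @code{int} was
stored would read the wrong member, which is why the declared or explicit
type must match what the producing action stored.

@example
/* Corresponds to the %union above.  */
union semantic_value
@{
  int itype;
  double dtype;
@};

/* Roughly what $<itype>1 selects: the itype member of the entry
   for the first component.  */
static int
first_as_int (const union semantic_value *components)
@{
  return components[0].itype;
@}

/* Roughly what $<dtype>1 selects.  */
static double
first_as_double (const union semantic_value *components)
@{
  return components[0].dtype;
@}
@end example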

@node Mid-Rule Actions
@subsection Actions in Mid-Rule
@cindex actions in mid-rule
@cindex mid-rule actions

Occasionally it is useful to put an action in the middle of a rule.
These actions are written just like usual end-of-rule actions, but they
are executed before the parser even recognizes the following components.

A mid-rule action may refer to the components preceding it using
@code{$@var{n}}, but it may not refer to subsequent components because
it is run before they are parsed.

The mid-rule action itself counts as one of the components of the rule.
This makes a difference when there is another action later in the same rule
(and usually there is another at the end): you have to count the actions
along with the symbols when working out which number @var{n} to use in
@code{$@var{n}}.

The mid-rule action can also have a semantic value. The action can set
its value with an assignment to @code{$$}, and actions later in the rule
can refer to the value using @code{$@var{n}}. Since there is no symbol
to name the action, there is no way to declare a data type for the value
in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to
specify a data type each time you refer to this value.

There is no way to set the value of the entire rule with a mid-rule
action, because assignments to @code{$$} do not have that effect. The
only way to set the value for the entire rule is with an ordinary action
at the end of the rule.

Here is an example from a hypothetical compiler, handling a @code{let}
statement that looks like @samp{let (@var{variable}) @var{statement}} and
serves to create a variable named @var{variable} temporarily for the
duration of @var{statement}. To parse this construct, we must put
@var{variable} into the symbol table while @var{statement} is parsed, then
remove it afterward. Here is how it is done:

@example
@group
stmt:   LET '(' var ')'
                @{ $<context>$ = push_context ();
                  declare_variable ($3); @}
        stmt    @{ $$ = $6;
                  pop_context ($<context>5); @}
@end group
@end example

@noindent
As soon as @samp{let (@var{variable})} has been recognized, the first
action is run. It saves a copy of the current semantic context (the
list of accessible variables) as its semantic value, using alternative
@code{context} in the data-type union. Then it calls
@code{declare_variable} to add the new variable to that list. Once the
first action is finished, the embedded statement @code{stmt} can be
parsed. Note that the mid-rule action is component number 5, so the
@samp{stmt} is component number 6.

After the embedded statement is parsed, its semantic value becomes the
value of the entire @code{let}-statement. Then the semantic value from the
earlier action is used to restore the prior list of variables. This
removes the temporary @code{let}-variable from the list so that it won't
appear to exist while the rest of the program is parsed.
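
The functions @code{push_context}, @code{declare_variable} and
@code{pop_context} are left unspecified above. As a hypothetical
illustration only, a symbol table kept as a simple array could implement
them as follows, using the number of visible variables as the saved
context. The table layout, its fixed size, and the @code{is_declared}
lookup are assumptions for this sketch, and @code{declare_variable}
takes the variable's name as a string for simplicity.

@example
#include <string.h>

#define MAX_VARS 100

/* Hypothetical symbol table: the names currently in scope.  */
static const char *vars[MAX_VARS];
static int nvars = 0;

/* A context is the number of variables visible when it was saved.  */
typedef int context;

static context
push_context (void)
@{
  return nvars;
@}

static void
declare_variable (const char *name)
@{
  if (nvars < MAX_VARS)
    vars[nvars++] = name;
@}

/* Forget every variable declared since SAVED was taken.  */
static void
pop_context (context saved)
@{
  nvars = saved;
@}

static int
is_declared (const char *name)
@{
  for (int i = 0; i < nvars; i++)
    if (strcmp (vars[i], name) == 0)
      return 1;
  return 0;
@}
@end example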

@findex %destructor
@cindex discarded symbols, mid-rule actions
@cindex error recovery, mid-rule actions
In the above example, if the parser initiates error recovery (@pxref{Error
Recovery}) while parsing the tokens in the embedded statement @code{stmt},
it might discard the previous semantic context @code{$<context>5} without
restoring it. Thus, @code{$<context>5} needs a destructor
(@pxref{Destructor Decl, , Freeing Discarded Symbols}). However, Bison
currently provides no means to declare a destructor specific to a
particular mid-rule action's semantic value.

One solution is to bury the mid-rule action inside a nonterminal symbol and to
declare a destructor for that symbol:

@example
@group
%type <context> let
%destructor @{ pop_context ($$); @} let

%%

stmt:  let stmt
               @{ $$ = $2;
                 pop_context ($1); @}
       ;

let:   LET '(' var ')'
               @{ $$ = push_context ();
                 declare_variable ($3); @}
       ;

@end group
@end example

@noindent
Note that the action is now at the end of its rule.
Any mid-rule action can be converted to an end-of-rule action in this way, and
this is what Bison actually does to implement mid-rule actions.

Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute the
action. For example, the following two rules, without mid-rule actions,
can coexist in a working parser because the parser can shift the open-brace
token and look at what follows before deciding whether there is a
declaration or not:

@example
@group
compound: '@{' declarations statements '@}'
        | '@{' statements '@}'
        ;
@end group
@end example

@noindent
But when we add a mid-rule action as follows, the rules become nonfunctional:

@example
@group
compound: @{ prepare_for_local_variables (); @}
          '@{' declarations statements '@}'
@end group
@group
        | '@{' statements '@}'
        ;
@end group
@end example

@noindent
Now the parser is forced to decide whether to run the mid-rule action
when it has read no farther than the open-brace. In other words, it
must commit to using one rule or the other, without sufficient
information to do it correctly. (The open-brace token is what is called
the @dfn{lookahead} token at this time, since the parser is still
deciding what to do about it. @xref{Lookahead, ,Lookahead Tokens}.)

You might think that you could correct the problem by putting identical
actions into the two rules, like this:

@example
@group
compound: @{ prepare_for_local_variables (); @}
          '@{' declarations statements '@}'
        | @{ prepare_for_local_variables (); @}
          '@{' statements '@}'
        ;
@end group
@end example

@noindent
But this does not help, because Bison does not realize that the two actions
are identical. (Bison never tries to understand the C code in an action.)

If the grammar is such that a declaration can be distinguished from a
statement by the first token (which is true in C), then one solution which
does work is to put the action after the open-brace, like this:

@example
@group
compound: '@{' @{ prepare_for_local_variables (); @}
          declarations statements '@}'
        | '@{' statements '@}'
        ;
@end group
@end example

@noindent
Now the first token of the following declaration or statement,
which would in any case tell Bison which rule to use, can still do so.

Another solution is to bury the action inside a nonterminal symbol which
serves as a subroutine:

@example
@group
subroutine: /* empty */
          @{ prepare_for_local_variables (); @}
        ;

@end group

@group
compound: subroutine
          '@{' declarations statements '@}'
        | subroutine
          '@{' statements '@}'
        ;
@end group
@end example

@noindent
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.

@node Tracking Locations
@section Tracking Locations
@cindex location
@cindex textual location
@cindex location, textual

Though grammar rules and semantic actions are enough to write a fully
functional parser, it can be useful to process some additional information,
especially symbol locations.

The way locations are handled is defined by providing a data type, and
actions to take when rules are matched.

@menu
* Location Type::               Specifying a data type for locations.
* Actions and Locations::       Using locations in actions.
* Location Default Action::     Defining a general way to compute locations.
@end menu

@node Location Type
@subsection Data Type of Locations
@cindex data type of locations
@cindex default location type

Defining a data type for locations is much simpler than for semantic values,
since all tokens and groupings always use the same type.

You can specify the type of locations by defining a macro called
@code{YYLTYPE}, just as you can specify the semantic value type by
defining a @code{YYSTYPE} macro (@pxref{Value Type}).
When @code{YYLTYPE} is not defined, Bison uses a default structure type with
four members:

@example
typedef struct YYLTYPE
@{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
@} YYLTYPE;
@end example
3902
8fbbeba2
AD
3903When @code{YYLTYPE} is not defined, at the beginning of the parsing, Bison
3904initializes all these fields to 1 for @code{yylloc}. To initialize
3905@code{yylloc} with a custom location type (or to chose a different
3906initialization), use the @code{%initial-action} directive. @xref{Initial
3907Action Decl, , Performing Actions before Parsing}.
cd48d21d 3908
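The following C fragment is only an illustration and is not code taken from a generated parser.  It restates the default @code{YYLTYPE} layout and builds the all-ones value that @code{yylloc} starts with.  The helpers @code{initial_location} and @code{format_location} are hypothetical names introduced here.

```c
#include <stdio.h>

/* Same layout as Bison's default location type.  */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical helper: the value corresponding to the default
   initialization of yylloc, with every field set to 1.  */
static YYLTYPE
initial_location (void)
{
  YYLTYPE loc = { 1, 1, 1, 1 };
  return loc;
}

/* Hypothetical helper: write LOC as "l<line>,c<col>-l<line>,c<col>",
   the format used in this manual's division-by-zero example.  */
static int
format_location (char *buf, size_t size, YYLTYPE loc)
{
  return snprintf (buf, size, "l%d,c%d-l%d,c%d",
                   loc.first_line, loc.first_column,
                   loc.last_line, loc.last_column);
}
```

With these definitions, formatting @code{initial_location ()} yields the string @samp{l1,c1-l1,c1}.
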
@node Actions and Locations
@subsection Actions and Locations
@cindex location actions
@cindex actions, location
@vindex @@$
@vindex @@@var{n}
@vindex @@@var{name}
@vindex @@[@var{name}]

Actions are not only useful for defining language semantics, but also for
describing the behavior of the output parser with locations.

The most obvious way of building locations of syntactic groupings is very
similar to the way semantic values are computed.  In a given rule, several
constructs can be used to access the locations of the elements being matched.
The location of the @var{n}th component of the right hand side is
@code{@@@var{n}}, while the location of the left hand side grouping is
@code{@@$}.

In addition, the named references construct @code{@@@var{name}} and
@code{@@[@var{name}]} may also be used to address the symbol locations.
@xref{Named References}, for more information about using the named
references construct.

Here is a basic example using the default data type for locations:

@example
@group
exp:    @dots{}
        | exp '/' exp
            @{
              @@$.first_column = @@1.first_column;
              @@$.first_line = @@1.first_line;
              @@$.last_column = @@3.last_column;
              @@$.last_line = @@3.last_line;
              if ($3)
                $$ = $1 / $3;
              else
                @{
                  $$ = 1;
                  fprintf (stderr,
                           "Division by zero, l%d,c%d-l%d,c%d",
                           @@3.first_line, @@3.first_column,
                           @@3.last_line, @@3.last_column);
                @}
            @}
@end group
@end example

As with semantic values, there is a default action for locations that is
run each time a rule is matched.  It sets the beginning of @code{@@$} to the
beginning of the first symbol, and the end of @code{@@$} to the end of the
last symbol.

With this default action, location tracking can be fully automatic.  The
example above can simply be rewritten this way:

@example
@group
exp:    @dots{}
        | exp '/' exp
            @{
              if ($3)
                $$ = $1 / $3;
              else
                @{
                  $$ = 1;
                  fprintf (stderr,
                           "Division by zero, l%d,c%d-l%d,c%d",
                           @@3.first_line, @@3.first_column,
                           @@3.last_line, @@3.last_column);
                @}
            @}
@end group
@end example

@vindex yylloc
It is also possible to access the location of the lookahead token, if any,
from a semantic action.
This location is stored in @code{yylloc}.
@xref{Action Features, ,Special Features for Use in Actions}.

@node Location Default Action
@subsection Default Action for Locations
@vindex YYLLOC_DEFAULT
@cindex GLR parsers and @code{YYLLOC_DEFAULT}

Actually, actions are not the best place to compute locations.  Since
locations are much more general than semantic values, there is room in
the output parser to redefine the default action to take for each
rule.  The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
matched, before the associated action is run.  It is also invoked
while processing a syntax error, to compute the error's location.
Before reporting an unresolvable syntactic ambiguity, a GLR
parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location
of that ambiguity.

Most of the time, this macro is general enough to remove location-specific
code from semantic actions entirely.

The @code{YYLLOC_DEFAULT} macro takes three parameters.  The first one is
the location of the grouping (the result of the computation).  When a
rule is matched, the second parameter identifies locations of
all right hand side elements of the rule being matched, and the third
parameter is the size of the rule's right hand side.
When a GLR parser reports an ambiguity, which of multiple candidate
right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined.
When processing a syntax error, the second parameter identifies locations
of the symbols that were discarded during error processing, and the third
parameter is the number of discarded symbols.

By default, @code{YYLLOC_DEFAULT} is defined this way:

@smallexample
@group
# define YYLLOC_DEFAULT(Current, Rhs, N)                             \
    do                                                               \
      if (N)                                                         \
        @{                                                            \
          (Current).first_line   = YYRHSLOC(Rhs, 1).first_line;      \
          (Current).first_column = YYRHSLOC(Rhs, 1).first_column;    \
          (Current).last_line    = YYRHSLOC(Rhs, N).last_line;       \
          (Current).last_column  = YYRHSLOC(Rhs, N).last_column;     \
        @}                                                            \
      else                                                           \
        @{                                                            \
          (Current).first_line   = (Current).last_line   =           \
            YYRHSLOC(Rhs, 0).last_line;                              \
          (Current).first_column = (Current).last_column =           \
            YYRHSLOC(Rhs, 0).last_column;                            \
        @}                                                            \
    while (0)
@end group
@end smallexample

@noindent
where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol
in @var{rhs} when @var{k} is positive, and the location of the symbol
just before the reduction when @var{k} and @var{n} are both zero.

When defining @code{YYLLOC_DEFAULT}, you should consider that:

@itemize @bullet
@item
All arguments are free of side-effects.  However, only the first one (the
result) should be modified by @code{YYLLOC_DEFAULT}.

@item
For consistency with semantic actions, valid indexes within the
right hand side range from 1 to @var{n}.  When @var{n} is zero, only 0 is a
valid index, and it refers to the symbol just before the reduction.
During error processing @var{n} is always positive.

@item
Your macro should parenthesize its arguments, if need be, since the
actual arguments may not be surrounded by parentheses.  Also, your
macro should expand to something that can be used as a single
statement when it is followed by a semicolon.
@end itemize

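To see what the default macro computes, the following self-contained C sketch pastes the definition above into an ordinary program.  Its @code{YYRHSLOC} is an assumption.  It treats @var{Rhs} as a pointer into an array of locations whose element 0 is the symbol just before the reduction, which matches the indexing rules listed above.  A generated parser may define it differently.  The function @code{reduce_location} is hypothetical.

```c
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Assumption: Rhs points at the location of the symbol just before the
   reduction, so Rhs[1] .. Rhs[N] are the right-hand side locations.  */
#define YYRHSLOC(Rhs, K) ((Rhs)[K])

/* The default definition shown above.  */
#define YYLLOC_DEFAULT(Current, Rhs, N)                              \
  do                                                                 \
    if (N)                                                           \
      {                                                              \
        (Current).first_line   = YYRHSLOC (Rhs, 1).first_line;       \
        (Current).first_column = YYRHSLOC (Rhs, 1).first_column;     \
        (Current).last_line    = YYRHSLOC (Rhs, N).last_line;        \
        (Current).last_column  = YYRHSLOC (Rhs, N).last_column;      \
      }                                                              \
    else                                                             \
      {                                                              \
        (Current).first_line   = (Current).last_line   =             \
          YYRHSLOC (Rhs, 0).last_line;                               \
        (Current).first_column = (Current).last_column =             \
          YYRHSLOC (Rhs, 0).last_column;                             \
      }                                                              \
  while (0)

/* Hypothetical helper: location of a grouping reduced from the N symbols
   stored in STACK[1] .. STACK[N]; STACK[0] is the preceding symbol.  */
static YYLTYPE
reduce_location (const YYLTYPE *stack, int n)
{
  YYLTYPE result;
  YYLLOC_DEFAULT (result, stack, n);
  return result;
}
```

For a rule such as @code{exp: exp '/' exp}, the result starts where @code{@@1} starts and ends where @code{@@3} ends.  For an empty rule, it collapses to a zero-width location at the end of the preceding symbol.
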
@node Named References
@section Named References
@cindex named references

As described in the preceding sections, the traditional way to refer to any
semantic value or location is a @dfn{positional reference}, which takes the
form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}.  However,
such a reference is not very descriptive.  Moreover, if you later decide to
insert or remove symbols in the right-hand side of a grammar rule, the need
to renumber such references can be tedious and error-prone.

To avoid these issues, you can also refer to a semantic value or location
using a @dfn{named reference}.  First of all, original symbol names may be
used as named references.  For example:

@example
@group
invocation: op '(' args ')'
  @{ $invocation = new_invocation ($op, $args, @@invocation); @}
@end group
@end example

@noindent
Positional and named references can be mixed arbitrarily.  For example:

@example
@group
invocation: op '(' args ')'
  @{ $$ = new_invocation ($op, $args, @@$); @}
@end group
@end example

@noindent
However, sometimes regular symbol names are not sufficient due to
ambiguities:

@example
@group
exp: exp '/' exp
  @{ $exp = $exp / $exp; @} // $exp is ambiguous.

exp: exp '/' exp
  @{ $$ = $1 / $exp; @} // One usage is ambiguous.

exp: exp '/' exp
  @{ $$ = $1 / $3; @} // No error.
@end group
@end example

@noindent
When ambiguity occurs, explicitly declared names may be used for values and
locations.  Explicit names are declared as a bracketed name after a symbol
appearance in rule definitions.  For example:
@example
@group
exp[result]: exp[left] '/' exp[right]
  @{ $result = $left / $right; @}
@end group
@end example

@noindent
In order to access a semantic value generated by a mid-rule action, an
explicit name may also be declared by putting a bracketed name after the
closing brace of the mid-rule action code:
@example
@group
exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
  @{ $res = $left + $right; @}
@end group
@end example

@noindent
In references, in order to specify names containing dots and dashes, the
explicit bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
@example
@group
if-stmt: "if" '(' expr ')' "then" then.stmt ';'
  @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
@end group
@end example

It often happens that named references are followed by a dot, dash or other
C punctuation marks and operators.  By default, Bison will read
@samp{$name.suffix} as a reference to symbol value @code{$name} followed by
@samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic
value.  In order to force Bison to recognize @samp{name.suffix} in its
entirety as the name of a semantic value, the bracketed syntax
@samp{$[name.suffix]} must be used.

The named references feature is experimental.  More user feedback will help
to stabilize it.

@node Declarations
@section Bison Declarations
@cindex declarations, Bison
@cindex Bison declarations

The @dfn{Bison declarations} section of a Bison grammar defines the symbols
used in formulating the grammar and the data types of semantic values.
@xref{Symbols}.

All token type names (but not single-character literal tokens such as
@code{'+'} and @code{'*'}) must be declared.  Nonterminal symbols must be
declared if you need to specify which data type to use for the semantic
value (@pxref{Multiple Types, ,More Than One Value Type}).

The first rule in the grammar file also specifies the start symbol, by
default.  If you want some other symbol to be the start symbol, you
must declare it explicitly (@pxref{Language and Grammar, ,Languages
and Context-Free Grammars}).

@menu
* Require Decl::      Requiring a Bison version.
* Token Decl::        Declaring terminal symbols.
* Precedence Decl::   Declaring terminals with precedence and associativity.
* Union Decl::        Declaring the set of all semantic value types.
* Type Decl::         Declaring the choice of type for a nonterminal symbol.
* Initial Action Decl::  Code run before parsing starts.
* Destructor Decl::   Declaring how symbols are freed.
* Expect Decl::       Suppressing warnings about parsing conflicts.
* Start Decl::        Specifying the start symbol.
* Pure Decl::         Requesting a reentrant parser.
* Push Decl::         Requesting a push parser.
* Decl Summary::      Table of all Bison declarations.
* %define Summary::   Defining variables to adjust Bison's behavior.
* %code Summary::     Inserting code into the parser source.
@end menu

@node Require Decl
@subsection Require a Version of Bison
@cindex version requirement
@cindex requiring a version of Bison
@findex %require

You may require a minimum version of Bison to process the grammar.  If
the requirement is not met, @command{bison} exits with an error (exit
status 63).

@example
%require "@var{version}"
@end example

@node Token Decl
@subsection Token Type Names
@cindex declaring token type names
@cindex token type names, declaring
@cindex declaring literal string tokens
@findex %token

The basic way to declare a token type name (terminal symbol) is as follows:

@example
%token @var{name}
@end example

Bison will convert this into a @code{#define} directive in
the parser, so that the function @code{yylex} (if it is in this file)
can use the name @var{name} to stand for this token type's code.

Alternatively, you can use @code{%left}, @code{%right}, or
@code{%nonassoc} instead of @code{%token}, if you wish to specify
associativity and precedence.  @xref{Precedence Decl, ,Operator
Precedence}.

You can explicitly specify the numeric code for a token type by appending
a nonnegative decimal or hexadecimal integer value in the field immediately
following the token name:

@example
%token NUM 300
%token XNUM 0x12d // a GNU extension
@end example

@noindent
It is generally best, however, to let Bison choose the numeric codes for
all token types.  Bison will automatically select codes that don't conflict
with each other or with normal characters.

In the event that the stack type is a union, you must augment the
@code{%token} or other token declaration to include the data type
alternative delimited by angle-brackets (@pxref{Multiple Types, ,More
Than One Value Type}).

For example:

@example
@group
%union @{              /* define stack type */
  double val;
  symrec *tptr;
@}
%token <val> NUM      /* define token NUM and its type */
@end group
@end example

You can associate a literal string token with a token type name by
writing the literal string at the end of a @code{%token}
declaration which declares the name.  For example:

@example
%token arrow "=>"
@end example

@noindent
Similarly, a grammar for the C language might specify these names with
equivalent literal string tokens:

@example
%token  <operator>  OR      "||"
%token  <operator>  LE 134  "<="
%left  OR  "<="
@end example

@noindent
Once you equate the literal string and the token name, you can use them
interchangeably in further declarations or the grammar rules.  The
@code{yylex} function can use the token name or the literal string to
obtain the token type code number (@pxref{Calling Convention}).
Syntax error messages passed to @code{yyerror} from the parser will reference
the literal string instead of the token name.

The token numbered as 0 corresponds to end of file; the following line
allows for nicer error messages referring to ``end of file'' instead
of ``$end'':

@example
%token END 0 "end of file"
@end example

@node Precedence Decl
@subsection Operator Precedence
@cindex precedence declarations
@cindex declaring operator precedence
@cindex operator precedence, declaring

Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
declare a token and specify its precedence and associativity, all at
once.  These are called @dfn{precedence declarations}.
@xref{Precedence, ,Operator Precedence}, for general information on
operator precedence.

The syntax of a precedence declaration is nearly the same as that of
@code{%token}: either

@example
%left @var{symbols}@dots{}
@end example

@noindent
or

@example
%left <@var{type}> @var{symbols}@dots{}
@end example

And indeed any of these declarations serves the purposes of @code{%token}.
But in addition, they specify the associativity and relative precedence for
all the @var{symbols}:

@itemize @bullet
@item
The associativity of an operator @var{op} determines how repeated uses
of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
@var{z}} is parsed by grouping @var{x} with @var{y} first or by
grouping @var{y} with @var{z} first.  @code{%left} specifies
left-associativity (grouping @var{x} with @var{y} first) and
@code{%right} specifies right-associativity (grouping @var{y} with
@var{z} first).  @code{%nonassoc} specifies no associativity, which
means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
considered a syntax error.

@item
The precedence of an operator determines how it nests with other operators.
All the tokens declared in a single precedence declaration have equal
precedence and nest together according to their associativity.
When two tokens declared in different precedence declarations associate,
the one declared later has the higher precedence and is grouped first.
@end itemize

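The practical difference between the two groupings is easiest to see with an operator that is not mathematically associative, such as subtraction.  The following C sketch illustrates only the arithmetic, not anything Bison generates.  The functions @code{left_assoc} and @code{right_assoc} are hypothetical and evaluate @samp{@var{x} - @var{y} - @var{z}} both ways.

```c
/* Grouping X with Y first, as a %left '-' declaration would:
   (x - y) - z.  */
static int
left_assoc (int x, int y, int z)
{
  return (x - y) - z;
}

/* Grouping Y with Z first, as a %right '-' declaration would:
   x - (y - z).  */
static int
right_assoc (int x, int y, int z)
{
  return x - (y - z);
}
```

With @code{%left '-'}, the input @samp{10 - 4 - 3} yields 3, which is the usual meaning.  Declaring @code{'-'} with @code{%right} would yield 9 instead, and with @code{%nonassoc} the same input would be rejected as a syntax error.
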
For backward compatibility, there is a confusing difference between the
argument lists of @code{%token} and precedence declarations.
Only a @code{%token} can associate a literal string with a token type name.
A precedence declaration always interprets a literal string as a reference to a
separate token.
For example:

@example
%left  OR "<="         // Does not declare an alias.
%left  OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".
@end example

@node Union Decl
@subsection The Collection of Value Types
@cindex declaring value types
@cindex value types, declaring
@findex %union

The @code{%union} declaration specifies the entire collection of
possible data types for semantic values.  The keyword @code{%union} is
followed by braced code containing the same thing that goes inside a
@code{union} in C@.

For example:

@example
@group
%union @{
  double val;
  symrec *tptr;
@}
@end group
@end example

@noindent
This says that the two alternative types are @code{double} and @code{symrec
*}.  They are given names @code{val} and @code{tptr}; these names are used
in the @code{%token} and @code{%type} declarations to pick one of the types
for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).

As an extension to POSIX, a tag is allowed after the
@code{%union} keyword.  For example:

@example
@group
%union value @{
  double val;
  symrec *tptr;
@}
@end group
@end example

@noindent
specifies the union tag @code{value}, so the corresponding C type is
@code{union value}.  If you do not specify a tag, it defaults to
@code{YYSTYPE}.

As another extension to POSIX, you may specify multiple
@code{%union} declarations; their contents are concatenated.  However,
only the first @code{%union} declaration can specify a tag.

Note that, unlike making a @code{union} declaration in C, you need not write
a semicolon after the closing brace.

Instead of @code{%union}, you can define and use your own union type
@code{YYSTYPE} if your grammar contains at least one
@samp{<@var{type}>} tag.  For example, you can put the following into
a header file @file{parser.h}:

@example
@group
union YYSTYPE @{
  double val;
  symrec *tptr;
@};
typedef union YYSTYPE YYSTYPE;
@end group
@end example

@noindent
and then your grammar can use the following
instead of @code{%union}:

@example
@group
%@{
#include "parser.h"
%@}
%type <val> expr
%token <tptr> ID
@end group
@end example

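For readers less familiar with C unions, the following self-contained sketch shows what such a hand-written @code{YYSTYPE} looks like from the C side.  The structure @code{symrec} is not defined in this section, so the minimal version here is a hypothetical stand-in.  The helpers @code{make_num} and @code{make_id} are also hypothetical.  Only one member of the union is meaningful at a time.  The @samp{<val>} or @samp{<tptr>} tag in the grammar is what tells Bison which member to use.

```c
/* Hypothetical stand-in for the symbol-table record used in this manual.  */
typedef struct symrec
{
  char name[32];
  double value;
} symrec;

/* Same shape as the YYSTYPE in parser.h above.  */
union YYSTYPE
{
  double val;     /* selected by <val> tags */
  symrec *tptr;   /* selected by <tptr> tags */
};
typedef union YYSTYPE YYSTYPE;

/* Hypothetical helper: a value whose active member is val.  */
static YYSTYPE
make_num (double d)
{
  YYSTYPE v;
  v.val = d;
  return v;
}

/* Hypothetical helper: a value whose active member is tptr.  */
static YYSTYPE
make_id (symrec *s)
{
  YYSTYPE v;
  v.tptr = s;
  return v;
}
```

Because the members share storage, reading @code{val} after storing @code{tptr} is not meaningful.  The parser relies on the declared tags to know which member is live.
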
@node Type Decl
@subsection Nonterminal Symbols
@cindex declaring value types, nonterminals
@cindex value types, nonterminals, declaring
@findex %type

@noindent
When you use @code{%union} to specify multiple value types, you must
declare the value type of each nonterminal symbol for which values are
used.  This is done with a @code{%type} declaration, like this:

@example
%type <@var{type}> @var{nonterminal}@dots{}
@end example

@noindent
Here @var{nonterminal} is the name of a nonterminal symbol, and
@var{type} is the name given in the @code{%union} to the alternative
that you want (@pxref{Union Decl, ,The Collection of Value Types}).  You
can give any number of nonterminal symbols in the same @code{%type}
declaration, if they have the same value type.  Use spaces to separate
the symbol names.

You can also declare the value type of a terminal symbol.  To do this,
use the same @code{<@var{type}>} construction in a declaration for the
terminal symbol.  All kinds of token declarations allow
@code{<@var{type}>}.

@node Initial Action Decl
@subsection Performing Actions before Parsing
@findex %initial-action

Sometimes your parser needs to perform some initializations before
parsing.  The @code{%initial-action} directive allows for such arbitrary
code.

@deffn {Directive} %initial-action @{ @var{code} @}
@findex %initial-action
Declare that the braced @var{code} must be invoked before parsing each time
@code{yyparse} is called.  The @var{code} may use @code{$$} (the initial
value of the lookahead), @code{@@$} (its initial location), and the
@code{%parse-param} arguments.
@end deffn

For instance, if your locations use a file name, you may use

@example
%parse-param @{ char const *file_name @};
%initial-action
@{
  @@$.initialize (file_name);
@};
@end example

@node Destructor Decl
@subsection Freeing Discarded Symbols
@cindex freeing discarded symbols
@findex %destructor
@findex <*>
@findex <>
During error recovery (@pxref{Error Recovery}), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
until the parser lands on its feet.  If the parser runs out of memory,
or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the
symbols on the stack must be discarded.  Even if the parser succeeds, it
must discard the start symbol.

When discarded symbols convey heap-based information, this memory is
lost.  While this behavior can be tolerable for batch parsers, such as
in traditional compilers, it is unacceptable for programs like shells or
protocol implementations that may parse and execute indefinitely.

The @code{%destructor} directive defines code that is called when a
symbol is automatically discarded.

@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
@findex %destructor
Invoke the braced @var{code} whenever the parser discards one of the
@var{symbols}.
Within @var{code}, @code{$$} designates the semantic value associated
with the discarded symbol, and @code{@@$} designates its location.
The additional parser parameters are also available (@pxref{Parser Function, ,
The Parser Function @code{yyparse}}).

When a symbol is listed among @var{symbols}, its @code{%destructor} is called a
per-symbol @code{%destructor}.
You may also define a per-type @code{%destructor} by listing a semantic type
tag among @var{symbols}.
In that case, the parser will invoke this @var{code} whenever it discards any
grammar symbol that has that semantic type tag unless that symbol has its own
per-symbol @code{%destructor}.

Finally, you can define two different kinds of default @code{%destructor}s.
(These default forms are experimental.
More user feedback will help to determine whether they should become permanent
features.)
You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of
exactly one @code{%destructor} declaration in your grammar file.
The parser will invoke the @var{code} associated with one of these whenever it
discards any user-defined grammar symbol that has no per-symbol and no per-type
@code{%destructor}.
The parser uses the @var{code} for @code{<*>} in the case of such a grammar
symbol for which you have formally declared a semantic type tag (@code{%type}
counts as such a declaration, but @code{$<tag>$} does not).
The parser uses the @var{code} for @code{<>} in the case of such a grammar
symbol that has no declared semantic type tag.
@end deffn

@noindent
For example:

@smallexample
%union @{ char *string; @}
%token <string> STRING1
%token <string> STRING2
%type  <string> string1
%type  <string> string2
%union @{ char character; @}
%token <character> CHR
%type  <character> chr
%token TAGLESS

%destructor @{ @} <character>
%destructor @{ free ($$); @} <*>
%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
%destructor @{ printf ("Discarding tagless symbol.\n"); @} <>
@end smallexample

@noindent
guarantees that, when the parser discards any user-defined symbol that has a
semantic type tag other than @code{<character>}, it passes its semantic value
to @code{free} by default.
However, when the parser discards a @code{STRING1} or a @code{string1}, it also
prints its line number to @code{stdout}.
It performs only the per-symbol @code{%destructor} in this case, so it invokes
@code{free} only once.
Finally, the parser merely prints a message whenever it discards any symbol,
such as @code{TAGLESS}, that has no semantic type tag.

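The selection rules described above amount to a lookup order.  The following C sketch is a hypothetical model of that order, written only to make the priority explicit.  It does not describe how a Bison-generated parser is implemented.  The types @code{destructor_kind} and @code{destructor_info} and the function @code{pick_destructor} are names invented for this illustration.

```c
/* Which %destructor applies to a discarded symbol.  */
typedef enum
{
  DESTRUCTOR_NONE,        /* nothing is run */
  DESTRUCTOR_PER_SYMBOL,  /* the symbol itself is listed */
  DESTRUCTOR_PER_TYPE,    /* the symbol's tag is listed, e.g. <character> */
  DESTRUCTOR_TAGGED,      /* default for tagged symbols: <*> */
  DESTRUCTOR_TAGLESS      /* default for tagless symbols: <> */
} destructor_kind;

/* Hypothetical description of one grammar symbol.  */
typedef struct
{
  int user_defined;          /* 0 for $accept, $undefined, $end, error */
  int has_symbol_destructor; /* listed in some %destructor */
  int has_tag;               /* declared via %token <tag> or %type <tag> */
  int has_type_destructor;   /* its tag is listed in some %destructor */
} destructor_info;

/* Model of the lookup order: per-symbol, then per-type, then the <*> or
   <> default, which applies only to user-defined symbols.  HAVE_STAR and
   HAVE_EMPTY say whether the grammar declares <*> and <> destructors.  */
static destructor_kind
pick_destructor (const destructor_info *s, int have_star, int have_empty)
{
  if (s->has_symbol_destructor)
    return DESTRUCTOR_PER_SYMBOL;
  if (s->has_tag && s->has_type_destructor)
    return DESTRUCTOR_PER_TYPE;
  if (!s->user_defined)
    return DESTRUCTOR_NONE;
  if (s->has_tag)
    return have_star ? DESTRUCTOR_TAGGED : DESTRUCTOR_NONE;
  return have_empty ? DESTRUCTOR_TAGLESS : DESTRUCTOR_NONE;
}
```

Applied to the grammar above, @code{STRING1} gets its per-symbol destructor and @code{chr} gets the empty per-type one for @code{<character>}.  @code{STRING2} falls back to @code{<*>}, and @code{TAGLESS} falls back to @code{<>}.
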
A Bison-generated parser invokes the default @code{%destructor}s only for
user-defined as opposed to Bison-defined symbols.
For example, the parser will not invoke either kind of default
@code{%destructor} for the special Bison-defined symbols @code{$accept},
@code{$undefined}, or @code{$end} (@pxref{Table of Symbols, ,Bison Symbols}),
none of which you can reference in your grammar.
It also will not invoke either for the @code{error} token (@pxref{Table of
Symbols, ,error}), which is always defined by Bison regardless of whether you
reference it in your grammar.
However, it may invoke one of them for the end token (token 0) if you
redefine it from @code{$end} to, for example, @code{END}:

@smallexample
%token END 0
@end smallexample

@cindex actions in mid-rule
@cindex mid-rule actions
Finally, Bison will never invoke a @code{%destructor} for an unreferenced
mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}).
That is, Bison does not consider a mid-rule to have a semantic value if you
do not reference @code{$$} in the mid-rule's action or @code{$@var{n}}
(where @var{n} is the right-hand side symbol position of the mid-rule) in
any later action in that rule.  However, if you do reference either, the
Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever
it discards the mid-rule symbol.

@ignore
@noindent
In the future, it may be possible to redefine the @code{error} token as a
nonterminal that captures the discarded symbols.
In that case, the parser will invoke the default destructor for it as well.
@end ignore

@sp 1

@cindex discarded symbols
@dfn{Discarded symbols} are the following:

@itemize
@item
stacked symbols popped during the first phase of error recovery,
@item
incoming terminals during the second phase of error recovery,
@item
the current lookahead and the entire stack (except the current
right-hand side symbols) when the parser returns immediately, and
@item
the start symbol, when the parser succeeds.
@end itemize

The parser can @dfn{return immediately} because of an explicit call to
@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory
exhaustion.

Right-hand side symbols of a rule that explicitly triggers a syntax
error via @code{YYERROR} are not discarded automatically.  As a rule
of thumb, destructors are invoked only when user actions cannot manage
the memory.

342b8b6e 4642@node Expect Decl
bfa74976
RS
4643@subsection Suppressing Conflict Warnings
4644@cindex suppressing conflict warnings
4645@cindex preventing warnings about conflicts
4646@cindex warnings, preventing
4647@cindex conflicts, suppressing warnings of
4648@findex %expect
d6328241 4649@findex %expect-rr
bfa74976
RS
4650
4651Bison normally warns if there are any conflicts in the grammar
7da99ede
AD
4652(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars
4653have harmless shift/reduce conflicts which are resolved in a predictable
4654way and would be difficult to eliminate. It is desirable to suppress
4655the warning about these conflicts unless the number of conflicts
4656changes. You can do this with the @code{%expect} declaration.
bfa74976
RS
4657
4658The declaration looks like this:
4659
4660@example
4661%expect @var{n}
4662@end example
4663
Here @var{n} is a decimal integer. The declaration says there should
be @var{n} shift/reduce conflicts and no reduce/reduce conflicts.
Bison reports an error if the number of shift/reduce conflicts differs
from @var{n}, or if there are any reduce/reduce conflicts.

For deterministic parsers, reduce/reduce conflicts are more
serious, and should be eliminated entirely. Bison will always report
reduce/reduce conflicts for these parsers. With GLR
parsers, however, both kinds of conflicts are routine; otherwise,
there would be no need to use GLR parsing. Therefore, it is
also possible to specify an expected number of reduce/reduce conflicts
in GLR parsers, using the declaration:

@example
%expect-rr @var{n}
@end example

In general, using @code{%expect} involves these steps:

@itemize @bullet
@item
Compile your grammar without @code{%expect}. Use the @samp{-v} option
to get a verbose list of where the conflicts occur. Bison will also
print the number of conflicts.

@item
Check each of the conflicts to make sure that Bison's default
resolution is what you really want. If not, rewrite the grammar and
go back to the beginning.

@item
Add an @code{%expect} declaration, copying the number @var{n} from the
number which Bison printed. With GLR parsers, add an
@code{%expect-rr} declaration as well.
@end itemize

Now Bison will report an error if you introduce an unexpected conflict,
but will keep silent otherwise.

@node Start Decl
@subsection The Start-Symbol
@cindex declaring the start symbol
@cindex start symbol, declaring
@cindex default start symbol
@findex %start

Bison assumes by default that the start symbol for the grammar is the first
nonterminal specified in the grammar specification section. You can
override this default with the @code{%start} declaration as follows:

@example
%start @var{symbol}
@end example

@node Pure Decl
@subsection A Pure (Reentrant) Parser
@cindex reentrant parser
@cindex pure parser
@findex %define api.pure

A @dfn{reentrant} program is one which does not alter in the course of
execution; in other words, it consists entirely of @dfn{pure} (read-only)
code. Reentrancy is important whenever asynchronous execution is possible;
for example, a nonreentrant program may not be safe to call from a signal
handler. In systems with multiple threads of control, a nonreentrant
program must be called only within interlocks.

Normally, Bison generates a parser which is not reentrant. This is
suitable for most uses, and it permits compatibility with Yacc. (The
standard Yacc interfaces are inherently nonreentrant, because they use
statically allocated variables for communication with @code{yylex},
including @code{yylval} and @code{yylloc}.)

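The difference between static and caller-owned state can be seen in a few lines of C. This is a hypothetical sketch, not code that Bison generates: @code{count_tokens_static} keeps its state in a static variable, much as a Yacc-compatible parser communicates through the global @code{yylval}, while @code{count_tokens_pure} works only on state passed in by its caller, which is the approach a pure parser takes.

```c
/* Hypothetical example; not code generated by Bison.  */

/* Nonreentrant: the count lives in static storage shared by every
   caller, much as a Yacc-style parser shares the global yylval.  */
int
count_tokens_static (int token)
{
  static int count = 0;
  if (token != 0)
    count++;
  return count;
}

/* Reentrant: all mutable state belongs to the caller, so independent
   callers (for instance, one per thread) cannot interfere.  */
struct counter
{
  int count;
};

int
count_tokens_pure (struct counter *c, int token)
{
  if (token != 0)
    c->count++;
  return c->count;
}
```

Two @code{struct counter} objects can be used side by side without interfering, whereas every caller of @code{count_tokens_static} shares a single count.
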
Alternatively, you can generate a pure, reentrant parser. The Bison
declaration @code{%define api.pure} says that you want the parser to be
reentrant. It looks like this:

@example
%define api.pure
@end example

The result is that the communication variables @code{yylval} and
@code{yylloc} become local variables in @code{yyparse}, and a different
calling convention is used for the lexical analyzer function
@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
Parsers}, for the details of this. The variable @code{yynerrs}
becomes local in @code{yyparse} in pull mode but it becomes a member
of @code{yypstate} in push mode (@pxref{Error Reporting, ,The Error
Reporting Function @code{yyerror}}). The convention for calling
@code{yyparse} itself is unchanged.

Whether the parser is pure has nothing to do with the grammar rules.
You can generate either a pure parser or a nonreentrant parser from any
valid grammar.

@node Push Decl
@subsection A Push Parser
@cindex push parser
@findex %define api.push-pull

(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)

A pull parser is called once and it takes control until all its input
is completely parsed. A push parser, on the other hand, is called
each time a new token is made available.

A push parser is typically useful when the parser is part of a
main event loop in the client's application. This is often
a requirement of a GUI, where control must return to the main event
loop within a certain time period.

Normally, Bison generates a pull parser.
The following Bison declaration says that you want the parser to be a push
parser (@pxref{%define Summary,,api.push-pull}):

@example
%define api.push-pull push
@end example

In almost all cases, you want to ensure that your push parser is also
a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The only
time you should create an impure push parser is for backward
compatibility with the impure Yacc pull mode interface. Unless you know
what you are doing, your declarations should look like this:

@example
%define api.pure
%define api.push-pull push
@end example

There is one notable functional difference between the pure push parser
and the impure push parser: a pure push parser may have many instances
of the same parser type in memory at the same time, whereas an impure
push parser should have only one instance at a time.

When a push parser is selected, Bison generates some new symbols in
the generated parser. @code{yypstate} is a structure that the generated
parser uses to store the parser's state. @code{yypstate_new} is the
function that creates a new parser instance. @code{yypstate_delete}
frees the resources associated with the corresponding parser instance.
Finally, @code{yypush_parse} is the function that should be called whenever a
token is available, to provide it to the parser. A trivial example
of using a pure push parser would look like this:

@example
int status;
yypstate *ps = yypstate_new ();
do @{
  status = yypush_parse (ps, yylex (), NULL);
@} while (status == YYPUSH_MORE);
yypstate_delete (ps);
@end example

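The driver loop above can be tried out without Bison by substituting hand-written stand-ins. In the following sketch, every name (@code{toy_pstate}, @code{toy_push_parse}, @code{TOY_PUSH_MORE}, and so on) is hypothetical and only mimics the shape of the generated push interface; the ``grammar'' is just balanced parentheses, and the token code 0 marks the end of input, as it does for @code{yylex}.

```c
#include <stdlib.h>

/* Hypothetical stand-ins that only mimic the push interface above.  */
enum { TOY_ACCEPT = 0, TOY_ERROR = 1, TOY_PUSH_MORE = 3 };
enum { TOK_END = 0 };                /* end of input, as with yylex */

struct toy_pstate
{
  int depth;                         /* number of unmatched '(' */
};

struct toy_pstate *
toy_pstate_new (void)
{
  return calloc (1, sizeof (struct toy_pstate));
}

void
toy_pstate_delete (struct toy_pstate *ps)
{
  free (ps);
}

/* Consume one token and report whether more input is needed.  */
int
toy_push_parse (struct toy_pstate *ps, int token)
{
  switch (token)
    {
    case '(':
      ps->depth++;
      return TOY_PUSH_MORE;
    case ')':
      if (ps->depth == 0)
        return TOY_ERROR;
      ps->depth--;
      return TOY_PUSH_MORE;
    case TOK_END:
      return ps->depth == 0 ? TOY_ACCEPT : TOY_ERROR;
    default:
      return TOY_ERROR;
    }
}

/* Hypothetical lexer: return the next character of *INPUT, or TOK_END.  */
int
toy_lex (const char **input)
{
  if (**input == '\0')
    return TOK_END;
  return (unsigned char) *(*input)++;
}

/* The same driver loop as in the example above, using the stand-ins.  */
int
toy_parse_string (const char *text)
{
  int status;
  struct toy_pstate *ps = toy_pstate_new ();
  if (ps == NULL)
    return TOY_ERROR;
  do
    status = toy_push_parse (ps, toy_lex (&text));
  while (status == TOY_PUSH_MORE);
  toy_pstate_delete (ps);
  return status;
}
```

Because each @code{toy_pstate} holds its own depth counter, several instances can be driven at the same time, which mirrors what a pure push parser permits.
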
If you use an impure push parser, a few things about
the generated parser change. The @code{yychar} variable becomes
a global variable instead of a variable in the @code{yypush_parse} function.
For this reason, the signature of the @code{yypush_parse} function is
changed to remove the token as a parameter. A nonreentrant push parser
example would thus look like this:

@example
extern int yychar;
int status;
yypstate *ps = yypstate_new ();
do @{
  yychar = yylex ();
  status = yypush_parse (ps);
@} while (status == YYPUSH_MORE);
yypstate_delete (ps);
@end example

Notice that the next token is stored in the global variable @code{yychar}
for use by the next invocation of the @code{yypush_parse} function.

Bison can also generate a parser that supports both the push parser
interface and the pull parser interface. To get this functionality,
replace the @code{%define api.push-pull push} declaration with the
@code{%define api.push-pull both} declaration. Doing this creates all of
the symbols mentioned earlier along with two extra symbols, @code{yyparse}
and @code{yypull_parse}. @code{yyparse} can be used exactly as it normally
would be used. However, note that it is implemented in the
generated parser by calling @code{yypull_parse}.
This makes the @code{yyparse} function that is generated with the
@code{%define api.push-pull both} declaration slower than the normal
@code{yyparse} function. If you call the @code{yypull_parse} function,
it parses the rest of the input stream. It is possible to
@code{yypush_parse} tokens to select a subgrammar and then
@code{yypull_parse} the rest of the input stream. If you would like
to switch back and forth between parsing styles, you would have to
write your own @code{yypull_parse} function that knows when to quit looking
for input. An example of using the @code{yypull_parse} function would look
like this:

@example
yypstate *ps = yypstate_new ();
yypull_parse (ps); /* Will call the lexer */
yypstate_delete (ps);
@end example

Adding the @code{%define api.pure} declaration does exactly the same thing to
the generated parser with @code{%define api.push-pull both} as it did for
@code{%define api.push-pull push}.

@node Decl Summary
@subsection Bison Declaration Summary
@cindex Bison declaration summary
@cindex declaration summary
@cindex summary, Bison declaration

Here is a summary of the declarations used to define a grammar:

@deffn {Directive} %union
Declare the collection of data types that semantic values may have
(@pxref{Union Decl, ,The Collection of Value Types}).
@end deffn

@deffn {Directive} %token
Declare a terminal symbol (token type name) with no precedence
or associativity specified (@pxref{Token Decl, ,Token Type Names}).
@end deffn

@deffn {Directive} %right
Declare a terminal symbol (token type name) that is right-associative
(@pxref{Precedence Decl, ,Operator Precedence}).
@end deffn

@deffn {Directive} %left
Declare a terminal symbol (token type name) that is left-associative
(@pxref{Precedence Decl, ,Operator Precedence}).
@end deffn

@deffn {Directive} %nonassoc
Declare a terminal symbol (token type name) that is nonassociative
(@pxref{Precedence Decl, ,Operator Precedence}).
Using it in a way that would be associative is a syntax error.
@end deffn

@ifset defaultprec
@deffn {Directive} %default-prec
Assign a precedence to rules lacking an explicit @code{%prec} modifier
(@pxref{Contextual Precedence, ,Context-Dependent Precedence}).
@end deffn
@end ifset

@deffn {Directive} %type
Declare the type of semantic values for a nonterminal symbol
(@pxref{Type Decl, ,Nonterminal Symbols}).
@end deffn

@deffn {Directive} %start
Specify the grammar's start symbol (@pxref{Start Decl, ,The
Start-Symbol}).
@end deffn

@deffn {Directive} %expect
Declare the expected number of shift/reduce conflicts
(@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
@end deffn

@sp 1
@noindent
In order to change the behavior of @command{bison}, use the following
directives:

@deffn {Directive} %code @{@var{code}@}
@deffnx {Directive} %code @var{qualifier} @{@var{code}@}
@findex %code
Insert @var{code} verbatim into the output parser source at the
default location or at the location specified by @var{qualifier}.
@xref{%code Summary}.
@end deffn

@deffn {Directive} %debug
In the parser implementation file, define the macro @code{YYDEBUG} to
1 if it is not already defined, so that the debugging facilities are
compiled. @xref{Tracing, ,Tracing Your Parser}.
@end deffn

@deffn {Directive} %define @var{variable}
@deffnx {Directive} %define @var{variable} @var{value}
@deffnx {Directive} %define @var{variable} "@var{value}"
Define a variable to adjust Bison's behavior. @xref{%define Summary}.
@end deffn

@deffn {Directive} %defines
Write a parser header file containing macro definitions for the token
type names defined in the grammar as well as a few other declarations.
If the parser implementation file is named @file{@var{name}.c}, then
the parser header file is named @file{@var{name}.h}.

For C parsers, the parser header file declares @code{YYSTYPE} unless
@code{YYSTYPE} is already defined as a macro or you have used a
@code{<@var{type}>} tag without using @code{%union}. Therefore, if
you are using a @code{%union} (@pxref{Multiple Types, ,More Than One
Value Type}) with components that require other definitions, or if you
have defined a @code{YYSTYPE} macro or type definition (@pxref{Value
Type, ,Data Types of Semantic Values}), you need to arrange for these
definitions to be propagated to all modules, e.g., by putting them in
a prerequisite header that is included both by your parser and by any
other module that needs @code{YYSTYPE}.

Unless your parser is pure, the parser header file declares
@code{yylval} as an external variable. @xref{Pure Decl, ,A Pure
(Reentrant) Parser}.

If you have also used locations, the parser header file declares
@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the
@code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}.

This parser header file is normally essential if you wish to put the
definition of @code{yylex} in a separate source file, because
@code{yylex} typically needs to be able to refer to the
above-mentioned declarations and to the token type codes. @xref{Token
Values, ,Semantic Values of Tokens}.

@findex %code requires
@findex %code provides
If you have declared @code{%code requires} or @code{%code provides}, the output
header also contains their code.
@xref{%code Summary}.
@end deffn

@deffn {Directive} %defines @var{defines-file}
Same as above, but save in the file @var{defines-file}.
@end deffn

@deffn {Directive} %destructor
Specify how the parser should reclaim the memory associated with
discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn

@deffn {Directive} %file-prefix "@var{prefix}"
Specify a prefix to use for all Bison output file names. The names
are chosen as if the grammar file were named @file{@var{prefix}.y}.
@end deffn

@deffn {Directive} %language "@var{language}"
Specify the programming language for the generated parser. Currently
supported languages include C, C++, and Java.
@var{language} is case-insensitive.

This directive is experimental and its effect may be modified in future
releases.
@end deffn

@deffn {Directive} %locations
Generate the code processing the locations (@pxref{Action Features,
,Special Features for Use in Actions}). This mode is enabled as soon as
the grammar uses the special @samp{@@@var{n}} tokens, but if your
grammar does not use them, using @samp{%locations} allows for more
accurate syntax error messages.
@end deffn

@deffn {Directive} %name-prefix "@var{prefix}"
Rename the external symbols used in the parser so that they start with
@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
in C parsers is @code{yyparse}, @code{yylex}, @code{yyerror},
@code{yynerrs}, @code{yylval}, @code{yychar}, @code{yydebug}, and
(if locations are used) @code{yylloc}. If you use a push parser,
@code{yypush_parse}, @code{yypull_parse}, @code{yypstate},
@code{yypstate_new} and @code{yypstate_delete} will
also be renamed. For example, if you use @samp{%name-prefix "c_"}, the
names become @code{c_parse}, @code{c_lex}, and so on.
For C++ parsers, see the documentation of @code{%define namespace}
(@pxref{%define Summary,,namespace}).
@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
@end deffn

@ifset defaultprec
@deffn {Directive} %no-default-prec
Do not assign a precedence to rules lacking an explicit @code{%prec}
modifier (@pxref{Contextual Precedence, ,Context-Dependent
Precedence}).
@end deffn
@end ifset

@deffn {Directive} %no-lines
Don't generate any @code{#line} preprocessor commands in the parser
implementation file. Ordinarily Bison writes these commands in the
parser implementation file so that the C compiler and debuggers will
associate errors and object code with your source file (the grammar
file). This directive causes them to associate errors with the parser
implementation file, treating it as an independent source file in its
own right.
@end deffn

@deffn {Directive} %output "@var{file}"
Specify @var{file} for the parser implementation file.
@end deffn

@deffn {Directive} %pure-parser
Deprecated version of @code{%define api.pure} (@pxref{%define
Summary,,api.pure}), for which Bison is more careful to warn about
unreasonable usage.
@end deffn

@deffn {Directive} %require "@var{version}"
Require version @var{version} or higher of Bison. @xref{Require Decl, ,
Require a Version of Bison}.
@end deffn

@deffn {Directive} %skeleton "@var{file}"
Specify the skeleton to use.

@c You probably don't need this option unless you are developing Bison.
@c You should use @code{%language} if you want to specify the skeleton for a
@c different language, because it is clearer and because it will always choose the
@c correct skeleton for non-deterministic or push parsers.

If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
file in the Bison installation directory.
If it does, @var{file} is an absolute file name or a file name relative to the
directory of the grammar file.
This is similar to how most shells resolve commands.
@end deffn

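The lookup rule for @code{%skeleton} can be summarized by the following sketch. The function @code{resolve_skeleton} is hypothetical, and its @code{install_dir} and @code{grammar_dir} arguments are stand-ins for Bison's installation directory and the grammar file's directory; it only mirrors the rule stated above and is not how Bison itself is implemented.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the %skeleton "FILE" lookup rule.  Writes the
   resolved name into OUT; returns 0, or -1 if OUT is too small.  */
int
resolve_skeleton (const char *file, const char *install_dir,
                  const char *grammar_dir, char *out, size_t size)
{
  int n;
  if (strchr (file, '/') == NULL)
    /* No slash: a skeleton shipped in the installation directory.  */
    n = snprintf (out, size, "%s/%s", install_dir, file);
  else if (file[0] == '/')
    /* Absolute file name: used as is.  */
    n = snprintf (out, size, "%s", file);
  else
    /* Relative file name: resolved against the grammar's directory.  */
    n = snprintf (out, size, "%s/%s", grammar_dir, file);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

As with a shell's command lookup, a bare name is searched for in a fixed location, while any name containing a slash is taken as a path.
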
@deffn {Directive} %token-table
Generate an array of token names in the parser implementation file.
The name of the array is @code{yytname}; @code{yytname[@var{i}]} is
the name of the token whose internal Bison token code number is
@var{i}. The first three elements of @code{yytname} correspond to the
predefined tokens @code{"$end"}, @code{"error"}, and
@code{"$undefined"}; after these come the symbols defined in the
grammar file.

The name in the table includes all the characters needed to represent
the token in Bison. For single-character literals and literal
strings, this includes the surrounding quoting characters and any
escape sequences. For example, the Bison single-character literal
@code{'+'} corresponds to a three-character name, represented in C as
@code{"'+'"}; and the Bison two-character literal string @code{"\\/"}
corresponds to a five-character name, represented in C as
@code{"\"\\\\/\""}.

When you specify @code{%token-table}, Bison also generates definitions
for the macros @code{YYNTOKENS}, @code{YYNNTS}, @code{YYNRULES}, and
@code{YYNSTATES}:

@table @code
@item YYNTOKENS
The highest token number, plus one.
@item YYNNTS
The number of nonterminal symbols.
@item YYNRULES
The number of grammar rules.
@item YYNSTATES
The number of parser states (@pxref{Parser States}).
@end table
@end deffn

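Because the quoting characters and escapes are part of each name, code that searches @code{yytname} must compare against the quoted form. The following sketch uses a hypothetical table @code{demo_tname} laid out like @code{yytname} (the three predefined entries followed by grammar symbols); the function @code{find_string_token} is likewise hypothetical and does not decode escape sequences.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table laid out like yytname: the three predefined
   tokens first, then symbols as written in a grammar file.  */
static const char *const demo_tname[] = {
  "$end", "error", "$undefined",
  "NUM",            /* a named token                         */
  "'+'",            /* Bison literal '+': three characters    */
  "\"\\\\/\"",      /* Bison literal "\\/": five characters   */
  "\"while\""       /* Bison literal "while"                  */
};
#define DEMO_NTOKENS (sizeof demo_tname / sizeof demo_tname[0])

/* Return the index of the double-quoted entry whose text between the
   quotes is exactly WORD, or -1.  Escapes are compared as written,
   not decoded.  */
int
find_string_token (const char *word)
{
  size_t len = strlen (word);
  for (size_t i = 0; i < DEMO_NTOKENS; i++)
    {
      const char *name = demo_tname[i];
      if (name[0] == '"'
          && strncmp (name + 1, word, len) == 0
          && name[len + 1] == '"'
          && name[len + 2] == '\0')
        return (int) i;
    }
  return -1;
}
```

For instance, @code{find_string_token ("while")} returns 6, but the literal string @code{"\\/"} is found only by searching for its escaped spelling, written in C as @code{"\\\\/"}.
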
@deffn {Directive} %verbose
Write an extra output file containing verbose descriptions of the
parser states and what is done for each type of lookahead token in
that state. @xref{Understanding, , Understanding Your Parser}, for more
information.
@end deffn

@deffn {Directive} %yacc
Pretend the option @option{--yacc} was given, i.e., imitate Yacc,
including its naming conventions. @xref{Bison Options}, for more.
@end deffn

@node %define Summary
@subsection %define Summary

There are many features of Bison's behavior that can be controlled by
assigning the feature a single value. For historical reasons, some
such features are assigned values by dedicated directives, such as
@code{%start}, which assigns the start symbol. However, newer such
features are associated with variables, which are assigned by the
@code{%define} directive:

@deffn {Directive} %define @var{variable}
@deffnx {Directive} %define @var{variable} @var{value}
@deffnx {Directive} %define @var{variable} "@var{value}"
Define @var{variable} to @var{value}.

@var{value} must be placed in quotation marks if it contains any
character other than a letter, underscore, period, or non-initial dash
or digit. Omitting @code{"@var{value}"} entirely is always equivalent
to specifying @code{""}.

It is an error if a @var{variable} is defined by @code{%define}
multiple times, but see @ref{Bison Options,,-D
@var{name}[=@var{value}]}.
@end deffn

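The quoting rule for @var{value} can be expressed as a small predicate. This is a hypothetical helper written for illustration, not part of Bison; it assumes ASCII letters and digits as classified by @code{isalpha} and @code{isdigit} in the C locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper, not part of Bison: return true if VALUE must be
   quoted in "%define variable value".  Letters, underscores, and
   periods are allowed anywhere; dashes and digits only after the
   first character.  */
bool
define_value_needs_quotes (const char *value)
{
  for (const char *p = value; *p != '\0'; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (isalpha (c) || c == '_' || c == '.')
        continue;
      if ((c == '-' || isdigit (c)) && p != value)
        continue;
      return true;
    }
  return false;
}
```

For example, @code{canonical-lr} can be written without quotation marks, whereas @code{foo::bar}, which contains colons, must be quoted.
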
The rest of this section summarizes variables and values that
@code{%define} accepts.

Some @var{variable}s take Boolean values. In this case, Bison will
complain if the variable definition does not meet one of the following
four conditions:

@enumerate
@item @code{@var{value}} is @code{true}.

@item @code{@var{value}} is omitted (or @code{""} is specified).
This is equivalent to @code{true}.

@item @code{@var{value}} is @code{false}.

@item @var{variable} is never defined.
In this case, Bison selects a default value.
@end enumerate

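The four cases can be modeled as follows. This is a hypothetical sketch, not Bison's implementation: a @code{NULL} argument stands for a variable that is never defined, an omitted value is represented by the empty string, and any other value outside the listed ones is classified as invalid, which is where Bison would complain.

```c
#include <string.h>

/* Hypothetical classification of a Boolean %define value; not Bison's
   implementation.  VALUE is NULL when the variable is never defined.  */
enum bool_define { BD_TRUE, BD_FALSE, BD_DEFAULT, BD_INVALID };

enum bool_define
classify_boolean_define (const char *value)
{
  if (value == NULL)
    return BD_DEFAULT;      /* never defined: Bison selects a default */
  if (strcmp (value, "") == 0 || strcmp (value, "true") == 0)
    return BD_TRUE;         /* an omitted value is equivalent to true */
  if (strcmp (value, "false") == 0)
    return BD_FALSE;
  return BD_INVALID;        /* Bison would complain about this value */
}
```
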
What @var{variable}s are accepted, as well as their meanings and default
values, depend on the selected target language and/or the parser
skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl
Summary,,%skeleton}).
Unaccepted @var{variable}s produce an error.
Some of the accepted @var{variable}s are:

@itemize @bullet
@item api.pure
@findex %define api.pure

@itemize @bullet
@item Language(s): C

@item Purpose: Request a pure (reentrant) parser program.
@xref{Pure Decl, ,A Pure (Reentrant) Parser}.

@item Accepted Values: Boolean

@item Default Value: @code{false}
@end itemize

@item api.push-pull
@findex %define api.push-pull

@itemize @bullet
@item Language(s): C (deterministic parsers only)

@item Purpose: Request a pull parser, a push parser, or both.
@xref{Push Decl, ,A Push Parser}.
(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)

@item Accepted Values: @code{pull}, @code{push}, @code{both}

@item Default Value: @code{pull}
@end itemize

@c ================================================== lr.default-reductions

@item lr.default-reductions
@findex %define lr.default-reductions

@itemize @bullet
@item Language(s): all

@item Purpose: Specify the kind of states that are permitted to
contain default reductions. @xref{Default Reductions}. (The ability to
specify where default reductions should be used is experimental. More user
feedback will help to stabilize it.)

@item Accepted Values: @code{most}, @code{consistent}, @code{accepting}

@item Default Value:
@itemize
@item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
@item @code{most} otherwise.
@end itemize
@end itemize

@c ============================================ lr.keep-unreachable-states

@item lr.keep-unreachable-states
@findex %define lr.keep-unreachable-states

@itemize @bullet
@item Language(s): all

@item Purpose: Request that Bison allow unreachable parser states to
remain in the parser tables. @xref{Unreachable States}.

@item Accepted Values: Boolean

@item Default Value: @code{false}
@end itemize

@c ================================================== lr.type

@item lr.type
@findex %define lr.type

@itemize @bullet
@item Language(s): all

@item Purpose: Specify the type of parser tables within the
LR(1) family. @xref{LR Table Construction}. (This feature is experimental.
More user feedback will help to stabilize it.)

@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}

@item Default Value: @code{lalr}
@end itemize

@item namespace
@findex %define namespace

@itemize @bullet
@item Language(s): C++

@item Purpose: Specify the namespace for the parser class.
For example, if you specify:

@smallexample
%define namespace "foo::bar"
@end smallexample

Bison uses @code{foo::bar} verbatim in references such as:

@smallexample
foo::bar::parser::semantic_type
@end smallexample

However, to open a namespace, Bison removes any leading @code{::} and then
splits on any remaining occurrences:

@smallexample
namespace foo @{ namespace bar @{
  class position;
  class location;
@} @}
@end smallexample

@item Accepted Values: Any absolute or relative C++ namespace reference without
a trailing @code{"::"}.
For example, @code{"foo"} or @code{"::foo::bar"}.

@item Default Value: The value specified by @code{%name-prefix}, which defaults
to @code{yy}.
This usage of @code{%name-prefix} is for backward compatibility and can be
confusing, since @code{%name-prefix} also specifies the textual prefix for the
lexical analyzer function.
Thus, if you specify @code{%name-prefix}, it is best to also specify
@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the
lexical analyzer function.
For example, if you specify:

@smallexample
%define namespace "foo"
%name-prefix "bar::"
@end smallexample

The parser namespace is @code{foo} and @code{yylex} is referenced as
@code{bar::lex}.
@end itemize

@c ================================================== parse.lac
@item parse.lac
@findex %define parse.lac

@itemize @bullet
@item Language(s): C (deterministic parsers only)

@item Purpose: Enable LAC (lookahead correction) to improve
syntax error handling. @xref{LAC}.

@item Accepted Values: @code{none}, @code{full}

@item Default Value: @code{none}
@end itemize
@end itemize

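The namespace-opening rule described for the @code{namespace} variable (drop any leading @code{::}, then split on the remaining occurrences) can be sketched in C. The function @code{open_namespaces} is hypothetical and produces only the opening text; the closing braces and the declarations inside the namespaces are omitted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Bison: write into OUT the namespace
   openings for the %define namespace value NS.  Returns the number of
   namespaces opened, or -1 if OUT is too small.  */
int
open_namespaces (const char *ns, char *out, size_t size)
{
  size_t used = 0;
  int count = 0;
  if (size == 0)
    return -1;
  out[0] = '\0';
  if (strncmp (ns, "::", 2) == 0)
    ns += 2;                          /* drop a leading "::" */
  while (*ns != '\0')
    {
      const char *sep = strstr (ns, "::");
      size_t len = sep ? (size_t) (sep - ns) : strlen (ns);
      int n = snprintf (out + used, size - used, "%snamespace %.*s {",
                        count ? " " : "", (int) len, ns);
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
      count++;
      ns += len;
      if (sep)
        ns += 2;                      /* skip the "::" separator */
    }
  return count;
}
```

Given @code{"::foo::bar"}, it writes @code{namespace foo @{ namespace bar @{} and returns 2, matching the opening shown in the @code{namespace} entry.
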
@node %code Summary
@subsection %code Summary
@findex %code
@cindex Prologue

The @code{%code} directive inserts code verbatim into the output
parser source at any of a predefined set of locations. It thus serves
as a flexible and user-friendly alternative to the traditional Yacc
prologue, @code{%@{@var{code}%@}}. This section summarizes the
functionality of @code{%code} for the various target languages
supported by Bison. For a detailed discussion of how to use
@code{%code} in place of @code{%@{@var{code}%@}} for C/C++ and why it
is advantageous to do so, see @ref{Prologue Alternatives}.

@deffn {Directive} %code @{@var{code}@}
This is the unqualified form of the @code{%code} directive. It
inserts @var{code} verbatim at a language-dependent default location
in the parser implementation.

For C/C++, the default location is the parser implementation file
after the usual contents of the parser header file. Thus, the
unqualified form replaces @code{%@{@var{code}%@}} for most purposes.

For Java, the default location is inside the parser class.
@end deffn

@deffn {Directive} %code @var{qualifier} @{@var{code}@}
This is the qualified form of the @code{%code} directive.
@var{qualifier} identifies the purpose of @var{code} and thus the
location(s) where Bison should insert it. That is, if you need to
specify location-sensitive @var{code} that does not belong at the
default location selected by the unqualified @code{%code} form, use
this form instead.
@end deffn

For any particular qualifier or for the unqualified form, if there are
multiple occurrences of the @code{%code} directive, Bison concatenates
the specified code in the order in which it appears in the grammar
file.

Not all qualifiers are accepted for all target languages. Unaccepted
qualifiers produce an error. Some of the accepted qualifiers are:

@itemize @bullet
@item requires
@findex %code requires

@itemize @bullet
@item Language(s): C, C++

@item Purpose: This is the best place to write dependency code required for
@code{YYSTYPE} and @code{YYLTYPE}.
In other words, it is the best place to define types referenced in
@code{%union} directives and to override Bison's default @code{YYSTYPE}
and @code{YYLTYPE} definitions.

@item Location(s): The parser header file and the parser implementation file
before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE}
definitions.
@end itemize

@item provides
@findex %code provides

@itemize @bullet
@item Language(s): C, C++

@item Purpose: This is the best place to write additional definitions and
declarations that should be provided to other modules.

@item Location(s): The parser header file and the parser implementation
file after the Bison-generated @code{YYSTYPE}, @code{YYLTYPE}, and
token definitions.
@end itemize

@item top
@findex %code top

@itemize @bullet
@item Language(s): C, C++

@item Purpose: The unqualified @code{%code} or @code{%code requires}
should usually be more appropriate than @code{%code top}. However,
occasionally it is necessary to insert code much nearer the top of the
parser implementation file. For example:

@smallexample
%code top @{
  #define _GNU_SOURCE
  #include <stdio.h>
@}
@end smallexample

@item Location(s): Near the top of the parser implementation file.
@end itemize

@item imports
@findex %code imports

@itemize @bullet
@item Language(s): Java

@item Purpose: This is the best place to write Java import directives.

@item Location(s): The parser Java file after any Java package directive and
before any class definitions.
@end itemize
@end itemize

406dec82
JD
5440Though we say the insertion locations are language-dependent, they are
5441technically skeleton-dependent. Writers of non-standard skeletons
5442however should choose their locations consistently with the behavior
5443of the standard Bison skeletons.
8e6f2266 5444
d8988b2f 5445
342b8b6e 5446@node Multiple Parsers
bfa74976
RS
5447@section Multiple Parsers in the Same Program
5448
5449Most programs that use Bison parse only one language and therefore contain
5450only one Bison parser. But what if you want to parse more than one
5451language with the same program? Then you need to avoid a name conflict
5452between different definitions of @code{yyparse}, @code{yylval}, and so on.
5453
5454The easy way to do this is to use the option @samp{-p @var{prefix}}
704a47c4
AD
5455(@pxref{Invocation, ,Invoking Bison}). This renames the interface
5456functions and variables of the Bison parser to start with @var{prefix}
5457instead of @samp{yy}. You can use this to give each parser distinct
5458names that do not conflict.
bfa74976
RS
5459
5460The precise list of symbols renamed is @code{yyparse}, @code{yylex},
2a8d363a 5461@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yylloc},
f4101aa6
AD
5462@code{yychar} and @code{yydebug}. If you use a push parser,
5463@code{yypush_parse}, @code{yypull_parse}, @code{yypstate},
9987d1b3 5464@code{yypstate_new} and @code{yypstate_delete} will also be renamed.
f4101aa6 5465For example, if you use @samp{-p c}, the names become @code{cparse},
9987d1b3 5466@code{clex}, and so on.
bfa74976
RS
5467
5468@strong{All the other variables and macros associated with Bison are not
5469renamed.} These others are not global; there is no conflict if the same
5470name is used in different parsers. For example, @code{YYSTYPE} is not
5471renamed, but defining this in different ways in different parsers causes
5472no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
5473
9913d6e4
JD
5474The @samp{-p} option works by adding macro definitions to the
5475beginning of the parser implementation file, defining @code{yyparse}
5476as @code{@var{prefix}parse}, and so on. This effectively substitutes
5477one name for the other in the entire parser implementation file.
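
To illustrate this mechanism, the following sketch imitates by hand
what the @samp{-p} option does for two parsers. The prefixes @samp{c}
and @samp{d} and the trivial parser bodies are hypothetical, and the
exact macro definitions Bison emits may differ. In a real program
each parser would live in its own implementation file. This sketch
puts both in a single file, which is why it needs @code{#undef}.

@example
/* Roughly what "-p c" adds to one parser implementation file.  */
#define yyparse cparse
int
yyparse (void)            /* Actually defines cparse.  */
@{
  return 0;               /* Hypothetical body.  */
@}
#undef yyparse

/* Roughly what "-p d" adds to another one.  */
#define yyparse dparse
int
yyparse (void)            /* Actually defines dparse.  */
@{
  return 1;               /* Hypothetical body.  */
@}
#undef yyparse

/* Both parsers coexist in one program without a name clash.  */
int
run_both_parsers (void)
@{
  return cparse () + dparse ();
@}
@end example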

@node Interface
@chapter Parser C-Language Interface
@cindex C-language interface
@cindex interface

The Bison parser is actually a C function named @code{yyparse}. Here we
describe the interface conventions of @code{yyparse} and the other
functions that it needs to use.

Keep in mind that the parser uses many C identifiers starting with
@samp{yy} and @samp{YY} for internal purposes. If you use such an
identifier (aside from those in this manual) in an action or in the
epilogue of the grammar file, you are likely to run into trouble.

@menu
* Parser Function::         How to call @code{yyparse} and what it returns.
* Push Parser Function::    How to call @code{yypush_parse} and what it returns.
* Pull Parser Function::    How to call @code{yypull_parse} and what it returns.
* Parser Create Function::  How to call @code{yypstate_new} and what it returns.
* Parser Delete Function::  How to call @code{yypstate_delete}.
* Lexical::                 You must supply a function @code{yylex}
                              which reads tokens.
* Error Reporting::         You must supply a function @code{yyerror}.
* Action Features::         Special features for use in actions.
* Internationalization::    How to let the parser speak in the user's
                              native language.
@end menu

@node Parser Function
@section The Parser Function @code{yyparse}
@findex yyparse

You call the function @code{yyparse} to cause parsing to occur. This
function reads tokens, executes actions, and ultimately returns when it
encounters end-of-input or an unrecoverable syntax error. You can also
write an action which directs @code{yyparse} to return immediately
without reading further.

@deftypefun int yyparse (void)
The value returned by @code{yyparse} is 0 if parsing was successful (return
is due to end-of-input).

The value is 1 if parsing failed because of invalid input, i.e., input
that contains a syntax error or that causes @code{YYABORT} to be
invoked.

The value is 2 if parsing failed due to memory exhaustion.
@end deftypefun

In an action, you can cause immediate return from @code{yyparse} by using
these macros:

@defmac YYACCEPT
@findex YYACCEPT
Return immediately with value 0 (to report success).
@end defmac

@defmac YYABORT
@findex YYABORT
Return immediately with value 1 (to report failure).
@end defmac
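
A caller typically inspects the value returned by @code{yyparse}. The
following sketch maps the three documented return values to short
descriptions. The function name @code{parse_status_message} and the
wording of the messages are hypothetical.

@example
/* Describe a value returned by yyparse.  */
char const *
parse_status_message (int status)
@{
  switch (status)
    @{
    case 0:
      return "success";          /* End-of-input, or YYACCEPT.  */
    case 1:
      return "invalid input";    /* Syntax error, or YYABORT.  */
    case 2:
      return "memory exhausted";
    default:
      return "unknown status";
    @}
@}
@end example

@noindent
For instance, a program could print this description whenever
@code{yyparse} returns a nonzero value.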

If you use a reentrant parser, you can optionally pass additional
parameter information to it in a reentrant way. To do so, use the
declaration @code{%parse-param}:

@deffn {Directive} %parse-param @{@var{argument-declaration}@}
@findex %parse-param
Declare that an argument declared by the braced-code
@var{argument-declaration} is an additional @code{yyparse} argument.
The @var{argument-declaration} is used when declaring
functions or prototypes. The last identifier in
@var{argument-declaration} must be the argument name.
@end deffn

Here's an example. Write this in the parser:

@example
%parse-param @{int *nastiness@}
%parse-param @{int *randomness@}
@end example

@noindent
Then call the parser like this:

@example
@{
  int nastiness, randomness;
  @dots{}  /* @r{Store proper data in @code{nastiness} and @code{randomness}.}  */
  value = yyparse (&nastiness, &randomness);
  @dots{}
@}
@end example

@noindent
In the grammar actions, use expressions like this to refer to the data:

@example
exp: @dots{}    @{ @dots{}; *randomness += 1; @dots{} @}
@end example

@node Push Parser Function
@section The Push Parser Function @code{yypush_parse}
@findex yypush_parse

(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)

You call the function @code{yypush_parse} to parse a single token. This
function is available if either the @code{%define api.push-pull push} or
@code{%define api.push-pull both} declaration is used.
@xref{Push Decl, ,A Push Parser}.

@deftypefun int yypush_parse (yypstate *yyps)
The value returned by @code{yypush_parse} is the same as for @code{yyparse},
with one exception: @code{yypush_parse} returns @code{YYPUSH_MORE} if more
input is required to finish parsing the grammar.
@end deftypefun

@node Pull Parser Function
@section The Pull Parser Function @code{yypull_parse}
@findex yypull_parse

(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)

You call the function @code{yypull_parse} to parse the rest of the input
stream. This function is available if the @code{%define api.push-pull both}
declaration is used.
@xref{Push Decl, ,A Push Parser}.

@deftypefun int yypull_parse (yypstate *yyps)
The value returned by @code{yypull_parse} is the same as for @code{yyparse}.
@end deftypefun

@node Parser Create Function
@section The Parser Create Function @code{yypstate_new}
@findex yypstate_new

(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)

You call the function @code{yypstate_new} to create a new parser instance.
This function is available if either the @code{%define api.push-pull push} or
@code{%define api.push-pull both} declaration is used.
@xref{Push Decl, ,A Push Parser}.

@deftypefun yypstate *yypstate_new (void)
The function returns a valid parser instance if memory is available,
or 0 if no memory is available.
In impure mode, it also returns 0 if a parser instance is currently
allocated.
@end deftypefun

@node Parser Delete Function
@section The Parser Delete Function @code{yypstate_delete}
@findex yypstate_delete

(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)

You call the function @code{yypstate_delete} to delete a parser instance.
This function is available if either the @code{%define api.push-pull push} or
@code{%define api.push-pull both} declaration is used.
@xref{Push Decl, ,A Push Parser}.

@deftypefun void yypstate_delete (yypstate *yyps)
This function reclaims the memory associated with a parser instance.
After this call, you should no longer attempt to use the parser instance.
@end deftypefun
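
The create, push, and delete functions are normally used together:
create an instance, push tokens until the result is no longer
@code{YYPUSH_MORE}, then delete the instance. Because the real functions
are generated by Bison, the following self-contained sketch substitutes
a hypothetical toy push parser that only checks that parentheses
balance. All of its names (@code{toy_pstate}, @code{toy_push_parse},
@code{TOY_PUSH_MORE}, and so on) are invented for this illustration.
The sketch also passes each token directly as an argument. It mirrors
only the calling protocol, not Bison's generated code or signatures.

@example
#include <stdlib.h>

#define TOY_PUSH_MORE 4   /* Hypothetical stand-in for YYPUSH_MORE.  */

/* Hypothetical parser instance: just a nesting depth.  */
typedef struct toy_pstate
@{
  int depth;
@} toy_pstate;

/* Like yypstate_new: a new instance, or 0 if out of memory.  */
toy_pstate *
toy_pstate_new (void)
@{
  toy_pstate *ps = malloc (sizeof *ps);
  if (ps)
    ps->depth = 0;
  return ps;
@}

/* Like yypstate_delete: reclaim the instance's memory.  */
void
toy_pstate_delete (toy_pstate *ps)
@{
  free (ps);
@}

/* Like yypush_parse: consume one token; 0 means end-of-input.  */
int
toy_push_parse (toy_pstate *ps, int token)
@{
  if (token == '(')
    ps->depth++;
  else if (token == ')' && --ps->depth < 0)
    return 1;                        /* Unbalanced: a syntax error.  */
  else if (token == 0)
    return ps->depth == 0 ? 0 : 1;   /* End-of-input reached.  */
  return TOY_PUSH_MORE;              /* More input is needed.  */
@}

/* Drive the parser one token at a time.  TOKENS must end with 0.  */
int
parse_token_array (int const *tokens)
@{
  int status;
  toy_pstate *ps = toy_pstate_new ();
  if (!ps)
    return 2;                        /* Memory exhausted.  */
  do
    status = toy_push_parse (ps, *tokens++);
  while (status == TOY_PUSH_MORE);
  toy_pstate_delete (ps);
  return status;
@}
@end example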

@node Lexical
@section The Lexical Analyzer Function @code{yylex}
@findex yylex
@cindex lexical analyzer

The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
the input stream and returns them to the parser. Bison does not create
this function automatically; you must write it so that @code{yyparse} can
call it. The function is sometimes referred to as a lexical scanner.

In simple programs, @code{yylex} is often defined at the end of the
Bison grammar file. If @code{yylex} is defined in a separate source
file, you need to arrange for the token-type macro definitions to be
available there. To do this, use the @samp{-d} option when you run
Bison, so that it will write these macro definitions into the separate
parser header file, @file{@var{name}.tab.h}, which you can include in
the other source files that need it. @xref{Invocation, ,Invoking
Bison}.

@menu
* Calling Convention::  How @code{yyparse} calls @code{yylex}.
* Token Values::        How @code{yylex} must return the semantic value
                          of the token it has read.
* Token Locations::     How @code{yylex} must return the text location
                          (line number, etc.) of the token, if the
                          actions want that.
* Pure Calling::        How the calling convention differs in a pure parser
                          (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
@end menu

@node Calling Convention
@subsection Calling Convention for @code{yylex}

The value that @code{yylex} returns must be the positive numeric code
for the type of token it has just found; a zero or negative value
signifies end-of-input.

When a token is referred to in the grammar rules by a name, that name
in the parser implementation file becomes a C macro whose definition
is the proper numeric code for that token type. So @code{yylex} can
use the name to indicate that type. @xref{Symbols}.

When a token is referred to in the grammar rules by a character literal,
the numeric code for that character is also the code for the token type.
So @code{yylex} can simply return that character code, possibly converted
to @code{unsigned char} to avoid sign-extension. The null character
must not be used this way, because its code is zero and that
signifies end-of-input.

Here is an example showing these things:

@example
int
yylex (void)
@{
  @dots{}
  if (c == EOF)    /* Detect end-of-input.  */
    return 0;
  @dots{}
  if (c == '+' || c == '-')
    return c;      /* Assume token type for `+' is '+'.  */
  @dots{}
  return INT;      /* Return the type of the token.  */
  @dots{}
@}
@end example

@noindent
This interface has been designed so that the output from the @code{lex}
utility can be used without change as the definition of @code{yylex}.
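
The following sketch fills in the example above so that it runs on its
own. It reads from a string rather than a file. The input variable
@code{scan_input}, the token code @code{INT} and its value 258, and the
global @code{int yylval} are assumptions for illustration. In a real
program, @code{INT} and @code{yylval} come from the Bison-generated
parser.

@example
#include <ctype.h>

#define INT 258                 /* Hypothetical token code.  */

int yylval;                     /* Semantic value; int is the default.  */
char const *scan_input = "";    /* Hypothetical input source.  */

int
yylex (void)
@{
  /* Skip white space.  */
  while (*scan_input == ' ' || *scan_input == '\t')
    scan_input++;

  if (*scan_input == '\0')      /* Detect end-of-input.  */
    return 0;

  if (isdigit ((unsigned char) *scan_input))
    @{
      int value = 0;
      while (isdigit ((unsigned char) *scan_input))
        value = value * 10 + (*scan_input++ - '0');
      yylval = value;           /* Put value onto Bison stack.  */
      return INT;               /* Return the type of the token.  */
    @}

  /* A character token such as '+' is its own type code.  */
  return (unsigned char) *scan_input++;
@}
@end example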

If the grammar uses literal string tokens, there are two ways that
@code{yylex} can determine the token type codes for them:

@itemize @bullet
@item
If the grammar defines symbolic token names as aliases for the
literal string tokens, @code{yylex} can use these symbolic names like
all others. In this case, the use of the literal string tokens in
the grammar file has no effect on @code{yylex}.

@item
@code{yylex} can find the multicharacter token in the @code{yytname}
table. The index of the token in the table is the token type's code.
The name of a multicharacter token is recorded in @code{yytname} with a
double-quote, the token's characters, and another double-quote. The
token's characters are escaped as necessary to be suitable as input
to Bison.

Here's code for looking up a multicharacter token in @code{yytname},
assuming that the characters of the token are stored in
@code{token_buffer}, and assuming that the token does not contain any
characters like @samp{"} that require escaping.

@smallexample
for (i = 0; i < YYNTOKENS; i++)
  @{
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && ! strncmp (yytname[i] + 1, token_buffer,
                      strlen (token_buffer))
        && yytname[i][strlen (token_buffer) + 1] == '"'
        && yytname[i][strlen (token_buffer) + 2] == 0)
      break;
  @}
@end smallexample

The @code{yytname} table is generated only if you use the
@code{%token-table} declaration. @xref{Decl Summary}.
@end itemize
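
The lookup loop above can be wrapped in a reusable function. In this
sketch, the contents of @code{yytname} and the value of
@code{YYNTOKENS} are hypothetical stand-ins for what
@code{%token-table} generates, and @code{find_string_token} is an
invented name. The function computes the token length once instead of
calling @code{strlen} repeatedly.

@example
#include <string.h>

/* Hypothetical stand-ins for the table generated by %token-table.  */
#define YYNTOKENS 5
static char const *const yytname[] =
@{
  "$end", "error", "$undefined", "NUM", "\"<=\""
@};

/* Return the index in yytname of the literal string token whose
   characters are TOKEN_BUFFER, or -1 if there is none.  Assumes the
   characters need no escaping.  */
int
find_string_token (char const *token_buffer)
@{
  size_t len = strlen (token_buffer);
  int i;
  for (i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && strncmp (yytname[i] + 1, token_buffer, len) == 0
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == '\0')
      return i;
  return -1;
@}
@end example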

@node Token Values
@subsection Semantic Values of Tokens

@vindex yylval
In an ordinary (nonreentrant) parser, the semantic value of the token must
be stored into the global variable @code{yylval}. When you are using
just one data type for semantic values, @code{yylval} has that type.
Thus, if the type is @code{int} (the default), you might write this in
@code{yylex}:

@example
@group
  @dots{}
  yylval = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  @dots{}
@end group
@end example

When you are using multiple data types, @code{yylval}'s type is a union
made from the @code{%union} declaration (@pxref{Union Decl, ,The
Collection of Value Types}). So when you store a token's value, you
must use the proper member of the union. If the @code{%union}
declaration looks like this:

@example
@group
%union @{
  int intval;
  double val;
  symrec *tptr;
@}
@end group
@end example

@noindent
then the code in @code{yylex} might look like this:

@example
@group
  @dots{}
  yylval.intval = value; /* Put value onto Bison stack.  */
  return INT;            /* Return the type of the token.  */
  @dots{}
@end group
@end example
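
The following self-contained sketch shows how the scanner picks the
union member to write. The union only approximates the @code{YYSTYPE}
that Bison generates from the @code{%union} declaration above. The
token codes @code{INT} and @code{REAL}, the opaque @code{symrec} type,
and the helper @code{store_number} are hypothetical.

@example
#define INT 258                  /* Hypothetical token codes.  */
#define REAL 259

typedef struct symrec symrec;    /* Opaque here; defined elsewhere.  */

/* Approximately what Bison generates from the %union above.  */
typedef union YYSTYPE
@{
  int intval;
  double val;
  symrec *tptr;
@} YYSTYPE;

YYSTYPE yylval;

/* Store a scanned number in the member that matches its token type,
   and return that token type.  */
int
store_number (double value, int is_integer)
@{
  if (is_integer)
    @{
      yylval.intval = (int) value;   /* Member used for INT.  */
      return INT;
    @}
  yylval.val = value;                /* Member used for REAL.  */
  return REAL;
@}
@end example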

@node Token Locations
@subsection Textual Locations of Tokens

@vindex yylloc
If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations})
in actions to keep track of the textual locations of tokens and groupings,
then you must provide this information in @code{yylex}. The function
@code{yyparse} expects to find the textual location of a token just parsed
in the global variable @code{yylloc}. So @code{yylex} must store the proper
data in that variable.

By default, the value of @code{yylloc} is a structure, and you need only
initialize the members that are going to be used by the actions. The
four members are called @code{first_line}, @code{first_column},
@code{last_line} and @code{last_column}. Note that the use of this
feature makes the parser noticeably slower.

@tindex YYLTYPE
The data type of @code{yylloc} has the name @code{YYLTYPE}.
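
Here is a minimal sketch of a scanner that maintains @code{yylloc}. It
assumes one-character tokens and a string input. The @code{YYLTYPE}
below mirrors the four default members. The names @code{scan_input},
@code{cur_line} and @code{cur_column} are hypothetical and hold the
scanner's own position bookkeeping.

@example
/* Mirrors the four members of the default YYLTYPE.  */
typedef struct YYLTYPE
@{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
@} YYLTYPE;

YYLTYPE yylloc;

char const *scan_input = "";  /* Hypothetical input source.  */
int cur_line = 1;             /* Position of the next character.  */
int cur_column = 1;

/* Return the next non-blank character as a token and record its
   location in yylloc.  */
int
yylex (void)
@{
  for (;;)
    @{
      if (*scan_input == '\n')
        @{
          cur_line++;
          cur_column = 1;
        @}
      else if (*scan_input == ' ')
        cur_column++;
      else
        break;
      scan_input++;
    @}
  if (*scan_input == '\0')
    return 0;                 /* End-of-input.  */
  yylloc.first_line = yylloc.last_line = cur_line;
  yylloc.first_column = yylloc.last_column = cur_column;
  cur_column++;
  return (unsigned char) *scan_input++;
@}
@end example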

@node Pure Calling
@subsection Calling Conventions for Pure Parsers

When you use the Bison declaration @code{%define api.pure} to request a
pure, reentrant parser, the global communication variables @code{yylval}
and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
Parser}.) In such parsers the two global variables are replaced by
pointers passed as arguments to @code{yylex}. You must declare them as
shown here, and pass the information back by storing it through those
pointers.

@example
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
@{
  @dots{}
  *lvalp = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  @dots{}
@}
@end example

If the grammar file does not use the @samp{@@} constructs to refer to
textual locations, then the type @code{YYLTYPE} will not be defined. In
this case, omit the second argument; @code{yylex} will be called with
only one argument.

If you wish to pass additional parameter data to @code{yylex}, use
@code{%lex-param} just like @code{%parse-param} (@pxref{Parser
Function}).

@deffn {Directive} %lex-param @{@var{argument-declaration}@}
@findex %lex-param
Declare that the braced-code @var{argument-declaration} is an
additional @code{yylex} argument declaration.
@end deffn

For instance:

@example
%parse-param @{int *nastiness@}
%lex-param   @{int *nastiness@}
%parse-param @{int *randomness@}
@end example

@noindent
results in the following signatures:

@example
int yylex   (int *nastiness);
int yyparse (int *nastiness, int *randomness);
@end example

If @code{%define api.pure} is added:

@example
int yylex   (YYSTYPE *lvalp, int *nastiness);
int yyparse (int *nastiness, int *randomness);
@end example

@noindent
and finally, if both @code{%define api.pure} and @code{%locations} are used:

@example
int yylex   (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
int yyparse (int *nastiness, int *randomness);
@end example
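
The following sketch shows a reentrant scanner with the last signature
above. Its extra argument is a pointer to scanner state rather than
@code{int *nastiness}. It assumes a hypothetical declaration
@samp{%lex-param @{struct scanner *sc@}} together with a matching
@code{%parse-param}. The @code{struct scanner} type, the @code{int}
semantic type, and the token code @code{INT} are also hypothetical.
Because all state travels through the arguments, two inputs can be
scanned in an interleaved fashion without interfering with each other.

@example
#include <ctype.h>

typedef int YYSTYPE;          /* Assumed semantic value type.  */
typedef struct YYLTYPE
@{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
@} YYLTYPE;

#define INT 258               /* Hypothetical token code.  */

/* Hypothetical per-parse scanner state.  */
struct scanner
@{
  char const *input;          /* Remaining input (a single line).  */
  int column;                 /* Column of the next character.  */
@};

/* A pure yylex: no global variables are read or written.  */
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp, struct scanner *sc)
@{
  while (*sc->input == ' ')
    @{
      sc->input++;
      sc->column++;
    @}
  if (*sc->input == '\0')
    return 0;                         /* End-of-input.  */
  llocp->first_line = llocp->last_line = 1;
  llocp->first_column = sc->column;
  if (isdigit ((unsigned char) *sc->input))
    @{
      int value = 0;
      while (isdigit ((unsigned char) *sc->input))
        @{
          value = value * 10 + (*sc->input++ - '0');
          sc->column++;
        @}
      *lvalp = value;                 /* Store through the pointer.  */
      llocp->last_column = sc->column - 1;
      return INT;
    @}
  llocp->last_column = sc->column++;
  return (unsigned char) *sc->input++;
@}
@end example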

@node Error Reporting
@section The Error Reporting Function @code{yyerror}
@cindex error reporting function
@findex yyerror
@cindex parse error
@cindex syntax error

The Bison parser detects a @dfn{syntax error} or @dfn{parse error}
whenever it reads a token which cannot satisfy any syntax rule. An
action in the grammar can also explicitly proclaim an error, using the
macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use
in Actions}).

The Bison parser expects to report the error by calling an error
reporting function named @code{yyerror}, which you must supply. It is
called by @code{yyparse} whenever a syntax error is found, and in the
simplest case it receives one argument: a message string. For a syntax
error, the string is normally @w{@code{"syntax error"}}.

@findex %error-verbose
If you invoke the directive @code{%error-verbose} in the Bison declarations
section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then
Bison provides a more verbose and specific error message string instead of
just plain @w{@code{"syntax error"}}. However, that message sometimes
contains incorrect information if LAC is not enabled (@pxref{LAC}).

The parser can detect one other kind of error: memory exhaustion. This
can happen when the input contains constructions that are very deeply
nested. It isn't likely you will encounter this, since the Bison
parser normally extends its stack automatically up to a very large limit. But
if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual
fashion, except that the argument string is @w{@code{"memory exhausted"}}.

In some cases diagnostics like @w{@code{"syntax error"}} are
translated automatically from English to some other language before
they are passed to @code{yyerror}. @xref{Internationalization}.

The following definition suffices in simple programs:

@example
@group
void
yyerror (char const *s)
@{
@end group
@group
  fprintf (stderr, "%s\n", s);
@}
@end group
@end example

After @code{yyerror} returns to @code{yyparse}, the latter will attempt
error recovery if you have written suitable error recovery grammar rules
(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
immediately return 1.

Obviously, in location-tracking pure parsers, @code{yyerror} should have
access to the current location.
This is indeed the case for the GLR
parsers, but not for the Yacc parser, for historical reasons. That is, if
@samp{%locations %define api.pure} is passed, then the prototypes for
@code{yyerror} are:

@example
void yyerror (char const *msg);                 /* Yacc parsers.  */
void yyerror (YYLTYPE *locp, char const *msg);  /* GLR parsers.   */
@end example

If @samp{%parse-param @{int *nastiness@}} is used, then:

@example
void yyerror (int *nastiness, char const *msg);  /* Yacc parsers.  */
void yyerror (int *nastiness, char const *msg);  /* GLR parsers.   */
@end example

Finally, GLR and Yacc parsers share the same @code{yyerror} calling
convention for absolutely pure parsers, i.e., when both the calling
convention of @code{yylex} @emph{and} that of @code{yyparse} are pure,
as requested by @code{%define api.pure}. For example:

@example
/* Location tracking.  */
%locations
/* Pure yylex.  */
%define api.pure
%lex-param   @{int *nastiness@}
/* Pure yyparse.  */
%parse-param @{int *nastiness@}
%parse-param @{int *randomness@}
@end example

@noindent
results in the following signatures for all the parser kinds:

@example
int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
int yyparse (int *nastiness, int *randomness);
void yyerror (YYLTYPE *locp,
              int *nastiness, int *randomness,
              char const *msg);
@end example

@noindent
The prototypes are only indications of how the code produced by Bison
uses @code{yyerror}. Bison-generated code always ignores the returned
value, so @code{yyerror} can return any type, including @code{void}.
Also, @code{yyerror} can be a variadic function; that is why the
message is always passed last.

Traditionally @code{yyerror} returns an @code{int} that is always
ignored, but this is purely for historical reasons, and @code{void} is
preferable since it more accurately describes the return type for
@code{yyerror}.
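
As a sketch of a location-aware error reporter, the following
@code{yyerror} uses the GLR-style signature shown above for
@samp{%locations %define api.pure}. The @code{YYLTYPE} definition
mirrors the four default members. The helper @code{format_error} and
the @samp{@var{line}.@var{column}: @var{message}} output format are
assumptions, not something Bison requires.

@example
#include <stdio.h>

/* Mirrors the four members of the default YYLTYPE.  */
typedef struct YYLTYPE
@{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
@} YYLTYPE;

/* Write "LINE.COLUMN: MSG" into BUF, truncating if necessary.  */
int
format_error (char *buf, size_t size, YYLTYPE const *locp,
              char const *msg)
@{
  return snprintf (buf, size, "%d.%d: %s",
                   locp->first_line, locp->first_column, msg);
@}

/* Report the error on stderr.  The return type is void because
   Bison-generated code ignores the returned value.  */
void
yyerror (YYLTYPE *locp, char const *msg)
@{
  char buf[256];
  format_error (buf, sizeof buf, locp, msg);
  fprintf (stderr, "%s\n", buf);
@}
@end example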
93724f13 6012
bfa74976
RS
6013@vindex yynerrs
6014The variable @code{yynerrs} contains the number of syntax errors
8a2800e7 6015reported so far. Normally this variable is global; but if you
704a47c4
AD
6016request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser})
6017then it is a local variable which only the actions can access.
bfa74976 6018
342b8b6e 6019@node Action Features
bfa74976
RS
6020@section Special Features for Use in Actions
6021@cindex summary, action features
6022@cindex action features summary
6023
6024Here is a table of Bison constructs, variables and macros that
6025are useful in actions.
6026
18b519c0 6027@deffn {Variable} $$
bfa74976
RS
6028Acts like a variable that contains the semantic value for the
6029grouping made by the current rule. @xref{Actions}.
18b519c0 6030@end deffn
bfa74976 6031
18b519c0 6032@deffn {Variable} $@var{n}
bfa74976
RS
6033Acts like a variable that contains the semantic value for the
6034@var{n}th component of the current rule. @xref{Actions}.
18b519c0 6035@end deffn
bfa74976 6036
18b519c0 6037@deffn {Variable} $<@var{typealt}>$
bfa74976 6038Like @code{$$} but specifies alternative @var{typealt} in the union
704a47c4
AD
6039specified by the @code{%union} declaration. @xref{Action Types, ,Data
6040Types of Values in Actions}.
18b519c0 6041@end deffn
bfa74976 6042
18b519c0 6043@deffn {Variable} $<@var{typealt}>@var{n}
bfa74976 6044Like @code{$@var{n}} but specifies alternative @var{typealt} in the
13863333 6045union specified by the @code{%union} declaration.
e0c471a9 6046@xref{Action Types, ,Data Types of Values in Actions}.
18b519c0 6047@end deffn
bfa74976 6048
18b519c0 6049@deffn {Macro} YYABORT;
bfa74976
RS
6050Return immediately from @code{yyparse}, indicating failure.
6051@xref{Parser Function, ,The Parser Function @code{yyparse}}.
18b519c0 6052@end deffn
bfa74976 6053
18b519c0 6054@deffn {Macro} YYACCEPT;
bfa74976
RS
6055Return immediately from @code{yyparse}, indicating success.
6056@xref{Parser Function, ,The Parser Function @code{yyparse}}.
18b519c0 6057@end deffn
bfa74976 6058
18b519c0 6059@deffn {Macro} YYBACKUP (@var{token}, @var{value});
bfa74976
RS
6060@findex YYBACKUP
6061Unshift a token. This macro is allowed only for rules that reduce
742e4900 6062a single value, and only when there is no lookahead token.
35430378 6063It is also disallowed in GLR parsers.
742e4900 6064It installs a lookahead token with token type @var{token} and
bfa74976
RS
6065semantic value @var{value}; then it discards the value that was
6066going to be reduced by this rule.
6067
6068If the macro is used when it is not valid, such as when there is
742e4900 6069a lookahead token already, then it reports a syntax error with
bfa74976
RS
6070a message @samp{cannot back up} and performs ordinary error
6071recovery.
6072
6073In either case, the rest of the action is not executed.
18b519c0 6074@end deffn
bfa74976 6075
18b519c0 6076@deffn {Macro} YYEMPTY
bfa74976 6077@vindex YYEMPTY
742e4900 6078Value stored in @code{yychar} when there is no lookahead token.
18b519c0 6079@end deffn
bfa74976 6080
32c29292
JD
6081@deffn {Macro} YYEOF
6082@vindex YYEOF
742e4900 6083Value stored in @code{yychar} when the lookahead is the end of the input
32c29292
JD
6084stream.
6085@end deffn
6086
18b519c0 6087@deffn {Macro} YYERROR;
bfa74976
RS
6088@findex YYERROR
6089Cause an immediate syntax error. This statement initiates error
6090recovery just as if the parser itself had detected an error; however, it
6091does not call @code{yyerror}, and does not print any message. If you
6092want to print an error message, call @code{yyerror} explicitly before
6093the @samp{YYERROR;} statement. @xref{Error Recovery}.
18b519c0 6094@end deffn
bfa74976 6095
18b519c0 6096@deffn {Macro} YYRECOVERING
02103984
PE
6097@findex YYRECOVERING
6098The expression @code{YYRECOVERING ()} yields 1 when the parser
6099is recovering from a syntax error, and 0 otherwise.
bfa74976 6100@xref{Error Recovery}.
18b519c0 6101@end deffn
bfa74976 6102
18b519c0 6103@deffn {Variable} yychar
742e4900
JD
6104Variable containing either the lookahead token, or @code{YYEOF} when the
6105lookahead is the end of the input stream, or @code{YYEMPTY} when no lookahead
32c29292
JD
6106has been performed so the next token is not yet known.
6107Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
6108Actions}).
742e4900 6109@xref{Lookahead, ,Lookahead Tokens}.
18b519c0 6110@end deffn
bfa74976 6111
18b519c0 6112@deffn {Macro} yyclearin;
742e4900 6113Discard the current lookahead token. This is useful primarily in
32c29292
JD
6114error rules.
6115Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
6116Semantic Actions}).
6117@xref{Error Recovery}.
18b519c0 6118@end deffn
bfa74976 6119
18b519c0 6120@deffn {Macro} yyerrok;
bfa74976 6121Resume generating error messages immediately for subsequent syntax
13863333 6122errors. This is useful primarily in error rules.
bfa74976 6123@xref{Error Recovery}.
18b519c0 6124@end deffn
bfa74976 6125
32c29292 6126@deffn {Variable} yylloc
742e4900 6127Variable containing the lookahead token location when @code{yychar} is not set
32c29292
JD
6128to @code{YYEMPTY} or @code{YYEOF}.
6129Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
6130Actions}).
6131@xref{Actions and Locations, ,Actions and Locations}.
6132@end deffn
6133
6134@deffn {Variable} yylval
742e4900 6135Variable containing the lookahead token semantic value when @code{yychar} is
32c29292
JD
6136not set to @code{YYEMPTY} or @code{YYEOF}.
6137Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
6138Actions}).
6139@xref{Actions, ,Actions}.
6140@end deffn
6141
18b519c0 6142@deffn {Value} @@$
847bf1f5 6143@findex @@$
7404cdf3
JD
6144Acts like a structure variable containing information on the textual
6145location of the grouping made by the current rule. @xref{Tracking
6146Locations}.
bfa74976 6147
847bf1f5
AD
6148@c Check if those paragraphs are still useful or not.
6149
6150@c @example
6151@c struct @{
6152@c int first_line, last_line;
6153@c int first_column, last_column;
6154@c @};
6155@c @end example
6156
6157@c Thus, to get the starting line number of the third component, you would
6158@c use @samp{@@3.first_line}.
bfa74976 6159
847bf1f5
AD
6160@c In order for the members of this structure to contain valid information,
6161@c you must make @code{yylex} supply this information about each token.
6162@c If you need only certain members, then @code{yylex} need only fill in
6163@c those members.
bfa74976 6164
847bf1f5 6165@c The use of this feature makes the parser noticeably slower.
18b519c0 6166@end deffn
847bf1f5 6167
18b519c0 6168@deffn {Value} @@@var{n}
847bf1f5 6169@findex @@@var{n}
7404cdf3
JD
6170Acts like a structure variable containing information on the textual
6171location of the @var{n}th component of the current rule. @xref{Tracking
6172Locations}.
18b519c0 6173@end deffn
bfa74976 6174
@node Internationalization
@section Parser Internationalization
@cindex internationalization
@cindex i18n
@cindex NLS
@cindex gettext
@cindex bison-po

A Bison-generated parser can print diagnostics, including error and
tracing messages. By default, they appear in English. However, Bison
also supports outputting diagnostics in the user's native language. To
make this work, the user should set the usual environment variables.
@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
set the user's locale to French Canadian using the UTF-8
encoding. The exact set of available locales depends on the user's
installation.

The maintainer of a package that uses a Bison-generated parser enables
the internationalization of the parser's output through the following
steps. Here we assume a package that uses GNU Autoconf and
GNU Automake.

@enumerate
@item
@cindex bison-i18n.m4
Into the directory containing the GNU Autoconf macros used
by the package---often called @file{m4}---copy the
@file{bison-i18n.m4} file installed by Bison under
@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory.
For example:

@example
cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
@end example

@item
@findex BISON_I18N
@vindex BISON_LOCALEDIR
@vindex YYENABLE_NLS
In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT}
invocation, add an invocation of @code{BISON_I18N}. This macro is
defined in the file @file{bison-i18n.m4} that you copied earlier. It
causes @samp{configure} to find the value of the
@code{BISON_LOCALEDIR} variable, and it defines the source-language
symbol @code{YYENABLE_NLS} to enable translations in the
Bison-generated parser.

@item
In the @code{main} function of your program, designate the directory
containing Bison's runtime message catalog through a call to
@samp{bindtextdomain} with domain name @samp{bison-runtime}.
For example:

@example
bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
@end example

Typically this appears after any other call @code{bindtextdomain
(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on
@samp{BISON_LOCALEDIR} to be defined as a string through the
@file{Makefile}.

@item
In the @file{Makefile.am} that controls the compilation of the @code{main}
function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro,
either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example:

@example
DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
@end example

or:

@example
AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
@end example

@item
Finally, invoke the command @command{autoreconf} to generate the build
infrastructure.
@end enumerate
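The run-time steps above can be gathered into one small setup function. The following C sketch is only one possible arrangement: it assumes a glibc-style @file{libintl.h} (other systems may need GNU libintl and @option{-lintl}), and the function name @code{init_i18n} as well as the fallback values for @code{PACKAGE}, @code{LOCALEDIR} and @code{BISON_LOCALEDIR} are hypothetical placeholders that a real package would take from @file{config.h} and the @file{Makefile}.

```c
/* Sketch of the i18n setup done early in main (), assuming a
   glibc-style <libintl.h>.  PACKAGE and LOCALEDIR are hypothetical
   placeholders that a real package would get from config.h.  */
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

#ifndef BISON_LOCALEDIR
/* Normally passed in via -DBISON_LOCALEDIR='"..."' from Makefile.am.
   This fallback path is only an assumption for the sketch.  */
# define BISON_LOCALEDIR "/usr/local/share/locale"
#endif
#ifndef PACKAGE
# define PACKAGE "myprog"                     /* hypothetical name */
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"  /* hypothetical path */
#endif

/* Returns the directory now bound to the "bison-runtime" domain.  */
const char *
init_i18n (void)
{
  /* Honor LC_ALL, LC_MESSAGES, LANG, ... from the environment.  */
  setlocale (LC_ALL, "");
  /* The package's own message catalog, if it has one.  */
  bindtextdomain (PACKAGE, LOCALEDIR);
  /* Bison's runtime catalog, used by the generated parser.  */
  const char *dir = bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
  assert (dir != NULL);          /* NULL only if memory ran out */
  /* Make the package's domain the default for gettext ().  */
  textdomain (PACKAGE);
  return dir;
}
```

A program would call @code{init_i18n ()} near the top of @code{main}, before @code{yyparse} has a chance to emit any diagnostics.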


@node Algorithm
@chapter The Bison Parser Algorithm
@cindex Bison parser algorithm
@cindex algorithm of parser
@cindex shifting
@cindex reduction
@cindex parser stack
@cindex stack, parser

As Bison reads tokens, it pushes them onto a stack along with their
semantic values. The stack is called the @dfn{parser stack}. Pushing a
token is traditionally called @dfn{shifting}.

For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
@samp{3} to come. The stack will have four elements, one for each token
that was shifted.

But the stack does not always have an element for each token read. When
the last @var{n} tokens and groupings shifted match the components of a
grammar rule, they can be combined according to that rule. This is called
@dfn{reduction}. Those tokens and groupings are replaced on the stack by a
single grouping whose symbol is the result (left hand side) of that rule.
Running the rule's action is part of the process of reduction, because this
is what computes the semantic value of the resulting grouping.

For example, if the infix calculator's parser stack contains this:

@example
1 + 5 * 3
@end example

@noindent
and the next input token is a newline character, then the last three
elements can be reduced to 15 via the rule:

@example
expr: expr '*' expr;
@end example

@noindent
Then the stack contains just these three elements:

@example
1 + 15
@end example

@noindent
At this point, another reduction can be made, resulting in the single value
16. Then the newline token can be shifted.

The parser tries, by shifts and reductions, to reduce the entire input down
to a single grouping whose symbol is the grammar's start-symbol
(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).

This kind of parser is known in the literature as a bottom-up parser.
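To make the walk-through above concrete, here is a small hand-written C sketch (not what Bison generates) that evaluates @samp{1+5*3} by shifting and reducing. The names @code{eval} and @code{prec} are hypothetical, values and operators are kept on two parallel stacks rather than Bison's single stack of states, and the choice between shifting and reducing is hard-coded from operator strength, which Bison instead derives from the grammar and its precedence declarations.

```c
/* Hand-written illustration of shifting and reducing, not Bison's
   generated code.  Operands are single digits; operators are '+'
   and '*'.  Values and operators live on two parallel stacks.  */
#include <assert.h>
#include <ctype.h>

enum { STACK_MAX = 64 };

/* Binding strength of an operator token; 0 for end of input.  */
static int
prec (char op)
{
  return op == '*' ? 2 : op == '+' ? 1 : 0;
}

/* Evaluate a well-formed string such as "1+5*3" (no spaces).  */
int
eval (const char *expr)
{
  int vals[STACK_MAX];      /* semantic values of shifted groupings */
  char ops[STACK_MAX];      /* operator tokens shifted so far */
  int nv = 0, no = 0;

  for (const char *p = expr; ; p++)
    {
      if (isdigit ((unsigned char) *p))
        {
          assert (nv < STACK_MAX);
          vals[nv++] = *p - '0';    /* shift the number */
          continue;
        }
      /* *p is the lookahead: '+', '*' or '\0'.  Reduce while the
         operator on the stack binds at least as tightly.  */
      while (no > 0 && prec (ops[no - 1]) >= prec (*p))
        {
          int rhs = vals[--nv];
          int lhs = vals[--nv];
          char op = ops[--no];
          vals[nv++] = op == '*' ? lhs * rhs : lhs + rhs;  /* reduce */
        }
      if (*p == '\0')
        break;
      assert (no < STACK_MAX);
      ops[no++] = *p;               /* shift the operator */
    }
  assert (nv == 1);                 /* a single grouping remains */
  return vals[0];
}
```

For @samp{1+5*3}, the @samp{*} is shifted because it binds more tightly than the @samp{+} below it; at end of input the sketch reduces @samp{5*3} to 15 and then @samp{1+15} to 16, as in the example above.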

@menu
* Lookahead:: Parser looks one token ahead when deciding what to do.
* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
* Precedence:: Operator precedence works by resolving conflicts.
* Contextual Precedence:: When an operator's precedence depends on context.
* Parser States:: The parser is a finite-state-machine with stack.
* Reduce/Reduce:: When two rules are applicable in the same situation.
* Mysterious Conflicts:: Conflicts that look unjustified.
* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
* Memory Management:: What happens when memory is exhausted. How to avoid it.
@end menu

@node Lookahead
@section Lookahead Tokens
@cindex lookahead token

The Bison parser does @emph{not} always reduce immediately as soon as the
last @var{n} tokens and groupings match a rule. This is because such a
simple strategy is inadequate to handle most languages. Instead, when a
reduction is possible, the parser sometimes ``looks ahead'' at the next
token in order to decide what to do.

When a token is read, it is not immediately shifted; first it becomes the
@dfn{lookahead token}, which is not on the stack. Now the parser can
perform one or more reductions of tokens and groupings on the stack, while
the lookahead token remains off to the side. When no more reductions
should take place, the lookahead token is shifted onto the stack. This
does not mean that all possible reductions have been done; depending on the
token type of the lookahead token, some rules may choose to delay their
application.

Here is a simple case where lookahead is needed. The following rules define
expressions which contain binary addition operators and postfix unary
factorial operators (@samp{!}), and allow parentheses for grouping.

@example
@group
expr:     term '+' expr
        | term
        ;
@end group

@group
term:     '(' expr ')'
        | term '!'
        | NUMBER
        ;
@end group
@end example

Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
should be done? If the following token is @samp{)}, then the first three
tokens must be reduced to form an @code{expr}. This is the only valid
course, because shifting the @samp{)} would produce a sequence of symbols
@w{@code{term ')'}}, and no rule allows this.

If the following token is @samp{!}, then it must be shifted immediately so
that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
@code{expr}. It would then be impossible to shift the @samp{!} because
doing so would produce on the stack the sequence of symbols @code{expr
'!'}. No rule allows that sequence.
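The decision just described can be written down as a tiny C function. This is a hypothetical sketch of the reasoning for the @code{expr}/@code{term} grammar above, for the moment when @w{@samp{1 + 2}} has been read and @samp{2} has already been reduced to a @code{term}; the names @code{after_term} and @code{enum action} are made up for illustration, and Bison encodes such choices in its generated tables rather than in hand-written code.

```c
/* Hypothetical sketch of the choice made once "1 + 2" has been read
   and "2" reduced to a term, i.e. the stack holds "term '+' term".
   Bison encodes this decision in its generated tables instead.  */
#include <assert.h>

enum action { SHIFT, REDUCE, ERROR };

/* LOOKAHEAD is a character token, or 0 for end of input.  */
enum action
after_term (int lookahead)
{
  switch (lookahead)
    {
    case '!':     /* shift, so that "2 !" can become a term */
    case '+':     /* shift, to continue "term '+' expr" */
      return SHIFT;
    case ')':     /* reduce "term" to "expr" (then "term '+' expr") */
    case 0:       /* end of input: likewise reduce */
      return REDUCE;
    default:      /* e.g. '(' or a NUMBER cannot follow a term */
      return ERROR;
    }
}
```

The same stack contents lead to different actions depending only on the lookahead token, which is exactly why the parser must look ahead before reducing.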

@vindex yychar
@vindex yylval
@vindex yylloc
The lookahead token is stored in the variable @code{yychar}.
Its semantic value and location, if any, are stored in the variables
@code{yylval} and @code{yylloc}.
@xref{Action Features, ,Special Features for Use in Actions}.

@node Shift/Reduce
@section Shift/Reduce Conflicts
@cindex conflicts
@cindex shift/reduce conflicts
@cindex dangling @code{else}
@cindex @code{else}, dangling

Suppose we are parsing a language which has if-then and if-then-else
statements, with a pair of rules like this:

@example
@group
if_stmt:
          IF expr THEN stmt
        | IF expr THEN stmt ELSE stmt
        ;
@end group
@end example

@noindent
Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
terminal symbols for specific keyword tokens.

When the @code{ELSE} token is read and becomes the lookahead token, the
contents of the stack (assuming the input is valid) are just right for
reduction by the first rule. But it is also legitimate to shift the
@code{ELSE}, because that would lead to eventual reduction by the second
rule.

This situation, where either a shift or a reduction would be valid, is
called a @dfn{shift/reduce conflict}. Bison is designed to resolve
these conflicts by choosing to shift, unless otherwise directed by
operator precedence declarations. To see the reason for this, let's
contrast it with the other alternative.

Since the parser prefers to shift the @code{ELSE}, the result is to attach
the else-clause to the innermost if-statement, making these two inputs
equivalent:

@example
if x then if y then win (); else lose;

if x then do; if y then win (); else lose; end;
@end example

But if the parser chose to reduce when possible rather than shift, the
result would be to attach the else-clause to the outermost if-statement,
making these two inputs equivalent:

@example
if x then if y then win (); else lose;

if x then do; if y then win (); end; else lose;
@end example

The conflict exists because the grammar as written is ambiguous: either
parsing of the simple nested if-statement is legitimate. The established
convention is that these ambiguities are resolved by attaching the
else-clause to the innermost if-statement; this is what Bison accomplishes
by choosing to shift rather than reduce. (It would ideally be cleaner to
write an unambiguous grammar, but that is very hard to do in this case.)
This particular ambiguity was first encountered in the specifications of
Algol 60 and is called the ``dangling @code{else}'' ambiguity.
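The effect of always shifting the @code{ELSE} can be mimicked in a few lines of C. The sketch below is a hypothetical recursive-descent analogue, not a Bison parser: single-character tokens stand in for the keywords, the names @code{stmt} and @code{first_else_owner} are made up, and consuming an @code{ELSE} as soon as it appears plays the role of shifting it, so the else-clause attaches to the innermost open if-statement.

```c
/* Hypothetical recursive-descent analogue of the shift preference.
   Tokens are characters: 'I' stands for "IF expr THEN", 'E' for ELSE
   and 's' for any other statement.  Taking an ELSE as soon as it is
   seen plays the role of shifting it.  */
#include <assert.h>

static const char *tok;    /* next unread token */
static int else_owner;     /* depth of the IF that took the first ELSE */

/* stmt: 's' | 'I' stmt | 'I' stmt 'E' stmt
   DEPTH counts the IFs enclosing this statement.  */
static void
stmt (int depth)
{
  if (*tok == 's')
    tok++;
  else if (*tok == 'I')
    {
      tok++;
      stmt (depth + 1);               /* the THEN branch */
      if (*tok == 'E')                /* "shift" the ELSE right away */
        {
          tok++;
          if (else_owner == 0)
            else_owner = depth + 1;   /* this IF sits at depth + 1 */
          stmt (depth + 1);           /* the ELSE branch */
        }
    }
}

/* Returns the nesting depth (1 = outermost) of the IF that received
   the first ELSE, 0 if there is none, or -1 if input is left over.  */
int
first_else_owner (const char *input)
{
  tok = input;
  else_owner = 0;
  stmt (0);
  return *tok == '\0' ? else_owner : -1;
}
```

For @samp{IIsEs}, which corresponds to @samp{if x then if y then win (); else lose;}, the function reports depth 2: the else-clause goes to the inner if-statement.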

To avoid warnings from Bison about predictable, legitimate shift/reduce
conflicts, use the @code{%expect @var{n}} declaration.
There will be no warning as long as the number of shift/reduce conflicts
is exactly @var{n}, and Bison will report an error if there is a
different number.
@xref{Expect Decl, ,Suppressing Conflict Warnings}.

The definition of @code{if_stmt} above is solely to blame for the
conflict, but the conflict does not actually appear without additional
rules. Here is a complete Bison grammar file that actually manifests
the conflict:

@example
@group
%token IF THEN ELSE variable
%%
@end group
@group
stmt:     expr
        | if_stmt
        ;
@end group

@group
if_stmt:
          IF expr THEN stmt
        | IF expr THEN stmt ELSE stmt
        ;
@end group

expr:     variable
        ;
@end example

@node Precedence
@section Operator Precedence
@cindex operator precedence
@cindex precedence of operators

Another situation where shift/reduce conflicts appear is in arithmetic
expressions. Here shifting is not always the preferred resolution; the
Bison declarations for operator precedence allow you to specify when to
shift and when to reduce.

@menu
* Why Precedence:: An example showing why precedence is needed.
* Using Precedence:: How to specify precedence in Bison grammars.
* Precedence Examples:: How these features are used in the previous example.
* How Precedence:: How they work.
@end menu

@node Why Precedence
@subsection When Precedence is Needed

Consider the following ambiguous grammar fragment (ambiguous because the
input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):

@example
@group
expr:     expr '-' expr
        | expr '*' expr
        | expr '<' expr
        | '(' expr ')'
        @dots{}
        ;
@end group
@end example

@noindent
Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
should it reduce them via the rule for the subtraction operator? It
depends on the next token. Of course, if the next token is @samp{)}, we
must reduce; shifting is invalid because no single rule can reduce the
token sequence @w{@samp{- 2 )}} or anything starting with that. But if
the next token is @samp{*} or @samp{<}, we have a choice: either
shifting or reduction would allow the parse to complete, but with
different results.

To decide which one Bison should do, we must consider the results. If
the next operator token @var{op} is shifted, then @w{@samp{2 @var{op} 3}}
must be reduced first in order to permit another opportunity to reduce
the difference. The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}.
On the other hand, if the subtraction is reduced before shifting
@var{op}, the result is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then,
the choice of shift or reduce should depend on the relative precedence
of the operators @samp{-} and @var{op}: @samp{*} should be shifted
first, but not @samp{<}.

@cindex associativity
What about input such as @w{@samp{1 - 2 - 5}}; should this be
@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most
operators we prefer the former, which is called @dfn{left association}.
The latter alternative, @dfn{right association}, is desirable for
assignment operators. The choice of left or right association is a
matter of whether the parser chooses to shift or reduce when the stack
contains @w{@samp{1 - 2}} and the lookahead token is @samp{-}: reducing
makes the operator left-associative, and shifting makes it
right-associative.
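The two readings of @w{@samp{1 - 2 - 5}} really do give different values: @math{(1 - 2) - 5 = -6}, whereas @math{1 - (2 - 5) = 4}. As a quick arithmetic check, this hypothetical C sketch (the names @code{minus_left} and @code{minus_right} are made up) folds a chain of subtractions both ways.

```c
/* Hypothetical sketch: combine V[0] - V[1] - ... - V[N-1] (N >= 1)
   with either associativity.  */
#include <assert.h>

/* Left association, ((v0 - v1) - v2) ...: what reducing yields.  */
int
minus_left (const int *v, int n)
{
  assert (n >= 1);
  int acc = v[0];
  for (int i = 1; i < n; i++)
    acc -= v[i];
  return acc;
}

/* Right association, v0 - (v1 - (v2 ...)): what shifting yields.  */
int
minus_right (const int *v, int n)
{
  assert (n >= 1);
  int acc = v[n - 1];
  for (int i = n - 2; i >= 0; i--)
    acc = v[i] - acc;
  return acc;
}
```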

@node Using Precedence
@subsection Specifying Operator Precedence
@findex %left
@findex %right
@findex %nonassoc

Bison allows you to specify these choices with the operator precedence
declarations @code{%left} and @code{%right}. Each such declaration
contains a list of tokens, which are operators whose precedence and
associativity are being declared. The @code{%left} declaration makes all
those operators left-associative and the @code{%right} declaration makes
them right-associative. A third alternative is @code{%nonassoc}, which
declares that it is a syntax error to find the same operator twice ``in a
row''.

The relative precedence of different operators is controlled by the
order in which they are declared. The first @code{%left}, @code{%right}
or @code{%nonassoc} declaration in the file declares the operators whose
precedence is lowest, the next such declaration declares the operators
whose precedence is a little higher, and so on.

@node Precedence Examples
@subsection Precedence Examples

In our example, we would want the following declarations:

@example
%left '<'
%left '-'
%left '*'
@end example

In a more complete example, which supports other operators as well, we
would declare them in groups of equal precedence. For example, @code{'+'} is
declared with @code{'-'}:

@example
%left '<' '>' '=' NE LE GE
%left '+' '-'
%left '*' '/'
@end example

@noindent
(Here @code{NE} and so on stand for the operators for ``not equal''
and so on. We assume that these tokens are more than one character long
and therefore are represented by names, not character literals.)

@node How Precedence
@subsection How Precedence Works

The first effect of the precedence declarations is to assign precedence
levels to the terminal symbols declared. The second effect is to assign
precedence levels to certain rules: each rule gets its precedence from
the last terminal symbol mentioned in the components. (You can also
explicitly specify the precedence of a rule. @xref{Contextual
Precedence, ,Context-Dependent Precedence}.)

Finally, the resolution of conflicts works by comparing the precedence
of the rule being considered with that of the lookahead token. If the
token's precedence is higher, the choice is to shift. If the rule's
precedence is higher, the choice is to reduce. If they have equal
precedence, the choice is made based on the associativity of that
precedence level. The verbose output file made by @samp{-v}
(@pxref{Invocation, ,Invoking Bison}) says how each conflict was
resolved.

Not all rules and not all tokens have precedence. If either the rule or
the lookahead token has no precedence, then the default is to shift.
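The resolution rules just described fit in one small C function. This is a hypothetical sketch of the decision, not Bison's internal code: the name @code{resolve} is made up, precedence levels are numbered from 1 in declaration order, 0 means that no precedence was declared, and the associativity argument is the one declared for the shared level (it only matters when both precedences are equal). A @code{%nonassoc} level turns the conflict into a syntax error, matching the meaning of @code{%nonassoc} given earlier.

```c
/* Hypothetical sketch of how one shift/reduce conflict is settled.
   Levels count up from 1 in declaration order; 0 = no precedence.  */
#include <assert.h>

enum assoc { LEFT, RIGHT, NONASSOC };   /* %left, %right, %nonassoc */
enum action { SHIFT, REDUCE, ERROR };

/* RULE_PREC: precedence of the rule that could be reduced.
   TOKEN_PREC: precedence of the lookahead token.
   LEVEL_ASSOC: associativity declared for the level, used only when
   the two precedences are equal.  */
enum action
resolve (int rule_prec, int token_prec, enum assoc level_assoc)
{
  if (rule_prec == 0 || token_prec == 0)
    return SHIFT;                 /* no precedence: default to shift */
  if (token_prec > rule_prec)
    return SHIFT;
  if (rule_prec > token_prec)
    return REDUCE;
  switch (level_assoc)
    {
    case LEFT:
      return REDUCE;              /* (1 - 2) - 5 */
    case RIGHT:
      return SHIFT;               /* 1 - (2 - 5) */
    default:
      return ERROR;               /* %nonassoc: same operator twice */
    }
}
```

With the declarations @code{%left '<'}, @code{%left '-'} and @code{%left '*'} (levels 1, 2 and 3), the conflict in @samp{1 - 2 * 3} is @code{resolve (2, 3, LEFT)}, which shifts, while @samp{1 - 2 < 3} is @code{resolve (2, 1, LEFT)}, which reduces.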

@node Contextual Precedence
@section Context-Dependent Precedence
@cindex context-dependent precedence
@cindex unary operator precedence
@cindex precedence, context-dependent
@cindex precedence, unary operator
@findex %prec

Often the precedence of an operator depends on the context. This sounds
outlandish at first, but it is really very common. For example, a minus
sign typically has a very high precedence as a unary operator, and a
somewhat lower precedence (lower than multiplication) as a binary operator.

The Bison precedence declarations, @code{%left}, @code{%right} and
@code{%nonassoc}, can only be used once for a given token; so a token has
only one precedence declared in this way. For context-dependent
precedence, you need to use an additional mechanism: the @code{%prec}
modifier for rules.

The @code{%prec} modifier declares the precedence of a particular rule by
specifying a terminal symbol whose precedence should be used for that rule.
It's not necessary for that symbol to appear otherwise in the rule. The
modifier's syntax is:

@example
%prec @var{terminal-symbol}
@end example

@noindent
and it is written after the components of the rule. Its effect is to
assign the rule the precedence of @var{terminal-symbol}, overriding
the precedence that would be deduced for it in the ordinary way. The
altered rule precedence then affects how conflicts involving that rule
are resolved (@pxref{Precedence, ,Operator Precedence}).

Here is how @code{%prec} solves the problem of unary minus. First, declare
a precedence for a fictitious terminal symbol named @code{UMINUS}. There
are no tokens of this type, but the symbol serves to stand for its
precedence:

@example
@dots{}
%left '+' '-'
%left '*'
%left UMINUS
@end example

Now the precedence of @code{UMINUS} can be used in specific rules:

@example
@group
exp:      @dots{}
        | exp '-' exp
        @dots{}
        | '-' exp %prec UMINUS
@end group
@end example

@ifset defaultprec
If you forget to append @code{%prec UMINUS} to the rule for unary
minus, Bison silently assumes that minus has its usual precedence.
This kind of problem can be tricky to debug, since one typically
discovers the mistake only by testing the code.

The @code{%no-default-prec;} declaration makes it easier to discover
this kind of problem systematically. It causes rules that lack a
@code{%prec} modifier to have no precedence, even if the last terminal
symbol mentioned in their components has a declared precedence.

If @code{%no-default-prec;} is in effect, you must specify @code{%prec}
for all rules that participate in precedence conflict resolution.
Then you will see any shift/reduce conflict until you tell Bison how
to resolve it, either by changing your grammar or by adding an
explicit precedence. This will probably add declarations to the
grammar, but it helps to protect against incorrect rule precedences.

The effect of @code{%no-default-prec;} can be reversed by giving
@code{%default-prec;}, which is the default.
@end ifset

@node Parser States
@section Parser States
@cindex finite-state machine
@cindex parser state
@cindex state (of parser)

The function @code{yyparse} is implemented using a finite-state machine.
The values pushed on the parser stack are not simply token type codes; they
represent the entire sequence of terminal and nonterminal symbols at or
near the top of the stack. The current state collects all the information
about previous input which is relevant to deciding what to do next.

Each time a lookahead token is read, the current parser state and the
type of the lookahead token are looked up together in a table. This table
entry can say, ``Shift the lookahead token.'' In this case, it also
specifies the new parser state, which is pushed onto the top of the
parser stack. Or it can say, ``Reduce using rule number @var{n}.''
This means that a certain number of tokens or groupings are taken off
the top of the stack, and replaced by one grouping. In other words,
that number of states are popped from the stack, and one new state is
pushed.

There is one other alternative: the table can say that the lookahead token
is erroneous in the current state. This causes error processing to begin
(@pxref{Error Recovery}).
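The table-driven loop described in this section can be sketched in C with a hand-built table. Everything here is hypothetical and for illustration only: the toy grammar @samp{S: '(' S ')' | 'x'}, its six states, and the names @code{lookup}, @code{goto_s} and @code{lr_parse}. The stack holds state numbers, and each step looks up the pair (current state, lookahead token) to decide whether to shift, reduce, accept, or report an error; Bison builds equivalent, compressed tables automatically.

```c
/* Hypothetical hand-built LR table for the toy grammar
     S: '(' S ')' | 'x' ;
   Bison builds equivalent (compressed) tables automatically.  */
#include <assert.h>

enum kind { ERR, SHIFT, REDUCE, ACCEPT };
struct action { enum kind kind; int arg; };  /* new state or rule */

/* Rule 1 is S: '(' S ')' (3 symbols); rule 2 is S: 'x' (1 symbol).  */
static const int rule_len[] = { 0, 3, 1 };

/* The table entry for STATE and lookahead TOK ('\0' = end of input).  */
static struct action
lookup (int state, char tok)
{
  struct action a = { ERR, 0 };
  switch (state)
    {
    case 0: case 2:               /* an S is expected */
      if (tok == '(') a = (struct action) { SHIFT, 2 };
      if (tok == 'x') a = (struct action) { SHIFT, 3 };
      break;
    case 1:                       /* a complete S at the top level */
      if (tok == '\0') a = (struct action) { ACCEPT, 0 };
      break;
    case 3:                       /* after 'x' */
      if (tok == ')' || tok == '\0') a = (struct action) { REDUCE, 2 };
      break;
    case 4:                       /* after '(' S */
      if (tok == ')') a = (struct action) { SHIFT, 5 };
      break;
    case 5:                       /* after '(' S ')' */
      if (tok == ')' || tok == '\0') a = (struct action) { REDUCE, 1 };
      break;
    }
  return a;
}

/* The state to push after reducing to S on top of STATE.  */
static int
goto_s (int state)
{
  return state == 0 ? 1 : 4;      /* only states 0 and 2 expect an S */
}

/* Returns 1 if INPUT is accepted, 0 on a syntax error.  */
int
lr_parse (const char *input)
{
  enum { DEPTH = 100 };
  int stack[DEPTH];               /* the parser stack holds states */
  int top = 0;
  stack[0] = 0;
  for (;;)
    {
      struct action a = lookup (stack[top], *input);
      switch (a.kind)
        {
        case SHIFT:               /* push the new state, consume token */
          assert (top + 1 < DEPTH);
          stack[++top] = a.arg;
          input++;
          break;
        case REDUCE:              /* pop the rule's states, push goto */
          top -= rule_len[a.arg];
          stack[top + 1] = goto_s (stack[top]);
          top++;
          break;
        case ACCEPT:
          return 1;
        default:                  /* error entry */
          return 0;
        }
    }
}
```

For @samp{(x)}, the sketch shifts @samp{(} and @samp{x}, reduces @samp{x} to @code{S} (popping one state and entering state 4), shifts @samp{)}, reduces @samp{( S )} (popping three states and entering state 1), and accepts at end of input.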

@node Reduce/Reduce
@section Reduce/Reduce Conflicts
@cindex reduce/reduce conflict
@cindex conflicts, reduce/reduce

A reduce/reduce conflict occurs if there are two or more rules that apply
to the same sequence of input. This usually indicates a serious error
in the grammar.

For example, here is an erroneous attempt to define a sequence
of zero or more @code{word} groupings.

@example
@group
sequence: /* empty */
                @{ printf ("empty sequence\n"); @}
        | maybeword
        | sequence word
                @{ printf ("added word %s\n", $2); @}
        ;
@end group

@group
maybeword: /* empty */
                @{ printf ("empty maybeword\n"); @}
        | word
                @{ printf ("single word %s\n", $1); @}
        ;
@end group
@end example

@noindent
The error is an ambiguity: there is more than one way to parse a single
@code{word} into a @code{sequence}. It could be reduced to a
@code{maybeword} and then into a @code{sequence} via the second rule.
Alternatively, nothing-at-all could be reduced into a @code{sequence}
via the first rule, and this could be combined with the @code{word}
using the third rule for @code{sequence}.

There is also more than one way to reduce nothing-at-all into a
@code{sequence}. This can be done directly via the first rule,
or indirectly via @code{maybeword} and then the second rule.

You might think that this is a distinction without a difference, because it
does not change whether any particular input is valid or not. But it does
affect which actions are run. One parsing order runs the second rule's
action; the other runs the first rule's action and the third rule's action.
In this example, the output of the program changes.

Bison resolves a reduce/reduce conflict by choosing to use the rule that
appears first in the grammar, but it is very risky to rely on this. Every
reduce/reduce conflict must be studied and usually eliminated. Here is the
proper way to define @code{sequence}:

@example
sequence: /* empty */
                @{ printf ("empty sequence\n"); @}
        | sequence word
                @{ printf ("added word %s\n", $2); @}
        ;
@end example

Here is another common error that yields a reduce/reduce conflict:

@example
sequence: /* empty */
        | sequence words
        | sequence redirects
        ;

words:    /* empty */
        | words word
        ;

redirects:/* empty */
        | redirects redirect
        ;
@end example

@noindent
The intention here is to define a sequence which can contain either
@code{word} or @code{redirect} groupings. The individual definitions of
@code{sequence}, @code{words} and @code{redirects} are error-free, but the
three together make a subtle ambiguity: even an empty input can be parsed
in infinitely many ways!

Consider: nothing-at-all could be a @code{words}. Or it could be two
@code{words} in a row, or three, or any number. It could equally well be a
@code{redirects}, or two, or any number. Or it could be a @code{words}
followed by three @code{redirects} and another @code{words}. And so on.

Here are two ways to correct these rules. First, to make it a single level
of sequence:

@example
sequence: /* empty */
        | sequence word
        | sequence redirect
        ;
@end example

Second, to prevent either a @code{words} or a @code{redirects}
from being empty:

@example
@group
sequence: /* empty */
        | sequence words
        | sequence redirects
        ;
@end group

@group
words:    word
        | words word
        ;
@end group

@group
redirects:redirect
        | redirects redirect
        ;
@end group
@end example

@node Mysterious Conflicts
@section Mysterious Conflicts
@cindex Mysterious Conflicts

Sometimes reduce/reduce conflicts can occur that don't look warranted.
Here is an example:

@example
@group
%token ID

%%
def: param_spec return_spec ','
        ;
param_spec:
          type
        | name_list ':' type
        ;
@end group
@group
return_spec:
          type
        | name ':' type
        ;
@end group
@group
type: ID
        ;
@end group
@group
name: ID
        ;
name_list:
          name
        | name ',' name_list
        ;
@end group
@end example

It would seem that this grammar can be parsed with only a single token
of lookahead: when a @code{param_spec} is being read, an @code{ID} is
a @code{name} if a comma or colon follows, or a @code{type} if another
@code{ID} follows. In other words, this grammar is LR(1).

@cindex LR
@cindex LALR
However, for historical reasons, Bison cannot by default handle all
LR(1) grammars.
In this grammar, two contexts, that after an @code{ID} at the beginning
of a @code{param_spec} and likewise at the beginning of a
@code{return_spec}, are similar enough that Bison assumes they are the
same.
They appear similar because the same set of rules would be
active---the rule for reducing to a @code{name} and that for reducing to
a @code{type}. Bison is unable to determine at that stage of processing
that the rules would require different lookahead tokens in the two
contexts, so it makes a single parser state for them both. Combining
the two contexts causes a conflict later. In parser terminology, this
occurrence means that the grammar is not LALR(1).

@cindex IELR
@cindex canonical LR
For many practical grammars (specifically those that fall into the non-LR(1)
class), the limitations of LALR(1) result in difficulties beyond just
mysterious reduce/reduce conflicts. The best way to fix all these problems
is to select a different parser table construction algorithm. Either
IELR(1) or canonical LR(1) would suffice, but the former is more efficient
and easier to debug during development. @xref{LR Table Construction}, for
details. (Bison's IELR(1) and canonical LR(1) implementations are
experimental. More user feedback will help to stabilize them.)

If you instead wish to work around LALR(1)'s limitations, you
can often fix a mysterious conflict by identifying the two parser states
that are being confused, and adding something to make them look
distinct. In the above example, adding one rule to
@code{return_spec} as follows makes the problem go away:

@example
@group
%token BOGUS
@dots{}
%%
@dots{}
return_spec:
          type
        | name ':' type
          /* This rule is never used. */
        | ID BOGUS
        ;
@end group
@end example

This corrects the problem because it introduces the possibility of an
additional active rule in the context after the @code{ID} at the beginning of
@code{return_spec}. This rule is not active in the corresponding context
in a @code{param_spec}, so the two contexts receive distinct parser states.
As long as the token @code{BOGUS} is never generated by @code{yylex},
the added rule cannot alter the way actual input is parsed.

In this particular example, there is another way to solve the problem:
rewrite the rule for @code{return_spec} to use @code{ID} directly
instead of via @code{name}. This also causes the two confusing
contexts to have different sets of active rules, because the one for
@code{return_spec} activates the altered rule for @code{return_spec}
rather than the one for @code{name}.

@example
param_spec:
          type
        | name_list ':' type
        ;
return_spec:
          type
        | ID ':' type
        ;
@end example

For a more detailed exposition of LALR(1) parsers and parser
generators, @pxref{Bibliography,,DeRemer 1982}.

6f04ee6c
JD
6969@node Tuning LR
6970@section Tuning LR
6971
6972The default behavior of Bison's LR-based parsers is chosen mostly for
6973historical reasons, but that behavior is often not robust. For example, in
6974the previous section, we discussed the mysterious conflicts that can be
6975produced by LALR(1), Bison's default parser table construction algorithm.
6976Another example is Bison's @code{%error-verbose} directive, which instructs
6977the generated parser to produce verbose syntax error messages, which can
6978sometimes contain incorrect information.
6979
6980In this section, we explore several modern features of Bison that allow you
6981to tune fundamental aspects of the generated LR-based parsers. Some of
6982these features easily eliminate shortcomings like those mentioned above.
6983Others can be helpful purely for understanding your parser.
6984
6985Most of the features discussed in this section are still experimental. More
6986user feedback will help to stabilize them.
6987
@menu
* LR Table Construction:: Choose a different construction algorithm.
* Default Reductions::    Disable default reductions.
* LAC::                   Correct lookahead sets in the parser states.
* Unreachable States::    Keep unreachable parser states for debugging.
@end menu

@node LR Table Construction
@subsection LR Table Construction
@cindex Mysterious Conflict
@cindex LALR
@cindex IELR
@cindex canonical LR
@findex %define lr.type

For historical reasons, Bison constructs LALR(1) parser tables by default.
However, LALR does not possess the full language-recognition power of LR.
As a result, the behavior of parsers employing LALR parser tables is often
mysterious. We presented a simple example of this effect in
@ref{Mysterious Conflicts}.

As we also demonstrated in that example, the traditional approach to
eliminating such mysterious behavior is to restructure the grammar.
Unfortunately, doing so correctly is often difficult. Moreover, merely
discovering that LALR causes mysterious behavior in your parser can be
difficult as well.

Fortunately, Bison provides an easy way to eliminate the possibility of such
mysterious behavior altogether. You simply need to activate a more powerful
parser table construction algorithm by using the @code{%define lr.type}
directive.

@deffn {Directive} {%define lr.type @var{TYPE}}
Specify the type of parser tables within the LR(1) family. The accepted
values for @var{TYPE} are:

@itemize
@item @code{lalr} (default)
@item @code{ielr}
@item @code{canonical-lr}
@end itemize

(This feature is experimental. More user feedback will help to stabilize
it.)
@end deffn

For example, to activate IELR, you might add the following directive to your
grammar file:

@example
%define lr.type ielr
@end example

@noindent For the example in @ref{Mysterious Conflicts}, the mysterious
conflict is then eliminated, so there is no need to invest time in
comprehending the conflict or restructuring the grammar to fix it. If,
during future development, the grammar evolves such that all mysterious
behavior would have disappeared using just LALR, you need not fear that
continuing to use IELR will result in unnecessarily large parser tables.
That is, IELR generates LALR tables when LALR (using a deterministic parsing
algorithm) is sufficient to support the full language-recognition power of
LR. Thus, by enabling IELR at the start of grammar development, you can
safely and completely eliminate the need to consider LALR's shortcomings.

While IELR is almost always preferable, there are circumstances where LALR
or the canonical LR parser tables described by Knuth
(@pxref{Bibliography,,Knuth 1965}) can be useful. Here we summarize the
relative advantages of each parser table construction algorithm within
Bison:

@itemize
@item LALR

There are at least two scenarios where LALR can be worthwhile:

@itemize
@item GLR without static conflict resolution.

@cindex GLR with LALR
When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
conflicts statically (for example, with @code{%left} or @code{%prec}), then
the parser explores all potential parses of any given input. In this case,
the choice of parser table construction algorithm is guaranteed not to alter
the language accepted by the parser. LALR parser tables are the smallest
parser tables Bison can currently construct, so they may then be preferable.
Nevertheless, once you begin to resolve conflicts statically, GLR behaves
more like a deterministic parser in the syntactic contexts where those
conflicts appear, and so either IELR or canonical LR can then be helpful to
avoid LALR's mysterious behavior.

@item Malformed grammars.

Occasionally during development, an especially malformed grammar with a
major recurring flaw may severely impede the IELR or canonical LR parser
table construction algorithm. LALR can be a quick way to construct parser
tables in order to investigate such problems while ignoring the more subtle
differences from IELR and canonical LR.
@end itemize

@item IELR

IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given
any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
always accept exactly the same set of sentences. However, like LALR, IELR
merges parser states during parser table construction so that the number of
parser states is often an order of magnitude less than for canonical LR.
More importantly, because canonical LR's extra parser states may contain
duplicate conflicts in the case of non-LR grammars, the number of conflicts
for IELR is often an order of magnitude less as well. This effect can
significantly reduce the complexity of developing a grammar.

@item Canonical LR

@cindex delayed syntax error detection
@cindex LAC
@findex %nonassoc
While inefficient, canonical LR parser tables can be an interesting means to
explore a grammar because they possess a property that IELR and LALR tables
do not. That is, if @code{%nonassoc} is not used and default reductions are
left disabled (@pxref{Default Reductions}), then, for every left context of
every canonical LR state, the set of tokens accepted by that state is
guaranteed to be the exact set of tokens that is syntactically acceptable in
that left context. It might then seem that an advantage of canonical LR
parsers in production is that, under the above constraints, they are
guaranteed to detect a syntax error as soon as possible without performing
any unnecessary reductions. However, IELR parsers that use LAC are also
able to achieve this behavior without sacrificing @code{%nonassoc} or
default reductions. For details and a few caveats of LAC, @pxref{LAC}.
@end itemize
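
As a sketch of the combination mentioned in the last item above, a grammar
file intended for production use might request IELR tables together with
LAC by adding both directives to its declarations section. This is only an
illustration; whether it suits a given grammar depends on the caveats
discussed in @ref{LAC}.

@example
/* Full LR language-recognition power, without LALR's mysteries.  */
%define lr.type ielr
/* Detect syntax errors without unnecessary reductions, while still
   allowing %nonassoc and default reductions.  */
%define parse.lac full
@end example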

For a more detailed exposition of the mysterious behavior in LALR parsers
and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and
@ref{Bibliography,,Denny 2010 November}.

@node Default Reductions
@subsection Default Reductions
@cindex default reductions
@findex %define lr.default-reductions
@findex %nonassoc

After parser table construction, Bison identifies the reduction with the
largest lookahead set in each parser state. To reduce the size of the
parser state, traditional Bison behavior is to remove that lookahead set and
to assign that reduction to be the default parser action. Such a reduction
is known as a @dfn{default reduction}.

Default reductions affect more than the size of the parser tables. They
also affect the behavior of the parser:

@itemize
@item Delayed @code{yylex} invocations.

@cindex delayed yylex invocations
@cindex consistent states
@cindex defaulted states
A @dfn{consistent state} is a state that has only one possible parser
action. If that action is a reduction and is encoded as a default
reduction, then that consistent state is called a @dfn{defaulted state}.
Upon reaching a defaulted state, a Bison-generated parser does not bother to
invoke @code{yylex} to fetch the next token before performing the reduction.
In other words, whether default reductions are enabled in consistent states
determines how soon a Bison-generated parser invokes @code{yylex} for a
token: immediately when it @emph{reaches} that token in the input or when it
eventually @emph{needs} that token as a lookahead to determine the next
parser action. Traditionally, default reductions are enabled, and so the
parser exhibits the latter behavior.

The presence of defaulted states is an important consideration when
designing @code{yylex} and the grammar file. That is, if the behavior of
@code{yylex} can influence or be influenced by the semantic actions
associated with the reductions in defaulted states, then the delay of the
next @code{yylex} invocation until after those reductions is significant.
For example, the semantic actions might pop a scope stack that @code{yylex}
uses to determine what token to return. Thus, the delay might be necessary
to ensure that @code{yylex} does not look up the next token in a scope that
should already be considered closed.

@item Delayed syntax error detection.

@cindex delayed syntax error detection
When the parser fetches a new token by invoking @code{yylex}, it checks
whether there is an action for that token in the current parser state. The
parser detects a syntax error if and only if either (1) there is no action
for that token or (2) the action for that token is the error action (due to
the use of @code{%nonassoc}). However, if there is a default reduction in
that state (which might or might not be a defaulted state), then condition 1
cannot hold. That is, all tokens have an action.
Thus, the parser sometimes fails to detect the syntax error until it reaches
a later state.

@cindex LAC
@c If there's an infinite loop, default reductions can prevent an incorrect
@c sentence from being rejected.
While default reductions never cause the parser to accept syntactically
incorrect sentences, the delay of syntax error detection can have unexpected
effects on the behavior of the parser. However, the delay can be caused
anyway by parser state merging and the use of @code{%nonassoc}, and it can
be fixed by another Bison feature, LAC. We discuss the effects of delayed
syntax error detection and LAC more in the next section (@pxref{LAC}).
@end itemize
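
For instance, in the following fragment (where @code{NUM} is assumed to be
a token declared elsewhere), @code{%nonassoc} makes the second @code{'<'} in
an input such as @samp{1 < 2 < 3} trigger the error action, which is
condition 2 above:

@example
%nonassoc '<'
%%
/* NUM is assumed to be declared elsewhere.  */
exp:
    exp '<' exp
  | NUM
  ;
@end example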

For canonical LR, the only default reduction that Bison enables by default
is the accept action, which appears only in the accepting state, which has
no other action and is thus a defaulted state. However, the default accept
action does not delay any @code{yylex} invocation or syntax error detection
because the accept action ends the parse.

For LALR and IELR, Bison enables default reductions in nearly all states by
default. There are only two exceptions. First, states that have a shift
action on the @code{error} token do not have default reductions because
delayed syntax error detection could then prevent the @code{error} token
from ever being shifted in that state. However, parser state merging can
cause the same effect anyway, and LAC fixes it in both cases, so future
versions of Bison might drop this exception when LAC is activated. Second,
GLR parsers do not record the default reduction as the action on a lookahead
token for which there is a conflict. The correct action in this case is to
split the parse instead.

To adjust which states have default reductions enabled, use the
@code{%define lr.default-reductions} directive.

@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
Specify the kind of states that are permitted to contain default reductions.
The accepted values of @var{WHERE} are:
@itemize
@item @code{most} (default for LALR and IELR)
@item @code{consistent}
@item @code{accepting} (default for canonical LR)
@end itemize

(The ability to specify where default reductions are permitted is
experimental. More user feedback will help to stabilize it.)
@end deffn
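
As an illustration of the scope-stack situation described earlier in this
section, consider the following sketch, in which @code{push_scope} and
@code{pop_scope} are hypothetical functions that maintain a scope stack
consulted by @code{yylex}, and @code{stmts} is assumed to be defined
elsewhere. After the closing brace is shifted, the parser typically reaches
a consistent state whose only action is to reduce @code{block}. With
@code{consistent} (or @code{most}), that state is a defaulted state, so
@code{pop_scope} runs before @code{yylex} is asked for the next token. With
@code{accepting}, the parser would instead fetch that token first, while the
inner scope is still open.

@example
%define lr.default-reductions consistent
%%
/* push_scope and pop_scope are hypothetical helpers.  */
block:
    '@{' @{ push_scope (); @} stmts '@}' @{ pop_scope (); @}
  ;
@end example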

@node LAC
@subsection LAC
@findex %define parse.lac
@cindex LAC
@cindex lookahead correction

Canonical LR, IELR, and LALR can suffer from a couple of problems upon
encountering a syntax error. First, the parser might perform additional
parser stack reductions before discovering the syntax error. Such
reductions can perform user semantic actions that are unexpected because
they are based on an invalid token, and they cause error recovery to begin
in a different syntactic context than the one in which the invalid token was
encountered. Second, when verbose error messages are enabled (@pxref{Error
Reporting}), the expected token list in the syntax error message can both
contain invalid tokens and omit valid tokens.

The culprits for the above problems are @code{%nonassoc}, default reductions
in inconsistent states (@pxref{Default Reductions}), and parser state
merging. Because IELR and LALR merge parser states, they suffer the most.
Canonical LR can suffer only if @code{%nonassoc} is used or if default
reductions are enabled for inconsistent states.

LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
that solves these problems for canonical LR, IELR, and LALR without
sacrificing @code{%nonassoc}, default reductions, or state merging. You can
enable LAC with the @code{%define parse.lac} directive.

@deffn {Directive} {%define parse.lac @var{VALUE}}
Enable LAC to improve syntax error handling.
@itemize
@item @code{none} (default)
@item @code{full}
@end itemize
(This feature is experimental. More user feedback will help to stabilize
it. Moreover, it is currently only available for deterministic parsers in
C.)
@end deffn
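
For example, the following declarations, shown only as a sketch, keep the
default LALR tables but enable LAC, so that the expected token list in a
verbose syntax error message, when it is reported, reflects LALR's actual
language-recognition power. As noted above, LAC is currently available only
for deterministic parsers in C.

@example
/* Keep the default LALR tables; correct the lookahead sets.  */
%error-verbose
%define parse.lac full
@end example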

Conceptually, the LAC mechanism is straightforward. Whenever the parser
fetches a new token from the scanner so that it can determine the next
parser action, it immediately suspends normal parsing and performs an
exploratory parse using a temporary copy of the normal parser state stack.
During this exploratory parse, the parser does not perform user semantic
actions. If the exploratory parse reaches a shift action, normal parsing
then resumes on the normal parser stacks. If the exploratory parse reaches
an error instead, the parser reports a syntax error. If verbose syntax
error messages are enabled, the parser must then discover the list of
expected tokens, so it performs a separate exploratory parse for each token
in the grammar.

There is one subtlety about the use of LAC. That is, when in a consistent
parser state with a default reduction, the parser will not attempt to fetch
a token from the scanner because no lookahead is needed to determine the
next parser action. Thus, whether default reductions are enabled in
consistent states (@pxref{Default Reductions}) affects how soon the parser
detects a syntax error: immediately when it @emph{reaches} an erroneous
token or when it eventually @emph{needs} that token as a lookahead to
determine the next parser action. The latter behavior is probably more
intuitive, so Bison currently provides no way to achieve the former behavior
while default reductions are enabled in consistent states.

Thus, when LAC is in use, for some fixed decision of whether to enable
default reductions in consistent states, canonical LR and IELR behave almost
exactly the same for both syntactically acceptable and syntactically
unacceptable input. While LALR still does not support the full
language-recognition power of canonical LR and IELR, LAC at least enables
LALR's syntax error handling to correctly reflect LALR's
language-recognition power.

There are a few caveats to consider when using LAC:

@itemize
@item Infinite parsing loops.

IELR plus LAC does have one shortcoming relative to canonical LR. Some
parsers generated by Bison can loop infinitely. LAC does not fix infinite
parsing loops that occur between encountering a syntax error and detecting
it, but enabling canonical LR or disabling default reductions sometimes
does.

@item Verbose error message limitations.

Because of internationalization considerations, Bison-generated parsers
limit the size of the expected token list they are willing to report in a
verbose syntax error message. If the number of expected tokens exceeds that
limit, the list is simply dropped from the message. Enabling LAC can
increase the size of the list and thus cause the parser to drop it. Of
course, dropping the list is better than reporting an incorrect list.

@item Performance.

Because LAC requires many parse actions to be performed twice, it can have a
performance penalty. However, not all parse actions must be performed
twice. Specifically, during a series of default reductions in consistent
states and shift actions, the parser never has to initiate an exploratory
parse. Moreover, the most time-consuming tasks in a parse are often the
file I/O, the lexical analysis performed by the scanner, and the user's
semantic actions, but none of these are performed during the exploratory
parse. Finally, the base of the temporary stack used during an exploratory
parse is a pointer into the normal parser state stack so that the stack is
never physically copied. In our experience, the performance penalty of LAC
has proven insignificant for practical grammars.
@end itemize

The LAC algorithm shares techniques that have been recognized in the parser
community for years. For the publication that introduces LAC,
@pxref{Bibliography,,Denny 2010 May}.

@node Unreachable States
@subsection Unreachable States
@findex %define lr.keep-unreachable-states
@cindex unreachable states

If there exists no sequence of transitions from the parser's start state to
some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
state}. A state can become unreachable during conflict resolution if Bison
disables a shift action leading to it from a predecessor state.

By default, Bison removes unreachable states from the parser after conflict
resolution because they are useless in the generated parser. However,
keeping unreachable states is sometimes useful when trying to understand the
relationship between the parser and the grammar.

@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
Request that Bison allow unreachable states to remain in the parser tables.
@var{VALUE} must be a Boolean. The default is @code{false}.
@end deffn
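
For instance, while studying how conflict resolution affects the parser
tables, you might temporarily add the following declaration and then
inspect the state descriptions that @samp{bison --report=state} writes to
the @file{.output} file. This is a sketch of one possible workflow, not a
required procedure.

@example
/* For table analysis only; remove when the investigation is done.  */
%define lr.keep-unreachable-states true
@end example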

There are a few caveats to consider:

@itemize @bullet
@item Missing or extraneous warnings.

Unreachable states may contain conflicts and may use rules not used in any
other state. Thus, keeping unreachable states may induce warnings that are
irrelevant to your parser's behavior, and it may eliminate warnings that are
relevant. Of course, the change in warnings may actually be relevant to a
parser table analysis that wants to keep unreachable states, so this
behavior will likely remain in future Bison releases.

@item Other useless states.

While Bison is able to remove unreachable states, it is not guaranteed to
remove other kinds of useless states. Specifically, when Bison disables
reduce actions during conflict resolution, some goto actions may become
useless, and thus some additional states may become useless. If Bison were
to compute which goto actions were useless and then disable those actions,
it could identify such states as unreachable and then remove those states.
However, Bison does not compute which goto actions are useless.
@end itemize

@node Generalized LR Parsing
@section Generalized LR (GLR) Parsing
@cindex GLR parsing
@cindex generalized LR (GLR) parsing
@cindex ambiguous grammars
@cindex nondeterministic parsing

Bison produces @emph{deterministic} parsers that choose uniquely
when to reduce and which reduction to apply
based on a summary of the preceding input and on one extra token of lookahead.
As a result, normal Bison handles a proper subset of the family of
context-free languages.
Ambiguous grammars, since they have strings with more than one possible
sequence of reductions, cannot have deterministic parsers in this sense.
The same is true of languages that require more than one symbol of
lookahead, since the parser lacks the information necessary to make a
decision at the point it must be made in a shift-reduce parser.
Finally, as previously mentioned (@pxref{Mysterious Conflicts}),
there are languages where Bison's default choice of how to
summarize the input seen so far loses necessary information.

When you use the @samp{%glr-parser} declaration in your grammar file,
Bison generates a parser that uses a different algorithm, called
Generalized LR (or GLR). A Bison GLR parser uses the same basic
algorithm for parsing as an ordinary Bison parser, but behaves
differently in cases where there is a shift-reduce conflict that has not
been resolved by precedence rules (@pxref{Precedence}) or a
reduce-reduce conflict. When a GLR parser encounters such a
situation, it effectively @emph{splits} into several parsers, one for each
possible shift or reduction. These parsers then proceed as usual, consuming
tokens in lock-step. Some of the stacks may encounter other conflicts
and split further, with the result that instead of a sequence of states,
a Bison GLR parsing stack is what is in effect a tree of states.

In effect, each stack represents a guess as to what the proper parse
is. Additional input may indicate that a guess was wrong, in which case
the appropriate stack silently disappears. Otherwise, the semantic
actions generated in each stack are saved, rather than being executed
immediately. When a stack disappears, its saved semantic actions never
get executed. When a reduction causes two stacks to become equivalent,
their sets of semantic actions are both saved with the state that
results from the reduction. We say that two stacks are equivalent
when they both represent the same sequence of states,
and each pair of corresponding states represents a
grammar symbol that produces the same segment of the input token
stream.

Whenever the parser makes a transition from having multiple
states to having one, it reverts to the normal deterministic parsing
algorithm, after resolving and executing the saved-up actions.
At this transition, some of the states on the stack will have semantic
values that are sets (actually multisets) of possible actions. The
parser tries to pick one of the actions by first finding one whose rule
has the highest dynamic precedence, as set by the @samp{%dprec}
declaration. Otherwise, if the alternative actions are not ordered by
precedence, but the same merging function is declared for both
rules by the @samp{%merge} declaration,
Bison resolves and evaluates both and then calls the merge function on
the result. Otherwise, it reports an ambiguity.
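
As a sketch of these declarations, the fragment below (with @code{expr} and
@code{decl} assumed to be defined elsewhere) requests a GLR parser and
assigns dynamic precedences to two alternatives that may both match the same
input. When both parses survive, the one using the @code{decl} alternative is
chosen, because its rule has the higher @code{%dprec} value.

@example
%glr-parser
%%
/* expr and decl are assumed to be defined elsewhere.  */
stmt:
    expr ';'  %dprec 1
  | decl      %dprec 2
  ;
@end example

@noindent
To combine both interpretations instead, you would give each alternative the
same @samp{%merge <@var{function}>} declaration, where @var{function} is a
merge function that you write.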

It is possible to use a data structure for the GLR parsing tree that
permits the processing of any LR(1) grammar in linear time (in the
size of the input), any unambiguous (not necessarily LR(1)) grammar in
quadratic worst-case time, and any general (possibly ambiguous)
context-free grammar in cubic worst-case time. However, Bison currently
uses a simpler data structure that requires time proportional to the
length of the input times the maximum number of stacks required for any
prefix of the input. Thus, really ambiguous or nondeterministic
grammars can require exponential time and space to process. Such badly
behaving examples, however, are not generally of practical interest.
Usually, nondeterminism in a grammar is local---the parser is ``in
doubt'' only for a few tokens at a time. Therefore, the current data
structure should generally be adequate. On LR(1) portions of a
grammar, in particular, it is only slightly slower than with the
deterministic LR(1) Bison parser.

For a more detailed exposition of GLR parsers,
@pxref{Bibliography,,Scott 2000}.

@node Memory Management
@section Memory Management, and How to Avoid Memory Exhaustion
@cindex memory exhaustion
@cindex memory management
@cindex stack overflow
@cindex parser stack overflow
@cindex overflow of parser stack

The Bison parser stack can run out of memory if too many tokens are shifted and
not reduced. When this happens, the parser function @code{yyparse}
calls @code{yyerror} and then returns 2.

Because Bison parsers have growing stacks, hitting the upper limit
usually results from using a right recursion instead of a left
recursion (@pxref{Recursion, ,Recursive Rules}).
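
The difference is easy to see in a comma-separated list of expressions. In
the sketch below, the two rules (whose names are hypothetical) describe the
same list. With left recursion, each @code{exp} is reduced into the list as
soon as it is parsed, so the stack depth does not grow with the length of
the list; with right recursion, every @code{exp} stays on the stack until
the last one has been read.

@example
/* Hypothetical names.  Left recursion: stack depth independent
   of the list length.  */
exps_left:
    exp
  | exps_left ',' exp
  ;

/* Right recursion: stack depth grows with the list length.  */
exps_right:
    exp
  | exp ',' exps_right
  ;
@end example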

@vindex YYMAXDEPTH
By defining the macro @code{YYMAXDEPTH}, you can control how deep the
parser stack can become before memory is exhausted. Define the
macro with a value that is an integer. This value is the maximum number
of tokens that can be shifted (and not reduced) before overflow.

The stack space allowed is not necessarily allocated. If you specify a
large value for @code{YYMAXDEPTH}, the parser normally allocates a small
stack at first, and then makes it bigger by stages as needed. This
increasing allocation happens automatically and silently. Therefore,
you do not need to make @code{YYMAXDEPTH} painfully small merely to save
space for ordinary inputs that do not need much stack.

However, do not allow @code{YYMAXDEPTH} to be a value so large that
arithmetic overflow could occur when calculating the size of the stack
space. Also, do not allow @code{YYMAXDEPTH} to be less than
@code{YYINITDEPTH}.

@cindex default stack limit
The default value of @code{YYMAXDEPTH}, if you do not define it, is
10000.

@vindex YYINITDEPTH
You can control how much stack is allocated initially by defining the
macro @code{YYINITDEPTH} to a positive integer. For the deterministic
parser in C, this value must be a compile-time constant
unless you are assuming C99 or some other target language or compiler
that allows variable-length arrays. The default is 200.

Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
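
For example, a grammar expected to handle deeply nested input might raise
both limits in its prologue. The values below are arbitrary illustrations,
not recommendations; they merely respect the constraint that
@code{YYINITDEPTH} does not exceed @code{YYMAXDEPTH}.

@example
%@{
/* Illustrative values only.  */
#define YYINITDEPTH 1000
#define YYMAXDEPTH 100000
%@}
@end example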

@c FIXME: C++ output.
Because of semantic differences between C and C++, the deterministic
parsers in C produced by Bison cannot grow when compiled
by C++ compilers. In this precise case (compiling a C parser as C++), you
should increase @code{YYINITDEPTH}, since the stack cannot grow beyond its
initial size. The Bison maintainers hope to fix this deficiency in a future
release.

@node Error Recovery
@chapter Error Recovery
@cindex error recovery
@cindex recovery from errors

It is not usually acceptable to have a program terminate on a syntax
error. For example, a compiler should recover sufficiently to parse the
rest of the input file and check it for errors; a calculator should accept
another expression.

In a simple interactive command parser where each input is one line, it may
be sufficient to allow @code{yyparse} to return 1 on error and have the
caller ignore the rest of the input line when that happens (and then call
@code{yyparse} again). But this is inadequate for a compiler, because it
forgets all the syntactic context leading up to the error. A syntax error
deep within a function in the compiler input should not cause the compiler
to treat the following line like the beginning of a source file.

@findex error
You can define how to recover from a syntax error by writing rules to
recognize the special token @code{error}. This is a terminal symbol that
is always defined (you need not declare it) and reserved for error
handling. The Bison parser generates an @code{error} token whenever a
syntax error happens; if you have provided a rule to recognize this token
in the current context, the parse can continue.

For example:

@example
stmnts:  /* empty string */
        | stmnts '\n'
        | stmnts exp '\n'
        | stmnts error '\n'
@end example

The fourth rule in this example says that an error followed by a newline
makes a valid addition to any @code{stmnts}.

What happens if a syntax error occurs in the middle of an @code{exp}? The
error recovery rule, interpreted strictly, applies to the precise sequence
of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
the middle of an @code{exp}, there will probably be some additional tokens
and subexpressions on the stack after the last @code{stmnts}, and there
will be tokens to read before the next newline. So the rule is not
applicable in the ordinary way.

But Bison can force the situation to fit the rule, by discarding part of
the semantic context and part of the input. First it discards states
and objects from the stack until it gets back to a state in which the
@code{error} token is acceptable. (This means that the subexpressions
already parsed are discarded, back to the last complete @code{stmnts}.)
At this point the @code{error} token can be shifted. Then, if the old
lookahead token is not acceptable to be shifted next, the parser reads
tokens and discards them until it finds a token which is acceptable. In
this example, Bison reads and discards input until the next newline so
that the fourth rule can apply. Note that discarded symbols are
possible sources of memory leaks; see @ref{Destructor Decl, , Freeing
Discarded Symbols}, for a means to reclaim this memory.

7568The choice of error rules in the grammar is a choice of strategies for
7569error recovery. A simple and useful strategy is simply to skip the rest of
7570the current input line or current statement if an error is detected:
7571
7572@example
72d2299c 7573stmnt: error ';' /* On error, skip until ';' is read. */
bfa74976
RS
7574@end example
7575
7576It is also useful to recover to the matching close-delimiter of an
7577opening-delimiter that has already been parsed. Otherwise the
7578close-delimiter will probably appear to be unmatched, and generate another,
7579spurious error message:
7580
7581@example
7582primary: '(' expr ')'
7583 | '(' error ')'
7584 @dots{}
7585 ;
7586@end example
7587
Error recovery strategies are necessarily guesses. When they guess wrong,
one syntax error often leads to another. In the above example, the error
recovery rule guesses that an error is due to bad input within one
@code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
middle of a valid @code{stmnt}. After the error recovery rule recovers
from the first error, another syntax error will be found straightaway,
since the text following the spurious semicolon is also an invalid
@code{stmnt}.

To prevent an outpouring of error messages, the parser will output no error
message for another syntax error that happens shortly after the first; only
after three consecutive input tokens have been successfully shifted will
error messages resume.

Note that rules which accept the @code{error} token may have actions, just
as any other rules can.

@findex yyerrok
You can make error messages resume immediately by using the macro
@code{yyerrok} in an action. If you do this in the error rule's action, no
error messages will be suppressed. This macro requires no arguments;
@samp{yyerrok;} is a valid C statement.

@findex yyclearin
The previous lookahead token is reanalyzed immediately after an error. If
this is unacceptable, then the macro @code{yyclearin} may be used to clear
this token. Write the statement @samp{yyclearin;} in the error rule's
action.
@xref{Action Features, ,Special Features for Use in Actions}.

For example, suppose that on a syntax error, an error handling routine is
called that advances the input stream to some point where parsing should
once again commence. The next symbol returned by the lexical scanner is
probably correct. The previous lookahead token ought to be discarded
with @samp{yyclearin;}.

@vindex YYRECOVERING
The expression @code{YYRECOVERING ()} yields 1 when the parser
is recovering from a syntax error, and 0 otherwise.
Syntax error diagnostics are suppressed while recovering from a syntax
error.

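The suppression rule, @code{yyerrok}, and @code{YYRECOVERING ()} can be pictured as a single countdown of successful shifts. The following C sketch is a simplified model for illustration, not the code Bison generates: @code{struct recovery} and the functions @code{on_syntax_error}, @code{on_shift}, @code{errok}, and @code{recovering} are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of the error-message suppression rule; this is
   not the code that Bison generates.  */
struct recovery
{
  int countdown;   /* successful shifts still needed before reporting */
  int reported;    /* number of syntax errors actually reported */
};

/* A syntax error was detected: report it only if the parser is not
   already recovering, then require three more shifts.  */
static bool on_syntax_error (struct recovery *r)
{
  bool report = r->countdown == 0;
  if (report)
    r->reported++;
  r->countdown = 3;
  return report;
}

/* An input token was shifted successfully.  */
static void on_shift (struct recovery *r)
{
  if (r->countdown > 0)
    r->countdown--;
}

/* Plays the role of yyerrok: resume error messages immediately.  */
static void errok (struct recovery *r)
{
  r->countdown = 0;
}

/* Plays the role of YYRECOVERING (): true while recovering.  */
static bool recovering (const struct recovery *r)
{
  return r->countdown != 0;
}
```

In this model, an error detected within three shifts of the previous one is not reported and restarts the countdown, whereas calling @code{errok} (like @code{yyerrok}) makes the next error visible at once.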
@node Context Dependency
@chapter Handling Context Dependencies

The Bison paradigm is to parse tokens first, then group them into larger
syntactic units. In many languages, the meaning of a token is affected by
its context. Although this violates the Bison paradigm, certain techniques
(known as @dfn{kludges}) may enable you to write Bison parsers for such
languages.

@menu
* Semantic Tokens::   Token parsing can depend on the semantic context.
* Lexical Tie-ins::   Token parsing can depend on the syntactic context.
* Tie-in Recovery::   Lexical tie-ins have implications for how
                        error recovery rules must be written.
@end menu

(Actually, ``kludge'' means any technique that gets its job done but is
neither clean nor robust.)

@node Semantic Tokens
@section Semantic Info in Token Types

The C language has a context dependency: the way an identifier is used
depends on what its current meaning is. For example, consider this:

@example
foo (x);
@end example

This looks like a function call statement, but if @code{foo} is a typedef
name, then this is actually a declaration of @code{x}. How can a Bison
parser for C decide how to parse this input?

The method used in GNU C is to have two different token types,
@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
identifier, it looks up the current declaration of the identifier in order
to decide which token type to return: @code{TYPENAME} if the identifier is
declared as a typedef, @code{IDENTIFIER} otherwise.
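The lookup step can be sketched in C as follows. This is a toy model assuming a flat table of typedef names; the token codes, @code{typedef_names}, @code{declare_typedef}, and @code{identifier_token} are hypothetical, and a real C front end must also handle scopes and shadowing.

```c
#include <stddef.h>
#include <string.h>

enum { IDENTIFIER = 258, TYPENAME };  /* hypothetical token codes */

/* Toy symbol table: the names currently declared with typedef.  */
static const char *typedef_names[16];
static size_t n_typedefs;

/* Hypothetically called from the grammar actions that handle a
   typedef declaration.  */
static void declare_typedef (const char *name)
{
  if (n_typedefs < sizeof typedef_names / sizeof *typedef_names)
    typedef_names[n_typedefs++] = name;
}

/* What yylex does once it has read an identifier: consult the current
   declarations to choose the token type.  */
static int identifier_token (const char *name)
{
  for (size_t i = 0; i < n_typedefs; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

Because the table is updated by the parser's own actions, the same spelling @samp{foo} is returned as @code{IDENTIFIER} before @samp{typedef int foo;} has been parsed and as @code{TYPENAME} afterward.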

The grammar rules can then express the context dependency by the choice of
token type to recognize. @code{IDENTIFIER} is accepted as an expression,
but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
is @emph{not} significant, such as in declarations that can shadow a
typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
accepted---there is one rule for each of the two token types.

This technique is simple to use if the decision of which kinds of
identifiers to allow is made at a place close to where the identifier is
parsed. But in C this is not always so: C allows a declaration to
redeclare a typedef name provided an explicit type has been specified
earlier:

@example
@group
typedef int foo, bar;
int baz (void)
@{
  static bar (bar);      /* @r{redeclare @code{bar} as static variable} */
  extern foo foo (foo);  /* @r{redeclare @code{foo} as function} */
  return foo (bar);
@}
@end group
@end example

Unfortunately, the name being declared is separated from the declaration
construct itself by a complicated syntactic structure---the ``declarator''.

As a result, part of the Bison parser for C needs to be duplicated, with
all the nonterminal names changed: once for parsing a declaration in
which a typedef name can be redefined, and once for parsing a
declaration in which that can't be done. Here is a part of the
duplication, with actions omitted for brevity:

@example
@group
initdcl:
          declarator maybeasm '='
          init
        | declarator maybeasm
        ;
@end group

@group
notype_initdcl:
          notype_declarator maybeasm '='
          init
        | notype_declarator maybeasm
        ;
@end group
@end example

@noindent
Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
cannot. The distinction between @code{declarator} and
@code{notype_declarator} is the same sort of thing.

There is some similarity between this technique and a lexical tie-in
(described next), in that information which alters the lexical analysis is
changed during parsing by other parts of the program. The difference is
that here the information is global, and is used for other purposes in the
program. A true lexical tie-in has a special-purpose flag controlled by
the syntactic context.

@node Lexical Tie-ins
@section Lexical Tie-ins
@cindex lexical tie-in

One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
which is set by Bison actions, whose purpose is to alter the way tokens are
parsed.

For example, suppose we have a language vaguely like C, but with a special
construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
an expression in parentheses in which all integers are hexadecimal. In
particular, the token @samp{a1b} must be treated as an integer rather than
as an identifier if it appears in that context. Here is how you can do it:

@example
@group
%@{
  int hexflag;
  int yylex (void);
  void yyerror (char const *);
%@}
%%
@dots{}
@end group
@group
expr:   IDENTIFIER
      | constant
      | HEX '('
                @{ hexflag = 1; @}
          expr ')'
                @{ hexflag = 0;
                   $$ = $4; @}
      | expr '+' expr
                @{ $$ = make_sum ($1, $3); @}
      @dots{}
      ;
@end group

@group
constant:
          INTEGER
        | STRING
        ;
@end group
@end example

@noindent
Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
it is nonzero, all integers are parsed in hexadecimal, and tokens starting
with letters are parsed as integers if possible.

The declaration of @code{hexflag} shown in the prologue of the grammar
file is needed to make it accessible to the actions (@pxref{Prologue,
,The Prologue}). You must also write the code in @code{yylex} to obey
the flag.
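Here is one way the word-classification part of such a @code{yylex} might obey @code{hexflag}. It is a sketch that assumes the scanner has already isolated a word made of letters and digits; the token codes and the function @code{classify_word} are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

enum { IDENTIFIER = 258, INTEGER };  /* hypothetical token codes */

int hexflag;  /* set and cleared by the grammar actions */

/* Return INTEGER (storing its value in *VALUE) or IDENTIFIER for
   WORD, a run of letters and digits isolated by the scanner.  */
static int classify_word (const char *word, long *value)
{
  char *end;
  if (hexflag)
    {
      /* Inside hex (...): accept any word that strtol reads completely
         as a base-16 number, so "a1b" becomes an integer.  */
      long v = strtol (word, &end, 16);
      if (end != word && *end == '\0')
        {
          *value = v;
          return INTEGER;
        }
    }
  else if (isdigit ((unsigned char) word[0]))
    {
      long v = strtol (word, &end, 10);
      if (*end == '\0')
        {
          *value = v;
          return INTEGER;
        }
    }
  return IDENTIFIER;
}
```

With @code{hexflag} clear, @samp{a1b} is an identifier; once the action after @samp{HEX '('} sets the flag, the same word yields the integer 2587 (@code{0xa1b}).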

@node Tie-in Recovery
@section Lexical Tie-ins and Error Recovery

Lexical tie-ins make strict demands on any error recovery rules you have.
@xref{Error Recovery}.

The reason for this is that the purpose of an error recovery rule is to
abort the parsing of one construct and resume in some larger construct.
For example, in C-like languages, a typical error recovery rule is to skip
tokens until the next semicolon, and then start a new statement, like this:

@example
stmt:   expr ';'
      | IF '(' expr ')' stmt @{ @dots{} @}
      @dots{}
      | error ';'
                @{ hexflag = 0; @}
      ;
@end example

If there is a syntax error in the middle of a @samp{hex (@var{expr})}
construct, this error rule will apply, and then the action for the
completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
remain set for the entire rest of the input, or until the next @code{hex}
keyword, causing identifiers to be misinterpreted as integers.

To avoid this problem, the error recovery rule itself clears @code{hexflag}.

There may also be an error recovery rule that works within expressions.
For example, there could be a rule which applies within parentheses
and skips to the close-parenthesis:

@example
@group
expr:   @dots{}
      | '(' expr ')'
                @{ $$ = $2; @}
      | '(' error ')'
      @dots{}
@end group
@end example

If this rule acts within the @code{hex} construct, it is not going to abort
that construct (since it applies to an inner level of parentheses within
the construct). Therefore, it should not clear the flag: the rest of
the @code{hex} construct should be parsed with the flag still in effect.

What if there is an error recovery rule which might abort out of the
@code{hex} construct or might not, depending on circumstances? There is no
way you can write the action to determine whether a @code{hex} construct is
being aborted or not. So if you are using a lexical tie-in, you had better
make sure your error recovery rules are not of this kind. Each rule must
be such that you can be sure that it always will, or always won't, have to
clear the flag.

@c ================================================== Debugging Your Parser

@node Debugging
@chapter Debugging Your Parser

Developing a parser can be a challenge, especially if you don't
understand the algorithm (@pxref{Algorithm, ,The Bison Parser
Algorithm}). Even so, sometimes a detailed description of the automaton
can help (@pxref{Understanding, , Understanding Your Parser}), or
tracing the execution of the parser can give some insight into why it
behaves improperly (@pxref{Tracing, , Tracing Your Parser}).

@menu
* Understanding::     Understanding the structure of your parser.
* Tracing::           Tracing the execution of your parser.
@end menu

@node Understanding
@section Understanding Your Parser

As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm}),
Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
frequent than one would hope), looking at this automaton is required to
tune or simply fix a parser. Bison provides two different
representations of it, either textually or graphically (as a DOT file).

The textual file is generated when the options @option{--report} or
@option{--verbose} are specified (@pxref{Invocation, , Invoking
Bison}). Its name is made by removing @samp{.tab.c} or @samp{.c} from
the parser implementation file name, and adding @samp{.output}
instead. Therefore, if the grammar file is @file{foo.y}, then the
parser implementation file is called @file{foo.tab.c} by default. As
a consequence, the verbose output file is called @file{foo.output}.
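The naming rule for the report file can be written down as a small C function. This only illustrates the rule stated above and is not Bison's own code; @code{report_file_name} is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Derive the report file name from the parser implementation file
   name: drop a trailing ".tab.c" (or else ".c") and append ".output".
   Returns a malloc'd string that the caller frees, or NULL.  */
static char *report_file_name (const char *impl)
{
  static const char *const suffixes[] = { ".tab.c", ".c" };
  size_t len = strlen (impl);
  for (size_t i = 0; i < sizeof suffixes / sizeof *suffixes; i++)
    {
      size_t n = strlen (suffixes[i]);
      if (len >= n && strcmp (impl + len - n, suffixes[i]) == 0)
        {
          len -= n;
          break;
        }
    }
  char *out = malloc (len + sizeof ".output");
  if (out)
    {
      memcpy (out, impl, len);
      strcpy (out + len, ".output");
    }
  return out;
}
```

The same rule maps @file{y.tab.c}, the name used with @option{-y}, to @file{y.output}.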

The following grammar file, @file{calc.y}, will be used in the sequel:

@example
%token NUM STR
%left '+' '-'
%left '*'
%%
exp: exp '+' exp
   | exp '-' exp
   | exp '*' exp
   | exp '/' exp
   | NUM
   ;
useless: STR;
%%
@end example

@command{bison} reports:

@example
calc.y: warning: 1 nonterminal useless in grammar
calc.y: warning: 1 rule useless in grammar
calc.y:11.1-7: warning: nonterminal useless in grammar: useless
calc.y:11.10-12: warning: rule useless in grammar: useless: STR
calc.y: conflicts: 7 shift/reduce
@end example

When given @option{--report=state}, in addition to @file{calc.tab.c}, it
creates a file @file{calc.output} with contents detailed below. The
order of the output and the exact presentation might vary, but the
interpretation is the same.

The first section includes details on conflicts that were solved thanks
to precedence and/or associativity:

@example
Conflict in state 8 between rule 1 and token '+' resolved as reduce.
Conflict in state 8 between rule 1 and token '-' resolved as reduce.
Conflict in state 8 between rule 1 and token '*' resolved as shift.
@exdent @dots{}
@end example

@noindent
The next section lists states that still have conflicts.

@example
State 8 conflicts: 1 shift/reduce
State 9 conflicts: 1 shift/reduce
State 10 conflicts: 1 shift/reduce
State 11 conflicts: 4 shift/reduce
@end example

@noindent
@cindex token, useless
@cindex useless token
@cindex nonterminal, useless
@cindex useless nonterminal
@cindex rule, useless
@cindex useless rule
The next section reports useless tokens, nonterminals, and rules. Useless
nonterminals and rules are removed in order to produce a smaller parser,
but useless tokens are preserved, since they might be used by the
scanner (note the difference between ``useless'' and ``unused''
below):

@example
Nonterminals useless in grammar:
   useless

Terminals unused in grammar:
   STR

Rules useless in grammar:
#6     useless: STR;
@end example

@noindent
The next section reproduces the exact grammar that Bison used:

@example
Grammar

  Number, Line, Rule
    0   5 $accept -> exp $end
    1   5 exp -> exp '+' exp
    2   6 exp -> exp '-' exp
    3   7 exp -> exp '*' exp
    4   8 exp -> exp '/' exp
    5   9 exp -> NUM
@end example

@noindent
and reports the uses of the symbols:

@example
@group
Terminals, with rules where they appear

$end (0) 0
'*' (42) 3
'+' (43) 1
'-' (45) 2
'/' (47) 4
error (256)
NUM (258) 5
@end group

@group
Nonterminals, with rules where they appear

$accept (8)
    on left: 0
exp (9)
    on left: 1 2 3 4 5, on right: 0 1 2 3 4
@end group
@end example

@noindent
@cindex item
@cindex pointed rule
@cindex rule, pointed
Bison then proceeds to the automaton itself, describing each state
with its set of @dfn{items}, also known as @dfn{pointed rules}. Each
item is a production rule together with a point (@samp{.}) marking
the location of the input cursor.

@example
state 0

    $accept  ->  . exp $   (rule 0)

    NUM         shift, and go to state 1

    exp         go to state 2
@end example

This reads as follows: ``state 0 corresponds to being at the very
beginning of the parsing, in the initial rule, right before the start
symbol (here, @code{exp}). When the parser returns to this state right
after having reduced a rule that produced an @code{exp}, the control
flow jumps to state 2. If there is no such transition on a nonterminal
symbol, and the lookahead is a @code{NUM}, then this token is shifted onto
the parse stack, and the control flow jumps to state 1. Any other
lookahead triggers a syntax error.''

@cindex core, item set
@cindex item set core
@cindex kernel, item set
@cindex item set kernel
Even though the only active rule in state 0 seems to be rule 0, the
report lists @code{NUM} as a lookahead token because @code{NUM} can be
at the beginning of any rule deriving an @code{exp}. By default Bison
reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
you want to see more detail you can invoke @command{bison} with
@option{--report=itemset} to list the derived items as well:

@example
state 0

    $accept  ->  . exp $   (rule 0)
    exp  ->  . exp '+' exp   (rule 1)
    exp  ->  . exp '-' exp   (rule 2)
    exp  ->  . exp '*' exp   (rule 3)
    exp  ->  . exp '/' exp   (rule 4)
    exp  ->  . NUM   (rule 5)

    NUM         shift, and go to state 1

    exp         go to state 2
@end example

@noindent
In state 1@dots{}

@example
state 1

    exp  ->  NUM .   (rule 5)

    $default    reduce using rule 5 (exp)
@end example

@noindent
rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token
(@samp{$default}), the parser reduces this rule. If it was coming from
state 0, then after this reduction it returns to state 0 and jumps
to state 2 (@samp{exp: go to state 2}).

@example
state 2

    $accept  ->  exp . $   (rule 0)
    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    $           shift, and go to state 3
    '+'         shift, and go to state 4
    '-'         shift, and go to state 5
    '*'         shift, and go to state 6
    '/'         shift, and go to state 7
@end example

@noindent
In state 2, the automaton can only shift a symbol. For instance,
because of the item @samp{exp -> exp . '+' exp}, if the lookahead is
@samp{+} it is shifted onto the parse stack, and the automaton
jumps to state 4, corresponding to the item @samp{exp -> exp '+' . exp}.
Since there is no default action, any lookahead not listed triggers a syntax
error.

@cindex accepting state
State 3 is called the @dfn{final state}, or the @dfn{accepting
state}:

@example
state 3

    $accept  ->  exp $ .   (rule 0)

    $default    accept
@end example

@noindent
the initial rule is completed (the start symbol and the end
of input were read), so parsing exits successfully.

The interpretation of states 4 to 7 is straightforward, and is left to
the reader.

@example
state 4

    exp  ->  exp '+' . exp   (rule 1)

    NUM         shift, and go to state 1

    exp         go to state 8

state 5

    exp  ->  exp '-' . exp   (rule 2)

    NUM         shift, and go to state 1

    exp         go to state 9

state 6

    exp  ->  exp '*' . exp   (rule 3)

    NUM         shift, and go to state 1

    exp         go to state 10

state 7

    exp  ->  exp '/' . exp   (rule 4)

    NUM         shift, and go to state 1

    exp         go to state 11
@end example

As announced at the beginning of the report, @samp{State 8 conflicts:
1 shift/reduce}:

@example
state 8

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp '+' exp .   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '/'         [reduce using rule 1 (exp)]
    $default    reduce using rule 1 (exp)
@end example

Indeed, there are two actions associated with the lookahead @samp{/}:
either shifting (and going to state 7), or reducing rule 1. The
conflict means that either the grammar is ambiguous, or the parser lacks
information to make the right decision. Here the grammar is
ambiguous: since we did not specify the precedence of @samp{/}, the
sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM /
NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) /
NUM}, which corresponds to reducing rule 1.

Because a deterministic parser must make a single decision, Bison
arbitrarily chose to disable the reduction; see @ref{Shift/Reduce, ,
Shift/Reduce Conflicts}. Discarded actions are reported between
square brackets.

Note that all the previous states had a single possible action: either
shifting the next token and going to the corresponding state, or
reducing a single rule. In the other cases, i.e., when shifting
@emph{and} reducing is possible or when @emph{several} reductions are
possible, the lookahead is required to select the action. State 8 is
one such state: if the lookahead is @samp{*} or @samp{/} then the action
is shifting, otherwise the action is reducing rule 1. In other words,
the first two items, corresponding to rule 1, are not eligible when the
lookahead token is @samp{*}, since we specified that @samp{*} has higher
precedence than @samp{+}. More generally, some items are eligible only
with some set of possible lookahead tokens. When run with
@option{--report=lookahead}, Bison specifies these lookahead tokens:

@example
state 8

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp '+' exp .  [$, '+', '-', '/']   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '/'         [reduce using rule 1 (exp)]
    $default    reduce using rule 1 (exp)
@end example

The remaining states are similar:

@example
@group
state 9

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp '-' exp .   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '/'         [reduce using rule 2 (exp)]
    $default    reduce using rule 2 (exp)
@end group

@group
state 10

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp '*' exp .   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)

    '/'         shift, and go to state 7

    '/'         [reduce using rule 3 (exp)]
    $default    reduce using rule 3 (exp)
@end group

@group
state 11

    exp  ->  exp . '+' exp   (rule 1)
    exp  ->  exp . '-' exp   (rule 2)
    exp  ->  exp . '*' exp   (rule 3)
    exp  ->  exp . '/' exp   (rule 4)
    exp  ->  exp '/' exp .   (rule 4)

    '+'         shift, and go to state 4
    '-'         shift, and go to state 5
    '*'         shift, and go to state 6
    '/'         shift, and go to state 7

    '+'         [reduce using rule 4 (exp)]
    '-'         [reduce using rule 4 (exp)]
    '*'         [reduce using rule 4 (exp)]
    '/'         [reduce using rule 4 (exp)]
    $default    reduce using rule 4 (exp)
@end group
@end example

@noindent
Observe that state 11 contains conflicts not only due to the lack of
precedence of @samp{/} with respect to @samp{+}, @samp{-}, and
@samp{*}, but also because the associativity of @samp{/} is not specified.
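To see what these conflict resolutions mean in practice, the following C sketch hard-codes the automaton described by this report and evaluates integer expressions with it. It is not the code Bison generates: the encoding of tokens as characters (@samp{n} for @code{NUM}, @samp{$} for end of input), the function names, and the fixed stack size are assumptions made for illustration. As listed above, states 8 and 9 reduce unless the lookahead is @samp{*} or @samp{/}, state 10 shifts only @samp{/}, and state 11 shifts every operator.

```c
#include <stdbool.h>

enum { STACK_SIZE = 64 };  /* enough for these small inputs */

/* A parse-stack entry: an automaton state and a semantic value.  */
struct entry { int state; long value; };

/* The "exp go to state N" lines of the report.  */
static int goto_exp (int state)
{
  switch (state)
    {
    case 0: return 2;
    case 4: case 5: case 6: case 7: return state + 4;
    default: return -1;
    }
}

/* State reached by shifting operator TOKEN, or -1.  */
static int op_target (char token)
{
  switch (token)
    {
    case '+': return 4;
    case '-': return 5;
    case '*': return 6;
    case '/': return 7;
    default: return -1;
    }
}

/* The "shift, and go to state N" lines: N, or -1 if STATE does not
   shift TOKEN.  Conflicts are resolved as in the report.  */
static int shift_target (int state, char token)
{
  switch (state)
    {
    case 0: case 4: case 5: case 6: case 7:
      return token == 'n' ? 1 : -1;
    case 2:
      return token == '$' ? 3 : op_target (token);
    case 8: case 9:       /* shift '*' and '/', otherwise reduce */
      return token == '*' || token == '/' ? op_target (token) : -1;
    case 10:              /* shift '/' only, otherwise reduce */
      return token == '/' ? 7 : -1;
    case 11:              /* every operator is shifted */
      return op_target (token);
    default:
      return -1;
    }
}

/* Pop N entries, then push the "exp" goto state with value V.  */
static int reduce (struct entry *stack, int top, int n, long v)
{
  top -= n;
  stack[top + 1].state = goto_exp (stack[top].state);
  stack[top + 1].value = v;
  return top + 1;
}

/* Parse TOKENS ('n' for NUM, operators, '$' last), taking NUM values
   from VALUES in order.  On acceptance store the result in *RESULT and
   return true; return false on a syntax error.  x / 0 yields 0.  */
static bool parse (const char *tokens, const long *values, long *result)
{
  struct entry stack[STACK_SIZE] = { { 0, 0 } };
  int top = 0;
  for (;;)
    {
      int state = stack[top].state;
      if (state == 3)                     /* $default accept */
        {
          *result = stack[top - 1].value;
          return true;
        }
      if (state == 1)                     /* reduce rule 5: exp -> NUM */
        {
          top = reduce (stack, top, 1, stack[top].value);
          continue;
        }
      char token = *tokens;
      int target = shift_target (state, token);
      if (target >= 0 && top + 1 < STACK_SIZE)
        {
          top++;
          stack[top].state = target;
          stack[top].value = token == 'n' ? *values++ : 0;
          tokens++;
        }
      else if (target < 0 && state >= 8)  /* reduce rules 1 to 4 */
        {
          long lhs = stack[top - 2].value, rhs = stack[top].value;
          long v = state == 8 ? lhs + rhs
                 : state == 9 ? lhs - rhs
                 : state == 10 ? lhs * rhs
                 : rhs != 0 ? lhs / rhs : 0;
          top = reduce (stack, top, 3, v);
        }
      else
        return false;                     /* syntax error or overflow */
    }
}
```

With this automaton, @samp{1 + 2 * 3} yields 7 and @samp{10 - 3 - 2} yields 5, as the precedence declarations intend. However, @samp{8 / 4 / 2} yields 4 and @samp{8 / 2 + 2} yields 2: because state 11 shifts every operator, everything to the right of a @samp{/} ends up in its right operand.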

@node Tracing
@section Tracing Your Parser
@findex yydebug
@cindex debugging
@cindex tracing the parser

If a Bison grammar compiles properly but doesn't do what you want when it
runs, the @code{yydebug} parser-trace feature can help you figure out why.

There are several means to enable compilation of trace facilities:

@table @asis
@item the macro @code{YYDEBUG}
@findex YYDEBUG
Define the macro @code{YYDEBUG} to a nonzero value when you compile the
parser. This is compliant with POSIX Yacc. You could use
@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The
Prologue}).

@item the option @option{-t}, @option{--debug}
Use the @samp{-t} option when you run Bison (@pxref{Invocation,
,Invoking Bison}). This is POSIX compliant too.

@item the directive @samp{%debug}
@findex %debug
Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison
Declaration Summary}). This is a Bison extension, which is useful
when Bison outputs parsers for languages that do not use a
preprocessor. Unless POSIX and Yacc portability matter to you, this is
the preferred solution.
@end table

We suggest that you always enable the debug option so that debugging is
always possible.

The trace facility outputs messages with macro calls of the form
@code{YYFPRINTF (stderr, @var{format}, @var{args})} where
@var{format} and @var{args} are the usual @code{printf} format and variadic
arguments. If you define @code{YYDEBUG} to a nonzero value but do not
define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included
and @code{YYFPRINTF} is defined to @code{fprintf}.

Once you have compiled the program with trace facilities, the way to
request a trace is to store a nonzero value in the variable @code{yydebug}.
You can do this by making the C code do it (in @code{main}, perhaps), or
you can alter the value with a C debugger.

Each step taken by the parser when @code{yydebug} is nonzero produces a
line or two of trace information, written on @code{stderr}. The trace
messages tell you these things:

@itemize @bullet
@item
Each time the parser calls @code{yylex}, what kind of token was read.

@item
Each time a token is shifted, the depth and complete contents of the
state stack (@pxref{Parser States}).

@item
Each time a rule is reduced, which rule it is, and the complete contents
of the state stack afterward.
@end itemize

To make sense of this information, it helps to refer to the listing file
produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking
Bison}). This file shows the meaning of each state in terms of
positions in various rules, and also what each state will do with each
possible input token. As you read the successive trace messages, you
can see that the parser is functioning according to its specification in
the listing file. Eventually you will arrive at the place where
something undesirable happens, and you will see which parts of the
grammar are to blame.

The parser implementation file is a C program and you can use C
debuggers on it, but it's not easy to interpret what it is doing. The
parser function is a finite-state machine interpreter, and aside from
the actions it executes the same code over and over. Only the values
of variables show where in the grammar it is working.

@findex YYPRINT
The debugging information normally gives the token type of each token
read, but not its semantic value. You can optionally define a macro
named @code{YYPRINT} to provide a way to print the value. If you define
@code{YYPRINT}, it should take three arguments. The parser will pass a
standard I/O stream, the numeric code for the token type, and the token
value (from @code{yylval}).

Here is an example of @code{YYPRINT} suitable for the multi-function
calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}):

@smallexample
%@{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(file, type, value) print_token_value (file, type, value)
%@}

@dots{} %% @dots{} %% @dots{}

static void
print_token_value (FILE *file, int type, YYSTYPE value)
@{
  if (type == VAR)
    fprintf (file, "%s", value.tptr->name);
  else if (type == NUM)
    fprintf (file, "%g", value.val);  /* @r{@code{val} is a @code{double} in @code{mfcalc}} */
@}
@end smallexample
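The printer can also be tried outside a generated parser by giving it stand-in declarations. In the C sketch below, @code{symrec}, @code{YYSTYPE}, and the token codes are simplified, hypothetical substitutes for the ones that @command{bison} and the @code{mfcalc} sources would provide.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the mfcalc declarations.  */
typedef struct symrec { const char *name; } symrec;
typedef union { double val; symrec *tptr; } YYSTYPE;
enum { NUM = 258, VAR };

/* Print the semantic value of a token of kind TYPE on FILE.  */
static void
print_token_value (FILE *file, int type, YYSTYPE value)
{
  if (type == VAR)
    fprintf (file, "%s", value.tptr->name);
  else if (type == NUM)
    fprintf (file, "%g", value.val);
}
```

For instance, calling @code{print_token_value (stderr, NUM, v)} with @code{v.val} equal to 2.5 prints @samp{2.5}.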

@c ================================================= Invoking Bison

@node Invocation
@chapter Invoking Bison
@cindex invoking Bison
@cindex Bison invocation
@cindex options for invoking Bison

The usual way to invoke Bison is as follows:

@example
bison @var{infile}
@end example

Here @var{infile} is the grammar file name, which usually ends in
@samp{.y}. The parser implementation file's name is made by replacing
the @samp{.y} with @samp{.tab.c} and removing any leading directory.
Thus, @samp{bison foo.y} yields @file{foo.tab.c}, and
@samp{bison hack/foo.y} also yields @file{foo.tab.c}. It's
also possible, in case you are writing C++ code instead of C in your
grammar file, to name it @file{foo.ypp} or @file{foo.y++}. Then, the
output files will take an extension like the given one as input
(respectively @file{foo.tab.cpp} and @file{foo.tab.c++}). This
feature takes effect with all options that manipulate file names like
@samp{-o} or @samp{-d}.

For example,

@example
bison -d @var{infile.yxx}
@end example
@noindent
will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}, and

@example
bison -d -o @var{output.c++} @var{infile.y}
@end example
@noindent
will produce @file{output.c++} and @file{output.h++}.
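The default naming rule for grammar files can be summarized as a small C function. It is a sketch of the documented behavior only, not Bison's implementation. @code{output_file_name} is a hypothetical name, it does not cover names given with @option{-o}, and it assumes the grammar file's extension starts with @samp{y}, as in all the examples above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Name of an output file for GRAMMAR: drop any leading directory,
   insert ".tab" before the extension, and turn the leading 'y' of the
   extension into KIND ('c' for the parser implementation, 'h' for the
   header requested by -d).  An extension not starting with 'y' is
   simply replaced by KIND (an assumption of this sketch).  Returns a
   malloc'd string that the caller frees, or NULL.  */
static char *output_file_name (const char *grammar, char kind)
{
  const char *base = strrchr (grammar, '/');
  base = base ? base + 1 : grammar;
  const char *dot = strrchr (base, '.');
  size_t stem = dot ? (size_t) (dot - base) : strlen (base);
  const char *rest = dot && dot[1] == 'y' ? dot + 2 : "";
  size_t size = stem + strlen (".tab.") + 1 + strlen (rest) + 1;
  char *out = malloc (size);
  if (out)
    snprintf (out, size, "%.*s.tab.%c%s", (int) stem, base, kind, rest);
  return out;
}
```

Under these assumptions, @samp{output_file_name ("hack/foo.y", 'c')} gives @file{foo.tab.c} and @samp{output_file_name ("infile.yxx", 'h')} gives @file{infile.tab.hxx}.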
8419
For compatibility with POSIX, the standard Bison
distribution also contains a shell script called @command{yacc} that
invokes Bison with the @option{-y} option.

@menu
* Bison Options::     All the options described in detail,
                        grouped as in @samp{bison --help}.
* Option Cross Key::  Alphabetical list of long options.
* Yacc Library::      Yacc-compatible @code{yyerror} and @code{main}.
@end menu

@node Bison Options
@section Bison Options

Bison supports both traditional single-letter options and mnemonic long
option names. Long option names are indicated with @samp{--} instead of
@samp{-}. Abbreviations for option names are allowed as long as they
are unique. When a long option takes an argument, like
@samp{--file-prefix}, connect the option name and the argument with
@samp{=}.

Here is a list of options that can be used with Bison, grouped as in
the output of @samp{bison --help}. It is followed by a cross key
alphabetized by long option.

@c Please, keep this ordered as in `bison --help'.
@noindent
Operation modes:
@table @option
@item -h
@itemx --help
Print a summary of the command-line options to Bison and exit.

@item -V
@itemx --version
Print the version number of Bison and exit.

@item --print-localedir
Print the name of the directory containing locale-dependent data.

@item --print-datadir
Print the name of the directory containing skeletons and XSLT.

@item -y
@itemx --yacc
Act more like the traditional Yacc command. This can cause different
diagnostics to be generated, and may change behavior in other minor
ways. Most importantly, imitate Yacc's output file name conventions,
so that the parser implementation file is called @file{y.tab.c}, and
the other outputs are called @file{y.output} and @file{y.tab.h}.
Also, if generating a deterministic parser in C, generate
@code{#define} statements in addition to an @code{enum} to associate
token numbers with token names. Thus, the following shell script can
substitute for Yacc, and the Bison distribution contains such a script
for compatibility with POSIX:

@example
#! /bin/sh
bison -y "$@@"
@end example

The @option{-y}/@option{--yacc} option is intended for use with
traditional Yacc grammars. If your grammar uses a Bison extension
like @samp{%glr-parser}, Bison might not be Yacc-compatible even if
this option is specified.

@item -W [@var{category}]
@itemx --warnings[=@var{category}]
Output warnings falling in @var{category}. @var{category} can be one
of:
@table @code
@item midrule-values
Warn about mid-rule values that are set but not used within any of the actions
of the parent rule.
For example, warn about unused @code{$2} in:

@example
exp: '1' @{ $$ = 1; @} '+' exp @{ $$ = $1 + $4; @};
@end example

Also warn about mid-rule values that are used but not set.
For example, warn about unset @code{$$} in the mid-rule action in:

@example
exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @};
@end example

These warnings are not enabled by default since they sometimes prove to
be false alarms in existing grammars employing the Yacc constructs
@code{$0} or @code{$-@var{n}} (where @var{n} is some positive integer).

@item yacc
Incompatibilities with POSIX Yacc.

@item conflicts-sr
@itemx conflicts-rr
Shift/reduce (S/R) and reduce/reduce (R/R) conflicts. These warnings are
enabled by default. However, if the @code{%expect} or @code{%expect-rr}
directive is specified, an unexpected number of conflicts is an error,
and an expected number of conflicts is not reported, so @option{-W} and
@option{--warnings} then have no effect on the conflict report.

@item other
All warnings not categorized above. These warnings are enabled by default.

This category is provided merely for the sake of completeness. Future
releases of Bison may move warnings from this category to new, more specific
categories.

@item all
All the warnings.
@item none
Turn off all the warnings.
@item error
Treat warnings as errors.
@end table

A category can be turned off by prefixing its name with @samp{no-}. For
instance, @option{-Wno-yacc} will hide the warnings about
POSIX Yacc incompatibilities.
@end table
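
The category list above behaves like a set of on/off switches. The following C++ sketch models it under stated assumptions. The @code{Warnings} structure and the @code{apply_warning} function are hypothetical names, not Bison internals. The sketch also assumes that several @option{-W} options are applied from left to right, so that later ones override earlier ones.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the -W categories listed above; this is not how
// Bison stores its warning flags internally.
struct Warnings {
  std::map<std::string, bool> enabled;
  bool as_errors;
  Warnings() : as_errors(false) {
    // Defaults as documented: conflicts and "other" are enabled.
    enabled["midrule-values"] = false;
    enabled["yacc"] = false;
    enabled["conflicts-sr"] = true;
    enabled["conflicts-rr"] = true;
    enabled["other"] = true;
  }
};

// Apply one -W argument such as "yacc", "no-yacc", "all", "none" or
// "error".  Returns false if the category is unknown.
bool apply_warning(Warnings& w, std::string category) {
  bool on = true;
  if (category.compare(0, 3, "no-") == 0) {  // "no-" turns a category off
    on = false;
    category.erase(0, 3);
  }
  if (category == "error") {
    w.as_errors = on;
  } else if (category == "all" || category == "none") {
    bool value = category == "all" && on;  // "none" always turns off
    for (auto& entry : w.enabled)
      entry.second = value;
  } else {
    auto it = w.enabled.find(category);
    if (it == w.enabled.end())
      return false;
    it->second = on;
  }
  return true;
}
```

For instance, applying @samp{all} and then @samp{no-yacc} leaves every category enabled except @samp{yacc}. Under the left-to-right assumption, this mirrors @option{-Wall -Wno-yacc}.
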

@noindent
Tuning the parser:

@table @option
@item -t
@itemx --debug
In the parser implementation file, define the macro @code{YYDEBUG} to
1 if it is not already defined, so that the debugging facilities are
compiled. @xref{Tracing, ,Tracing Your Parser}.

@item -D @var{name}[=@var{value}]
@itemx --define=@var{name}[=@var{value}]
@itemx -F @var{name}[=@var{value}]
@itemx --force-define=@var{name}[=@var{value}]
Each of these is equivalent to @samp{%define @var{name} "@var{value}"}
(@pxref{%define Summary}) except that Bison processes multiple
definitions for the same @var{name} as follows:

@itemize
@item
Bison quietly ignores all command-line definitions for @var{name} except
the last.
@item
If that command-line definition is specified by a @code{-D} or
@code{--define}, Bison reports an error for any @code{%define}
definition for @var{name}.
@item
If that command-line definition is specified by a @code{-F} or
@code{--force-define} instead, Bison quietly ignores all @code{%define}
definitions for @var{name}.
@item
Otherwise, Bison reports an error if there are multiple @code{%define}
definitions for @var{name}.
@end itemize

You should avoid using @code{-F} and @code{--force-define} in your
make files unless you are confident that it is safe to quietly ignore
any conflicting @code{%define} that may be added to the grammar file.

@item -L @var{language}
@itemx --language=@var{language}
Specify the programming language for the generated parser, as if
@code{%language} was specified (@pxref{Decl Summary, , Bison Declaration
Summary}). Currently supported languages include C, C++, and Java.
@var{language} is case-insensitive.

This option is experimental and its effect may be modified in future
releases.

@item --locations
Pretend that @code{%locations} was specified. @xref{Decl Summary}.

@item -p @var{prefix}
@itemx --name-prefix=@var{prefix}
Pretend that @code{%name-prefix "@var{prefix}"} was specified.
@xref{Decl Summary}.

@item -l
@itemx --no-lines
Don't put any @code{#line} preprocessor commands in the parser
implementation file. Ordinarily Bison puts them in the parser
implementation file so that the C compiler and debuggers will
associate errors with your source file, the grammar file. This option
causes them to associate errors with the parser implementation file,
treating it as an independent source file in its own right.

@item -S @var{file}
@itemx --skeleton=@var{file}
Specify the skeleton to use, similar to @code{%skeleton}
(@pxref{Decl Summary, , Bison Declaration Summary}).

@c You probably don't need this option unless you are developing Bison.
@c You should use @option{--language} if you want to specify the skeleton for a
@c different language, because it is clearer and because it will always
@c choose the correct skeleton for non-deterministic or push parsers.

If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
file in the Bison installation directory.
If it does, @var{file} is an absolute file name or a file name relative to the
current working directory.
This is similar to how most shells resolve commands.

@item -k
@itemx --token-table
Pretend that @code{%token-table} was specified. @xref{Decl Summary}.
@end table
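
The rules for combining @option{-D}, @option{-F} and @code{%define} amount to a small decision procedure for each variable @var{name}. The C++ sketch below restates them. @code{Definition}, @code{Resolution} and @code{resolve_define} are hypothetical names introduced for illustration. The sketch only decides which value takes effect, or whether an error is due; it does not model what Bison prints.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One definition of a given %define variable NAME (hypothetical model).
struct Definition {
  enum Origin { Define, ForceDefine, Grammar } origin;  // -D, -F, %define
  std::string value;
};

struct Resolution {
  bool error;         // true if Bison would report an error
  std::string value;  // the value that takes effect when !error
};

// DEFS lists the command-line definitions in order, then the grammar's.
Resolution resolve_define(const std::vector<Definition>& defs) {
  const Definition* last_cli = nullptr;  // last -D/-F definition
  std::vector<const Definition*> grammar;
  for (const Definition& d : defs) {
    if (d.origin == Definition::Grammar)
      grammar.push_back(&d);
    else
      last_cli = &d;  // earlier command-line definitions are ignored
  }
  if (last_cli) {
    // A winning -D conflicts with any %define; a winning -F overrides them.
    if (last_cli->origin == Definition::Define && !grammar.empty())
      return {true, ""};
    return {false, last_cli->value};
  }
  if (grammar.size() > 1)
    return {true, ""};  // multiple %define's for the same NAME
  if (grammar.size() == 1)
    return {false, grammar.front()->value};
  return {false, ""};  // NAME is not defined at all
}
```
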

@noindent
Adjust the output:

@table @option
@item --defines[=@var{file}]
Pretend that @code{%defines} was specified, i.e., write an extra output
file containing macro definitions for the token type names defined in
the grammar, as well as a few other declarations. @xref{Decl Summary}.

@item -d
This is the same as @code{--defines} except @code{-d} does not accept a
@var{file} argument since POSIX Yacc requires that @code{-d} can be bundled
with other short options.

@item -b @var{prefix}
@itemx --file-prefix=@var{prefix}
Pretend that @code{%file-prefix} was specified, i.e., specify the
@var{prefix} to use for all Bison output file names. @xref{Decl Summary}.

@item -r @var{things}
@itemx --report=@var{things}
Write an extra output file containing verbose descriptions of the
comma-separated list of @var{things} among:

@table @code
@item state
Description of the grammar, conflicts (resolved and unresolved), and
the parser's automaton.

@item lookahead
Implies @code{state} and augments the description of the automaton with
each rule's lookahead set.

@item itemset
Implies @code{state} and augments the description of the automaton with
the full set of items for each state, instead of its core only.
@end table

@item --report-file=@var{file}
Specify the @var{file} for the verbose description.

@item -v
@itemx --verbose
Pretend that @code{%verbose} was specified, i.e., write an extra output
file containing verbose descriptions of the grammar and
parser. @xref{Decl Summary}.

@item -o @var{file}
@itemx --output=@var{file}
Specify the @var{file} for the parser implementation file.

The other output files' names are constructed from @var{file} as
described under the @samp{-v} and @samp{-d} options.

@item -g [@var{file}]
@itemx --graph[=@var{file}]
Output a graphical representation of the parser's
automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz}
@uref{http://www.graphviz.org/doc/info/lang.html, DOT} format.
@var{file} is optional.
If omitted and the grammar file is @file{foo.y}, the output file will be
@file{foo.dot}.

@item -x [@var{file}]
@itemx --xml[=@var{file}]
Output an XML report of the parser's automaton computed by Bison.
@var{file} is optional.
If omitted and the grammar file is @file{foo.y}, the output file will be
@file{foo.xml}.
(The current XML schema is experimental and may evolve.
More user feedback will help to stabilize it.)
@end table

@node Option Cross Key
@section Option Cross Key

Here is a list of options, alphabetized by long option, to help you find
the corresponding short option and directive.

@multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}}
@headitem Long Option @tab Short Option @tab Bison Directive
@include cross-options.texi
@end multitable

@node Yacc Library
@section Yacc Library

The Yacc library contains default implementations of the
@code{yyerror} and @code{main} functions. These default
implementations are normally not useful, but POSIX requires
them. To use the Yacc library, link your program with the
@option{-ly} option. Note that Bison's implementation of the Yacc
library is distributed under the terms of the GNU General
Public License (@pxref{Copying}).

If you use the Yacc library's @code{yyerror} function, you should
declare @code{yyerror} as follows:

@example
int yyerror (char const *);
@end example

Bison ignores the @code{int} value returned by this @code{yyerror}.
If you use the Yacc library's @code{main} function, your
@code{yyparse} function should have the following type signature:

@example
int yyparse (void);
@end example

@c ================================================= C++ Bison

@node Other Languages
@chapter Parsers Written In Other Languages

@menu
* C++ Parsers::                 The interface to generate C++ parser classes
* Java Parsers::                The interface to generate Java parser classes
@end menu

@node C++ Parsers
@section C++ Parsers

@menu
* C++ Bison Interface::         Asking for C++ parser generation
* C++ Semantic Values::         %union vs. C++
* C++ Location Values::         The position and location classes
* C++ Parser Interface::        Instantiating and running the parser
* C++ Scanner Interface::       Exchanges between yylex and parse
* A Complete C++ Example::      Demonstrating their use
@end menu

@node C++ Bison Interface
@subsection C++ Bison Interface
@c - %skeleton "lalr1.cc"
@c - Always pure
@c - initial action

The C++ deterministic parser is selected using the skeleton directive,
@samp{%skeleton "lalr1.cc"}, or the synonymous command-line option
@option{--skeleton=lalr1.cc}.
@xref{Decl Summary}.

When run, @command{bison} will create several entities in the @samp{yy}
namespace.
@findex %define namespace
Use the @samp{%define namespace} directive to change the namespace
name; see @ref{%define Summary,,namespace}. The various classes are
generated in the following files:

@table @file
@item position.hh
@itemx location.hh
The definition of the classes @code{position} and @code{location},
used for location tracking. @xref{C++ Location Values}.

@item stack.hh
An auxiliary class @code{stack} used by the parser.

@item @var{file}.hh
@itemx @var{file}.cc
(Assuming the extension of the grammar file was @samp{.yy}.) The
declaration and implementation of the C++ parser class. The basename
and extension of these two files follow the same rules as with regular C
parsers (@pxref{Invocation}).

The header is @emph{mandatory}; you must either pass
@option{-d}/@option{--defines} to @command{bison}, or use the
@samp{%defines} directive.
@end table

All these files are documented using Doxygen; run @command{doxygen}
for complete and accurate documentation.

@node C++ Semantic Values
@subsection C++ Semantic Values
@c - No objects in unions
@c - YYSTYPE
@c - Printer and destructor

The @code{%union} directive works as for C, see @ref{Union Decl, ,The
Collection of Value Types}. In particular it produces a genuine
@code{union}@footnote{In the future, techniques to allow complex types
within pseudo-unions (similar to Boost variants) might be implemented to
alleviate these issues.}, which has a few specific features in C++.
@itemize @minus
@item
The type @code{YYSTYPE} is defined but its use is discouraged: rather
you should refer to the parser's encapsulated type
@code{yy::parser::semantic_type}.
@item
Non-POD (Plain Old Data) types cannot be used. C++ forbids any
instance of classes with constructors in unions: only @emph{pointers}
to such objects are allowed.
@end itemize

Because objects have to be stored via pointers, memory is not
reclaimed automatically: using the @code{%destructor} directive is the
only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded
Symbols}.


@node C++ Location Values
@subsection C++ Location Values
@c - %locations
@c - class Position
@c - class Location
@c - %define filename_type "const symbol::Symbol"

When the directive @code{%locations} is used, the C++ parser supports
location tracking; see @ref{Tracking Locations}. Two auxiliary classes
define a @code{position}, a single point in a file, and a @code{location}, a
range composed of a pair of @code{position}s (possibly spanning several
files).

@deftypemethod {position} {std::string*} filename
The name of the file. It is always handled as a pointer: the
parser never duplicates or deallocates it. As an experimental
feature you may change it to @samp{@var{type}*} using @samp{%define
filename_type "@var{type}"}.
@end deftypemethod

@deftypemethod {position} {unsigned int} line
The line, starting at 1.
@end deftypemethod

@deftypemethod {position} {void} lines (int @var{height} = 1)
Advance by @var{height} lines, resetting the column number.
@end deftypemethod

@deftypemethod {position} {unsigned int} column
The column, starting at 0.
@end deftypemethod

@deftypemethod {position} {void} columns (int @var{width} = 1)
Advance by @var{width} columns, without changing the line number.
@end deftypemethod

@deftypemethod {position} {position&} operator+= (position& @var{pos}, int @var{width})
@deftypemethodx {position} {position} operator+ (const position& @var{pos}, int @var{width})
@deftypemethodx {position} {position&} operator-= (position& @var{pos}, int @var{width})
@deftypemethodx {position} {position} operator- (const position& @var{pos}, int @var{width})
Various forms of syntactic sugar for @code{columns}.
@end deftypemethod

@deftypemethod {position} {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p})
Report @var{p} on @var{o} like this:
@samp{@var{file}:@var{line}.@var{column}}, or
@samp{@var{line}.@var{column}} if @var{file} is null.
@end deftypemethod

@deftypemethod {location} {position} begin
@deftypemethodx {location} {position} end
The first, inclusive, position of the range, and the first beyond.
@end deftypemethod

@deftypemethod {location} {void} columns (int @var{width} = 1)
@deftypemethodx {location} {void} lines (int @var{height} = 1)
Advance the @code{end} position.
@end deftypemethod

@deftypemethod {location} {location} operator+ (const location& @var{begin}, const location& @var{end})
@deftypemethodx {location} {location} operator+ (const location& @var{begin}, int @var{width})
@deftypemethodx {location} {location&} operator+= (location& @var{loc}, int @var{width})
Various forms of syntactic sugar.
@end deftypemethod

@deftypemethod {location} {void} step ()
Move @code{begin} onto @code{end}.
@end deftypemethod
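
A simplified model may help to see how these members interact. The C++ sketch below is hypothetical and is not the content of the generated @file{position.hh} or @file{location.hh}. It places minimal @code{position} and @code{location} structures in a @code{sketch} namespace and follows the conventions documented above: lines start at 1, columns start at 0, @code{lines} resets the column, and @code{step} moves @code{begin} onto @code{end}.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

struct position {
  std::string* filename = nullptr;  // shared; never copied nor freed
  unsigned int line = 1;            // lines start at 1
  unsigned int column = 0;          // columns start at 0, as documented

  void lines(int height = 1) {  // advance lines, reset the column
    line += height;
    column = 0;
  }
  void columns(int width = 1) { column += width; }  // same line
};

// Print "file:line.column", or "line.column" without a file name.
inline std::ostream& operator<<(std::ostream& o, const position& p) {
  if (p.filename)
    o << *p.filename << ':';
  return o << p.line << '.' << p.column;
}

struct location {
  position begin;  // first position of the range (inclusive)
  position end;    // first position beyond the range

  void columns(int width = 1) { end.columns(width); }
  void lines(int height = 1) { end.lines(height); }
  void step() { begin = end; }  // start the next range where this one ends
};

}  // namespace sketch
```

A scanner typically calls @code{step} before reading each token and then @code{columns} with the token's length, so that @code{begin} and @code{end} delimit the token just read.
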


@node C++ Parser Interface
@subsection C++ Parser Interface
@c - define parser_class_name
@c - Ctor
@c - parse, error, set_debug_level, debug_level, set_debug_stream,
@c   debug_stream.
@c - Reporting errors

The output files @file{@var{output}.hh} and @file{@var{output}.cc}
declare and define the parser class in the namespace @code{yy}. The
class name defaults to @code{parser}, but may be changed using
@samp{%define parser_class_name "@var{name}"}. The interface of
this class is detailed below. It can be extended using the
@code{%parse-param} feature: its semantics are slightly different here,
since it describes an additional member of the parser class, and an
additional argument for its constructor.

@defcv {Type} {parser} {semantic_type}
@defcvx {Type} {parser} {location_type}
The types for semantic values and locations.
@end defcv

@defcv {Type} {parser} {token}
A structure that contains (only) the @code{yytokentype} enumeration, which
defines the tokens. To refer to the token @code{FOO},
use @code{yy::parser::token::FOO}. The scanner can use
@samp{typedef yy::parser::token token;} to ``import'' the token enumeration
(@pxref{Calc++ Scanner}).
@end defcv

@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...)
Build a new parser object. There are no arguments by default, unless
@samp{%parse-param @{@var{type1} @var{arg1}@}} was used.
@end deftypemethod

@deftypemethod {parser} {int} parse ()
Run the syntactic analysis, and return 0 on success, 1 otherwise.
@end deftypemethod

@deftypemethod {parser} {std::ostream&} debug_stream ()
@deftypemethodx {parser} {void} set_debug_stream (std::ostream& @var{o})
Get or set the stream used for tracing the parsing. It defaults to
@code{std::cerr}.
@end deftypemethod

@deftypemethod {parser} {debug_level_type} debug_level ()
@deftypemethodx {parser} {void} set_debug_level (debug_level_type @var{l})
Get or set the tracing level. Currently its value is either 0, no trace,
or nonzero, full tracing.
@end deftypemethod

@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m})
The definition for this member function must be supplied by the user:
the parser uses it to report a parser error occurring at @var{l},
described by @var{m}.
@end deftypemethod


@node C++ Scanner Interface
@subsection C++ Scanner Interface
@c - prefix for yylex.
@c - Pure interface to yylex
@c - %lex-param

The parser invokes the scanner by calling @code{yylex}. Unlike C
parsers, C++ parsers are always pure: there is no point in using the
@code{%define api.pure} directive. Therefore the interface is as follows.

@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...)
Return the next token. Its type is the return value; its semantic
value and location are stored through @var{yylval} and @var{yylloc}.
Invocations of @samp{%lex-param @{@var{type1} @var{arg1}@}} yield
additional arguments.
@end deftypemethod
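
Because the C++ parser is always pure, a scanner only has to fill in the objects it receives. The following self-contained C++ sketch of a hand-written @code{yylex} uses simplified stand-ins for the parser's types. @code{semantic_type}, @code{location_type} and the token codes below are hypothetical; a real scanner would use @code{yy::parser::semantic_type}, @code{yy::parser::location_type} and @code{yy::parser::token}. @code{Input} plays the role of an argument declared with @code{%lex-param}.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Simplified stand-ins for the parser's nested types (assumptions).
struct semantic_type { int ival; };
struct location_type { std::string::size_type begin, end; };  // offsets
enum { END = 0, NUMBER = 258 };  // hypothetical token codes

// Hypothetical extra scanner argument, as if declared with %lex-param.
struct Input {
  explicit Input(const std::string& t) : text(t), pos(0) {}
  std::string text;
  std::string::size_type pos;
};

// Return the next token; store its value in *yylval, its extent in *yylloc.
int yylex(semantic_type* yylval, location_type* yylloc, Input& in) {
  while (in.pos < in.text.size()
         && std::isspace(static_cast<unsigned char>(in.text[in.pos])))
    ++in.pos;  // skip blanks
  yylloc->begin = in.pos;
  if (in.pos == in.text.size()) {
    yylloc->end = in.pos;
    return END;  // token 0 denotes the end of input
  }
  if (std::isdigit(static_cast<unsigned char>(in.text[in.pos]))) {
    int value = 0;
    while (in.pos < in.text.size()
           && std::isdigit(static_cast<unsigned char>(in.text[in.pos])))
      value = value * 10 + (in.text[in.pos++] - '0');
    yylval->ival = value;  // the semantic value travels through yylval
    yylloc->end = in.pos;
    return NUMBER;
  }
  // Any other character is a single-character token, e.g. '+'.
  yylloc->end = in.pos + 1;
  return static_cast<unsigned char>(in.text[in.pos++]);
}
```

With the input @samp{12 + 3}, successive calls return @code{NUMBER} (value 12), @code{'+'}, @code{NUMBER} (value 3), and finally @code{END}.
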


@node A Complete C++ Example
@subsection A Complete C++ Example

This section demonstrates the use of a C++ parser with a simple but
complete example. This example should be available on your system,
ready to compile, in the directory @file{../bison/examples/calc++}. It
focuses on the use of Bison, therefore the design of the various C++
classes is very naive: no accessors, no encapsulation of members etc.
We will use a Lex scanner, and more precisely, a Flex scanner, to
demonstrate the various interactions. A hand-written scanner is
actually easier to interface with.

@menu
* Calc++ --- C++ Calculator::   The specifications
* Calc++ Parsing Driver::       An active parsing context
* Calc++ Parser::               A parser class
* Calc++ Scanner::              A pure C++ Flex scanner
* Calc++ Top Level::            Conducting the band
@end menu

@node Calc++ --- C++ Calculator
@subsubsection Calc++ --- C++ Calculator

The grammar is dedicated to arithmetic: a single
expression, possibly preceded by variable assignments. An
environment containing possibly predefined variables such as
@code{one} and @code{two} is exchanged with the parser. An example
of valid input follows.

@example
three := 3
seven := one + two * three
seven * seven
@end example
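
To pin down the meaning of this example, here is a hand-written C++ evaluator for the same little language. It is independent of Bison and of the parser built in this section. It is a hypothetical sketch: @code{Calc} is an illustrative name, and the sketch assumes that the operators are @samp{+}, @samp{-}, @samp{*} and @samp{/} with the usual precedence. On the input above it yields 49, since @code{seven} evaluates to 7.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical recursive-descent evaluator for the Calc++ language:
//   unit:   assignment* expr        assignment: identifier ":=" expr
//   expr:   term (('+'|'-') term)*  term: factor (('*'|'/') factor)*
//   factor: identifier | number
class Calc {
 public:
  explicit Calc(const std::string& text) : toks_(tokenize(text)), pos_(0) {
    vars_["one"] = 1;  // predefined variables, as in the example
    vars_["two"] = 2;
  }

  int run() {
    // An assignment is an identifier immediately followed by ":=".
    while (pos_ + 1 < toks_.size() && is_ident(toks_[pos_])
           && toks_[pos_ + 1] == ":=") {
      std::string name = toks_[pos_];
      pos_ += 2;
      vars_[name] = expr();
    }
    int result = expr();
    if (pos_ != toks_.size())
      throw std::runtime_error("unexpected token: " + toks_[pos_]);
    return result;
  }

 private:
  static bool is_ident(const std::string& t) {
    return std::isalpha(static_cast<unsigned char>(t[0])) != 0;
  }

  // Split into identifiers, numbers, ":=" and single characters.
  static std::vector<std::string> tokenize(const std::string& s) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < s.size()) {
      unsigned char c = s[i];
      if (std::isspace(c)) { ++i; continue; }
      std::size_t j = i + 1;
      if (std::isalnum(c)) {
        while (j < s.size() && std::isalnum(static_cast<unsigned char>(s[j])))
          ++j;
      } else if (c == ':' && j < s.size() && s[j] == '=') {
        ++j;
      }
      out.push_back(s.substr(i, j - i));
      i = j;
    }
    return out;
  }

  std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : ""; }

  int factor() {
    std::string t = peek();
    if (t.empty()) throw std::runtime_error("unexpected end of input");
    ++pos_;
    if (std::isdigit(static_cast<unsigned char>(t[0]))) return std::stoi(t);
    if (is_ident(t)) return vars_[t];  // unknown variables read as 0
    throw std::runtime_error("unexpected token: " + t);
  }

  int term() {
    int v = factor();
    while (peek() == "*" || peek() == "/") {
      bool mul = peek() == "*";
      ++pos_;
      int rhs = factor();
      if (!mul && rhs == 0) throw std::runtime_error("division by zero");
      v = mul ? v * rhs : v / rhs;
    }
    return v;
  }

  int expr() {
    int v = term();
    while (peek() == "+" || peek() == "-") {
      bool add = peek() == "+";
      ++pos_;
      int rhs = term();
      v = add ? v + rhs : v - rhs;
    }
    return v;
  }

  std::vector<std::string> toks_;
  std::size_t pos_;
  std::map<std::string, int> vars_;
};
```
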

@node Calc++ Parsing Driver
@subsubsection Calc++ Parsing Driver
@c - An env
@c - A place to store error messages
@c - A place for the result

To support a pure interface with the parser (and the scanner) the
technique of the ``parsing context'' is convenient: a structure
containing all the data to exchange. Since, in addition to simply
launching the parsing, there are several auxiliary tasks to execute (open
the file for parsing, instantiate the parser etc.), we recommend
transforming the simple parsing context structure into a full-blown
@dfn{parsing driver} class.

The declaration of this driver class, @file{calc++-driver.hh}, is as
follows. The first part includes the CPP guard and imports the
required standard library components, and the declaration of the parser
class.

@comment file: calc++-driver.hh
@example
#ifndef CALCXX_DRIVER_HH
# define CALCXX_DRIVER_HH
# include <string>
# include <map>
# include "calc++-parser.hh"
@end example


@noindent
Then comes the declaration of the scanning function. Flex expects
the signature of @code{yylex} to be defined in the macro
@code{YY_DECL}, and the C++ parser expects it to be declared. We can
factor both as follows.

@comment file: calc++-driver.hh
@example
// Tell Flex the lexer's prototype ...
# define YY_DECL                                        \
  yy::calcxx_parser::token_type                         \
  yylex (yy::calcxx_parser::semantic_type* yylval,      \
         yy::calcxx_parser::location_type* yylloc,      \
         calcxx_driver& driver)
// ... and declare it for the parser's sake.
YY_DECL;
@end example

@noindent
The @code{calcxx_driver} class is then declared with its most obvious
members.

@comment file: calc++-driver.hh
@example
// Conducting the whole scanning and parsing of Calc++.
class calcxx_driver
@{
public:
  calcxx_driver ();
  virtual ~calcxx_driver ();

  std::map<std::string, int> variables;

  int result;
@end example

@noindent
To encapsulate the coordination with the Flex scanner, it is useful to
have two member functions to open and close the scanning phase.

@comment file: calc++-driver.hh
@example
  // Handling the scanner.
  void scan_begin ();
  void scan_end ();
  bool trace_scanning;
@end example

@noindent
Similarly for the parser itself.

@comment file: calc++-driver.hh
@example
  // Run the parser.  Return 0 on success.
  int parse (const std::string& f);
  std::string file;
  bool trace_parsing;
@end example

@noindent
To demonstrate pure handling of parse errors, instead of simply
dumping them on the standard error output, we will pass them to the
compiler driver using the following two member functions. Finally, we
close the class declaration and CPP guard.

@comment file: calc++-driver.hh
@example
  // Error handling.
  void error (const yy::location& l, const std::string& m);
  void error (const std::string& m);
@};
#endif // ! CALCXX_DRIVER_HH
@end example

The implementation of the driver is straightforward. The @code{parse}
member function deserves some attention. The @code{error} functions
are simple stubs; they should actually register the located error
messages and set error state.

@comment file: calc++-driver.cc
@example
#include <iostream>
#include "calc++-driver.hh"
#include "calc++-parser.hh"

calcxx_driver::calcxx_driver ()
  : trace_scanning (false), trace_parsing (false)
@{
  variables["one"] = 1;
  variables["two"] = 2;
@}

calcxx_driver::~calcxx_driver ()
@{
@}

int
calcxx_driver::parse (const std::string &f)
@{
  file = f;
  scan_begin ();
  yy::calcxx_parser parser (*this);
  parser.set_debug_level (trace_parsing);
  int res = parser.parse ();
  scan_end ();
  return res;
@}

void
calcxx_driver::error (const yy::location& l, const std::string& m)
@{
  std::cerr << l << ": " << m << std::endl;
@}

void
calcxx_driver::error (const std::string& m)
@{
  std::cerr << m << std::endl;
@}
@end example

@node Calc++ Parser
@subsubsection Calc++ Parser

The grammar file @file{calc++-parser.yy} starts by asking for the C++
deterministic parser skeleton and the creation of the parser header
file, and specifies the name of the parser class. Because the C++
skeleton changed several times, it is safer to require the version you
designed the grammar for.

@comment file: calc++-parser.yy
@example
%skeleton "lalr1.cc" /* -*- C++ -*- */
%require "@value{VERSION}"
%defines
%define parser_class_name "calcxx_parser"
@end example

@noindent
@findex %code requires
Then come the declarations/inclusions needed to define the
@code{%union}. Because the parser uses the parsing driver and
vice versa, the two headers cannot include each other. Because the
driver's header needs detailed knowledge about the parser class (in
particular its inner types), it is the parser's header which will simply
use a forward declaration of the driver.
@xref{%code Summary}.

@comment file: calc++-parser.yy
@example
%code requires @{
# include <string>
class calcxx_driver;
@}
@end example

@noindent
The driver is passed by reference to the parser and to the scanner.
This provides a simple but effective pure interface, not relying on
global variables.

@comment file: calc++-parser.yy
@example
// The parsing context.
%parse-param @{ calcxx_driver& driver @}
%lex-param   @{ calcxx_driver& driver @}
@end example

@noindent
Then we request the location tracking feature, and initialize the
first location's file name. Afterward new locations are computed
relative to the previous locations: the file name will be
automatically propagated.

@comment file: calc++-parser.yy
@example
%locations
%initial-action
@{
  // Initialize the initial location.
  @@$.begin.filename = @@$.end.filename = &driver.file;
@};
@end example

@noindent
Use the following two directives to enable parser tracing and verbose error
messages. However, verbose error messages can contain incorrect information
(@pxref{LAC}).

@comment file: calc++-parser.yy
@example
%debug
%error-verbose
@end example

@noindent
Semantic values cannot use ``real'' objects, but only pointers to
them.

@comment file: calc++-parser.yy
@example
// Symbols.
%union
@{
  int          ival;
  std::string *sval;
@};
@end example

@noindent
@findex %code
The code between @samp{%code @{} and @samp{@}} is output in the
@file{*.cc} file; it needs detailed knowledge about the driver.

@comment file: calc++-parser.yy
@example
%code @{
# include "calc++-driver.hh"
@}
@end example

9260
@noindent
The token numbered as 0 corresponds to end of file; the following line
allows for nicer error messages referring to ``end of file'' instead
of ``$end''.  Similarly, user-friendly names are provided for each
symbol.  Note that the token names are members of
@code{yy::calcxx_parser::token} (e.g., @code{token::END}), which avoids
name clashes.

@comment file: calc++-parser.yy
@example
%token        END      0 "end of file"
%token        ASSIGN     ":="
%token <sval> IDENTIFIER "identifier"
%token <ival> NUMBER     "number"
%type  <ival> exp
@end example

@noindent
To enable memory deallocation during error recovery, use
@code{%destructor}.

@c FIXME: Document %printer, and mention that it takes a braced-code operand.
@comment file: calc++-parser.yy
@example
%printer    @{ debug_stream () << *$$; @} "identifier"
%destructor @{ delete $$; @} "identifier"

%printer    @{ debug_stream () << $$; @} <ival>
@end example

@noindent
The grammar itself is straightforward.

@comment file: calc++-parser.yy
@example
%%
%start unit;
unit: assignments exp  @{ driver.result = $2; @};

assignments: assignments assignment @{@}
           | /* Nothing.  */        @{@};

assignment:
     "identifier" ":=" exp
       @{ driver.variables[*$1] = $3; delete $1; @};

%left '+' '-';
%left '*' '/';
exp: exp '+' exp   @{ $$ = $1 + $3; @}
   | exp '-' exp   @{ $$ = $1 - $3; @}
   | exp '*' exp   @{ $$ = $1 * $3; @}
   | exp '/' exp   @{ $$ = $1 / $3; @}
   | "identifier"  @{ $$ = driver.variables[*$1]; delete $1; @}
   | "number"      @{ $$ = $1; @};
%%
@end example

@noindent
Finally, the @code{error} member function reports errors to the
driver.

@comment file: calc++-parser.yy
@example
void
yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
                          const std::string& m)
@{
  driver.error (l, m);
@}
@end example

@node Calc++ Scanner
@subsubsection Calc++ Scanner

The Flex scanner first includes the driver declaration, then the
parser's to get the set of defined tokens.

@comment file: calc++-scanner.ll
@example
%@{ /* -*- C++ -*- */
# include <cstdlib>
# include <cerrno>
# include <climits>
# include <cstring>
# include <string>
# include "calc++-driver.hh"
# include "calc++-parser.hh"

/* Work around an incompatibility in flex (at least versions
   2.5.31 through 2.5.33): it generates code that does
   not conform to C89.  See Debian bug 333231
   <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.  */
# undef yywrap
# define yywrap() 1

/* By default yylex returns int, we use token_type.
   Unfortunately yyterminate by default returns 0, which is
   not of token_type.  */
#define yyterminate() return token::END
%@}
@end example

@noindent
Because there is no @code{#include}-like feature, we don't need
@code{yywrap}; we don't need @code{unput} either; and since we parse an
actual file rather than an interactive session with the user, the
scanner can run in batch mode.  Finally, we enable the scanner tracing
features.

@comment file: calc++-scanner.ll
@example
%option noyywrap nounput batch debug
@end example

@noindent
Abbreviations allow for more readable rules.

@comment file: calc++-scanner.ll
@example
id    [a-zA-Z][a-zA-Z_0-9]*
int   [0-9]+
blank [ \t]
@end example

@noindent
The following paragraph suffices to track locations accurately.  Each
time @code{yylex} is invoked, the begin position is moved onto the end
position.  Then when a pattern is matched, the end position is
advanced by its width.  If it matched ends of lines, the end
cursor is adjusted, and each time blanks are matched, the begin cursor
is moved onto the end cursor to effectively ignore the blanks
preceding tokens.  Comments would be treated the same way.

@comment file: calc++-scanner.ll
@example
@group
%@{
# define YY_USER_ACTION  yylloc->columns (yyleng);
%@}
@end group
%%
%@{
  yylloc->step ();
%@}
@{blank@}+   yylloc->step ();
[\n]+      yylloc->lines (yyleng); yylloc->step ();
@end example

@noindent
The rules are simple; just note the use of the driver to report errors.
It is convenient to use a typedef to shorten
@code{yy::calcxx_parser::token::IDENTIFIER} into
@code{token::IDENTIFIER}, for instance.

@comment file: calc++-scanner.ll
@example
%@{
  typedef yy::calcxx_parser::token token;
%@}
           /* Convert ints to the actual type of tokens.  */
[-+*/]     return yy::calcxx_parser::token_type (yytext[0]);
":="       return token::ASSIGN;
@{int@}      @{
  errno = 0;
  long n = strtol (yytext, NULL, 10);
  if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
    driver.error (*yylloc, "integer is out of range");
  yylval->ival = n;
  return token::NUMBER;
@}
@{id@}       yylval->sval = new std::string (yytext); return token::IDENTIFIER;
.          driver.error (*yylloc, "invalid character");
%%
@end example

@noindent
Finally, because the scanner-related driver's member functions depend
on the scanner's data, it is simpler to implement them in this file.

@comment file: calc++-scanner.ll
@example
@group
void
calcxx_driver::scan_begin ()
@{
  yy_flex_debug = trace_scanning;
  if (file == "-")
    yyin = stdin;
  else if (!(yyin = fopen (file.c_str (), "r")))
    @{
      error ("cannot open " + file + ": " + strerror(errno));
      exit (EXIT_FAILURE);
    @}
@}
@end group

@group
void
calcxx_driver::scan_end ()
@{
  fclose (yyin);
@}
@end group
@end example

@node Calc++ Top Level
@subsubsection Calc++ Top Level

The top level file, @file{calc++.cc}, poses no problem.

@comment file: calc++.cc
@example
#include <iostream>
#include "calc++-driver.hh"

@group
int
main (int argc, char *argv[])
@{
  calcxx_driver driver;
  for (++argv; argv[0]; ++argv)
    if (*argv == std::string ("-p"))
      driver.trace_parsing = true;
    else if (*argv == std::string ("-s"))
      driver.trace_scanning = true;
    else if (!driver.parse (*argv))
      std::cout << driver.result << std::endl;
@}
@end group
@end example

@node Java Parsers
@section Java Parsers

@menu
* Java Bison Interface::        Asking for Java parser generation
* Java Semantic Values::        %type and %token vs. Java
* Java Location Values::        The position and location classes
* Java Parser Interface::       Instantiating and running the parser
* Java Scanner Interface::      Specifying the scanner for the parser
* Java Action Features::        Special features for use in actions
* Java Differences::            Differences between C/C++ and Java Grammars
* Java Declarations Summary::   List of Bison declarations used with Java
@end menu

@node Java Bison Interface
@subsection Java Bison Interface
@c - %language "Java"

(The current Java interface is experimental and may evolve.
More user feedback will help to stabilize it.)

The Java parser skeletons are selected using the @code{%language "Java"}
directive or the @option{-L java}/@option{--language=java} option.

@c FIXME: Documented bug.
When generating a Java parser, @code{bison @var{basename}.y} will
create a single Java source file named @file{@var{basename}.java}
containing the parser implementation.  Using a grammar file without a
@file{.y} suffix is currently broken.  The basename of the parser
implementation file can be changed by the @code{%file-prefix}
directive or the @option{-b}/@option{--file-prefix} option.  The
entire parser implementation file name can be changed by the
@code{%output} directive or the @option{-o}/@option{--output} option.
The parser implementation file contains a single class for the parser.

You can create documentation for generated parsers using Javadoc.

Contrary to C parsers, Java parsers do not use global variables; the
state of the parser is always local to an instance of the parser class.
Therefore, all Java parsers are ``pure'', and the @code{%pure-parser}
and @code{%define api.pure} directives do not do anything when used in
Java.

Push parsers are currently unsupported in Java, and @code{%define
api.push-pull} has no effect.

GLR parsers are currently unsupported in Java.  Do not use the
@code{%glr-parser} directive.

No header file can be generated for Java parsers.  Do not use the
@code{%defines} directive or the @option{-d}/@option{--defines} options.

@c FIXME: Possible code change.
Currently, support for debugging and verbose errors is always compiled
in.  Thus the @code{%debug} and @code{%token-table} directives and the
@option{-t}/@option{--debug} and @option{-k}/@option{--token-table}
options have no effect.  This may change in the future to eliminate
unused code in the generated parser, so use @code{%debug} and
@code{%error-verbose} explicitly if needed.  Also, in the future the
@code{%token-table} directive might enable a public interface to
access the token names and codes.

@node Java Semantic Values
@subsection Java Semantic Values
@c - No %union, specify type in %type/%token.
@c - YYSTYPE
@c - Printer and destructor

There is no @code{%union} directive in Java parsers.  Instead, the
semantic values' types (class names) should be specified in the
@code{%type} or @code{%token} directive:

@example
%type <Expression> expr assignment_expr term factor
%type <Integer> number
@end example

By default, the semantic stack is declared to have @code{Object} members,
which means that the class types you specify can be of any class.
To improve the type safety of the parser, you can declare the common
superclass of all the semantic values using the @code{%define stype}
directive.  For example, after the following declaration:

@example
%define stype "ASTNode"
@end example

@noindent
any @code{%type} or @code{%token} specifying a semantic type which
is not a subclass of @code{ASTNode} will cause a compile-time error.

@c FIXME: Documented bug.
Types used in the directives may be qualified with a package name.
Primitive data types are accepted for Java version 1.5 or later.  Note
that in this case the autoboxing feature of Java 1.5 will be used.
Generic types may not be used; this is due to a limitation in the
implementation of Bison, and may change in future releases.

Java parsers do not support @code{%destructor}, since the language
uses garbage collection.  The parser will try to hold references
to semantic values for as little time as needed.

Java parsers do not support @code{%printer}, as @code{toString()}
can be used to print the semantic values.  This, however, may change
(in a backwards-compatible way) in future versions of Bison.

@node Java Location Values
@subsection Java Location Values
@c - %locations
@c - class Position
@c - class Location

When the directive @code{%locations} is used, the Java parser supports
location tracking (@pxref{Tracking Locations}).  An auxiliary user-defined
class defines a @dfn{position}, a single point in a file; Bison itself
defines a class representing a @dfn{location}, a range composed of a pair of
positions (possibly spanning several files).  The location class is an inner
class of the parser; the name is @code{Location} by default, and may also be
renamed using @code{%define location_type "@var{class-name}"}.

The location class treats the position as a completely opaque value.
By default, the class name is @code{Position}, but this can be changed
with @code{%define position_type "@var{class-name}"}.  This class must
be supplied by the user.

@deftypeivar {Location} {Position} begin
@deftypeivarx {Location} {Position} end
The first, inclusive, position of the range, and the first beyond.
@end deftypeivar

@deftypeop {Constructor} {Location} {} Location (Position @var{loc})
Create a @code{Location} denoting an empty range located at a given point.
@end deftypeop

@deftypeop {Constructor} {Location} {} Location (Position @var{begin}, Position @var{end})
Create a @code{Location} from the endpoints of the range.
@end deftypeop

@deftypemethod {Location} {String} toString ()
Return a string representing the range.  For this to work
properly, the position class should override the @code{equals} and
@code{toString} methods appropriately.
@end deftypemethod

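For illustration, here is a minimal sketch of a user-supplied position
class.  It assumes that a position is a line and column pair; the field
names @code{line} and @code{column}, and the @samp{@var{line}.@var{column}}
output format, are hypothetical choices rather than anything Bison
requires.  Since the location class relies on @code{equals} and
@code{toString}, the sketch overrides both, plus @code{hashCode} to stay
consistent with @code{equals}.  Such a class could live in its own file
or in the grammar file's epilogue.

@example
/* Hypothetical position class: a (line, column) pair.  */
class Position
@{
  public final int line;
  public final int column;

  public Position (int line, int column)
  @{
    this.line = line;
    this.column = column;
  @}

  @@Override
  public boolean equals (Object o)
  @{
    if (!(o instanceof Position))
      return false;
    Position p = (Position) o;
    return line == p.line && column == p.column;
  @}

  @@Override
  public int hashCode ()
  @{
    return 31 * line + column;
  @}

  /* Printed as "line.column", e.g., "3.14".  */
  @@Override
  public String toString ()
  @{
    return line + "." + column;
  @}
@}
@end example
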
@node Java Parser Interface
@subsection Java Parser Interface
@c - define parser_class_name
@c - Ctor
@c - parse, error, set_debug_level, debug_level, set_debug_stream,
@c   debug_stream.
@c - Reporting errors

The name of the generated parser class defaults to @code{YYParser}.  The
@code{YY} prefix may be changed using the @code{%name-prefix} directive
or the @option{-p}/@option{--name-prefix} option.  Alternatively, use
@code{%define parser_class_name "@var{name}"} to give a custom name to
the class.  The interface of this class is detailed below.

By default, the parser class has package visibility.  A declaration
@code{%define public} makes it public instead.  Remember that,
according to the Java language specification, the name of the @file{.java}
file should match the name of the class in this case.  Similarly, you can
use @code{abstract}, @code{final} and @code{strictfp} with the
@code{%define} declaration to add other modifiers to the parser class.

The Java package name of the parser class can be specified using the
@code{%define package} directive.  The superclass and the implemented
interfaces of the parser class can be specified with the @code{%define
extends} and @code{%define implements} directives.

The parser class defines an inner class, @code{Location}, that is used
for location tracking (@pxref{Java Location Values}), and an inner
interface, @code{Lexer} (@pxref{Java Scanner Interface}).  Other than
this inner class and interface, and the members described in the
interface below, all the other members and fields are preceded with a
@code{yy} or @code{YY} prefix to avoid clashes with user code.

@c FIXME: The following constants and variables are still undocumented:
@c @code{bisonVersion}, @code{bisonSkeleton} and @code{errorVerbose}.

The parser class can be extended using the @code{%parse-param}
directive.  Each occurrence of the directive adds a @code{protected
final} field to the parser class, and an argument to its constructor,
which initializes the field automatically.

Token names defined by @code{%token} and the predefined @code{EOF} token
name are added as constant fields to the parser class.

@deftypeop {Constructor} {YYParser} {} YYParser (@var{lex_param}, @dots{}, @var{parse_param}, @dots{})
Build a new parser object with embedded @code{%code lexer}.  There are
no parameters, unless @code{%parse-param}s and/or @code{%lex-param}s are
used.
@end deftypeop

@deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{})
Build a new parser object using the specified scanner.  There are no
additional parameters unless @code{%parse-param}s are used.

If the scanner is defined by @code{%code lexer}, this constructor is
declared @code{protected} and is called automatically with a scanner
created with the correct @code{%lex-param}s.
@end deftypeop

@deftypemethod {YYParser} {boolean} parse ()
Run the syntactic analysis, and return @code{true} on success,
@code{false} otherwise.
@end deftypemethod

@deftypemethod {YYParser} {boolean} recovering ()
During the syntactic analysis, return @code{true} if recovering
from a syntax error.
@xref{Error Recovery}.
@end deftypemethod

@deftypemethod {YYParser} {java.io.PrintStream} getDebugStream ()
@deftypemethodx {YYParser} {void} setDebugStream (java.io.PrintStream @var{o})
Get or set the stream used for tracing the parsing.  It defaults to
@code{System.err}.
@end deftypemethod

@deftypemethod {YYParser} {int} getDebugLevel ()
@deftypemethodx {YYParser} {void} setDebugLevel (int @var{l})
Get or set the tracing level.  Currently its value is either 0, no trace,
or nonzero, full tracing.
@end deftypemethod

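As an illustration, a main program might drive the parser as follows.
This is a sketch under several assumptions: the parser class keeps the
default name @code{YYParser}, neither @code{%parse-param} nor
@code{%code lexer} is used, and @code{MyLexer} is a hypothetical class
implementing @code{YYParser.Lexer} whose constructor takes a
@code{java.io.Reader}.  Since @code{yylex} may throw
@code{java.io.IOException} by default, @code{main} declares it as well.

@example
import java.io.IOException;
import java.io.InputStreamReader;

public class Main
@{
  public static void main (String[] args) throws IOException
  @{
    /* MyLexer is hypothetical: any YYParser.Lexer implementation.  */
    YYParser p =
      new YYParser (new MyLexer (new InputStreamReader (System.in)));
    if (args.length > 0 && args[0].equals ("-p"))
      p.setDebugLevel (1);        // Nonzero: full tracing.
    p.setDebugStream (System.err); // The default, shown for illustration.
    if (!p.parse ())
      System.exit (1);            // Parsing failed.
  @}
@}
@end example
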
@node Java Scanner Interface
@subsection Java Scanner Interface
@c - %code lexer
@c - %lex-param
@c - Lexer interface

There are two possible ways to interface a Bison-generated Java parser
with a scanner: the scanner may be defined by @code{%code lexer}, or
defined elsewhere.  In either case, the scanner has to implement the
@code{Lexer} inner interface of the parser class.

In the first case, the body of the scanner class is placed in
@code{%code lexer} blocks.  If you want to pass parameters from the
parser constructor to the scanner constructor, specify them with
@code{%lex-param}; they are passed before @code{%parse-param}s to the
constructor.

In the second case, the scanner has to implement the @code{Lexer} interface,
which is defined within the parser class (e.g., @code{YYParser.Lexer}).
The constructor of the parser object will then accept an object
implementing the interface; @code{%lex-param} is not used in this
case.

In both cases, the scanner has to implement the following methods.

@deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg})
This method is defined by the user to emit an error message.  The first
parameter is omitted if location tracking is not active.  Its type can be
changed using @code{%define location_type "@var{class-name}"}.
@end deftypemethod

@deftypemethod {Lexer} {int} yylex ()
Return the next token.  Its type is the return value; its semantic
value and location are saved and can be retrieved with the
@code{getLVal}, @code{getStartPos} and @code{getEndPos} methods of the
interface.

Use @code{%define lex_throws} to specify any uncaught exceptions.
Default is @code{java.io.IOException}.
@end deftypemethod

@deftypemethod {Lexer} {Position} getStartPos ()
@deftypemethodx {Lexer} {Position} getEndPos ()
Return respectively the first position of the last token that
@code{yylex} returned, and the first position beyond it.  These
methods are not needed unless location tracking is active.

The return type can be changed using @code{%define position_type
"@var{class-name}"}.
@end deftypemethod

@deftypemethod {Lexer} {Object} getLVal ()
Return the semantic value of the last token that @code{yylex} returned.

The return type can be changed using @code{%define stype
"@var{class-name}"}.
@end deftypemethod

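For example, a scanner defined outside the grammar file could look like
the following sketch.  It assumes a parser named @code{YYParser}, built
without @code{%locations} (so @code{yyerror} takes only a message and the
position methods are not needed), that declares @code{%token <Integer>
NUM}.  The class name @code{CalcLexer} is hypothetical.  It also assumes
that the token constants such as @code{NUM} and @code{EOF} are visible
from a class implementing @code{Lexer}; check the generated parser if
they are not.  The scanner relies on @code{java.io.StreamTokenizer},
which by default skips white space and parses numbers.

@example
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;

/* Hypothetical scanner for a parser "YYParser" without %locations.  */
class CalcLexer implements YYParser.Lexer
@{
  private final StreamTokenizer st;
  private Object yylval;

  CalcLexer (Reader r)
  @{
    st = new StreamTokenizer (r);
    st.ordinaryChar ('-');   // Return '-' as an operator, not a sign.
    st.ordinaryChar ('/');   // By default '/' starts a comment.
  @}

  public void yyerror (String msg)
  @{
    System.err.println (msg);
  @}

  public Object getLVal ()
  @{
    return yylval;
  @}

  public int yylex () throws IOException
  @{
    int ttype = st.nextToken ();
    if (ttype == StreamTokenizer.TT_EOF)
      return EOF;
    if (ttype == StreamTokenizer.TT_NUMBER)
      @{
        // StreamTokenizer yields doubles; keep the integer part.
        yylval = Integer.valueOf ((int) st.nval);
        return NUM;
      @}
    // Single characters such as '+' are their own token codes.
    // Words (TT_WORD) are not handled by this sketch.
    yylval = null;
    return ttype;
  @}
@}
@end example
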
@node Java Action Features
@subsection Special Features for Use in Java Actions

The following special constructs can be used in Java actions.
Other analogous C action features are currently unavailable for Java.

Use @code{%define throws} to specify any uncaught exceptions from parser
actions, and initial actions specified by @code{%initial-action}.

@defvar $@var{n}
The semantic value for the @var{n}th component of the current rule.
This may not be assigned to.
@xref{Java Semantic Values}.
@end defvar

@defvar $<@var{typealt}>@var{n}
Like @code{$@var{n}} but specifies an alternative type @var{typealt}.
@xref{Java Semantic Values}.
@end defvar

@defvar $$
The semantic value for the grouping made by the current rule.  As a
value, this is in the base type (@code{Object} or as specified by
@code{%define stype}), i.e., it is not cast to the declared subtype,
because casts are not allowed on the left-hand side of Java assignments.
Use an explicit Java cast if the correct subtype is needed.
@xref{Java Semantic Values}.
@end defvar

@defvar $<@var{typealt}>$
Same as @code{$$} since Java always allows assigning to the base type.
Perhaps we should use this and @code{$<>$} for the value and @code{$$}
for setting the value, but there is currently no easy way to distinguish
these constructs.
@xref{Java Semantic Values}.
@end defvar

@defvar @@@var{n}
The location information of the @var{n}th component of the current rule.
This may not be assigned to.
@xref{Java Location Values}.
@end defvar

@defvar @@$
The location information of the grouping made by the current rule.
@xref{Java Location Values}.
@end defvar

@deffn {Statement} {return YYABORT;}
Return immediately from the parser, indicating failure.
@xref{Java Parser Interface}.
@end deffn

@deffn {Statement} {return YYACCEPT;}
Return immediately from the parser, indicating success.
@xref{Java Parser Interface}.
@end deffn

@deffn {Statement} {return YYERROR;}
Start error recovery without printing an error message.
@xref{Error Recovery}.
@end deffn

@deftypefn {Function} {boolean} recovering ()
Return whether error recovery is being done.  In this state, the parser
reads tokens until it reaches a known state, and then restarts normal
operation.
@xref{Error Recovery}.
@end deftypefn

@deftypefn {Function} {protected void} yyerror (String msg)
@deftypefnx {Function} {protected void} yyerror (Position pos, String msg)
@deftypefnx {Function} {protected void} yyerror (Location loc, String msg)
Print an error message using the @code{yyerror} method of the scanner
instance in use.
@end deftypefn

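The following fragment sketches how these constructs combine in a Java
grammar.  The grammar, the @code{NUM} token and the error message are
hypothetical.  Because the angle brackets declare @code{Integer}
subtypes, @code{$1} and @code{$3} are cast to @code{Integer} and unboxed
for the arithmetic.  The result is boxed again when assigned to
@code{$$}, whose type is the base type @code{Object}.

@example
%language "Java"
%token <Integer> NUM
%type  <Integer> exp
%left '+'
%left '/'

%%
exp:
  NUM           @{ $$ = $1; @}
| exp '+' exp   @{ $$ = $1 + $3; @}
| exp '/' exp
    @{
      if ($3 == 0)
        @{
          // YYERROR prints nothing, so report the problem first.
          yyerror ("division by zero");
          return YYERROR;
        @}
      $$ = $1 / $3;
    @}
;
@end example
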
@node Java Differences
@subsection Differences between C/C++ and Java Grammars

The different structure of the Java language forces several differences
between C/C++ grammars and grammars designed for Java parsers.  This
section summarizes these differences.

@itemize
@item
Java lacks a preprocessor, so the @code{YYERROR}, @code{YYACCEPT},
@code{YYABORT} symbols (@pxref{Table of Symbols}) obviously cannot be
macros.  Instead, they should be preceded by @code{return} when they
appear in an action.  The actual definition of these symbols is
opaque to the Bison grammar, and it might change in the future.  The
only meaningful operation that you can do is to return them.
@xref{Java Action Features}.

Note that of these three symbols, only @code{YYACCEPT} and
@code{YYABORT} will cause a return from the @code{parse}
method@footnote{Java parsers include the actions in a separate
method from @code{parse} in order to have an intuitive syntax that
corresponds to these C macros.}.

@item
Java lacks unions, so @code{%union} has no effect.  Instead, semantic
values have a common base type: @code{Object} or as specified by
@samp{%define stype}.  Angle brackets on @code{%token}, @code{%type},
@code{$@var{n}} and @code{$$} specify subtypes rather than fields of
a union.  The type of @code{$$}, even with angle brackets, is the base
type since Java casts are not allowed on the left-hand side of assignments.
Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the
left-hand side of assignments.  @xref{Java Semantic Values}, and
@ref{Java Action Features}.

@item
The prologue declarations have a different meaning than in C/C++ code.
@table @asis
@item @code{%code imports}
blocks are placed at the beginning of the Java source code.  They may
include copyright notices.  For a @code{package} declaration, it is
suggested to use @code{%define package} instead.

@item unqualified @code{%code}
blocks are placed inside the parser class.

@item @code{%code lexer}
blocks, if specified, should include the implementation of the
scanner.  If there is no such block, the scanner can be any class
that implements the appropriate interface (@pxref{Java Scanner
Interface}).
@end table

Other @code{%code} blocks are not supported in Java parsers.
In particular, @code{%@{ @dots{} %@}} blocks should not be used
and may give an error in future versions of Bison.

The epilogue has the same meaning as in C/C++ code and it can
be used to define other classes used by the parser @emph{outside}
the parser class.
@end itemize

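To put these pieces together, here is a sketch of how a Java grammar
file might be laid out.  The package name, the helper method @code{make}
and the @code{Expr} class are hypothetical placeholders, and the body of
the scanner is elided.  The comments indicate where, according to the
rules above, each block ends up in the generated file.

@example
%language "Java"
%define package "org.example.calc"

%code imports @{
  /* Placed at the beginning of the Java file.  */
  import java.io.IOException;
@}

%code @{
  /* Placed inside the parser class.  */
  static Expr make (Integer n) @{ return new Expr (n); @}
@}

%code lexer @{
  /* Body of the inner scanner class (elided here); it must
     implement the Lexer interface: yylex, getLVal, yyerror...  */
@}

%token <Integer> NUM
%type  <Expr> exp

%%
exp: NUM  @{ $$ = make ($1); @} ;
%%

/* Epilogue: placed after the parser class, at top level.  */
class Expr
@{
  final int value;
  Expr (int value) @{ this.value = value; @}
@}
@end example
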
@node Java Declarations Summary
@subsection Java Declarations Summary

This summary only includes declarations that are specific to Java or
that have a special meaning when used in a Java parser.

@deffn {Directive} {%language "Java"}
Generate a Java class for the parser.
@end deffn

@deffn {Directive} %lex-param @{@var{type} @var{name}@}
A parameter for the lexer class defined by @code{%code lexer}
@emph{only}, added as parameters to the lexer constructor and the parser
constructor that @emph{creates} a lexer.  Default is none.
@xref{Java Scanner Interface}.
@end deffn

@deffn {Directive} %name-prefix "@var{prefix}"
The prefix of the parser class name @code{@var{prefix}Parser} if
@code{%define parser_class_name} is not used.  Default is @code{YY}.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} %parse-param @{@var{type} @var{name}@}
A parameter for the parser class added as parameters to constructor(s)
and as fields initialized by the constructor(s).  Default is none.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} %token <@var{type}> @var{token} @dots{}
Declare tokens.  Note that the angle brackets enclose a Java @emph{type}.
@xref{Java Semantic Values}.
@end deffn

@deffn {Directive} %type <@var{type}> @var{nonterminal} @dots{}
Declare the type of nonterminals.  Note that the angle brackets enclose
a Java @emph{type}.
@xref{Java Semantic Values}.
@end deffn

@deffn {Directive} %code @{ @var{code} @dots{} @}
Code appended to the inside of the parser class.
@xref{Java Differences}.
@end deffn

@deffn {Directive} {%code imports} @{ @var{code} @dots{} @}
Code inserted just after the @code{package} declaration.
@xref{Java Differences}.
@end deffn

@deffn {Directive} {%code lexer} @{ @var{code} @dots{} @}
Code added to the body of an inner lexer class within the parser class.
@xref{Java Scanner Interface}.
@end deffn

@deffn {Directive} %% @var{code} @dots{}
Code (after the second @code{%%}) appended to the end of the file,
@emph{outside} the parser class.
@xref{Java Differences}.
@end deffn

@deffn {Directive} %@{ @var{code} @dots{} %@}
Not supported.  Use @code{%code imports} instead.
@xref{Java Differences}.
@end deffn

@deffn {Directive} {%define abstract}
Whether the parser class is declared @code{abstract}.  Default is false.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define extends} "@var{superclass}"
The superclass of the parser class.  Default is none.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define final}
Whether the parser class is declared @code{final}.  Default is false.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define implements} "@var{interfaces}"
The implemented interfaces of the parser class, a comma-separated list.
Default is none.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define lex_throws} "@var{exceptions}"
The exceptions thrown by the @code{yylex} method of the lexer, a
comma-separated list.  Default is @code{java.io.IOException}.
@xref{Java Scanner Interface}.
@end deffn

@deffn {Directive} {%define location_type} "@var{class}"
The name of the class used for locations (a range between two
positions).  This class is generated as an inner class of the parser
class by @command{bison}.  Default is @code{Location}.
@xref{Java Location Values}.
@end deffn

@deffn {Directive} {%define package} "@var{package}"
The package to put the parser class in.  Default is none.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define parser_class_name} "@var{name}"
The name of the parser class.  Default is @code{YYParser} or
@code{@var{name-prefix}Parser}.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define position_type} "@var{class}"
The name of the class used for positions.  This class must be supplied by
the user.  Default is @code{Position}.
@xref{Java Location Values}.
@end deffn

@deffn {Directive} {%define public}
Whether the parser class is declared @code{public}.  Default is false.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define stype} "@var{class}"
The base type of semantic values.  Default is @code{Object}.
@xref{Java Semantic Values}.
@end deffn

@deffn {Directive} {%define strictfp}
Whether the parser class is declared @code{strictfp}.  Default is false.
@xref{Java Parser Interface}.
@end deffn

@deffn {Directive} {%define throws} "@var{exceptions}"
The exceptions thrown by user-supplied parser actions and
@code{%initial-action}, a comma-separated list.  Default is none.
@xref{Java Action Features}.
@end deffn

@c ================================================= FAQ

@node FAQ
@chapter Frequently Asked Questions
@cindex frequently asked questions
@cindex questions

Several questions about Bison come up occasionally. Some of them are
addressed here.

@menu
* Memory Exhausted::           Breaking the Stack Limits
* How Can I Reset the Parser:: @code{yyparse} Keeps some State
* Strings are Destroyed::      @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops::   Control Flow in the Calculator
* Multiple start-symbols::     Factoring closely related grammars
* Secure? Conform?::           Is Bison POSIX safe?
* I can't build Bison::        Troubleshooting
* Where can I find help?::     Troubleshooting
* Bug Reports::                Troublereporting
* More Languages::             Parsers in C++, Java, and so on
* Beta Testing::               Experimenting with development versions
* Mailing Lists::              Meeting other Bison users
@end menu

@node Memory Exhausted
@section Memory Exhausted

@quotation
My parser returns an error with a @samp{memory exhausted}
message. What can I do?
@end quotation

This question is already addressed elsewhere (@pxref{Recursion,
,Recursive Rules}).

@node How Can I Reset the Parser
@section How Can I Reset the Parser

The following phenomenon has several symptoms, resulting in the
following typical questions:

@quotation
I invoke @code{yyparse} several times, and on correct input it works
properly; but when a parse error is found, all the other calls fail
too. How can I reset the error flag of @code{yyparse}?
@end quotation

@noindent
or

@quotation
My parser includes support for an @samp{#include}-like feature, in
which case I run @code{yyparse} from @code{yyparse}. This fails
although I did specify @samp{%define api.pure}.
@end quotation

These problems typically come not from Bison itself, but from
Lex-generated scanners. Because these scanners use large buffers for
speed, they might not notice a change of input file. As a
demonstration, consider the following source file,
@file{first-line.l}:

@example
@group
%@{
#include <stdio.h>
#include <stdlib.h>
%@}
@end group
%%
.*\n    ECHO; return 1;
%%
@group
int
yyparse (char const *file)
@{
  yyin = fopen (file, "r");
  if (!yyin)
    @{
      perror ("fopen");
      exit (EXIT_FAILURE);
    @}
@end group
@group
  /* One token only.  */
  yylex ();
  if (fclose (yyin) != 0)
    @{
      perror ("fclose");
      exit (EXIT_FAILURE);
    @}
  return 0;
@}
@end group

@group
int
main (void)
@{
  yyparse ("input");
  yyparse ("input");
  return 0;
@}
@end group
@end example

@noindent
If the file @file{input} contains

@example
input:1: Hello,
input:2: World!
@end example

@noindent
then instead of getting the first line twice, you get:

@example
$ @kbd{flex -ofirst-line.c first-line.l}
$ @kbd{gcc -ofirst-line first-line.c -ll}
$ @kbd{./first-line}
input:1: Hello,
input:2: World!
@end example

Therefore, whenever you change @code{yyin}, you must tell the
Lex-generated scanner to discard its current buffer and switch to the
new one. This depends upon your implementation of Lex; see its
documentation for more. For Flex, it suffices to call
@samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your
Flex-generated scanner needs to read from several input streams to
handle features like include files, you might consider using Flex
functions like @samp{yy_switch_to_buffer} that manipulate multiple
input buffers.

If your Flex-generated scanner uses start conditions (@pxref{Start
conditions, , Start conditions, flex, The Flex Manual}), you might
also want to reset the scanner's state, i.e., go back to the initial
start condition, through a call to @samp{BEGIN (0)}.
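
The following C sketch is not Flex code: it is a hypothetical, minimal
read-ahead scanner, written only to show why a buffering scanner does
not notice a new input stream. The names @code{struct scanner},
@code{scanner_getc}, and @code{scanner_set_input} are made up for this
illustration; the @var{flush} argument plays roughly the role of
@samp{YY_FLUSH_BUFFER} in a Flex scanner.

@example
#include <stdio.h>

/* Hypothetical scanner state: a read-ahead buffer, as in a
   Lex-generated scanner.  */
struct scanner
@{
  FILE *in;
  char buf[4096];
  size_t len;   /* Number of valid bytes in buf.  */
  size_t pos;   /* Index of the next byte to hand out.  */
@};

/* Return the next character, refilling the buffer in large chunks.
   Bytes already buffered are served before IN is read again.  */
static int
scanner_getc (struct scanner *s)
@{
  if (s->pos == s->len)
    @{
      s->len = fread (s->buf, 1, sizeof s->buf, s->in);
      s->pos = 0;
      if (s->len == 0)
        return EOF;
    @}
  return (unsigned char) s->buf[s->pos++];
@}

/* Switch to a new input stream.  With FLUSH == 0 the stale buffer is
   kept, which reproduces the problem described above; with FLUSH != 0
   it is discarded, so the next read comes from the new stream.  */
static void
scanner_set_input (struct scanner *s, FILE *in, int flush)
@{
  s->in = in;
  if (flush)
    s->len = s->pos = 0;
@}
@end example

Assuming the first stream holds @samp{Hello,} followed by
@samp{World!}, reading one character fills the buffer with the whole
file. Switching to a second stream without flushing then still returns
@samp{e}, a leftover from the first file; switching with flushing
returns the first character of the new stream.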

@node Strings are Destroyed
@section Strings are Destroyed

@quotation
My parser seems to destroy old strings, or maybe it loses track of
them. Instead of reporting @samp{"foo", "bar"}, it reports
@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
@end quotation

This error is probably the single most frequent ``bug report'' sent to
Bison lists, but it stems only from a misunderstanding of the role of
the scanner. Consider the following Lex code:

@example
@group
%@{
#include <stdio.h>
char *yylval = NULL;
%@}
@end group
@group
%%
.*    yylval = yytext; return 1;
\n    /* IGNORE */
%%
@end group
@group
int
main ()
@{
  /* Similar to using $1, $2 in a Bison action.  */
  char *fst = (yylex (), yylval);
  char *snd = (yylex (), yylval);
  printf ("\"%s\", \"%s\"\n", fst, snd);
  return 0;
@}
@end group
@end example

If you compile and run this code, you get:

@example
$ @kbd{flex -osplit-lines.c split-lines.l}
$ @kbd{gcc -osplit-lines split-lines.c -ll}
$ @kbd{printf 'one\ntwo\n' | ./split-lines}
"one
two", "two"
@end example

@noindent
This is because @code{yytext} is a buffer provided for @emph{reading}
in the action; if you want to keep it, you have to duplicate it
(e.g., using @code{strdup}). Note that the output may depend on how
your implementation of Lex handles @code{yytext}. For instance, when
given the Lex compatibility option @option{-l} (which triggers the
option @samp{%array}), Flex generates a scanner that behaves
differently:

@example
$ @kbd{flex -l -osplit-lines.c split-lines.l}
$ @kbd{gcc -osplit-lines split-lines.c -ll}
$ @kbd{printf 'one\ntwo\n' | ./split-lines}
"two", "two"
@end example
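
As a minimal sketch of the difference between keeping a pointer into
the scanner's buffer and keeping a copy, the C code below uses a
hypothetical function @code{next_token} that, like @code{yytext},
always returns the same overwritten buffer, and a hypothetical
@code{copy_token} that duplicates a string the way @code{strdup} does.

@example
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for yytext: a single buffer that every
   call overwrites, as a Lex-generated scanner may do.  */
static char token_buffer[64];

static char *
next_token (const char *text)
@{
  strncpy (token_buffer, text, sizeof token_buffer - 1);
  token_buffer[sizeof token_buffer - 1] = '\0';
  return token_buffer;
@}

/* Return a private, heap-allocated copy of TEXT (what strdup does on
   POSIX systems), or NULL if memory is exhausted.  */
static char *
copy_token (const char *text)
@{
  size_t size = strlen (text) + 1;
  char *res = malloc (size);
  if (res)
    memcpy (res, text, size);
  return res;
@}
@end example

Two successive calls to @code{next_token} return the same pointer, so
both saved values read @samp{"bar"}; passing each result through
@code{copy_token} keeps @samp{"foo"} and @samp{"bar"} apart. The copies
are then owned by the caller, which must @code{free} them.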


@node Implementing Gotos/Loops
@section Implementing Gotos/Loops

@quotation
My simple calculator supports variables, assignments, and functions,
but how can I implement gotos, or loops?
@end quotation

Although very pedagogical, the examples included in this manual blur
the distinction between the parser, whose job is to recover the
structure of a text and to transmit it to subsequent modules of the
program, and the processing (such as the execution) of this
structure. This works well with so-called straight-line programs,
i.e., precisely those that have a straightforward execution model:
execute simple instructions one after the other.

@cindex abstract syntax tree
@cindex AST
If you want a richer model, you will probably need to use the parser
to construct a tree that does represent the structure it has
recovered; this tree is usually called the @dfn{abstract syntax tree},
or @dfn{AST} for short. Walking through this tree in various ways
then enables processing such as its execution or its translation,
which results in an interpreter or a compiler.

This topic is well beyond the scope of this manual, and the reader is
invited to consult the dedicated literature.
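
To make this concrete, here is a minimal C sketch of such a tree and
of an interpreter that walks it, assuming a toy language with only
numbers, single-letter variables, addition, @samp{<}, assignment,
sequencing, and @code{while} loops. The node kinds and the functions
@code{mk} and @code{eval} are hypothetical; in a real parser, the
grammar actions would call @code{mk} to build the tree instead of
computing values directly.

@example
#include <stdlib.h>

/* Hypothetical node kinds for a tiny calculator language.  */
enum kind @{ NUM, VAR, ADD, LESS, ASSIGN, SEQ, WHILE @};

struct node
@{
  enum kind kind;
  int value;             /* NUM: the constant; VAR, ASSIGN: variable index.  */
  struct node *left;     /* First operand, condition, or assigned value.  */
  struct node *right;    /* Second operand or loop body.  */
@};

static int vars[26];     /* Variables a to z, stored at indices 0 to 25.  */

/* Allocate a node; a Bison action would call this to build the tree.  */
static struct node *
mk (enum kind kind, int value, struct node *left, struct node *right)
@{
  struct node *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->kind = kind;
  n->value = value;
  n->left = left;
  n->right = right;
  return n;
@}

/* Walk the tree.  A WHILE node evaluates its subtrees repeatedly,
   something actions that run once per reduction cannot do.  */
static int
eval (const struct node *n)
@{
  switch (n->kind)
    @{
    case NUM:
      return n->value;
    case VAR:
      return vars[n->value];
    case ADD:
      return eval (n->left) + eval (n->right);
    case LESS:
      return eval (n->left) < eval (n->right);
    case ASSIGN:
      return vars[n->value] = eval (n->left);
    case SEQ:
      eval (n->left);
      return eval (n->right);
    case WHILE:
      while (eval (n->left))
        eval (n->right);
      return 0;
    @}
  return 0;
@}
@end example

For instance, building the tree for @samp{i = 0; while (i < 5) i = i + 1;},
with @samp{i} stored at index 8, and calling @code{eval} on its root
leaves @code{vars[8]} equal to 5.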


@node Multiple start-symbols
@section Multiple start-symbols

@quotation
I have several closely related grammars, and I would like to share their
implementations. In fact, I could use a single grammar but with
multiple entry points.
@end quotation

Bison does not support multiple start-symbols, but there is a very
simple means to simulate them. If @code{foo} and @code{bar} are the two
pseudo start-symbols, then introduce two new tokens, say
@code{START_FOO} and @code{START_BAR}, and use them as switches from the
real start-symbol:

@example
%token START_FOO START_BAR;
%start start;
start: START_FOO foo
     | START_BAR bar;
@end example

These tokens prevent the introduction of new conflicts. As far as the
parser goes, that is all that is needed.

Now the difficult part is ensuring that the scanner will send these
tokens first. If your scanner is hand-written, that should be
straightforward. If your scanner is generated by Lex, then there is a
simple means to do it: recall that anything between @samp{%@{ ... %@}}
after the first @code{%%} is copied verbatim at the top of the generated
@code{yylex} function. Make sure a variable @code{start_token} is
available in the scanner (e.g., a global variable or using
@code{%lex-param} etc.), and use the following:

@example
  /* @r{Prologue.} */
%%
%@{
  if (start_token)
    @{
      int t = start_token;
      start_token = 0;
      return t;
    @}
%@}
  /* @r{The rules.} */
@end example
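
With a hand-written scanner, the same idea fits in a few lines of C.
This is only a sketch: the token values, the helper @code{scan}, and
the global @code{start_token} are assumptions made for illustration,
since in a real parser Bison defines the token codes and you provide
the actual scanning code.

@example
/* Hypothetical token codes; in a real parser Bison defines them.  */
enum @{ END = 0, START_FOO = 258, START_BAR, NUMBER @};

/* Set before calling yyparse; zero means "no switch token pending".  */
static int start_token;

/* Hypothetical hand-written scanner: returns NUMBER once, then END.  */
static int
scan (void)
@{
  static int calls;
  return calls++ == 0 ? NUMBER : END;
@}

int
yylex (void)
@{
  if (start_token)
    @{
      int t = start_token;
      start_token = 0;   /* Send the switch token only once.  */
      return t;
    @}
  return scan ();
@}
@end example

To parse a @code{bar}, the caller sets @code{start_token} to
@code{START_BAR} and then calls @code{yyparse}; the parser first
receives @code{START_BAR}, followed by the regular tokens.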


@node Secure? Conform?
@section Secure? Conform?

@quotation
Is Bison secure? Does it conform to POSIX?
@end quotation

If you're looking for a guarantee or certification, we don't provide it.
However, Bison is intended to be a reliable program that conforms to the
POSIX specification for Yacc. If you run into problems,
please send us a bug report.

@node I can't build Bison
@section I can't build Bison

@quotation
I can't build Bison because @command{make} complains that
@code{msgfmt} is not found.
What should I do?
@end quotation

Like most GNU packages with internationalization support, that feature
is turned on by default. If you have problems building in the @file{po}
subdirectory, it indicates that your system's internationalization
support is lacking. You can re-configure Bison with
@option{--disable-nls} to turn off this support, or you can install GNU
gettext from @url{ftp://ftp.gnu.org/gnu/gettext/} and re-configure
Bison. See the file @file{ABOUT-NLS} for more information.


@node Where can I find help?
@section Where can I find help?

@quotation
I'm having trouble using Bison. Where can I find help?
@end quotation

First, read this fine manual. Beyond that, you can send mail to
@email{help-bison@@gnu.org}. This mailing list is intended to be
populated with people who are willing to answer questions about using
and installing Bison. Please keep in mind that (most of) the people on
the list have aspects of their lives which are not related to Bison (!),
so you may not receive an answer to your question right away. This can
be frustrating, but please try not to honk them off; remember that any
help they provide is purely voluntary and out of the kindness of their
hearts.

@node Bug Reports
@section Bug Reports

@quotation
I found a bug. What should I include in the bug report?
@end quotation

Before you send a bug report, make sure you are using the latest
version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its
mirrors. Be sure to include the version number in your bug report. If
the bug is present in the latest version but not in a previous version,
try to determine the most recent version which did not contain the bug.

If the bug is parser-related, you should include the smallest grammar
you can that demonstrates the bug. The grammar file should also be
complete (i.e., I should be able to run it through Bison without having
to edit or add anything). The smaller and simpler the grammar, the
easier it will be to fix the bug.

Include information about your compilation environment, including your
operating system's name and version and your compiler's name and
version. If you have trouble compiling, you should also include a
transcript of the build session, starting with the invocation of
@command{configure}. Depending on the nature of the bug, you may be asked to
send additional files as well (such as @file{config.h} or @file{config.cache}).

Patches are most welcome, but not required. That is, do not hesitate to
send a bug report just because you cannot provide a fix.

Send bug reports to @email{bug-bison@@gnu.org}.

@node More Languages
@section More Languages

@quotation
Will Bison ever have C++ and Java support? How about @var{insert your
favorite language here}?
@end quotation

C++ and Java support is there now, and is documented. We'd love to add other
languages; contributions are welcome.

@node Beta Testing
@section Beta Testing

@quotation
What is involved in being a beta tester?
@end quotation

It's not terribly involved. Basically, you would download a test
release, compile it, and use it to build and run a parser or two. After
that, you would submit either a bug report or a message saying that
everything is okay. It is important to report successes as well as
failures because test releases eventually become mainstream releases,
but only if they are adequately tested. If no one tests, development is
essentially halted.

Beta testers are particularly needed for operating systems to which the
developers do not have easy access. They currently have easy access to
recent GNU/Linux and Solaris versions. Reports about other operating
systems are especially welcome.

@node Mailing Lists
@section Mailing Lists

@quotation
How do I join the help-bison and bug-bison mailing lists?
@end quotation

See @url{http://lists.gnu.org/}.

@c ================================================= Table of Symbols

@node Table of Symbols
@appendix Bison Symbols
@cindex Bison symbols, table of
@cindex symbols in Bison, table of

@deffn {Variable} @@$
In an action, the location of the left-hand side of the rule.
@xref{Tracking Locations}.
@end deffn

@deffn {Variable} @@@var{n}
In an action, the location of the @var{n}-th symbol of the right-hand side
of the rule. @xref{Tracking Locations}.
@end deffn

@deffn {Variable} @@@var{name}
In an action, the location of a symbol addressed by name. @xref{Tracking
Locations}.
@end deffn

@deffn {Variable} @@[@var{name}]
In an action, the location of a symbol addressed by name. @xref{Tracking
Locations}.
@end deffn

@deffn {Variable} $$
In an action, the semantic value of the left-hand side of the rule.
@xref{Actions}.
@end deffn

@deffn {Variable} $@var{n}
In an action, the semantic value of the @var{n}-th symbol of the
right-hand side of the rule. @xref{Actions}.
@end deffn

@deffn {Variable} $@var{name}
In an action, the semantic value of a symbol addressed by name.
@xref{Actions}.
@end deffn

@deffn {Variable} $[@var{name}]
In an action, the semantic value of a symbol addressed by name.
@xref{Actions}.
@end deffn

@deffn {Delimiter} %%
Delimiter used to separate the grammar rule section from the
Bison declarations section or the epilogue.
@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
@end deffn

@c Don't insert spaces, or check the DVI output.
@deffn {Delimiter} %@{@var{code}%@}
All code listed between @samp{%@{} and @samp{%@}} is copied verbatim
to the parser implementation file. Such code forms the prologue of
the grammar file. @xref{Grammar Outline, ,Outline of a Bison
Grammar}.
@end deffn

@deffn {Construct} /*@dots{}*/
Comment delimiters, as in C.
@end deffn

@deffn {Delimiter} :
Separates a rule's result from its components. @xref{Rules, ,Syntax of
Grammar Rules}.
@end deffn

@deffn {Delimiter} ;
Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
@end deffn

@deffn {Delimiter} |
Separates alternate rules for the same result nonterminal.
@xref{Rules, ,Syntax of Grammar Rules}.
@end deffn

@deffn {Directive} <*>
Used to define a default tagged @code{%destructor} or default tagged
@code{%printer}.

This feature is experimental.
More user feedback will help to determine whether it should become a permanent
feature.

@xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn

@deffn {Directive} <>
Used to define a default tagless @code{%destructor} or default tagless
@code{%printer}.

This feature is experimental.
More user feedback will help to determine whether it should become a permanent
feature.

@xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn

@deffn {Symbol} $accept
The predefined nonterminal whose only rule is @samp{$accept: @var{start}
$end}, where @var{start} is the start symbol. @xref{Start Decl, , The
Start-Symbol}. It cannot be used in the grammar.
@end deffn

@deffn {Directive} %code @{@var{code}@}
@deffnx {Directive} %code @var{qualifier} @{@var{code}@}
Insert @var{code} verbatim into the output parser source at the
default location or at the location specified by @var{qualifier}.
@xref{%code Summary}.
@end deffn

@deffn {Directive} %debug
Equip the parser for debugging. @xref{Decl Summary}.
@end deffn

@ifset defaultprec
@deffn {Directive} %default-prec
Assign a precedence to rules that lack an explicit @samp{%prec}
modifier. @xref{Contextual Precedence, ,Context-Dependent
Precedence}.
@end deffn
@end ifset

@deffn {Directive} %define @var{variable}
@deffnx {Directive} %define @var{variable} @var{value}
@deffnx {Directive} %define @var{variable} "@var{value}"
Define a variable to adjust Bison's behavior. @xref{%define Summary}.
@end deffn

@deffn {Directive} %defines
Bison declaration to create a parser header file, which is usually
meant for the scanner. @xref{Decl Summary}.
@end deffn

@deffn {Directive} %defines @var{defines-file}
Same as above, but save in the file @var{defines-file}.
@xref{Decl Summary}.
@end deffn

@deffn {Directive} %destructor
Specify how the parser should reclaim the memory associated with
discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn

@deffn {Directive} %dprec
Bison declaration to assign a precedence to a rule that is used at parse
time to resolve reduce/reduce conflicts. @xref{GLR Parsers, ,Writing
GLR Parsers}.
@end deffn

@deffn {Symbol} $end
The predefined token marking the end of the token stream. It cannot be
used in the grammar.
@end deffn

@deffn {Symbol} error
A token name reserved for error recovery. This token may be used in
grammar rules so as to allow the Bison parser to recognize an error in
the input without halting the process. In effect, a sentence
containing an error may be recognized as valid. On a syntax error, the
token @code{error} becomes the current lookahead token. Actions
corresponding to @code{error} are then executed, and the lookahead
token is reset to the token that originally caused the violation.
@xref{Error Recovery}.
@end deffn

@deffn {Directive} %error-verbose
Bison declaration to request verbose, specific error message strings
when @code{yyerror} is called. @xref{Error Reporting}.
@end deffn

@deffn {Directive} %file-prefix "@var{prefix}"
Bison declaration to set the prefix of the output files. @xref{Decl
Summary}.
@end deffn

@deffn {Directive} %glr-parser
Bison declaration to produce a GLR parser. @xref{GLR
Parsers, ,Writing GLR Parsers}.
@end deffn

@deffn {Directive} %initial-action
Run user code before parsing. @xref{Initial Action Decl, , Performing Actions before Parsing}.
@end deffn

@deffn {Directive} %language
Specify the programming language for the generated parser.
@xref{Decl Summary}.
@end deffn

@deffn {Directive} %left
Bison declaration to assign left associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn

@deffn {Directive} %lex-param @{@var{argument-declaration}@}
Bison declaration to specify an additional parameter that
@code{yylex} should accept. @xref{Pure Calling,, Calling Conventions
for Pure Parsers}.
@end deffn

@deffn {Directive} %merge
Bison declaration to assign a merging function to a rule. If there is a
reduce/reduce conflict with a rule having the same merging function, the
function is applied to the two semantic values to get a single result.
@xref{GLR Parsers, ,Writing GLR Parsers}.
@end deffn

@deffn {Directive} %name-prefix "@var{prefix}"
Bison declaration to rename the external symbols. @xref{Decl Summary}.
@end deffn

@ifset defaultprec
@deffn {Directive} %no-default-prec
Do not assign a precedence to rules that lack an explicit @samp{%prec}
modifier. @xref{Contextual Precedence, ,Context-Dependent
Precedence}.
@end deffn
@end ifset

@deffn {Directive} %no-lines
Bison declaration to avoid generating @code{#line} directives in the
parser implementation file. @xref{Decl Summary}.
@end deffn

@deffn {Directive} %nonassoc
Bison declaration to assign nonassociativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn

@deffn {Directive} %output "@var{file}"
Bison declaration to set the name of the parser implementation file.
@xref{Decl Summary}.
@end deffn

@deffn {Directive} %parse-param @{@var{argument-declaration}@}
Bison declaration to specify an additional parameter that
@code{yyparse} should accept. @xref{Parser Function,, The Parser
Function @code{yyparse}}.
@end deffn

@deffn {Directive} %prec
Bison declaration to assign a precedence to a specific rule.
@xref{Contextual Precedence, ,Context-Dependent Precedence}.
@end deffn

@deffn {Directive} %pure-parser
Deprecated version of @code{%define api.pure} (@pxref{%define
Summary,,api.pure}), for which Bison is more careful to warn about
unreasonable usage.
@end deffn

@deffn {Directive} %require "@var{version}"
Require version @var{version} or higher of Bison. @xref{Require Decl, ,
Require a Version of Bison}.
@end deffn

@deffn {Directive} %right
Bison declaration to assign right associativity to token(s).
@xref{Precedence Decl, ,Operator Precedence}.
@end deffn

@deffn {Directive} %skeleton
Specify the skeleton to use; usually for development.
@xref{Decl Summary}.
@end deffn

@deffn {Directive} %start
Bison declaration to specify the start symbol. @xref{Start Decl, ,The
Start-Symbol}.
@end deffn

@deffn {Directive} %token
Bison declaration to declare token(s) without specifying precedence.
@xref{Token Decl, ,Token Type Names}.
@end deffn

@deffn {Directive} %token-table
Bison declaration to include a token name table in the parser
implementation file. @xref{Decl Summary}.
@end deffn

@deffn {Directive} %type
Bison declaration to declare nonterminals. @xref{Type Decl,
,Nonterminal Symbols}.
@end deffn

@deffn {Symbol} $undefined
The predefined token onto which all undefined values returned by
@code{yylex} are mapped. It cannot be used in the grammar; use
@code{error} instead.
@end deffn

@deffn {Directive} %union
Bison declaration to specify several possible data types for semantic
values. @xref{Union Decl, ,The Collection of Value Types}.
@end deffn

dd8d9022
AD
10760@deffn {Macro} YYABORT
10761Macro to pretend that an unrecoverable syntax error has occurred, by
10762making @code{yyparse} return 1 immediately. The error reporting
10763function @code{yyerror} is not called. @xref{Parser Function, ,The
10764Parser Function @code{yyparse}}.
8405b70c
PB
10765
10766For Java parsers, this functionality is invoked using @code{return YYABORT;}
10767instead.
dd8d9022 10768@end deffn
3ded9a63 10769
dd8d9022
AD
10770@deffn {Macro} YYACCEPT
10771Macro to pretend that a complete utterance of the language has been
10772read, by making @code{yyparse} return 0 immediately.
10773@xref{Parser Function, ,The Parser Function @code{yyparse}}.
8405b70c
PB
10774
10775For Java parsers, this functionality is invoked using @code{return YYACCEPT;}
10776instead.
dd8d9022 10777@end deffn
bfa74976 10778
dd8d9022 10779@deffn {Macro} YYBACKUP
742e4900 10780Macro to discard a value from the parser stack and fake a lookahead
dd8d9022 10781token. @xref{Action Features, ,Special Features for Use in Actions}.
18b519c0 10782@end deffn
bfa74976 10783
dd8d9022 10784@deffn {Variable} yychar
32c29292 10785External integer variable that contains the integer value of the
742e4900 10786lookahead token. (In a pure parser, it is a local variable within
dd8d9022
AD
10787@code{yyparse}.) Error-recovery rule actions may examine this variable.
10788@xref{Action Features, ,Special Features for Use in Actions}.
18b519c0 10789@end deffn
bfa74976 10790
dd8d9022
AD
10791@deffn {Variable} yyclearin
10792Macro used in error-recovery rule actions. It clears the previous
742e4900 10793lookahead token. @xref{Error Recovery}.
18b519c0 10794@end deffn
bfa74976 10795
dd8d9022
AD
10796@deffn {Macro} YYDEBUG
10797Macro to define to equip the parser with tracing code. @xref{Tracing,
10798,Tracing Your Parser}.
18b519c0 10799@end deffn
bfa74976 10800
dd8d9022
AD
10801@deffn {Variable} yydebug
10802External integer variable set to zero by default. If @code{yydebug}
10803is given a nonzero value, the parser will output information on input
10804symbols and parser action. @xref{Tracing, ,Tracing Your Parser}.
18b519c0 10805@end deffn
bfa74976 10806
dd8d9022
AD
10807@deffn {Macro} yyerrok
10808Macro to cause parser to recover immediately to its normal mode
10809after a syntax error. @xref{Error Recovery}.
10810@end deffn
10811
10812@deffn {Macro} YYERROR
10813Macro to pretend that a syntax error has just been detected: call
10814@code{yyerror} and then perform normal error recovery if possible
10815(@pxref{Error Recovery}), or (if recovery is impossible) make
10816@code{yyparse} return 1. @xref{Error Recovery}.
8405b70c
PB
10817
10818For Java parsers, this functionality is invoked using @code{return YYERROR;}
10819instead.
dd8d9022
AD
10820@end deffn

@deffn {Function} yyerror
User-supplied function to be called by @code{yyparse} on error.
@xref{Error Reporting, ,The Error
Reporting Function @code{yyerror}}.
@end deffn

@deffn {Macro} YYERROR_VERBOSE
An obsolete macro that you define with @code{#define} in the prologue
to request verbose, specific error message strings
when @code{yyerror} is called. It doesn't matter what definition you
use for @code{YYERROR_VERBOSE}, just whether you define it. Using
@code{%error-verbose} is preferred. @xref{Error Reporting}.
@end deffn

@deffn {Macro} YYINITDEPTH
Macro for specifying the initial size of the parser stack.
@xref{Memory Management}.
@end deffn

@deffn {Function} yylex
User-supplied lexical analyzer function, called to get the next token.
In a non-reentrant parser it is called with no arguments; in a pure
parser it receives (at least) pointers to the variables in which to store
the token's semantic value and, if locations are used, its location.
@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
@end deffn

@deffn {Macro} YYLEX_PARAM
An obsolete macro for specifying an extra argument (or list of extra
arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
macro is deprecated, and is supported only for Yacc-like parsers.
@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
@end deffn

@deffn {Variable} yylloc
External variable in which @code{yylex} should place the line and column
numbers associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
@code{yylex}.)
You can ignore this variable if you don't use the @samp{@@} feature in the
grammar actions.
@xref{Token Locations, ,Textual Locations of Tokens}.
In semantic actions, it stores the location of the lookahead token.
@xref{Actions and Locations, ,Actions and Locations}.
@end deffn

@deffn {Type} YYLTYPE
Data type of @code{yylloc}; by default, a structure with four @code{int}
members: @code{first_line}, @code{first_column}, @code{last_line}, and
@code{last_column}. @xref{Location Type, , Data Types of Locations}.
@end deffn
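
For illustration only, the standalone C below repeats a structure with the
same members as the default @code{YYLTYPE} (a real parser gets this
definition from Bison, so do not copy it into a grammar file) and adds a
hypothetical helper, @code{advance_location}, that a hand-written
@code{yylex} might use to fill in @code{yylloc}. The helper's name and its
conventions (lines and columns counted from 1, @code{last_column} pointing
just past the token) are assumptions of this sketch, not part of Bison.

@verbatim
#include <stddef.h>

/* Same members as Bison's default YYLTYPE, repeated here only so that
   this sketch compiles on its own.  */
typedef struct YYLTYPE
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical helper: the LEN characters at TEXT have just been
   scanned as one token that starts where the previous one ended.
   Make LOC span them; last_column ends up just past the token.  */
static void
advance_location (YYLTYPE *loc, const char *text, size_t len)
{
  size_t i;
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
  for (i = 0; i < len; i++)
    {
      if (text[i] == '\n')
        {
          loc->last_line++;
          loc->last_column = 1;
        }
      else
        loc->last_column++;
    }
}
@end verbatim

Starting from a location whose four members are all 1, advancing over
@samp{abc} yields a @code{first_column} of 1 and a @code{last_column} of 4.
Characters skipped between tokens, such as blanks, can be passed to the
same helper so that the next token starts in the right column.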

@deffn {Variable} yylval
External variable in which @code{yylex} should place the semantic
value associated with a token. (In a pure parser, it is a local
variable within @code{yyparse}, and its address is passed to
@code{yylex}.)
@xref{Token Values, ,Semantic Values of Tokens}.
In semantic actions, it stores the semantic value of the lookahead token.
@xref{Actions, ,Actions}.
@end deffn
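
To make the division of labor between @code{yylex} and @code{yylval}
concrete, here is a minimal sketch of a hand-written, non-reentrant
@code{yylex} that reads from a string. It assumes the default @code{int}
semantic type and a hypothetical token @code{NUM}; in a real parser Bison
supplies @code{yylval} and the token codes, so the two stand-in
definitions at the top would be dropped. The input pointer @code{input}
is likewise hypothetical.

@verbatim
#include <ctype.h>

/* Stand-ins for what Bison normally provides.  NUM is a hypothetical
   token code (258 is the number Bison typically gives the first named
   token in C parsers); YYSTYPE is int by default.  */
#define NUM 258
int yylval;

/* Hypothetical input source for this sketch.  */
static const char *input = "";

/* Return the next token code and store its semantic value in yylval.
   Blanks are skipped, other single characters are returned as their
   own code, and 0 signals the end of the input.  */
int
yylex (void)
{
  while (*input == ' ' || *input == '\t')
    input++;
  if (*input == '\0')
    return 0;
  if (isdigit ((unsigned char) *input))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input))
        value = value * 10 + (*input++ - '0');
      yylval = value;
      return NUM;
    }
  return (unsigned char) *input++;
}
@end verbatim

With @code{input} pointing to @samp{12 + 3}, successive calls return
@code{NUM} (with @code{yylval} set to 12), @code{'+'}, @code{NUM} (with
@code{yylval} set to 3), and finally 0.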

@deffn {Macro} YYMAXDEPTH
Macro for specifying the maximum size of the parser stack. @xref{Memory
Management}.
@end deffn
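
The interplay of @code{YYINITDEPTH} and @code{YYMAXDEPTH} can be pictured
with the hypothetical stack below: it starts with room for
@code{INIT_DEPTH} items, doubles its size whenever it fills up, never grows
past @code{MAX_DEPTH}, and reports failure once that limit is reached.
This growth policy roughly resembles what the C parser does when it
extends its stacks with @code{malloc}, but the code is only an
illustration, not Bison's generated code, and all names in it are made up.

@verbatim
#include <stdlib.h>

/* Hypothetical limits playing the roles of YYINITDEPTH and YYMAXDEPTH.  */
#define INIT_DEPTH 4
#define MAX_DEPTH 16

struct stack
{
  int *items;
  size_t size;    /* number of allocated slots */
  size_t count;   /* number of slots in use */
};

/* Push VALUE, growing the stack when it is full: double its size, but
   never beyond MAX_DEPTH.  Return 0 on success, or 1 if the stack is
   already at MAX_DEPTH or memory runs out.  */
static int
stack_push (struct stack *s, int value)
{
  if (s->count == s->size)
    {
      size_t new_size;
      int *p;
      if (MAX_DEPTH <= s->size)
        return 1;
      new_size = s->size ? 2 * s->size : INIT_DEPTH;
      if (MAX_DEPTH < new_size)
        new_size = MAX_DEPTH;
      p = realloc (s->items, new_size * sizeof *p);
      if (!p)
        return 1;
      s->items = p;
      s->size = new_size;
    }
  s->items[s->count++] = value;
  return 0;
}
@end verbatim

With the limits shown, sixteen pushes succeed (the size goes 4, 8, 16) and
the seventeenth fails, which corresponds roughly to a Bison parser
reporting that its memory is exhausted.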

@deffn {Variable} yynerrs
Global variable that Bison increments each time it reports a syntax error.
(In a pure parser, it is a local variable within @code{yyparse}. In a
pure push parser, it is a member of @code{yypstate}.)
@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
@end deffn

@deffn {Function} yyparse
The parser function produced by Bison; call this function to start
parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
@end deffn

@deffn {Function} yypstate_delete
The function to delete a parser instance, produced by Bison in push mode;
call this function to delete the memory associated with a parser.
@xref{Parser Delete Function, ,The Parser Delete Function
@code{yypstate_delete}}.
(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)
@end deffn

@deffn {Function} yypstate_new
The function to create a parser instance, produced by Bison in push mode;
call this function to create a new parser.
@xref{Parser Create Function, ,The Parser Create Function
@code{yypstate_new}}.
(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)
@end deffn

@deffn {Function} yypull_parse
The parser function produced by Bison in push mode; call this function to
parse the rest of the input stream.
@xref{Pull Parser Function, ,The Pull Parser Function
@code{yypull_parse}}.
(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)
@end deffn

@deffn {Function} yypush_parse
The parser function produced by Bison in push mode; call this function to
parse a single token. @xref{Push Parser Function, ,The Push Parser Function
@code{yypush_parse}}.
(The current push parsing interface is experimental and may evolve.
More user feedback will help to stabilize it.)
@end deffn
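
As a rough sketch of how the push-mode functions fit together, assuming a
non-reentrant parser generated in push mode that takes semantic values
from the global @code{yylval}, a caller might feed it tokens from
@code{yylex} until @code{yypush_parse} stops returning
@code{YYPUSH_MORE}. Passing @code{NULL} as the third argument relies on
that impure setup, where @code{yylex} has already stored the value in
@code{yylval}.

@verbatim
int status;
yypstate *ps = yypstate_new ();
do
  {
    /* Hand the parser one token; yylex has already stored the
       token's semantic value in the global yylval.  */
    status = yypush_parse (ps, yylex (), NULL);
  }
while (status == YYPUSH_MORE);
yypstate_delete (ps);
@end verbatim

Any status other than @code{YYPUSH_MORE} is the final result of the parse,
with the same meaning as a return value of @code{yyparse}.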

@deffn {Macro} YYPARSE_PARAM
An obsolete macro for specifying the name of a parameter that
@code{yyparse} should accept. The use of this macro is deprecated, and
is supported only for Yacc-like parsers. @xref{Pure Calling,, Calling
Conventions for Pure Parsers}.
@end deffn

@deffn {Macro} YYRECOVERING
The expression @code{YYRECOVERING ()} yields 1 when the parser
is recovering from a syntax error, and 0 otherwise.
@xref{Action Features, ,Special Features for Use in Actions}.
@end deffn

@deffn {Macro} YYSTACK_USE_ALLOCA
Macro used to control the use of @code{alloca} when the
deterministic parser in C needs to extend its stacks. If defined to 0,
the parser will use @code{malloc} to extend its stacks. If defined to
1, the parser will use @code{alloca}. Values other than 0 and 1 are
reserved for future Bison extensions. If not defined,
@code{YYSTACK_USE_ALLOCA} defaults to 0.

In the all-too-common case where your code may run on a host with a
limited stack and with unreliable stack-overflow checking, you should
set @code{YYMAXDEPTH} to a value that cannot possibly result in
unchecked stack overflow on any of your target hosts when
@code{alloca} is called. You can inspect the code that Bison
generates in order to determine the proper numeric values. This will
require some expertise in low-level implementation details.
@end deffn

@deffn {Type} YYSTYPE
Data type of semantic values; @code{int} by default.
@xref{Value Type, ,Data Types of Semantic Values}.
@end deffn

@node Glossary
@appendix Glossary
@cindex glossary

@table @asis
@item Accepting state
A state whose only action is the accept action.
The accepting state is thus a consistent state.
@xref{Understanding,,}.

@item Backus-Naur Form (BNF; also called ``Backus Normal Form'')
Formal method of specifying context-free grammars originally proposed
by John Backus, and slightly improved by Peter Naur in his 1960-01-02
committee document contributing to what became the Algol 60 report.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.

@item Consistent state
A state containing only one possible action. @xref{Default Reductions}.

@item Context-free grammars
Grammars specified as rules that can be applied regardless of context.
Thus, if there is a rule which says that an integer can be used as an
expression, integers are allowed @emph{anywhere} an expression is
permitted. @xref{Language and Grammar, ,Languages and Context-Free
Grammars}.

@item Default reduction
The reduction that a parser should perform if the current parser state
contains no other action for the lookahead token. In permitted parser
states, Bison declares the reduction with the largest lookahead set to be
the default reduction and removes that lookahead set. @xref{Default
Reductions}.

@item Defaulted state
A consistent state with a default reduction. @xref{Default Reductions}.

@item Dynamic allocation
Allocation of memory that occurs during execution, rather than at
compile time or on entry to a function.

@item Empty string
Analogous to the empty set in set theory, the empty string is a
character string of length zero.

@item Finite-state stack machine
A ``machine'' that has discrete states in which it is said to exist at
each instant in time. As input to the machine is processed, the
machine moves from state to state as specified by the logic of the
machine. In the case of the parser, the input is the language being
parsed, and the states correspond to various stages in the grammar
rules. @xref{Algorithm, ,The Bison Parser Algorithm}.

@item Generalized LR (GLR)
A parsing algorithm that can handle all context-free grammars, including those
that are not LR(1). It resolves situations that Bison's deterministic parsing
algorithm cannot by effectively splitting off multiple parsers, trying all
possible parsers, and discarding those that fail in the light of additional
right context. @xref{Generalized LR Parsing, ,Generalized
LR Parsing}.

@item Grouping
A language construct that is (in general) grammatically divisible;
for example, `expression' or `declaration' in C@.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.

@item IELR(1) (Inadequacy Elimination LR(1))
A minimal LR(1) parser table construction algorithm. That is, given any
context-free grammar, IELR(1) generates parser tables with the full
language-recognition power of canonical LR(1) but with nearly the same
number of parser states as LALR(1). This reduction in the number of parser
states is often an order of magnitude. More importantly, because canonical
LR(1)'s extra parser states may contain duplicate conflicts in the case of
non-LR(1) grammars, IELR(1) often reports an order of magnitude fewer
conflicts as well. This can significantly reduce the complexity of
developing a grammar. @xref{LR Table Construction}.

@item Infix operator
An arithmetic operator that is placed between the operands on which it
performs some operation.

@item Input stream
A continuous flow of data between devices or programs.

@item LAC (Lookahead Correction)
A parsing mechanism that fixes the problem of delayed syntax error
detection, which is caused by LR state merging, default reductions, and the
use of @code{%nonassoc}. Delayed syntax error detection results in
unexpected semantic actions, initiation of error recovery in the wrong
syntactic context, and an incorrect list of expected tokens in a verbose
syntax error message. @xref{LAC}.

@item Language construct
One of the typical usage schemas of the language. For example, one of
the constructs of the C language is the @code{if} statement.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.

@item Left associativity
Operators having left associativity are analyzed from left to right:
@samp{a+b+c} first computes @samp{a+b} and then combines with
@samp{c}. For a left-associative @samp{-}, @samp{10-3-2} therefore means
@samp{(10-3)-2}, which is 5 rather than 9. @xref{Precedence, ,Operator
Precedence}.

@item Left recursion
A rule whose result symbol is also its first component symbol; for
example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive
Rules}.

@item Left-to-right parsing
Parsing a sentence of a language by analyzing it token by token from
left to right. @xref{Algorithm, ,The Bison Parser Algorithm}.

@item Lexical analyzer (scanner)
A function that reads an input stream and returns tokens one by one.
@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.

@item Lexical tie-in
A flag, set by actions in the grammar rules, which alters the way
tokens are parsed. @xref{Lexical Tie-ins}.

@item Literal string token
A token which consists of two or more fixed characters. @xref{Symbols}.

@item Lookahead token
A token already read but not yet shifted. @xref{Lookahead, ,Lookahead
Tokens}.

@item LALR(1)
The class of context-free grammars that Bison (like most other parser
generators) can handle by default; a subset of LR(1).
@xref{Mysterious Conflicts}.

@item LR(1)
The class of context-free grammars in which at most one token of
lookahead is needed to disambiguate the parsing of any piece of input.

@item Nonterminal symbol
A grammar symbol standing for a grammatical construct that can
be expressed through rules in terms of smaller constructs; in other
words, a construct that is not a token. @xref{Symbols}.

@item Parser
A function that recognizes valid sentences of a language by analyzing
the syntax structure of a set of tokens passed to it from a lexical
analyzer.

@item Postfix operator
An arithmetic operator that is placed after the operands upon which it
performs some operation.

@item Reduction
Replacing a string of nonterminals and/or terminals with a single
nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison
Parser Algorithm}.

@item Reentrant
A reentrant subprogram is a subprogram that can be invoked any
number of times in parallel, without interference between the various
invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
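
The difference can be shown with two hypothetical C counters: the first
keeps its state in a @code{static} variable shared by every caller, so it
is not reentrant; the second keeps all of its state in an object supplied
by the caller, which is the same idea a pure parser applies to
@code{yylval} and @code{yylloc}. The names are made up for this sketch.

@verbatim
/* Not reentrant: every caller shares one hidden counter, so two
   interleaved users disturb each other's sequence.  */
static int
next_id_shared (void)
{
  static int counter = 0;
  return ++counter;
}

/* Reentrant: all state lives in an object owned by the caller, so
   independent users never interfere.  */
struct id_state
{
  int counter;
};

static int
next_id (struct id_state *st)
{
  return ++st->counter;
}
@end verbatim

Two @code{struct id_state} objects each count 1, 2, 3, @dots{}
independently, whereas all callers of @code{next_id_shared} draw from a
single sequence.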

@item Reverse Polish notation
A language in which all operators are postfix operators.
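
For example, @samp{3 4 + 2 *} means (3 + 4) * 2, which is 14. The
hypothetical evaluator below, limited to single-digit operands and the
operators @samp{+}, @samp{-}, and @samp{*}, shows why postfix notation
needs no parentheses or precedence rules: each operator pops its two
operands from a stack and pushes the result.

@verbatim
#include <ctype.h>

/* Hypothetical evaluator for reverse Polish notation with single-digit
   operands, e.g. "3 4 + 2 *".  Spaces are ignored.  On malformed input,
   store 0 in *OK and return 0; otherwise store 1 in *OK.  */
static int
rpn_eval (const char *s, int *ok)
{
  int stack[64];
  int depth = 0;
  *ok = 0;
  for (; *s; s++)
    {
      if (*s == ' ')
        continue;
      if (isdigit ((unsigned char) *s))
        {
          if (depth == 64)
            return 0;          /* too many pending operands */
          stack[depth++] = *s - '0';
        }
      else
        {
          int a, b;
          if (depth < 2)
            return 0;          /* operator without two operands */
          b = stack[--depth];
          a = stack[--depth];
          switch (*s)
            {
            case '+': stack[depth++] = a + b; break;
            case '-': stack[depth++] = a - b; break;
            case '*': stack[depth++] = a * b; break;
            default: return 0; /* unknown operator */
            }
        }
    }
  if (depth != 1)
    return 0;                  /* leftover or missing operands */
  *ok = 1;
  return stack[0];
}
@end verbatim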

@item Right recursion
A rule whose result symbol is also its last component symbol; for
example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive
Rules}.

@item Semantics
In computer languages, the semantics are specified by the actions
taken for each instance of the language, i.e., the meaning of
each statement. @xref{Semantics, ,Defining Language Semantics}.

@item Shift
A parser is said to shift when it makes the choice of analyzing
further input from the stream rather than reducing immediately some
already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm}.

@item Single-character literal
A single character that is recognized and interpreted as is.
@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.

@item Start symbol
The nonterminal symbol that stands for a complete valid utterance in
the language being parsed. The start symbol is usually listed as the
first nonterminal symbol in a language specification.
@xref{Start Decl, ,The Start-Symbol}.

@item Symbol table
A data structure where symbol names and associated data are stored
during parsing to allow for recognition and use of existing
information in repeated uses of a symbol. @xref{Multi-function Calc}.

@item Syntax error
An error encountered during parsing of an input stream due to invalid
syntax. @xref{Error Recovery}.

@item Token
A basic, grammatically indivisible unit of a language. The symbol
that describes a token in the grammar is a terminal symbol.
The input of the Bison parser is a stream of tokens which comes from
the lexical analyzer. @xref{Symbols}.

@item Terminal symbol
A grammar symbol that has no rules in the grammar and therefore is
grammatically indivisible. The piece of text it represents is a token.
@xref{Language and Grammar, ,Languages and Context-Free Grammars}.

@item Unreachable state
A parser state to which there does not exist a sequence of transitions from
the parser's start state. A state can become unreachable during conflict
resolution. @xref{Unreachable States}.
@end table

@node Copying This Manual
@appendix Copying This Manual
@include fdl.texi

@node Bibliography
@unnumbered Bibliography

@table @asis
@item [Denny 2008]
Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser Tables
for Non-LR(1) Grammars with Conflict Resolution, in @cite{Proceedings of the
2008 ACM Symposium on Applied Computing} (SAC'08), ACM, New York, NY, USA,
pp.@: 240--245. @uref{http://dx.doi.org/10.1145/1363686.1363747}

@item [Denny 2010 May]
Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the
Deterministic Parsing of Composite Languages, Ph.D. Dissertation, Clemson
University, Clemson, SC, USA (May 2010).
@uref{http://proquest.umi.com/pqdlink?did=2041473591&Fmt=7&clientId=79356&RQT=309&VName=PQD}

@item [Denny 2010 November]
Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for Generating
Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution,
in @cite{Science of Computer Programming}, Vol.@: 75, Issue 11 (November
2010), pp.@: 943--979. @uref{http://dx.doi.org/10.1016/j.scico.2009.08.001}

@item [DeRemer 1982]
Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1)
Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and
Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@:
615--649. @uref{http://dx.doi.org/10.1145/69622.357187}

@item [Knuth 1965]
Donald E. Knuth, On the Translation of Languages from Left to Right, in
@cite{Information and Control}, Vol.@: 8, Issue 6 (December 1965), pp.@:
607--639. @uref{http://dx.doi.org/10.1016/S0019-9958(65)90426-2}

@item [Scott 2000]
Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain,
@cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of
London, Department of Computer Science, TR-00-12 (December 2000).
@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}
@end table

@node Index
@unnumbered Index

@printindex cp

@bye

@c LocalWords: texinfo setfilename settitle setchapternewpage finalout texi FSF
@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex FSF's
@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry Naur
@c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa Multi
@c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc multi
@c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex defaultprec Donnelly Gotos
@c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref yypush
@c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex lr
@c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge POSIX
@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG yypull
@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit nonfree
@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok rr
@c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln Stallman Destructor
@c LocalWords: smallexample symrec val tptr FNCT fnctptr func struct sym enum
@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof Lex
@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum DOTDOT
@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype Unary
@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless yynerrs nonterminal
@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES reentrant
@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param yypstate
@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP subrange
@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword loc
@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH inline
@c LocalWords: YYINITDEPTH stmnts ref stmnt initdcl maybeasm notype Lookahead
@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args Autoconf
@c LocalWords: infile ypp yxx outfile itemx tex leaderfill Troubleshouting sqrt
@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
@c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs
@c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC
@c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
@c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative
@c LocalWords: deffnx namespace xml goto lalr ielr runtime lex yacc yyps env
@c LocalWords: yystate variadic Unshift NLS gettext po UTF Automake LOCALEDIR
@c LocalWords: YYENABLE bindtextdomain Makefile DEFS CPPFLAGS DBISON DeRemer
@c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz
@c LocalWords: redeclare automata Dparse localedir datadir XSLT midrule Wno
@c LocalWords: Graphviz multitable headitem hh basename Doxygen fno
@c LocalWords: doxygen ival sval deftypemethod deallocate pos deftypemethodx
@c LocalWords: Ctor defcv defcvx arg accessors arithmetics CPP ifndef CALCXX
@c LocalWords: lexer's calcxx bool LPAREN RPAREN deallocation cerrno climits
@c LocalWords: cstdlib Debian undef yywrap unput noyywrap nounput zA yyleng
@c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc
@c LocalWords: bytecode initializers superclass stype ASTNode autoboxing nls
@c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp
@c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv
@c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url
@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos
@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett
@c LocalWords: subdirectory Solaris nonassociativity

@c Local Variables:
@c ispell-dictionary: "american"
@c fill-column: 76
@c End: