\input texinfo @c -*-texinfo-*-
@comment %**start of header
@setfilename bison.info
@include version.texi
@settitle Bison @value{VERSION}
@setchapternewpage odd

@finalout

@c SMALL BOOK version
@c This edition has been formatted so that you can format and print it in
@c the smallbook format.
@c @smallbook

@c Set following if you want to document %default-prec and %no-default-prec.
@c This feature is experimental and may change in future Bison versions.
@c @set defaultprec

@ifnotinfo
@syncodeindex fn cp
@syncodeindex vr cp
@syncodeindex tp cp
@end ifnotinfo
@ifinfo
@synindex fn cp
@synindex vr cp
@synindex tp cp
@end ifinfo
@comment %**end of header

@copying

This manual (@value{UPDATED}) is for GNU Bison (version
@value{VERSION}), the GNU parser generator.

Copyright @copyright{} 1988-1993, 1995, 1998-2012 Free Software
Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.3 or any later version published by the Free Software
Foundation; with no Invariant Sections, with the Front-Cover texts
being ``A GNU Manual,'' and with the Back-Cover Texts as in
(a) below. A copy of the license is included in the section entitled
``GNU Free Documentation License.''

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF
supports it in developing GNU and promoting software
freedom.''
@end quotation
@end copying

@dircategory Software development
@direntry
* bison: (bison).       GNU parser generator (Yacc replacement).
@end direntry

@titlepage
@title Bison
@subtitle The Yacc-compatible Parser Generator
@subtitle @value{UPDATED}, Bison Version @value{VERSION}

@author by Charles Donnelly and Richard Stallman

@page
@vskip 0pt plus 1filll
@insertcopying
@sp 2
Published by the Free Software Foundation @*
51 Franklin Street, Fifth Floor @*
Boston, MA 02110-1301 USA @*
Printed copies are available from the Free Software Foundation.@*
ISBN 1-882114-44-2
@sp 2
Cover art by Etienne Suvasa.
@end titlepage

@contents

@ifnottex
@node Top
@top Bison
@insertcopying
@end ifnottex

@menu
* Introduction::
* Conditions::
* Copying::             The GNU General Public License says
                          how you can copy and share Bison.

Tutorial sections:
* Concepts::            Basic concepts for understanding Bison.
* Examples::            Three simple explained examples of using Bison.

Reference sections:
* Grammar File::        Writing Bison declarations and rules.
* Interface::           C-language interface to the parser function @code{yyparse}.
* Algorithm::           How the Bison parser works at run-time.
* Error Recovery::      Writing rules for error recovery.
* Context Dependency::  What to do if your language syntax is too
                          messy for Bison to handle straightforwardly.
* Debugging::           Understanding or debugging Bison parsers.
* Invocation::          How to run Bison (to produce the parser implementation).
* Other Languages::     Creating C++ and Java parsers.
* FAQ::                 Frequently Asked Questions
* Table of Symbols::    All the keywords of the Bison language are explained.
* Glossary::            Basic concepts are explained.
* Copying This Manual:: License for copying this manual.
* Bibliography::        Publications cited in this manual.
* Index of Terms::      Cross-references to the text.

@detailmenu
 --- The Detailed Node Listing ---

The Concepts of Bison

* Language and Grammar:: Languages and context-free grammars,
                           as mathematical ideas.
* Grammar in Bison::     How we represent grammars for Bison's sake.
* Semantic Values::      Each token or syntactic grouping can have
                           a semantic value (the value of an integer,
                           the name of an identifier, etc.).
* Semantic Actions::     Each rule can have an action containing C code.
* GLR Parsers::          Writing parsers for general context-free languages.
* Locations::            Overview of location tracking.
* Bison Parser::         What are Bison's input and output,
                           how is the output used?
* Stages::               Stages in writing and running Bison grammars.
* Grammar Layout::       Overall structure of a Bison grammar file.

Writing GLR Parsers

* Simple GLR Parsers::     Using GLR parsers on unambiguous grammars.
* Merging GLR Parses::     Using GLR parsers to resolve ambiguities.
* GLR Semantic Actions::   Deferred semantic actions have special concerns.
* Compiler Requirements::  GLR parsers require a modern C compiler.

Examples

* RPN Calc::               Reverse Polish notation calculator;
                             a first example with no operator precedence.
* Infix Calc::             Infix (algebraic) notation calculator.
                             Operator precedence is introduced.
* Simple Error Recovery::  Continuing after syntax errors.
* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
* Multi-function Calc::    Calculator with memory and trig functions.
                             It uses multiple data-types for semantic values.
* Exercises::              Ideas for improving the multi-function calculator.

Reverse Polish Notation Calculator

* Rpcalc Declarations::    Prologue (declarations) for rpcalc.
* Rpcalc Rules::           Grammar Rules for rpcalc, with explanation.
* Rpcalc Lexer::           The lexical analyzer.
* Rpcalc Main::            The controlling function.
* Rpcalc Error::           The error reporting function.
* Rpcalc Generate::        Running Bison on the grammar file.
* Rpcalc Compile::         Run the C compiler on the output code.

Grammar Rules for @code{rpcalc}

* Rpcalc Input::
* Rpcalc Line::
* Rpcalc Expr::

Location Tracking Calculator: @code{ltcalc}

* Ltcalc Declarations::    Bison and C declarations for ltcalc.
* Ltcalc Rules::           Grammar rules for ltcalc, with explanations.
* Ltcalc Lexer::           The lexical analyzer.

Multi-Function Calculator: @code{mfcalc}

* Mfcalc Declarations::    Bison declarations for multi-function calculator.
* Mfcalc Rules::           Grammar rules for the calculator.
* Mfcalc Symbol Table::    Symbol table management subroutines.

Bison Grammar Files

* Grammar Outline::    Overall layout of the grammar file.
* Symbols::            Terminal and nonterminal symbols.
* Rules::              How to write grammar rules.
* Recursion::          Writing recursive rules.
* Semantics::          Semantic values and actions.
* Tracking Locations:: Locations and actions.
* Named References::   Using named references in actions.
* Declarations::       All kinds of Bison declarations are described here.
* Multiple Parsers::   Putting more than one Bison parser in one program.

Outline of a Bison Grammar

* Prologue::              Syntax and usage of the prologue.
* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
* Bison Declarations::    Syntax and usage of the Bison declarations section.
* Grammar Rules::         Syntax and usage of the grammar rules section.
* Epilogue::              Syntax and usage of the epilogue.

Defining Language Semantics

* Value Type::        Specifying one data type for all semantic values.
* Multiple Types::    Specifying several alternative data types.
* Actions::           An action is the semantic definition of a grammar rule.
* Action Types::      Specifying data types for actions to operate on.
* Mid-Rule Actions::  Most actions go at the end of a rule.
                        This says when, why and how to use the exceptional
                        action in the middle of a rule.

Tracking Locations

* Location Type::               Specifying a data type for locations.
* Actions and Locations::       Using locations in actions.
* Location Default Action::     Defining a general way to compute locations.

Bison Declarations

* Require Decl::        Requiring a Bison version.
* Token Decl::          Declaring terminal symbols.
* Precedence Decl::     Declaring terminals with precedence and associativity.
* Union Decl::          Declaring the set of all semantic value types.
* Type Decl::           Declaring the choice of type for a nonterminal symbol.
* Initial Action Decl:: Code run before parsing starts.
* Destructor Decl::     Declaring how symbols are freed.
* Printer Decl::        Declaring how symbol values are displayed.
* Expect Decl::         Suppressing warnings about parsing conflicts.
* Start Decl::          Specifying the start symbol.
* Pure Decl::           Requesting a reentrant parser.
* Push Decl::           Requesting a push parser.
* Decl Summary::        Table of all Bison declarations.
* %define Summary::     Defining variables to adjust Bison's behavior.
* %code Summary::       Inserting code into the parser source.

Parser C-Language Interface

* Parser Function::         How to call @code{yyparse} and what it returns.
* Push Parser Function::    How to call @code{yypush_parse} and what it returns.
* Pull Parser Function::    How to call @code{yypull_parse} and what it returns.
* Parser Create Function::  How to call @code{yypstate_new} and what it returns.
* Parser Delete Function::  How to call @code{yypstate_delete} and what it returns.
* Lexical::                 You must supply a function @code{yylex}
                              which reads tokens.
* Error Reporting::         You must supply a function @code{yyerror}.
* Action Features::         Special features for use in actions.
* Internationalization::    How to let the parser speak in the user's
                              native language.

The Lexical Analyzer Function @code{yylex}

* Calling Convention::  How @code{yyparse} calls @code{yylex}.
* Token Values::        How @code{yylex} must return the semantic value
                          of the token it has read.
* Token Locations::     How @code{yylex} must return the text location
                          (line number, etc.) of the token, if the
                          actions want that.
* Pure Calling::        How the calling convention differs in a pure parser
                          (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).

The Bison Parser Algorithm

* Lookahead::               Parser looks one token ahead when deciding what to do.
* Shift/Reduce::            Conflicts: when either shifting or reduction is valid.
* Precedence::              Operator precedence works by resolving conflicts.
* Contextual Precedence::   When an operator's precedence depends on context.
* Parser States::           The parser is a finite-state-machine with stack.
* Reduce/Reduce::           When two rules are applicable in the same situation.
* Mysterious Conflicts::    Conflicts that look unjustified.
* Tuning LR::               How to tune fundamental aspects of LR-based parsing.
* Generalized LR Parsing::  Parsing arbitrary context-free grammars.
* Memory Management::       What happens when memory is exhausted. How to avoid it.

Operator Precedence

* Why Precedence::      An example showing why precedence is needed.
* Using Precedence::    How to specify precedence in Bison grammars.
* Precedence Examples:: How these features are used in the previous example.
* How Precedence::      How they work.
* Non Operators::       Using precedence for general conflicts.

Tuning LR

* LR Table Construction:: Choose a different construction algorithm.
* Default Reductions::    Disable default reductions.
* LAC::                   Correct lookahead sets in the parser states.
* Unreachable States::    Keep unreachable parser states for debugging.

Handling Context Dependencies

* Semantic Tokens::     Token parsing can depend on the semantic context.
* Lexical Tie-ins::     Token parsing can depend on the syntactic context.
* Tie-in Recovery::     Lexical tie-ins have implications for how
                          error recovery rules must be written.

Debugging Your Parser

* Understanding::       Understanding the structure of your parser.
* Tracing::             Tracing the execution of your parser.

Tracing Your Parser

* Enabling Traces::     Activating run-time trace support
* Mfcalc Traces::       Extending @code{mfcalc} to support traces
* The YYPRINT Macro::   Obsolete interface for semantic value reports

Invoking Bison

* Bison Options::       All the options described in detail,
                          in alphabetical order by short options.
* Option Cross Key::    Alphabetical list of long options.
* Yacc Library::        Yacc-compatible @code{yylex} and @code{main}.

Parsers Written In Other Languages

* C++ Parsers::         The interface to generate C++ parser classes
* Java Parsers::        The interface to generate Java parser classes

C++ Parsers

* C++ Bison Interface::     Asking for C++ parser generation
* C++ Semantic Values::     %union vs. C++
* C++ Location Values::     The position and location classes
* C++ Parser Interface::    Instantiating and running the parser
* C++ Scanner Interface::   Exchanges between yylex and parse
* A Complete C++ Example::  Demonstrating their use

C++ Location Values

* C++ position::        One point in the source file
* C++ location::        Two points in the source file

A Complete C++ Example

* Calc++ --- C++ Calculator::   The specifications
* Calc++ Parsing Driver::       An active parsing context
* Calc++ Parser::               A parser class
* Calc++ Scanner::              A pure C++ Flex scanner
* Calc++ Top Level::            Conducting the band

Java Parsers

* Java Bison Interface::        Asking for Java parser generation
* Java Semantic Values::        %type and %token vs. Java
* Java Location Values::        The position and location classes
* Java Parser Interface::       Instantiating and running the parser
* Java Scanner Interface::      Specifying the scanner for the parser
* Java Action Features::        Special features for use in actions
* Java Differences::            Differences between C/C++ and Java Grammars
* Java Declarations Summary::   List of Bison declarations used with Java

Frequently Asked Questions

* Memory Exhausted::            Breaking the Stack Limits
* How Can I Reset the Parser::  @code{yyparse} Keeps some State
* Strings are Destroyed::       @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops::    Control Flow in the Calculator
* Multiple start-symbols::      Factoring closely related grammars
* Secure? Conform?::            Is Bison POSIX safe?
* I can't build Bison::         Troubleshooting
* Where can I find help?::      Troubleshouting
* Bug Reports::                 Troublereporting
* More Languages::              Parsers in C++, Java, and so on
* Beta Testing::                Experimenting with development versions
* Mailing Lists::               Meeting other Bison users

Copying This Manual

* Copying This Manual::         License for copying this manual.

@end detailmenu
@end menu

@node Introduction
@unnumbered Introduction
@cindex introduction

@dfn{Bison} is a general-purpose parser generator that converts an
annotated context-free grammar into a deterministic LR or generalized
LR (GLR) parser employing LALR(1) parser tables. As an experimental
feature, Bison can also generate IELR(1) or canonical LR(1) parser
tables. Once you are proficient with Bison, you can use it to develop
a wide range of language parsers, from those used in simple desk
calculators to complex programming languages.

Bison is upward compatible with Yacc: all properly-written Yacc
grammars ought to work with Bison with no change. Anyone familiar
with Yacc should be able to use Bison with little trouble. You need
to be fluent in C or C++ programming in order to use Bison or to
understand this manual. Java is also supported as an experimental
feature.

We begin with tutorial chapters that explain the basic concepts of
using Bison and show three explained examples, each building on the
last. If you don't know Bison or Yacc, start by reading these
chapters. Reference chapters follow, which describe specific aspects
of Bison in detail.

Bison was written originally by Robert Corbett. Richard Stallman made
it Yacc-compatible. Wilfred Hansen of Carnegie Mellon University
added multi-character string literals and other features. Since then,
Bison has grown more robust and evolved many other new features thanks
to the hard work of a long list of volunteers. For details, see the
@file{THANKS} and @file{ChangeLog} files included in the Bison
distribution.

This edition corresponds to version @value{VERSION} of Bison.

@node Conditions
@unnumbered Conditions for Using Bison

The distribution terms for Bison-generated parsers permit using the
parsers in nonfree programs. Before Bison version 2.2, these extra
permissions applied only when Bison was generating LALR(1)
parsers in C@. And before Bison version 1.24, Bison-generated
parsers could be used only in programs that were free software.

The other GNU programming tools, such as the GNU C
compiler, have never
had such a requirement. They could always be used for nonfree
software. The reason Bison was different was not due to a special
policy decision; it resulted from applying the usual General Public
License to all of the Bison source code.

The main output of the Bison utility---the Bison parser implementation
file---contains a verbatim copy of a sizable piece of Bison, which is
the code for the parser's implementation. (The actions from your
grammar are inserted into this implementation at one point, but most
of the rest of the implementation is not changed.) When we applied
the GPL terms to the skeleton code for the parser's implementation,
the effect was to restrict the use of Bison output to free software.

We didn't change the terms because of sympathy for people who want to
make software proprietary. @strong{Software should be free.} But we
concluded that limiting Bison's use to free software was doing little to
encourage people to make other software free. So we decided to make the
practical conditions for using Bison match the practical conditions for
using the other GNU tools.

This exception applies when Bison is generating code for a parser.
You can tell whether the exception applies to a Bison output file by
inspecting the file for text beginning with ``As a special
exception@dots{}''. The text spells out the exact terms of the
exception.

@node Copying
@unnumbered GNU GENERAL PUBLIC LICENSE
@include gpl-3.0.texi

@node Concepts
@chapter The Concepts of Bison

This chapter introduces many of the basic concepts without which the
details of Bison will not make sense. If you do not already know how to
use Bison or Yacc, we suggest you start by reading this chapter carefully.

@menu
* Language and Grammar:: Languages and context-free grammars,
                           as mathematical ideas.
* Grammar in Bison::     How we represent grammars for Bison's sake.
* Semantic Values::      Each token or syntactic grouping can have
                           a semantic value (the value of an integer,
                           the name of an identifier, etc.).
* Semantic Actions::     Each rule can have an action containing C code.
* GLR Parsers::          Writing parsers for general context-free languages.
* Locations::            Overview of location tracking.
* Bison Parser::         What are Bison's input and output,
                           how is the output used?
* Stages::               Stages in writing and running Bison grammars.
* Grammar Layout::       Overall structure of a Bison grammar file.
@end menu

@node Language and Grammar
@section Languages and Context-Free Grammars

@cindex context-free grammar
@cindex grammar, context-free
For Bison to parse a language, the language must be described by a
@dfn{context-free grammar}. This means that you specify one or more
@dfn{syntactic groupings} and give rules for constructing them from their
parts. For example, in the C language, one kind of grouping is called an
`expression'. One rule for making an expression might be, ``An expression
can be made of a minus sign and another expression''. Another would be,
``An expression can be an integer''. As you can see, rules are often
recursive, but there must be at least one rule which leads out of the
recursion.

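To see why some rule must lead out of the recursion, it can help to
picture the two informal rules above as a C function. What follows is
only a sketch for illustration, not code that Bison generates; the name
@code{parse_expression} and its interface are invented for this example.

```c
#include <ctype.h>

/* Hypothetical illustration of the two informal rules:
     expression: '-' expression
     expression: integer
   Returns the value of the expression that starts at *p and advances
   *p past it.  On malformed input, sets *ok to 0 and returns 0.  */
static long
parse_expression (const char **p, int *ok)
{
  if (**p == '-')
    {
      /* Recursive rule: a minus sign and another expression.  */
      ++*p;
      return -parse_expression (p, ok);
    }
  if (isdigit ((unsigned char) **p))
    {
      /* Non-recursive rule: an integer.  This is the rule that
         leads out of the recursion.  */
      long value = 0;
      while (isdigit ((unsigned char) **p))
        value = value * 10 + (*(*p)++ - '0');
      return value;
    }
  *ok = 0;
  return 0;
}
```

Applied to the text @samp{--42}, the function recurses twice through the
minus-sign rule before the integer rule stops the recursion, and it
yields 42. Without the integer rule, no finite input could ever be
accepted.
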
@cindex BNF
@cindex Backus-Naur form
The most common formal system for presenting such rules for humans to read
is @dfn{Backus-Naur Form} or ``BNF'', which was developed in
order to specify the language Algol 60. Any grammar expressed in
BNF is a context-free grammar. The input to Bison is
essentially machine-readable BNF.

@cindex LALR grammars
@cindex IELR grammars
@cindex LR grammars
There are various important subclasses of context-free grammars. Although
it can handle almost all context-free grammars, Bison is optimized for what
are called LR(1) grammars. In brief, in these grammars, it must be possible
to tell how to parse any portion of an input string with just a single token
of lookahead. For historical reasons, Bison by default is limited by the
additional restrictions of LALR(1), which is hard to explain simply.
@xref{Mysterious Conflicts}, for more information on this. As an
experimental feature, you can escape these additional restrictions by
requesting IELR(1) or canonical LR(1) parser tables. @xref{LR Table
Construction}, to learn how.

@cindex GLR parsing
@cindex generalized LR (GLR) parsing
@cindex ambiguous grammars
@cindex nondeterministic parsing

Parsers for LR(1) grammars are @dfn{deterministic}, meaning
roughly that the next grammar rule to apply at any point in the input is
uniquely determined by the preceding input and a fixed, finite portion
(called a @dfn{lookahead}) of the remaining input. A context-free
grammar can be @dfn{ambiguous}, meaning that there are multiple ways to
apply the grammar rules to derive the same input. Even unambiguous
grammars can be @dfn{nondeterministic}, meaning that no fixed
lookahead always suffices to determine the next grammar rule to apply.
With the proper declarations, Bison is also able to parse these more
general context-free grammars, using a technique known as GLR
parsing (for Generalized LR). Bison's GLR parsers
are able to handle any context-free grammar for which the number of
possible parses of any given string is finite.

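As a small illustration of ambiguity, consider a grammar in which an
expression is either an integer or two expressions joined by a plus
sign. The sketch below counts how many different ways such a grammar can
group a sum with a given number of operands. The function name
@code{count_parses} is made up for this example and has nothing to do
with Bison's implementation.

```c
/* Number of distinct parses of a sum of `operands' integers under the
   ambiguous rules
     expr: expr '+' expr
     expr: INTEGER
   Illustrative helper only; not part of Bison.  */
static unsigned long
count_parses (unsigned operands)
{
  unsigned long total = 0;
  unsigned left;

  if (operands == 0)
    return 0;                   /* an empty string is not an expr */
  if (operands == 1)
    return 1;                   /* expr: INTEGER */

  /* expr: expr '+' expr.  Choose which '+' is applied last; the
     operands to its left and right are then grouped independently.  */
  for (left = 1; left < operands; ++left)
    total += count_parses (left) * count_parses (operands - left);
  return total;
}
```

For three operands, as in @samp{1 + 2 + 3}, the function returns 2: the
sum can be grouped as @samp{(1 + 2) + 3} or as @samp{1 + (2 + 3)}. The
count grows quickly with the number of operands but is finite for every
input, which is the condition stated above.
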
@cindex symbols (abstract)
@cindex token
@cindex syntactic grouping
@cindex grouping, syntactic
In the formal grammatical rules for a language, each kind of syntactic
unit or grouping is named by a @dfn{symbol}. Those which are built by
grouping smaller constructs according to grammatical rules are called
@dfn{nonterminal symbols}; those which can't be subdivided are called
@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
corresponding to a single terminal symbol a @dfn{token}, and a piece
corresponding to a single nonterminal symbol a @dfn{grouping}.

We can use the C language as an example of what symbols, terminal and
nonterminal, mean. The tokens of C are identifiers, constants (numeric
and string), and the various keywords, arithmetic operators and
punctuation marks. So the terminal symbols of a grammar for C include
`identifier', `number', `string', plus one symbol for each keyword,
operator or punctuation mark: `if', `return', `const', `static', `int',
`char', `plus-sign', `open-brace', `close-brace', `comma' and many more.
(These tokens can be subdivided into characters, but that is a matter of
lexicography, not grammar.)

Here is a simple C function subdivided into tokens:

@example
int             /* @r{keyword `int'} */
square (int x)  /* @r{identifier, open-paren, keyword `int',}
                   @r{identifier, close-paren} */
@{               /* @r{open-brace} */
  return x * x; /* @r{keyword `return', identifier, asterisk,}
                   @r{identifier, semicolon} */
@}               /* @r{close-brace} */
@end example

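The subdivision shown above can be reproduced by a very small
hand-written scanner. The sketch below is an assumption-laden
illustration rather than anything Bison provides: the names
@code{token_kind} and @code{next_token} are invented here, only the two
keywords used in the example are recognized, and numbers, comments and
multi-character operators are deliberately left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes for this illustration.  */
enum token_kind { TOK_END, TOK_KEYWORD, TOK_IDENTIFIER, TOK_PUNCT };

/* Scan one token starting at *p, skipping leading white space.
   The token's text is copied into `text' (capacity `size', which
   must be at least 2) and *p is advanced past the token.  */
static enum token_kind
next_token (const char **p, char *text, size_t size)
{
  size_t len = 0;

  while (isspace ((unsigned char) **p))
    ++*p;
  if (**p == '\0')
    {
      text[0] = '\0';
      return TOK_END;
    }
  if (isalpha ((unsigned char) **p) || **p == '_')
    {
      /* A word: either a keyword or an identifier.  Characters that
         do not fit in `text' are consumed but not stored.  */
      while (isalnum ((unsigned char) **p) || **p == '_')
        {
          if (len + 1 < size)
            text[len++] = **p;
          ++*p;
        }
      text[len] = '\0';
      if (strcmp (text, "int") == 0 || strcmp (text, "return") == 0)
        return TOK_KEYWORD;
      return TOK_IDENTIFIER;
    }
  /* Anything else is treated as a one-character operator or
     punctuation mark.  */
  text[0] = *(*p)++;
  text[1] = '\0';
  return TOK_PUNCT;
}
```

Calling @code{next_token} repeatedly on the text of the function above
yields thirteen tokens before @code{TOK_END}, in the same order as the
comments in the example list them.
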
The syntactic groupings of C include the expression, the statement, the
declaration, and the function definition. These are represented in the
grammar of C by nonterminal symbols `expression', `statement',
`declaration' and `function definition'. The full grammar uses dozens of
additional language constructs, each with its own nonterminal symbol, in
order to express the meanings of these four. The example above is a
function definition; it contains one declaration, and one statement. In
the statement, each @samp{x} is an expression and so is @samp{x * x}.

Each nonterminal symbol must have grammatical rules showing how it is made
out of simpler constructs. For example, one kind of C statement is the
@code{return} statement; this would be described with a grammar rule which
reads informally as follows:

@quotation
A `statement' can be made of a `return' keyword, an `expression' and a
`semicolon'.
@end quotation

@noindent
There would be many other rules for `statement', one for each kind of
statement in C.

@cindex start symbol
One nonterminal symbol must be distinguished as the special one which
defines a complete utterance in the language. It is called the @dfn{start
symbol}. In a compiler, this means a complete input program. In the C
language, the nonterminal symbol `sequence of definitions and declarations'
plays this role.

For example, @samp{1 + 2} is a valid C expression---a valid part of a C
program---but it is not valid as an @emph{entire} C program. In the
context-free grammar of C, this follows from the fact that `expression' is
not the start symbol.

The Bison parser reads a sequence of tokens as its input, and groups the
tokens using the grammar rules. If the input is valid, the end result is
that the entire token sequence reduces to a single grouping whose symbol is
the grammar's start symbol. If we use a grammar for C, the entire input
must be a `sequence of definitions and declarations'. If not, the parser
reports a syntax error.

@node Grammar in Bison
@section From Formal Rules to Bison Input
@cindex Bison grammar
@cindex grammar, Bison
@cindex formal grammar

A formal grammar is a mathematical construct. To define the language
for Bison, you must write a file expressing the grammar in Bison syntax:
a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.

A nonterminal symbol in the formal grammar is represented in Bison input
as an identifier, like an identifier in C@. By convention, it should be
in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.

The Bison representation for a terminal symbol is also called a @dfn{token
type}. Token types can also be represented as C-like identifiers. By
convention, these identifiers should be upper case to distinguish them from
nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
@code{RETURN}. A terminal symbol that stands for a particular keyword in
the language should be named after that keyword converted to upper case.
The terminal symbol @code{error} is reserved for error recovery.
@xref{Symbols}.

A terminal symbol can also be represented as a character literal, just like
a C character constant. You should do this whenever a token is just a
single character (parenthesis, plus-sign, etc.): use that same character in
a literal as the terminal symbol for that token.

A third way to represent a terminal symbol is with a C string constant
containing several characters. @xref{Symbols}, for more information.

The grammar rules also have an expression in Bison syntax. For example,
here is the Bison rule for a C @code{return} statement. The semicolon in
quotes is a literal character token, representing part of the C syntax for
the statement; the naked semicolon, and the colon, are Bison punctuation
used in every rule.

@example
stmt: RETURN expr ';' ;
@end example

@noindent
@xref{Rules, ,Syntax of Grammar Rules}.

342b8b6e 648@node Semantic Values
bfa74976
RS
649@section Semantic Values
650@cindex semantic value
651@cindex value, semantic
652
653A formal grammar selects tokens only by their classifications: for example,
654if a rule mentions the terminal symbol `integer constant', it means that
655@emph{any} integer constant is grammatically valid in that position. The
656precise value of the constant is irrelevant to how to parse the input: if
657@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
e0c471a9 658grammatical.
bfa74976
RS
659
660But the precise value is very important for what the input means once it is
661parsed. A compiler is useless if it fails to distinguish between 4, 1 and
6623989 as constants in the program! Therefore, each token in a Bison grammar
c827f760
PE
663has both a token type and a @dfn{semantic value}. @xref{Semantics,
664,Defining Language Semantics},
bfa74976
RS
665for details.
666
667The token type is a terminal symbol defined in the grammar, such as
668@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
669you need to know to decide where the token may validly appear and how to
670group it with other tokens. The grammar rules know nothing about tokens
e0c471a9 671except their types.
bfa74976
RS
672
673The semantic value has all the rest of the information about the
674meaning of the token, such as the value of an integer, or the name of an
675identifier. (A token such as @code{','} which is just punctuation doesn't
676need to have any semantic value.)
677
678For example, an input token might be classified as token type
679@code{INTEGER} and have the semantic value 4. Another input token might
680have the same token type @code{INTEGER} but value 3989. When a grammar
681rule says that @code{INTEGER} is allowed, either of these tokens is
682acceptable because each is an @code{INTEGER}. When the parser accepts the
683token, it keeps track of the token's semantic value.
684
685Each grouping can have a semantic value as well as its nonterminal
686symbol. For example, in a calculator, an expression typically has a
687semantic value that is a number. In a compiler for a programming
688language, an expression typically has a semantic value that is a tree
689structure describing the meaning of the expression.
690
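The split between token type and semantic value can be pictured in C.
The following is only a sketch: the @code{struct token} layout, the
function name and the numeric codes are assumptions made for
illustration, not what Bison generates.  It shows that two
@code{INTEGER} tokens are interchangeable as far as the grammar is
concerned, even though their semantic values differ.

```c
#include <stdbool.h>

/* Hypothetical token type codes; Bison would assign its own.  */
enum token_type { INTEGER = 258, IDENTIFIER = 259 };

/* A token pairs a type (its grammatical class) with a semantic value.  */
struct token
{
  enum token_type type;   /* the only part the grammar rules look at */
  int value;              /* the meaning; ignored while deciding how to parse */
};

/* Return true if the two tokens are interchangeable as far as the
   grammar is concerned, that is, if they have the same token type.  */
static bool
same_grammatical_class (struct token a, struct token b)
{
  return a.type == b.type;
}
```

With this picture, the tokens for @samp{4} and @samp{3989} compare
equal under @code{same_grammatical_class}, while an @code{IDENTIFIER}
token does not.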
342b8b6e 691@node Semantic Actions
bfa74976
RS
692@section Semantic Actions
693@cindex semantic actions
694@cindex actions, semantic
695
696In order to be useful, a program must do more than parse input; it must
697also produce some output based on the input. In a Bison grammar, a grammar
698rule can have an @dfn{action} made up of C statements. Each time the
699parser recognizes a match for that rule, the action is executed.
700@xref{Actions}.
13863333 701
bfa74976
RS
702Most of the time, the purpose of an action is to compute the semantic value
703of the whole construct from the semantic values of its parts. For example,
704suppose we have a rule which says an expression can be the sum of two
705expressions. When the parser recognizes such a sum, each of the
706subexpressions has a semantic value which describes how it was built up.
707The action for this rule should create a similar sort of value for the
708newly recognized larger expression.
709
710For example, here is a rule that says an expression can be the sum of
711two subexpressions:
712
713@example
de6be119 714expr: expr '+' expr @{ $$ = $1 + $3; @} ;
bfa74976
RS
715@end example
716
717@noindent
718The action says how to produce the semantic value of the sum expression
719from the values of the two subexpressions.
720
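To make the effect of such an action concrete, here is a rough C model
of what happens to the semantic values when the sum rule is reduced.
It is a sketch under stated assumptions: the array
@code{value_stack} and the functions @code{push_value} and
@code{reduce_sum} are hypothetical names, and a real Bison parser
manages its stacks internally.

```c
#include <assert.h>

/* A toy semantic-value stack, standing in for the one the parser keeps.  */
enum { STACK_MAX = 16 };
static double value_stack[STACK_MAX];
static int depth = 0;

static void
push_value (double v)
{
  assert (depth < STACK_MAX);
  value_stack[depth++] = v;
}

/* Model the reduction by `expr: expr '+' expr'.  The values of the three
   right-hand side symbols are on top of the stack, $1 deepest and $3 on
   top.  They are replaced by the single value $$.  */
static double
reduce_sum (void)
{
  assert (depth >= 3);
  double s1 = value_stack[depth - 3];   /* $1 */
  double s3 = value_stack[depth - 1];   /* $3 */
  depth -= 3;
  push_value (s1 + s3);                 /* $$ = $1 + $3 */
  return value_stack[depth - 1];
}
```

Pushing the values for @samp{1}, @samp{+} (whose value is unused) and
@samp{2}, then calling @code{reduce_sum}, leaves one value on the
stack: the value of the whole sum.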
676385e2 721@node GLR Parsers
35430378
JD
722@section Writing GLR Parsers
723@cindex GLR parsing
724@cindex generalized LR (GLR) parsing
676385e2
PH
725@findex %glr-parser
726@cindex conflicts
727@cindex shift/reduce conflicts
fa7e68c3 728@cindex reduce/reduce conflicts
676385e2 729
34a6c2d1 730In some grammars, Bison's deterministic
35430378 731LR(1) parsing algorithm cannot decide whether to apply a
9501dc6e
AD
732certain grammar rule at a given point. That is, it may not be able to
733decide (on the basis of the input read so far) which of two possible
734reductions (applications of a grammar rule) applies, or whether to apply
735a reduction or read more of the input and apply a reduction later in the
736input. These are known respectively as @dfn{reduce/reduce} conflicts
737(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts
738(@pxref{Shift/Reduce}).
739
35430378 740To use a grammar that is not easily modified to be LR(1), a
9501dc6e 741more general parsing algorithm is sometimes necessary. If you include
676385e2 742@code{%glr-parser} among the Bison declarations in your file
35430378
JD
743(@pxref{Grammar Outline}), the result is a Generalized LR
744(GLR) parser. These parsers handle Bison grammars that
9501dc6e 745contain no unresolved conflicts (i.e., after applying precedence
34a6c2d1 746declarations) identically to deterministic parsers. However, when
9501dc6e 747faced with unresolved shift/reduce and reduce/reduce conflicts,
35430378 748GLR parsers use the simple expedient of doing both,
9501dc6e
AD
749effectively cloning the parser to follow both possibilities. Each of
750the resulting parsers can again split, so that at any given time, there
751can be any number of possible parses being explored. The parsers
676385e2
PH
752proceed in lockstep; that is, all of them consume (shift) a given input
753symbol before any of them proceed to the next. Each of the cloned
754parsers eventually meets one of two possible fates: either it runs into
755a parsing error, in which case it simply vanishes, or it merges with
756another parser, because the two of them have reduced the input to an
757identical set of symbols.
758
759During the time that there are multiple parsers, semantic actions are
760recorded, but not performed. When a parser disappears, its recorded
761semantic actions disappear as well, and are never performed. When a
762reduction makes two parsers identical, causing them to merge, Bison
763records both sets of semantic actions. Whenever the last two parsers
764merge, reverting to the single-parser case, Bison resolves all the
765outstanding actions either by precedences given to the grammar rules
766involved, or by performing both actions, and then calling a designated
767user-defined function on the resulting values to produce an arbitrary
768merged result.
769
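The idea of recording actions and performing them later can be modeled
in a few lines of C.  This is a deliberately simplified sketch and not
how Bison's GLR skeleton is implemented: @code{struct branch},
@code{record_action}, @code{discard_branch} and @code{replay_branch}
are hypothetical names, and the two sample actions exist only for
demonstration.

```c
#include <stddef.h>

enum { LOG_MAX = 8 };

/* One branch of a split parse, with the actions it has recorded.  */
struct branch
{
  void (*recorded[LOG_MAX]) (int *);   /* deferred actions, oldest first */
  size_t count;
};

/* Record an action instead of performing it.  Return 0 if the log is full.  */
static int
record_action (struct branch *b, void (*action) (int *))
{
  if (b->count == LOG_MAX)
    return 0;
  b->recorded[b->count++] = action;
  return 1;
}

/* The branch ran into a syntax error: forget its actions unperformed.  */
static void
discard_branch (struct branch *b)
{
  b->count = 0;
}

/* The branch is the sole survivor: perform its actions in order.  */
static void
replay_branch (struct branch *b, int *state)
{
  for (size_t i = 0; i < b->count; i++)
    b->recorded[i] (state);
  b->count = 0;
}

/* Two sample actions, for demonstration only.  */
static void add_one (int *s) { *s += 1; }
static void times_ten (int *s) { *s *= 10; }
```

A discarded branch leaves the program state untouched, while the
surviving branch applies its actions in the order in which they were
recorded.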
fa7e68c3 770@menu
35430378
JD
771* Simple GLR Parsers:: Using GLR parsers on unambiguous grammars.
772* Merging GLR Parses:: Using GLR parsers to resolve ambiguities.
f56274a8 773* GLR Semantic Actions:: Deferred semantic actions have special concerns.
35430378 774* Compiler Requirements:: GLR parsers require a modern C compiler.
fa7e68c3
PE
775@end menu
776
777@node Simple GLR Parsers
35430378
JD
778@subsection Using GLR on Unambiguous Grammars
779@cindex GLR parsing, unambiguous grammars
780@cindex generalized LR (GLR) parsing, unambiguous grammars
fa7e68c3
PE
781@findex %glr-parser
782@findex %expect-rr
783@cindex conflicts
784@cindex reduce/reduce conflicts
785@cindex shift/reduce conflicts
786
35430378
JD
787In the simplest cases, you can use the GLR algorithm
788to parse grammars that are unambiguous but fail to be LR(1).
34a6c2d1 789Such grammars typically require more than one symbol of lookahead.
fa7e68c3
PE
790
791Consider a problem that
792arises in the declaration of enumerated and subrange types in the
793programming language Pascal. Here are some examples:
794
795@example
796type subrange = lo .. hi;
797type enum = (a, b, c);
798@end example
799
800@noindent
801The original language standard allows only numeric
802literals and constant identifiers for the subrange bounds (@samp{lo}
35430378 803and @samp{hi}), but Extended Pascal (ISO/IEC
fa7e68c3
PE
80410206) and many other
805Pascal implementations allow arbitrary expressions there. This gives
806rise to the following situation, containing a superfluous pair of
807parentheses:
808
809@example
810type subrange = (a) .. b;
811@end example
812
813@noindent
814Compare this to the following declaration of an enumerated
815type with only one value:
816
817@example
818type enum = (a);
819@end example
820
821@noindent
822(These declarations are contrived, but they are syntactically
823valid, and more-complicated cases can come up in practical programs.)
824
825These two declarations look identical until the @samp{..} token.
35430378 826With normal LR(1) one-token lookahead it is not
fa7e68c3
PE
827possible to decide between the two forms when the identifier
828@samp{a} is parsed. It is, however, desirable
829for a parser to decide this, since in the latter case
830@samp{a} must become a new identifier to represent the enumeration
831value, while in the former case @samp{a} must be evaluated with its
832current meaning, which may be a constant or even a function call.
833
834You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',
835to be resolved later, but this typically requires substantial
836contortions in both semantic actions and large parts of the
837grammar, where the parentheses are nested in the recursive rules for
838expressions.
839
840You might think of using the lexer to distinguish between the two
841forms by returning different tokens for currently defined and
842undefined identifiers. But if these declarations occur in a local
843scope, and @samp{a} is defined in an outer scope, then both forms
844are possible---either locally redefining @samp{a}, or using the
845value of @samp{a} from the outer scope. So this approach cannot
846work.
847
e757bb10 848A simple solution to this problem is to declare the parser to
35430378
JD
849use the GLR algorithm.
850When the GLR parser reaches the critical state, it
fa7e68c3
PE
851merely splits into two branches and pursues both syntax rules
852simultaneously. Sooner or later, one of them runs into a parsing
853error. If there is a @samp{..} token before the next
854@samp{;}, the rule for enumerated types fails since it cannot
855accept @samp{..} anywhere; otherwise, the subrange type rule
856fails since it requires a @samp{..} token. So one of the branches
857fails silently, and the other one continues normally, performing
858all the intermediate actions that were postponed during the split.
859
860If the input is syntactically incorrect, both branches fail and the parser
861reports a syntax error as usual.
862
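The outcome described above can be imitated by trying both
interpretations on the tokens that follow @samp{=} and keeping
whichever one survives.  The C sketch below does exactly that.  It is
not the GLR algorithm itself (the two recognizers run one after the
other, not in lockstep), operands are restricted to identifiers and
parenthesized identifiers, and all names and token codes are
hypothetical.

```c
#include <stddef.h>

/* Hypothetical token codes for the text after `type ID ='.  */
enum tok { T_ID, T_LPAREN, T_RPAREN, T_COMMA, T_DOTDOT, T_SEMI, T_END };

enum verdict { SYNTAX_ERROR, ENUM_TYPE, SUBRANGE_TYPE };

/* Enumerated type: '(' ID (',' ID)* ')' ';'.  Return 1 on a full match.  */
static int
match_enum (const enum tok *t)
{
  size_t i = 0;
  if (t[i++] != T_LPAREN) return 0;
  if (t[i++] != T_ID) return 0;
  while (t[i] == T_COMMA)
    {
      i++;
      if (t[i++] != T_ID) return 0;
    }
  if (t[i++] != T_RPAREN) return 0;   /* a `..' here kills this branch */
  if (t[i++] != T_SEMI) return 0;
  return t[i] == T_END;
}

/* Operand: ID | '(' operand ')'.  Return the index just past the
   operand, or 0 on failure (a successful index is always positive).  */
static size_t
skip_operand (const enum tok *t, size_t i)
{
  if (t[i] == T_ID)
    return i + 1;
  if (t[i] == T_LPAREN)
    {
      size_t j = skip_operand (t, i + 1);
      if (j != 0 && t[j] == T_RPAREN)
        return j + 1;
    }
  return 0;
}

/* Subrange type: operand '..' operand ';'.  Return 1 on a full match.  */
static int
match_subrange (const enum tok *t)
{
  size_t i = skip_operand (t, 0);
  if (i == 0 || t[i++] != T_DOTDOT) return 0;   /* no `..': branch dies */
  i = skip_operand (t, i);
  if (i == 0 || t[i++] != T_SEMI) return 0;
  return t[i] == T_END;
}

/* Pursue both interpretations; at most one of them can survive.  */
static enum verdict
classify (const enum tok *t)
{
  if (match_enum (t)) return ENUM_TYPE;
  if (match_subrange (t)) return SUBRANGE_TYPE;
  return SYNTAX_ERROR;
}
```

The token array must end with @code{T_END}.  For the tokens of
@samp{(a) .. b;} only the subrange recognizer succeeds, for
@samp{(a);} only the enumeration recognizer does, and for input that
fits neither form both fail.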
863The effect of all this is that the parser seems to ``guess'' the
864correct branch to take, or in other words, it seems to use more
35430378
JD
865lookahead than the underlying LR(1) algorithm actually allows
866for. In this example, LR(2) would suffice, but some cases that are
867not LR(@math{k}) for any @math{k} can also be handled this way.
fa7e68c3 868
35430378 869In general, a GLR parser can take quadratic or cubic worst-case time,
fa7e68c3
PE
870and the current Bison parser even takes exponential time and space
871for some grammars. In practice, this rarely happens, and for many
872grammars it is possible to prove that it cannot happen.
873The present example contains only one conflict between two
874rules, and the type-declaration context containing the conflict
875cannot be nested. So the number of
876branches that can exist at any time is limited by the constant 2,
877and the parsing time is still linear.
878
879Here is a Bison grammar corresponding to the example above. It
880parses a vastly simplified form of Pascal type declarations.
881
882@example
883%token TYPE DOTDOT ID
884
885@group
886%left '+' '-'
887%left '*' '/'
888@end group
889
890%%
891
892@group
de6be119 893type_decl: TYPE ID '=' type ';' ;
fa7e68c3
PE
894@end group
895
896@group
de6be119
AD
897type:
898 '(' id_list ')'
899| expr DOTDOT expr
900;
fa7e68c3
PE
901@end group
902
903@group
de6be119
AD
904id_list:
905 ID
906| id_list ',' ID
907;
fa7e68c3
PE
908@end group
909
910@group
de6be119
AD
911expr:
912 '(' expr ')'
913| expr '+' expr
914| expr '-' expr
915| expr '*' expr
916| expr '/' expr
917| ID
918;
fa7e68c3
PE
919@end group
920@end example
921
35430378 922When this grammar is processed as a normal LR(1) grammar, Bison correctly complains
fa7e68c3
PE
923about one reduce/reduce conflict. In the conflicting situation the
924parser chooses one of the alternatives, arbitrarily the one
925declared first. Therefore the following correct input is not
926recognized:
927
928@example
929type t = (a) .. b;
930@end example
931
35430378 932The parser can be turned into a GLR parser, while also telling Bison
9913d6e4
JD
933to be silent about the one known reduce/reduce conflict, by adding
934these two declarations to the Bison grammar file (before the first
fa7e68c3
PE
935@samp{%%}):
936
937@example
938%glr-parser
939%expect-rr 1
940@end example
941
942@noindent
943No change in the grammar itself is required. Now the
944parser recognizes all valid declarations, according to the
945limited syntax above, transparently. In fact, the user does not even
946notice when the parser splits.
947
35430378 948So here we have a case where we can use the benefits of GLR,
f8e1c9e5
AD
949almost without disadvantages. Even in simple cases like this, however,
950there are at least two potential problems to beware of. First, always
35430378
JD
951analyze the conflicts reported by Bison to make sure that GLR
952splitting is only done where it is intended. A GLR parser
f8e1c9e5 953splitting inadvertently may cause problems less obvious than an
35430378 954LR parser statically choosing the wrong alternative in a
f8e1c9e5
AD
955conflict. Second, consider interactions with the lexer (@pxref{Semantic
956Tokens}) with great care. Since a split parser consumes tokens without
957performing any actions during the split, the lexer cannot obtain
958information via parser actions. Some cases of lexer interactions can be
35430378 959eliminated by using GLR to shift the complications from the
f8e1c9e5
AD
960lexer to the parser. You must check the remaining cases for
961correctness.
962
963In our example, it would be safe for the lexer to return tokens based on
964their current meanings in some symbol table, because no new symbols are
965defined in the middle of a type declaration. Though it is possible for
966a parser to define the enumeration constants as they are parsed, before
967the type declaration is completed, it actually makes no difference since
968they cannot be used within the same enumerated type declaration.
fa7e68c3
PE
969
970@node Merging GLR Parses
35430378
JD
971@subsection Using GLR to Resolve Ambiguities
972@cindex GLR parsing, ambiguous grammars
973@cindex generalized LR (GLR) parsing, ambiguous grammars
fa7e68c3
PE
974@findex %dprec
975@findex %merge
976@cindex conflicts
977@cindex reduce/reduce conflicts
978
2a8d363a 979Let's consider an example, vastly simplified from a C++ grammar.
676385e2
PH
980
981@example
982%@{
38a92d50
PE
983 #include <stdio.h>
984 #define YYSTYPE char const *
985 int yylex (void);
986 void yyerror (char const *);
676385e2
PH
987%@}
988
989%token TYPENAME ID
990
991%right '='
992%left '+'
993
994%glr-parser
995
996%%
997
de6be119
AD
998prog:
999 /* Nothing. */
1000| prog stmt @{ printf ("\n"); @}
1001;
676385e2 1002
de6be119
AD
1003stmt:
1004 expr ';' %dprec 1
1005| decl %dprec 2
1006;
676385e2 1007
de6be119
AD
1008expr:
1009 ID @{ printf ("%s ", $1); @}
1010| TYPENAME '(' expr ')'
1011 @{ printf ("%s <cast> ", $1); @}
1012| expr '+' expr @{ printf ("+ "); @}
1013| expr '=' expr @{ printf ("= "); @}
1014;
676385e2 1015
de6be119
AD
1016decl:
1017 TYPENAME declarator ';'
1018 @{ printf ("%s <declare> ", $1); @}
1019| TYPENAME declarator '=' expr ';'
1020 @{ printf ("%s <init-declare> ", $1); @}
1021;
676385e2 1022
de6be119
AD
1023declarator:
1024 ID @{ printf ("\"%s\" ", $1); @}
1025| '(' declarator ')'
1026;
676385e2
PH
1027@end example
1028
1029@noindent
1030This models a problematic part of the C++ grammar---the ambiguity between
1031certain declarations and statements. For example,
1032
1033@example
1034T (x) = y+z;
1035@end example
1036
1037@noindent
1038parses as either an @code{expr} or a @code{decl}
c827f760
PE
1039(assuming that @samp{T} is recognized as a @code{TYPENAME} and
1040@samp{x} as an @code{ID}).
676385e2 1041Bison detects this as a reduce/reduce conflict between the rules
fae437e8 1042@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at the
e757bb10 1043time it encounters @code{x} in the example above. Since this is a
35430378 1044GLR parser, it therefore splits the problem into two parses, one for
fa7e68c3
PE
1045each choice of resolving the reduce/reduce conflict.
1046Unlike the example from the previous section (@pxref{Simple GLR Parsers}),
1047however, neither of these parses ``dies,'' because the grammar as it stands is
e757bb10
AD
1048ambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} and
1049the other reduces @code{stmt : decl}, after which both parsers are in an
1050identical state: they've seen @samp{prog stmt} and have the same unprocessed
1051input remaining. We say that these parses have @dfn{merged}.
fa7e68c3 1052
35430378 1053At this point, the GLR parser requires a specification in the
fa7e68c3
PE
1054grammar of how to choose between the competing parses.
1055In the example above, the two @code{%dprec}
e757bb10 1056declarations specify that Bison is to give precedence
fa7e68c3 1057to the parse that interprets the example as a
676385e2
PH
1058@code{decl}, which implies that @code{x} is a declarator.
1059The parser therefore prints
1060
1061@example
fae437e8 1062"x" y z + T <init-declare>
676385e2
PH
1063@end example
1064
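The choice between the two surviving parses can be pictured as a
comparison of the @code{%dprec} values attached to the competing
rules.  The following C sketch assumes, as the example above suggests,
that the larger value wins.  The type @code{struct alternative} and
the function @code{prefer} are hypothetical, and returning a null
pointer on a tie is merely this sketch's way of saying that the
ambiguity is left unresolved.

```c
#include <stddef.h>

/* One surviving interpretation of an ambiguous construct.  */
struct alternative
{
  const char *rule;   /* textual form of the rule, for display only */
  int dprec;          /* value given in the rule's %dprec declaration */
};

/* Return the alternative with the larger %dprec value, or NULL when
   the two values are equal and no preference can be established.  */
static const struct alternative *
prefer (const struct alternative *a, const struct alternative *b)
{
  if (a->dprec > b->dprec)
    return a;
  if (b->dprec > a->dprec)
    return b;
  return NULL;
}
```

Applied to @code{stmt: expr ';'} with value 1 and @code{stmt: decl}
with value 2, the sketch selects the @code{decl} interpretation, in
line with the output shown above.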
fa7e68c3
PE
1065The @code{%dprec} declarations only come into play when more than one
1066parse survives. Consider a different input string for this parser:
676385e2
PH
1067
1068@example
1069T (x) + y;
1070@end example
1071
1072@noindent
35430378 1073This is another example of using GLR to parse an unambiguous
fa7e68c3 1074construct, as shown in the previous section (@pxref{Simple GLR Parsers}).
676385e2
PH
1075Here, there is no ambiguity (this cannot be parsed as a declaration).
1076However, at the time the Bison parser encounters @code{x}, it does not
1077have enough information to resolve the reduce/reduce conflict (again,
1078between @code{x} as an @code{expr} or a @code{declarator}). In this
fa7e68c3 1079case, no precedence declaration is used. Again, the parser splits
676385e2
PH
1080into two, one assuming that @code{x} is an @code{expr}, and the other
1081assuming @code{x} is a @code{declarator}. The second of these parsers
1082then vanishes when it sees @code{+}, and the parser prints
1083
1084@example
fae437e8 1085x T <cast> y +
676385e2
PH
1086@end example
1087
1088Suppose that instead of resolving the ambiguity, you wanted to see all
fa7e68c3 1089the possibilities. For this purpose, you must merge the semantic
676385e2
PH
1090actions of the two possible parsers, rather than choosing one over the
1091other. To do so, you could change the declaration of @code{stmt} as
1092follows:
1093
1094@example
de6be119
AD
1095stmt:
1096 expr ';' %merge <stmtMerge>
1097| decl %merge <stmtMerge>
1098;
676385e2
PH
1099@end example
1100
1101@noindent
676385e2
PH
1102and define the @code{stmtMerge} function as:
1103
1104@example
38a92d50
PE
1105static YYSTYPE
1106stmtMerge (YYSTYPE x0, YYSTYPE x1)
676385e2
PH
1107@{
1108 printf ("<OR> ");
1109 return "";
1110@}
1111@end example
1112
1113@noindent
1114with an accompanying forward declaration
1115in the C declarations at the beginning of the file:
1116
1117@example
1118%@{
38a92d50 1119 #define YYSTYPE char const *
676385e2
PH
1120 static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
1121%@}
1122@end example
1123
1124@noindent
fa7e68c3
PE
1125With these declarations, the resulting parser parses the first example
1126as both an @code{expr} and a @code{decl}, and prints
676385e2
PH
1127
1128@example
fae437e8 1129"x" y z + T <init-declare> x T <cast> y z + = <OR>
676385e2
PH
1130@end example
1131
fa7e68c3 1132Bison requires that all of the
e757bb10 1133productions that participate in any particular merge have identical
fa7e68c3
PE
1134@samp{%merge} clauses. Otherwise, the ambiguity would be unresolvable,
1135and the parser will report an error during any parse that results in
1136the offending merge.
9501dc6e 1137
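A merge function need not discard its arguments as @code{stmtMerge}
does above.  As a hedged illustration, the hypothetical variant below
combines both values into one string.  It assumes that each
alternative's semantic value is a null-terminated string describing
its parse, which the example grammar above does not actually arrange,
and it leaves the release of the returned storage to the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define YYSTYPE char const *

/* Hypothetical variant of stmtMerge: return "(x0 <OR> x1)" in freshly
   allocated storage, or NULL if the allocation fails.  */
static YYSTYPE
stmtMergeBoth (YYSTYPE x0, YYSTYPE x1)
{
  static char const format[] = "(%s <OR> %s)";
  /* sizeof format counts the two "%s" and the final null byte, so
     this is slightly more than the result needs.  */
  size_t size = strlen (x0) + strlen (x1) + sizeof format;
  char *result = malloc (size);
  if (result)
    snprintf (result, size, format, x0, x1);
  return result;
}
```

To use such a function, name it in the @code{%merge} clauses of all
the rules that take part in the merge, and declare it in the prologue
in the same way as @code{stmtMerge}.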
32c29292
JD
1138@node GLR Semantic Actions
1139@subsection GLR Semantic Actions
1140
1141@cindex deferred semantic actions
1142By definition, a deferred semantic action is not performed at the same time as
1143the associated reduction.
1144This raises caveats for several Bison features you might use in a semantic
35430378 1145action in a GLR parser.
32c29292
JD
1146
1147@vindex yychar
35430378 1148@cindex GLR parsers and @code{yychar}
32c29292 1149@vindex yylval
35430378 1150@cindex GLR parsers and @code{yylval}
32c29292 1151@vindex yylloc
35430378 1152@cindex GLR parsers and @code{yylloc}
32c29292 1153In any semantic action, you can examine @code{yychar} to determine the type of
742e4900 1154the lookahead token present at the time of the associated reduction.
32c29292
JD
1155After checking that @code{yychar} is not set to @code{YYEMPTY} or @code{YYEOF},
1156you can then examine @code{yylval} and @code{yylloc} to determine the
742e4900 1157lookahead token's semantic value and location, if any.
32c29292
JD
1158In a nondeferred semantic action, you can also modify any of these variables to
1159influence syntax analysis.
742e4900 1160@xref{Lookahead, ,Lookahead Tokens}.
32c29292
JD
1161
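The check described above can be written as a small predicate.  This
is a sketch only: the enumeration supplies stand-in values for
@code{YYEMPTY} and @code{YYEOF} so that the fragment compiles on its
own, whereas in a real action the parser's own definitions would be
used.  The function name is hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for the parser's definitions, for illustration only.  */
enum { YYEMPTY = -2, YYEOF = 0 };

/* Return true if a lookahead token is present and is not the
   end-of-input token, so that its semantic value and location are
   meaningful to examine.  */
static bool
lookahead_has_value (int lookahead)
{
  return lookahead != YYEMPTY && lookahead != YYEOF;
}
```

Inside an action, one would pass @code{yychar} to such a predicate
before reading @code{yylval} or @code{yylloc}.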
1162@findex yyclearin
35430378 1163@cindex GLR parsers and @code{yyclearin}
32c29292
JD
1164In a deferred semantic action, it's too late to influence syntax analysis.
1165In this case, @code{yychar}, @code{yylval}, and @code{yylloc} are set to
1166shallow copies of the values they had at the time of the associated reduction.
1167For this reason alone, modifying them is dangerous.
1168Moreover, the result of modifying them is undefined and subject to change with
1169future versions of Bison.
1170For example, if a semantic action might be deferred, you should never write it
1171to invoke @code{yyclearin} (@pxref{Action Features}) or to attempt to free
1172memory referenced by @code{yylval}.
1173
1174@findex YYERROR
35430378 1175@cindex GLR parsers and @code{YYERROR}
32c29292 1176Another Bison feature requiring special consideration is @code{YYERROR}
8710fc41 1177(@pxref{Action Features}), which you can invoke in a semantic action to
32c29292 1178initiate error recovery.
35430378 1179During deterministic GLR operation, the effect of @code{YYERROR} is
34a6c2d1 1180the same as its effect in a deterministic parser.
32c29292
JD
1181In a deferred semantic action, its effect is undefined.
1182@c The effect is probably a syntax error at the split point.
1183
8710fc41 1184Also, see @ref{Location Default Action, ,Default Action for Locations}, which
35430378 1185describes a special usage of @code{YYLLOC_DEFAULT} in GLR parsers.
8710fc41 1186
fa7e68c3 1187@node Compiler Requirements
35430378 1188@subsection Considerations when Compiling GLR Parsers
fa7e68c3 1189@cindex @code{inline}
35430378 1190@cindex GLR parsers and @code{inline}
fa7e68c3 1191
35430378 1192The GLR parsers require a compiler for ISO C89 or
38a92d50
PE
1193later. In addition, they use the @code{inline} keyword, which is not
1194C89, but is C99 and is a common extension in pre-C99 compilers. It is
1195up to the user of these parsers to handle
9501dc6e
AD
1196portability issues. For instance, if using Autoconf and the Autoconf
1197macro @code{AC_C_INLINE}, a mere
1198
1199@example
1200%@{
38a92d50 1201 #include <config.h>
9501dc6e
AD
1202%@}
1203@end example
1204
1205@noindent
1206will suffice. Otherwise, we suggest
1207
1208@example
1209%@{
2c0f9706
AD
1210 #if (__STDC_VERSION__ < 199901 && ! defined __GNUC__ \
1211 && ! defined inline)
1212 # define inline
38a92d50 1213 #endif
9501dc6e
AD
1214%@}
1215@end example
676385e2 1216
83484365 1217@node Locations
847bf1f5
AD
1218@section Locations
1219@cindex location
95923bd6
AD
1220@cindex textual location
1221@cindex location, textual
847bf1f5
AD
1222
1223Many applications, like interpreters or compilers, have to produce verbose
72d2299c 1224and useful error messages. To achieve this, one must be able to keep track of
95923bd6 1225the @dfn{textual location}, or @dfn{location}, of each syntactic construct.
847bf1f5
AD
1226Bison provides a mechanism for handling these locations.
1227
72d2299c 1228Each token has a semantic value. In a similar fashion, each token has an
7404cdf3
JD
1229associated location, but the type of locations is the same for all tokens
1230and groupings. Moreover, the output parser is equipped with a default data
1231structure for storing locations (@pxref{Tracking Locations}, for more
1232details).
847bf1f5
AD
1233
1234Like semantic values, locations can be reached in actions using a dedicated
72d2299c 1235set of constructs. In the sum rule shown earlier (@pxref{Semantic Actions}), the location of the whole grouping
847bf1f5
AD
1236is @code{@@$}, while the locations of the subexpressions are @code{@@1} and
1237@code{@@3}.
1238
1239When a rule is matched, a default action is used to compute the semantic value
72d2299c
PE
1240of its left hand side (@pxref{Actions}). In the same way, another default
1241action is used for locations. However, the action for locations is general
847bf1f5 1242enough for most cases, meaning there is usually no need to describe for each
72d2299c 1243rule how @code{@@$} should be formed. When building a new location for a given
847bf1f5
AD
1244grouping, the default behavior of the output parser is to take the beginning
1245of the first symbol, and the end of the last symbol.
1246
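The default behavior just described amounts to the following
computation, shown here as a C sketch.  The structure mirrors the four
fields of Bison's default location type, but the names
@code{struct location} and @code{span} are assumptions made for this
illustration, and the sketch covers only rules with at least one
symbol on the right-hand side.

```c
#include <assert.h>

/* A textual location: where a construct begins and where it ends.  */
struct location
{
  int first_line, first_column;
  int last_line, last_column;
};

/* Compute the location of a grouping from the locations of the N
   symbols (N >= 1) on the right-hand side of its rule: the beginning
   of the first symbol and the end of the last one.  */
static struct location
span (const struct location *rhs, int n)
{
  struct location result;
  assert (n >= 1);
  result.first_line = rhs[0].first_line;
  result.first_column = rhs[0].first_column;
  result.last_line = rhs[n - 1].last_line;
  result.last_column = rhs[n - 1].last_column;
  return result;
}
```

For a sum whose operands are at @code{@@1} and @code{@@3}, the
resulting @code{@@$} starts where @code{@@1} starts and ends where
@code{@@3} ends.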
342b8b6e 1247@node Bison Parser
9913d6e4 1248@section Bison Output: the Parser Implementation File
bfa74976
RS
1249@cindex Bison parser
1250@cindex Bison utility
1251@cindex lexical analyzer, purpose
1252@cindex parser
1253
9913d6e4
JD
1254When you run Bison, you give it a Bison grammar file as input. The
1255most important output is a C source file that implements a parser for
1256the language described by the grammar. This parser is called a
1257@dfn{Bison parser}, and this file is called a @dfn{Bison parser
1258implementation file}. Keep in mind that the Bison utility and the
1259Bison parser are two distinct programs: the Bison utility is a program
1260whose output is the Bison parser implementation file that becomes part
1261of your program.
bfa74976
RS
1262
1263The job of the Bison parser is to group tokens into groupings according to
1264the grammar rules---for example, to build identifiers and operators into
1265expressions. As it does this, it runs the actions for the grammar rules it
1266uses.
1267
704a47c4
AD
1268The tokens come from a function called the @dfn{lexical analyzer} that
1269you must supply in some fashion (such as by writing it in C). The Bison
1270parser calls the lexical analyzer each time it wants a new token. It
1271doesn't know what is ``inside'' the tokens (though their semantic values
1272may reflect this). Typically the lexical analyzer makes the tokens by
1273parsing characters of text, but Bison does not depend on this.
1274@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
bfa74976 1275
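As a rough illustration of such a function, here is a minimal
hand-written lexical analyzer in C that reads from a string.  It is a
sketch under stated assumptions: the token code for @code{NUM}, the
variable @code{input} and the @code{int} type of @code{yylval} are
chosen for this example only, whereas in a real program the token
codes and @code{yylval} come from the Bison-generated parser.

```c
#include <ctype.h>

enum { NUM = 258 };              /* hypothetical token type code */

static const char *input = "";   /* text that remains to be scanned */
static int yylval;               /* semantic value of the latest token */

/* Return the type of the next token, or 0 at the end of the input.
   For a NUM token, leave its numeric value in yylval.  */
static int
yylex (void)
{
  /* Skip white space between tokens.  */
  while (*input == ' ' || *input == '\t')
    input++;

  if (*input == '\0')
    return 0;

  if (isdigit ((unsigned char) *input))
    {
      yylval = 0;
      while (isdigit ((unsigned char) *input))
        yylval = yylval * 10 + (*input++ - '0');
      return NUM;
    }

  /* Any other character is a token by itself.  */
  return (unsigned char) *input++;
}
```

Each call returns only the token's type; the parser never sees the
characters that made up the token, just the type and, for @code{NUM},
the value left in @code{yylval}.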
9913d6e4
JD
1276The Bison parser implementation file is C code which defines a
1277function named @code{yyparse} which implements that grammar. This
1278function does not make a complete C program: you must supply some
1279additional functions. One is the lexical analyzer. Another is an
1280error-reporting function which the parser calls to report an error.
1281In addition, a complete C program must start with a function called
1282@code{main}; you have to provide this, and arrange for it to call
1283@code{yyparse} or the parser will never run. @xref{Interface, ,Parser
1284C-Language Interface}.
bfa74976 1285
f7ab6a50 1286Aside from the token type names and the symbols in the actions you
9913d6e4
JD
1287write, all symbols defined in the Bison parser implementation file
1288itself begin with @samp{yy} or @samp{YY}. This includes interface
1289functions such as the lexical analyzer function @code{yylex}, the
1290error reporting function @code{yyerror} and the parser function
1291@code{yyparse} itself. This also includes numerous identifiers used
1292for internal purposes. Therefore, you should avoid using C
1293identifiers starting with @samp{yy} or @samp{YY} in the Bison grammar
1294file except for the ones defined in this manual. Also, you should
1295avoid using the C identifiers @samp{malloc} and @samp{free} for
1296anything other than their usual meanings.
1297
1298In some cases the Bison parser implementation file includes system
1299headers, and in those cases your code should respect the identifiers
1300reserved by those headers. On some non-GNU hosts, @code{<alloca.h>},
1301@code{<malloc.h>}, @code{<stddef.h>}, and @code{<stdlib.h>} are
1302included as needed to declare memory allocators and related types.
1303@code{<libintl.h>} is included if message translation is in use
1304(@pxref{Internationalization}). Other system headers may be included
1305if you define @code{YYDEBUG} to a nonzero value (@pxref{Tracing,
1306,Tracing Your Parser}).
7093d0f5 1307
342b8b6e 1308@node Stages
bfa74976
RS
1309@section Stages in Using Bison
1310@cindex stages in using Bison
1311@cindex using Bison
1312
1313The actual language-design process using Bison, from grammar specification
1314to a working compiler or interpreter, has these parts:
1315
1316@enumerate
1317@item
1318Formally specify the grammar in a form recognized by Bison
704a47c4
AD
1319(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule
1320in the language, describe the action that is to be taken when an
1321instance of that rule is recognized. The action is described by a
1322sequence of C statements.
bfa74976
RS
1323
1324@item
704a47c4
AD
1325Write a lexical analyzer to process input and pass tokens to the parser.
1326The lexical analyzer may be written by hand in C (@pxref{Lexical, ,The
1327Lexical Analyzer Function @code{yylex}}). It could also be produced
1328using Lex, but the use of Lex is not discussed in this manual.
bfa74976
RS
1329
1330@item
1331Write a controlling function that calls the Bison-produced parser.
1332
1333@item
1334Write error-reporting routines.
1335@end enumerate
1336
1337To turn this source code as written into a runnable program, you
1338must follow these steps:
1339
1340@enumerate
1341@item
1342Run Bison on the grammar to produce the parser.
1343
1344@item
1345Compile the code output by Bison, as well as any other source files.
1346
1347@item
1348Link the object files to produce the finished product.
1349@end enumerate
1350
342b8b6e 1351@node Grammar Layout
bfa74976
RS
1352@section The Overall Layout of a Bison Grammar
1353@cindex grammar file
1354@cindex file format
1355@cindex format of grammar file
1356@cindex layout of Bison grammar
1357
1358The input file for the Bison utility is a @dfn{Bison grammar file}. The
1359general form of a Bison grammar file is as follows:
1360
1361@example
1362%@{
08e49d20 1363@var{Prologue}
bfa74976
RS
1364%@}
1365
1366@var{Bison declarations}
1367
1368%%
1369@var{Grammar rules}
1370%%
08e49d20 1371@var{Epilogue}
bfa74976
RS
1372@end example
1373
1374@noindent
1375The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
1376in every Bison grammar file to separate the sections.
1377
72d2299c 1378The prologue may define types and variables used in the actions. You can
342b8b6e 1379also use preprocessor commands to define macros used there, and use
bfa74976 1380@code{#include} to include header files that do any of these things.
38a92d50
PE
1381You need to declare the lexical analyzer @code{yylex} and the error
1382printer @code{yyerror} here, along with any other global identifiers
1383used by the actions in the grammar rules.
bfa74976
RS
1384
1385The Bison declarations declare the names of the terminal and nonterminal
1386symbols, and may also describe operator precedence and the data types of
1387semantic values of various symbols.
1388
1389The grammar rules define how to construct each nonterminal symbol from its
1390parts.
1391
38a92d50
PE
1392The epilogue can contain any code you want to use. Often the
1393definitions of functions declared in the prologue go here. In a
1394simple program, all the rest of the program can go here.
bfa74976 1395
342b8b6e 1396@node Examples
bfa74976
RS
1397@chapter Examples
1398@cindex simple examples
1399@cindex examples, simple
1400
2c0f9706 1401Now we show and explain several sample programs written using Bison: a
bfa74976 1402reverse polish notation calculator, an algebraic (infix) notation
2c0f9706
AD
1403calculator --- later extended to track ``locations'' ---
1404and a multi-function calculator. All
1405produce usable, though limited, interactive desk-top calculators.
bfa74976
RS
1406
1407These examples are simple, but Bison grammars for real programming
aa08666d
AD
1408languages are written the same way. You can copy these examples into a
1409source file to try them.
bfa74976
RS
1410
1411@menu
f56274a8
DJ
1412* RPN Calc:: Reverse polish notation calculator;
1413 a first example with no operator precedence.
1414* Infix Calc:: Infix (algebraic) notation calculator.
1415 Operator precedence is introduced.
bfa74976 1416* Simple Error Recovery:: Continuing after syntax errors.
342b8b6e 1417* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
f56274a8
DJ
1418* Multi-function Calc:: Calculator with memory and trig functions.
1419 It uses multiple data-types for semantic values.
1420* Exercises:: Ideas for improving the multi-function calculator.
bfa74976
RS
1421@end menu
1422
342b8b6e 1423@node RPN Calc
bfa74976
RS
1424@section Reverse Polish Notation Calculator
1425@cindex reverse polish notation
1426@cindex polish notation calculator
1427@cindex @code{rpcalc}
1428@cindex calculator, simple
1429
1430The first example is that of a simple double-precision @dfn{reverse polish
1431notation} calculator (a calculator using postfix operators). This example
1432provides a good starting point, since operator precedence is not an issue.
1433The second example will illustrate how operator precedence is handled.
1434
1435The source code for this calculator is named @file{rpcalc.y}. The
9913d6e4 1436@samp{.y} extension is a convention used for Bison grammar files.
bfa74976
RS
1437
1438@menu
f56274a8
DJ
1439* Rpcalc Declarations:: Prologue (declarations) for rpcalc.
1440* Rpcalc Rules:: Grammar Rules for rpcalc, with explanation.
1441* Rpcalc Lexer:: The lexical analyzer.
1442* Rpcalc Main:: The controlling function.
1443* Rpcalc Error:: The error reporting function.
1444* Rpcalc Generate:: Running Bison on the grammar file.
1445* Rpcalc Compile:: Run the C compiler on the output code.
bfa74976
RS
1446@end menu
1447
f56274a8 1448@node Rpcalc Declarations
bfa74976
RS
1449@subsection Declarations for @code{rpcalc}
1450
1451Here are the C and Bison declarations for the reverse polish notation
1452calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
1453
1454@example
72d2299c 1455/* Reverse polish notation calculator. */
bfa74976
RS
1456
1457%@{
38a92d50
PE
1458 #define YYSTYPE double
1459 #include <math.h>
1460 int yylex (void);
1461 void yyerror (char const *);
bfa74976
RS
1462%@}
1463
1464%token NUM
1465
72d2299c 1466%% /* Grammar rules and actions follow. */
bfa74976
RS
1467@end example
1468
75f5aaea 1469The declarations section (@pxref{Prologue, , The prologue}) contains two
38a92d50 1470preprocessor directives and two forward declarations.
bfa74976
RS
1471
1472The @code{#define} directive defines the macro @code{YYSTYPE}, thus
1964ad8c
AD
1473specifying the C data type for semantic values of both tokens and
1474groupings (@pxref{Value Type, ,Data Types of Semantic Values}). The
1475Bison parser will use whatever type @code{YYSTYPE} is defined as; if you
1476don't define it, @code{int} is the default. Because we specify
1477@code{double}, each token and each expression has an associated value,
1478which is a floating point number.
bfa74976
RS
1479
1480The @code{#include} directive is used to declare the exponentiation
1481function @code{pow}.
1482
38a92d50
PE
1483The forward declarations for @code{yylex} and @code{yyerror} are
1484needed because the C language requires that functions be declared
1485before they are used. These functions will be defined in the
1486epilogue, but the parser calls them so they must be declared in the
1487prologue.
1488
704a47c4
AD
1489The second section, Bison declarations, provides information to Bison
1490about the token types (@pxref{Bison Declarations, ,The Bison
1491Declarations Section}). Each terminal symbol that is not a
1492single-character literal must be declared here. (Single-character
bfa74976
RS
1493literals normally don't need to be declared.) In this example, all the
1494arithmetic operators are designated by single-character literals, so the
1495only terminal symbol that needs to be declared is @code{NUM}, the token
1496type for numeric constants.
1497
342b8b6e 1498@node Rpcalc Rules
bfa74976
RS
1499@subsection Grammar Rules for @code{rpcalc}
1500
1501Here are the grammar rules for the reverse polish notation calculator.
1502
1503@example
2c0f9706 1504@group
de6be119
AD
1505input:
1506 /* empty */
1507| input line
bfa74976 1508;
2c0f9706 1509@end group
bfa74976 1510
2c0f9706 1511@group
de6be119
AD
1512line:
1513 '\n'
1514| exp '\n' @{ printf ("%.10g\n", $1); @}
bfa74976 1515;
2c0f9706 1516@end group
bfa74976 1517
2c0f9706 1518@group
de6be119
AD
1519exp:
1520 NUM @{ $$ = $1; @}
1521| exp exp '+' @{ $$ = $1 + $2; @}
1522| exp exp '-' @{ $$ = $1 - $2; @}
1523| exp exp '*' @{ $$ = $1 * $2; @}
1524| exp exp '/' @{ $$ = $1 / $2; @}
1525| exp exp '^' @{ $$ = pow ($1, $2); @} /* Exponentiation */
1526| exp 'n' @{ $$ = -$1; @} /* Unary minus */
bfa74976 1527;
2c0f9706 1528@end group
bfa74976
RS
1529%%
1530@end example
1531
1532The groupings of the rpcalc ``language'' defined here are the expression
1533(given the name @code{exp}), the line of input (@code{line}), and the
1534complete input transcript (@code{input}). Each of these nonterminal
8c5b881d 1535symbols has several alternate rules, joined by the vertical bar @samp{|}
bfa74976
RS
1536which is read as ``or''. The following sections explain what these rules
1537mean.
1538
1539The semantics of the language is determined by the actions taken when a
1540grouping is recognized. The actions are the C code that appears inside
1541braces. @xref{Actions}.
1542
1543You must specify these actions in C, but Bison provides the means for
1544passing semantic values between the rules. In each action, the
1545pseudo-variable @code{$$} stands for the semantic value for the grouping
1546that the rule is going to construct. Assigning a value to @code{$$} is the
1547main job of most actions. The semantic values of the components of the
1548rule are referred to as @code{$1}, @code{$2}, and so on.
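
To make the roles of @code{$$}, @code{$1} and @code{$2} concrete, here is a minimal hand-written sketch in plain C of the bookkeeping that the @code{exp} actions perform on semantic values. It is not the code Bison generates: the names @code{rpn_eval}, @code{stack} and @code{MAX_DEPTH} are hypothetical, exponentiation is left out so that the sketch does not need the math library, and malformed input simply yields @code{NAN}.

```c
#include <ctype.h>
#include <math.h>     /* For the NAN macro only.  */
#include <stdlib.h>

#define MAX_DEPTH 64  /* Hypothetical limit on pending operands.  */

/* Evaluate one line of reverse polish notation, such as "4 9 +".
   Returns NAN if the line is malformed.  */
double
rpn_eval (const char *s)
{
  double stack[MAX_DEPTH];
  int depth = 0;

  while (*s != '\0' && *s != '\n')
    {
      if (*s == ' ' || *s == '\t')
        ++s;
      else if (*s == '.' || isdigit ((unsigned char) *s))
        {
          /* exp: NUM.  Push the number's semantic value.  */
          char *end;
          double value = strtod (s, &end);
          if (end == s || depth == MAX_DEPTH)
            return NAN;
          stack[depth++] = value;
          s = end;
        }
      else if (*s == 'n')
        {
          /* exp: exp 'n'.  Replace $1 by $$ = -$1.  */
          if (depth < 1)
            return NAN;
          stack[depth - 1] = -stack[depth - 1];
          ++s;
        }
      else
        {
          /* exp: exp exp OP.  Pop $2 and $1, then push $$.  */
          double left, right;
          if (depth < 2)
            return NAN;
          right = stack[--depth];   /* $2 */
          left = stack[--depth];    /* $1 */
          switch (*s++)
            {
            case '+': stack[depth++] = left + right; break;
            case '-': stack[depth++] = left - right; break;
            case '*': stack[depth++] = left * right; break;
            case '/': stack[depth++] = left / right; break;
            default:  return NAN;   /* Unknown operator.  */
            }
        }
    }
  /* A well-formed line leaves exactly one value: that of the line.  */
  return depth == 1 ? stack[0] : NAN;
}
```

Calling @code{rpn_eval ("4 9 +")} pushes 4 and 9, then replaces them by their sum, just as the action @code{$$ = $1 + $2} does when the addition rule is reduced.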
1549
1550@menu
13863333
AD
1551* Rpcalc Input::
1552* Rpcalc Line::
1553* Rpcalc Expr::
bfa74976
RS
1554@end menu
1555
342b8b6e 1556@node Rpcalc Input
bfa74976
RS
1557@subsubsection Explanation of @code{input}
1558
1559Consider the definition of @code{input}:
1560
1561@example
de6be119
AD
1562input:
1563 /* empty */
1564| input line
bfa74976
RS
1565;
1566@end example
1567
This definition reads as follows: ``A complete input is either an empty
string, or a complete input followed by an input line''. Notice that
``complete input'' is defined in terms of itself. This definition is said
to be @dfn{left recursive} since, in the recursive alternative,
@code{input} appears as the leftmost symbol on the right-hand side.
@xref{Recursion, ,Recursive Rules}.
1573
1574The first alternative is empty because there are no symbols between the
1575colon and the first @samp{|}; this means that @code{input} can match an
1576empty string of input (no tokens). We write the rules this way because it
1577is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
1578It's conventional to put an empty alternative first and write the comment
1579@samp{/* empty */} in it.
1580
1581The second alternate rule (@code{input line}) handles all nontrivial input.
1582It means, ``After reading any number of lines, read one more line if
1583possible.'' The left recursion makes this rule into a loop. Since the
1584first alternative matches empty input, the loop can be executed zero or
1585more times.
1586
1587The parser function @code{yyparse} continues to process input until a
1588grammatical error is seen or the lexical analyzer says there are no more
72d2299c 1589input tokens; we will arrange for the latter to happen at end-of-input.
bfa74976 1590
342b8b6e 1591@node Rpcalc Line
bfa74976
RS
1592@subsubsection Explanation of @code{line}
1593
1594Now consider the definition of @code{line}:
1595
1596@example
de6be119
AD
1597line:
1598 '\n'
1599| exp '\n' @{ printf ("%.10g\n", $1); @}
bfa74976
RS
1600;
1601@end example
1602
1603The first alternative is a token which is a newline character; this means
1604that rpcalc accepts a blank line (and ignores it, since there is no
1605action). The second alternative is an expression followed by a newline.
1606This is the alternative that makes rpcalc useful. The semantic value of
1607the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
1608question is the first symbol in the alternative. The action prints this
1609value, which is the result of the computation the user asked for.
1610
1611This action is unusual because it does not assign a value to @code{$$}. As
1612a consequence, the semantic value associated with the @code{line} is
1613uninitialized (its value will be unpredictable). This would be a bug if
1614that value were ever used, but we don't use it: once rpcalc has printed the
1615value of the user's input line, that value is no longer needed.
1616
@node Rpcalc Expr
@subsubsection Explanation of @code{exp}
1619
1620The @code{exp} grouping has several rules, one for each kind of expression.
1621The first rule handles the simplest expressions: those that are just numbers.
1622The second handles an addition-expression, which looks like two expressions
1623followed by a plus-sign. The third handles subtraction, and so on.
1624
1625@example
de6be119
AD
1626exp:
1627 NUM
1628| exp exp '+' @{ $$ = $1 + $2; @}
1629| exp exp '-' @{ $$ = $1 - $2; @}
1630@dots{}
1631;
bfa74976
RS
1632@end example
1633
1634We have used @samp{|} to join all the rules for @code{exp}, but we could
1635equally well have written them separately:
1636
1637@example
de6be119
AD
1638exp: NUM ;
1639exp: exp exp '+' @{ $$ = $1 + $2; @};
1640exp: exp exp '-' @{ $$ = $1 - $2; @};
1641@dots{}
bfa74976
RS
1642@end example
1643
1644Most of the rules have actions that compute the value of the expression in
1645terms of the value of its parts. For example, in the rule for addition,
1646@code{$1} refers to the first component @code{exp} and @code{$2} refers to
1647the second one. The third component, @code{'+'}, has no meaningful
1648associated semantic value, but if it had one you could refer to it as
1649@code{$3}. When @code{yyparse} recognizes a sum expression using this
1650rule, the sum of the two subexpressions' values is produced as the value of
1651the entire expression. @xref{Actions}.
1652
1653You don't have to give an action for every rule. When a rule has no
1654action, Bison by default copies the value of @code{$1} into @code{$$}.
1655This is what happens in the first rule (the one that uses @code{NUM}).
1656
1657The formatting shown here is the recommended convention, but Bison does
72d2299c 1658not require it. You can add or change white space as much as you wish.
bfa74976
RS
1659For example, this:
1660
1661@example
de6be119 1662exp: NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;
bfa74976
RS
1663@end example
1664
1665@noindent
1666means the same thing as this:
1667
1668@example
de6be119
AD
1669exp:
1670 NUM
1671| exp exp '+' @{ $$ = $1 + $2; @}
1672| @dots{}
99a9344e 1673;
bfa74976
RS
1674@end example
1675
1676@noindent
1677The latter, however, is much more readable.
1678
342b8b6e 1679@node Rpcalc Lexer
bfa74976
RS
1680@subsection The @code{rpcalc} Lexical Analyzer
1681@cindex writing a lexical analyzer
1682@cindex lexical analyzer, writing
1683
704a47c4
AD
1684The lexical analyzer's job is low-level parsing: converting characters
1685or sequences of characters into tokens. The Bison parser gets its
1686tokens by calling the lexical analyzer. @xref{Lexical, ,The Lexical
1687Analyzer Function @code{yylex}}.
bfa74976 1688
35430378 1689Only a simple lexical analyzer is needed for the RPN
c827f760 1690calculator. This
bfa74976
RS
1691lexical analyzer skips blanks and tabs, then reads in numbers as
1692@code{double} and returns them as @code{NUM} tokens. Any other character
1693that isn't part of a number is a separate token. Note that the token-code
1694for such a single-character token is the character itself.
1695
1696The return value of the lexical analyzer function is a numeric code which
1697represents a token type. The same text used in Bison rules to stand for
1698this token type is also a C expression for the numeric code for the type.
1699This works in two ways. If the token type is a character literal, then its
e966383b 1700numeric code is that of the character; you can use the same
bfa74976
RS
1701character literal in the lexical analyzer to express the number. If the
1702token type is an identifier, that identifier is defined by Bison as a C
1703macro whose definition is the appropriate number. In this example,
1704therefore, @code{NUM} becomes a macro for @code{yylex} to use.
1705
1964ad8c
AD
1706The semantic value of the token (if it has one) is stored into the
1707global variable @code{yylval}, which is where the Bison parser will look
1708for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was
f56274a8 1709defined at the beginning of the grammar; @pxref{Rpcalc Declarations,
1964ad8c 1710,Declarations for @code{rpcalc}}.)
bfa74976 1711
72d2299c
PE
1712A token type code of zero is returned if the end-of-input is encountered.
1713(Bison recognizes any nonpositive value as indicating end-of-input.)
bfa74976
RS
1714
1715Here is the code for the lexical analyzer:
1716
@example
@group
/* The lexical analyzer returns a double floating point
   number on the stack and the token NUM, or the numeric code
   of the character read if not a number. It skips all blanks
   and tabs, and returns 0 for end-of-input. */

#include <ctype.h>
#include <stdio.h>
@end group

@group
int
yylex (void)
@{
  int c;

  /* Skip white space. */
  while ((c = getchar ()) == ' ' || c == '\t')
    continue;
@end group
@group
  /* Process numbers. */
  if (c == '.' || isdigit (c))
    @{
      ungetc (c, stdin);
      scanf ("%lf", &yylval);
      return NUM;
    @}
@end group
@group
  /* Return end-of-input. */
  if (c == EOF)
    return 0;
  /* Return a single char. */
  return c;
@}
@end group
@end example
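
As a hedged variation on the lexical analyzer, the following sketch reads tokens from a string and uses @code{strtod} so that a failed numeric conversion can be detected. The function name @code{yylex_string} and its @code{cursor} parameter are hypothetical. @code{NUM} and @code{yylval} are declared here only as stand-ins for what the Bison-generated parser normally provides, and the value 258 is merely an assumption made for this sketch.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for declarations that normally come from the
   Bison-generated parser; the value 258 is an assumption.  */
enum { NUM = 258 };
double yylval;

/* Hypothetical variant of yylex that reads from the string at
   *CURSOR and advances *CURSOR past the token it returns.  */
int
yylex_string (const char **cursor)
{
  const char *p = *cursor;

  /* Skip white space.  */
  while (*p == ' ' || *p == '\t')
    ++p;

  /* Return end-of-input.  */
  if (*p == '\0')
    {
      *cursor = p;
      return 0;
    }

  /* Process numbers.  */
  if (*p == '.' || isdigit ((unsigned char) *p))
    {
      char *end;
      double value = strtod (p, &end);
      if (end != p)   /* Something was actually converted.  */
        {
          yylval = value;
          *cursor = end;
          return NUM;
        }
    }

  /* Return a single char; its code is the character itself.  */
  *cursor = p + 1;
  return (unsigned char) *p;
}
```

With this variant a lone @samp{.} is returned as a single-character token, because @code{strtod} reports that nothing was converted. Whether that is the desired behavior depends on the language being parsed.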
1755
342b8b6e 1756@node Rpcalc Main
bfa74976
RS
1757@subsection The Controlling Function
1758@cindex controlling function
1759@cindex main function in simple example
1760
1761In keeping with the spirit of this example, the controlling function is
1762kept to the bare minimum. The only requirement is that it call
1763@code{yyparse} to start the process of parsing.
1764
1765@example
1766@group
13863333
AD
1767int
1768main (void)
bfa74976 1769@{
13863333 1770 return yyparse ();
bfa74976
RS
1771@}
1772@end group
1773@end example
1774
342b8b6e 1775@node Rpcalc Error
bfa74976
RS
1776@subsection The Error Reporting Routine
1777@cindex error reporting routine
1778
1779When @code{yyparse} detects a syntax error, it calls the error reporting
13863333 1780function @code{yyerror} to print an error message (usually but not
6e649e65 1781always @code{"syntax error"}). It is up to the programmer to supply
13863333
AD
1782@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
1783here is the definition we will use:
bfa74976
RS
1784
1785@example
1786@group
1787#include <stdio.h>
2c0f9706 1788@end group
bfa74976 1789
2c0f9706 1790@group
38a92d50 1791/* Called by yyparse on error. */
13863333 1792void
38a92d50 1793yyerror (char const *s)
bfa74976 1794@{
4e03e201 1795 fprintf (stderr, "%s\n", s);
bfa74976
RS
1796@}
1797@end group
1798@end example
1799
1800After @code{yyerror} returns, the Bison parser may recover from the error
1801and continue parsing if the grammar contains a suitable error rule
1802(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
1803have not written any error rules in this example, so any invalid input will
1804cause the calculator program to exit. This is not clean behavior for a
9ecbd125 1805real calculator, but it is adequate for the first example.
bfa74976 1806
f56274a8 1807@node Rpcalc Generate
bfa74976
RS
1808@subsection Running Bison to Make the Parser
1809@cindex running Bison (introduction)
1810
ceed8467
AD
1811Before running Bison to produce a parser, we need to decide how to
1812arrange all the source code in one or more source files. For such a
9913d6e4
JD
1813simple example, the easiest thing is to put everything in one file,
1814the grammar file. The definitions of @code{yylex}, @code{yyerror} and
1815@code{main} go at the end, in the epilogue of the grammar file
75f5aaea 1816(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).
bfa74976
RS
1817
1818For a large project, you would probably have several source files, and use
1819@code{make} to arrange to recompile them.
1820
9913d6e4
JD
1821With all the source in the grammar file, you use the following command
1822to convert it into a parser implementation file:
bfa74976
RS
1823
1824@example
fa4d969f 1825bison @var{file}.y
bfa74976
RS
1826@end example
1827
1828@noindent
9913d6e4
JD
1829In this example, the grammar file is called @file{rpcalc.y} (for
1830``Reverse Polish @sc{calc}ulator''). Bison produces a parser
1831implementation file named @file{@var{file}.tab.c}, removing the
1832@samp{.y} from the grammar file name. The parser implementation file
1833contains the source code for @code{yyparse}. The additional functions
1834in the grammar file (@code{yylex}, @code{yyerror} and @code{main}) are
1835copied verbatim to the parser implementation file.
bfa74976 1836
342b8b6e 1837@node Rpcalc Compile
9913d6e4 1838@subsection Compiling the Parser Implementation File
bfa74976
RS
1839@cindex compiling the parser
1840
9913d6e4 1841Here is how to compile and run the parser implementation file:
bfa74976
RS
1842
@example
@group
# @r{List files in current directory.}
$ @kbd{ls}
rpcalc.tab.c rpcalc.y
@end group

@group
# @r{Compile the Bison parser.}
# @r{@samp{-lm} tells the compiler to search the math library for @code{pow}.}
# @r{It follows the source file so that the linker can resolve @code{pow}.}
$ @kbd{cc rpcalc.tab.c -lm -o rpcalc}
@end group

@group
# @r{List files again.}
$ @kbd{ls}
rpcalc rpcalc.tab.c rpcalc.y
@end group
@end example
1862
1863The file @file{rpcalc} now contains the executable code. Here is an
1864example session using @code{rpcalc}.
1865
@example
$ @kbd{rpcalc}
@kbd{4 9 +}
13
@kbd{3 7 + 3 4 5 *+-}
-13
@kbd{3 7 + 3 4 5 * + - n}              @r{Note the unary minus, @samp{n}}
13
@kbd{5 6 / 4 n +}
-3.166666667
@kbd{3 4 ^}                            @r{Exponentiation}
81
@kbd{^D}                               @r{End-of-file indicator}
$
@end example
1881
342b8b6e 1882@node Infix Calc
bfa74976
RS
1883@section Infix Notation Calculator: @code{calc}
1884@cindex infix notation calculator
1885@cindex @code{calc}
1886@cindex calculator, infix notation
1887
1888We now modify rpcalc to handle infix operators instead of postfix. Infix
1889notation involves the concept of operator precedence and the need for
1890parentheses nested to arbitrary depth. Here is the Bison code for
1891@file{calc.y}, an infix desk-top calculator.
1892
1893@example
38a92d50 1894/* Infix notation calculator. */
bfa74976 1895
2c0f9706 1896@group
bfa74976 1897%@{
38a92d50
PE
1898 #define YYSTYPE double
1899 #include <math.h>
1900 #include <stdio.h>
1901 int yylex (void);
1902 void yyerror (char const *);
bfa74976 1903%@}
2c0f9706 1904@end group
bfa74976 1905
2c0f9706 1906@group
38a92d50 1907/* Bison declarations. */
bfa74976
RS
1908%token NUM
1909%left '-' '+'
1910%left '*' '/'
1911%left NEG /* negation--unary minus */
38a92d50 1912%right '^' /* exponentiation */
2c0f9706 1913@end group
bfa74976 1914
38a92d50 1915%% /* The grammar follows. */
2c0f9706 1916@group
de6be119
AD
1917input:
1918 /* empty */
1919| input line
bfa74976 1920;
2c0f9706 1921@end group
bfa74976 1922
2c0f9706 1923@group
de6be119
AD
1924line:
1925 '\n'
1926| exp '\n' @{ printf ("\t%.10g\n", $1); @}
bfa74976 1927;
2c0f9706 1928@end group
bfa74976 1929
2c0f9706 1930@group
de6be119
AD
1931exp:
1932 NUM @{ $$ = $1; @}
1933| exp '+' exp @{ $$ = $1 + $3; @}
1934| exp '-' exp @{ $$ = $1 - $3; @}
1935| exp '*' exp @{ $$ = $1 * $3; @}
1936| exp '/' exp @{ $$ = $1 / $3; @}
1937| '-' exp %prec NEG @{ $$ = -$2; @}
1938| exp '^' exp @{ $$ = pow ($1, $3); @}
1939| '(' exp ')' @{ $$ = $2; @}
bfa74976 1940;
2c0f9706 1941@end group
bfa74976
RS
1942%%
1943@end example
1944
1945@noindent
ceed8467
AD
1946The functions @code{yylex}, @code{yyerror} and @code{main} can be the
1947same as before.
bfa74976
RS
1948
1949There are two important new features shown in this code.
1950
1951In the second section (Bison declarations), @code{%left} declares token
1952types and says they are left-associative operators. The declarations
1953@code{%left} and @code{%right} (right associativity) take the place of
1954@code{%token} which is used to declare a token type name without
1955associativity. (These tokens are single-character literals, which
1956ordinarily don't need to be declared. We declare them here to specify
1957the associativity.)
1958
1959Operator precedence is determined by the line ordering of the
1960declarations; the higher the line number of the declaration (lower on
1961the page or screen), the higher the precedence. Hence, exponentiation
1962has the highest precedence, unary minus (@code{NEG}) is next, followed
704a47c4
AD
1963by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator
1964Precedence}.
bfa74976 1965
704a47c4
AD
1966The other important new feature is the @code{%prec} in the grammar
1967section for the unary minus operator. The @code{%prec} simply instructs
1968Bison that the rule @samp{| '-' exp} has the same precedence as
1969@code{NEG}---in this case the next-to-highest. @xref{Contextual
1970Precedence, ,Context-Dependent Precedence}.
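
The effect of these declarations can be illustrated with a small hand-written evaluator. This is only a sketch of the grouping that results, written as a precedence-climbing recursive parser, which is a different technique from the table-driven parser Bison generates. All names in it (@code{calc_eval}, @code{parse_exp}, @code{PREC_NEG}, and so on) are hypothetical, and exponentiation is restricted to integer exponents so that the sketch does not need the math library.

```c
#include <stdlib.h>

/* Precedence levels, lowest first, in the same order as the
   %left and %right lines of calc.y.  */
enum { PREC_ADD = 1, PREC_MUL, PREC_NEG, PREC_POW };

static const char *cur;   /* Hypothetical cursor into the input.  */

static double parse_exp (int min_prec);

static void
skip_blanks (void)
{
  while (*cur == ' ' || *cur == '\t')
    ++cur;
}

/* Power for integer exponents only (calc.y itself uses pow).  */
static double
int_power (double base, double exponent)
{
  long n = (long) exponent, i;
  double result = 1.0;
  for (i = 0; i < (n < 0 ? -n : n); ++i)
    result *= base;
  return n < 0 ? 1.0 / result : result;
}

static int
binary_prec (int op)
{
  switch (op)
    {
    case '+': case '-': return PREC_ADD;
    case '*': case '/': return PREC_MUL;
    case '^':           return PREC_POW;
    default:            return 0;   /* Not a binary operator.  */
    }
}

/* NUM, '(' exp ')', or '-' exp %prec NEG.  */
static double
parse_operand (void)
{
  double value;
  char *end;

  skip_blanks ();
  if (*cur == '-')
    {
      ++cur;
      /* Only operators binding tighter than NEG may follow.  */
      return -parse_exp (PREC_NEG);
    }
  if (*cur == '(')
    {
      ++cur;
      value = parse_exp (PREC_ADD);
      skip_blanks ();
      if (*cur == ')')
        ++cur;
      return value;
    }
  value = strtod (cur, &end);
  cur = end;
  return value;
}

static double
parse_exp (int min_prec)
{
  double lhs = parse_operand ();

  for (;;)
    {
      int op, prec;
      double rhs;

      skip_blanks ();
      op = *cur;
      prec = binary_prec (op);
      if (prec < min_prec)   /* Also true when prec is 0.  */
        return lhs;
      ++cur;
      /* %left: the right operand stops at an operator of equal
         precedence.  %right ('^'): it may absorb one.  */
      rhs = parse_exp (op == '^' ? prec : prec + 1);
      switch (op)
        {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '^': lhs = int_power (lhs, rhs); break;
        }
    }
}

double
calc_eval (const char *text)
{
  cur = text;
  return parse_exp (PREC_ADD);
}
```

Under these assumptions @code{calc_eval ("-2 ^ 2")} groups as @samp{-(2 ^ 2)}, because @samp{^} is declared after @code{NEG}, and @code{calc_eval ("2 ^ 3 ^ 2")} groups as @samp{2 ^ (3 ^ 2)}, because @samp{^} is right-associative.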
bfa74976
RS
1971
1972Here is a sample run of @file{calc.y}:
1973
@need 500
@example
$ @kbd{calc}
@kbd{4 + 4.5 - (34/(8*3+-3))}
6.880952381
@kbd{-56 + 2}
-54
@kbd{3 ^ 2}
9
@end example
1984
342b8b6e 1985@node Simple Error Recovery
bfa74976
RS
1986@section Simple Error Recovery
1987@cindex error recovery, simple
1988
1989Up to this point, this manual has not addressed the issue of @dfn{error
1990recovery}---how to continue parsing after the parser detects a syntax
ceed8467
AD
1991error. All we have handled is error reporting with @code{yyerror}.
1992Recall that by default @code{yyparse} returns after calling
1993@code{yyerror}. This means that an erroneous input line causes the
1994calculator program to exit. Now we show how to rectify this deficiency.
bfa74976
RS
1995
1996The Bison language itself includes the reserved word @code{error}, which
1997may be included in the grammar rules. In the example below it has
1998been added to one of the alternatives for @code{line}:
1999
2000@example
2001@group
de6be119
AD
2002line:
2003 '\n'
2004| exp '\n' @{ printf ("\t%.10g\n", $1); @}
2005| error '\n' @{ yyerrok; @}
bfa74976
RS
2006;
2007@end group
2008@end example
2009
ceed8467 2010This addition to the grammar allows for simple error recovery in the
6e649e65 2011event of a syntax error. If an expression that cannot be evaluated is
ceed8467
AD
2012read, the error will be recognized by the third rule for @code{line},
2013and parsing will continue. (The @code{yyerror} function is still called
2014upon to print its message as well.) The action executes the statement
2015@code{yyerrok}, a macro defined automatically by Bison; its meaning is
2016that error recovery is complete (@pxref{Error Recovery}). Note the
2017difference between @code{yyerrok} and @code{yyerror}; neither one is a
e0c471a9 2018misprint.
bfa74976
RS
2019
2020This form of error recovery deals with syntax errors. There are other
2021kinds of errors; for example, division by zero, which raises an exception
2022signal that is normally fatal. A real calculator program must handle this
2023signal and use @code{longjmp} to return to @code{main} and resume parsing
2024input lines; it would also have to discard the rest of the current line of
2025input. We won't discuss this issue further because it is not specific to
2026Bison programs.
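
As a rough illustration of the @code{longjmp} technique, here is a minimal sketch that abandons a computation and returns to a recovery point. It makes a simplifying assumption: the divisor is tested explicitly instead of relying on a signal, since whether a division by zero raises a signal at all depends on the platform and on the operand types. The names @code{recovery_point}, @code{last_result}, @code{checked_divide} and @code{try_divide} are hypothetical.

```c
#include <setjmp.h>

static jmp_buf recovery_point;   /* Hypothetical; set before each line.  */
double last_result;              /* Hypothetical; the last quotient.  */

/* Stand-in for the action of the rule `exp: exp '/' exp'.  */
static double
checked_divide (double dividend, double divisor)
{
  if (divisor == 0.0)
    longjmp (recovery_point, 1);   /* Abandon the current line.  */
  return dividend / divisor;
}

/* Stand-in for the part of main that processes one input line.
   Returns 1 if the division was carried out, and 0 if control
   came back through longjmp.  */
int
try_divide (double dividend, double divisor)
{
  if (setjmp (recovery_point) != 0)
    return 0;   /* Resume here after an error.  */
  last_result = checked_divide (dividend, divisor);
  return 1;
}
```

A real program would also have to discard the rest of the offending input line before resuming, as noted above.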
2027
342b8b6e
AD
2028@node Location Tracking Calc
2029@section Location Tracking Calculator: @code{ltcalc}
2030@cindex location tracking calculator
2031@cindex @code{ltcalc}
2032@cindex calculator, location tracking
2033
9edcd895
AD
2034This example extends the infix notation calculator with location
2035tracking. This feature will be used to improve the error messages. For
2036the sake of clarity, this example is a simple integer calculator, since
2037most of the work needed to use locations will be done in the lexical
72d2299c 2038analyzer.
342b8b6e
AD
2039
2040@menu
f56274a8
DJ
2041* Ltcalc Declarations:: Bison and C declarations for ltcalc.
2042* Ltcalc Rules:: Grammar rules for ltcalc, with explanations.
2043* Ltcalc Lexer:: The lexical analyzer.
342b8b6e
AD
2044@end menu
2045
f56274a8 2046@node Ltcalc Declarations
342b8b6e
AD
2047@subsection Declarations for @code{ltcalc}
2048
The C and Bison declarations for the location tracking calculator are
the same as the declarations for the infix notation calculator, except
that semantic values are now of type @code{int}.
342b8b6e
AD
2051
@example
/* Location tracking calculator. */

%@{
  #define YYSTYPE int
  #include <math.h>
  #include <stdio.h>
  int yylex (void);
  void yyerror (char const *);
%@}

/* Bison declarations. */
%token NUM

%left '-' '+'
%left '*' '/'
%left NEG
%right '^'

%% /* The grammar follows. */
@end example
2072
9edcd895
AD
@noindent
Note there are no declarations specific to locations. Defining a data
type for storing locations is not needed: we will use the type provided
by default (@pxref{Location Type, ,Data Types of Locations}), which is a
four-member structure with the following integer fields:
@code{first_line}, @code{first_column}, @code{last_line} and
@code{last_column}. By convention, and in accordance with the GNU
Coding Standards and common practice, the line and column counts both
start at 1.
342b8b6e
AD
2082
2083@node Ltcalc Rules
2084@subsection Grammar Rules for @code{ltcalc}
2085
Whether or not you handle locations has no effect on the syntax of your
language. Therefore, the grammar rules for this example will be very close
to those of the previous example: we will only modify them to benefit
from the new information.
342b8b6e 2090
9edcd895
AD
2091Here, we will use locations to report divisions by zero, and locate the
2092wrong expressions or subexpressions.
342b8b6e
AD
2093
2094@example
2095@group
de6be119
AD
2096input:
2097 /* empty */
2098| input line
342b8b6e
AD
2099;
2100@end group
2101
2102@group
de6be119
AD
2103line:
2104 '\n'
2105| exp '\n' @{ printf ("%d\n", $1); @}
342b8b6e
AD
2106;
2107@end group
2108
2109@group
de6be119
AD
2110exp:
2111 NUM @{ $$ = $1; @}
2112| exp '+' exp @{ $$ = $1 + $3; @}
2113| exp '-' exp @{ $$ = $1 - $3; @}
2114| exp '*' exp @{ $$ = $1 * $3; @}
342b8b6e 2115@end group
@group
| exp '/' exp
    @{
      if ($3)
        $$ = $1 / $3;
      else
        @{
          $$ = 1;
          fprintf (stderr, "%d.%d-%d.%d: division by zero\n",
                   @@3.first_line, @@3.first_column,
                   @@3.last_line, @@3.last_column);
        @}
    @}
@end group
2130@group
de6be119
AD
2131| '-' exp %prec NEG @{ $$ = -$2; @}
2132| exp '^' exp @{ $$ = pow ($1, $3); @}
2133| '(' exp ')' @{ $$ = $2; @}
342b8b6e
AD
2134@end group
2135@end example
2136
This code shows how to access locations inside semantic actions, by
using the pseudo-variables @code{@@@var{n}} for rule components, and the
pseudo-variable @code{@@$} for groupings.
2140
9edcd895
AD
2141We don't need to assign a value to @code{@@$}: the output parser does it
2142automatically. By default, before executing the C code of each action,
2143@code{@@$} is set to range from the beginning of @code{@@1} to the end
2144of @code{@@@var{n}}, for a rule with @var{n} components. This behavior
2145can be redefined (@pxref{Location Default Action, , Default Action for
2146Locations}), and for very specific rules, @code{@@$} can be computed by
2147hand.
342b8b6e
AD
2148
2149@node Ltcalc Lexer
@subsection The @code{ltcalc} Lexical Analyzer
2151
9edcd895 2152Until now, we relied on Bison's defaults to enable location
72d2299c 2153tracking. The next step is to rewrite the lexical analyzer, and make it
9edcd895
AD
2154able to feed the parser with the token locations, as it already does for
2155semantic values.
342b8b6e 2156
To this end, we must take into account every single character of the
input text, to keep the computed locations from being fuzzy or wrong:
342b8b6e
AD
2159
@example
@group
int
yylex (void)
@{
  int c;
@end group

@group
  /* Skip white space. */
  while ((c = getchar ()) == ' ' || c == '\t')
    ++yylloc.last_column;
@end group

@group
  /* Step. */
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;
@end group

@group
  /* Process numbers. */
  if (isdigit (c))
    @{
      yylval = c - '0';
      ++yylloc.last_column;
      while (isdigit (c = getchar ()))
        @{
          ++yylloc.last_column;
          yylval = yylval * 10 + c - '0';
        @}
      ungetc (c, stdin);
      return NUM;
    @}
@end group

  /* Return end-of-input. */
  if (c == EOF)
    return 0;

@group
  /* Return a single char, and update location. */
  if (c == '\n')
    @{
      ++yylloc.last_line;
      yylloc.last_column = 0;
    @}
  else
    ++yylloc.last_column;
  return c;
@}
@end group
@end example
2213
9edcd895
AD
2214Basically, the lexical analyzer performs the same processing as before:
2215it skips blanks and tabs, and reads numbers or single-character tokens.
2216In addition, it updates @code{yylloc}, the global variable (of type
2217@code{YYLTYPE}) containing the token's location.
342b8b6e 2218
9edcd895 2219Now, each time this function returns a token, the parser has its number
72d2299c 2220as well as its semantic value, and its location in the text. The last
9edcd895
AD
2221needed change is to initialize @code{yylloc}, for example in the
2222controlling function:
342b8b6e
AD
2223
2224@example
9edcd895 2225@group
342b8b6e
AD
2226int
2227main (void)
2228@{
2229 yylloc.first_line = yylloc.last_line = 1;
2230 yylloc.first_column = yylloc.last_column = 0;
2231 return yyparse ();
2232@}
9edcd895 2233@end group
342b8b6e
AD
2234@end example
2235
Remember that computing locations is not a matter of syntax. Every
character must be associated with a location update, whether it is in
valid input, in comments, in literal strings, and so on.
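
One way to keep this bookkeeping uniform is to route every character through a single helper. The sketch below assumes a hypothetical @code{location} type that mirrors the default four-member structure, and hypothetical functions @code{location_advance} and @code{locate_text}. It follows the same convention as the lexical analyzer above: a newline increments the line and resets the column to 0.

```c
/* Hypothetical stand-in for the default location structure.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Account for one character C, whatever its role in the input.  */
void
location_advance (location *loc, int c)
{
  if (c == '\n')
    {
      ++loc->last_line;
      loc->last_column = 0;
    }
  else
    ++loc->last_column;
}

/* Return the location reached after reading all of TEXT, starting
   from line 1, column 0 as in the main function above.  */
location
locate_text (const char *text)
{
  location loc = { 1, 0, 1, 0 };
  for (; *text != '\0'; ++text)
    location_advance (&loc, (unsigned char) *text);
  return loc;
}
```

A lexical analyzer that skips comments or scans string literals could call such a helper for each character it consumes, so that no character escapes the location update.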
342b8b6e
AD
2239
2240@node Multi-function Calc
bfa74976
RS
2241@section Multi-Function Calculator: @code{mfcalc}
2242@cindex multi-function calculator
2243@cindex @code{mfcalc}
2244@cindex calculator, multi-function
2245
2246Now that the basics of Bison have been discussed, it is time to move on to
2247a more advanced problem. The above calculators provided only five
2248functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would
2249be nice to have a calculator that provides other mathematical functions such
2250as @code{sin}, @code{cos}, etc.
2251
2252It is easy to add new operators to the infix calculator as long as they are
2253only single-character literals. The lexical analyzer @code{yylex} passes
9d9b8b70 2254back all nonnumeric characters as tokens, so new grammar rules suffice for
bfa74976
RS
2255adding a new operator. But we want something more flexible: built-in
2256functions whose syntax has this form:
2257
2258@example
2259@var{function_name} (@var{argument})
2260@end example
2261
2262@noindent
2263At the same time, we will add memory to the calculator, by allowing you
2264to create named variables, store values in them, and use them later.
2265Here is a sample session with the multi-function calculator:
2266
2267@example
9edcd895
AD
2268$ @kbd{mfcalc}
2269@kbd{pi = 3.141592653589}
bfa74976 22703.1415926536
9edcd895 2271@kbd{sin(pi)}
bfa74976 22720.0000000000
9edcd895 2273@kbd{alpha = beta1 = 2.3}
bfa74976 22742.3000000000
9edcd895 2275@kbd{alpha}
bfa74976 22762.3000000000
9edcd895 2277@kbd{ln(alpha)}
bfa74976 22780.8329091229
9edcd895 2279@kbd{exp(ln(beta1))}
bfa74976 22802.3000000000
9edcd895 2281$
bfa74976
RS
2282@end example
2283
2284Note that multiple assignment and nested function calls are permitted.
2285
2286@menu
f56274a8
DJ
2287* Mfcalc Declarations:: Bison declarations for multi-function calculator.
2288* Mfcalc Rules:: Grammar rules for the calculator.
2289* Mfcalc Symbol Table:: Symbol table management subroutines.
bfa74976
RS
2290@end menu
2291
f56274a8 2292@node Mfcalc Declarations
bfa74976
RS
2293@subsection Declarations for @code{mfcalc}
2294
2295Here are the C and Bison declarations for the multi-function calculator.
2296
56d60c19 2297@comment file: mfcalc.y: 1
ea118b72 2298@example
18b519c0 2299@group
bfa74976 2300%@{
38a92d50
PE
2301 #include <math.h> /* For math functions, cos(), sin(), etc. */
2302 #include "calc.h" /* Contains definition of `symrec'. */
2303 int yylex (void);
2304 void yyerror (char const *);
bfa74976 2305%@}
18b519c0 2306@end group
56d60c19 2307
18b519c0 2308@group
bfa74976 2309%union @{
38a92d50
PE
2310 double val; /* For returning numbers. */
2311 symrec *tptr; /* For returning symbol-table pointers. */
bfa74976 2312@}
18b519c0 2313@end group
38a92d50 2314%token <val> NUM /* Simple double precision number. */
56d60c19 2315%token <tptr> VAR FNCT /* Variable and function. */
bfa74976
RS
2316%type <val> exp
2317
18b519c0 2318@group
bfa74976
RS
2319%right '='
2320%left '-' '+'
2321%left '*' '/'
38a92d50
PE
2322%left NEG /* negation--unary minus */
2323%right '^' /* exponentiation */
18b519c0 2324@end group
ea118b72 2325@end example
bfa74976
RS
2326
2327The above grammar introduces only two new features of the Bison language.
2328These features allow semantic values to have various data types
2329(@pxref{Multiple Types, ,More Than One Value Type}).
2330
2331The @code{%union} declaration specifies the entire list of possible types;
2332this is instead of defining @code{YYSTYPE}. The allowable types are now
2333double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
2334the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
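
The effect of the @code{%union} declaration can be pictured in plain C.
The sketch below is only an approximation of the type that Bison defines
from it; the forward declaration of @code{symrec} stands in for
@file{calc.h}, and @code{make_number} is a hypothetical helper added for
illustration.

```c
/* Stand-in for the symbol-table record declared in calc.h. */
typedef struct symrec symrec;

/* Roughly the semantic value type requested by
   `%union { double val; symrec *tptr; }'. */
typedef union YYSTYPE
{
  double val;    /* Used by NUM and exp. */
  symrec *tptr;  /* Used by VAR and FNCT. */
} YYSTYPE;

/* Hypothetical helper: build the value of a NUM token, the way
   yylex fills in yylval.val before returning NUM. */
YYSTYPE
make_number (double d)
{
  YYSTYPE v;
  v.val = d;
  return v;
}
```

Because the two members share storage, only the member that matches the
declared type of a symbol is meaningful for that symbol's value.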
2335
2336Since values can now have various types, it is necessary to associate a
2337type with each grammar symbol whose semantic value is used. These symbols
2338are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
2339declarations are augmented with information about their data type (placed
2340between angle brackets).
2341
704a47c4
AD
2342The Bison construct @code{%type} is used for declaring nonterminal
2343symbols, just as @code{%token} is used for declaring token types. We
2344have not used @code{%type} before because nonterminal symbols are
2345normally declared implicitly by the rules that define them. But
2346@code{exp} must be declared explicitly so we can specify its value type.
2347@xref{Type Decl, ,Nonterminal Symbols}.
bfa74976 2348
342b8b6e 2349@node Mfcalc Rules
bfa74976
RS
2350@subsection Grammar Rules for @code{mfcalc}
2351
2352Here are the grammar rules for the multi-function calculator.
2353Most of them are copied directly from @code{calc}; three rules,
2354those which mention @code{VAR} or @code{FNCT}, are new.
2355
56d60c19 2356@comment file: mfcalc.y: 3
ea118b72 2357@example
56d60c19 2358%% /* The grammar follows. */
18b519c0 2359@group
de6be119
AD
2360input:
2361 /* empty */
2362| input line
bfa74976 2363;
18b519c0 2364@end group
bfa74976 2365
18b519c0 2366@group
bfa74976 2367line:
de6be119
AD
2368 '\n'
2369| exp '\n' @{ printf ("%.10g\n", $1); @}
2370| error '\n' @{ yyerrok; @}
bfa74976 2371;
18b519c0 2372@end group
bfa74976 2373
18b519c0 2374@group
de6be119
AD
2375exp:
2376 NUM @{ $$ = $1; @}
2377| VAR @{ $$ = $1->value.var; @}
2378| VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
2379| FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
2380| exp '+' exp @{ $$ = $1 + $3; @}
2381| exp '-' exp @{ $$ = $1 - $3; @}
2382| exp '*' exp @{ $$ = $1 * $3; @}
2383| exp '/' exp @{ $$ = $1 / $3; @}
2384| '-' exp %prec NEG @{ $$ = -$2; @}
2385| exp '^' exp @{ $$ = pow ($1, $3); @}
2386| '(' exp ')' @{ $$ = $2; @}
bfa74976 2387;
18b519c0 2388@end group
38a92d50 2389/* End of grammar. */
bfa74976 2390%%
ea118b72 2391@end example
bfa74976 2392
f56274a8 2393@node Mfcalc Symbol Table
bfa74976
RS
2394@subsection The @code{mfcalc} Symbol Table
2395@cindex symbol table example
2396
2397The multi-function calculator requires a symbol table to keep track of the
2398names and meanings of variables and functions. This doesn't affect the
2399grammar rules (except for the actions) or the Bison declarations, but it
2400requires some additional C functions for support.
2401
2402The symbol table itself consists of a linked list of records. Its
2403definition, which is kept in the header @file{calc.h}, is as follows. It
2404provides for either functions or variables to be placed in the table.
2405
ea118b72
AD
2406@comment file: calc.h
2407@example
bfa74976 2408@group
38a92d50 2409/* Function type. */
32dfccf8 2410typedef double (*func_t) (double);
72f889cc 2411@end group
32dfccf8 2412
72f889cc 2413@group
38a92d50 2414/* Data type for links in the chain of symbols. */
bfa74976
RS
2415struct symrec
2416@{
38a92d50 2417 char *name; /* name of symbol */
bfa74976 2418 int type; /* type of symbol: either VAR or FNCT */
32dfccf8
AD
2419 union
2420 @{
38a92d50
PE
2421 double var; /* value of a VAR */
2422 func_t fnctptr; /* value of a FNCT */
bfa74976 2423 @} value;
38a92d50 2424 struct symrec *next; /* link field */
bfa74976
RS
2425@};
2426@end group
2427
2428@group
2429typedef struct symrec symrec;
2430
38a92d50 2431/* The symbol table: a chain of `struct symrec'. */
bfa74976
RS
2432extern symrec *sym_table;
2433
a730d142 2434symrec *putsym (char const *, int);
38a92d50 2435symrec *getsym (char const *);
bfa74976 2436@end group
ea118b72 2437@end example
bfa74976
RS
2438
The new version of @code{main} calls @code{init_table}, a function that
initializes the symbol table. Here are both functions, along with
@code{yyerror} and the list of built-in functions that @code{init_table}
installs:
2442
56d60c19 2443@comment file: mfcalc.y: 3
ea118b72 2444@example
bfa74976
RS
2445#include <stdio.h>
2446
18b519c0 2447@group
38a92d50 2448/* Called by yyparse on error. */
13863333 2449void
38a92d50 2450yyerror (char const *s)
bfa74976
RS
2451@{
2452 printf ("%s\n", s);
2453@}
18b519c0 2454@end group
bfa74976 2455
18b519c0 2456@group
bfa74976
RS
2457struct init
2458@{
38a92d50
PE
2459 char const *fname;
2460 double (*fnct) (double);
bfa74976
RS
2461@};
2462@end group
2463
2464@group
38a92d50 2465struct init const arith_fncts[] =
13863333 2466@{
32dfccf8
AD
2467 "sin", sin,
2468 "cos", cos,
13863333 2469 "atan", atan,
32dfccf8
AD
2470 "ln", log,
2471 "exp", exp,
13863333
AD
2472 "sqrt", sqrt,
2473 0, 0
2474@};
18b519c0 2475@end group
bfa74976 2476
18b519c0 2477@group
bfa74976 2478/* The symbol table: a chain of `struct symrec'. */
38a92d50 2479symrec *sym_table;
bfa74976
RS
2480@end group
2481
2482@group
72d2299c 2483/* Put arithmetic functions in table. */
13863333
AD
2484void
2485init_table (void)
bfa74976
RS
2486@{
2487 int i;
bfa74976
RS
2488 for (i = 0; arith_fncts[i].fname != 0; i++)
2489 @{
2c0f9706 2490 symrec *ptr = putsym (arith_fncts[i].fname, FNCT);
bfa74976
RS
2491 ptr->value.fnctptr = arith_fncts[i].fnct;
2492 @}
2493@}
2494@end group
38a92d50
PE
2495
2496@group
2497int
2498main (void)
2499@{
2500 init_table ();
2501 return yyparse ();
2502@}
2503@end group
ea118b72 2504@end example
bfa74976
RS
2505
By simply editing the initialization list and adding the necessary include
files, you can add more functions to the calculator.
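
Any function that takes a @code{double} and returns a @code{double} can go
in such a list, not only those from @file{math.h}. The following sketch
assumes two user-written functions, @code{square} and @code{cube}, a
separate list @code{extra_fncts}, and a lookup helper @code{find_fnct}; all
four names are hypothetical and are not part of @code{mfcalc}.

```c
#include <string.h>

struct init
{
  char const *fname;
  double (*fnct) (double);
};

/* Hypothetical user-written functions. */
double square (double x) { return x * x; }
double cube (double x) { return x * x * x; }

/* A supplementary list with the same layout as arith_fncts,
   written with braces around each entry. */
struct init const extra_fncts[] =
{
  { "sq",   square },
  { "cube", cube   },
  { 0, 0 }            /* end marker */
};

/* Hypothetical helper: return the entry named NAME, or 0 if none. */
struct init const *
find_fnct (char const *name)
{
  int i;
  for (i = 0; extra_fncts[i].fname != 0; i++)
    if (strcmp (extra_fncts[i].fname, name) == 0)
      return &extra_fncts[i];
  return 0;
}
```

An initialization function could walk @code{extra_fncts} with the same loop
that walks the built-in list, stopping at the entry whose name is zero.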
2508
2509Two important functions allow look-up and installation of symbols in the
2510symbol table. The function @code{putsym} is passed a name and the type
2511(@code{VAR} or @code{FNCT}) of the object to be installed. The object is
2512linked to the front of the list, and a pointer to the object is returned.
2513The function @code{getsym} is passed the name of the symbol to look up. If
2514found, a pointer to that symbol is returned; otherwise zero is returned.
2515
56d60c19 2516@comment file: mfcalc.y: 3
ea118b72 2517@example
98842516
AD
2518#include <stdlib.h> /* malloc. */
2519#include <string.h> /* strlen. */
2520
2521@group
bfa74976 2522symrec *
38a92d50 2523putsym (char const *sym_name, int sym_type)
bfa74976 2524@{
2c0f9706 2525 symrec *ptr = (symrec *) malloc (sizeof (symrec));
bfa74976
RS
2526 ptr->name = (char *) malloc (strlen (sym_name) + 1);
2527 strcpy (ptr->name,sym_name);
2528 ptr->type = sym_type;
72d2299c 2529 ptr->value.var = 0; /* Set value to 0 even if fctn. */
bfa74976
RS
2530 ptr->next = (struct symrec *)sym_table;
2531 sym_table = ptr;
2532 return ptr;
2533@}
98842516 2534@end group
bfa74976 2535
98842516 2536@group
bfa74976 2537symrec *
38a92d50 2538getsym (char const *sym_name)
bfa74976
RS
2539@{
2540 symrec *ptr;
2541 for (ptr = sym_table; ptr != (symrec *) 0;
2542 ptr = (symrec *)ptr->next)
2543 if (strcmp (ptr->name,sym_name) == 0)
2544 return ptr;
2545 return 0;
2546@}
98842516 2547@end group
ea118b72 2548@end example
bfa74976
RS
2549
2550The function @code{yylex} must now recognize variables, numeric values, and
2551the single-character arithmetic operators. Strings of alphanumeric
9d9b8b70 2552characters with a leading letter are recognized as either variables or
bfa74976
RS
2553functions depending on what the symbol table says about them.
2554
The string is passed to @code{getsym} for lookup in the symbol table. If
the name appears in the table, a pointer to its entry and its type
(@code{VAR} or @code{FNCT}) are returned to @code{yyparse}. If it is not
already in the table, then it is installed as a @code{VAR} using
@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) are
returned to @code{yyparse}.
bfa74976
RS
2561
2562No change is needed in the handling of numeric values and arithmetic
2563operators in @code{yylex}.
2564
56d60c19 2565@comment file: mfcalc.y: 3
ea118b72 2566@example
bfa74976
RS
2567@group
2568#include <ctype.h>
18b519c0 2569@end group
13863333 2570
18b519c0 2571@group
13863333
AD
2572int
2573yylex (void)
bfa74976
RS
2574@{
2575 int c;
2576
72d2299c 2577 /* Ignore white space, get first nonwhite character. */
98842516
AD
2578 while ((c = getchar ()) == ' ' || c == '\t')
2579 continue;
bfa74976
RS
2580
2581 if (c == EOF)
2582 return 0;
2583@end group
2584
2585@group
2586 /* Char starts a number => parse the number. */
2587 if (c == '.' || isdigit (c))
2588 @{
2589 ungetc (c, stdin);
2590 scanf ("%lf", &yylval.val);
2591 return NUM;
2592 @}
2593@end group
2594
2595@group
2596 /* Char starts an identifier => read the name. */
2597 if (isalpha (c))
2598 @{
2c0f9706
AD
2599 /* Initially make the buffer long enough
2600 for a 40-character symbol name. */
2601 static size_t length = 40;
bfa74976 2602 static char *symbuf = 0;
2c0f9706 2603 symrec *s;
bfa74976
RS
2604 int i;
2605@end group
2606
2c0f9706
AD
2607 if (!symbuf)
2608 symbuf = (char *) malloc (length + 1);
bfa74976
RS
2609
2610 i = 0;
2611 do
bfa74976
RS
2612@group
2613 @{
2614 /* If buffer is full, make it bigger. */
2615 if (i == length)
2616 @{
2617 length *= 2;
18b519c0 2618 symbuf = (char *) realloc (symbuf, length + 1);
bfa74976
RS
2619 @}
2620 /* Add this character to the buffer. */
2621 symbuf[i++] = c;
2622 /* Get another character. */
2623 c = getchar ();
2624 @}
2625@end group
2626@group
72d2299c 2627 while (isalnum (c));
bfa74976
RS
2628
2629 ungetc (c, stdin);
2630 symbuf[i] = '\0';
2631@end group
2632
2633@group
2634 s = getsym (symbuf);
2635 if (s == 0)
2636 s = putsym (symbuf, VAR);
2637 yylval.tptr = s;
2638 return s->type;
2639 @}
2640
2641 /* Any other character is a token by itself. */
2642 return c;
2643@}
2644@end group
ea118b72 2645@end example
bfa74976 2646
The error reporting function is unchanged, and the new version of
@code{main} includes a call to @code{init_table} and sets @code{yydebug}
on user demand (@pxref{Tracing, , Tracing Your Parser}, for details):
2650
2651@comment file: mfcalc.y: 3
2652@example
2653@group
2654/* Called by yyparse on error. */
2655void
2656yyerror (char const *s)
2657@{
2658 fprintf (stderr, "%s\n", s);
2659@}
2660@end group
2661
2662@group
2663int
2664main (int argc, char const* argv[])
2665@{
2666 int i;
2667 /* Enable parse traces on option -p. */
2668 for (i = 1; i < argc; ++i)
2669 if (!strcmp(argv[i], "-p"))
2670 yydebug = 1;
2671 init_table ();
2672 return yyparse ();
2673@}
2674@end group
2675@end example
2676
72d2299c 2677This program is both powerful and flexible. You may easily add new
704a47c4
AD
2678functions, and it is a simple job to modify this code to install
2679predefined variables such as @code{pi} or @code{e} as well.
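
One possible way to install predefined variables is sketched below. It is
an illustration under assumptions, not the manual's own code: the table
@code{predefined_vars}, the type @code{struct init_var} and the function
@code{init_vars} are hypothetical, the numeric values given to @code{VAR}
and @code{FNCT} are arbitrary stand-ins for the codes Bison would generate,
and @code{putsym} and @code{getsym} are repeated (with allocation checks
added) only so that the fragment is self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Arbitrary stand-ins for the Bison-generated token codes. */
enum { VAR = 258, FNCT = 259 };

typedef struct symrec
{
  char *name;
  int type;
  union { double var; double (*fnctptr) (double); } value;
  struct symrec *next;
} symrec;

symrec *sym_table = 0;

symrec *
putsym (char const *sym_name, int sym_type)
{
  symrec *ptr = malloc (sizeof *ptr);
  if (!ptr)
    abort ();
  ptr->name = malloc (strlen (sym_name) + 1);
  if (!ptr->name)
    abort ();
  strcpy (ptr->name, sym_name);
  ptr->type = sym_type;
  ptr->value.var = 0;
  ptr->next = sym_table;
  sym_table = ptr;
  return ptr;
}

symrec *
getsym (char const *sym_name)
{
  symrec *ptr;
  for (ptr = sym_table; ptr; ptr = ptr->next)
    if (strcmp (ptr->name, sym_name) == 0)
      return ptr;
  return 0;
}

/* Hypothetical table of predefined variables. */
struct init_var
{
  char const *name;
  double value;
};

struct init_var const predefined_vars[] =
{
  { "pi", 3.14159265358979323846 },
  { "e",  2.71828182845904523536 },
  { 0, 0 }
};

/* Install each predefined variable as an ordinary VAR. */
void
init_vars (void)
{
  int i;
  for (i = 0; predefined_vars[i].name != 0; i++)
    {
      symrec *ptr = putsym (predefined_vars[i].name, VAR);
      ptr->value.var = predefined_vars[i].value;
    }
}
```

Since the entries are plain @code{VAR} symbols, the user can still assign to
them; making them read-only would need a further symbol type.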
bfa74976 2680
342b8b6e 2681@node Exercises
bfa74976
RS
2682@section Exercises
2683@cindex exercises
2684
2685@enumerate
2686@item
2687Add some new functions from @file{math.h} to the initialization list.
2688
2689@item
2690Add another array that contains constants and their values. Then
2691modify @code{init_table} to add these constants to the symbol table.
2692It will be easiest to give the constants type @code{VAR}.
2693
2694@item
2695Make the program report an error if the user refers to an
2696uninitialized variable in any way except to store a value in it.
2697@end enumerate
2698
342b8b6e 2699@node Grammar File
bfa74976
RS
2700@chapter Bison Grammar Files
2701
2702Bison takes as input a context-free grammar specification and produces a
2703C-language function that recognizes correct instances of the grammar.
2704
9913d6e4 2705The Bison grammar file conventionally has a name ending in @samp{.y}.
234a3be3 2706@xref{Invocation, ,Invoking Bison}.
bfa74976
RS
2707
2708@menu
7404cdf3
JD
2709* Grammar Outline:: Overall layout of the grammar file.
2710* Symbols:: Terminal and nonterminal symbols.
2711* Rules:: How to write grammar rules.
2712* Recursion:: Writing recursive rules.
2713* Semantics:: Semantic values and actions.
2714* Tracking Locations:: Locations and actions.
2715* Named References:: Using named references in actions.
2716* Declarations:: All kinds of Bison declarations are described here.
2717* Multiple Parsers:: Putting more than one Bison parser in one program.
bfa74976
RS
2718@end menu
2719
342b8b6e 2720@node Grammar Outline
bfa74976
RS
2721@section Outline of a Bison Grammar
2722
2723A Bison grammar file has four main sections, shown here with the
2724appropriate delimiters:
2725
2726@example
2727%@{
38a92d50 2728 @var{Prologue}
bfa74976
RS
2729%@}
2730
2731@var{Bison declarations}
2732
2733%%
2734@var{Grammar rules}
2735%%
2736
75f5aaea 2737@var{Epilogue}
bfa74976
RS
2738@end example
2739
2740Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
35430378 2741As a GNU extension, @samp{//} introduces a comment that
2bfc2e2a 2742continues until end of line.
bfa74976
RS
2743
2744@menu
f56274a8 2745* Prologue:: Syntax and usage of the prologue.
2cbe6b7f 2746* Prologue Alternatives:: Syntax and usage of alternatives to the prologue.
f56274a8
DJ
2747* Bison Declarations:: Syntax and usage of the Bison declarations section.
2748* Grammar Rules:: Syntax and usage of the grammar rules section.
2749* Epilogue:: Syntax and usage of the epilogue.
bfa74976
RS
2750@end menu
2751
38a92d50 2752@node Prologue
75f5aaea
MA
2753@subsection The prologue
2754@cindex declarations section
2755@cindex Prologue
2756@cindex declarations
bfa74976 2757
f8e1c9e5
AD
2758The @var{Prologue} section contains macro definitions and declarations
2759of functions and variables that are used in the actions in the grammar
9913d6e4
JD
2760rules. These are copied to the beginning of the parser implementation
2761file so that they precede the definition of @code{yyparse}. You can
2762use @samp{#include} to get the declarations from a header file. If
2763you don't need any C declarations, you may omit the @samp{%@{} and
f8e1c9e5 2764@samp{%@}} delimiters that bracket this section.
bfa74976 2765
9c437126 2766The @var{Prologue} section is terminated by the first occurrence
287c78f6
PE
2767of @samp{%@}} that is outside a comment, a string literal, or a
2768character constant.
2769
c732d2c6
AD
2770You may have more than one @var{Prologue} section, intermixed with the
2771@var{Bison declarations}. This allows you to have C and Bison
2772declarations that refer to each other. For example, the @code{%union}
2773declaration may use types defined in a header file, and you may wish to
2774prototype functions that take arguments of type @code{YYSTYPE}. This
2775can be done with two @var{Prologue} blocks, one before and one after the
2776@code{%union} declaration.
2777
ea118b72 2778@example
c732d2c6 2779%@{
aef3da86 2780 #define _GNU_SOURCE
38a92d50
PE
2781 #include <stdio.h>
2782 #include "ptypes.h"
c732d2c6
AD
2783%@}
2784
2785%union @{
779e7ceb 2786 long int n;
c732d2c6
AD
2787 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
2788@}
2789
2790%@{
38a92d50
PE
2791 static void print_token_value (FILE *, int, YYSTYPE);
2792 #define YYPRINT(F, N, L) print_token_value (F, N, L)
c732d2c6
AD
2793%@}
2794
2795@dots{}
ea118b72 2796@end example
c732d2c6 2797
aef3da86
PE
2798When in doubt, it is usually safer to put prologue code before all
2799Bison declarations, rather than after. For example, any definitions
2800of feature test macros like @code{_GNU_SOURCE} or
2801@code{_POSIX_C_SOURCE} should appear before all Bison declarations, as
2802feature test macros can affect the behavior of Bison-generated
2803@code{#include} directives.
2804
2cbe6b7f
JD
2805@node Prologue Alternatives
2806@subsection Prologue Alternatives
2807@cindex Prologue Alternatives
2808
136a0f76 2809@findex %code
16dc6a9e
JD
2810@findex %code requires
2811@findex %code provides
2812@findex %code top
85894313 2813
2cbe6b7f 2814The functionality of @var{Prologue} sections can often be subtle and
9913d6e4
JD
2815inflexible. As an alternative, Bison provides a @code{%code}
2816directive with an explicit qualifier field, which identifies the
2817purpose of the code and thus the location(s) where Bison should
2818generate it. For C/C++, the qualifier can be omitted for the default
location, or it can be one of @code{requires}, @code{provides}, or
@code{top}. @xref{%code Summary}.
2821
2822Look again at the example of the previous section:
2823
ea118b72 2824@example
2cbe6b7f
JD
2825%@{
2826 #define _GNU_SOURCE
2827 #include <stdio.h>
2828 #include "ptypes.h"
2829%@}
2830
2831%union @{
2832 long int n;
2833 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
2834@}
2835
2836%@{
2837 static void print_token_value (FILE *, int, YYSTYPE);
2838 #define YYPRINT(F, N, L) print_token_value (F, N, L)
2839%@}
2840
2841@dots{}
ea118b72 2842@end example
2cbe6b7f
JD
2843
2844@noindent
9913d6e4
JD
2845Notice that there are two @var{Prologue} sections here, but there's a
2846subtle distinction between their functionality. For example, if you
2847decide to override Bison's default definition for @code{YYLTYPE}, in
2848which @var{Prologue} section should you write your new definition?
2849You should write it in the first since Bison will insert that code
2850into the parser implementation file @emph{before} the default
2851@code{YYLTYPE} definition. In which @var{Prologue} section should you
2852prototype an internal function, @code{trace_token}, that accepts
2853@code{YYLTYPE} and @code{yytokentype} as arguments? You should
2854prototype it in the second since Bison will insert that code
2cbe6b7f
JD
2855@emph{after} the @code{YYLTYPE} and @code{yytokentype} definitions.
2856
2857This distinction in functionality between the two @var{Prologue} sections is
2858established by the appearance of the @code{%union} between them.
a501eca9 2859This behavior raises a few questions.
2cbe6b7f
JD
2860First, why should the position of a @code{%union} affect definitions related to
2861@code{YYLTYPE} and @code{yytokentype}?
2862Second, what if there is no @code{%union}?
2863In that case, the second kind of @var{Prologue} section is not available.
2864This behavior is not intuitive.
2865
8e0a5e9e 2866To avoid this subtle @code{%union} dependency, rewrite the example using a
16dc6a9e 2867@code{%code top} and an unqualified @code{%code}.
2cbe6b7f
JD
2868Let's go ahead and add the new @code{YYLTYPE} definition and the
2869@code{trace_token} prototype at the same time:
2870
ea118b72 2871@example
16dc6a9e 2872%code top @{
2cbe6b7f
JD
2873 #define _GNU_SOURCE
2874 #include <stdio.h>
8e0a5e9e
JD
2875
2876 /* WARNING: The following code really belongs
16dc6a9e 2877 * in a `%code requires'; see below. */
8e0a5e9e 2878
2cbe6b7f
JD
2879 #include "ptypes.h"
2880 #define YYLTYPE YYLTYPE
2881 typedef struct YYLTYPE
2882 @{
2883 int first_line;
2884 int first_column;
2885 int last_line;
2886 int last_column;
2887 char *filename;
2888 @} YYLTYPE;
2889@}
2890
2891%union @{
2892 long int n;
2893 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
2894@}
2895
2896%code @{
2897 static void print_token_value (FILE *, int, YYSTYPE);
2898 #define YYPRINT(F, N, L) print_token_value (F, N, L)
2899 static void trace_token (enum yytokentype token, YYLTYPE loc);
2900@}
2901
2902@dots{}
ea118b72 2903@end example
2cbe6b7f
JD
2904
2905@noindent
16dc6a9e
JD
2906In this way, @code{%code top} and the unqualified @code{%code} achieve the same
2907functionality as the two kinds of @var{Prologue} sections, but it's always
8e0a5e9e 2908explicit which kind you intend.
2cbe6b7f
JD
2909Moreover, both kinds are always available even in the absence of @code{%union}.
2910
9913d6e4
JD
2911The @code{%code top} block above logically contains two parts. The
2912first two lines before the warning need to appear near the top of the
2913parser implementation file. The first line after the warning is
2914required by @code{YYSTYPE} and thus also needs to appear in the parser
2915implementation file. However, if you've instructed Bison to generate
2916a parser header file (@pxref{Decl Summary, ,%defines}), you probably
2917want that line to appear before the @code{YYSTYPE} definition in that
2918header file as well. The @code{YYLTYPE} definition should also appear
2919in the parser header file to override the default @code{YYLTYPE}
2920definition there.
2cbe6b7f 2921
16dc6a9e 2922In other words, in the @code{%code top} block above, all but the first two
8e0a5e9e
JD
2923lines are dependency code required by the @code{YYSTYPE} and @code{YYLTYPE}
2924definitions.
16dc6a9e 2925Thus, they belong in one or more @code{%code requires}:
9bc0dd67 2926
ea118b72 2927@example
98842516 2928@group
16dc6a9e 2929%code top @{
2cbe6b7f
JD
2930 #define _GNU_SOURCE
2931 #include <stdio.h>
2932@}
98842516 2933@end group
2cbe6b7f 2934
98842516 2935@group
16dc6a9e 2936%code requires @{
9bc0dd67
JD
2937 #include "ptypes.h"
2938@}
98842516
AD
2939@end group
2940@group
9bc0dd67
JD
2941%union @{
2942 long int n;
2943 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
2944@}
98842516 2945@end group
9bc0dd67 2946
98842516 2947@group
16dc6a9e 2948%code requires @{
2cbe6b7f
JD
2949 #define YYLTYPE YYLTYPE
2950 typedef struct YYLTYPE
2951 @{
2952 int first_line;
2953 int first_column;
2954 int last_line;
2955 int last_column;
2956 char *filename;
2957 @} YYLTYPE;
2958@}
98842516 2959@end group
2cbe6b7f 2960
98842516 2961@group
136a0f76 2962%code @{
2cbe6b7f
JD
2963 static void print_token_value (FILE *, int, YYSTYPE);
2964 #define YYPRINT(F, N, L) print_token_value (F, N, L)
2965 static void trace_token (enum yytokentype token, YYLTYPE loc);
2966@}
98842516 2967@end group
2cbe6b7f
JD
2968
2969@dots{}
ea118b72 2970@end example
2cbe6b7f
JD
2971
2972@noindent
9913d6e4
JD
2973Now Bison will insert @code{#include "ptypes.h"} and the new
2974@code{YYLTYPE} definition before the Bison-generated @code{YYSTYPE}
2975and @code{YYLTYPE} definitions in both the parser implementation file
2976and the parser header file. (By the same reasoning, @code{%code
2977requires} would also be the appropriate place to write your own
2978definition for @code{YYSTYPE}.)
2979
2980When you are writing dependency code for @code{YYSTYPE} and
2981@code{YYLTYPE}, you should prefer @code{%code requires} over
2982@code{%code top} regardless of whether you instruct Bison to generate
2983a parser header file. When you are writing code that you need Bison
2984to insert only into the parser implementation file and that has no
2985special need to appear at the top of that file, you should prefer the
2986unqualified @code{%code} over @code{%code top}. These practices will
2987make the purpose of each block of your code explicit to Bison and to
2988other developers reading your grammar file. Following these
2989practices, we expect the unqualified @code{%code} and @code{%code
2990requires} to be the most important of the four @var{Prologue}
16dc6a9e 2991alternatives.
a501eca9 2992
9913d6e4
JD
2993At some point while developing your parser, you might decide to
2994provide @code{trace_token} to modules that are external to your
2995parser. Thus, you might wish for Bison to insert the prototype into
2996both the parser header file and the parser implementation file. Since
2997this function is not a dependency required by @code{YYSTYPE} or
8e0a5e9e 2998@code{YYLTYPE}, it doesn't make sense to move its prototype to a
9913d6e4
JD
2999@code{%code requires}. More importantly, since it depends upon
3000@code{YYLTYPE} and @code{yytokentype}, @code{%code requires} is not
3001sufficient. Instead, move its prototype from the unqualified
3002@code{%code} to a @code{%code provides}:
2cbe6b7f 3003
ea118b72 3004@example
98842516 3005@group
16dc6a9e 3006%code top @{
2cbe6b7f 3007 #define _GNU_SOURCE
136a0f76 3008 #include <stdio.h>
2cbe6b7f 3009@}
98842516 3010@end group
136a0f76 3011
98842516 3012@group
16dc6a9e 3013%code requires @{
2cbe6b7f
JD
3014 #include "ptypes.h"
3015@}
98842516
AD
3016@end group
3017@group
2cbe6b7f
JD
3018%union @{
3019 long int n;
3020 tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
3021@}
98842516 3022@end group
2cbe6b7f 3023
98842516 3024@group
16dc6a9e 3025%code requires @{
2cbe6b7f
JD
3026 #define YYLTYPE YYLTYPE
3027 typedef struct YYLTYPE
3028 @{
3029 int first_line;
3030 int first_column;
3031 int last_line;
3032 int last_column;
3033 char *filename;
3034 @} YYLTYPE;
3035@}
98842516 3036@end group
2cbe6b7f 3037
98842516 3038@group
16dc6a9e 3039%code provides @{
2cbe6b7f
JD
3040 void trace_token (enum yytokentype token, YYLTYPE loc);
3041@}
98842516 3042@end group
2cbe6b7f 3043
98842516 3044@group
2cbe6b7f 3045%code @{
9bc0dd67
JD
3046 static void print_token_value (FILE *, int, YYSTYPE);
3047 #define YYPRINT(F, N, L) print_token_value (F, N, L)
34f98f46 3048@}
98842516 3049@end group
9bc0dd67
JD
3050
3051@dots{}
ea118b72 3052@end example
9bc0dd67 3053
2cbe6b7f 3054@noindent
9913d6e4
JD
3055Bison will insert the @code{trace_token} prototype into both the
3056parser header file and the parser implementation file after the
3057definitions for @code{yytokentype}, @code{YYLTYPE}, and
3058@code{YYSTYPE}.
3059
3060The above examples are careful to write directives in an order that
3061reflects the layout of the generated parser implementation and header
3062files: @code{%code top}, @code{%code requires}, @code{%code provides},
3063and then @code{%code}. While your grammar files may generally be
3064easier to read if you also follow this order, Bison does not require
3065it. Instead, Bison lets you choose an organization that makes sense
3066to you.
2cbe6b7f 3067
a501eca9 3068You may declare any of these directives multiple times in the grammar file.
2cbe6b7f
JD
3069In that case, Bison concatenates the contained code in declaration order.
3070This is the only way in which the position of one of these directives within
3071the grammar file affects its functionality.
3072
3073The result of the previous two properties is greater flexibility in how you may
3074organize your grammar file.
3075For example, you may organize semantic-type-related directives by semantic
3076type:
3077
ea118b72 3078@example
98842516 3079@group
16dc6a9e 3080%code requires @{ #include "type1.h" @}
2cbe6b7f
JD
3081%union @{ type1 field1; @}
3082%destructor @{ type1_free ($$); @} <field1>
68fff38a 3083%printer @{ type1_print (yyoutput, $$); @} <field1>
98842516 3084@end group
2cbe6b7f 3085
98842516 3086@group
16dc6a9e 3087%code requires @{ #include "type2.h" @}
2cbe6b7f
JD
3088%union @{ type2 field2; @}
3089%destructor @{ type2_free ($$); @} <field2>
68fff38a 3090%printer @{ type2_print (yyoutput, $$); @} <field2>
98842516 3091@end group
ea118b72 3092@end example
2cbe6b7f
JD
3093
3094@noindent
3095You could even place each of the above directive groups in the rules section of
3096the grammar file next to the set of rules that uses the associated semantic
3097type.
61fee93e
JD
3098(In the rules section, you must terminate each of those directives with a
3099semicolon.)
2cbe6b7f
JD
3100And you don't have to worry that some directive (like a @code{%union}) in the
3101definitions section is going to adversely affect their functionality in some
3102counter-intuitive manner just because it comes first.
3103Such an organization is not possible using @var{Prologue} sections.
3104
a501eca9 3105This section has been concerned with explaining the advantages of the four
8e0a5e9e 3106@var{Prologue} alternatives over the original Yacc @var{Prologue}.
a501eca9
JD
3107However, in most cases when using these directives, you shouldn't need to
3108think about all the low-level ordering issues discussed here.
3109Instead, you should simply use these directives to label each block of your
3110code according to its purpose and let Bison handle the ordering.
3111@code{%code} is the most generic label.
16dc6a9e
JD
3112Move code to @code{%code requires}, @code{%code provides}, or @code{%code top}
3113as needed.
a501eca9 3114
342b8b6e 3115@node Bison Declarations
bfa74976
RS
3116@subsection The Bison Declarations Section
3117@cindex Bison declarations (introduction)
3118@cindex declarations, Bison (introduction)
3119
3120The @var{Bison declarations} section contains declarations that define
3121terminal and nonterminal symbols, specify precedence, and so on.
3122In some simple grammars you may not need any declarations.
3123@xref{Declarations, ,Bison Declarations}.
3124
342b8b6e 3125@node Grammar Rules
bfa74976
RS
3126@subsection The Grammar Rules Section
3127@cindex grammar rules section
3128@cindex rules section for grammar
3129
3130The @dfn{grammar rules} section contains one or more Bison grammar
3131rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.
3132
3133There must always be at least one grammar rule, and the first
3134@samp{%%} (which precedes the grammar rules) may never be omitted even
3135if it is the first thing in the file.
3136
@node Epilogue
@subsection The epilogue
@cindex additional C code section
@cindex epilogue
@cindex C code, section for additional

The @var{Epilogue} is copied verbatim to the end of the parser
implementation file, just as the @var{Prologue} is copied to the
beginning. This is the most convenient place to put anything that you
want to have in the parser implementation file but which need not come
before the definition of @code{yyparse}. For example, the definitions
of @code{yylex} and @code{yyerror} often go here. Because C requires
functions to be declared before being used, you often need to declare
functions like @code{yylex} and @code{yyerror} in the Prologue, even
if you define them in the Epilogue. @xref{Interface, ,Parser
C-Language Interface}.

If the last section is empty, you may omit the @samp{%%} that separates it
from the grammar rules.

The Bison parser itself contains many macros and identifiers whose names
start with @samp{yy} or @samp{YY}, so it is a good idea to avoid using
any such names (except those documented in this manual) in the epilogue
of the grammar file.

@node Symbols
@section Symbols, Terminal and Nonterminal
@cindex nonterminal symbol
@cindex terminal symbol
@cindex token type
@cindex symbol

@dfn{Symbols} in Bison grammars represent the grammatical classifications
of the language.

A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
class of syntactically equivalent tokens. You use the symbol in grammar
rules to mean that a token in that class is allowed. The symbol is
represented in the Bison parser by a numeric code, and the @code{yylex}
function returns a token type code to indicate what kind of token has
been read. You don't need to know what the code value is; you can use
the symbol to stand for it.

A @dfn{nonterminal symbol} stands for a class of syntactically
equivalent groupings. The symbol name is used in writing grammar rules.
By convention, it should be all lower case.

Symbol names can contain letters, underscores, periods, and non-initial
digits and dashes. Dashes in symbol names are a GNU extension, incompatible
with POSIX Yacc. Periods and dashes make symbol names less convenient to
use with named references, which require brackets around such names
(@pxref{Named References}). Terminal symbols that contain periods or dashes
make little sense: since they are not valid symbols (in most programming
languages) they are not exported as token names.

There are three ways of writing terminal symbols in the grammar:

@itemize @bullet
@item
A @dfn{named token type} is written with an identifier, like an
identifier in C@. By convention, it should be all upper case. Each
such name must be defined with a Bison declaration such as
@code{%token}. @xref{Token Decl, ,Token Type Names}.

@item
@cindex character token
@cindex literal token
@cindex single-character literal
A @dfn{character token type} (or @dfn{literal character token}) is
written in the grammar using the same syntax used in C for character
constants; for example, @code{'+'} is a character token type. A
character token type doesn't need to be declared unless you need to
specify its semantic value data type (@pxref{Value Type, ,Data Types of
Semantic Values}), associativity, or precedence (@pxref{Precedence,
,Operator Precedence}).

By convention, a character token type is used only to represent a
token that consists of that particular character. Thus, the token
type @code{'+'} is used to represent the character @samp{+} as a
token. Nothing enforces this convention, but if you depart from it,
your program will confuse other readers.

All the usual escape sequences used in character literals in C can be
used in Bison as well, but you must not use the null character as a
character literal because its numeric code, zero, signifies
end-of-input (@pxref{Calling Convention, ,Calling Convention
for @code{yylex}}). Also, unlike standard C, trigraphs have no
special meaning in Bison character literals, nor is backslash-newline
allowed.

@item
@cindex string token
@cindex literal string token
@cindex multicharacter literal
A @dfn{literal string token} is written like a C string constant; for
example, @code{"<="} is a literal string token. A literal string token
doesn't need to be declared unless you need to specify its semantic
value data type (@pxref{Value Type}), associativity, or precedence
(@pxref{Precedence}).

You can associate the literal string token with a symbolic name as an
alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
Declarations}). If you don't do that, the lexical analyzer has to
retrieve the token number for the literal string token from the
@code{yytname} table (@pxref{Calling Convention}).

@strong{Warning}: literal string tokens do not work in Yacc.

By convention, a literal string token is used only to represent a token
that consists of that particular string. Thus, you should use the token
type @code{"<="} to represent the string @samp{<=} as a token. Bison
does not enforce this convention, but if you depart from it, people who
read your program will be confused.

All the escape sequences used in string literals in C can be used in
Bison as well, except that you must not use a null character within a
string literal. Also, unlike Standard C, trigraphs have no special
meaning in Bison string literals, nor is backslash-newline allowed. A
literal string token must contain two or more characters; for a token
containing just one character, use a character token (see above).
@end itemize

How you choose to write a terminal symbol has no effect on its
grammatical meaning. That depends only on where it appears in rules and
on when the lexical analyzer function @code{yylex} returns that symbol.

The value returned by @code{yylex} is always one of the terminal
symbols, except that a zero or negative value signifies end-of-input.
Whichever way you write the token type in the grammar rules, you write
it the same way in the definition of @code{yylex}. The numeric code
for a character token type is simply the positive numeric code of the
character, so @code{yylex} can use the identical value to generate the
requisite code, though you may need to convert it to @code{unsigned
char} to avoid sign-extension on hosts where @code{char} is signed.
Each named token type becomes a C macro in the parser implementation
file, so @code{yylex} can use the name to stand for the code. (This
is why periods don't make sense in terminal symbols.) @xref{Calling
Convention, ,Calling Convention for @code{yylex}}.

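
As a rough illustration of the sign-extension point, the following C
sketch shows a helper that a hand-written @code{yylex} might use to turn
an input character into the code of the matching character token. The
name @code{char_token_code} is an assumption made for this example and
is not part of Bison.

@verbatim
```c
/* Hypothetical helper, not part of Bison: return the token code for
   the character token that corresponds to the input character C.
   Converting through unsigned char keeps the result positive even on
   hosts where plain char is signed.  */
int
char_token_code (char c)
{
  return (unsigned char) c;
}
```
@end verbatim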
If @code{yylex} is defined in a separate file, you need to arrange for the
token-type macro definitions to be available there. Use the @samp{-d}
option when you run Bison, so that it will write these macro definitions
into a separate header file @file{@var{name}.tab.h} which you can include
in the other source files that need it. @xref{Invocation, ,Invoking Bison}.

If you want to write a grammar that is portable to any Standard C
host, you must use only nonnull character tokens taken from the basic
execution character set of Standard C@. This set consists of the ten
digits, the 52 lower- and upper-case English letters, and the
characters in the following C-language string:

@example
"\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_@{|@}~"
@end example
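
The portability rule above can be checked mechanically. The following C
sketch is only an illustration: the function name
@code{is_portable_char_token} and the two table names are hypothetical
and do not come from Bison. The letters and digits are spelled out
rather than tested with @code{isalnum}, whose result depends on the
locale.

@verbatim
```c
#include <limits.h>
#include <string.h>

/* The non-alphanumeric members of the basic execution character set,
   copied from the string shown above.  */
static const char portable_other[] =
  "\a\b\t\n\v\f\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* The ten digits and the 52 English letters.  */
static const char portable_alnum[] =
  "0123456789"
  "abcdefghijklmnopqrstuvwxyz"
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Hypothetical helper: return nonzero if C may be used as a character
   token in a grammar meant to be portable to any Standard C host.  */
int
is_portable_char_token (int c)
{
  /* The null character is excluded explicitly, because strchr would
     otherwise match the terminating null of the tables.  */
  if (c <= 0 || c > UCHAR_MAX)
    return 0;
  return (strchr (portable_alnum, c) != NULL
          || strchr (portable_other, c) != NULL);
}
```
@end verbatim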

The @code{yylex} function and Bison must use a consistent character set
and encoding for character tokens. For example, if you run Bison in an
ASCII environment, but then compile and run the resulting
program in an environment that uses an incompatible character set like
EBCDIC, the resulting program may not work because the tables
generated by Bison will assume ASCII numeric values for
character tokens. It is standard practice for software distributions to
contain C source files that were generated by Bison in an
ASCII environment, so installers on platforms that are
incompatible with ASCII must rebuild those files before
compiling them.

The symbol @code{error} is a terminal symbol reserved for error recovery
(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
In particular, @code{yylex} should never return this value. The default
value of the error token is 256, unless you explicitly assigned 256 to
one of your tokens with a @code{%token} declaration.

@node Rules
@section Syntax of Grammar Rules
@cindex rule syntax
@cindex grammar rule syntax
@cindex syntax of grammar rules

A Bison grammar rule has the following general form:

@example
@group
@var{result}: @var{components}@dots{};
@end group
@end example

@noindent
where @var{result} is the nonterminal symbol that this rule describes,
and @var{components} are various terminal and nonterminal symbols that
are put together by this rule (@pxref{Symbols}).

For example,

@example
@group
exp: exp '+' exp;
@end group
@end example

@noindent
says that two groupings of type @code{exp}, with a @samp{+} token in between,
can be combined into a larger grouping of type @code{exp}.

White space in rules is significant only to separate symbols. You can add
extra white space as you wish.

Scattered among the components can be @var{actions} that determine
the semantics of the rule. An action looks like this:

@example
@{@var{C statements}@}
@end example

@noindent
@cindex braced code
This is an example of @dfn{braced code}, that is, C code surrounded by
braces, much like a compound statement in C@. Braced code can contain
any sequence of C tokens, so long as its braces are balanced. Bison
does not check the braced code for correctness directly; it merely
copies the code to the parser implementation file, where the C
compiler can check it.

Within braced code, the balanced-brace count is not affected by braces
within comments, string literals, or character constants, but it is
affected by the C digraphs @samp{<%} and @samp{%>} that represent
braces. At the top level braced code must be terminated by @samp{@}}
and not by a digraph. Bison does not look for trigraphs, so if braced
code uses trigraphs you should ensure that they do not affect the
nesting of braces or the boundaries of comments, string literals, or
character constants.

Usually there is only one action and it follows the components.
@xref{Actions}.

@findex |
Multiple rules for the same @var{result} can be written separately or can
be joined with the vertical-bar character @samp{|} as follows:

@example
@group
@var{result}:
  @var{rule1-components}@dots{}
| @var{rule2-components}@dots{}
@dots{}
;
@end group
@end example

@noindent
They are still considered distinct rules even when joined in this way.

If @var{components} in a rule is empty, it means that @var{result} can
match the empty string. For example, here is how to define a
comma-separated sequence of zero or more @code{exp} groupings:

@example
@group
expseq:
  /* empty */
| expseq1
;
@end group

@group
expseq1:
  exp
| expseq1 ',' exp
;
@end group
@end example

@noindent
It is customary to write a comment @samp{/* empty */} in each rule
with no components.

@node Recursion
@section Recursive Rules
@cindex recursive rule

A rule is called @dfn{recursive} when its @var{result} nonterminal
appears also on its right hand side. Nearly all Bison grammars need to
use recursion, because that is the only way to define a sequence of any
number of a particular thing. Consider this recursive definition of a
comma-separated sequence of one or more expressions:

@example
@group
expseq1:
  exp
| expseq1 ',' exp
;
@end group
@end example

@cindex left recursion
@cindex right recursion
@noindent
Since the recursive use of @code{expseq1} is the leftmost symbol in the
right hand side, we call this @dfn{left recursion}. By contrast, here
the same construct is defined using @dfn{right recursion}:

@example
@group
expseq1:
  exp
| exp ',' expseq1
;
@end group
@end example

@noindent
Any kind of sequence can be defined using either left recursion or right
recursion, but you should always use left recursion, because it can
parse a sequence of any number of elements with bounded stack space.
Right recursion uses up space on the Bison stack in proportion to the
number of elements in the sequence, because all the elements must be
shifted onto the stack before the rule can be applied even once.
@xref{Algorithm, ,The Bison Parser Algorithm}, for further explanation
of this.

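
The difference in stack use can be illustrated with a small simulation.
The following C sketch is not Bison's parser: it merely counts stack
entries for the two @code{expseq1} definitions above, under the
simplifying assumption that every @code{exp} occupies a single entry by
the time it is counted. The function names are invented for this
example.

@verbatim
```c
/* Peak number of stack entries while parsing N elements (N >= 1) with
   the left-recursive rules:  expseq1: exp | expseq1 ',' exp ;  */
int
peak_depth_left (int n)
{
  int depth = 0, peak = 0;
  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        depth++;                /* shift ','  */
      depth++;                  /* shift the element (one exp entry)  */
      if (depth > peak)
        peak = depth;
      depth = 1;                /* reduce at once to a single expseq1  */
    }
  return peak;
}

/* Peak number of stack entries while parsing N elements (N >= 1) with
   the right-recursive rules:  expseq1: exp | exp ',' expseq1 ;  */
int
peak_depth_right (int n)
{
  int depth = 0, peak = 0;
  for (int i = 0; i < n; i++)
    {
      if (i > 0)
        depth++;                /* shift ','  */
      depth++;                  /* shift the element; no reduction yet  */
      if (depth > peak)
        peak = depth;
    }
  /* Only after the last element can reductions start: the final exp
     becomes an expseq1 in place, then each further reduction replaces
     three entries (exp ',' expseq1) with one.  */
  while (depth > 1)
    depth -= 2;
  return peak;
}
```
@end verbatim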
@cindex mutual recursion
@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
rule does not appear directly on its right hand side, but does appear
in rules for other nonterminals which do appear on its right hand
side.

For example:

@example
@group
expr:
  primary
| primary '+' primary
;
@end group

@group
primary:
  constant
| '(' expr ')'
;
@end group
@end example

@noindent
defines two mutually-recursive nonterminals, since each refers to the
other.

@node Semantics
@section Defining Language Semantics
@cindex defining language semantics
@cindex language semantics, defining

The grammar rules for a language determine only the syntax. The semantics
are determined by the semantic values associated with various tokens and
groupings, and by the actions taken when various groupings are recognized.

For example, the calculator calculates properly because the value
associated with each expression is the proper number; it adds properly
because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
the numbers associated with @var{x} and @var{y}.

@menu
* Value Type::        Specifying one data type for all semantic values.
* Multiple Types::    Specifying several alternative data types.
* Actions::           An action is the semantic definition of a grammar rule.
* Action Types::      Specifying data types for actions to operate on.
* Mid-Rule Actions::  Most actions go at the end of a rule.
                      This says when, why and how to use the exceptional
                      action in the middle of a rule.
@end menu

@node Value Type
@subsection Data Types of Semantic Values
@cindex semantic value type
@cindex value type, semantic
@cindex data types of semantic values
@cindex default data type

In a simple program it may be sufficient to use the same data type for
the semantic values of all language constructs. This was true in the
RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish
Notation Calculator}).

Bison normally uses the type @code{int} for semantic values if your
program uses the same data type for all language constructs. To
specify some other type, define @code{YYSTYPE} as a macro, like this:

@example
#define YYSTYPE double
@end example

@noindent
@code{YYSTYPE}'s replacement list should be a type name
that does not contain parentheses or square brackets.
This macro definition must go in the prologue of the grammar file
(@pxref{Grammar Outline, ,Outline of a Bison Grammar}).

@node Multiple Types
@subsection More Than One Value Type

In most programs, you will need different data types for different kinds
of tokens and groupings. For example, a numeric constant may need type
@code{int} or @code{long int}, while a string constant needs type
@code{char *}, and an identifier might need a pointer to an entry in the
symbol table.

To use more than one data type for semantic values in one parser, Bison
requires you to do two things:

@itemize @bullet
@item
Specify the entire collection of possible data types, either by using the
@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of
Value Types}), or by using a @code{typedef} or a @code{#define} to
define @code{YYSTYPE} to be a union type whose member names are
the type tags.

@item
Choose one of those types for each symbol (terminal or nonterminal) for
which semantic values are used. This is done for tokens with the
@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names})
and for groupings with the @code{%type} Bison declaration (@pxref{Type
Decl, ,Nonterminal Symbols}).
@end itemize

@node Actions
@subsection Actions
@cindex action
@vindex $$
@vindex $@var{n}
@vindex $@var{name}
@vindex $[@var{name}]

An action accompanies a syntactic rule and contains C code to be executed
each time an instance of that rule is recognized. The task of most actions
is to compute a semantic value for the grouping built by the rule from the
semantic values associated with tokens or smaller groupings.

An action consists of braced code containing C statements, and can be
placed at any position in the rule;
it is executed at that position. Most rules have just one action at the
end of the rule, following all the components. Actions in the middle of
a rule are tricky and used only for special purposes (@pxref{Mid-Rule
Actions, ,Actions in Mid-Rule}).

The C code in an action can refer to the semantic values of the
components matched by the rule with the construct @code{$@var{n}},
which stands for the value of the @var{n}th component. The semantic
value for the grouping being constructed is @code{$$}. In addition,
the semantic values of symbols can be accessed with the named
references construct @code{$@var{name}} or @code{$[@var{name}]}.
Bison translates both of these constructs into expressions of the
appropriate type when it copies the actions into the parser
implementation file. @code{$$} (or @code{$@var{name}}, when it stands
for the current grouping) is translated to a modifiable lvalue, so it
can be assigned to.

Here is a typical example:

@example
@group
exp:
@dots{}
| exp '+' exp  @{ $$ = $1 + $3; @}
@end group
@end example

Or, in terms of named references:

@example
@group
exp[result]:
@dots{}
| exp[left] '+' exp[right]  @{ $result = $left + $right; @}
@end group
@end example

@noindent
This rule constructs an @code{exp} from two smaller @code{exp} groupings
connected by a plus-sign token. In the action, @code{$1} and @code{$3}
(@code{$left} and @code{$right})
refer to the semantic values of the two component @code{exp} groupings,
which are the first and third symbols on the right hand side of the rule.
The sum is stored into @code{$$} (@code{$result}) so that it becomes the
semantic value of
the addition-expression just recognized by the rule. If there were a
useful semantic value associated with the @samp{+} token, it could be
referred to as @code{$2}.

@xref{Named References}, for more information about using the named
references construct.

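
To make the roles of @code{$1}, @code{$3} and @code{$$} concrete, here
is a simplified C sketch of what reducing @samp{exp '+' exp} amounts to
on a stack of semantic values. It assumes a single @code{double} value
type and uses invented names (@code{value_stack}, @code{push_value},
@code{reduce_add}, @code{top_value}); the code that Bison actually
generates is organized differently.

@verbatim
```c
#define STACK_MAX 16

/* A toy semantic value stack.  Overflow and underflow checks are
   omitted for brevity.  */
static double value_stack[STACK_MAX];
static int value_top = 0;       /* number of values currently stacked  */

void
push_value (double v)
{
  value_stack[value_top++] = v;
}

double
top_value (void)
{
  return value_stack[value_top - 1];
}

/* Mimic the action of  exp: exp '+' exp  @ { $$ = $1 + $3; }
   The three topmost entries are the values of the rule's components.  */
void
reduce_add (void)
{
  double *rhs = &value_stack[value_top - 3];
  double result = rhs[0] + rhs[2];      /* $$ = $1 + $3; rhs[1] is $2  */
  value_top -= 3;                       /* pop the three components  */
  push_value (result);                  /* push the value of the new exp  */
}
```
@end verbatim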
Note that the vertical-bar character @samp{|} is really a rule
separator, and actions are attached to a single rule. This differs
from tools like Flex, for which @samp{|} stands for either
``or'', or ``the same action as that of the next rule''. In the
following example, the action is triggered only when @samp{b} is found:

@example
@group
a-or-b: 'a'|'b'   @{ a_or_b_found = 1; @};
@end group
@end example

@cindex default action
If you don't specify an action for a rule, Bison supplies a default:
@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule
becomes the value of the whole rule. Of course, the default action is
valid only if the two data types match. There is no meaningful default
action for an empty rule; every empty rule must have an explicit action
unless the rule's value does not matter.

@code{$@var{n}} with @var{n} zero or negative is allowed for reference
to tokens and groupings on the stack @emph{before} those that match the
current rule. This is a very risky practice, and to use it reliably
you must be certain of the context in which the rule is applied. Here
is a case in which you can use this reliably:

@example
@group
foo:
  expr bar '+' expr  @{ @dots{} @}
| expr bar '-' expr  @{ @dots{} @}
;
@end group

@group
bar:
  /* empty */    @{ previous_expr = $0; @}
;
@end group
@end example

As long as @code{bar} is used only in the fashion shown here, @code{$0}
always refers to the @code{expr} which precedes @code{bar} in the
definition of @code{foo}.

@vindex yylval
It is also possible to access the semantic value of the lookahead token, if
any, from a semantic action.
This semantic value is stored in @code{yylval}.
@xref{Action Features, ,Special Features for Use in Actions}.

@node Action Types
@subsection Data Types of Values in Actions
@cindex action data types
@cindex data types in actions

If you have chosen a single data type for semantic values, the @code{$$}
and @code{$@var{n}} constructs always have that data type.

If you have used @code{%union} to specify a variety of data types, then you
must declare a choice among these types for each terminal or nonterminal
symbol that can have a semantic value. Then each time you use @code{$$} or
@code{$@var{n}}, its data type is determined by which symbol it refers to
in the rule. In this example,

@example
@group
exp:
  @dots{}
| exp '+' exp    @{ $$ = $1 + $3; @}
@end group
@end example

@noindent
@code{$1} and @code{$3} refer to instances of @code{exp}, so they both
have the data type declared for the nonterminal symbol @code{exp}. If
@code{$2} were used, it would have the data type declared for the
terminal symbol @code{'+'}, whatever that might be.

Alternatively, you can specify the data type when you refer to the value,
by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
reference. For example, if you have defined types as shown here:

@example
@group
%union @{
  int itype;
  double dtype;
@}
@end group
@end example

@noindent
then you can write @code{$<itype>1} to refer to the first subunit of the
rule as an integer, or @code{$<dtype>1} to refer to it as a double.

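
Conceptually, a type tag selects a member of the union that holds the
semantic value. The following C sketch illustrates this with a
hypothetical @code{union semantic_value} that mirrors the @code{%union}
above; the accessor names are invented for this example, and Bison's
generated code differs in detail. Note that selecting a member performs
no conversion: the tag must name the member that was actually stored.

@verbatim
```c
/* Hypothetical stand-in for the union that the %union declaration
   above would produce.  */
union semantic_value
{
  int itype;
  double dtype;
};

/* Roughly what $<itype>1 selects: the itype member of the value slot.  */
int
value_as_itype (const union semantic_value *slot)
{
  return slot->itype;
}

/* Roughly what $<dtype>1 selects: the dtype member of the value slot.  */
double
value_as_dtype (const union semantic_value *slot)
{
  return slot->dtype;
}
```
@end verbatim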
@node Mid-Rule Actions
@subsection Actions in Mid-Rule
@cindex actions in mid-rule
@cindex mid-rule actions

Occasionally it is useful to put an action in the middle of a rule.
These actions are written just like usual end-of-rule actions, but they
are executed before the parser even recognizes the following components.

A mid-rule action may refer to the components preceding it using
@code{$@var{n}}, but it may not refer to subsequent components because
it is run before they are parsed.

The mid-rule action itself counts as one of the components of the rule.
This makes a difference when there is another action later in the same rule
(and usually there is another at the end): you have to count the actions
along with the symbols when working out which number @var{n} to use in
@code{$@var{n}}.

The mid-rule action can also have a semantic value. The action can set
its value with an assignment to @code{$$}, and actions later in the rule
can refer to the value using @code{$@var{n}}. Since there is no symbol
to name the action, there is no way to declare a data type for the value
in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to
specify a data type each time you refer to this value.

There is no way to set the value of the entire rule with a mid-rule
action, because assignments to @code{$$} do not have that effect. The
only way to set the value for the entire rule is with an ordinary action
at the end of the rule.

Here is an example from a hypothetical compiler, handling a @code{let}
statement that looks like @samp{let (@var{variable}) @var{statement}} and
serves to create a variable named @var{variable} temporarily for the
duration of @var{statement}. To parse this construct, we must put
@var{variable} into the symbol table while @var{statement} is parsed, then
remove it afterward. Here is how it is done:

@example
@group
stmt:
  LET '(' var ')'
    @{ $<context>$ = push_context (); declare_variable ($3); @}
  stmt
    @{ $$ = $6; pop_context ($<context>5); @}
@end group
@end example

@noindent
As soon as @samp{let (@var{variable})} has been recognized, the first
action is run. It saves a copy of the current semantic context (the
list of accessible variables) as its semantic value, using alternative
@code{context} in the data-type union. Then it calls
@code{declare_variable} to add the new variable to that list. Once the
first action is finished, the embedded statement @code{stmt} can be
parsed. Note that the mid-rule action is component number 5, so the
@samp{stmt} is component number 6.

After the embedded statement is parsed, its semantic value becomes the
value of the entire @code{let}-statement. Then the semantic value from the
earlier action is used to restore the prior list of variables. This
removes the temporary @code{let}-variable from the list so that it won't
appear to exist while the rest of the program is parsed.

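
The example leaves @code{push_context}, @code{declare_variable} and
@code{pop_context} to the hypothetical compiler. One possible shape for
them is sketched below in C, under the assumption that a context is
simply the number of currently visible variables. The storage scheme,
the fixed table size and the extra helper @code{is_declared} are all
assumptions made for this illustration.

@verbatim
```c
#include <string.h>

#define MAX_VARS 32

/* The list of accessible variables, innermost declaration last.  */
static const char *visible_vars[MAX_VARS];
static int visible_count = 0;

/* Save the current context.  In this sketch the context is just the
   number of variables visible so far.  */
int
push_context (void)
{
  return visible_count;
}

/* Add NAME to the list of accessible variables (silently ignored if
   the table is full).  */
void
declare_variable (const char *name)
{
  if (visible_count < MAX_VARS)
    visible_vars[visible_count++] = name;
}

/* Restore a context saved by push_context, dropping every variable
   declared since then.  */
void
pop_context (int saved)
{
  visible_count = saved;
}

/* Return nonzero if NAME is currently accessible.  */
int
is_declared (const char *name)
{
  for (int i = visible_count - 1; i >= 0; i--)
    if (strcmp (visible_vars[i], name) == 0)
      return 1;
  return 0;
}
```
@end verbatim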
@findex %destructor
@cindex discarded symbols, mid-rule actions
@cindex error recovery, mid-rule actions
In the above example, if the parser initiates error recovery (@pxref{Error
Recovery}) while parsing the tokens in the embedded statement @code{stmt},
it might discard the previous semantic context @code{$<context>5} without
restoring it.
Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing
Discarded Symbols}).
However, Bison currently provides no means to declare a destructor specific to
a particular mid-rule action's semantic value.

One solution is to bury the mid-rule action inside a nonterminal symbol and to
declare a destructor for that symbol:

@example
@group
%type <context> let
%destructor @{ pop_context ($$); @} let

%%

stmt:
  let stmt
    @{
      $$ = $2;
      pop_context ($1);
    @};

let:
  LET '(' var ')'
    @{
      $$ = push_context ();
      declare_variable ($3);
    @};

@end group
@end example

@noindent
Note that the action is now at the end of its rule.
Any mid-rule action can be converted to an end-of-rule action in this way, and
this is what Bison actually does to implement mid-rule actions.

Taking action before a rule is completely recognized often leads to
conflicts since the parser must commit to a parse in order to execute the
action. For example, the following two rules, without mid-rule actions,
can coexist in a working parser because the parser can shift the open-brace
token and look at what follows before deciding whether there is a
declaration or not:

@example
@group
compound:
  '@{' declarations statements '@}'
| '@{' statements '@}'
;
@end group
@end example

@noindent
But when we add a mid-rule action as follows, the rules become nonfunctional:

@example
@group
compound:
  @{ prepare_for_local_variables (); @}
     '@{' declarations statements '@}'
@end group
@group
| '@{' statements '@}'
;
@end group
@end example

@noindent
Now the parser is forced to decide whether to run the mid-rule action
when it has read no farther than the open-brace. In other words, it
must commit to using one rule or the other, without sufficient
information to do it correctly. (The open-brace token is what is called
the @dfn{lookahead} token at this time, since the parser is still
deciding what to do about it. @xref{Lookahead, ,Lookahead Tokens}.)

You might think that you could correct the problem by putting identical
actions into the two rules, like this:

@example
@group
compound:
  @{ prepare_for_local_variables (); @}
    '@{' declarations statements '@}'
| @{ prepare_for_local_variables (); @}
    '@{' statements '@}'
;
@end group
@end example

@noindent
But this does not help, because Bison does not realize that the two actions
are identical. (Bison never tries to understand the C code in an action.)

If the grammar is such that a declaration can be distinguished from a
statement by the first token (which is true in C), then one solution which
does work is to put the action after the open-brace, like this:

@example
@group
compound:
  '@{' @{ prepare_for_local_variables (); @}
    declarations statements '@}'
| '@{' statements '@}'
;
@end group
@end example

@noindent
Now the first token of the following declaration or statement,
which would in any case tell Bison which rule to use, can still do so.

Another solution is to bury the action inside a nonterminal symbol which
serves as a subroutine:

@example
@group
subroutine:
  /* empty */  @{ prepare_for_local_variables (); @}
;
@end group

@group
compound:
  subroutine '@{' declarations statements '@}'
| subroutine '@{' statements '@}'
;
@end group
@end example

@noindent
Now Bison can execute the action in the rule for @code{subroutine} without
deciding which rule for @code{compound} it will eventually use.

7404cdf3 3932@node Tracking Locations
847bf1f5
AD
3933@section Tracking Locations
3934@cindex location
95923bd6
AD
3935@cindex textual location
3936@cindex location, textual
847bf1f5
AD
3937
3938Though grammar rules and semantic actions are enough to write a fully
72d2299c 3939functional parser, it can be useful to process some additional information,
3e259915
MA
3940especially symbol locations.
3941
704a47c4
AD
3942The way locations are handled is defined by providing a data type, and
3943actions to take when rules are matched.
847bf1f5
AD
3944
3945@menu
3946* Location Type:: Specifying a data type for locations.
3947* Actions and Locations:: Using locations in actions.
3948* Location Default Action:: Defining a general way to compute locations.
3949@end menu
3950
342b8b6e 3951@node Location Type
847bf1f5
AD
3952@subsection Data Type of Locations
3953@cindex data type of locations
3954@cindex default location type
3955
3956Defining a data type for locations is much simpler than for semantic values,
3957since all tokens and groupings always use the same type.
3958
50cce58e
PE
3959You can specify the type of locations by defining a macro called
3960@code{YYLTYPE}, just as you can specify the semantic value type by
ddc8ede1 3961defining a @code{YYSTYPE} macro (@pxref{Value Type}).
847bf1f5
AD
3962When @code{YYLTYPE} is not defined, Bison uses a default structure type with
3963four members:
3964
3965@example
6273355b 3966typedef struct YYLTYPE
847bf1f5
AD
3967@{
3968 int first_line;
3969 int first_column;
3970 int last_line;
3971 int last_column;
6273355b 3972@} YYLTYPE;
847bf1f5
AD
3973@end example
3974
8fbbeba2
AD
3975When @code{YYLTYPE} is not defined, at the beginning of the parsing, Bison
3976initializes all these fields to 1 for @code{yylloc}. To initialize
3977@code{yylloc} with a custom location type (or to choose a different
3978initialization), use the @code{%initial-action} directive. @xref{Initial
3979Action Decl, , Performing Actions before Parsing}.
cd48d21d 3980
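As a rough sketch of a custom location type, the prologue below defines a
structure that adds a file name to the four default members and selects it
through the @code{YYLTYPE} macro.  The names @code{my_location} and
@code{file_name} are hypothetical, chosen only for illustration:

@example
@group
%@{
  typedef struct my_location
  @{
    int first_line;
    int first_column;
    int last_line;
    int last_column;
    char const *file_name;   /* Hypothetical extra member.  */
  @} my_location;
# define YYLTYPE my_location
%@}
@end group
@end example

@noindent
Keeping the four default member names means that the default location
action still applies, but that action copies only those four members, so
the extra member has to be set by other means (@pxref{Location Default
Action, ,Default Action for Locations}).
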
342b8b6e 3981@node Actions and Locations
847bf1f5
AD
3982@subsection Actions and Locations
3983@cindex location actions
3984@cindex actions, location
3985@vindex @@$
3986@vindex @@@var{n}
1f68dca5
AR
3987@vindex @@@var{name}
3988@vindex @@[@var{name}]
847bf1f5
AD
3989
3990Actions are not only useful for defining language semantics, but also for
3991describing the behavior of the output parser with locations.
3992
3993The most obvious way of building locations of syntactic groupings is very
72d2299c 3994similar to the way semantic values are computed. In a given rule, several
847bf1f5
AD
3995constructs can be used to access the locations of the elements being matched.
3996The location of the @var{n}th component of the right hand side is
3997@code{@@@var{n}}, while the location of the left hand side grouping is
3998@code{@@$}.
3999
1f68dca5
AR
4000In addition, the named references construct @code{@@@var{name}} and
4001@code{@@[@var{name}]} may also be used to address the symbol locations.
ce24f7f5
JD
4002@xref{Named References}, for more information about using the named
4003references construct.
1f68dca5 4004
3e259915 4005Here is a basic example using the default data type for locations:
847bf1f5
AD
4006
4007@example
4008@group
de6be119
AD
4009exp:
4010 @dots{}
4011| exp '/' exp
4012 @{
4013 @@$.first_column = @@1.first_column;
4014 @@$.first_line = @@1.first_line;
4015 @@$.last_column = @@3.last_column;
4016 @@$.last_line = @@3.last_line;
4017 if ($3)
4018 $$ = $1 / $3;
4019 else
4020 @{
4021 $$ = 1;
4022 fprintf (stderr,
4023 "Division by zero, l%d,c%d-l%d,c%d",
4024 @@3.first_line, @@3.first_column,
4025 @@3.last_line, @@3.last_column);
4026 @}
4027 @}
847bf1f5
AD
4028@end group
4029@end example
4030
3e259915 4031As for semantic values, there is a default action for locations that is
72d2299c 4032run each time a rule is matched. It sets the beginning of @code{@@$} to the
3e259915 4033beginning of the first symbol, and the end of @code{@@$} to the end of the
79282c6c 4034last symbol.
3e259915 4035
72d2299c 4036With this default action, the location tracking can be fully automatic. The
3e259915
MA
4037example above can simply be rewritten this way:
4038
4039@example
4040@group
de6be119
AD
4041exp:
4042 @dots{}
4043| exp '/' exp
4044 @{
4045 if ($3)
4046 $$ = $1 / $3;
4047 else
4048 @{
4049 $$ = 1;
4050 fprintf (stderr,
4051 "Division by zero, l%d,c%d-l%d,c%d",
4052 @@3.first_line, @@3.first_column,
4053 @@3.last_line, @@3.last_column);
4054 @}
4055 @}
3e259915
MA
4056@end group
4057@end example
847bf1f5 4058
32c29292 4059@vindex yylloc
742e4900 4060It is also possible to access the location of the lookahead token, if any,
32c29292
JD
4061from a semantic action.
4062This location is stored in @code{yylloc}.
4063@xref{Action Features, ,Special Features for Use in Actions}.
4064
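For instance, the following sketch reports where the lookahead token
begins once a statement has been reduced.  It assumes the default location
type and a hypothetical @code{stmt} rule; the test against @code{YYEMPTY}
covers the case where no lookahead token has been read yet:

@example
@group
stmt:
  exp ';'
    @{
      /* yylloc is meaningful only if a lookahead was read.  */
      if (yychar != YYEMPTY)
        fprintf (stderr, "next token starts at l%d,c%d\n",
                 yylloc.first_line, yylloc.first_column);
    @}
;
@end group
@end example
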
342b8b6e 4065@node Location Default Action
847bf1f5
AD
4066@subsection Default Action for Locations
4067@vindex YYLLOC_DEFAULT
35430378 4068@cindex GLR parsers and @code{YYLLOC_DEFAULT}
847bf1f5 4069
72d2299c 4070Actually, actions are not the best place to compute locations. Since
704a47c4
AD
4071locations are much more general than semantic values, there is room in
4072the output parser to redefine the default action to take for each
72d2299c 4073rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
96b93a3d
PE
4074matched, before the associated action is run. It is also invoked
4075while processing a syntax error, to compute the error's location.
35430378 4076Before reporting an unresolvable syntactic ambiguity, a GLR
8710fc41
JD
4077parser invokes @code{YYLLOC_DEFAULT} recursively to compute the location
4078of that ambiguity.
847bf1f5 4079
3e259915 4080Most of the time, this macro is general enough to remove the need for
79282c6c 4081location-specific code in semantic actions.
847bf1f5 4082
72d2299c 4083The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
96b93a3d 4084the location of the grouping (the result of the computation). When a
766de5eb 4085rule is matched, the second parameter identifies locations of
96b93a3d 4086all right hand side elements of the rule being matched, and the third
8710fc41 4087parameter is the size of the rule's right hand side.
35430378 4088When a GLR parser reports an ambiguity, which of multiple candidate
8710fc41
JD
4089right hand sides it passes to @code{YYLLOC_DEFAULT} is undefined.
4090When processing a syntax error, the second parameter identifies locations
4091of the symbols that were discarded during error processing, and the third
96b93a3d 4092parameter is the number of discarded symbols.
847bf1f5 4093
766de5eb 4094By default, @code{YYLLOC_DEFAULT} is defined this way:
847bf1f5 4095
ea118b72 4096@example
847bf1f5 4097@group
ea118b72
AD
4098# define YYLLOC_DEFAULT(Cur, Rhs, N) \
4099do \
4100 if (N) \
4101 @{ \
4102 (Cur).first_line = YYRHSLOC(Rhs, 1).first_line; \
4103 (Cur).first_column = YYRHSLOC(Rhs, 1).first_column; \
4104 (Cur).last_line = YYRHSLOC(Rhs, N).last_line; \
4105 (Cur).last_column = YYRHSLOC(Rhs, N).last_column; \
4106 @} \
4107 else \
4108 @{ \
4109 (Cur).first_line = (Cur).last_line = \
4110 YYRHSLOC(Rhs, 0).last_line; \
4111 (Cur).first_column = (Cur).last_column = \
4112 YYRHSLOC(Rhs, 0).last_column; \
4113 @} \
4114while (0)
847bf1f5 4115@end group
ea118b72 4116@end example
676385e2 4117
2c0f9706 4118@noindent
766de5eb
PE
4119where @code{YYRHSLOC (rhs, k)} is the location of the @var{k}th symbol
4120in @var{rhs} when @var{k} is positive, and the location of the symbol
f28ac696 4121just before the reduction when @var{k} and @var{n} are both zero.
676385e2 4122
3e259915 4123When defining @code{YYLLOC_DEFAULT}, you should consider that:
847bf1f5 4124
3e259915 4125@itemize @bullet
79282c6c 4126@item
72d2299c 4127All arguments are free of side-effects. However, only the first one (the
3e259915 4128result) should be modified by @code{YYLLOC_DEFAULT}.
847bf1f5 4129
3e259915 4130@item
766de5eb
PE
4131For consistency with semantic actions, valid indexes within the
4132right hand side range from 1 to @var{n}. When @var{n} is zero, only 0 is a
4133valid index, and it refers to the symbol just before the reduction.
4134During error processing @var{n} is always positive.
0ae99356
PE
4135
4136@item
4137Your macro should parenthesize its arguments, if need be, since the
4138actual arguments may not be surrounded by parentheses. Also, your
4139macro should expand to something that can be used as a single
4140statement when it is followed by a semicolon.
3e259915 4141@end itemize
847bf1f5 4142
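As an illustration of these points, here is a minimal sketch of a
redefinition for a user-defined location type that has the four default
members plus one extra member.  The member name @code{file_name} is an
assumption made for this example and is not part of the default type:

@example
@group
# define YYLLOC_DEFAULT(Cur, Rhs, N)                         \
do                                                           \
  if (N)                                                     \
    @{                                                        \
      (Cur).first_line   = YYRHSLOC(Rhs, 1).first_line;      \
      (Cur).first_column = YYRHSLOC(Rhs, 1).first_column;    \
      (Cur).last_line    = YYRHSLOC(Rhs, N).last_line;       \
      (Cur).last_column  = YYRHSLOC(Rhs, N).last_column;     \
      (Cur).file_name    = YYRHSLOC(Rhs, 1).file_name;       \
    @}                                                        \
  else                                                       \
    @{                                                        \
      (Cur).first_line   = (Cur).last_line   =               \
        YYRHSLOC(Rhs, 0).last_line;                          \
      (Cur).first_column = (Cur).last_column =               \
        YYRHSLOC(Rhs, 0).last_column;                        \
      (Cur).file_name    = YYRHSLOC(Rhs, 0).file_name;       \
    @}                                                        \
while (0)
@end group
@end example

@noindent
Only @code{Cur} is modified, every use of an argument is parenthesized,
index 0 is used only when @code{N} is zero, and the @code{do @dots{} while
(0)} wrapper lets the expansion act as a single statement once a semicolon
follows it.
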
908c8647 4143@node Named References
ce24f7f5 4144@section Named References
908c8647
JD
4145@cindex named references
4146
7d31f092
JD
4147As described in the preceding sections, the traditional way to refer to any
4148semantic value or location is a @dfn{positional reference}, which takes the
4149form @code{$@var{n}}, @code{$$}, @code{@@@var{n}}, and @code{@@$}. However,
4150such a reference is not very descriptive. Moreover, if you later decide to
4151insert or remove symbols in the right-hand side of a grammar rule, the need
4152to renumber such references can be tedious and error-prone.
4153
4154To avoid these issues, you can also refer to a semantic value or location
4155using a @dfn{named reference}. First of all, original symbol names may be
4156used as named references. For example:
908c8647
JD
4157
4158@example
4159@group
4160invocation: op '(' args ')'
4161 @{ $invocation = new_invocation ($op, $args, @@invocation); @}
4162@end group
4163@end example
4164
4165@noindent
7d31f092 4166Positional and named references can be mixed arbitrarily. For example:
908c8647
JD
4167
4168@example
4169@group
4170invocation: op '(' args ')'
4171 @{ $$ = new_invocation ($op, $args, @@$); @}
4172@end group
4173@end example
4174
4175@noindent
4176However, sometimes regular symbol names are not sufficient due to
4177ambiguities:
4178
4179@example
4180@group
4181exp: exp '/' exp
4182 @{ $exp = $exp / $exp; @} // $exp is ambiguous.
4183
4184exp: exp '/' exp
4185 @{ $$ = $1 / $exp; @} // One usage is ambiguous.
4186
4187exp: exp '/' exp
4188 @{ $$ = $1 / $3; @} // No error.
4189@end group
4190@end example
4191
4192@noindent
4193When ambiguity occurs, explicitly declared names may be used for values and
4194locations. Explicit names are declared as a bracketed name after a symbol
4195appearance in rule definitions. For example:
4196@example
4197@group
4198exp[result]: exp[left] '/' exp[right]
4199 @{ $result = $left / $right; @}
4200@end group
4201@end example
4202
4203@noindent
ce24f7f5
JD
4204In order to access a semantic value generated by a mid-rule action, an
4205explicit name may also be declared by putting a bracketed name after the
4206closing brace of the mid-rule action code:
908c8647
JD
4207@example
4208@group
4209exp[res]: exp[x] '+' @{$left = $x;@}[left] exp[right]
4210 @{ $res = $left + $right; @}
4211@end group
4212@end example
4213
4214@noindent
4215
4216In references, in order to specify names containing dots and dashes, an explicit
4217bracketed syntax @code{$[name]} and @code{@@[name]} must be used:
4218@example
4219@group
14f4455e 4220if-stmt: "if" '(' expr ')' "then" then.stmt ';'
908c8647
JD
4221 @{ $[if-stmt] = new_if_stmt ($expr, $[then.stmt]); @}
4222@end group
4223@end example
4224
4225It often happens that named references are followed by a dot, dash or other
4226C punctuation marks and operators. By default, Bison will read
ce24f7f5
JD
4227@samp{$name.suffix} as a reference to symbol value @code{$name} followed by
4228@samp{.suffix}, i.e., an access to the @code{suffix} field of the semantic
4229value. In order to force Bison to recognize @samp{name.suffix} in its
4230entirety as the name of a semantic value, the bracketed syntax
4231@samp{$[name.suffix]} must be used.
4232
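The difference can be sketched with two hypothetical rules.  The first
assumes that the semantic value of @code{item} is a structure with a
member called @code{first}; the second assumes a nonterminal whose name
really is @code{item.first}:

@example
@group
head: item
  @{ $$ = $item.first; @}     /* Member `first' of the value of `item'.  */

list: item.first
  @{ $$ = $[item.first]; @}   /* Value of the symbol `item.first'.  */
@end group
@end example
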
4233The named references feature is experimental. More user feedback will help
4234to stabilize it.
908c8647 4235
342b8b6e 4236@node Declarations
bfa74976
RS
4237@section Bison Declarations
4238@cindex declarations, Bison
4239@cindex Bison declarations
4240
4241The @dfn{Bison declarations} section of a Bison grammar defines the symbols
4242used in formulating the grammar and the data types of semantic values.
4243@xref{Symbols}.
4244
4245All token type names (but not single-character literal tokens such as
4246@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
4247declared if you need to specify which data type to use for the semantic
4248value (@pxref{Multiple Types, ,More Than One Value Type}).
4249
9913d6e4
JD
4250The first rule in the grammar file also specifies the start symbol, by
4251default. If you want some other symbol to be the start symbol, you
4252must declare it explicitly (@pxref{Language and Grammar, ,Languages
4253and Context-Free Grammars}).
bfa74976
RS
4254
4255@menu
b50d2359 4256* Require Decl:: Requiring a Bison version.
bfa74976
RS
4257* Token Decl:: Declaring terminal symbols.
4258* Precedence Decl:: Declaring terminals with precedence and associativity.
4259* Union Decl:: Declaring the set of all semantic value types.
4260* Type Decl:: Declaring the choice of type for a nonterminal symbol.
18d192f0 4261* Initial Action Decl:: Code run before parsing starts.
72f889cc 4262* Destructor Decl:: Declaring how symbols are freed.
56d60c19 4263* Printer Decl:: Declaring how symbol values are displayed.
d6328241 4264* Expect Decl:: Suppressing warnings about parsing conflicts.
bfa74976
RS
4265* Start Decl:: Specifying the start symbol.
4266* Pure Decl:: Requesting a reentrant parser.
9987d1b3 4267* Push Decl:: Requesting a push parser.
bfa74976 4268* Decl Summary:: Table of all Bison declarations.
2f4518a1 4269* %define Summary:: Defining variables to adjust Bison's behavior.
8e6f2266 4270* %code Summary:: Inserting code into the parser source.
bfa74976
RS
4271@end menu
4272
b50d2359
AD
4273@node Require Decl
4274@subsection Require a Version of Bison
4275@cindex version requirement
4276@cindex requiring a version of Bison
4277@findex %require
4278
4279You may require a minimum version of Bison to process the grammar. If
9b8a5ce0
AD
4280the requirement is not met, @command{bison} exits with an error (exit
4281status 63).
b50d2359
AD
4282
4283@example
4284%require "@var{version}"
4285@end example
4286
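For instance, a grammar file that should be rejected by any Bison older
than version 2.5 could start with the following line (the version number
is only illustrative):

@example
%require "2.5"
@end example
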
342b8b6e 4287@node Token Decl
bfa74976
RS
4288@subsection Token Type Names
4289@cindex declaring token type names
4290@cindex token type names, declaring
931c7513 4291@cindex declaring literal string tokens
bfa74976
RS
4292@findex %token
4293
4294The basic way to declare a token type name (terminal symbol) is as follows:
4295
4296@example
4297%token @var{name}
4298@end example
4299
4300Bison will convert this into a @code{#define} directive in
4301the parser, so that the function @code{yylex} (if it is in this file)
4302can use the name @var{name} to stand for this token type's code.
4303
14ded682
AD
4304Alternatively, you can use @code{%left}, @code{%right}, or
4305@code{%nonassoc} instead of @code{%token}, if you wish to specify
4306associativity and precedence. @xref{Precedence Decl, ,Operator
4307Precedence}.
bfa74976
RS
4308
4309You can explicitly specify the numeric code for a token type by appending
b1cc23c4 4310a nonnegative decimal or hexadecimal integer value in the field immediately
1452af69 4311following the token name:
bfa74976
RS
4312
4313@example
4314%token NUM 300
1452af69 4315%token XNUM 0x12d // a GNU extension
bfa74976
RS
4316@end example
4317
4318@noindent
4319It is generally best, however, to let Bison choose the numeric codes for
4320all token types. Bison will automatically select codes that don't conflict
e966383b 4321with each other or with normal characters.
bfa74976
RS
4322
4323In the event that the stack type is a union, you must augment the
4324@code{%token} or other token declaration to include the data type
704a47c4
AD
4325alternative delimited by angle-brackets (@pxref{Multiple Types, ,More
4326Than One Value Type}).
bfa74976
RS
4327
4328For example:
4329
4330@example
4331@group
4332%union @{ /* define stack type */
4333 double val;
4334 symrec *tptr;
4335@}
4336%token <val> NUM /* define token NUM and its type */
4337@end group
4338@end example
4339
931c7513
RS
4340You can associate a literal string token with a token type name by
4341writing the literal string at the end of a @code{%token}
4342declaration which declares the name. For example:
4343
4344@example
4345%token arrow "=>"
4346@end example
4347
4348@noindent
4349For example, a grammar for the C language might specify these names with
4350equivalent literal string tokens:
4351
4352@example
4353%token <operator> OR "||"
4354%token <operator> LE 134 "<="
4355%left OR "<="
4356@end example
4357
4358@noindent
4359Once you equate the literal string and the token name, you can use them
4360interchangeably in further declarations or the grammar rules. The
4361@code{yylex} function can use the token name or the literal string to
4362obtain the token type code number (@pxref{Calling Convention}).
b1cc23c4
JD
4363Syntax error messages passed to @code{yyerror} from the parser will reference
4364the literal string instead of the token name.
4365
4366The token numbered as 0 corresponds to end of file; the following line
4367allows for nicer error messages referring to ``end of file'' instead
4368of ``$end'':
4369
4370@example
4371%token END 0 "end of file"
4372@end example
931c7513 4373
342b8b6e 4374@node Precedence Decl
bfa74976
RS
4375@subsection Operator Precedence
4376@cindex precedence declarations
4377@cindex declaring operator precedence
4378@cindex operator precedence, declaring
4379
4380Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
4381declare a token and specify its precedence and associativity, all at
4382once. These are called @dfn{precedence declarations}.
704a47c4
AD
4383@xref{Precedence, ,Operator Precedence}, for general information on
4384operator precedence.
bfa74976 4385
ab7f29f8 4386The syntax of a precedence declaration is nearly the same as that of
bfa74976
RS
4387@code{%token}: either
4388
4389@example
4390%left @var{symbols}@dots{}
4391@end example
4392
4393@noindent
4394or
4395
4396@example
4397%left <@var{type}> @var{symbols}@dots{}
4398@end example
4399
4400And indeed any of these declarations serves the purposes of @code{%token}.
4401But in addition, they specify the associativity and relative precedence for
4402all the @var{symbols}:
4403
4404@itemize @bullet
4405@item
4406The associativity of an operator @var{op} determines how repeated uses
4407of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
4408@var{z}} is parsed by grouping @var{x} with @var{y} first or by
4409grouping @var{y} with @var{z} first. @code{%left} specifies
4410left-associativity (grouping @var{x} with @var{y} first) and
4411@code{%right} specifies right-associativity (grouping @var{y} with
4412@var{z} first). @code{%nonassoc} specifies no associativity, which
4413means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
4414considered a syntax error.
4415
4416@item
4417The precedence of an operator determines how it nests with other operators.
4418All the tokens declared in a single precedence declaration have equal
4419precedence and nest together according to their associativity.
4420When two tokens declared in different precedence declarations associate,
4421the one declared later has the higher precedence and is grouped first.
4422@end itemize
4423
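As a short sketch of how these two properties combine, consider the
following declarations for a hypothetical calculator grammar:

@example
@group
%left '+' '-'
%left '*' '/'
%right '^'      /* exponentiation */
@end group
@end example

@noindent
Here @samp{1 - 2 - 3} groups as @samp{(1 - 2) - 3} because @code{'-'} is
left-associative, @samp{2 ^ 3 ^ 2} groups as @samp{2 ^ (3 ^ 2)} because
@code{'^'} is right-associative, and @samp{1 + 2 * 3} groups as
@samp{1 + (2 * 3)} because @code{'*'} is declared after @code{'+'} and so
has the higher precedence.
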
ab7f29f8
JD
4424For backward compatibility, there is a confusing difference between the
4425argument lists of @code{%token} and precedence declarations.
4426Only a @code{%token} can associate a literal string with a token type name.
4427A precedence declaration always interprets a literal string as a reference to a
4428separate token.
4429For example:
4430
4431@example
4432%left OR "<=" // Does not declare an alias.
4433%left OR 134 "<=" 135 // Declares 134 for OR and 135 for "<=".
4434@end example
4435
342b8b6e 4436@node Union Decl
bfa74976
RS
4437@subsection The Collection of Value Types
4438@cindex declaring value types
4439@cindex value types, declaring
4440@findex %union
4441
287c78f6
PE
4442The @code{%union} declaration specifies the entire collection of
4443possible data types for semantic values. The keyword @code{%union} is
4444followed by braced code containing the same thing that goes inside a
4445@code{union} in C@.
bfa74976
RS
4446
4447For example:
4448
4449@example
4450@group
4451%union @{
4452 double val;
4453 symrec *tptr;
4454@}
4455@end group
4456@end example
4457
4458@noindent
4459This says that the two alternative types are @code{double} and @code{symrec
4460*}. They are given names @code{val} and @code{tptr}; these names are used
4461in the @code{%token} and @code{%type} declarations to pick one of the types
4462for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
4463
35430378 4464As an extension to POSIX, a tag is allowed after the
6273355b
PE
4465@code{union}. For example:
4466
4467@example
4468@group
4469%union value @{
4470 double val;
4471 symrec *tptr;
4472@}
4473@end group
4474@end example
4475
d6ca7905 4476@noindent
6273355b
PE
4477specifies the union tag @code{value}, so the corresponding C type is
4478@code{union value}. If you do not specify a tag, it defaults to
4479@code{YYSTYPE}.
4480
35430378 4481As another extension to POSIX, you may specify multiple
d6ca7905
PE
4482@code{%union} declarations; their contents are concatenated. However,
4483only the first @code{%union} declaration can specify a tag.
4484
6273355b 4485Note that, unlike making a @code{union} declaration in C, you need not write
bfa74976
RS
4486a semicolon after the closing brace.
4487
ddc8ede1
PE
4488Instead of @code{%union}, you can define and use your own union type
4489@code{YYSTYPE} if your grammar contains at least one
4490@samp{<@var{type}>} tag. For example, you can put the following into
4491a header file @file{parser.h}:
4492
4493@example
4494@group
4495union YYSTYPE @{
4496 double val;
4497 symrec *tptr;
4498@};
4499typedef union YYSTYPE YYSTYPE;
4500@end group
4501@end example
4502
4503@noindent
4504and then your grammar can use the following
4505instead of @code{%union}:
4506
4507@example
4508@group
4509%@{
4510#include "parser.h"
4511%@}
4512%type <val> expr
4513%token <tptr> ID
4514@end group
4515@end example
4516
342b8b6e 4517@node Type Decl
bfa74976
RS
4518@subsection Nonterminal Symbols
4519@cindex declaring value types, nonterminals
4520@cindex value types, nonterminals, declaring
4521@findex %type
4522
4523@noindent
4524When you use @code{%union} to specify multiple value types, you must
4525declare the value type of each nonterminal symbol for which values are
4526used. This is done with a @code{%type} declaration, like this:
4527
4528@example
4529%type <@var{type}> @var{nonterminal}@dots{}
4530@end example
4531
4532@noindent
704a47c4
AD
4533Here @var{nonterminal} is the name of a nonterminal symbol, and
4534@var{type} is the name given in the @code{%union} to the alternative
4535that you want (@pxref{Union Decl, ,The Collection of Value Types}). You
4536can give any number of nonterminal symbols in the same @code{%type}
4537declaration, if they have the same value type. Use spaces to separate
4538the symbol names.
bfa74976 4539
931c7513
RS
4540You can also declare the value type of a terminal symbol. To do this,
4541use the same @code{<@var{type}>} construction in a declaration for the
4542terminal symbol. All kinds of token declarations allow
4543@code{<@var{type}>}.
4544
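For example, the following sketch gives value types to two nonterminals
and two tokens.  The symbol names @code{exp}, @code{term}, @code{NUM} and
@code{VAR} are placeholders, and @code{symrec} is assumed to be a type
declared elsewhere in the grammar file:

@example
@group
%union @{
  double val;
  symrec *tptr;
@}
%type  <val>  exp term   /* Two nonterminals sharing one type.  */
%token <val>  NUM        /* A terminal declared with its type.  */
%token <tptr> VAR
@end group
@end example
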
18d192f0
AD
4545@node Initial Action Decl
4546@subsection Performing Actions before Parsing
4547@findex %initial-action
4548
4549Sometimes your parser needs to perform some initializations before
4550parsing. The @code{%initial-action} directive allows for such arbitrary
4551code.
4552
4553@deffn {Directive} %initial-action @{ @var{code} @}
4554@findex %initial-action
287c78f6 4555Declare that the braced @var{code} must be invoked before parsing each time
cd735a8c
AD
4556@code{yyparse} is called. The @var{code} may use @code{$$} (or
4557@code{$<@var{tag}>$}) and @code{@@$} --- initial value and location of the
4558lookahead --- and the @code{%parse-param}.
18d192f0
AD
4559@end deffn
4560
451364ed
AD
4561For instance, if your locations use a file name, you may use
4562
4563@example
48b16bbc 4564%parse-param @{ char const *file_name @};
451364ed
AD
4565%initial-action
4566@{
4626a15d 4567 @@$.initialize (file_name);
451364ed
AD
4568@};
4569@end example
4570
18d192f0 4571
72f889cc
AD
4572@node Destructor Decl
4573@subsection Freeing Discarded Symbols
4574@cindex freeing discarded symbols
4575@findex %destructor
12e35840 4576@findex <*>
3ebecc24 4577@findex <>
a85284cf
AD
4578During error recovery (@pxref{Error Recovery}), symbols already pushed
4579on the stack and tokens coming from the rest of the file are discarded
4580until the parser falls on its feet. If the parser runs out of memory,
9d9b8b70 4581or if it returns via @code{YYABORT} or @code{YYACCEPT}, all the
a85284cf
AD
4582symbols on the stack must be discarded. Even if the parser succeeds, it
4583must discard the start symbol.
258b75ca
PE
4584
4585When discarded symbols convey heap-based information, this memory is
4586lost. While this behavior can be tolerable for batch parsers, such as
4b367315
AD
4587in traditional compilers, it is unacceptable for programs like shells or
4588protocol implementations that may parse and execute indefinitely.
258b75ca 4589
a85284cf
AD
4590The @code{%destructor} directive defines code that is called when a
4591symbol is automatically discarded.
72f889cc
AD
4592
4593@deffn {Directive} %destructor @{ @var{code} @} @var{symbols}
4594@findex %destructor
287c78f6 4595Invoke the braced @var{code} whenever the parser discards one of the
4982f078
AD
4596@var{symbols}. Within @var{code}, @code{$$} (or @code{$<@var{tag}>$})
4597designates the semantic value associated with the discarded symbol, and
4598@code{@@$} designates its location. The additional parser parameters are
4599also available (@pxref{Parser Function, , The Parser Function
4600@code{yyparse}}).
ec5479ce 4601
b2a0b7ca
JD
4602When a symbol is listed among @var{symbols}, its @code{%destructor} is called a
4603per-symbol @code{%destructor}.
4604You may also define a per-type @code{%destructor} by listing a semantic type
12e35840 4605tag among @var{symbols}.
b2a0b7ca 4606In that case, the parser will invoke this @var{code} whenever it discards any
12e35840 4607grammar symbol that has that semantic type tag unless that symbol has its own
b2a0b7ca
JD
4608per-symbol @code{%destructor}.
4609
12e35840 4610Finally, you can define two different kinds of default @code{%destructor}s.
85894313
JD
4611(These default forms are experimental.
4612More user feedback will help to determine whether they should become permanent
4613features.)
3ebecc24 4614You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of
12e35840
JD
4615exactly one @code{%destructor} declaration in your grammar file.
4616The parser will invoke the @var{code} associated with one of these whenever it
4617discards any user-defined grammar symbol that has no per-symbol and no per-type
4618@code{%destructor}.
4619The parser uses the @var{code} for @code{<*>} in the case of such a grammar
4620symbol for which you have formally declared a semantic type tag (@code{%type}
4621counts as such a declaration, but @code{$<tag>$} does not).
3ebecc24 4622The parser uses the @var{code} for @code{<>} in the case of such a grammar
12e35840 4623symbol that has no declared semantic type tag.
72f889cc
AD
4624@end deffn
4625
b2a0b7ca 4626@noindent
12e35840 4627For example:
72f889cc 4628
ea118b72 4629@example
ec5479ce
JD
4630%union @{ char *string; @}
4631%token <string> STRING1
4632%token <string> STRING2
4633%type <string> string1
4634%type <string> string2
b2a0b7ca
JD
4635%union @{ char character; @}
4636%token <character> CHR
4637%type <character> chr
12e35840
JD
4638%token TAGLESS
4639
b2a0b7ca 4640%destructor @{ @} <character>
12e35840
JD
4641%destructor @{ free ($$); @} <*>
4642%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
3ebecc24 4643%destructor @{ printf ("Discarding tagless symbol.\n"); @} <>
ea118b72 4644@end example
72f889cc
AD
4645
4646@noindent
b2a0b7ca
JD
4647guarantees that, when the parser discards any user-defined symbol that has a
4648semantic type tag other than @code{<character>}, it passes its semantic value
12e35840 4649to @code{free} by default.
ec5479ce
JD
4650However, when the parser discards a @code{STRING1} or a @code{string1}, it also
4651prints its line number to @code{stdout}.
4652It performs only the second @code{%destructor} in this case, so it invokes
4653@code{free} only once.
12e35840
JD
4654Finally, the parser merely prints a message whenever it discards any symbol,
4655such as @code{TAGLESS}, that has no semantic type tag.
4656
4657A Bison-generated parser invokes the default @code{%destructor}s only for
4658user-defined as opposed to Bison-defined symbols.
4659For example, the parser will not invoke either kind of default
4660@code{%destructor} for the special Bison-defined symbols @code{$accept},
4661@code{$undefined}, or @code{$end} (@pxref{Table of Symbols, ,Bison Symbols}),
4662none of which you can reference in your grammar.
4663It also will not invoke either for the @code{error} token (@pxref{Table of
4664Symbols, ,error}), which is always defined by Bison regardless of whether you
4665reference it in your grammar.
4666However, it may invoke one of them for the end token (token 0) if you
4667redefine it from @code{$end} to, for example, @code{END}:
3508ce36 4668
ea118b72 4669@example
3508ce36 4670%token END 0
ea118b72 4671@end example
3508ce36 4672
12e35840
JD
4673@cindex actions in mid-rule
4674@cindex mid-rule actions
4675Finally, Bison will never invoke a @code{%destructor} for an unreferenced
4676mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}).
ce24f7f5
JD
4677That is, Bison does not consider a mid-rule to have a semantic value if you
4678do not reference @code{$$} in the mid-rule's action or @code{$@var{n}}
4679(where @var{n} is the right-hand side symbol position of the mid-rule) in
4680any later action in that rule. However, if you do reference either, the
4681Bison-generated parser will invoke the @code{<>} @code{%destructor} whenever
4682it discards the mid-rule symbol.
12e35840 4683
3508ce36
JD
4684@ignore
4685@noindent
4686In the future, it may be possible to redefine the @code{error} token as a
4687nonterminal that captures the discarded symbols.
4688In that case, the parser will invoke the default destructor for it as well.
4689@end ignore
4690
e757bb10
AD
4691@sp 1
4692
4693@cindex discarded symbols
4694@dfn{Discarded symbols} are the following:
4695
4696@itemize
4697@item
4698stacked symbols popped during the first phase of error recovery,
4699@item
4700incoming terminals during the second phase of error recovery,
4701@item
742e4900 4702the current lookahead and the entire stack (except the current
9d9b8b70 4703right-hand side symbols) when the parser returns immediately, and
258b75ca
PE
4704@item
4705the start symbol, when the parser succeeds.
e757bb10
AD
4706@end itemize
4707
9d9b8b70
PE
4708The parser can @dfn{return immediately} because of an explicit call to
4709@code{YYABORT} or @code{YYACCEPT}, or failed error recovery, or memory
4710exhaustion.
4711
29553547 4712Right-hand side symbols of a rule that explicitly triggers a syntax
9d9b8b70
PE
4713error via @code{YYERROR} are not discarded automatically. As a rule
4714of thumb, destructors are invoked only when user actions cannot manage
a85284cf 4715the memory.
e757bb10 4716
56d60c19
AD
4717@node Printer Decl
4718@subsection Printing Semantic Values
4719@cindex printing semantic values
4720@findex %printer
4721@findex <*>
4722@findex <>
4723When run-time traces are enabled (@pxref{Tracing, ,Tracing Your Parser}),
4724the parser reports its actions, such as reductions. When a symbol involved
4725in an action is reported, only its kind is displayed, as the parser cannot
4726know how semantic values should be formatted.
4727
4728The @code{%printer} directive defines code that is called when a symbol is
4729reported. Its syntax is the same as @code{%destructor} (@pxref{Destructor
4730Decl, , Freeing Discarded Symbols}).
4731
4732@deffn {Directive} %printer @{ @var{code} @} @var{symbols}
4733@findex %printer
4734@vindex yyoutput
4735@c This is the same text as for %destructor.
Invoke the braced @var{code} whenever the parser displays one of the
@var{symbols}.  Within @var{code}, @code{yyoutput} denotes the output stream
(a @code{FILE*} in C, and a @code{std::ostream&} in C++), @code{$$} (or
@code{$<@var{tag}>$}) designates the semantic value associated with the
symbol, and @code{@@$} its location.  The additional parser parameters are
also available (@pxref{Parser Function, , The Parser Function
@code{yyparse}}).

The @var{symbols} are defined as for @code{%destructor} (@pxref{Destructor
Decl, , Freeing Discarded Symbols}): they can be per-type (e.g.,
@samp{<ival>}), per-symbol (e.g., @samp{exp}, @samp{NUM}, @samp{"float"}),
typed per-default (i.e., @samp{<*>}), or untyped per-default (i.e.,
@samp{<>}).
4749@end deffn
4750
4751@noindent
4752For example:
4753
@example
%union @{ char *string; @}
%token <string> STRING1
%token <string> STRING2
%type <string> string1
%type <string> string2
%union @{ char character; @}
%token <character> CHR
%type <character> chr
%token TAGLESS

%printer @{ fprintf (yyoutput, "'%c'", $$); @} <character>
%printer @{ fprintf (yyoutput, "&%p", $$); @} <*>
%printer @{ fprintf (yyoutput, "\"%s\"", $$); @} STRING1 string1
%printer @{ fprintf (yyoutput, "<>"); @} <>
@end example
4770
4771@noindent
guarantees that, when the parser prints any symbol that has a semantic type
tag other than @code{<character>}, it displays the address of the semantic
value by default.  However, when the parser displays a @code{STRING1} or a
@code{string1}, it formats it as a string in double quotes.  It performs
only the per-symbol @code{%printer} in this case, so it prints only once.
Finally, the parser prints @samp{<>} for any symbol, such as @code{TAGLESS},
that has no semantic type tag.
4779
4780
342b8b6e 4781@node Expect Decl
bfa74976
RS
4782@subsection Suppressing Conflict Warnings
4783@cindex suppressing conflict warnings
4784@cindex preventing warnings about conflicts
4785@cindex warnings, preventing
4786@cindex conflicts, suppressing warnings of
4787@findex %expect
d6328241 4788@findex %expect-rr
bfa74976
RS
4789
4790Bison normally warns if there are any conflicts in the grammar
7da99ede
AD
4791(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars
4792have harmless shift/reduce conflicts which are resolved in a predictable
4793way and would be difficult to eliminate. It is desirable to suppress
4794the warning about these conflicts unless the number of conflicts
4795changes. You can do this with the @code{%expect} declaration.
bfa74976
RS
4796
4797The declaration looks like this:
4798
@example
%expect @var{n}
@end example
4802
035aa4a0
PE
4803Here @var{n} is a decimal integer. The declaration says there should
4804be @var{n} shift/reduce conflicts and no reduce/reduce conflicts.
4805Bison reports an error if the number of shift/reduce conflicts differs
4806from @var{n}, or if there are any reduce/reduce conflicts.
bfa74976 4807
34a6c2d1 4808For deterministic parsers, reduce/reduce conflicts are more
035aa4a0 4809serious, and should be eliminated entirely. Bison will always report
35430378 4810reduce/reduce conflicts for these parsers. With GLR
035aa4a0 4811parsers, however, both kinds of conflicts are routine; otherwise,
35430378 4812there would be no need to use GLR parsing. Therefore, it is
035aa4a0 4813also possible to specify an expected number of reduce/reduce conflicts
35430378 4814in GLR parsers, using the declaration:
d6328241
PH
4815
@example
%expect-rr @var{n}
@end example
4819
bfa74976
RS
4820In general, using @code{%expect} involves these steps:
4821
4822@itemize @bullet
4823@item
4824Compile your grammar without @code{%expect}. Use the @samp{-v} option
4825to get a verbose list of where the conflicts occur. Bison will also
4826print the number of conflicts.
4827
4828@item
4829Check each of the conflicts to make sure that Bison's default
4830resolution is what you really want. If not, rewrite the grammar and
4831go back to the beginning.
4832
4833@item
4834Add an @code{%expect} declaration, copying the number @var{n} from the
35430378 4835number which Bison printed. With GLR parsers, add an
035aa4a0 4836@code{%expect-rr} declaration as well.
bfa74976
RS
4837@end itemize
4838
cf22447c
JD
4839Now Bison will report an error if you introduce an unexpected conflict,
4840but will keep silent otherwise.
bfa74976 4841
342b8b6e 4842@node Start Decl
bfa74976
RS
4843@subsection The Start-Symbol
4844@cindex declaring the start symbol
4845@cindex start symbol, declaring
4846@cindex default start symbol
4847@findex %start
4848
Bison assumes by default that the start symbol for the grammar is the first
nonterminal specified in the grammar specification section.  The programmer
may override this default with the @code{%start} declaration as follows:
4852
@example
%start @var{symbol}
@end example
4856
342b8b6e 4857@node Pure Decl
bfa74976
RS
4858@subsection A Pure (Reentrant) Parser
4859@cindex reentrant parser
4860@cindex pure parser
d9df47b6 4861@findex %define api.pure
bfa74976
RS
4862
4863A @dfn{reentrant} program is one which does not alter in the course of
4864execution; in other words, it consists entirely of @dfn{pure} (read-only)
4865code. Reentrancy is important whenever asynchronous execution is possible;
9d9b8b70
PE
4866for example, a nonreentrant program may not be safe to call from a signal
4867handler. In systems with multiple threads of control, a nonreentrant
bfa74976
RS
4868program must be called only within interlocks.
4869
70811b85 4870Normally, Bison generates a parser which is not reentrant. This is
c827f760
PE
4871suitable for most uses, and it permits compatibility with Yacc. (The
4872standard Yacc interfaces are inherently nonreentrant, because they use
70811b85
RS
4873statically allocated variables for communication with @code{yylex},
4874including @code{yylval} and @code{yylloc}.)
bfa74976 4875
70811b85 4876Alternatively, you can generate a pure, reentrant parser. The Bison
d9df47b6 4877declaration @code{%define api.pure} says that you want the parser to be
70811b85 4878reentrant. It looks like this:
bfa74976
RS
4879
@example
%define api.pure
@end example
4883
70811b85
RS
The result is that the communication variables @code{yylval} and
@code{yylloc} become local variables in @code{yyparse}, and a different
calling convention is used for the lexical analyzer function
@code{yylex}.  @xref{Pure Calling, ,Calling Conventions for Pure
Parsers}, for the details of this.  The variable @code{yynerrs}
becomes local in @code{yyparse} in pull mode, but it becomes a member
of @code{yypstate} in push mode (@pxref{Error Reporting, ,The Error
Reporting Function @code{yyerror}}).  The convention for calling
@code{yyparse} itself is unchanged.
4893
4894Whether the parser is pure has nothing to do with the grammar rules.
4895You can generate either a pure parser or a nonreentrant parser from any
4896valid grammar.
bfa74976 4897
9987d1b3
JD
4898@node Push Decl
4899@subsection A Push Parser
4900@cindex push parser
812775a0 4902@findex %define api.push-pull
9987d1b3 4903
59da312b
JD
4904(The current push parsing interface is experimental and may evolve.
4905More user feedback will help to stabilize it.)
4906
f4101aa6
AD
4907A pull parser is called once and it takes control until all its input
4908is completely parsed. A push parser, on the other hand, is called
9987d1b3
JD
4909each time a new token is made available.
4910
f4101aa6 4911A push parser is typically useful when the parser is part of a
9987d1b3 4912main event loop in the client's application. This is typically
f4101aa6
AD
4913a requirement of a GUI, when the main event loop needs to be triggered
4914within a certain time period.
9987d1b3 4915
d782395d
JD
4916Normally, Bison generates a pull parser.
4917The following Bison declaration says that you want the parser to be a push
2f4518a1 4918parser (@pxref{%define Summary,,api.push-pull}):
9987d1b3
JD
4919
@example
%define api.push-pull push
@end example
4923
4924In almost all cases, you want to ensure that your push parser is also
4925a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}). The only
f4101aa6 4926time you should create an impure push parser is to have backwards
9987d1b3
JD
4927compatibility with the impure Yacc pull mode interface. Unless you know
4928what you are doing, your declarations should look like this:
4929
@example
%define api.pure
%define api.push-pull push
@end example
4934
There is a notable functional difference between the pure push parser
and the impure push parser.  A pure push parser may have many parser
instances, of the same type of parser, in memory at the same time.
An impure push parser should only use one parser at a time.
4939
4940When a push parser is selected, Bison will generate some new symbols in
f4101aa6
AD
4941the generated parser. @code{yypstate} is a structure that the generated
4942parser uses to store the parser's state. @code{yypstate_new} is the
9987d1b3
JD
4943function that will create a new parser instance. @code{yypstate_delete}
4944will free the resources associated with the corresponding parser instance.
f4101aa6 4945Finally, @code{yypush_parse} is the function that should be called whenever a
9987d1b3
JD
4946token is available to provide the parser. A trivial example
4947of using a pure push parser would look like this:
4948
@example
int status;
yypstate *ps = yypstate_new ();
do @{
  status = yypush_parse (ps, yylex (), NULL);
@} while (status == YYPUSH_MORE);
yypstate_delete (ps);
@end example
4957
4958If the user decided to use an impure push parser, a few things about
f4101aa6 4959the generated parser will change. The @code{yychar} variable becomes
9987d1b3
JD
4960a global variable instead of a variable in the @code{yypush_parse} function.
4961For this reason, the signature of the @code{yypush_parse} function is
f4101aa6 4962changed to remove the token as a parameter. A nonreentrant push parser
9987d1b3
JD
4963example would thus look like this:
4964
@example
extern int yychar;
int status;
yypstate *ps = yypstate_new ();
do @{
  yychar = yylex ();
  status = yypush_parse (ps);
@} while (status == YYPUSH_MORE);
yypstate_delete (ps);
@end example
4975
f4101aa6 4976That's it. Notice the next token is put into the global variable @code{yychar}
9987d1b3
JD
4977for use by the next invocation of the @code{yypush_parse} function.
4978
Bison also supports both the push parser interface and the pull parser
interface in the same generated parser.  In order to get this functionality,
you should replace the @code{%define api.push-pull push} declaration with the
@code{%define api.push-pull both} declaration.  Doing this will create all of
the symbols mentioned earlier along with the two extra symbols, @code{yyparse}
and @code{yypull_parse}.  @code{yyparse} can be used exactly as it normally
would be used.  However, the user should note that it is implemented in the
generated parser by calling @code{yypull_parse}.
This makes the @code{yyparse} function that is generated with the
@code{%define api.push-pull both} declaration slower than the normal
@code{yyparse} function.  If the user
calls the @code{yypull_parse} function it will parse the rest of the input
stream.  It is possible to @code{yypush_parse} tokens to select a subgrammar
and then @code{yypull_parse} the rest of the input stream.  If you would like
to switch back and forth between parsing styles, you would have to
write your own @code{yypull_parse} function that knows when to quit looking
for input.  An example of using the @code{yypull_parse} function would look
like this:
4997
@example
yypstate *ps = yypstate_new ();
yypull_parse (ps); /* Will call the lexer */
yypstate_delete (ps);
@end example
5003
d9df47b6 5004Adding the @code{%define api.pure} declaration does exactly the same thing to
f37495f6
JD
5005the generated parser with @code{%define api.push-pull both} as it did for
5006@code{%define api.push-pull push}.
9987d1b3 5007
342b8b6e 5008@node Decl Summary
bfa74976
RS
5009@subsection Bison Declaration Summary
5010@cindex Bison declaration summary
5011@cindex declaration summary
5012@cindex summary, Bison declaration
5013
d8988b2f 5014Here is a summary of the declarations used to define a grammar:
bfa74976 5015
18b519c0 5016@deffn {Directive} %union
bfa74976
RS
5017Declare the collection of data types that semantic values may have
5018(@pxref{Union Decl, ,The Collection of Value Types}).
18b519c0 5019@end deffn
bfa74976 5020
18b519c0 5021@deffn {Directive} %token
bfa74976
RS
5022Declare a terminal symbol (token type name) with no precedence
5023or associativity specified (@pxref{Token Decl, ,Token Type Names}).
18b519c0 5024@end deffn
bfa74976 5025
18b519c0 5026@deffn {Directive} %right
bfa74976
RS
5027Declare a terminal symbol (token type name) that is right-associative
5028(@pxref{Precedence Decl, ,Operator Precedence}).
18b519c0 5029@end deffn
bfa74976 5030
18b519c0 5031@deffn {Directive} %left
bfa74976
RS
5032Declare a terminal symbol (token type name) that is left-associative
5033(@pxref{Precedence Decl, ,Operator Precedence}).
18b519c0 5034@end deffn
bfa74976 5035
18b519c0 5036@deffn {Directive} %nonassoc
bfa74976 5037Declare a terminal symbol (token type name) that is nonassociative
bfa74976 5038(@pxref{Precedence Decl, ,Operator Precedence}).
39a06c25
PE
5039Using it in a way that would be associative is a syntax error.
5040@end deffn
5041
91d2c560 5042@ifset defaultprec
39a06c25 5043@deffn {Directive} %default-prec
22fccf95 5044Assign a precedence to rules lacking an explicit @code{%prec} modifier
39a06c25
PE
5045(@pxref{Contextual Precedence, ,Context-Dependent Precedence}).
5046@end deffn
91d2c560 5047@end ifset
bfa74976 5048
18b519c0 5049@deffn {Directive} %type
bfa74976
RS
5050Declare the type of semantic values for a nonterminal symbol
5051(@pxref{Type Decl, ,Nonterminal Symbols}).
18b519c0 5052@end deffn
bfa74976 5053
18b519c0 5054@deffn {Directive} %start
89cab50d
AD
5055Specify the grammar's start symbol (@pxref{Start Decl, ,The
5056Start-Symbol}).
18b519c0 5057@end deffn
bfa74976 5058
18b519c0 5059@deffn {Directive} %expect
bfa74976
RS
Declare the expected number of shift/reduce conflicts
5061(@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
18b519c0
AD
5062@end deffn
5063
bfa74976 5064
d8988b2f
AD
5065@sp 1
5066@noindent
5067In order to change the behavior of @command{bison}, use the following
5068directives:
5069
148d66d8 5070@deffn {Directive} %code @{@var{code}@}
8e6f2266 5071@deffnx {Directive} %code @var{qualifier} @{@var{code}@}
148d66d8 5072@findex %code
8e6f2266
JD
5073Insert @var{code} verbatim into the output parser source at the
5074default location or at the location specified by @var{qualifier}.
5075@xref{%code Summary}.
148d66d8
JD
5076@end deffn
5077
18b519c0 5078@deffn {Directive} %debug
e358222b 5079In the parser implementation file, define the macro @code{YYDEBUG} (or
3746fc33 5080@code{@var{prefix}DEBUG} with @samp{%define api.prefix @var{prefix}}, see
e358222b
AD
5081@ref{Multiple Parsers, ,Multiple Parsers in the Same Program}) to 1 if it is
5082not already defined, so that the debugging facilities are compiled.
5083@xref{Tracing, ,Tracing Your Parser}.
bd5df716 5084@end deffn
d8988b2f 5085
2f4518a1
JD
5086@deffn {Directive} %define @var{variable}
5087@deffnx {Directive} %define @var{variable} @var{value}
5088@deffnx {Directive} %define @var{variable} "@var{value}"
5089Define a variable to adjust Bison's behavior. @xref{%define Summary}.
5090@end deffn
5091
5092@deffn {Directive} %defines
5093Write a parser header file containing macro definitions for the token
5094type names defined in the grammar as well as a few other declarations.
5095If the parser implementation file is named @file{@var{name}.c} then
5096the parser header file is named @file{@var{name}.h}.
5097
5098For C parsers, the parser header file declares @code{YYSTYPE} unless
5099@code{YYSTYPE} is already defined as a macro or you have used a
5100@code{<@var{type}>} tag without using @code{%union}. Therefore, if
5101you are using a @code{%union} (@pxref{Multiple Types, ,More Than One
5102Value Type}) with components that require other definitions, or if you
5103have defined a @code{YYSTYPE} macro or type definition (@pxref{Value
5104Type, ,Data Types of Semantic Values}), you need to arrange for these
5105definitions to be propagated to all modules, e.g., by putting them in
5106a prerequisite header that is included both by your parser and by any
5107other module that needs @code{YYSTYPE}.
5108
5109Unless your parser is pure, the parser header file declares
5110@code{yylval} as an external variable. @xref{Pure Decl, ,A Pure
5111(Reentrant) Parser}.
5112
5113If you have also used locations, the parser header file declares
7404cdf3
JD
5114@code{YYLTYPE} and @code{yylloc} using a protocol similar to that of the
5115@code{YYSTYPE} macro and @code{yylval}. @xref{Tracking Locations}.
2f4518a1
JD
5116
5117This parser header file is normally essential if you wish to put the
5118definition of @code{yylex} in a separate source file, because
5119@code{yylex} typically needs to be able to refer to the
5120above-mentioned declarations and to the token type codes. @xref{Token
5121Values, ,Semantic Values of Tokens}.
5122
5123@findex %code requires
5124@findex %code provides
5125If you have declared @code{%code requires} or @code{%code provides}, the output
5126header also contains their code.
5127@xref{%code Summary}.
6192d2c6
AD
5128
5129@cindex Header guard
5130The generated header is protected against multiple inclusions with a C
5131preprocessor guard: @samp{YY_@var{PREFIX}_@var{FILE}_INCLUDED}, where
5132@var{PREFIX} and @var{FILE} are the prefix (@pxref{Multiple Parsers,
5133,Multiple Parsers in the Same Program}) and generated file name turned
uppercase, with each series of non-alphanumeric characters converted to a
single underscore.
5136
5137For instance with @samp{%define api.prefix "calc"} and @samp{%defines
5138"lib/parse.h"}, the header will be guarded as follows.
@example
#ifndef YY_CALC_LIB_PARSE_H_INCLUDED
# define YY_CALC_LIB_PARSE_H_INCLUDED
...
#endif /* ! YY_CALC_LIB_PARSE_H_INCLUDED */
@end example
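
The naming rule above can be sketched in C.  This is only an illustration
of the rule as stated in this section, not the code Bison itself uses;
the function names @code{append_mangled} and @code{make_header_guard} are
hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Append SRC to DST (capacity CAP, current length *LEN), turning
   letters uppercase and collapsing each run of non-alphanumeric
   characters into a single underscore.  */
static void
append_mangled (char *dst, size_t cap, size_t *len, const char *src)
{
  int in_run = 0;
  for (; *src != '\0' && *len + 1 < cap; ++src)
    {
      unsigned char c = (unsigned char) *src;
      if (isalnum (c))
        {
          dst[(*len)++] = (char) toupper (c);
          in_run = 0;
        }
      else if (!in_run)
        {
          dst[(*len)++] = '_';
          in_run = 1;
        }
    }
  dst[*len] = '\0';
}

/* Hypothetical helper: write YY_<PREFIX>_<FILE>_INCLUDED into BUF.  */
void
make_header_guard (char *buf, size_t cap, const char *prefix,
                   const char *file)
{
  size_t len = 0;
  if (cap == 0)
    return;
  buf[0] = '\0';
  append_mangled (buf, cap, &len, "YY_");
  append_mangled (buf, cap, &len, prefix);
  append_mangled (buf, cap, &len, "_");
  append_mangled (buf, cap, &len, file);
  append_mangled (buf, cap, &len, "_INCLUDED");
}
```

Called with the prefix @code{calc} and the file name @file{lib/parse.h},
this sketch produces the guard name shown in the example above.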
2f4518a1
JD
5145@end deffn
5146
5147@deffn {Directive} %defines @var{defines-file}
5148Same as above, but save in the file @var{defines-file}.
5149@end deffn
5150
5151@deffn {Directive} %destructor
Specify how the parser should reclaim the memory associated with
discarded symbols.  @xref{Destructor Decl, , Freeing Discarded Symbols}.
5154@end deffn
5155
5156@deffn {Directive} %file-prefix "@var{prefix}"
5157Specify a prefix to use for all Bison output file names. The names
5158are chosen as if the grammar file were named @file{@var{prefix}.y}.
5159@end deffn
5160
5161@deffn {Directive} %language "@var{language}"
5162Specify the programming language for the generated parser. Currently
5163supported languages include C, C++, and Java.
5164@var{language} is case-insensitive.
5165
5166This directive is experimental and its effect may be modified in future
5167releases.
5168@end deffn
5169
5170@deffn {Directive} %locations
5171Generate the code processing the locations (@pxref{Action Features,
5172,Special Features for Use in Actions}). This mode is enabled as soon as
the grammar uses the special @samp{@@@var{n}} tokens, but if your
grammar does not use them, using @samp{%locations} allows for more
5175accurate syntax error messages.
5176@end deffn
5177
2f4518a1
JD
5178@ifset defaultprec
5179@deffn {Directive} %no-default-prec
5180Do not assign a precedence to rules lacking an explicit @code{%prec}
5181modifier (@pxref{Contextual Precedence, ,Context-Dependent
5182Precedence}).
5183@end deffn
5184@end ifset
5185
5186@deffn {Directive} %no-lines
5187Don't generate any @code{#line} preprocessor commands in the parser
5188implementation file. Ordinarily Bison writes these commands in the
5189parser implementation file so that the C compiler and debuggers will
5190associate errors and object code with your source file (the grammar
5191file). This directive causes them to associate errors with the parser
5192implementation file, treating it as an independent source file in its
5193own right.
5194@end deffn
5195
5196@deffn {Directive} %output "@var{file}"
5197Specify @var{file} for the parser implementation file.
5198@end deffn
5199
5200@deffn {Directive} %pure-parser
5201Deprecated version of @code{%define api.pure} (@pxref{%define
5202Summary,,api.pure}), for which Bison is more careful to warn about
5203unreasonable usage.
5204@end deffn
5205
5206@deffn {Directive} %require "@var{version}"
5207Require version @var{version} or higher of Bison. @xref{Require Decl, ,
5208Require a Version of Bison}.
5209@end deffn
5210
5211@deffn {Directive} %skeleton "@var{file}"
5212Specify the skeleton to use.
5213
5214@c You probably don't need this option unless you are developing Bison.
5215@c You should use @code{%language} if you want to specify the skeleton for a
5216@c different language, because it is clearer and because it will always choose the
5217@c correct skeleton for non-deterministic or push parsers.
5218
5219If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
5220file in the Bison installation directory.
5221If it does, @var{file} is an absolute file name or a file name relative to the
5222directory of the grammar file.
5223This is similar to how most shells resolve commands.
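
The lookup rule can be sketched as follows.  This is an illustration
under stated assumptions, not Bison's implementation: the function
@code{resolve_skeleton} and its parameters are hypothetical, and the two
directories are supplied by the caller rather than taken from a real
installation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the resolved skeleton file name into BUF.
   INSTALL_DIR stands for the Bison installation directory that holds
   skeletons, GRAMMAR_DIR for the directory of the grammar file.
   Return 0 on success, -1 if BUF is too small.  */
int
resolve_skeleton (char *buf, size_t cap, const char *file,
                  const char *install_dir, const char *grammar_dir)
{
  int n;
  if (strchr (file, '/') == NULL)
    /* No slash: look in the installation directory.  */
    n = snprintf (buf, cap, "%s/%s", install_dir, file);
  else if (file[0] == '/')
    /* Absolute file name: use it unchanged.  */
    n = snprintf (buf, cap, "%s", file);
  else
    /* Relative file name: relative to the grammar file's directory.  */
    n = snprintf (buf, cap, "%s/%s", grammar_dir, file);
  return (n >= 0 && (size_t) n < cap) ? 0 : -1;
}
```

As with shell command lookup, writing @file{./my-skel.c} rather than
@file{my-skel.c} is what selects a file next to the grammar.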
5224@end deffn
5225
5226@deffn {Directive} %token-table
5227Generate an array of token names in the parser implementation file.
5228The name of the array is @code{yytname}; @code{yytname[@var{i}]} is
5229the name of the token whose internal Bison token code number is
5230@var{i}. The first three elements of @code{yytname} correspond to the
5231predefined tokens @code{"$end"}, @code{"error"}, and
5232@code{"$undefined"}; after these come the symbols defined in the
5233grammar file.
5234
5235The name in the table includes all the characters needed to represent
5236the token in Bison. For single-character literals and literal
5237strings, this includes the surrounding quoting characters and any
5238escape sequences. For example, the Bison single-character literal
5239@code{'+'} corresponds to a three-character name, represented in C as
5240@code{"'+'"}; and the Bison two-character literal string @code{"\\/"}
5241corresponds to a five-character name, represented in C as
5242@code{"\"\\\\/\""}.
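
To make the quoting concrete, here is a small sketch that uses a
hand-written stand-in for the table.  The array @code{mock_yytname} and
the function @code{mock_token_index} are hypothetical; a real
@code{yytname} is generated by Bison and its contents depend on your
grammar.

```c
#include <stddef.h>
#include <string.h>

/* Mock table for illustration only.  The first three entries are the
   predefined tokens; the last two are the names of the literals '+'
   and "\\/" as described above.  */
static const char *const mock_yytname[] =
{
  "$end", "error", "$undefined", "'+'", "\"\\\\/\"", NULL
};

/* Hypothetical helper: return the index of NAME in the mock table,
   or -1 if it is absent.  NAME must include the quoting characters.  */
int
mock_token_index (const char *name)
{
  int i;
  for (i = 0; mock_yytname[i] != NULL; ++i)
    if (strcmp (mock_yytname[i], name) == 0)
      return i;
  return -1;
}
```

In this sketch, looking up @code{"'+'"} succeeds while looking up
@code{"+"} fails, because the stored name includes the quotes.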
5243
When you specify @code{%token-table}, Bison also generates macro
definitions for macros @code{YYNTOKENS}, @code{YYNNTS},
@code{YYNRULES}, and @code{YYNSTATES}:
5247
5248@table @code
5249@item YYNTOKENS
5250The highest token number, plus one.
5251@item YYNNTS
5252The number of nonterminal symbols.
5253@item YYNRULES
The number of grammar rules.
5255@item YYNSTATES
5256The number of parser states (@pxref{Parser States}).
5257@end table
5258@end deffn
5259
5260@deffn {Directive} %verbose
5261Write an extra output file containing verbose descriptions of the
5262parser states and what is done for each type of lookahead token in
5263that state. @xref{Understanding, , Understanding Your Parser}, for more
5264information.
5265@end deffn
5266
5267@deffn {Directive} %yacc
5268Pretend the option @option{--yacc} was given, i.e., imitate Yacc,
5269including its naming conventions. @xref{Bison Options}, for more.
5270@end deffn
5271
5272
5273@node %define Summary
5274@subsection %define Summary
406dec82
JD
5275
5276There are many features of Bison's behavior that can be controlled by
5277assigning the feature a single value. For historical reasons, some
5278such features are assigned values by dedicated directives, such as
5279@code{%start}, which assigns the start symbol. However, newer such
5280features are associated with variables, which are assigned by the
5281@code{%define} directive:
5282
c1d19e10 5283@deffn {Directive} %define @var{variable}
f37495f6 5284@deffnx {Directive} %define @var{variable} @var{value}
c1d19e10 5285@deffnx {Directive} %define @var{variable} "@var{value}"
406dec82 5286Define @var{variable} to @var{value}.
9611cfa2 5287
406dec82
JD
5288@var{value} must be placed in quotation marks if it contains any
5289character other than a letter, underscore, period, or non-initial dash
5290or digit. Omitting @code{"@var{value}"} entirely is always equivalent
5291to specifying @code{""}.
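
The quoting rule can be read as a simple character check.  The following
sketch encodes that reading of the sentence above; it is not Bison's
scanner, and the function name @code{define_value_needs_quotes} is
hypothetical.  An empty value is treated as not needing quotes, on the
assumption that omitting it is equivalent to @code{""}.

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if VALUE must be written in quotation
   marks in a %define directive, 0 if it may be written bare.  */
int
define_value_needs_quotes (const char *value)
{
  const char *p;
  for (p = value; *p != '\0'; ++p)
    {
      unsigned char c = (unsigned char) *p;
      /* Letters, underscores and periods are allowed anywhere.  */
      if (isalpha (c) || c == '_' || c == '.')
        continue;
      /* Dashes and digits are allowed except as the first character.  */
      if ((c == '-' || isdigit (c)) && p != value)
        continue;
      return 1;
    }
  return 0;
}
```

Under this reading, @code{push} and @code{canonical-lr} may be written
bare, whereas @code{foo::bar} must be quoted.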
9611cfa2 5292
406dec82
JD
5293It is an error if a @var{variable} is defined by @code{%define}
5294multiple times, but see @ref{Bison Options,,-D
5295@var{name}[=@var{value}]}.
5296@end deffn
f37495f6 5297
406dec82
JD
5298The rest of this section summarizes variables and values that
5299@code{%define} accepts.
9611cfa2 5300
406dec82
JD
5301Some @var{variable}s take Boolean values. In this case, Bison will
5302complain if the variable definition does not meet one of the following
5303four conditions:
9611cfa2
JD
5304
5305@enumerate
@item @code{@var{value}} is @code{true}.
9611cfa2 5307
f37495f6
JD
5308@item @code{@var{value}} is omitted (or @code{""} is specified).
5309This is equivalent to @code{true}.
9611cfa2 5310
f37495f6 5311@item @code{@var{value}} is @code{false}.
9611cfa2
JD
5312
5313@item @var{variable} is never defined.
628be6c9 5314In this case, Bison selects a default value.
9611cfa2 5315@end enumerate
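
The first three conditions can be sketched as a small C function.  This
is a reading of the list above rather than Bison's own code; the name
@code{parse_boolean_define} is hypothetical.  The fourth condition (the
variable is never defined) is left to the caller, who would apply the
default instead of calling the function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: interpret VALUE as a Boolean %define value.
   Pass NULL when the value was omitted.  Return 1 for true, 0 for
   false, and -1 for a value that would be rejected.  */
int
parse_boolean_define (const char *value)
{
  /* An omitted or empty value is equivalent to "true".  */
  if (value == NULL || value[0] == '\0' || strcmp (value, "true") == 0)
    return 1;
  if (strcmp (value, "false") == 0)
    return 0;
  return -1;
}
```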
148d66d8 5316
628be6c9
JD
5317What @var{variable}s are accepted, as well as their meanings and default
5318values, depend on the selected target language and/or the parser
5319skeleton (@pxref{Decl Summary,,%language}, @pxref{Decl
5320Summary,,%skeleton}).
5321Unaccepted @var{variable}s produce an error.
793fbca5
JD
5322Some of the accepted @var{variable}s are:
5323
5324@itemize @bullet
4b3847c3
AD
5325@c ================================================== api.prefix
5326@item @code{api.prefix}
5327@findex %define api.prefix
5328
5329@itemize @bullet
5330@item Language(s): All
5331
@item Purpose: Rename exported symbols.
@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
5334
5335@item Accepted Values: String
5336
5337@item Default Value: @code{yy}
e358222b
AD
5338
5339@item History: introduced in Bison 2.6
4b3847c3
AD
5340@end itemize
5341
ea118b72 5342@c ================================================== api.pure
4b3847c3 5343@item @code{api.pure}
d9df47b6
JD
5344@findex %define api.pure
5345
5346@itemize @bullet
5347@item Language(s): C
5348
5349@item Purpose: Request a pure (reentrant) parser program.
5350@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
5351
5352@item Accepted Values: Boolean
5353
f37495f6 5354@item Default Value: @code{false}
d9df47b6
JD
5355@end itemize
5356
4b3847c3
AD
5357@c ================================================== api.push-pull
5358
5359@item @code{api.push-pull}
812775a0 5360@findex %define api.push-pull
793fbca5
JD
5361
5362@itemize @bullet
34a6c2d1 5363@item Language(s): C (deterministic parsers only)
793fbca5 5364
3b1977ea 5365@item Purpose: Request a pull parser, a push parser, or both.
d782395d 5366@xref{Push Decl, ,A Push Parser}.
59da312b
JD
5367(The current push parsing interface is experimental and may evolve.
5368More user feedback will help to stabilize it.)
793fbca5 5369
f37495f6 5370@item Accepted Values: @code{pull}, @code{push}, @code{both}
793fbca5 5371
f37495f6 5372@item Default Value: @code{pull}
793fbca5
JD
5373@end itemize
5374
232be91a
AD
5375@c ================================================== lr.default-reductions
5376
4b3847c3 5377@item @code{lr.default-reductions}
1d0f55cc 5378@findex %define lr.default-reductions
34a6c2d1
JD
5379
5380@itemize @bullet
5381@item Language(s): all
5382
4c38b19e 5383@item Purpose: Specify the kind of states that are permitted to
6f04ee6c
JD
5384contain default reductions. @xref{Default Reductions}. (The ability to
5385specify where default reductions should be used is experimental. More user
5386feedback will help to stabilize it.)
34a6c2d1 5387
a6e5a280 5388@item Accepted Values: @code{most}, @code{consistent}, @code{accepting}
34a6c2d1
JD
5389@item Default Value:
5390@itemize
f37495f6 5391@item @code{accepting} if @code{lr.type} is @code{canonical-lr}.
a6e5a280 5392@item @code{most} otherwise.
34a6c2d1
JD
5393@end itemize
5394@end itemize
5395
232be91a
AD
5396@c ============================================ lr.keep-unreachable-states
5397
4b3847c3 5398@item @code{lr.keep-unreachable-states}
812775a0 5399@findex %define lr.keep-unreachable-states
31984206
JD
5400
5401@itemize @bullet
5402@item Language(s): all
3b1977ea 5403@item Purpose: Request that Bison allow unreachable parser states to
6f04ee6c 5404remain in the parser tables. @xref{Unreachable States}.
31984206 5405@item Accepted Values: Boolean
f37495f6 5406@item Default Value: @code{false}
31984206
JD
5407@end itemize
5408
232be91a
AD
5409@c ================================================== lr.type
5410
4b3847c3 5411@item @code{lr.type}
34a6c2d1 5412@findex %define lr.type
34a6c2d1
JD
5413
5414@itemize @bullet
5415@item Language(s): all
5416
3b1977ea 5417@item Purpose: Specify the type of parser tables within the
6f04ee6c 5418LR(1) family. @xref{LR Table Construction}. (This feature is experimental.
34a6c2d1
JD
5419More user feedback will help to stabilize it.)
5420
6f04ee6c 5421@item Accepted Values: @code{lalr}, @code{ielr}, @code{canonical-lr}
34a6c2d1 5422
f37495f6 5423@item Default Value: @code{lalr}
34a6c2d1
JD
5424@end itemize
5425
4b3847c3
AD
5426@c ================================================== namespace
5427
5428@item @code{namespace}
793fbca5
JD
5429@findex %define namespace
5430
5431@itemize
@item Language(s): C++
5433
3b1977ea 5434@item Purpose: Specify the namespace for the parser class.
793fbca5
JD
5435For example, if you specify:
5436
@smallexample
%define namespace "foo::bar"
@end smallexample
5440
5441Bison uses @code{foo::bar} verbatim in references such as:
5442
5443@smallexample
5444foo::bar::parser::semantic_type
5445@end smallexample
5446
5447However, to open a namespace, Bison removes any leading @code{::} and then
5448splits on any remaining occurrences:
5449
5450@smallexample
5451namespace foo @{ namespace bar @{
5452 class position;
5453 class location;
5454@} @}
5455@end smallexample
5456
5457@item Accepted Values: Any absolute or relative C++ namespace reference without
5458a trailing @code{"::"}.
5459For example, @code{"foo"} or @code{"::foo::bar"}.
5460
5461@item Default Value: The value specified by @code{%name-prefix}, which defaults
5462to @code{yy}.
5463This usage of @code{%name-prefix} is for backward compatibility and can be
5464confusing since @code{%name-prefix} also specifies the textual prefix for the
5465lexical analyzer function.
5466Thus, if you specify @code{%name-prefix}, it is best to also specify
5467@code{%define namespace} so that @code{%name-prefix} @emph{only} affects the
5468lexical analyzer function.
5469For example, if you specify:
5470
5471@smallexample
5472%define namespace "foo"
5473%name-prefix "bar::"
5474@end smallexample
5475
5476The parser namespace is @code{foo} and @code{yylex} is referenced as
5477@code{bar::lex}.
5478@end itemize
4c38b19e
JD
5479
5480@c ================================================== parse.lac
4b3847c3 5481@item @code{parse.lac}
4c38b19e 5482@findex %define parse.lac
4c38b19e
JD
5483
5484@itemize
@item Language(s): C (deterministic parsers only)
4c38b19e 5486
35430378 5487@item Purpose: Enable LAC (lookahead correction) to improve
6f04ee6c 5488syntax error handling. @xref{LAC}.
4c38b19e 5489@item Accepted Values: @code{none}, @code{full}
4c38b19e
JD
5490@item Default Value: @code{none}
5491@end itemize
793fbca5
JD
5492@end itemize
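
As a rough illustration of how some of the variables above combine, a
grammar file for a deterministic parser in C might begin with the
declarations below. The particular values are only one plausible
choice, not a recommendation:

@example
%define lr.type ielr     /* One of lalr, ielr, canonical-lr. */
%define parse.lac full   /* C deterministic parsers only. */
%error-verbose
@end example

@noindent
@code{%error-verbose} is included here on the assumption that more
specific syntax error messages are wanted; as noted elsewhere in this
manual, such messages can contain incorrect information when LAC is
not enabled.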
5493
d8988b2f 5494
8e6f2266
JD
5495@node %code Summary
5496@subsection %code Summary
8e6f2266 5497@findex %code
8e6f2266 5498@cindex Prologue
406dec82
JD
5499
5500The @code{%code} directive inserts code verbatim into the output
5501parser source at any of a predefined set of locations. It thus serves
5502as a flexible and user-friendly alternative to the traditional Yacc
5503prologue, @code{%@{@var{code}%@}}. This section summarizes the
5504functionality of @code{%code} for the various target languages
5505supported by Bison. For a detailed discussion of how to use
5506@code{%code} in place of @code{%@{@var{code}%@}} for C/C++ and why it
5507is advantageous to do so, @pxref{Prologue Alternatives}.
5508
5509@deffn {Directive} %code @{@var{code}@}
5510This is the unqualified form of the @code{%code} directive. It
5511inserts @var{code} verbatim at a language-dependent default location
5512in the parser implementation.
5513
8e6f2266 5514For C/C++, the default location is the parser implementation file
5515after the usual contents of the parser header file. Thus, the
5516unqualified form replaces @code{%@{@var{code}%@}} for most purposes.
8e6f2266
JD
5517
5518For Java, the default location is inside the parser class.
5519@end deffn
5520
5521@deffn {Directive} %code @var{qualifier} @{@var{code}@}
5522This is the qualified form of the @code{%code} directive.
5523@var{qualifier} identifies the purpose of @var{code} and thus the
5524location(s) where Bison should insert it. That is, if you need to
5525specify location-sensitive @var{code} that does not belong at the
5526default location selected by the unqualified @code{%code} form, use
5527this form instead.
5528@end deffn
5529
5530For any particular qualifier or for the unqualified form, if there are
5531multiple occurrences of the @code{%code} directive, Bison concatenates
5532the specified code in the order in which it appears in the grammar
5533file.
8e6f2266 5534
406dec82
JD
5535Not all qualifiers are accepted for all target languages. Unaccepted
5536qualifiers produce an error. Some of the accepted qualifiers are:
8e6f2266
JD
5537
5538@itemize @bullet
5539@item requires
5540@findex %code requires
5541
5542@itemize @bullet
5543@item Language(s): C, C++
5544
5545@item Purpose: This is the best place to write dependency code required for
5546@code{YYSTYPE} and @code{YYLTYPE}.
5547In other words, it's the best place to define types referenced in @code{%union}
5548directives, and it's the best place to override Bison's default @code{YYSTYPE}
5549and @code{YYLTYPE} definitions.
5550
5551@item Location(s): The parser header file and the parser implementation file
5552before the Bison-generated @code{YYSTYPE} and @code{YYLTYPE}
5553definitions.
5554@end itemize
5555
5556@item provides
5557@findex %code provides
5558
5559@itemize @bullet
5560@item Language(s): C, C++
5561
5562@item Purpose: This is the best place to write additional definitions and
5563declarations that should be provided to other modules.
5564
5565@item Location(s): The parser header file and the parser implementation
5566file after the Bison-generated @code{YYSTYPE}, @code{YYLTYPE}, and
5567token definitions.
5568@end itemize
5569
5570@item top
5571@findex %code top
5572
5573@itemize @bullet
5574@item Language(s): C, C++
5575
5576@item Purpose: The unqualified @code{%code} or @code{%code requires}
5577should usually be more appropriate than @code{%code top}. However,
5578occasionally it is necessary to insert code much nearer the top of the
5579parser implementation file. For example:
5580
ea118b72 5581@example
8e6f2266
JD
5582%code top @{
5583 #define _GNU_SOURCE
5584 #include <stdio.h>
5585@}
ea118b72 5586@end example
8e6f2266
JD
5587
5588@item Location(s): Near the top of the parser implementation file.
5589@end itemize
5590
5591@item imports
5592@findex %code imports
5593
5594@itemize @bullet
5595@item Language(s): Java
5596
5597@item Purpose: This is the best place to write Java import directives.
5598
5599@item Location(s): The parser Java file after any Java package directive and
5600before any class definitions.
5601@end itemize
5602@end itemize
5603
406dec82
JD
5604Though we say the insertion locations are language-dependent, they are
5605technically skeleton-dependent. Writers of non-standard skeletons
5606however should choose their locations consistently with the behavior
5607of the standard Bison skeletons.
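
As a sketch of how the qualifiers fit together for C, consider the
following fragment of a grammar file. The names @code{symrec},
@code{trace_value} and @code{lookup} are hypothetical and serve only
to show where each kind of code belongs:

@example
%code top @{
  #define _GNU_SOURCE
  #include <stdio.h>
@}

%code requires @{
  /* Type referenced by %union, so it must precede YYSTYPE. */
  typedef struct symrec symrec;
@}

%union @{
  int intval;
  symrec *tptr;
@}

%code provides @{
  /* Visible to other modules through the parser header file. */
  void trace_value (YYSTYPE const *value);
@}

%code @{
  /* Private to the parser implementation file. */
  static symrec *lookup (char const *name);
@}
@end example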
8e6f2266 5608
d8988b2f 5609
342b8b6e 5610@node Multiple Parsers
bfa74976
RS
5611@section Multiple Parsers in the Same Program
5612
Most programs that use Bison parse only one language and therefore contain
only one Bison parser. But what if you want to parse more than one language
with the same program? Then you need to avoid name conflicts between
different definitions of functions and variables such as @code{yyparse} and
@code{yylval}. To use different parsers from the same compilation unit, you
also need to avoid conflicts on types and macros (e.g., @code{YYSTYPE})
exported in the generated header.
5620
The easy way to do this is to define the @code{%define} variable
@code{api.prefix}. With different @code{api.prefix}s it is guaranteed that
headers do not conflict when included together, and that compiled objects
can be linked together too. Specifying @samp{%define api.prefix
@var{prefix}} (or passing the option @samp{-Dapi.prefix=@var{prefix}}, see
@ref{Invocation, ,Invoking Bison}) renames the interface functions and
variables of the Bison parser to start with @var{prefix} instead of
@samp{yy}, and all the macros to start with @var{PREFIX} (i.e., @var{prefix}
upper-cased) instead of @samp{YY}.
4b3847c3
AD
5630
The renamed symbols include @code{yyparse}, @code{yylex}, @code{yyerror},
@code{yynerrs}, @code{yylval}, @code{yylloc}, @code{yychar} and
@code{yydebug}. If you use a push parser, @code{yypush_parse},
@code{yypull_parse}, @code{yypstate}, @code{yypstate_new} and
@code{yypstate_delete} will also be renamed. The renamed macros include
@code{YYSTYPE}, @code{YYLTYPE}, and @code{YYDEBUG}; the last of these is
treated specially, as described below.
4b3847c3
AD
5638
5639For example, if you use @samp{%define api.prefix c}, the names become
5640@code{cparse}, @code{clex}, @dots{}, @code{CSTYPE}, @code{CLTYPE}, and so
5641on.
5642
5643The @code{%define} variable @code{api.prefix} works in two different ways.
5644In the implementation file, it works by adding macro definitions to the
5645beginning of the parser implementation file, defining @code{yyparse} as
5646@code{@var{prefix}parse}, and so on:
5647
@example
#define YYSTYPE CSTYPE
#define yyparse cparse
#define yylval  clval
...
YYSTYPE yylval;
int yyparse (void);
@end example
5656
This effectively substitutes one name for the other in the entire parser
implementation file, so the ``original'' names (@code{yylex},
@code{YYSTYPE}, @dots{}) are also usable in the parser implementation file.

However, in the parser header file, the symbols are declared directly under
their new names, for instance:
5663
5664@example
5665extern CSTYPE clval;
5666int cparse (void);
5667@end example
5668
e358222b
AD
5669The macro @code{YYDEBUG} is commonly used to enable the tracing support in
5670parsers. To comply with this tradition, when @code{api.prefix} is used,
5671@code{YYDEBUG} (not renamed) is used as a default value:
5672
@example
/* Enabling traces. */
#ifndef CDEBUG
# if defined YYDEBUG
#  if YYDEBUG
#   define CDEBUG 1
#  else
#   define CDEBUG 0
#  endif
# else
#  define CDEBUG 0
# endif
#endif
#if CDEBUG
extern int cdebug;
#endif
@end example
5690
5691@sp 2
5692
5693Prior to Bison 2.6, a feature similar to @code{api.prefix} was provided by
5694the obsolete directive @code{%name-prefix} (@pxref{Table of Symbols, ,Bison
5695Symbols}) and the option @code{--name-prefix} (@pxref{Bison Options}).
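
For instance, a program might combine two parsers along the following
lines. The file names @file{calc.y} and @file{conf.y} and the prefixes
@code{calc} and @code{conf} are made up for this sketch, and the header
names assume that Bison was run with @samp{-d} and its default output
naming:

@example
/* In calc.y: */
%define api.prefix calc

/* In conf.y: */
%define api.prefix conf
@end example

@example
/* In the program that uses both parsers: */
#include "calc.tab.h"   /* Declares calcparse, CALCSTYPE, ... */
#include "conf.tab.h"   /* Declares confparse, CONFSTYPE, ... */

int
main (void)
@{
  if (confparse () != 0)   /* Parse the configuration first. */
    return 1;
  return calcparse ();     /* Then parse the main input. */
@}
@end example

@noindent
Each grammar must still be given its own lexical analyzer and error
reporting function, which under these assumptions would be named
@code{calclex} and @code{calcerror}, and @code{conflex} and
@code{conferror}, respectively.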
bfa74976 5696
342b8b6e 5697@node Interface
bfa74976
RS
5698@chapter Parser C-Language Interface
5699@cindex C-language interface
5700@cindex interface
5701
5702The Bison parser is actually a C function named @code{yyparse}. Here we
5703describe the interface conventions of @code{yyparse} and the other
5704functions that it needs to use.
5705
Keep in mind that the parser uses many C identifiers starting with
@samp{yy} and @samp{YY} for internal purposes. If you use such an
identifier (aside from those in this manual) in an action or in the
epilogue of the grammar file, you are likely to run into trouble.
bfa74976
RS
5710
5711@menu
f56274a8
DJ
5712* Parser Function:: How to call @code{yyparse} and what it returns.
5713* Push Parser Function:: How to call @code{yypush_parse} and what it returns.
5714* Pull Parser Function:: How to call @code{yypull_parse} and what it returns.
5715* Parser Create Function:: How to call @code{yypstate_new} and what it returns.
* Parser Delete Function:: How to call @code{yypstate_delete}.
5717* Lexical:: You must supply a function @code{yylex}
5718 which reads tokens.
5719* Error Reporting:: You must supply a function @code{yyerror}.
5720* Action Features:: Special features for use in actions.
5721* Internationalization:: How to let the parser speak in the user's
5722 native language.
bfa74976
RS
5723@end menu
5724
342b8b6e 5725@node Parser Function
bfa74976
RS
5726@section The Parser Function @code{yyparse}
5727@findex yyparse
5728
5729You call the function @code{yyparse} to cause parsing to occur. This
5730function reads tokens, executes actions, and ultimately returns when it
5731encounters end-of-input or an unrecoverable syntax error. You can also
5732write an action which directs @code{yyparse} to return immediately
5733without reading further.
bfa74976 5734
2a8d363a
AD
5735
5736@deftypefun int yyparse (void)
bfa74976
RS
5737The value returned by @code{yyparse} is 0 if parsing was successful (return
5738is due to end-of-input).
5739
5740The value is 1 if parsing failed because of invalid input, i.e., input
5741that contains a syntax error or that causes @code{YYABORT} to be
5742invoked.
5743
5744The value is 2 if parsing failed due to memory exhaustion.
2a8d363a 5745@end deftypefun
bfa74976
RS
5746
5747In an action, you can cause immediate return from @code{yyparse} by using
5748these macros:
5749
2a8d363a 5750@defmac YYACCEPT
bfa74976
RS
5751@findex YYACCEPT
5752Return immediately with value 0 (to report success).
2a8d363a 5753@end defmac
bfa74976 5754
2a8d363a 5755@defmac YYABORT
bfa74976
RS
5756@findex YYABORT
5757Return immediately with value 1 (to report failure).
2a8d363a
AD
5758@end defmac
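
A caller can tell these outcomes apart by examining the return value.
The following is only a sketch: the messages are arbitrary, and
@code{puts} requires @code{<stdio.h>}:

@example
switch (yyparse ())
  @{
  case 0:
    puts ("input accepted");     /* End of input, or YYACCEPT. */
    break;
  case 1:
    puts ("invalid input");      /* Syntax error, or YYABORT. */
    break;
  case 2:
    puts ("memory exhausted");
    break;
  @}
@end example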
5759
5760If you use a reentrant parser, you can optionally pass additional
5761parameter information to it in a reentrant way. To do so, use the
5762declaration @code{%parse-param}:
5763
feeb0eda 5764@deffn {Directive} %parse-param @{@var{argument-declaration}@}
2a8d363a 5765@findex %parse-param
5766Declare that an argument declared by the braced-code
5767@var{argument-declaration} is an additional @code{yyparse} argument.
94175978 5768The @var{argument-declaration} is used when declaring
5769functions or prototypes. The last identifier in
5770@var{argument-declaration} must be the argument name.
2a8d363a
AD
5771@end deffn
5772
5773Here's an example. Write this in the parser:
5774
5775@example
feeb0eda
PE
5776%parse-param @{int *nastiness@}
5777%parse-param @{int *randomness@}
2a8d363a
AD
5778@end example
5779
5780@noindent
5781Then call the parser like this:
5782
5783@example
5784@{
5785 int nastiness, randomness;
5786 @dots{} /* @r{Store proper data in @code{nastiness} and @code{randomness}.} */
5787 value = yyparse (&nastiness, &randomness);
5788 @dots{}
5789@}
5790@end example
5791
5792@noindent
5793In the grammar actions, use expressions like this to refer to the data:
5794
5795@example
5796exp: @dots{} @{ @dots{}; *randomness += 1; @dots{} @}
5797@end example
5798
9987d1b3
JD
5799@node Push Parser Function
5800@section The Push Parser Function @code{yypush_parse}
5801@findex yypush_parse
5802
59da312b
JD
5803(The current push parsing interface is experimental and may evolve.
5804More user feedback will help to stabilize it.)
5805
f4101aa6 5806You call the function @code{yypush_parse} to parse a single token. This
5807function is available if either the @code{%define api.push-pull push} or
5808@code{%define api.push-pull both} declaration is used.
9987d1b3
JD
5809@xref{Push Decl, ,A Push Parser}.
5810
5811@deftypefun int yypush_parse (yypstate *yyps)
The value returned by @code{yypush_parse} is the same as for @code{yyparse},
with the following exception: it returns @code{YYPUSH_MORE} if more input is
required to finish parsing the grammar.
9987d1b3
JD
5815@end deftypefun
5816
5817@node Pull Parser Function
5818@section The Pull Parser Function @code{yypull_parse}
5819@findex yypull_parse
5820
59da312b
JD
5821(The current push parsing interface is experimental and may evolve.
5822More user feedback will help to stabilize it.)
5823
f4101aa6 5824You call the function @code{yypull_parse} to parse the rest of the input
f37495f6 5825stream. This function is available if the @code{%define api.push-pull both}
f4101aa6 5826declaration is used.
9987d1b3
JD
5827@xref{Push Decl, ,A Push Parser}.
5828
5829@deftypefun int yypull_parse (yypstate *yyps)
5830The value returned by @code{yypull_parse} is the same as for @code{yyparse}.
5831@end deftypefun
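
A minimal sketch of its use, assuming @code{%define api.push-pull both}
and a @code{yylex} that the parser can call in the usual way:

@example
yypstate *ps = yypstate_new ();
if (ps)
  @{
    /* Calls yylex as needed; the result is 0, 1 or 2, as for yyparse. */
    int status = yypull_parse (ps);
    yypstate_delete (ps);
  @}
@end example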
5832
5833@node Parser Create Function
@section The Parser Create Function @code{yypstate_new}
5835@findex yypstate_new
5836
59da312b
JD
5837(The current push parsing interface is experimental and may evolve.
5838More user feedback will help to stabilize it.)
5839
f4101aa6 5840You call the function @code{yypstate_new} to create a new parser instance.
5841This function is available if either the @code{%define api.push-pull push} or
5842@code{%define api.push-pull both} declaration is used.
9987d1b3
JD
5843@xref{Push Decl, ,A Push Parser}.
5844
34a41a93 5845@deftypefun {yypstate*} yypstate_new (void)
c781580d 5846The function will return a valid parser instance if there was memory available
5847or 0 if no memory was available.
5848In impure mode, it will also return 0 if a parser instance is currently
5849allocated.
9987d1b3
JD
5850@end deftypefun
5851
5852@node Parser Delete Function
@section The Parser Delete Function @code{yypstate_delete}
5854@findex yypstate_delete
5855
59da312b
JD
5856(The current push parsing interface is experimental and may evolve.
5857More user feedback will help to stabilize it.)
5858
You call the function @code{yypstate_delete} to delete a parser instance.
This function is available if either the @code{%define api.push-pull push} or
@code{%define api.push-pull both} declaration is used.
9987d1b3
JD
5862@xref{Push Decl, ,A Push Parser}.
5863
5864@deftypefun void yypstate_delete (yypstate *yyps)
5865This function will reclaim the memory associated with a parser instance.
5866After this call, you should no longer attempt to use the parser instance.
5867@end deftypefun
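
Putting the push interface together, a driver loop for an impure push
parser might look like the sketch below. It assumes that the grammar
declares @code{%define api.push-pull push}, that @code{yylex} is an
ordinary lexical analyzer supplied by you, and that the lookahead token
is handed to the parser through the global variable @code{yychar}. A
pure push parser is expected to take the token as an argument instead
(@pxref{Push Decl, ,A Push Parser}). The function name
@code{parse_input} is hypothetical.

@example
extern int yychar;

int
parse_input (void)
@{
  int status;
  yypstate *ps = yypstate_new ();
  if (!ps)
    return 2;                      /* No parser instance available. */
  do
    @{
      yychar = yylex ();           /* Fetch one token ... */
      status = yypush_parse (ps);  /* ... and feed it to the parser. */
    @}
  while (status == YYPUSH_MORE);
  yypstate_delete (ps);
  return status;
@}
@end example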
bfa74976 5868
342b8b6e 5869@node Lexical
bfa74976
RS
5870@section The Lexical Analyzer Function @code{yylex}
5871@findex yylex
5872@cindex lexical analyzer
5873
5874The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
5875the input stream and returns them to the parser. Bison does not create
5876this function automatically; you must write it so that @code{yyparse} can
5877call it. The function is sometimes referred to as a lexical scanner.
5878
9913d6e4
JD
5879In simple programs, @code{yylex} is often defined at the end of the
5880Bison grammar file. If @code{yylex} is defined in a separate source
5881file, you need to arrange for the token-type macro definitions to be
5882available there. To do this, use the @samp{-d} option when you run
5883Bison, so that it will write these macro definitions into the separate
5884parser header file, @file{@var{name}.tab.h}, which you can include in
5885the other source files that need it. @xref{Invocation, ,Invoking
5886Bison}.
bfa74976
RS
5887
5888@menu
5889* Calling Convention:: How @code{yyparse} calls @code{yylex}.
f56274a8
DJ
5890* Token Values:: How @code{yylex} must return the semantic value
5891 of the token it has read.
5892* Token Locations:: How @code{yylex} must return the text location
5893 (line number, etc.) of the token, if the
5894 actions want that.
5895* Pure Calling:: How the calling convention differs in a pure parser
5896 (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
bfa74976
RS
5897@end menu
5898
342b8b6e 5899@node Calling Convention
bfa74976
RS
5900@subsection Calling Convention for @code{yylex}
5901
72d2299c
PE
5902The value that @code{yylex} returns must be the positive numeric code
5903for the type of token it has just found; a zero or negative value
5904signifies end-of-input.
bfa74976
RS
5905
5906When a token is referred to in the grammar rules by a name, that name
5907in the parser implementation file becomes a C macro whose definition
5908is the proper numeric code for that token type. So @code{yylex} can
5909use the name to indicate that type. @xref{Symbols}.
bfa74976
RS
5910
5911When a token is referred to in the grammar rules by a character literal,
5912the numeric code for that character is also the code for the token type.
5913So @code{yylex} can simply return that character code, possibly converted
5914to @code{unsigned char} to avoid sign-extension. The null character
5915must not be used this way, because its code is zero and that
5916signifies end-of-input.
5917
5918Here is an example showing these things:
5919
@example
int
yylex (void)
@{
  @dots{}
  if (c == EOF)    /* Detect end-of-input. */
    return 0;
  @dots{}
  if (c == '+' || c == '-')
    return c;      /* Assume token type for `+' is '+'. */
  @dots{}
  return INT;      /* Return the type of the token. */
  @dots{}
@}
@end example
5935
5936@noindent
5937This interface has been designed so that the output from the @code{lex}
5938utility can be used without change as the definition of @code{yylex}.
5939
931c7513
RS
5940If the grammar uses literal string tokens, there are two ways that
5941@code{yylex} can determine the token type codes for them:
5942
5943@itemize @bullet
5944@item
5945If the grammar defines symbolic token names as aliases for the
5946literal string tokens, @code{yylex} can use these symbolic names like
5947all others. In this case, the use of the literal string tokens in
5948the grammar file has no effect on @code{yylex}.
5949
5950@item
9ecbd125 5951@code{yylex} can find the multicharacter token in the @code{yytname}
931c7513 5952table. The index of the token in the table is the token type's code.
9ecbd125 5953The name of a multicharacter token is recorded in @code{yytname} with a
931c7513 5954double-quote, the token's characters, and another double-quote. The
5955token's characters are escaped as necessary to be suitable as input
5956to Bison.
931c7513 5957
9e0876fb
PE
5958Here's code for looking up a multicharacter token in @code{yytname},
5959assuming that the characters of the token are stored in
5960@code{token_buffer}, and assuming that the token does not contain any
5961characters like @samp{"} that require escaping.
931c7513 5962
@example
for (i = 0; i < YYNTOKENS; i++)
  @{
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && ! strncmp (yytname[i] + 1, token_buffer,
                      strlen (token_buffer))
        && yytname[i][strlen (token_buffer) + 1] == '"'
        && yytname[i][strlen (token_buffer) + 2] == 0)
      break;
  @}
@end example
931c7513
RS
5975
5976The @code{yytname} table is generated only if you use the
8c9a50be 5977@code{%token-table} declaration. @xref{Decl Summary}.
931c7513
RS
5978@end itemize
5979
342b8b6e 5980@node Token Values
bfa74976
RS
5981@subsection Semantic Values of Tokens
5982
5983@vindex yylval
9d9b8b70 5984In an ordinary (nonreentrant) parser, the semantic value of the token must
5985be stored into the global variable @code{yylval}. When you are using
5986just one data type for semantic values, @code{yylval} has that type.
5987Thus, if the type is @code{int} (the default), you might write this in
5988@code{yylex}:
5989
5990@example
5991@group
5992 @dots{}
72d2299c
PE
5993 yylval = value; /* Put value onto Bison stack. */
5994 return INT; /* Return the type of the token. */
bfa74976
RS
5995 @dots{}
5996@end group
5997@end example
5998
5999When you are using multiple data types, @code{yylval}'s type is a union
6000made from the @code{%union} declaration (@pxref{Union Decl, ,The
6001Collection of Value Types}). So when you store a token's value, you
6002must use the proper member of the union. If the @code{%union}
6003declaration looks like this:
bfa74976
RS
6004
6005@example
6006@group
6007%union @{
6008 int intval;
6009 double val;
6010 symrec *tptr;
6011@}
6012@end group
6013@end example
6014
6015@noindent
6016then the code in @code{yylex} might look like this:
6017
6018@example
6019@group
6020 @dots{}
72d2299c
PE
6021 yylval.intval = value; /* Put value onto Bison stack. */
6022 return INT; /* Return the type of the token. */
bfa74976
RS
6023 @dots{}
6024@end group
6025@end example
6026
95923bd6
AD
6027@node Token Locations
6028@subsection Textual Locations of Tokens
bfa74976
RS
6029
6030@vindex yylloc
7404cdf3
JD
6031If you are using the @samp{@@@var{n}}-feature (@pxref{Tracking Locations})
6032in actions to keep track of the textual locations of tokens and groupings,
6033then you must provide this information in @code{yylex}. The function
6034@code{yyparse} expects to find the textual location of a token just parsed
6035in the global variable @code{yylloc}. So @code{yylex} must store the proper
6036data in that variable.
847bf1f5
AD
6037
6038By default, the value of @code{yylloc} is a structure and you need only
6039initialize the members that are going to be used by the actions. The
6040four members are called @code{first_line}, @code{first_column},
6041@code{last_line} and @code{last_column}. Note that the use of this
6042feature makes the parser noticeably slower.
bfa74976
RS
6043
6044@tindex YYLTYPE
6045The data type of @code{yylloc} has the name @code{YYLTYPE}.
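
For example, a lexical analyzer that treats every character as a token
could maintain @code{yylloc} as sketched here. This assumes the default
@code{YYLTYPE} structure and an ordinary (nonreentrant) parser, and it
leaves aside how @code{yylloc} is initialized before the first token:

@example
int
yylex (void)
@{
  int c = getchar ();

  /* The new token starts where the previous one ended. */
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;

  if (c == EOF)
    return 0;

  /* Advance the end position past the character just read. */
  if (c == '\n')
    @{
      yylloc.last_line++;
      yylloc.last_column = 1;
    @}
  else
    yylloc.last_column++;

  return c;
@}
@end example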
6046
342b8b6e 6047@node Pure Calling
c656404a 6048@subsection Calling Conventions for Pure Parsers
bfa74976 6049
d9df47b6 6050When you use the Bison declaration @code{%define api.pure} to request a
6051pure, reentrant parser, the global communication variables @code{yylval}
6052and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
6053Parser}.) In such parsers the two global variables are replaced by
6054pointers passed as arguments to @code{yylex}. You must declare them as
6055shown here, and pass the information back by storing it through those
6056pointers.
bfa74976
RS
6057
6058@example
13863333
AD
6059int
6060yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
bfa74976
RS
6061@{
6062 @dots{}
6063 *lvalp = value; /* Put value onto Bison stack. */
6064 return INT; /* Return the type of the token. */
6065 @dots{}
6066@}
6067@end example
6068
6069If the grammar file does not use the @samp{@@} constructs to refer to
95923bd6 6070textual locations, then the type @code{YYLTYPE} will not be defined. In
6071this case, omit the second argument; @code{yylex} will be called with
6072only one argument.
6073
e425e872 6074
2a8d363a
AD
6075If you wish to pass the additional parameter data to @code{yylex}, use
6076@code{%lex-param} just like @code{%parse-param} (@pxref{Parser
6077Function}).
e425e872 6078
@deffn {Directive} %lex-param @{@var{argument-declaration}@}
2a8d363a 6080@findex %lex-param
287c78f6
PE
6081Declare that the braced-code @var{argument-declaration} is an
6082additional @code{yylex} argument declaration.
2a8d363a 6083@end deffn
e425e872 6084
2a8d363a 6085For instance:
e425e872
RS
6086
6087@example
feeb0eda
PE
6088%parse-param @{int *nastiness@}
6089%lex-param @{int *nastiness@}
6090%parse-param @{int *randomness@}
e425e872
RS
6091@end example
6092
6093@noindent
18ad57b3 6094results in the following signatures:
e425e872
RS
6095
6096@example
2a8d363a
AD
6097int yylex (int *nastiness);
6098int yyparse (int *nastiness, int *randomness);
e425e872
RS
6099@end example
6100
d9df47b6 6101If @code{%define api.pure} is added:
c656404a
RS
6102
6103@example
2a8d363a
AD
6104int yylex (YYSTYPE *lvalp, int *nastiness);
6105int yyparse (int *nastiness, int *randomness);
c656404a
RS
6106@end example
6107
2a8d363a 6108@noindent
d9df47b6 6109and finally, if both @code{%define api.pure} and @code{%locations} are used:
c656404a 6110
2a8d363a
AD
6111@example
6112int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
6113int yyparse (int *nastiness, int *randomness);
6114@end example
931c7513 6115
342b8b6e 6116@node Error Reporting
bfa74976
RS
6117@section The Error Reporting Function @code{yyerror}
6118@cindex error reporting function
6119@findex yyerror
6120@cindex parse error
6121@cindex syntax error
6122
6e649e65 6123The Bison parser detects a @dfn{syntax error} or @dfn{parse error}
9ecbd125 6124whenever it reads a token which cannot satisfy any syntax rule. An
bfa74976 6125action in the grammar can also explicitly proclaim an error, using the
6126macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use
6127in Actions}).
bfa74976
RS
6128
The Bison parser expects to report the error by calling an error
reporting function named @code{yyerror}, which you must supply. It is
called by @code{yyparse} whenever a syntax error is found, and it
receives one argument: a string describing the error. For a syntax error,
the string is normally @w{@code{"syntax error"}}.
bfa74976 6134
2a8d363a 6135@findex %error-verbose
6136If you invoke the directive @code{%error-verbose} in the Bison declarations
6137section (@pxref{Bison Declarations, ,The Bison Declarations Section}), then
6138Bison provides a more verbose and specific error message string instead of
6139just plain @w{@code{"syntax error"}}. However, that message sometimes
6140contains incorrect information if LAC is not enabled (@pxref{LAC}).
bfa74976 6141
6142The parser can detect one other kind of error: memory exhaustion. This
6143can happen when the input contains constructions that are very deeply
bfa74976 6144nested. It isn't likely you will encounter this, since the Bison
6145parser normally extends its stack automatically up to a very large limit. But
6146if memory is exhausted, @code{yyparse} calls @code{yyerror} in the usual
6147fashion, except that the argument string is @w{@code{"memory exhausted"}}.
6148
6149In some cases diagnostics like @w{@code{"syntax error"}} are
6150translated automatically from English to some other language before
6151they are passed to @code{yyerror}. @xref{Internationalization}.
bfa74976
RS
6152
6153The following definition suffices in simple programs:
6154
6155@example
6156@group
13863333 6157void
38a92d50 6158yyerror (char const *s)
bfa74976
RS
6159@{
6160@end group
6161@group
6162 fprintf (stderr, "%s\n", s);
6163@}
6164@end group
6165@end example
6166
6167After @code{yyerror} returns to @code{yyparse}, the latter will attempt
6168error recovery if you have written suitable error recovery grammar rules
6169(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
6170immediately return 1.
6171
In location-tracking pure parsers, @code{yyerror} should have access to
the current location. This is indeed the case for GLR parsers, but not
for Yacc parsers, for historical reasons. That is, if @samp{%locations
%define api.pure} is used, then the prototypes for @code{yyerror} are:
6178
6179@example
38a92d50
PE
6180void yyerror (char const *msg); /* Yacc parsers. */
6181void yyerror (YYLTYPE *locp, char const *msg); /* GLR parsers. */
2a8d363a
AD
6182@end example
6183
feeb0eda 6184If @samp{%parse-param @{int *nastiness@}} is used, then:
2a8d363a
AD
6185
6186@example
b317297e
PE
6187void yyerror (int *nastiness, char const *msg); /* Yacc parsers. */
6188void yyerror (int *nastiness, char const *msg); /* GLR parsers. */
2a8d363a
AD
6189@end example
6190
Finally, GLR and Yacc parsers share the same @code{yyerror} calling
convention for absolutely pure parsers, i.e., when both the calling
convention of @code{yylex} @emph{and} the calling convention of
@code{yyparse} are pure. For example:
2a8d363a
AD
6196
6197@example
6198/* Location tracking. */
6199%locations
6200/* Pure yylex. */
d9df47b6 6201%define api.pure
feeb0eda 6202%lex-param @{int *nastiness@}
2a8d363a 6203/* Pure yyparse. */
feeb0eda
PE
6204%parse-param @{int *nastiness@}
6205%parse-param @{int *randomness@}
2a8d363a
AD
6206@end example
6207
6208@noindent
6209results in the following signatures for all the parser kinds:
6210
6211@example
6212int yylex (YYSTYPE *lvalp, YYLTYPE *llocp, int *nastiness);
6213int yyparse (int *nastiness, int *randomness);
93724f13
AD
6214void yyerror (YYLTYPE *locp,
6215 int *nastiness, int *randomness,
38a92d50 6216 char const *msg);
2a8d363a
AD
6217@end example
6218
1c0c3e95 6219@noindent
6220The prototypes are only indications of how the code produced by Bison
6221uses @code{yyerror}. Bison-generated code always ignores the returned
6222value, so @code{yyerror} can return any type, including @code{void}.
6223Also, @code{yyerror} can be a variadic function; that is why the
6224message is always passed last.
6225
6226Traditionally @code{yyerror} returns an @code{int} that is always
6227ignored, but this is purely for historical reasons, and @code{void} is
6228preferable since it more accurately describes the return type for
6229@code{yyerror}.
93724f13 6230
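As a minimal sketch of the variadic form mentioned above, assume an
impure parser that uses neither locations nor @code{%parse-param}, so
that the message is the only fixed argument. Under that assumption,
@code{yyerror} could treat its message as a @code{printf}-style format.
The @code{last_error} buffer is a hypothetical addition made for
illustration; it is not something Bison provides.

```c
#include <stdarg.h>
#include <stdio.h>

/* Most recent formatted message, kept so that callers can inspect it.
   Hypothetical helper for this sketch; not part of Bison.  */
static char last_error[256];

/* Variadic yyerror: the message is the last fixed argument, so it can
   double as a printf-style format string.  */
void
yyerror (char const *msg, ...)
{
  va_list args;
  va_start (args, msg);
  /* vsnprintf truncates rather than overflowing the buffer.  */
  vsnprintf (last_error, sizeof last_error, msg, args);
  va_end (args);
  fprintf (stderr, "%s\n", last_error);
}
```

An action could then call, for instance, @code{yyerror ("line %d:
unexpected %s", line, name)}. Keep in mind that the generated parser
passes its own message as the sole argument, so with this sketch a
message that happens to contain a @samp{%} character would be misread as
a conversion specification.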
bfa74976
RS
6231@vindex yynerrs
6232The variable @code{yynerrs} contains the number of syntax errors
8a2800e7 6233reported so far. Normally this variable is global; but if you
704a47c4
AD
6234request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser})
6235then it is a local variable which only the actions can access.
bfa74976 6236
342b8b6e 6237@node Action Features
bfa74976
RS
6238@section Special Features for Use in Actions
6239@cindex summary, action features
6240@cindex action features summary
6241
6242Here is a table of Bison constructs, variables and macros that
6243are useful in actions.
6244
18b519c0 6245@deffn {Variable} $$
bfa74976
RS
6246Acts like a variable that contains the semantic value for the
6247grouping made by the current rule. @xref{Actions}.
18b519c0 6248@end deffn
bfa74976 6249
18b519c0 6250@deffn {Variable} $@var{n}
bfa74976
RS
6251Acts like a variable that contains the semantic value for the
6252@var{n}th component of the current rule. @xref{Actions}.
18b519c0 6253@end deffn
bfa74976 6254
18b519c0 6255@deffn {Variable} $<@var{typealt}>$
bfa74976 6256Like @code{$$} but specifies alternative @var{typealt} in the union
704a47c4
AD
6257specified by the @code{%union} declaration. @xref{Action Types, ,Data
6258Types of Values in Actions}.
18b519c0 6259@end deffn
bfa74976 6260
18b519c0 6261@deffn {Variable} $<@var{typealt}>@var{n}
bfa74976 6262Like @code{$@var{n}} but specifies alternative @var{typealt} in the
13863333 6263union specified by the @code{%union} declaration.
e0c471a9 6264@xref{Action Types, ,Data Types of Values in Actions}.
18b519c0 6265@end deffn
bfa74976 6266
34a41a93 6267@deffn {Macro} YYABORT @code{;}
bfa74976
RS
6268Return immediately from @code{yyparse}, indicating failure.
6269@xref{Parser Function, ,The Parser Function @code{yyparse}}.
18b519c0 6270@end deffn
bfa74976 6271
34a41a93 6272@deffn {Macro} YYACCEPT @code{;}
bfa74976
RS
6273Return immediately from @code{yyparse}, indicating success.
6274@xref{Parser Function, ,The Parser Function @code{yyparse}}.
18b519c0 6275@end deffn
bfa74976 6276
34a41a93 6277@deffn {Macro} YYBACKUP (@var{token}, @var{value})@code{;}
bfa74976
RS
6278@findex YYBACKUP
6279Unshift a token. This macro is allowed only for rules that reduce
742e4900 6280a single value, and only when there is no lookahead token.
35430378 6281It is also disallowed in GLR parsers.
742e4900 6282It installs a lookahead token with token type @var{token} and
bfa74976
RS
6283semantic value @var{value}; then it discards the value that was
6284going to be reduced by this rule.
6285
6286If the macro is used when it is not valid, such as when there is
742e4900 6287a lookahead token already, then it reports a syntax error with
bfa74976
RS
6288a message @samp{cannot back up} and performs ordinary error
6289recovery.
6290
6291In either case, the rest of the action is not executed.
18b519c0 6292@end deffn
bfa74976 6293
18b519c0 6294@deffn {Macro} YYEMPTY
742e4900 6295Value stored in @code{yychar} when there is no lookahead token.
18b519c0 6296@end deffn
bfa74976 6297
32c29292 6298@deffn {Macro} YYEOF
742e4900 6299Value stored in @code{yychar} when the lookahead is the end of the input
32c29292
JD
6300stream.
6301@end deffn
6302
34a41a93 6303@deffn {Macro} YYERROR @code{;}
bfa74976
RS
6304Cause an immediate syntax error. This statement initiates error
6305recovery just as if the parser itself had detected an error; however, it
6306does not call @code{yyerror}, and does not print any message. If you
6307want to print an error message, call @code{yyerror} explicitly before
6308the @samp{YYERROR;} statement. @xref{Error Recovery}.
18b519c0 6309@end deffn
bfa74976 6310
18b519c0 6311@deffn {Macro} YYRECOVERING
02103984
PE
6312@findex YYRECOVERING
6313The expression @code{YYRECOVERING ()} yields 1 when the parser
6314is recovering from a syntax error, and 0 otherwise.
bfa74976 6315@xref{Error Recovery}.
18b519c0 6316@end deffn
bfa74976 6317
18b519c0 6318@deffn {Variable} yychar
742e4900
JD
6319Variable containing either the lookahead token, or @code{YYEOF} when the
6320lookahead is the end of the input stream, or @code{YYEMPTY} when no lookahead
32c29292
JD
6321has been performed so the next token is not yet known.
6322Do not modify @code{yychar} in a deferred semantic action (@pxref{GLR Semantic
6323Actions}).
742e4900 6324@xref{Lookahead, ,Lookahead Tokens}.
18b519c0 6325@end deffn
bfa74976 6326
34a41a93 6327@deffn {Macro} yyclearin @code{;}
742e4900 6328Discard the current lookahead token. This is useful primarily in
32c29292
JD
6329error rules.
6330Do not invoke @code{yyclearin} in a deferred semantic action (@pxref{GLR
6331Semantic Actions}).
6332@xref{Error Recovery}.
18b519c0 6333@end deffn
bfa74976 6334
34a41a93 6335@deffn {Macro} yyerrok @code{;}
bfa74976 6336Resume generating error messages immediately for subsequent syntax
13863333 6337errors. This is useful primarily in error rules.
bfa74976 6338@xref{Error Recovery}.
18b519c0 6339@end deffn
bfa74976 6340
32c29292 6341@deffn {Variable} yylloc
742e4900 6342Variable containing the lookahead token location when @code{yychar} is not set
32c29292
JD
6343to @code{YYEMPTY} or @code{YYEOF}.
6344Do not modify @code{yylloc} in a deferred semantic action (@pxref{GLR Semantic
6345Actions}).
6346@xref{Actions and Locations, ,Actions and Locations}.
6347@end deffn
6348
6349@deffn {Variable} yylval
742e4900 6350Variable containing the lookahead token semantic value when @code{yychar} is
32c29292
JD
6351not set to @code{YYEMPTY} or @code{YYEOF}.
6352Do not modify @code{yylval} in a deferred semantic action (@pxref{GLR Semantic
6353Actions}).
6354@xref{Actions, ,Actions}.
6355@end deffn
6356
18b519c0 6357@deffn {Value} @@$
847bf1f5 6358@findex @@$
7404cdf3
JD
6359Acts like a structure variable containing information on the textual
6360location of the grouping made by the current rule. @xref{Tracking
6361Locations}.
bfa74976 6362
847bf1f5
AD
6363@c Check if those paragraphs are still useful or not.
6364
6365@c @example
6366@c struct @{
6367@c int first_line, last_line;
6368@c int first_column, last_column;
6369@c @};
6370@c @end example
6371
6372@c Thus, to get the starting line number of the third component, you would
6373@c use @samp{@@3.first_line}.
bfa74976 6374
847bf1f5
AD
6375@c In order for the members of this structure to contain valid information,
6376@c you must make @code{yylex} supply this information about each token.
6377@c If you need only certain members, then @code{yylex} need only fill in
6378@c those members.
bfa74976 6379
847bf1f5 6380@c The use of this feature makes the parser noticeably slower.
18b519c0 6381@end deffn
847bf1f5 6382
18b519c0 6383@deffn {Value} @@@var{n}
847bf1f5 6384@findex @@@var{n}
7404cdf3
JD
6385Acts like a structure variable containing information on the textual
6386location of the @var{n}th component of the current rule. @xref{Tracking
6387Locations}.
18b519c0 6388@end deffn
bfa74976 6389
f7ab6a50
PE
6390@node Internationalization
6391@section Parser Internationalization
6392@cindex internationalization
6393@cindex i18n
6394@cindex NLS
6395@cindex gettext
6396@cindex bison-po
6397
6398A Bison-generated parser can print diagnostics, including error and
6399tracing messages. By default, they appear in English. However, Bison
f8e1c9e5
AD
6400also supports outputting diagnostics in the user's native language. To
6401make this work, the user should set the usual environment variables.
6402@xref{Users, , The User's View, gettext, GNU @code{gettext} utilities}.
6403For example, the shell command @samp{export LC_ALL=fr_CA.UTF-8} might
35430378 6404set the user's locale to French Canadian using the UTF-8
f7ab6a50
PE
6405encoding. The exact set of available locales depends on the user's
6406installation.
6407
6408The maintainer of a package that uses a Bison-generated parser enables
6409the internationalization of the parser's output through the following
35430378
JD
6410steps. Here we assume a package that uses GNU Autoconf and
6411GNU Automake.
f7ab6a50
PE
6412
6413@enumerate
6414@item
30757c8c 6415@cindex bison-i18n.m4
35430378 6416Into the directory containing the GNU Autoconf macros used
f7ab6a50
PE
6417by the package---often called @file{m4}---copy the
6418@file{bison-i18n.m4} file installed by Bison under
6419@samp{share/aclocal/bison-i18n.m4} in Bison's installation directory.
6420For example:
6421
6422@example
6423cp /usr/local/share/aclocal/bison-i18n.m4 m4/bison-i18n.m4
6424@end example
6425
6426@item
30757c8c
PE
6427@findex BISON_I18N
6428@vindex BISON_LOCALEDIR
6429@vindex YYENABLE_NLS
f7ab6a50
PE
6430In the top-level @file{configure.ac}, after the @code{AM_GNU_GETTEXT}
6431invocation, add an invocation of @code{BISON_I18N}. This macro is
6432defined in the file @file{bison-i18n.m4} that you copied earlier. It
6433causes @samp{configure} to find the value of the
30757c8c
PE
6434@code{BISON_LOCALEDIR} variable, and it defines the source-language
6435symbol @code{YYENABLE_NLS} to enable translations in the
6436Bison-generated parser.
f7ab6a50
PE
6437
6438@item
6439In the @code{main} function of your program, designate the directory
6440containing Bison's runtime message catalog, through a call to
6441@samp{bindtextdomain} with domain name @samp{bison-runtime}.
6442For example:
6443
6444@example
6445bindtextdomain ("bison-runtime", BISON_LOCALEDIR);
6446@end example
6447
6448Typically this appears after any other call to @code{bindtextdomain
6449(PACKAGE, LOCALEDIR)} that your package already has. Here we rely on
6450@samp{BISON_LOCALEDIR} being defined as a string through the
6451@file{Makefile}.
6452
6453@item
6454In the @file{Makefile.am} that controls the compilation of the @code{main}
6455function, make @samp{BISON_LOCALEDIR} available as a C preprocessor macro,
6456either in @samp{DEFS} or in @samp{AM_CPPFLAGS}. For example:
6457
6458@example
6459DEFS = @@DEFS@@ -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
6460@end example
6461
6462or:
6463
6464@example
6465AM_CPPFLAGS = -DBISON_LOCALEDIR='"$(BISON_LOCALEDIR)"'
6466@end example
6467
6468@item
6469Finally, invoke the command @command{autoreconf} to generate the build
6470infrastructure.
6471@end enumerate
6472
bfa74976 6473
342b8b6e 6474@node Algorithm
13863333
AD
6475@chapter The Bison Parser Algorithm
6476@cindex Bison parser algorithm
bfa74976
RS
6477@cindex algorithm of parser
6478@cindex shifting
6479@cindex reduction
6480@cindex parser stack
6481@cindex stack, parser
6482
6483As Bison reads tokens, it pushes them onto a stack along with their
6484semantic values. The stack is called the @dfn{parser stack}. Pushing a
6485token is traditionally called @dfn{shifting}.
6486
6487For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
6488@samp{3} to come. The stack will have four elements, one for each token
6489that was shifted.
6490
6491But the stack does not always have an element for each token read. When
6492the last @var{n} tokens and groupings shifted match the components of a
6493grammar rule, they can be combined according to that rule. This is called
6494@dfn{reduction}. Those tokens and groupings are replaced on the stack by a
6495single grouping whose symbol is the result (left hand side) of that rule.
6496Running the rule's action is part of the process of reduction, because this
6497is what computes the semantic value of the resulting grouping.
6498
6499For example, if the infix calculator's parser stack contains this:
6500
6501@example
65021 + 5 * 3
6503@end example
6504
6505@noindent
6506and the next input token is a newline character, then the last three
6507elements can be reduced to 15 via the rule:
6508
6509@example
6510expr: expr '*' expr;
6511@end example
6512
6513@noindent
6514Then the stack contains just these three elements:
6515
6516@example
65171 + 15
6518@end example
6519
6520@noindent
6521At this point, another reduction can be made, resulting in the single value
652216. Then the newline token can be shifted.
6523
6524The parser tries, by shifts and reductions, to reduce the entire input down
6525to a single grouping whose symbol is the grammar's start-symbol
6526(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
6527
6528This kind of parser is known in the literature as a bottom-up parser.
6529
6530@menu
742e4900 6531* Lookahead:: Parser looks one token ahead when deciding what to do.
bfa74976
RS
6532* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
6533* Precedence:: Operator precedence works by resolving conflicts.
6534* Contextual Precedence:: When an operator's precedence depends on context.
6535* Parser States:: The parser is a finite-state machine with a stack.
6536* Reduce/Reduce:: When two rules are applicable in the same situation.
5da0355a 6537* Mysterious Conflicts:: Conflicts that look unjustified.
6f04ee6c 6538* Tuning LR:: How to tune fundamental aspects of LR-based parsing.
676385e2 6539* Generalized LR Parsing:: Parsing arbitrary context-free grammars.
1a059451 6540* Memory Management:: What happens when memory is exhausted. How to avoid it.
bfa74976
RS
6541@end menu
6542
742e4900
JD
6543@node Lookahead
6544@section Lookahead Tokens
6545@cindex lookahead token
bfa74976
RS
6546
6547The Bison parser does @emph{not} always reduce immediately as soon as the
6548last @var{n} tokens and groupings match a rule. This is because such a
6549simple strategy is inadequate to handle most languages. Instead, when a
6550reduction is possible, the parser sometimes ``looks ahead'' at the next
6551token in order to decide what to do.
6552
6553When a token is read, it is not immediately shifted; first it becomes the
742e4900 6554@dfn{lookahead token}, which is not on the stack. Now the parser can
bfa74976 6555perform one or more reductions of tokens and groupings on the stack, while
742e4900
JD
6556the lookahead token remains off to the side. When no more reductions
6557should take place, the lookahead token is shifted onto the stack. This
bfa74976 6558does not mean that all possible reductions have been done; depending on the
742e4900 6559token type of the lookahead token, some rules may choose to delay their
bfa74976
RS
6560application.
6561
742e4900 6562Here is a simple case where lookahead is needed. These three rules define
bfa74976
RS
6563expressions which contain binary addition operators and postfix unary
6564factorial operators (@samp{!}), and allow parentheses for grouping.
6565
6566@example
6567@group
de6be119
AD
6568expr:
6569 term '+' expr
6570| term
6571;
bfa74976
RS
6572@end group
6573
6574@group
de6be119
AD
6575term:
6576 '(' expr ')'
6577| term '!'
534cee7a 6578| "number"
de6be119 6579;
bfa74976
RS
6580@end group
6581@end example
6582
6583Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
6584should be done? If the following token is @samp{)}, then the first three
6585tokens must be reduced to form an @code{expr}. This is the only valid
6586course, because shifting the @samp{)} would produce a sequence of symbols
6587@w{@code{term ')'}}, and no rule allows this.
6588
6589If the following token is @samp{!}, then it must be shifted immediately so
6590that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
6591parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
6592@code{expr}. It would then be impossible to shift the @samp{!} because
6593doing so would produce on the stack the sequence of symbols @code{expr
6594'!'}. No rule allows that sequence.
6595
6596@vindex yychar
32c29292
JD
6597@vindex yylval
6598@vindex yylloc
742e4900 6599The lookahead token is stored in the variable @code{yychar}.
32c29292
JD
6600Its semantic value and location, if any, are stored in the variables
6601@code{yylval} and @code{yylloc}.
bfa74976
RS
6602@xref{Action Features, ,Special Features for Use in Actions}.
6603
342b8b6e 6604@node Shift/Reduce
bfa74976
RS
6605@section Shift/Reduce Conflicts
6606@cindex conflicts
6607@cindex shift/reduce conflicts
6608@cindex dangling @code{else}
6609@cindex @code{else}, dangling
6610
6611Suppose we are parsing a language which has if-then and if-then-else
6612statements, with a pair of rules like this:
6613
6614@example
6615@group
6616if_stmt:
534cee7a
AD
6617 "if" expr "then" stmt
6618| "if" expr "then" stmt "else" stmt
de6be119 6619;
bfa74976
RS
6620@end group
6621@end example
6622
6623@noindent
534cee7a
AD
6624Here @code{"if"}, @code{"then"} and @code{"else"} are terminal symbols for
6625specific keyword tokens.
bfa74976 6626
534cee7a 6627When the @code{"else"} token is read and becomes the lookahead token, the
bfa74976
RS
6628contents of the stack (assuming the input is valid) are just right for
6629reduction by the first rule. But it is also legitimate to shift the
534cee7a 6630@code{"else"}, because that would lead to eventual reduction by the second
bfa74976
RS
6631rule.
6632
6633This situation, where either a shift or a reduction would be valid, is
6634called a @dfn{shift/reduce conflict}. Bison is designed to resolve
6635these conflicts by choosing to shift, unless otherwise directed by
6636operator precedence declarations. To see the reason for this, let's
6637contrast it with the other alternative.
6638
534cee7a 6639Since the parser prefers to shift the @code{"else"}, the result is to attach
bfa74976
RS
6640the else-clause to the innermost if-statement, making these two inputs
6641equivalent:
6642
6643@example
534cee7a 6644if x then if y then win; else lose;
bfa74976 6645
534cee7a 6646if x then do; if y then win; else lose; end;
bfa74976
RS
6647@end example
6648
6649But if the parser chose to reduce when possible rather than shift, the
6650result would be to attach the else-clause to the outermost if-statement,
6651making these two inputs equivalent:
6652
6653@example
534cee7a 6654if x then if y then win; else lose;
bfa74976 6655
534cee7a 6656if x then do; if y then win; end; else lose;
bfa74976
RS
6657@end example
6658
6659The conflict exists because the grammar as written is ambiguous: either
6660parsing of the simple nested if-statement is legitimate. The established
6661convention is that these ambiguities are resolved by attaching the
6662else-clause to the innermost if-statement; this is what Bison accomplishes
6663by choosing to shift rather than reduce. (It would ideally be cleaner to
6664write an unambiguous grammar, but that is very hard to do in this case.)
6665This particular ambiguity was first encountered in the specifications of
6666Algol 60 and is called the ``dangling @code{else}'' ambiguity.
6667
6668To avoid warnings from Bison about predictable, legitimate shift/reduce
c28cd5dc 6669conflicts, you can use the @code{%expect @var{n}} declaration.
cf22447c
JD
6670There will be no warning as long as the number of shift/reduce conflicts
6671is exactly @var{n}, and Bison will report an error if there is a
6672different number.
c28cd5dc
AD
6673@xref{Expect Decl, ,Suppressing Conflict Warnings}. However, we don't
6674recommend the use of @code{%expect} (except @samp{%expect 0}!), as an equal
6675number of conflicts does not mean that they are the @emph{same}. When
6676possible, you should rather use precedence directives to @emph{fix} the
6677conflicts explicitly (@pxref{Non Operators,, Using Precedence For Non
6678Operators}).
bfa74976
RS
6679
6680The definition of @code{if_stmt} above is solely to blame for the
6681conflict, but the conflict does not actually appear without additional
9913d6e4
JD
6682rules. Here is a complete Bison grammar file that actually manifests
6683the conflict:
bfa74976
RS
6684
6685@example
6686@group
bfa74976
RS
6687%%
6688@end group
6689@group
de6be119
AD
6690stmt:
6691 expr
6692| if_stmt
6693;
bfa74976
RS
6694@end group
6695
6696@group
6697if_stmt:
534cee7a
AD
6698 "if" expr "then" stmt
6699| "if" expr "then" stmt "else" stmt
de6be119 6700;
bfa74976
RS
6701@end group
6702
de6be119 6703expr:
534cee7a 6704 "identifier"
de6be119 6705;
bfa74976
RS
6706@end example
6707
342b8b6e 6708@node Precedence
bfa74976
RS
6709@section Operator Precedence
6710@cindex operator precedence
6711@cindex precedence of operators
6712
6713Another situation where shift/reduce conflicts appear is in arithmetic
6714expressions. Here shifting is not always the preferred resolution; the
6715Bison declarations for operator precedence allow you to specify when to
6716shift and when to reduce.
6717
6718@menu
6719* Why Precedence:: An example showing why precedence is needed.
6720* Using Precedence:: How to specify precedence in Bison grammars.
6721* Precedence Examples:: How these features are used in the previous example.
6722* How Precedence:: How they work.
c28cd5dc 6723* Non Operators:: Using precedence for general conflicts.
bfa74976
RS
6724@end menu
6725
342b8b6e 6726@node Why Precedence
bfa74976
RS
6727@subsection When Precedence is Needed
6728
6729Consider the following ambiguous grammar fragment (ambiguous because the
6730input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
6731
6732@example
6733@group
de6be119
AD
6734expr:
6735 expr '-' expr
6736| expr '*' expr
6737| expr '<' expr
6738| '(' expr ')'
6739@dots{}
6740;
bfa74976
RS
6741@end group
6742@end example
6743
6744@noindent
6745Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
14ded682
AD
6746should it reduce them via the rule for the subtraction operator? It
6747depends on the next token. Of course, if the next token is @samp{)}, we
6748must reduce; shifting is invalid because no single rule can reduce the
6749token sequence @w{@samp{- 2 )}} or anything starting with that. But if
6750the next token is @samp{*} or @samp{<}, we have a choice: either
6751shifting or reduction would allow the parse to complete, but with
6752different results.
6753
6754To decide which one Bison should do, we must consider the results. If
6755the next operator token @var{op} is shifted, then it must be reduced
6756first in order to permit another opportunity to reduce the difference.
6757The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other
6758hand, if the subtraction is reduced before shifting @var{op}, the result
6759is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or
6760reduce should depend on the relative precedence of the operators
6761@samp{-} and @var{op}: @samp{*} should be shifted first, but not
6762@samp{<}.
bfa74976
RS
6763
6764@cindex associativity
6765What about input such as @w{@samp{1 - 2 - 5}}; should this be
14ded682
AD
6766@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most
6767operators we prefer the former, which is called @dfn{left association}.
6768The latter alternative, @dfn{right association}, is desirable for
6769assignment operators. The choice of left or right association is a
6770matter of whether the parser chooses to shift or reduce when the stack
742e4900 6771contains @w{@samp{1 - 2}} and the lookahead token is @samp{-}: shifting
14ded682 6772makes the operator right-associative, while reducing makes it left-associative.
bfa74976 6773
342b8b6e 6774@node Using Precedence
bfa74976
RS
6775@subsection Specifying Operator Precedence
6776@findex %left
6777@findex %right
6778@findex %nonassoc
6779
6780Bison allows you to specify these choices with the operator precedence
6781declarations @code{%left} and @code{%right}. Each such declaration
6782contains a list of tokens, which are operators whose precedence and
6783associativity are being declared. The @code{%left} declaration makes all
6784those operators left-associative and the @code{%right} declaration makes
6785them right-associative. A third alternative is @code{%nonassoc}, which
6786declares that it is a syntax error to find the same operator twice ``in a
6787row''.
6788
6789The relative precedence of different operators is controlled by the
6790order in which they are declared. The first @code{%left} or
6791@code{%right} declaration in the file declares the operators whose
6792precedence is lowest, the next such declaration declares the operators
6793whose precedence is a little higher, and so on.
6794
342b8b6e 6795@node Precedence Examples
bfa74976
RS
6796@subsection Precedence Examples
6797
6798In our example, we would want the following declarations:
6799
6800@example
6801%left '<'
6802%left '-'
6803%left '*'
6804@end example
6805
6806In a more complete example, which supports other operators as well, we
6807would declare them in groups of equal precedence. For example, @code{'+'} is
6808declared with @code{'-'}:
6809
6810@example
534cee7a 6811%left '<' '>' '=' "!=" "<=" ">="
bfa74976
RS
6812%left '+' '-'
6813%left '*' '/'
6814@end example
6815
342b8b6e 6816@node How Precedence
bfa74976
RS
6817@subsection How Precedence Works
6818
6819The first effect of the precedence declarations is to assign precedence
6820levels to the terminal symbols declared. The second effect is to assign
704a47c4
AD
6821precedence levels to certain rules: each rule gets its precedence from
6822the last terminal symbol mentioned in the components. (You can also
6823specify explicitly the precedence of a rule. @xref{Contextual
6824Precedence, ,Context-Dependent Precedence}.)
6825
6826Finally, the resolution of conflicts works by comparing the precedence
742e4900 6827of the rule being considered with that of the lookahead token. If the
704a47c4
AD
6828token's precedence is higher, the choice is to shift. If the rule's
6829precedence is higher, the choice is to reduce. If they have equal
6830precedence, the choice is made based on the associativity of that
6831precedence level. The verbose output file made by @samp{-v}
6832(@pxref{Invocation, ,Invoking Bison}) says how each conflict was
6833resolved.
bfa74976
RS
6834
6835Not all rules and not all tokens have precedence. If either the rule or
742e4900 6836the lookahead token has no precedence, then the default is to shift.
bfa74976 6837
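The comparison described above can be sketched in C as a toy
shift-reduce evaluator. This is an illustration only, not the code that
Bison generates. The names @code{prec}, @code{apply} and
@code{shift_reduce_eval} are hypothetical. The operators behave as if
declared with @code{%left '+' '-'} followed by @code{%left '*' '/'}. As
a simplification, the end of input is given precedence 0 so that it
forces all pending reductions. The input is assumed to be well formed.

```c
#include <ctype.h>

/* Precedence level of a binary operator.  0 stands for the end of
   input (a simplification made for this sketch).  */
static int
prec (char op)
{
  switch (op)
    {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

/* The "action" run when a rule `expr: expr OP expr' is reduced.  */
static long
apply (long lhs, char op, long rhs)
{
  switch (op)
    {
    case '+': return lhs + rhs;
    case '-': return lhs - rhs;
    case '*': return lhs * rhs;
    default:  return lhs / rhs;
    }
}

/* Evaluate INPUT, such as "1 + 5 * 3", by shifting and reducing.  */
long
shift_reduce_eval (char const *input)
{
  long values[32];              /* Semantic values on the stack.  */
  char ops[32];                 /* Operators on the stack.  */
  int nvalues = 0, nops = 0;
  char const *p = input;

  for (;;)
    {
      /* Shift a number.  */
      while (*p == ' ')
        p++;
      long n = 0;
      while (isdigit ((unsigned char) *p))
        n = n * 10 + (*p++ - '0');
      values[nvalues++] = n;

      /* The next operator becomes the lookahead; it is not shifted yet.  */
      while (*p == ' ')
        p++;
      char lookahead = *p;

      /* The precedence of the rule is that of the operator on the stack.
         Reduce while it is higher than the lookahead's, or equal to it
         (left associativity); otherwise fall through and shift.  */
      while (nops > 0 && prec (ops[nops - 1]) >= prec (lookahead))
        {
          long rhs = values[--nvalues];
          long lhs = values[--nvalues];
          values[nvalues++] = apply (lhs, ops[--nops], rhs);
        }

      if (lookahead == '\0')
        return values[0];
      ops[nops++] = lookahead;  /* Shift the operator.  */
      p++;
    }
}
```

With the input @samp{1 + 5 * 3}, the @samp{*} is shifted because its
precedence exceeds that of the pending addition, so the multiplication
is reduced first. Replacing @code{>=} with @code{>} in the inner loop
would make operators of equal precedence right-associative instead.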
c28cd5dc
AD
6838@node Non Operators
6839@subsection Using Precedence For Non Operators
6840
6841Using precedence and associativity directives properly can help fix
6842shift/reduce conflicts that do not involve arithmetic-like operators. For
6843instance, the ``dangling @code{else}'' problem (@pxref{Shift/Reduce, ,
6844Shift/Reduce Conflicts}) can be solved elegantly in two different ways.
6845
6846In the present case, the conflict is between shifting the token
6847@code{"else"} and reducing by the rule @samp{if_stmt: "if" expr "then" stmt}.
6848By default, the precedence of a rule is that of its last
6849token, here @code{"then"}, so the conflict will be solved appropriately
6850by giving @code{"else"} a precedence higher than that of @code{"then"}, for
6851instance as follows:
6852
6853@example
6854@group
6855%nonassoc "then"
6856%nonassoc "else"
6857@end group
6858@end example
6859
6860Alternatively, you may give both tokens the same precedence, in which case
6861associativity is used to solve the conflict. To preserve the shift action,
6862use right associativity:
6863
6864@example
6865%right "then" "else"
6866@end example
6867
6868Neither solution is perfect, however. Since Bison does not provide, so far,
6869support for ``scoped'' precedence, both force you to declare the precedence
6870of these keywords with respect to the other operators in your grammar.
6871Therefore, instead of being warned about new conflicts that you might not
6872notice (e.g., a shift/reduce conflict due to @samp{if test then 1 else 2 + 3}
6873being ambiguous: @samp{if test then 1 else (2 + 3)} or @samp{(if test then 1
6874else 2) + 3}?), you will find that the conflict is already silently ``fixed''.
6875
342b8b6e 6876@node Contextual Precedence
bfa74976
RS
6877@section Context-Dependent Precedence
6878@cindex context-dependent precedence
6879@cindex unary operator precedence
6880@cindex precedence, context-dependent
6881@cindex precedence, unary operator
6882@findex %prec
6883
6884Often the precedence of an operator depends on the context. This sounds
6885outlandish at first, but it is really very common. For example, a minus
6886sign typically has a very high precedence as a unary operator, and a
6887somewhat lower precedence (lower than multiplication) as a binary operator.
6888
6889The Bison precedence declarations, @code{%left}, @code{%right} and
6890@code{%nonassoc}, can only be used once for a given token; so a token has
6891only one precedence declared in this way. For context-dependent
6892precedence, you need to use an additional mechanism: the @code{%prec}
e0c471a9 6893modifier for rules.
bfa74976
RS
6894
6895The @code{%prec} modifier declares the precedence of a particular rule by
6896specifying a terminal symbol whose precedence should be used for that rule.
6897It's not necessary for that symbol to appear otherwise in the rule. The
6898modifier's syntax is:
6899
6900@example
6901%prec @var{terminal-symbol}
6902@end example
6903
6904@noindent
6905and it is written after the components of the rule. Its effect is to
6906assign the rule the precedence of @var{terminal-symbol}, overriding
6907the precedence that would be deduced for it in the ordinary way. The
6908altered rule precedence then affects how conflicts involving that rule
6909are resolved (@pxref{Precedence, ,Operator Precedence}).
6910
6911Here is how @code{%prec} solves the problem of unary minus. First, declare
6912a precedence for a fictitious terminal symbol named @code{UMINUS}. There
6913are no tokens of this type, but the symbol serves to stand for its
6914precedence:
6915
6916@example
6917@dots{}
6918%left '+' '-'
6919%left '*'
6920%left UMINUS
6921@end example
6922
6923Now the precedence of @code{UMINUS} can be used in specific rules:
6924
6925@example
6926@group
de6be119
AD
6927exp:
6928 @dots{}
6929| exp '-' exp
6930 @dots{}
6931| '-' exp %prec UMINUS
bfa74976
RS
6932@end group
6933@end example
6934
91d2c560 6935@ifset defaultprec
39a06c25
PE
6936If you forget to append @code{%prec UMINUS} to the rule for unary
6937minus, Bison silently assumes that minus has its usual precedence.
6938This kind of problem can be tricky to debug, since one typically
6939discovers the mistake only by testing the code.
6940
22fccf95 6941The @code{%no-default-prec;} declaration makes it easier to discover
39a06c25
PE
6942this kind of problem systematically. It causes rules that lack a
6943@code{%prec} modifier to have no precedence, even if the last terminal
6944symbol mentioned in their components has a declared precedence.
6945
22fccf95 6946If @code{%no-default-prec;} is in effect, you must specify @code{%prec}
39a06c25
PE
6947for all rules that participate in precedence conflict resolution.
6948Then you will see any shift/reduce conflict until you tell Bison how
6949to resolve it, either by changing your grammar or by adding an
6950explicit precedence. This will probably add declarations to the
6951grammar, but it helps to protect against incorrect rule precedences.
6952
22fccf95
PE
6953The effect of @code{%no-default-prec;} can be reversed by giving
6954@code{%default-prec;}, which is the default.
91d2c560 6955@end ifset
39a06c25 6956
342b8b6e 6957@node Parser States
bfa74976
RS
6958@section Parser States
6959@cindex finite-state machine
6960@cindex parser state
6961@cindex state (of parser)
6962
6963The function @code{yyparse} is implemented using a finite-state machine.
6964The values pushed on the parser stack are not simply token type codes; they
6965represent the entire sequence of terminal and nonterminal symbols at or
6966near the top of the stack. The current state collects all the information
6967about previous input which is relevant to deciding what to do next.
6968
742e4900
JD
6969Each time a lookahead token is read, the current parser state together
6970with the type of lookahead token are looked up in a table. This table
6971entry can say, ``Shift the lookahead token.'' In this case, it also
bfa74976
RS
6972specifies the new parser state, which is pushed onto the top of the
6973parser stack. Or it can say, ``Reduce using rule number @var{n}.''
6974This means that a certain number of tokens or groupings are taken off
6975the top of the stack, and replaced by one grouping. In other words,
6976that number of states are popped from the stack, and one new state is
6977pushed.
6978
742e4900 6979There is one other alternative: the table can say that the lookahead token
bfa74976
RS
6980is erroneous in the current state. This causes error processing to begin
6981(@pxref{Error Recovery}).
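
As an illustrative sketch (the grammar and the token @code{NUM} are
hypothetical), the comment below traces, conceptually, the actions that a
parser for a tiny grammar would take on the input @samp{NUM '+' NUM}.
The exact states and the moment at which each token is fetched depend on
the generated tables.

@example
@group
%token NUM
%%
sum:
  NUM            /* rule 1 */
| sum '+' NUM    /* rule 2 */
;
@end group

@group
/* Next input token   Action
   NUM                shift: push a new state
   '+'                reduce using rule 1: pop one state, push one
   '+'                shift: push a new state
   NUM                shift: push a new state
   end of input       reduce using rule 2: pop three states, push one
   end of input       accept  */
@end group
@end example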
6982
342b8b6e 6983@node Reduce/Reduce
bfa74976
RS
6984@section Reduce/Reduce Conflicts
6985@cindex reduce/reduce conflict
6986@cindex conflicts, reduce/reduce
6987
6988A reduce/reduce conflict occurs if there are two or more rules that apply
6989to the same sequence of input. This usually indicates a serious error
6990in the grammar.
6991
6992For example, here is an erroneous attempt to define a sequence
6993of zero or more @code{word} groupings.
6994
6995@example
98842516 6996@group
de6be119
AD
6997sequence:
6998 /* empty */ @{ printf ("empty sequence\n"); @}
6999| maybeword
7000| sequence word @{ printf ("added word %s\n", $2); @}
7001;
98842516 7002@end group
bfa74976 7003
98842516 7004@group
de6be119
AD
7005maybeword:
7006 /* empty */ @{ printf ("empty maybeword\n"); @}
7007| word @{ printf ("single word %s\n", $1); @}
7008;
98842516 7009@end group
bfa74976
RS
7010@end example
7011
7012@noindent
7013The error is an ambiguity: there is more than one way to parse a single
7014@code{word} into a @code{sequence}. It could be reduced to a
7015@code{maybeword} and then into a @code{sequence} via the second rule.
7016Alternatively, nothing-at-all could be reduced into a @code{sequence}
7017via the first rule, and this could be combined with the @code{word}
7018using the third rule for @code{sequence}.
7019
7020There is also more than one way to reduce nothing-at-all into a
7021@code{sequence}. This can be done directly via the first rule,
7022or indirectly via @code{maybeword} and then the second rule.
7023
7024You might think that this is a distinction without a difference, because it
7025does not change whether any particular input is valid or not. But it does
7026affect which actions are run. One parsing order runs the second rule's
7027action; the other runs the first rule's action and the third rule's action.
7028In this example, the output of the program changes.
7029
7030Bison resolves a reduce/reduce conflict by choosing to use the rule that
7031appears first in the grammar, but it is very risky to rely on this. Every
7032reduce/reduce conflict must be studied and usually eliminated. Here is the
7033proper way to define @code{sequence}:
7034
7035@example
51356dd2 7036@group
de6be119
AD
7037sequence:
7038 /* empty */ @{ printf ("empty sequence\n"); @}
7039| sequence word @{ printf ("added word %s\n", $2); @}
7040;
51356dd2 7041@end group
bfa74976
RS
7042@end example
7043
7044Here is another common error that yields a reduce/reduce conflict:
7045
7046@example
@group
sequence:
  /* empty */
| sequence words
| sequence redirects
;
@end group
bfa74976 7054
51356dd2 7055@group
de6be119
AD
7056words:
7057 /* empty */
7058| words word
7059;
51356dd2 7060@end group
bfa74976 7061
51356dd2 7062@group
de6be119
AD
7063redirects:
7064 /* empty */
7065| redirects redirect
7066;
51356dd2 7067@end group
bfa74976
RS
7068@end example
7069
7070@noindent
7071The intention here is to define a sequence which can contain either
7072@code{word} or @code{redirect} groupings. The individual definitions of
7073@code{sequence}, @code{words} and @code{redirects} are error-free, but the
7074three together make a subtle ambiguity: even an empty input can be parsed
7075in infinitely many ways!
7076
7077Consider: nothing-at-all could be a @code{words}. Or it could be two
7078@code{words} in a row, or three, or any number. It could equally well be a
7079@code{redirects}, or two, or any number. Or it could be a @code{words}
7080followed by three @code{redirects} and another @code{words}. And so on.
7081
7082Here are two ways to correct these rules. First, to make it a single level
7083of sequence:
7084
7085@example
de6be119
AD
7086sequence:
7087 /* empty */
7088| sequence word
7089| sequence redirect
7090;
bfa74976
RS
7091@end example
7092
7093Second, to prevent either a @code{words} or a @code{redirects}
7094from being empty:
7095
7096@example
98842516 7097@group
de6be119
AD
7098sequence:
7099 /* empty */
7100| sequence words
7101| sequence redirects
7102;
98842516 7103@end group
bfa74976 7104
98842516 7105@group
de6be119
AD
7106words:
7107 word
7108| words word
7109;
98842516 7110@end group
bfa74976 7111
98842516 7112@group
de6be119
AD
7113redirects:
7114 redirect
7115| redirects redirect
7116;
98842516 7117@end group
bfa74976
RS
7118@end example
7119
Yet this proposal introduces another kind of ambiguity! The input
@samp{word word} can be parsed as a single @code{words} composed of two
@samp{word}s, or as two one-@code{word} @code{words} (and likewise for
@code{redirect}/@code{redirects}). However, this ambiguity is now a
shift/reduce conflict, and therefore it can be addressed with precedence
directives.
7126
7127To simplify the matter, we will proceed with @code{word} and @code{redirect}
7128being tokens: @code{"word"} and @code{"redirect"}.
7129
To prefer the longest @code{words}, the conflict between the token
@code{"word"} and the rule @samp{sequence: sequence words} must be resolved
as a shift. To this end, we use the same techniques as presented above
(@pxref{Non Operators,, Using Precedence For Non Operators}). One solution
relies on precedences: use @code{%prec} to give a lower precedence to the
rule:
7136
@example
%nonassoc "sequence"
%nonassoc "word"
%%
@group
sequence:
  /* empty */
| sequence words      %prec "sequence"
| sequence redirects  %prec "sequence"
;
@end group

@group
words:
  "word"
| words "word"
;
@end group
@end example
7156
7157Another solution relies on associativity: provide both the token and the
7158rule with the same precedence, but make them right-associative:
7159
@example
%right "word" "redirect"
%%
@group
sequence:
  /* empty */
| sequence words      %prec "word"
| sequence redirects  %prec "redirect"
;
@end group
@end example
7171
5da0355a
JD
7172@node Mysterious Conflicts
7173@section Mysterious Conflicts
6f04ee6c 7174@cindex Mysterious Conflicts
bfa74976
RS
7175
7176Sometimes reduce/reduce conflicts can occur that don't look warranted.
7177Here is an example:
7178
7179@example
7180@group
bfa74976 7181%%
de6be119 7182def: param_spec return_spec ',';
bfa74976 7183param_spec:
de6be119
AD
7184 type
7185| name_list ':' type
7186;
bfa74976
RS
7187@end group
7188@group
7189return_spec:
de6be119
AD
7190 type
7191| name ':' type
7192;
bfa74976
RS
7193@end group
7194@group
534cee7a 7195type: "id";
bfa74976
RS
7196@end group
7197@group
534cee7a 7198name: "id";
bfa74976 7199name_list:
de6be119
AD
7200 name
7201| name ',' name_list
7202;
bfa74976
RS
7203@end group
7204@end example
7205
534cee7a
AD
7206It would seem that this grammar can be parsed with only a single token of
7207lookahead: when a @code{param_spec} is being read, an @code{"id"} is a
7208@code{name} if a comma or colon follows, or a @code{type} if another
7209@code{"id"} follows. In other words, this grammar is LR(1).
bfa74976 7210
6f04ee6c
JD
7211@cindex LR
7212@cindex LALR
34a6c2d1 7213However, for historical reasons, Bison cannot by default handle all
35430378 7214LR(1) grammars.
534cee7a 7215In this grammar, two contexts, that after an @code{"id"} at the beginning
34a6c2d1
JD
7216of a @code{param_spec} and likewise at the beginning of a
7217@code{return_spec}, are similar enough that Bison assumes they are the
7218same.
7219They appear similar because the same set of rules would be
bfa74976
RS
7220active---the rule for reducing to a @code{name} and that for reducing to
7221a @code{type}. Bison is unable to determine at that stage of processing
742e4900 7222that the rules would require different lookahead tokens in the two
bfa74976
RS
7223contexts, so it makes a single parser state for them both. Combining
7224the two contexts causes a conflict later. In parser terminology, this
35430378 7225occurrence means that the grammar is not LALR(1).
bfa74976 7226
6f04ee6c
JD
7227@cindex IELR
7228@cindex canonical LR
7229For many practical grammars (specifically those that fall into the non-LR(1)
7230class), the limitations of LALR(1) result in difficulties beyond just
7231mysterious reduce/reduce conflicts. The best way to fix all these problems
7232is to select a different parser table construction algorithm. Either
7233IELR(1) or canonical LR(1) would suffice, but the former is more efficient
7234and easier to debug during development. @xref{LR Table Construction}, for
7235details. (Bison's IELR(1) and canonical LR(1) implementations are
7236experimental. More user feedback will help to stabilize them.)
34a6c2d1 7237
35430378 7238If you instead wish to work around LALR(1)'s limitations, you
34a6c2d1
JD
7239can often fix a mysterious conflict by identifying the two parser states
7240that are being confused, and adding something to make them look
7241distinct. In the above example, adding one rule to
bfa74976
RS
7242@code{return_spec} as follows makes the problem go away:
7243
7244@example
7245@group
bfa74976
RS
7246@dots{}
7247return_spec:
de6be119
AD
7248 type
7249| name ':' type
534cee7a 7250| "id" "bogus" /* This rule is never used. */
de6be119 7251;
bfa74976
RS
7252@end group
7253@end example
7254
7255This corrects the problem because it introduces the possibility of an
534cee7a 7256additional active rule in the context after the @code{"id"} at the beginning of
bfa74976
RS
7257@code{return_spec}. This rule is not active in the corresponding context
7258in a @code{param_spec}, so the two contexts receive distinct parser states.
534cee7a 7259As long as the token @code{"bogus"} is never generated by @code{yylex},
bfa74976
RS
7260the added rule cannot alter the way actual input is parsed.
7261
7262In this particular example, there is another way to solve the problem:
534cee7a 7263rewrite the rule for @code{return_spec} to use @code{"id"} directly
bfa74976
RS
7264instead of via @code{name}. This also causes the two confusing
7265contexts to have different sets of active rules, because the one for
7266@code{return_spec} activates the altered rule for @code{return_spec}
7267rather than the one for @code{name}.
7268
7269@example
7270param_spec:
de6be119
AD
7271 type
7272| name_list ':' type
7273;
bfa74976 7274return_spec:
de6be119 7275 type
534cee7a 7276| "id" ':' type
de6be119 7277;
bfa74976
RS
7278@end example
7279
35430378 7280For a more detailed exposition of LALR(1) parsers and parser
71caec06 7281generators, @pxref{Bibliography,,DeRemer 1982}.
e054b190 7282
6f04ee6c
JD
7283@node Tuning LR
7284@section Tuning LR
7285
7286The default behavior of Bison's LR-based parsers is chosen mostly for
7287historical reasons, but that behavior is often not robust. For example, in
7288the previous section, we discussed the mysterious conflicts that can be
7289produced by LALR(1), Bison's default parser table construction algorithm.
Another example is Bison's @code{%error-verbose} directive, which instructs
the generated parser to produce verbose syntax error messages. Those
messages can sometimes contain incorrect information.
7293
7294In this section, we explore several modern features of Bison that allow you
7295to tune fundamental aspects of the generated LR-based parsers. Some of
7296these features easily eliminate shortcomings like those mentioned above.
7297Others can be helpful purely for understanding your parser.
7298
7299Most of the features discussed in this section are still experimental. More
7300user feedback will help to stabilize them.
7301
7302@menu
7303* LR Table Construction:: Choose a different construction algorithm.
7304* Default Reductions:: Disable default reductions.
7305* LAC:: Correct lookahead sets in the parser states.
7306* Unreachable States:: Keep unreachable parser states for debugging.
7307@end menu
7308
7309@node LR Table Construction
7310@subsection LR Table Construction
7311@cindex Mysterious Conflict
7312@cindex LALR
7313@cindex IELR
7314@cindex canonical LR
7315@findex %define lr.type
7316
7317For historical reasons, Bison constructs LALR(1) parser tables by default.
7318However, LALR does not possess the full language-recognition power of LR.
7319As a result, the behavior of parsers employing LALR parser tables is often
5da0355a 7320mysterious. We presented a simple example of this effect in @ref{Mysterious
6f04ee6c
JD
7321Conflicts}.
7322
7323As we also demonstrated in that example, the traditional approach to
7324eliminating such mysterious behavior is to restructure the grammar.
7325Unfortunately, doing so correctly is often difficult. Moreover, merely
7326discovering that LALR causes mysterious behavior in your parser can be
7327difficult as well.
7328
7329Fortunately, Bison provides an easy way to eliminate the possibility of such
7330mysterious behavior altogether. You simply need to activate a more powerful
7331parser table construction algorithm by using the @code{%define lr.type}
7332directive.
7333
7334@deffn {Directive} {%define lr.type @var{TYPE}}
7335Specify the type of parser tables within the LR(1) family. The accepted
7336values for @var{TYPE} are:
7337
7338@itemize
7339@item @code{lalr} (default)
7340@item @code{ielr}
7341@item @code{canonical-lr}
7342@end itemize
7343
7344(This feature is experimental. More user feedback will help to stabilize
7345it.)
7346@end deffn
7347
For example, to activate IELR, you might add the following directive to your
grammar file:
7350
7351@example
7352%define lr.type ielr
7353@end example
7354
5da0355a 7355@noindent For the example in @ref{Mysterious Conflicts}, the mysterious
6f04ee6c
JD
7356conflict is then eliminated, so there is no need to invest time in
7357comprehending the conflict or restructuring the grammar to fix it. If,
7358during future development, the grammar evolves such that all mysterious
7359behavior would have disappeared using just LALR, you need not fear that
7360continuing to use IELR will result in unnecessarily large parser tables.
7361That is, IELR generates LALR tables when LALR (using a deterministic parsing
7362algorithm) is sufficient to support the full language-recognition power of
7363LR. Thus, by enabling IELR at the start of grammar development, you can
7364safely and completely eliminate the need to consider LALR's shortcomings.
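
As a sketch of where the directive goes, here is the grammar from
@ref{Mysterious Conflicts} written out as a complete grammar file. The
token name @code{ID} in the @code{%token} declaration is an assumption
made for illustration; only its string alias @code{"id"} appears in the
original example.

@example
@group
%define lr.type ielr
%token ID "id"
%%
def: param_spec return_spec ',';
@end group
@group
param_spec:
  type
| name_list ':' type
;
@end group
@group
return_spec:
  type
| name ':' type
;
@end group
@group
type: "id";
name: "id";
name_list:
  name
| name ',' name_list
;
@end group
@end example

@noindent
The rules themselves are unchanged. Only the choice of parser table
construction algorithm differs from the default.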
7365
7366While IELR is almost always preferable, there are circumstances where LALR
7367or the canonical LR parser tables described by Knuth
7368(@pxref{Bibliography,,Knuth 1965}) can be useful. Here we summarize the
7369relative advantages of each parser table construction algorithm within
7370Bison:
7371
7372@itemize
7373@item LALR
7374
7375There are at least two scenarios where LALR can be worthwhile:
7376
7377@itemize
7378@item GLR without static conflict resolution.
7379
7380@cindex GLR with LALR
7381When employing GLR parsers (@pxref{GLR Parsers}), if you do not resolve any
7382conflicts statically (for example, with @code{%left} or @code{%prec}), then
7383the parser explores all potential parses of any given input. In this case,
7384the choice of parser table construction algorithm is guaranteed not to alter
7385the language accepted by the parser. LALR parser tables are the smallest
7386parser tables Bison can currently construct, so they may then be preferable.
7387Nevertheless, once you begin to resolve conflicts statically, GLR behaves
7388more like a deterministic parser in the syntactic contexts where those
7389conflicts appear, and so either IELR or canonical LR can then be helpful to
7390avoid LALR's mysterious behavior.
7391
7392@item Malformed grammars.
7393
7394Occasionally during development, an especially malformed grammar with a
7395major recurring flaw may severely impede the IELR or canonical LR parser
7396table construction algorithm. LALR can be a quick way to construct parser
7397tables in order to investigate such problems while ignoring the more subtle
7398differences from IELR and canonical LR.
7399@end itemize
7400
7401@item IELR
7402
7403IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given
7404any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables
7405always accept exactly the same set of sentences. However, like LALR, IELR
7406merges parser states during parser table construction so that the number of
7407parser states is often an order of magnitude less than for canonical LR.
7408More importantly, because canonical LR's extra parser states may contain
7409duplicate conflicts in the case of non-LR grammars, the number of conflicts
7410for IELR is often an order of magnitude less as well. This effect can
7411significantly reduce the complexity of developing a grammar.
7412
7413@item Canonical LR
7414
7415@cindex delayed syntax error detection
7416@cindex LAC
7417@findex %nonassoc
7418While inefficient, canonical LR parser tables can be an interesting means to
7419explore a grammar because they possess a property that IELR and LALR tables
7420do not. That is, if @code{%nonassoc} is not used and default reductions are
7421left disabled (@pxref{Default Reductions}), then, for every left context of
7422every canonical LR state, the set of tokens accepted by that state is
7423guaranteed to be the exact set of tokens that is syntactically acceptable in
7424that left context. It might then seem that an advantage of canonical LR
7425parsers in production is that, under the above constraints, they are
7426guaranteed to detect a syntax error as soon as possible without performing
7427any unnecessary reductions. However, IELR parsers that use LAC are also
7428able to achieve this behavior without sacrificing @code{%nonassoc} or
7429default reductions. For details and a few caveats of LAC, @pxref{LAC}.
7430@end itemize
7431
7432For a more detailed exposition of the mysterious behavior in LALR parsers
7433and the benefits of IELR, @pxref{Bibliography,,Denny 2008 March}, and
7434@ref{Bibliography,,Denny 2010 November}.
7435
7436@node Default Reductions
7437@subsection Default Reductions
7438@cindex default reductions
7439@findex %define lr.default-reductions
7440@findex %nonassoc
7441
7442After parser table construction, Bison identifies the reduction with the
7443largest lookahead set in each parser state. To reduce the size of the
7444parser state, traditional Bison behavior is to remove that lookahead set and
7445to assign that reduction to be the default parser action. Such a reduction
7446is known as a @dfn{default reduction}.
7447
7448Default reductions affect more than the size of the parser tables. They
7449also affect the behavior of the parser:
7450
7451@itemize
7452@item Delayed @code{yylex} invocations.
7453
7454@cindex delayed yylex invocations
7455@cindex consistent states
7456@cindex defaulted states
7457A @dfn{consistent state} is a state that has only one possible parser
7458action. If that action is a reduction and is encoded as a default
7459reduction, then that consistent state is called a @dfn{defaulted state}.
7460Upon reaching a defaulted state, a Bison-generated parser does not bother to
7461invoke @code{yylex} to fetch the next token before performing the reduction.
7462In other words, whether default reductions are enabled in consistent states
7463determines how soon a Bison-generated parser invokes @code{yylex} for a
7464token: immediately when it @emph{reaches} that token in the input or when it
7465eventually @emph{needs} that token as a lookahead to determine the next
7466parser action. Traditionally, default reductions are enabled, and so the
7467parser exhibits the latter behavior.
7468
7469The presence of defaulted states is an important consideration when
7470designing @code{yylex} and the grammar file. That is, if the behavior of
7471@code{yylex} can influence or be influenced by the semantic actions
7472associated with the reductions in defaulted states, then the delay of the
7473next @code{yylex} invocation until after those reductions is significant.
7474For example, the semantic actions might pop a scope stack that @code{yylex}
7475uses to determine what token to return. Thus, the delay might be necessary
7476to ensure that @code{yylex} does not look up the next token in a scope that
7477should already be considered closed.
7478
7479@item Delayed syntax error detection.
7480
7481@cindex delayed syntax error detection
7482When the parser fetches a new token by invoking @code{yylex}, it checks
7483whether there is an action for that token in the current parser state. The
7484parser detects a syntax error if and only if either (1) there is no action
7485for that token or (2) the action for that token is the error action (due to
7486the use of @code{%nonassoc}). However, if there is a default reduction in
7487that state (which might or might not be a defaulted state), then it is
7488impossible for condition 1 to exist. That is, all tokens have an action.
7489Thus, the parser sometimes fails to detect the syntax error until it reaches
7490a later state.
7491
7492@cindex LAC
7493@c If there's an infinite loop, default reductions can prevent an incorrect
7494@c sentence from being rejected.
7495While default reductions never cause the parser to accept syntactically
7496incorrect sentences, the delay of syntax error detection can have unexpected
7497effects on the behavior of the parser. However, the delay can be caused
7498anyway by parser state merging and the use of @code{%nonassoc}, and it can
7499be fixed by another Bison feature, LAC. We discuss the effects of delayed
7500syntax error detection and LAC more in the next section (@pxref{LAC}).
7501@end itemize
7502
7503For canonical LR, the only default reduction that Bison enables by default
7504is the accept action, which appears only in the accepting state, which has
7505no other action and is thus a defaulted state. However, the default accept
7506action does not delay any @code{yylex} invocation or syntax error detection
7507because the accept action ends the parse.
7508
7509For LALR and IELR, Bison enables default reductions in nearly all states by
7510default. There are only two exceptions. First, states that have a shift
7511action on the @code{error} token do not have default reductions because
7512delayed syntax error detection could then prevent the @code{error} token
7513from ever being shifted in that state. However, parser state merging can
7514cause the same effect anyway, and LAC fixes it in both cases, so future
7515versions of Bison might drop this exception when LAC is activated. Second,
7516GLR parsers do not record the default reduction as the action on a lookahead
7517token for which there is a conflict. The correct action in this case is to
7518split the parse instead.
7519
7520To adjust which states have default reductions enabled, use the
7521@code{%define lr.default-reductions} directive.
7522
7523@deffn {Directive} {%define lr.default-reductions @var{WHERE}}
7524Specify the kind of states that are permitted to contain default reductions.
7525The accepted values of @var{WHERE} are:
7526@itemize
a6e5a280 7527@item @code{most} (default for LALR and IELR)
6f04ee6c
JD
7528@item @code{consistent}
7529@item @code{accepting} (default for canonical LR)
7530@end itemize
7531
7532(The ability to specify where default reductions are permitted is
7533experimental. More user feedback will help to stabilize it.)
7534@end deffn
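
For instance, a grammar file might restrict default reductions to
consistent states as sketched below. Pairing it with
@code{%define lr.type ielr} is only an illustrative choice, not a
requirement.

@example
%define lr.type ielr
%define lr.default-reductions consistent
@end example

@noindent
With this setting, the parser still skips the @code{yylex} invocation in
defaulted states, but default reductions in inconsistent states no longer
delay syntax error detection.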
7535
6f04ee6c
JD
7536@node LAC
7537@subsection LAC
7538@findex %define parse.lac
7539@cindex LAC
7540@cindex lookahead correction
7541
7542Canonical LR, IELR, and LALR can suffer from a couple of problems upon
7543encountering a syntax error. First, the parser might perform additional
7544parser stack reductions before discovering the syntax error. Such
7545reductions can perform user semantic actions that are unexpected because
7546they are based on an invalid token, and they cause error recovery to begin
7547in a different syntactic context than the one in which the invalid token was
7548encountered. Second, when verbose error messages are enabled (@pxref{Error
7549Reporting}), the expected token list in the syntax error message can both
7550contain invalid tokens and omit valid tokens.
7551
7552The culprits for the above problems are @code{%nonassoc}, default reductions
7553in inconsistent states (@pxref{Default Reductions}), and parser state
7554merging. Because IELR and LALR merge parser states, they suffer the most.
7555Canonical LR can suffer only if @code{%nonassoc} is used or if default
7556reductions are enabled for inconsistent states.
7557
7558LAC (Lookahead Correction) is a new mechanism within the parsing algorithm
7559that solves these problems for canonical LR, IELR, and LALR without
7560sacrificing @code{%nonassoc}, default reductions, or state merging. You can
7561enable LAC with the @code{%define parse.lac} directive.
7562
7563@deffn {Directive} {%define parse.lac @var{VALUE}}
Enable LAC to improve syntax error handling. The accepted values for
@var{VALUE} are:
7565@itemize
7566@item @code{none} (default)
7567@item @code{full}
7568@end itemize
7569(This feature is experimental. More user feedback will help to stabilize
7570it. Moreover, it is currently only available for deterministic parsers in
7571C.)
7572@end deffn
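
As a minimal sketch, the following prologue enables LAC together with
verbose error messages in a deterministic C parser. Combining it with
IELR is an illustrative choice.

@example
%define lr.type ielr
%define parse.lac full
%error-verbose
@end example

@noindent
With these directives, the expected token list in a verbose syntax error
message is computed by exploratory parses rather than read directly from
the possibly merged parser state.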
7573
Conceptually, the LAC mechanism is straightforward. Whenever the parser
7575fetches a new token from the scanner so that it can determine the next
7576parser action, it immediately suspends normal parsing and performs an
7577exploratory parse using a temporary copy of the normal parser state stack.
7578During this exploratory parse, the parser does not perform user semantic
7579actions. If the exploratory parse reaches a shift action, normal parsing
7580then resumes on the normal parser stacks. If the exploratory parse reaches
7581an error instead, the parser reports a syntax error. If verbose syntax
7582error messages are enabled, the parser must then discover the list of
7583expected tokens, so it performs a separate exploratory parse for each token
7584in the grammar.
7585
7586There is one subtlety about the use of LAC. That is, when in a consistent
7587parser state with a default reduction, the parser will not attempt to fetch
7588a token from the scanner because no lookahead is needed to determine the
7589next parser action. Thus, whether default reductions are enabled in
7590consistent states (@pxref{Default Reductions}) affects how soon the parser
7591detects a syntax error: immediately when it @emph{reaches} an erroneous
7592token or when it eventually @emph{needs} that token as a lookahead to
7593determine the next parser action. The latter behavior is probably more
7594intuitive, so Bison currently provides no way to achieve the former behavior
7595while default reductions are enabled in consistent states.
7596
7597Thus, when LAC is in use, for some fixed decision of whether to enable
7598default reductions in consistent states, canonical LR and IELR behave almost
7599exactly the same for both syntactically acceptable and syntactically
7600unacceptable input. While LALR still does not support the full
7601language-recognition power of canonical LR and IELR, LAC at least enables
7602LALR's syntax error handling to correctly reflect LALR's
7603language-recognition power.
7604
7605There are a few caveats to consider when using LAC:
7606
7607@itemize
7608@item Infinite parsing loops.
7609
7610IELR plus LAC does have one shortcoming relative to canonical LR. Some
7611parsers generated by Bison can loop infinitely. LAC does not fix infinite
7612parsing loops that occur between encountering a syntax error and detecting
7613it, but enabling canonical LR or disabling default reductions sometimes
7614does.
7615
7616@item Verbose error message limitations.
7617
7618Because of internationalization considerations, Bison-generated parsers
7619limit the size of the expected token list they are willing to report in a
7620verbose syntax error message. If the number of expected tokens exceeds that
7621limit, the list is simply dropped from the message. Enabling LAC can
7622increase the size of the list and thus cause the parser to drop it. Of
7623course, dropping the list is better than reporting an incorrect list.
7624
7625@item Performance.
7626
7627Because LAC requires many parse actions to be performed twice, it can have a
7628performance penalty. However, not all parse actions must be performed
7629twice. Specifically, during a series of default reductions in consistent
7630states and shift actions, the parser never has to initiate an exploratory
7631parse. Moreover, the most time-consuming tasks in a parse are often the
7632file I/O, the lexical analysis performed by the scanner, and the user's
7633semantic actions, but none of these are performed during the exploratory
7634parse. Finally, the base of the temporary stack used during an exploratory
7635parse is a pointer into the normal parser state stack so that the stack is
7636never physically copied. In our experience, the performance penalty of LAC
56da1e52 7637has proved insignificant for practical grammars.
6f04ee6c
JD
7638@end itemize
7639
56706c61
JD
7640While the LAC algorithm shares techniques that have been recognized in the
7641parser community for years, for the publication that introduces LAC,
7642@pxref{Bibliography,,Denny 2010 May}.
121c4982 7643
6f04ee6c
JD
7644@node Unreachable States
7645@subsection Unreachable States
7646@findex %define lr.keep-unreachable-states
7647@cindex unreachable states
7648
7649If there exists no sequence of transitions from the parser's start state to
7650some state @var{s}, then Bison considers @var{s} to be an @dfn{unreachable
7651state}. A state can become unreachable during conflict resolution if Bison
7652disables a shift action leading to it from a predecessor state.
7653
7654By default, Bison removes unreachable states from the parser after conflict
7655resolution because they are useless in the generated parser. However,
7656keeping unreachable states is sometimes useful when trying to understand the
7657relationship between the parser and the grammar.
7658
7659@deffn {Directive} {%define lr.keep-unreachable-states @var{VALUE}}
7660Request that Bison allow unreachable states to remain in the parser tables.
7661@var{VALUE} must be a Boolean. The default is @code{false}.
7662@end deffn
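
For example, assuming the Boolean value is written unquoted in the same
way as the other @code{%define} values in this chapter, a grammar file
under analysis might contain:

@example
%define lr.keep-unreachable-states true
@end example

@noindent
Running Bison with the @option{--verbose} option then lets you inspect the
retained states in the generated report file.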
7663
7664There are a few caveats to consider:
7665
7666@itemize @bullet
7667@item Missing or extraneous warnings.
7668
7669Unreachable states may contain conflicts and may use rules not used in any
7670other state. Thus, keeping unreachable states may induce warnings that are
7671irrelevant to your parser's behavior, and it may eliminate warnings that are
7672relevant. Of course, the change in warnings may actually be relevant to a
7673parser table analysis that wants to keep unreachable states, so this
7674behavior will likely remain in future Bison releases.
7675
7676@item Other useless states.
7677
7678While Bison is able to remove unreachable states, it is not guaranteed to
7679remove other kinds of useless states. Specifically, when Bison disables
7680reduce actions during conflict resolution, some goto actions may become
7681useless, and thus some additional states may become useless. If Bison were
7682to compute which goto actions were useless and then disable those actions,
7683it could identify such states as unreachable and then remove those states.
7684However, Bison does not compute which goto actions are useless.
7685@end itemize
7686
fae437e8 7687@node Generalized LR Parsing
35430378
JD
7688@section Generalized LR (GLR) Parsing
7689@cindex GLR parsing
7690@cindex generalized LR (GLR) parsing
676385e2 7691@cindex ambiguous grammars
9d9b8b70 7692@cindex nondeterministic parsing
676385e2 7693
fae437e8
AD
7694Bison produces @emph{deterministic} parsers that choose uniquely
7695when to reduce and which reduction to apply
742e4900 7696based on a summary of the preceding input and on one extra token of lookahead.
676385e2
PH
7697As a result, normal Bison handles a proper subset of the family of
7698context-free languages.
Ambiguous grammars, since they have strings with more than one possible
sequence of reductions, cannot have deterministic parsers in this sense.
7701The same is true of languages that require more than one symbol of
742e4900 7702lookahead, since the parser lacks the information necessary to make a
676385e2 7703decision at the point it must be made in a shift-reduce parser.
5da0355a 7704Finally, as previously mentioned (@pxref{Mysterious Conflicts}),
34a6c2d1 7705there are languages where Bison's default choice of how to
676385e2
PH
7706summarize the input seen so far loses necessary information.
7707
7708When you use the @samp{%glr-parser} declaration in your grammar file,
7709Bison generates a parser that uses a different algorithm, called
35430378 7710Generalized LR (or GLR). A Bison GLR
c827f760 7711parser uses the same basic
676385e2
PH
7712algorithm for parsing as an ordinary Bison parser, but behaves
7713differently in cases where there is a shift-reduce conflict that has not
fae437e8 7714been resolved by precedence rules (@pxref{Precedence}) or a
reduce-reduce conflict. When a GLR parser encounters such a
situation, it
effectively @emph{splits} into several parsers, one for each possible
shift or reduction. These parsers then proceed as usual, consuming
tokens in lock-step. Some of the stacks may encounter other conflicts
and split further, with the result that instead of a sequence of states,
a Bison GLR parsing stack is in effect a tree of states.
676385e2
PH
7722
7723In effect, each stack represents a guess as to what the proper parse
7724is. Additional input may indicate that a guess was wrong, in which case
the appropriate stack silently disappears. Otherwise, the semantic
actions generated in each stack are saved, rather than being executed
676385e2 7727immediately. When a stack disappears, its saved semantic actions never
fae437e8 7728get executed. When a reduction causes two stacks to become equivalent,
676385e2
PH
7729their sets of semantic actions are both saved with the state that
7730results from the reduction. We say that two stacks are equivalent
fae437e8 7731when they both represent the same sequence of states,
676385e2
PH
7732and each pair of corresponding states represents a
7733grammar symbol that produces the same segment of the input token
7734stream.
7735
7736Whenever the parser makes a transition from having multiple
34a6c2d1 7737states to having one, it reverts to the normal deterministic parsing
676385e2
PH
7738algorithm, after resolving and executing the saved-up actions.
7739At this transition, some of the states on the stack will have semantic
7740values that are sets (actually multisets) of possible actions. The
7741parser tries to pick one of the actions by first finding one whose rule
7742has the highest dynamic precedence, as set by the @samp{%dprec}
declaration. Otherwise, if the alternative actions are not ordered by
precedence, but the same merging function is declared for both
rules by the @samp{%merge} declaration,
Bison resolves and evaluates both and then calls the merge function on
the result. Otherwise, it reports an ambiguity.
7748
35430378
JD
7749It is possible to use a data structure for the GLR parsing tree that
7750permits the processing of any LR(1) grammar in linear time (in the
c827f760 7751size of the input), any unambiguous (not necessarily
35430378 7752LR(1)) grammar in
fae437e8 7753quadratic worst-case time, and any general (possibly ambiguous)
676385e2
PH
7754context-free grammar in cubic worst-case time. However, Bison currently
7755uses a simpler data structure that requires time proportional to the
7756length of the input times the maximum number of stacks required for any
9d9b8b70 7757prefix of the input. Thus, really ambiguous or nondeterministic
676385e2
PH
7758grammars can require exponential time and space to process. Such badly
7759behaving examples, however, are not generally of practical interest.
9d9b8b70 7760Usually, nondeterminism in a grammar is local---the parser is ``in
676385e2 7761doubt'' only for a few tokens at a time. Therefore, the current data
35430378 7762structure should generally be adequate. On LR(1) portions of a
34a6c2d1 7763grammar, in particular, it is only slightly slower than with the
35430378 7764deterministic LR(1) Bison parser.
676385e2 7765
71caec06
JD
7766For a more detailed exposition of GLR parsers, @pxref{Bibliography,,Scott
77672000}.
f6481e2f 7768
1a059451
PE
7769@node Memory Management
7770@section Memory Management, and How to Avoid Memory Exhaustion
7771@cindex memory exhaustion
7772@cindex memory management
bfa74976
RS
7773@cindex stack overflow
7774@cindex parser stack overflow
7775@cindex overflow of parser stack
7776
1a059451 7777The Bison parser stack can run out of memory if too many tokens are shifted and
bfa74976 7778not reduced. When this happens, the parser function @code{yyparse}
1a059451 7779calls @code{yyerror} and then returns 2.
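
As a rough illustration, a caller can tell memory exhaustion apart from
other failures by examining the value returned by @code{yyparse}: 0 on
success, 1 on invalid input (or @code{YYABORT}), and 2 on memory
exhaustion. The helper name @code{describe_parse_status} is hypothetical
and is not part of Bison; this is only a sketch.

```c
/* Hypothetical helper, not part of Bison: map the value returned by
   yyparse to a short description.  */
const char *
describe_parse_status (int status)
{
  switch (status)
    {
    case 0:
      return "success";
    case 1:
      return "parse failed";
    case 2:
      return "memory exhausted";
    default:
      return "unexpected status";
    }
}
```

A driver might print the description only when the status is nonzero,
for example after @code{int status = yyparse ();}.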
bfa74976 7780
c827f760 7781Because Bison parsers have growing stacks, hitting the upper limit
d1a1114f 7782usually results from using a right recursion instead of a left
188867ac 7783recursion, see @ref{Recursion, ,Recursive Rules}.
d1a1114f 7784
bfa74976
RS
7785@vindex YYMAXDEPTH
7786By defining the macro @code{YYMAXDEPTH}, you can control how deep the
1a059451 7787parser stack can become before memory is exhausted. Define the
bfa74976
RS
7788macro with a value that is an integer. This value is the maximum number
7789of tokens that can be shifted (and not reduced) before overflow.
bfa74976
RS
7790
7791The stack space allowed is not necessarily allocated. If you specify a
1a059451 7792large value for @code{YYMAXDEPTH}, the parser normally allocates a small
bfa74976
RS
7793stack at first, and then makes it bigger by stages as needed. This
7794increasing allocation happens automatically and silently. Therefore,
7795you do not need to make @code{YYMAXDEPTH} painfully small merely to save
7796space for ordinary inputs that do not need much stack.
7797
d7e14fc0
PE
7798However, do not allow @code{YYMAXDEPTH} to be a value so large that
7799arithmetic overflow could occur when calculating the size of the stack
7800space. Also, do not allow @code{YYMAXDEPTH} to be less than
7801@code{YYINITDEPTH}.
7802
bfa74976
RS
7803@cindex default stack limit
7804The default value of @code{YYMAXDEPTH}, if you do not define it, is
780510000.
7806
7807@vindex YYINITDEPTH
7808You can control how much stack is allocated initially by defining the
34a6c2d1
JD
7809macro @code{YYINITDEPTH} to a positive integer. For the deterministic
7810parser in C, this value must be a compile-time constant
d7e14fc0
PE
7811unless you are assuming C99 or some other target language or compiler
7812that allows variable-length arrays. The default is 200.
7813
1a059451 7814Do not allow @code{YYINITDEPTH} to be greater than @code{YYMAXDEPTH}.
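
The constraints above can be sketched as a pair of definitions that
would normally be placed in the prologue of the grammar file. The values
500 and 20000 are arbitrary assumptions chosen for illustration, and the
preprocessor check is an optional safeguard, not something Bison
requires.

```c
/* Hypothetical values for illustration; in a real grammar file these
   definitions would go in the prologue, between %{ and %}.  */
#define YYINITDEPTH 500    /* initial stack size */
#define YYMAXDEPTH  20000  /* upper bound before memory exhaustion */

/* Reject an inconsistent configuration at compile time.  */
#if YYMAXDEPTH < YYINITDEPTH
# error "YYINITDEPTH must not be greater than YYMAXDEPTH"
#endif
```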
bfa74976 7815
d1a1114f 7816@c FIXME: C++ output.
Because of semantic differences between C and C++, the deterministic
parsers in C produced by Bison cannot grow their stacks when compiled
by C++ compilers. In this precise case (compiling a C parser as C++), we
suggest that you increase @code{YYINITDEPTH}. The Bison maintainers hope to fix
this deficiency in a future release.
d1a1114f 7822
342b8b6e 7823@node Error Recovery
bfa74976
RS
7824@chapter Error Recovery
7825@cindex error recovery
7826@cindex recovery from errors
7827
6e649e65 7828It is not usually acceptable to have a program terminate on a syntax
bfa74976
RS
7829error. For example, a compiler should recover sufficiently to parse the
7830rest of the input file and check it for errors; a calculator should accept
7831another expression.
7832
7833In a simple interactive command parser where each input is one line, it may
7834be sufficient to allow @code{yyparse} to return 1 on error and have the
7835caller ignore the rest of the input line when that happens (and then call
7836@code{yyparse} again). But this is inadequate for a compiler, because it
7837forgets all the syntactic context leading up to the error. A syntax error
7838deep within a function in the compiler input should not cause the compiler
7839to treat the following line like the beginning of a source file.
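
For the simple line-oriented case, the caller's part of the work might
look like the following sketch. It assumes that the input is held in
memory as a null-terminated string; the function name
@code{skip_rest_of_line} is hypothetical. After @code{yyparse} returns
1, the caller would advance its input position with this helper before
calling @code{yyparse} again.

```c
/* Hypothetical helper: return a pointer just past the next newline in
   INPUT, or to the terminating null byte if no newline remains.  */
const char *
skip_rest_of_line (const char *input)
{
  while (*input != '\0' && *input != '\n')
    input++;
  if (*input == '\n')
    input++;
  return input;
}
```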
7840
7841@findex error
7842You can define how to recover from a syntax error by writing rules to
7843recognize the special token @code{error}. This is a terminal symbol that
7844is always defined (you need not declare it) and reserved for error
7845handling. The Bison parser generates an @code{error} token whenever a
7846syntax error happens; if you have provided a rule to recognize this token
13863333 7847in the current context, the parse can continue.
bfa74976
RS
7848
7849For example:
7850
@example
stmts:
  /* empty string */
| stmts '\n'
| stmts exp '\n'
| stmts error '\n'
@end example
7858
7859The fourth rule in this example says that an error followed by a newline
0765d393 7860makes a valid addition to any @code{stmts}.
bfa74976
RS
7861
7862What happens if a syntax error occurs in the middle of an @code{exp}? The
7863error recovery rule, interpreted strictly, applies to the precise sequence
0765d393 7864of a @code{stmts}, an @code{error} and a newline. If an error occurs in
bfa74976 7865the middle of an @code{exp}, there will probably be some additional tokens
0765d393 7866and subexpressions on the stack after the last @code{stmts}, and there
bfa74976
RS
7867will be tokens to read before the next newline. So the rule is not
7868applicable in the ordinary way.
7869
7870But Bison can force the situation to fit the rule, by discarding part of
72f889cc
AD
7871the semantic context and part of the input. First it discards states
7872and objects from the stack until it gets back to a state in which the
bfa74976 7873@code{error} token is acceptable. (This means that the subexpressions
0765d393 7874already parsed are discarded, back to the last complete @code{stmts}.)
72f889cc 7875At this point the @code{error} token can be shifted. Then, if the old
742e4900 7876lookahead token is not acceptable to be shifted next, the parser reads
bfa74976 7877tokens and discards them until it finds a token which is acceptable. In
72f889cc
AD
7878this example, Bison reads and discards input until the next newline so
7879that the fourth rule can apply. Note that discarded symbols are
7880possible sources of memory leaks, see @ref{Destructor Decl, , Freeing
7881Discarded Symbols}, for a means to reclaim this memory.
bfa74976
RS
7882
7883The choice of error rules in the grammar is a choice of strategies for
7884error recovery. A simple and useful strategy is simply to skip the rest of
7885the current input line or current statement if an error is detected:
7886
@example
stmt: error ';'  /* On error, skip until ';' is read. */
@end example
7890
7891It is also useful to recover to the matching close-delimiter of an
7892opening-delimiter that has already been parsed. Otherwise the
7893close-delimiter will probably appear to be unmatched, and generate another,
7894spurious error message:
7895
@example
primary:
  '(' expr ')'
| '(' error ')'
@dots{}
;
@end example
7903
Error recovery strategies are necessarily guesses. When they guess wrong,
one syntax error often leads to another. In the @code{stmt} example above,
the error recovery rule guesses that an error is due to bad input within one
@code{stmt}. Suppose that instead a spurious semicolon is inserted in the
middle of a valid @code{stmt}. After the error recovery rule recovers
from the first error, another syntax error will be found straightaway,
since the text following the spurious semicolon is also an invalid
@code{stmt}.
bfa74976
RS
7912
7913To prevent an outpouring of error messages, the parser will output no error
7914message for another syntax error that happens shortly after the first; only
7915after three consecutive input tokens have been successfully shifted will
7916error messages resume.
7917
7918Note that rules which accept the @code{error} token may have actions, just
7919as any other rules can.
7920
7921@findex yyerrok
7922You can make error messages resume immediately by using the macro
7923@code{yyerrok} in an action. If you do this in the error rule's action, no
7924error messages will be suppressed. This macro requires no arguments;
7925@samp{yyerrok;} is a valid C statement.
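
The suppression rule can be pictured with a small counter. The following
is a simplified model written for this explanation, not the code that
Bison generates; the structure and function names are hypothetical.

```c
#include <stdbool.h>

/* Simplified model of the suppression counter; this is not the code
   Bison generates.  */
struct recovery_state
{
  int tokens_until_reporting;  /* 0 means errors are reported */
};

/* Record a syntax error.  Return true if a message would be printed.  */
bool
on_syntax_error (struct recovery_state *s)
{
  bool report = s->tokens_until_reporting == 0;
  s->tokens_until_reporting = 3;
  return report;
}

/* Record that one input token was shifted successfully.  */
void
on_token_shifted (struct recovery_state *s)
{
  if (s->tokens_until_reporting > 0)
    s->tokens_until_reporting--;
}

/* Model of `yyerrok;': resume reporting immediately.  */
void
on_yyerrok (struct recovery_state *s)
{
  s->tokens_until_reporting = 0;
}
```

In this model, a second error that arrives after only two shifted tokens
is silent, whereas one that arrives after three shifted tokens, or right
after @code{on_yyerrok}, is reported.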
7926
7927@findex yyclearin
742e4900 7928The previous lookahead token is reanalyzed immediately after an error. If
bfa74976
RS
7929this is unacceptable, then the macro @code{yyclearin} may be used to clear
7930this token. Write the statement @samp{yyclearin;} in the error rule's
7931action.
32c29292 7932@xref{Action Features, ,Special Features for Use in Actions}.
bfa74976 7933
6e649e65 7934For example, suppose that on a syntax error, an error handling routine is
bfa74976
RS
7935called that advances the input stream to some point where parsing should
7936once again commence. The next symbol returned by the lexical scanner is
742e4900 7937probably correct. The previous lookahead token ought to be discarded
bfa74976
RS
7938with @samp{yyclearin;}.
7939
7940@vindex YYRECOVERING
02103984
PE
7941The expression @code{YYRECOVERING ()} yields 1 when the parser
7942is recovering from a syntax error, and 0 otherwise.
7943Syntax error diagnostics are suppressed while recovering from a syntax
7944error.
bfa74976 7945
342b8b6e 7946@node Context Dependency
bfa74976
RS
7947@chapter Handling Context Dependencies
7948
7949The Bison paradigm is to parse tokens first, then group them into larger
7950syntactic units. In many languages, the meaning of a token is affected by
7951its context. Although this violates the Bison paradigm, certain techniques
7952(known as @dfn{kludges}) may enable you to write Bison parsers for such
7953languages.
7954
7955@menu
7956* Semantic Tokens:: Token parsing can depend on the semantic context.
7957* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
7958* Tie-in Recovery:: Lexical tie-ins have implications for how
7959 error recovery rules must be written.
7960@end menu
7961
7962(Actually, ``kludge'' means any technique that gets its job done but is
7963neither clean nor robust.)
7964
342b8b6e 7965@node Semantic Tokens
bfa74976
RS
7966@section Semantic Info in Token Types
7967
7968The C language has a context dependency: the way an identifier is used
7969depends on what its current meaning is. For example, consider this:
7970
7971@example
7972foo (x);
7973@end example
7974
7975This looks like a function call statement, but if @code{foo} is a typedef
7976name, then this is actually a declaration of @code{x}. How can a Bison
7977parser for C decide how to parse this input?
7978
35430378 7979The method used in GNU C is to have two different token types,
bfa74976
RS
7980@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
7981identifier, it looks up the current declaration of the identifier in order
7982to decide which token type to return: @code{TYPENAME} if the identifier is
7983declared as a typedef, @code{IDENTIFIER} otherwise.
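
The lookup that @code{yylex} performs can be sketched as follows. This
is a toy model under stated assumptions: the token codes, the fixed-size
table, and the names @code{declare_typedef} and
@code{classify_identifier} are hypothetical, and scoping is ignored
entirely. A real C front end would consult its symbol table instead.

```c
#include <string.h>

/* Hypothetical token codes; a real parser takes them from the header
   that Bison generates.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* Toy table of names currently declared as typedefs.  */
#define MAX_TYPEDEFS 16
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Record NAME as a typedef.  Return 0 on success, -1 if the table is
   full.  */
int
declare_typedef (const char *name)
{
  if (typedef_count == MAX_TYPEDEFS)
    return -1;
  typedef_names[typedef_count++] = name;
  return 0;
}

/* Choose the token type that yylex would return for the identifier
   NAME.  */
int
classify_identifier (const char *name)
{
  int i;
  for (i = 0; i < typedef_count; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```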
7984
7985The grammar rules can then express the context dependency by the choice of
7986token type to recognize. @code{IDENTIFIER} is accepted as an expression,
7987but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
7988@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
7989is @emph{not} significant, such as in declarations that can shadow a
7990typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
7991accepted---there is one rule for each of the two token types.
7992
7993This technique is simple to use if the decision of which kinds of
7994identifiers to allow is made at a place close to where the identifier is
7995parsed. But in C this is not always so: C allows a declaration to
7996redeclare a typedef name provided an explicit type has been specified
7997earlier:
7998
@example
typedef int foo, bar;
int baz (void)
@group
@{
  static bar (bar);      /* @r{redeclare @code{bar} as static variable} */
  extern foo foo (foo);  /* @r{redeclare @code{foo} as function} */
  return foo (bar);
@}
@end group
@end example
8010
8011Unfortunately, the name being declared is separated from the declaration
8012construct itself by a complicated syntactic structure---the ``declarator''.
8013
9ecbd125 8014As a result, part of the Bison parser for C needs to be duplicated, with
14ded682
AD
8015all the nonterminal names changed: once for parsing a declaration in
8016which a typedef name can be redefined, and once for parsing a
8017declaration in which that can't be done. Here is a part of the
8018duplication, with actions omitted for brevity:
bfa74976
RS
8019
@example
@group
initdcl:
  declarator maybeasm '=' init
| declarator maybeasm
;
@end group

@group
notype_initdcl:
  notype_declarator maybeasm '=' init
| notype_declarator maybeasm
;
@end group
@end example
8035
8036@noindent
8037Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
8038cannot. The distinction between @code{declarator} and
8039@code{notype_declarator} is the same sort of thing.
8040
8041There is some similarity between this technique and a lexical tie-in
8042(described next), in that information which alters the lexical analysis is
changed during parsing by other parts of the program. The difference is
that here the information is global, and is used for other purposes in the
8045program. A true lexical tie-in has a special-purpose flag controlled by
8046the syntactic context.
8047
342b8b6e 8048@node Lexical Tie-ins
bfa74976
RS
8049@section Lexical Tie-ins
8050@cindex lexical tie-in
8051
8052One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
8053which is set by Bison actions, whose purpose is to alter the way tokens are
8054parsed.
8055
8056For example, suppose we have a language vaguely like C, but with a special
8057construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
8058an expression in parentheses in which all integers are hexadecimal. In
8059particular, the token @samp{a1b} must be treated as an integer rather than
8060as an identifier if it appears in that context. Here is how you can do it:
8061
@example
@group
%@{
  int hexflag;
  int yylex (void);
  void yyerror (char const *);
%@}
%%
@dots{}
@end group
@group
expr:
  IDENTIFIER
| constant
| HEX '(' @{ hexflag = 1; @}
    expr ')' @{ hexflag = 0; $$ = $4; @}
| expr '+' expr @{ $$ = make_sum ($1, $3); @}
@dots{}
;
@end group

@group
constant:
  INTEGER
| STRING
;
@end group
@end example
8090
8091@noindent
8092Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
8093it is nonzero, all integers are parsed in hexadecimal, and tokens starting
8094with letters are parsed as integers if possible.
8095
9913d6e4
JD
8096The declaration of @code{hexflag} shown in the prologue of the grammar
8097file is needed to make it accessible to the actions (@pxref{Prologue,
8098,The Prologue}). You must also write the code in @code{yylex} to obey
8099the flag.
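
The scanner side of the tie-in might be sketched like this. It assumes
that the scanner has already isolated a word made of letters and digits;
the token codes and the name @code{classify_word} are hypothetical, and
a real @code{yylex} would store the value in @code{yylval} rather than
through a pointer argument.

```c
#include <stdlib.h>

/* Hypothetical token codes; a real parser takes them from the header
   that Bison generates.  */
enum { IDENTIFIER = 258, INTEGER = 259 };

int hexflag;  /* set and cleared by the parser actions */

/* Classify the word TEXT the way yylex might.  On INTEGER, store the
   numeric value in *VALUE.  */
int
classify_word (const char *text, long *value)
{
  char *end;
  long n = strtol (text, &end, hexflag ? 16 : 10);

  /* Accept the word as an integer only if strtol consumed all of it.  */
  if (end != text && *end == '\0')
    {
      *value = n;
      return INTEGER;
    }
  return IDENTIFIER;
}
```

With @code{hexflag} nonzero the word @samp{a1b} is classified as an
integer; with @code{hexflag} zero the same word is an identifier.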
bfa74976 8100
342b8b6e 8101@node Tie-in Recovery
bfa74976
RS
8102@section Lexical Tie-ins and Error Recovery
8103
8104Lexical tie-ins make strict demands on any error recovery rules you have.
8105@xref{Error Recovery}.
8106
8107The reason for this is that the purpose of an error recovery rule is to
8108abort the parsing of one construct and resume in some larger construct.
8109For example, in C-like languages, a typical error recovery rule is to skip
8110tokens until the next semicolon, and then start a new statement, like this:
8111
@example
stmt:
  expr ';'
| IF '(' expr ')' stmt @{ @dots{} @}
@dots{}
| error ';' @{ hexflag = 0; @}
;
@end example
8120
8121If there is a syntax error in the middle of a @samp{hex (@var{expr})}
8122construct, this error rule will apply, and then the action for the
8123completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
8124remain set for the entire rest of the input, or until the next @code{hex}
8125keyword, causing identifiers to be misinterpreted as integers.
8126
8127To avoid this problem the error recovery rule itself clears @code{hexflag}.
8128
8129There may also be an error recovery rule that works within expressions.
8130For example, there could be a rule which applies within parentheses
8131and skips to the close-parenthesis:
8132
@example
@group
expr:
  @dots{}
| '(' expr ')' @{ $$ = $2; @}
| '(' error ')'
@dots{}
@end group
@end example
8142
8143If this rule acts within the @code{hex} construct, it is not going to abort
8144that construct (since it applies to an inner level of parentheses within
8145the construct). Therefore, it should not clear the flag: the rest of
8146the @code{hex} construct should be parsed with the flag still in effect.
8147
8148What if there is an error recovery rule which might abort out of the
8149@code{hex} construct or might not, depending on circumstances? There is no
8150way you can write the action to determine whether a @code{hex} construct is
8151being aborted or not. So if you are using a lexical tie-in, you had better
8152make sure your error recovery rules are not of this kind. Each rule must
8153be such that you can be sure that it always will, or always won't, have to
8154clear the flag.
8155
ec3bc396
AD
8156@c ================================================== Debugging Your Parser
8157
342b8b6e 8158@node Debugging
bfa74976 8159@chapter Debugging Your Parser
ec3bc396 8160
56d60c19
AD
8161Developing a parser can be a challenge, especially if you don't understand
8162the algorithm (@pxref{Algorithm, ,The Bison Parser Algorithm}). This
8163chapter explains how to generate and read the detailed description of the
8164automaton, and how to enable and understand the parser run-time traces.
ec3bc396
AD
8165
8166@menu
8167* Understanding:: Understanding the structure of your parser.
8168* Tracing:: Tracing the execution of your parser.
8169@end menu
8170
8171@node Understanding
8172@section Understanding Your Parser
8173
As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm}),
Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
frequent than one would hope), looking at this automaton is required to
tune or simply fix a parser. Bison provides two different
representations of it, either textually or graphically (as a DOT file).
ec3bc396
AD
8179
8180The textual file is generated when the options @option{--report} or
2ba03112 8181@option{--verbose} are specified, see @ref{Invocation, , Invoking
ec3bc396 8182Bison}. Its name is made by removing @samp{.tab.c} or @samp{.c} from
9913d6e4
JD
8183the parser implementation file name, and adding @samp{.output}
8184instead. Therefore, if the grammar file is @file{foo.y}, then the
8185parser implementation file is called @file{foo.tab.c} by default. As
8186a consequence, the verbose output file is called @file{foo.output}.
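
The default naming rule just described can be sketched as a small
function. The name @code{report_file_name} is hypothetical and not part
of Bison, and the sketch covers only the default rule stated above, not
any option that overrides file names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Bison.  Write into OUT (of size
   OUT_SIZE) the report name derived from PARSER_FILE.  Return 0 on
   success, -1 if the suffix is not recognized or OUT is too small.  */
int
report_file_name (const char *parser_file, char *out, size_t out_size)
{
  /* Try the longer suffix first so that foo.tab.c yields foo.  */
  static const char *const suffixes[] = { ".tab.c", ".c" };
  size_t len = strlen (parser_file);
  size_t i;

  for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++)
    {
      size_t slen = strlen (suffixes[i]);
      if (len > slen && strcmp (parser_file + len - slen, suffixes[i]) == 0)
        {
          int n = snprintf (out, out_size, "%.*s.output",
                            (int) (len - slen), parser_file);
          return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
        }
    }
  return -1;
}
```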
ec3bc396
AD
8187
8188The following grammar file, @file{calc.y}, will be used in the sequel:
8189
@example
%token NUM STR
%left '+' '-'
%left '*'
%%
exp:
  exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| NUM
;
useless: STR;
%%
@end example
8205
88bce5a2
AD
8206@command{bison} reports:
8207
@example
calc.y: warning: 1 nonterminal useless in grammar
calc.y: warning: 1 rule useless in grammar
calc.y:11.1-7: warning: nonterminal useless in grammar: useless
calc.y:11.10-12: warning: rule useless in grammar: useless: STR
calc.y: conflicts: 7 shift/reduce
@end example
8215
8216When given @option{--report=state}, in addition to @file{calc.tab.c}, it
8217creates a file @file{calc.output} with contents detailed below. The
8218order of the output and the exact presentation might vary, but the
8219interpretation is the same.
ec3bc396 8220
ec3bc396
AD
8221@noindent
8222@cindex token, useless
8223@cindex useless token
8224@cindex nonterminal, useless
8225@cindex useless nonterminal
8226@cindex rule, useless
8227@cindex useless rule
84c1cdc7
AD
8228The first section reports useless tokens, nonterminals and rules. Useless
8229nonterminals and rules are removed in order to produce a smaller parser, but
8230useless tokens are preserved, since they might be used by the scanner (note
8231the difference between ``useless'' and ``unused'' below):
ec3bc396
AD
8232
@example
Nonterminals useless in grammar
   useless

Terminals unused in grammar
   STR

Rules useless in grammar
    6 useless: STR
@end example
8243
8244@noindent
84c1cdc7
AD
8245The next section lists states that still have conflicts.
8246
8247@example
8248State 8 conflicts: 1 shift/reduce
8249State 9 conflicts: 1 shift/reduce
8250State 10 conflicts: 1 shift/reduce
8251State 11 conflicts: 4 shift/reduce
8252@end example
8253
8254@noindent
8255Then Bison reproduces the exact grammar it used:
ec3bc396
AD
8256
@example
Grammar

    0 $accept: exp $end

    1 exp: exp '+' exp
    2    | exp '-' exp
    3    | exp '*' exp
    4    | exp '/' exp
    5    | NUM
@end example
8268
8269@noindent
8270and reports the uses of the symbols:
8271
@example
@group
Terminals, with rules where they appear

$end (0) 0
'*' (42) 3
'+' (43) 1
'-' (45) 2
'/' (47) 4
error (256)
NUM (258) 5
STR (259)
@end group

@group
Nonterminals, with rules where they appear

$accept (9)
    on left: 0
exp (10)
    on left: 1 2 3 4 5, on right: 0 1 2 3 4
@end group
@end example
8295
8296@noindent
8297@cindex item
8298@cindex pointed rule
8299@cindex rule, pointed
8300Bison then proceeds onto the automaton itself, describing each state
d13d14cc
PE
8301with its set of @dfn{items}, also known as @dfn{pointed rules}. Each
8302item is a production rule together with a point (@samp{.}) marking
8303the location of the input cursor.
ec3bc396
AD
8304
8305@example
8306state 0
8307
84c1cdc7 8308 0 $accept: . exp $end
ec3bc396 8309
84c1cdc7 8310 NUM shift, and go to state 1
ec3bc396 8311
84c1cdc7 8312 exp go to state 2
ec3bc396
AD
8313@end example
8314
8315This reads as follows: ``state 0 corresponds to being at the very
8316beginning of the parsing, in the initial rule, right before the start
8317symbol (here, @code{exp}). When the parser returns to this state right
8318after having reduced a rule that produced an @code{exp}, the control
8319flow jumps to state 2. If there is no such transition on a nonterminal
d13d14cc 8320symbol, and the lookahead is a @code{NUM}, then this token is shifted onto
ec3bc396 8321the parse stack, and the control flow jumps to state 1. Any other
742e4900 8322lookahead triggers a syntax error.''
ec3bc396
AD
8323
8324@cindex core, item set
8325@cindex item set core
8326@cindex kernel, item set
@cindex item set kernel
8328Even though the only active rule in state 0 seems to be rule 0, the
742e4900 8329report lists @code{NUM} as a lookahead token because @code{NUM} can be
ec3bc396
AD
8330at the beginning of any rule deriving an @code{exp}. By default Bison
8331reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
8332you want to see more detail you can invoke @command{bison} with
d13d14cc 8333@option{--report=itemset} to list the derived items as well:
ec3bc396
AD
8334
8335@example
8336state 0
8337
84c1cdc7
AD
8338 0 $accept: . exp $end
8339 1 exp: . exp '+' exp
8340 2 | . exp '-' exp
8341 3 | . exp '*' exp
8342 4 | . exp '/' exp
8343 5 | . NUM
ec3bc396 8344
84c1cdc7 8345 NUM shift, and go to state 1
ec3bc396 8346
84c1cdc7 8347 exp go to state 2
ec3bc396
AD
8348@end example
8349
8350@noindent
84c1cdc7 8351In the state 1@dots{}
ec3bc396
AD
8352
8353@example
8354state 1
8355
84c1cdc7 8356 5 exp: NUM .
ec3bc396 8357
84c1cdc7 8358 $default reduce using rule 5 (exp)
ec3bc396
AD
8359@end example
8360
8361@noindent
742e4900 8362the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead token
ec3bc396
AD
8363(@samp{$default}), the parser will reduce it. If it was coming from
8364state 0, then, after this reduction it will return to state 0, and will
8365jump to state 2 (@samp{exp: go to state 2}).
8366
8367@example
8368state 2
8369
84c1cdc7
AD
8370 0 $accept: exp . $end
8371 1 exp: exp . '+' exp
8372 2 | exp . '-' exp
8373 3 | exp . '*' exp
8374 4 | exp . '/' exp
ec3bc396 8375
84c1cdc7
AD
8376 $end shift, and go to state 3
8377 '+' shift, and go to state 4
8378 '-' shift, and go to state 5
8379 '*' shift, and go to state 6
8380 '/' shift, and go to state 7
ec3bc396
AD
8381@end example
8382
8383@noindent
8384In state 2, the automaton can only shift a symbol. For instance,
84c1cdc7 8385because of the item @samp{exp: exp . '+' exp}, if the lookahead is
d13d14cc 8386@samp{+} it is shifted onto the parse stack, and the automaton
84c1cdc7 8387jumps to state 4, corresponding to the item @samp{exp: exp '+' . exp}.
d13d14cc
PE
8388Since there is no default action, any lookahead not listed triggers a syntax
8389error.
ec3bc396 8390
34a6c2d1 8391@cindex accepting state
ec3bc396
AD
8392The state 3 is named the @dfn{final state}, or the @dfn{accepting
8393state}:
8394
8395@example
8396state 3
8397
84c1cdc7 8398 0 $accept: exp $end .
ec3bc396 8399
84c1cdc7 8400 $default accept
ec3bc396
AD
8401@end example
8402
8403@noindent
84c1cdc7
AD
the initial rule is completed (the start symbol and the end-of-input were
read), so the parsing exits successfully.
ec3bc396
AD
8406
8407The interpretation of states 4 to 7 is straightforward, and is left to
8408the reader.
8409
8410@example
8411state 4
8412
84c1cdc7 8413 1 exp: exp '+' . exp
ec3bc396 8414
84c1cdc7
AD
8415 NUM shift, and go to state 1
8416
8417 exp go to state 8
ec3bc396 8418
ec3bc396
AD
8419
8420state 5
8421
84c1cdc7
AD
8422 2 exp: exp '-' . exp
8423
8424 NUM shift, and go to state 1
ec3bc396 8425
84c1cdc7 8426 exp go to state 9
ec3bc396 8427
ec3bc396
AD
8428
8429state 6
8430
84c1cdc7 8431 3 exp: exp '*' . exp
ec3bc396 8432
84c1cdc7
AD
8433 NUM shift, and go to state 1
8434
8435 exp go to state 10
ec3bc396 8436
ec3bc396
AD
8437
8438state 7
8439
84c1cdc7 8440 4 exp: exp '/' . exp
ec3bc396 8441
84c1cdc7 8442 NUM shift, and go to state 1
ec3bc396 8443
84c1cdc7 8444 exp go to state 11
ec3bc396
AD
8445@end example
8446
As announced at the beginning of the report, @samp{State 8 conflicts:
1 shift/reduce}:
8449
8450@example
8451state 8
8452
84c1cdc7
AD
8453 1 exp: exp . '+' exp
8454 1 | exp '+' exp .
8455 2 | exp . '-' exp
8456 3 | exp . '*' exp
8457 4 | exp . '/' exp
ec3bc396 8458
84c1cdc7
AD
8459 '*' shift, and go to state 6
8460 '/' shift, and go to state 7
ec3bc396 8461
84c1cdc7
AD
8462 '/' [reduce using rule 1 (exp)]
8463 $default reduce using rule 1 (exp)
ec3bc396
AD
8464@end example
8465
Indeed, there are two actions associated with the lookahead @samp{/}:
either shifting (and going to state 7), or reducing rule 1. The
conflict means that either the grammar is ambiguous, or the parser lacks
information to make the right decision. Here the grammar is indeed
ambiguous: since we did not specify the precedence of @samp{/}, the
sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM /
NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) /
NUM}, which corresponds to reducing rule 1.
8474
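The two readings can yield different values, which is why the conflict
matters. The following C sketch is only an illustration: the function
names are hypothetical, and it evaluates both groupings of
@samp{NUM + NUM / NUM} directly rather than running a parser.

@example
/* Value obtained when '/' is shifted: a + (b / c).  */
double
value_if_shift (double a, double b, double c)
@{
  return a + b / c;
@}

/* Value obtained when rule 1 is reduced first: (a + b) / c.  */
double
value_if_reduce (double a, double b, double c)
@{
  return (a + b) / c;
@}
@end example

For instance, with the input @samp{1 + 4 / 2}, shifting gives 3 whereas
reducing gives 2.5.
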
Because in deterministic parsing only a single decision can be made, Bison
arbitrarily chose to disable the reduction; see @ref{Shift/Reduce, ,
Shift/Reduce Conflicts}. Discarded actions are reported between
square brackets.
8479
Note that all the previous states had a single possible action: either
shifting the next token and going to the corresponding state, or
reducing a single rule. In the other cases, i.e., when shifting
@emph{and} reducing are both possible or when @emph{several} reductions
are possible, the lookahead is required to select the action. State 8 is
one such state: if the lookahead is @samp{*} or @samp{/} then the action
is shifting, otherwise the action is reducing rule 1. In other words,
the first two items, corresponding to rule 1, are not eligible when the
lookahead token is @samp{*}, since we specified that @samp{*} has higher
precedence than @samp{+}. More generally, some items are eligible only
with some set of possible lookahead tokens. When run with
@option{--report=lookahead}, Bison specifies these lookahead tokens:
ec3bc396
AD
8492
8493@example
8494state 8
8495
84c1cdc7
AD
8496 1 exp: exp . '+' exp
8497 1 | exp '+' exp . [$end, '+', '-', '/']
8498 2 | exp . '-' exp
8499 3 | exp . '*' exp
8500 4 | exp . '/' exp
8501
8502 '*' shift, and go to state 6
8503 '/' shift, and go to state 7
ec3bc396 8504
84c1cdc7
AD
8505 '/' [reduce using rule 1 (exp)]
8506 $default reduce using rule 1 (exp)
8507@end example
8508
8509Note however that while @samp{NUM + NUM / NUM} is ambiguous (which results in
8510the conflicts on @samp{/}), @samp{NUM + NUM * NUM} is not: the conflict was
8511solved thanks to associativity and precedence directives. If invoked with
8512@option{--report=solved}, Bison includes information about the solved
8513conflicts in the report:
ec3bc396 8514
84c1cdc7
AD
8515@example
8516Conflict between rule 1 and token '+' resolved as reduce (%left '+').
8517Conflict between rule 1 and token '-' resolved as reduce (%left '-').
8518Conflict between rule 1 and token '*' resolved as shift ('+' < '*').
ec3bc396
AD
8519@end example
8520
84c1cdc7 8521
ec3bc396
AD
8522The remaining states are similar:
8523
8524@example
98842516 8525@group
ec3bc396
AD
8526state 9
8527
84c1cdc7
AD
8528 1 exp: exp . '+' exp
8529 2 | exp . '-' exp
8530 2 | exp '-' exp .
8531 3 | exp . '*' exp
8532 4 | exp . '/' exp
ec3bc396 8533
84c1cdc7
AD
8534 '*' shift, and go to state 6
8535 '/' shift, and go to state 7
ec3bc396 8536
84c1cdc7
AD
8537 '/' [reduce using rule 2 (exp)]
8538 $default reduce using rule 2 (exp)
98842516 8539@end group
ec3bc396 8540
98842516 8541@group
ec3bc396
AD
8542state 10
8543
84c1cdc7
AD
8544 1 exp: exp . '+' exp
8545 2 | exp . '-' exp
8546 3 | exp . '*' exp
8547 3 | exp '*' exp .
8548 4 | exp . '/' exp
ec3bc396 8549
84c1cdc7 8550 '/' shift, and go to state 7
ec3bc396 8551
84c1cdc7
AD
8552 '/' [reduce using rule 3 (exp)]
8553 $default reduce using rule 3 (exp)
98842516 8554@end group
ec3bc396 8555
98842516 8556@group
ec3bc396
AD
8557state 11
8558
84c1cdc7
AD
8559 1 exp: exp . '+' exp
8560 2 | exp . '-' exp
8561 3 | exp . '*' exp
8562 4 | exp . '/' exp
8563 4 | exp '/' exp .
8564
8565 '+' shift, and go to state 4
8566 '-' shift, and go to state 5
8567 '*' shift, and go to state 6
8568 '/' shift, and go to state 7
8569
8570 '+' [reduce using rule 4 (exp)]
8571 '-' [reduce using rule 4 (exp)]
8572 '*' [reduce using rule 4 (exp)]
8573 '/' [reduce using rule 4 (exp)]
8574 $default reduce using rule 4 (exp)
98842516 8575@end group
ec3bc396
AD
8576@end example
8577
8578@noindent
fa7e68c3
PE
8579Observe that state 11 contains conflicts not only due to the lack of
8580precedence of @samp{/} with respect to @samp{+}, @samp{-}, and
8581@samp{*}, but also because the
ec3bc396
AD
8582associativity of @samp{/} is not specified.
8583
8584
8585@node Tracing
8586@section Tracing Your Parser
bfa74976
RS
8587@findex yydebug
8588@cindex debugging
8589@cindex tracing the parser
8590
When a Bison grammar compiles properly but parses ``incorrectly'', the
@code{yydebug} parser-trace feature helps you figure out why.
8593
8594@menu
8595* Enabling Traces:: Activating run-time trace support
8596* Mfcalc Traces:: Extending @code{mfcalc} to support traces
8597* The YYPRINT Macro:: Obsolete interface for semantic value reports
8598@end menu
bfa74976 8599
56d60c19
AD
8600@node Enabling Traces
8601@subsection Enabling Traces
3ded9a63
AD
8602There are several means to enable compilation of trace facilities:
8603
8604@table @asis
8605@item the macro @code{YYDEBUG}
8606@findex YYDEBUG
8607Define the macro @code{YYDEBUG} to a nonzero value when you compile the
35430378 8608parser. This is compliant with POSIX Yacc. You could use
3ded9a63
AD
8609@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
8610YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The
8611Prologue}).
8612
If the @code{%define} variable @code{api.prefix} is used (@pxref{Multiple
Parsers, ,Multiple Parsers in the Same Program}), for instance @samp{%define
api.prefix c}, then if @code{CDEBUG} is defined, its value controls the
tracing feature (enabled if and only if nonzero); otherwise tracing is
enabled if and only if @code{YYDEBUG} is nonzero.
e358222b
AD
8618
8619@item the option @option{-t} (POSIX Yacc compliant)
8620@itemx the option @option{--debug} (Bison extension)
8621Use the @samp{-t} option when you run Bison (@pxref{Invocation, ,Invoking
8622Bison}). With @samp{%define api.prefix c}, it defines @code{CDEBUG} to 1,
8623otherwise it defines @code{YYDEBUG} to 1.
3ded9a63
AD
8624
8625@item the directive @samp{%debug}
8626@findex %debug
e358222b
AD
8627Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison Declaration
8628Summary}). This is a Bison extension, especially useful for languages that
8629don't use a preprocessor. Unless POSIX and Yacc portability matter to you,
8630this is the preferred solution.
3ded9a63
AD
8631@end table
8632
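The interaction between @code{api.prefix} and @code{YYDEBUG} described
above can be pictured with the preprocessor. This is a sketch of the rule
as stated here, assuming @samp{%define api.prefix c}; the macro
@code{TRACE_ENABLED} is a hypothetical name used for illustration, not
something Bison defines.

@example
/* With api.prefix set to c, CDEBUG takes priority when defined;
   otherwise YYDEBUG decides.  */
#if defined CDEBUG
# define TRACE_ENABLED (CDEBUG != 0)
#elif defined YYDEBUG
# define TRACE_ENABLED (YYDEBUG != 0)
#else
# define TRACE_ENABLED 0
#endif
@end example

Under this rule, compiling with @samp{-DCDEBUG=0 -DYYDEBUG=1} leaves
tracing disabled, since @code{CDEBUG} is defined and zero.
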
8633We suggest that you always enable the debug option so that debugging is
8634always possible.
bfa74976 8635
56d60c19 8636@findex YYFPRINTF
02a81e05 8637The trace facility outputs messages with macro calls of the form
e2742e46 8638@code{YYFPRINTF (stderr, @var{format}, @var{args})} where
f57a7536 8639@var{format} and @var{args} are the usual @code{printf} format and variadic
4947ebdb
PE
8640arguments. If you define @code{YYDEBUG} to a nonzero value but do not
8641define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included
9c437126 8642and @code{YYFPRINTF} is defined to @code{fprintf}.
bfa74976
RS
8643
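Because these messages go through @code{YYFPRINTF}, you can redirect them
by defining that macro yourself before the parser code, for instance in
the prologue. The sketch below is one possible way to do so;
@code{trace_printf} and @code{trace_stream} are hypothetical names, not
part of Bison.

@example
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical destination for traces.  When NULL, the stream
   passed by the parser (normally stderr) is used.  */
FILE *trace_stream = NULL;

/* Same calling convention as fprintf, so it can stand in for it.  */
int
trace_printf (FILE *out, char const *format, ...)
@{
  va_list args;
  int res;
  va_start (args, format);
  res = vfprintf (trace_stream ? trace_stream : out, format, args);
  va_end (args);
  return res;
@}

#define YYFPRINTF trace_printf
@end example

Setting @code{trace_stream} to a stream opened with @code{fopen} then
sends the messages emitted through @code{YYFPRINTF} to that file instead
of @code{stderr}.
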
8644Once you have compiled the program with trace facilities, the way to
8645request a trace is to store a nonzero value in the variable @code{yydebug}.
8646You can do this by making the C code do it (in @code{main}, perhaps), or
8647you can alter the value with a C debugger.
8648
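For example, a program can turn traces on from the command line. The
helper below is a self-contained sketch; @code{wants_trace} is a
hypothetical name, and the @option{-p} flag is an arbitrary choice made
for illustration.

@example
#include <string.h>

/* Return 1 if "-p" appears among the command-line arguments,
   0 otherwise.  */
int
wants_trace (int argc, char *argv[])
@{
  int i;
  for (i = 1; i < argc; ++i)
    if (strcmp (argv[i], "-p") == 0)
      return 1;
  return 0;
@}

/* In the real program, main would then do:
     yydebug = wants_trace (argc, argv);
     return yyparse ();
   where yydebug and yyparse come from the generated parser.  */
@end example

Keeping the test in a separate function leaves @code{main} short and makes
the flag handling easy to check without running the parser.
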
8649Each step taken by the parser when @code{yydebug} is nonzero produces a
8650line or two of trace information, written on @code{stderr}. The trace
8651messages tell you these things:
8652
8653@itemize @bullet
8654@item
8655Each time the parser calls @code{yylex}, what kind of token was read.
8656
8657@item
8658Each time a token is shifted, the depth and complete contents of the
8659state stack (@pxref{Parser States}).
8660
8661@item
8662Each time a rule is reduced, which rule it is, and the complete contents
8663of the state stack afterward.
8664@end itemize
8665
56d60c19
AD
8666To make sense of this information, it helps to refer to the automaton
8667description file (@pxref{Understanding, ,Understanding Your Parser}).
8668This file shows the meaning of each state in terms of
704a47c4
AD
8669positions in various rules, and also what each state will do with each
8670possible input token. As you read the successive trace messages, you
8671can see that the parser is functioning according to its specification in
8672the listing file. Eventually you will arrive at the place where
8673something undesirable happens, and you will see which parts of the
8674grammar are to blame.
bfa74976 8675
56d60c19 8676The parser implementation file is a C/C++/Java program and you can use
9913d6e4
JD
8677debuggers on it, but it's not easy to interpret what it is doing. The
8678parser function is a finite-state machine interpreter, and aside from
8679the actions it executes the same code over and over. Only the values
8680of variables show where in the grammar it is working.
bfa74976 8681
56d60c19
AD
8682@node Mfcalc Traces
8683@subsection Enabling Debug Traces for @code{mfcalc}
8684
The debugging information normally gives the token type of each token read,
but not its semantic value. The @code{%printer} directive allows you to
specify how semantic values are reported, see @ref{Printer Decl, , Printing
Semantic Values}. For backward compatibility, Yacc-like C parsers may also
use the @code{YYPRINT} macro (@pxref{The YYPRINT Macro, , The @code{YYPRINT}
Macro}), but its use is discouraged.
8691
8692As a demonstration of @code{%printer}, consider the multi-function
8693calculator, @code{mfcalc} (@pxref{Multi-function Calc}). To enable run-time
8694traces, and semantic value reports, insert the following directives in its
8695prologue:
8696
8697@comment file: mfcalc.y: 2
8698@example
8699/* Generate the parser description file. */
8700%verbose
8701/* Enable run-time traces (yydebug). */
8702%define parse.trace
8703
8704/* Formatting semantic values. */
8705%printer @{ fprintf (yyoutput, "%s", $$->name); @} VAR;
8706%printer @{ fprintf (yyoutput, "%s()", $$->name); @} FNCT;
8707%printer @{ fprintf (yyoutput, "%g", $$); @} <val>;
8708@end example
8709
The @code{%define} directive instructs Bison to generate run-time trace
support. Then, activation of these traces is controlled at run-time by the
@code{yydebug} variable, which is disabled by default. Because these traces
will refer to the ``states'' of the parser, it is helpful to ask for the
creation of a description of that parser; this is the purpose of the
(admittedly ill-named) @code{%verbose} directive.
8716
8717The set of @code{%printer} directives demonstrates how to format the
8718semantic value in the traces. Note that the specification can be done
8719either on the symbol type (e.g., @code{VAR} or @code{FNCT}), or on the type
8720tag: since @code{<val>} is the type for both @code{NUM} and @code{exp}, this
8721printer will be used for them.
8722
Here is a sample of the information provided by run-time traces. The traces
are sent to standard error.
8725
8726@example
8727$ @kbd{echo 'sin(1-1)' | ./mfcalc -p}
8728Starting parse
8729Entering state 0
8730Reducing stack by rule 1 (line 34):
8731-> $$ = nterm input ()
8732Stack now 0
8733Entering state 1
8734@end example
8735
8736@noindent
This first batch shows a specific feature of this grammar: the first rule
(which is in line 34 of @file{mfcalc.y}) can be reduced without even having
to look for the first token. The resulting left-hand symbol (@code{$$}) is
a valueless (@samp{()}) @code{input} nonterminal (@code{nterm}).
8741
8742Then the parser calls the scanner.
8743@example
8744Reading a token: Next token is token FNCT (sin())
8745Shifting token FNCT (sin())
8746Entering state 6
8747@end example
8748
8749@noindent
8750That token (@code{token}) is a function (@code{FNCT}) whose value is
8751@samp{sin} as formatted per our @code{%printer} specification: @samp{sin()}.
8752The parser stores (@code{Shifting}) that token, and others, until it can do
8753something about it.
8754
8755@example
8756Reading a token: Next token is token '(' ()
8757Shifting token '(' ()
8758Entering state 14
8759Reading a token: Next token is token NUM (1.000000)
8760Shifting token NUM (1.000000)
8761Entering state 4
8762Reducing stack by rule 6 (line 44):
8763 $1 = token NUM (1.000000)
8764-> $$ = nterm exp (1.000000)
8765Stack now 0 1 6 14
8766Entering state 24
8767@end example
8768
8769@noindent
8770The previous reduction demonstrates the @code{%printer} directive for
8771@code{<val>}: both the token @code{NUM} and the resulting non-terminal
8772@code{exp} have @samp{1} as value.
8773
8774@example
8775Reading a token: Next token is token '-' ()
8776Shifting token '-' ()
8777Entering state 17
8778Reading a token: Next token is token NUM (1.000000)
8779Shifting token NUM (1.000000)
8780Entering state 4
8781Reducing stack by rule 6 (line 44):
8782 $1 = token NUM (1.000000)
8783-> $$ = nterm exp (1.000000)
8784Stack now 0 1 6 14 24 17
8785Entering state 26
8786Reading a token: Next token is token ')' ()
8787Reducing stack by rule 11 (line 49):
8788 $1 = nterm exp (1.000000)
8789 $2 = token '-' ()
8790 $3 = nterm exp (1.000000)
8791-> $$ = nterm exp (0.000000)
8792Stack now 0 1 6 14
8793Entering state 24
8794@end example
8795
8796@noindent
8797The rule for the subtraction was just reduced. The parser is about to
8798discover the end of the call to @code{sin}.
8799
8800@example
8801Next token is token ')' ()
8802Shifting token ')' ()
8803Entering state 31
8804Reducing stack by rule 9 (line 47):
8805 $1 = token FNCT (sin())
8806 $2 = token '(' ()
8807 $3 = nterm exp (0.000000)
8808 $4 = token ')' ()
8809-> $$ = nterm exp (0.000000)
8810Stack now 0 1
8811Entering state 11
8812@end example
8813
8814@noindent
Finally, the end-of-line allows the parser to complete the computation, and
display its result.
8817
8818@example
8819Reading a token: Next token is token '\n' ()
8820Shifting token '\n' ()
8821Entering state 22
8822Reducing stack by rule 4 (line 40):
8823 $1 = nterm exp (0.000000)
8824 $2 = token '\n' ()
8825@result{} 0
8826-> $$ = nterm line ()
8827Stack now 0 1
8828Entering state 10
8829Reducing stack by rule 2 (line 35):
8830 $1 = nterm input ()
8831 $2 = nterm line ()
8832-> $$ = nterm input ()
8833Stack now 0
8834Entering state 1
8835@end example
8836
The parser has returned to state 1, in which it is waiting for the next
expression to evaluate, or for the end-of-file token, which causes the
completion of the parsing.
8840
8841@example
8842Reading a token: Now at end of input.
8843Shifting token $end ()
8844Entering state 2
8845Stack now 0 1 2
8846Cleanup: popping token $end ()
8847Cleanup: popping nterm input ()
8848@end example
8849
8850
8851@node The YYPRINT Macro
8852@subsection The @code{YYPRINT} Macro
8853
8854@findex YYPRINT
8855Before @code{%printer} support, semantic values could be displayed using the
8856@code{YYPRINT} macro, which works only for terminal symbols and only with
8857the @file{yacc.c} skeleton.
8858
8859@deffn {Macro} YYPRINT (@var{stream}, @var{token}, @var{value});
bfa74976 8860@findex YYPRINT
56d60c19
AD
8861If you define @code{YYPRINT}, it should take three arguments. The parser
8862will pass a standard I/O stream, the numeric code for the token type, and
8863the token value (from @code{yylval}).
8864
8865For @file{yacc.c} only. Obsoleted by @code{%printer}.
8866@end deffn
bfa74976
RS
8867
8868Here is an example of @code{YYPRINT} suitable for the multi-function
f56274a8 8869calculator (@pxref{Mfcalc Declarations, ,Declarations for @code{mfcalc}}):
bfa74976 8870
@example
%@{
  static void print_token_value (FILE *, int, YYSTYPE);
  #define YYPRINT(File, Type, Value)            \
    print_token_value (File, Type, Value)
%@}

@dots{} %% @dots{} %% @dots{}

static void
print_token_value (FILE *file, int type, YYSTYPE value)
@{
  /* VAR carries a symbol-table entry; NUM carries a
     floating-point value, hence "%g" rather than "%d".  */
  if (type == VAR)
    fprintf (file, "%s", value.tptr->name);
  else if (type == NUM)
    fprintf (file, "%g", value.val);
@}
@end example
bfa74976 8889
ec3bc396
AD
8890@c ================================================= Invoking Bison
8891
342b8b6e 8892@node Invocation
bfa74976
RS
8893@chapter Invoking Bison
8894@cindex invoking Bison
8895@cindex Bison invocation
8896@cindex options for invoking Bison
8897
8898The usual way to invoke Bison is as follows:
8899
8900@example
8901bison @var{infile}
8902@end example
8903
Here @var{infile} is the grammar file name, which usually ends in
@samp{.y}. The parser implementation file's name is made by replacing
the @samp{.y} with @samp{.tab.c} and removing any leading directory.
Thus, @samp{bison foo.y} yields @file{foo.tab.c}, and
@samp{bison hack/foo.y} also yields @file{foo.tab.c}. If you are
writing C++ code instead of C in your grammar file, you can also name
it @file{foo.ypp} or @file{foo.y++}. The output files then take an
extension modeled on that of the input file (respectively
@file{foo.tab.cpp} and @file{foo.tab.c++}). This feature takes effect
with all options that manipulate file names, like @samp{-o} or @samp{-d}.
8915
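The default naming rule can be summarized by the following C sketch. It
covers only the cases described above (a @samp{.y} extension, optionally
followed by a suffix such as @samp{pp} or @samp{++}) and is not Bison's
actual implementation; the function name @code{default_parser_name} is
hypothetical.

@example
#include <stdio.h>
#include <string.h>

/* Write the default parser file name for INFILE into OUT.
   Return 0 on success, -1 if INFILE has no ".y<suffix>"
   extension or if OUT (of SIZE bytes) is too small.  */
int
default_parser_name (char const *infile, char *out, size_t size)
@{
  char const *base = strrchr (infile, '/');
  char const *dot;
  int n;

  base = base ? base + 1 : infile;   /* Drop any leading directory.  */
  dot = strrchr (base, '.');
  if (!dot || dot[1] != 'y')
    return -1;

  /* Keep whatever follows ".y" ("pp", "++", or nothing)
     after ".tab.c".  */
  n = snprintf (out, size, "%.*s.tab.c%s",
                (int) (dot - base), base, dot + 2);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
@}
@end example

With this sketch, @file{hack/foo.y} maps to @file{foo.tab.c} and
@file{foo.ypp} maps to @file{foo.tab.cpp}.
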
For example:
8917
8918@example
8919bison -d @var{infile.yxx}
8920@end example
84163231 8921@noindent
72d2299c 8922will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}, and
234a3be3
AD
8923
8924@example
b56471a6 8925bison -d -o @var{output.c++} @var{infile.y}
234a3be3 8926@end example
@noindent
will produce @file{output.c++} and @file{output.h++}.
8929
35430378 8930For compatibility with POSIX, the standard Bison
397ec073
PE
8931distribution also contains a shell script called @command{yacc} that
8932invokes Bison with the @option{-y} option.
8933
bfa74976 8934@menu
13863333 8935* Bison Options:: All the options described in detail,
c827f760 8936 in alphabetical order by short options.
bfa74976 8937* Option Cross Key:: Alphabetical list of long options.
* Yacc Library:: Yacc-compatible @code{yyerror} and @code{main}.
bfa74976
RS
8939@end menu
8940
342b8b6e 8941@node Bison Options
bfa74976
RS
8942@section Bison Options
8943
8944Bison supports both traditional single-letter options and mnemonic long
8945option names. Long option names are indicated with @samp{--} instead of
8946@samp{-}. Abbreviations for option names are allowed as long as they
8947are unique. When a long option takes an argument, like
8948@samp{--file-prefix}, connect the option name and the argument with
8949@samp{=}.
8950
8951Here is a list of options that can be used with Bison, alphabetized by
8952short option. It is followed by a cross key alphabetized by long
8953option.
8954
89cab50d
AD
8955@c Please, keep this ordered as in `bison --help'.
8956@noindent
Operation modes:
8958@table @option
8959@item -h
8960@itemx --help
8961Print a summary of the command-line options to Bison and exit.
bfa74976 8962
89cab50d
AD
8963@item -V
8964@itemx --version
8965Print the version number of Bison and exit.
bfa74976 8966
f7ab6a50
PE
8967@item --print-localedir
8968Print the name of the directory containing locale-dependent data.
8969
a0de5091
JD
8970@item --print-datadir
8971Print the name of the directory containing skeletons and XSLT.
8972
89cab50d
AD
8973@item -y
8974@itemx --yacc
9913d6e4
JD
8975Act more like the traditional Yacc command. This can cause different
8976diagnostics to be generated, and may change behavior in other minor
8977ways. Most importantly, imitate Yacc's output file name conventions,
8978so that the parser implementation file is called @file{y.tab.c}, and
8979the other outputs are called @file{y.output} and @file{y.tab.h}.
8980Also, if generating a deterministic parser in C, generate
8981@code{#define} statements in addition to an @code{enum} to associate
8982token numbers with token names. Thus, the following shell script can
8983substitute for Yacc, and the Bison distribution contains such a script
8984for compatibility with POSIX:
bfa74976 8985
89cab50d 8986@example
397ec073 8987#! /bin/sh
26e06a21 8988bison -y "$@@"
89cab50d 8989@end example
54662697
PE
8990
8991The @option{-y}/@option{--yacc} option is intended for use with
8992traditional Yacc grammars. If your grammar uses a Bison extension
8993like @samp{%glr-parser}, Bison might not be Yacc-compatible even if
8994this option is specified.
8995
ecd1b61c
JD
8996@item -W [@var{category}]
8997@itemx --warnings[=@var{category}]
118d4978
AD
8998Output warnings falling in @var{category}. @var{category} can be one
8999of:
9000@table @code
9001@item midrule-values
8e55b3aa
JD
9002Warn about mid-rule values that are set but not used within any of the actions
9003of the parent rule.
9004For example, warn about unused @code{$2} in:
118d4978
AD
9005
9006@example
9007exp: '1' @{ $$ = 1; @} '+' exp @{ $$ = $1 + $4; @};
9008@end example
9009
8e55b3aa
JD
9010Also warn about mid-rule values that are used but not set.
9011For example, warn about unset @code{$$} in the mid-rule action in:
118d4978
AD
9012
9013@example
de6be119 9014exp: '1' @{ $1 = 1; @} '+' exp @{ $$ = $2 + $4; @};
118d4978
AD
9015@end example
9016
9017These warnings are not enabled by default since they sometimes prove to
9018be false alarms in existing grammars employing the Yacc constructs
8e55b3aa 9019@code{$0} or @code{$-@var{n}} (where @var{n} is some positive integer).
118d4978 9020
118d4978 9021@item yacc
35430378 9022Incompatibilities with POSIX Yacc.
118d4978 9023
6f8bdce2
JD
9024@item conflicts-sr
9025@itemx conflicts-rr
S/R and R/R conflicts. These warnings are enabled by default. However, if
the @code{%expect} or @code{%expect-rr} directive is specified, an
unexpected number of conflicts is an error, and an expected number of
conflicts is not reported, so @option{-W} and @option{--warnings} then have
no effect on the conflict report.
9031
8ffd7912
JD
9032@item other
9033All warnings not categorized above. These warnings are enabled by default.
9034
9035This category is provided merely for the sake of completeness. Future
9036releases of Bison may move warnings from this category to new, more specific
9037categories.
9038
118d4978 9039@item all
8e55b3aa 9040All the warnings.
118d4978 9041@item none
8e55b3aa 9042Turn off all the warnings.
118d4978 9043@item error
8e55b3aa 9044Treat warnings as errors.
118d4978
AD
9045@end table
9046
9047A category can be turned off by prefixing its name with @samp{no-}. For
cf22447c 9048instance, @option{-Wno-yacc} will hide the warnings about
35430378 9049POSIX Yacc incompatibilities.
89cab50d
AD
9050@end table
9051
9052@noindent
9053Tuning the parser:
9054
9055@table @option
9056@item -t
9057@itemx --debug
In the parser implementation file, define the macro @code{YYDEBUG} to
1 if it is not already defined, so that the debugging facilities are
compiled. @xref{Tracing, ,Tracing Your Parser}.
89cab50d 9061
e14c6831
AD
9062@item -D @var{name}[=@var{value}]
9063@itemx --define=@var{name}[=@var{value}]
c33bc800 9064@itemx -F @var{name}[=@var{value}]
34d41938
JD
9065@itemx --force-define=@var{name}[=@var{value}]
9066Each of these is equivalent to @samp{%define @var{name} "@var{value}"}
2f4518a1 9067(@pxref{%define Summary}) except that Bison processes multiple
34d41938
JD
9068definitions for the same @var{name} as follows:
9069
9070@itemize
9071@item
e3a33f7c
JD
9072Bison quietly ignores all command-line definitions for @var{name} except
9073the last.
34d41938 9074@item
e3a33f7c
JD
9075If that command-line definition is specified by a @code{-D} or
9076@code{--define}, Bison reports an error for any @code{%define}
9077definition for @var{name}.
34d41938 9078@item
e3a33f7c
JD
9079If that command-line definition is specified by a @code{-F} or
9080@code{--force-define} instead, Bison quietly ignores all @code{%define}
9081definitions for @var{name}.
9082@item
9083Otherwise, Bison reports an error if there are multiple @code{%define}
9084definitions for @var{name}.
34d41938
JD
9085@end itemize
9086
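The rules above can be read as a small decision procedure. The following
C sketch restates them for a single @var{name}; the type and function
names are hypothetical, and it models only the outcome, not how Bison
implements it.

@example
/* Kind of the last command-line definition of NAME, if any.  */
enum cli_kind @{ CLI_NONE, CLI_DEFINE, CLI_FORCE_DEFINE @};

enum outcome
@{
  USE_COMMAND_LINE, USE_GRAMMAR, NOT_DEFINED, REPORT_ERROR
@};

/* LAST is the last -D/-F given for NAME (earlier ones are
   ignored); GRAMMAR_DEFS counts the %define directives for NAME
   in the grammar file.  */
enum outcome
resolve_define (enum cli_kind last, int grammar_defs)
@{
  switch (last)
    @{
    case CLI_DEFINE:          /* -D: any %define is an error.  */
      return grammar_defs > 0 ? REPORT_ERROR : USE_COMMAND_LINE;
    case CLI_FORCE_DEFINE:    /* -F: %define is quietly ignored.  */
      return USE_COMMAND_LINE;
    default:                  /* No command-line definition.  */
      if (grammar_defs > 1)
        return REPORT_ERROR;
      return grammar_defs == 1 ? USE_GRAMMAR : NOT_DEFINED;
    @}
@}
@end example

For instance, a final @code{-D} combined with one @code{%define} in the
grammar file yields @code{REPORT_ERROR}, whereas a final @code{-F} in the
same situation yields @code{USE_COMMAND_LINE}.
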
9087You should avoid using @code{-F} and @code{--force-define} in your
9913d6e4
JD
9088make files unless you are confident that it is safe to quietly ignore
9089any conflicting @code{%define} that may be added to the grammar file.
e14c6831 9090
0e021770
PE
9091@item -L @var{language}
9092@itemx --language=@var{language}
9093Specify the programming language for the generated parser, as if
9094@code{%language} was specified (@pxref{Decl Summary, , Bison Declaration
59da312b 9095Summary}). Currently supported languages include C, C++, and Java.
e6e704dc 9096@var{language} is case-insensitive.
0e021770 9097
ed4d67dc
JD
9098This option is experimental and its effect may be modified in future
9099releases.
9100
89cab50d 9101@item --locations
d8988b2f 9102Pretend that @code{%locations} was specified. @xref{Decl Summary}.
89cab50d
AD
9103
9104@item -p @var{prefix}
9105@itemx --name-prefix=@var{prefix}
4b3847c3
AD
9106Pretend that @code{%name-prefix "@var{prefix}"} was specified (@pxref{Decl
9107Summary}). Obsoleted by @code{-Dapi.prefix=@var{prefix}}. @xref{Multiple
9108Parsers, ,Multiple Parsers in the Same Program}.
bfa74976
RS
9109
9110@item -l
9111@itemx --no-lines
9913d6e4
JD
9112Don't put any @code{#line} preprocessor commands in the parser
9113implementation file. Ordinarily Bison puts them in the parser
9114implementation file so that the C compiler and debuggers will
9115associate errors with your source file, the grammar file. This option
9116causes them to associate errors with the parser implementation file,
9117treating it as an independent source file in its own right.
bfa74976 9118
e6e704dc
JD
9119@item -S @var{file}
9120@itemx --skeleton=@var{file}
a7867f53 9121Specify the skeleton to use, similar to @code{%skeleton}
e6e704dc
JD
9122(@pxref{Decl Summary, , Bison Declaration Summary}).
9123
ed4d67dc
JD
9124@c You probably don't need this option unless you are developing Bison.
9125@c You should use @option{--language} if you want to specify the skeleton for a
9126@c different language, because it is clearer and because it will always
9127@c choose the correct skeleton for non-deterministic or push parsers.
e6e704dc 9128
a7867f53
JD
9129If @var{file} does not contain a @code{/}, @var{file} is the name of a skeleton
9130file in the Bison installation directory.
9131If it does, @var{file} is an absolute file name or a file name relative to the
9132current working directory.
9133This is similar to how most shells resolve commands.
9134
89cab50d
AD
9135@item -k
9136@itemx --token-table
d8988b2f 9137Pretend that @code{%token-table} was specified. @xref{Decl Summary}.
89cab50d 9138@end table
bfa74976 9139
89cab50d
AD
9140@noindent
9141Adjust the output:
bfa74976 9142
89cab50d 9143@table @option
8e55b3aa 9144@item --defines[=@var{file}]
d8988b2f 9145Pretend that @code{%defines} was specified, i.e., write an extra output
6deb4447 9146file containing macro definitions for the token type names defined in
4bfd5e4e 9147the grammar, as well as a few other declarations. @xref{Decl Summary}.
931c7513 9148
8e55b3aa
JD
9149@item -d
9150This is the same as @code{--defines} except @code{-d} does not accept a
9151@var{file} argument since POSIX Yacc requires that @code{-d} can be bundled
9152with other short options.
342b8b6e 9153
89cab50d
AD
9154@item -b @var{file-prefix}
9155@itemx --file-prefix=@var{prefix}
Pretend that @code{%file-prefix} was specified, i.e., specify the prefix to
use for all Bison output file names. @xref{Decl Summary}.
bfa74976 9158
ec3bc396
AD
9159@item -r @var{things}
9160@itemx --report=@var{things}
Write an extra output file containing a verbose description of the
comma-separated list of @var{things} among:
9163
9164@table @code
9165@item state
9166Description of the grammar, conflicts (resolved and unresolved), and
34a6c2d1 9167parser's automaton.
ec3bc396 9168
c473e022
AD
9169@item itemset
9170Implies @code{state} and augments the description of the automaton with
9171the full set of items for each state, instead of its core only.
9172
742e4900 9173@item lookahead
ec3bc396 9174Implies @code{state} and augments the description of the automaton with
742e4900 9175each rule's lookahead set.
ec3bc396 9176
c473e022
AD
9177@item solved
9178Implies @code{state}. Explain how conflicts were solved thanks to
9179precedence and associativity directives.
9180
9181@item all
9182Enable all the items.
9183
9184@item none
9185Do not generate the report.
ec3bc396
AD
9186@end table
9187
1bb2bd75
JD
9188@item --report-file=@var{file}
9189Specify the @var{file} for the verbose description.
9190
bfa74976
RS
9191@item -v
9192@itemx --verbose
9c437126 9193Pretend that @code{%verbose} was specified, i.e., write an extra output
6deb4447 9194file containing verbose descriptions of the grammar and
72d2299c 9195parser. @xref{Decl Summary}.
bfa74976 9196
fa4d969f
PE
9197@item -o @var{file}
9198@itemx --output=@var{file}
9913d6e4 9199Specify the @var{file} for the parser implementation file.
bfa74976 9200
fa4d969f 9201The other output files' names are constructed from @var{file} as
d8988b2f 9202described under the @samp{-v} and @samp{-d} options.
342b8b6e 9203
72183df4 9204@item -g [@var{file}]
8e55b3aa 9205@itemx --graph[=@var{file}]
34a6c2d1 9206Output a graphical representation of the parser's
35fe0834 9207automaton computed by Bison, in @uref{http://www.graphviz.org/, Graphviz}
35430378 9208@uref{http://www.graphviz.org/doc/info/lang.html, DOT} format.
8e55b3aa
JD
9209@code{@var{file}} is optional.
9210If omitted and the grammar file is @file{foo.y}, the output file will be
9211@file{foo.dot}.
59da312b 9212
72183df4 9213@item -x [@var{file}]
8e55b3aa 9214@itemx --xml[=@var{file}]
34a6c2d1 9215Output an XML report of the parser's automaton computed by Bison.
8e55b3aa 9216@code{@var{file}} is optional.
59da312b
JD
9217If omitted and the grammar file is @file{foo.y}, the output file will be
9218@file{foo.xml}.
9219(The current XML schema is experimental and may evolve.
9220More user feedback will help to stabilize it.)
bfa74976
RS
9221@end table
9222
342b8b6e 9223@node Option Cross Key
bfa74976
RS
9224@section Option Cross Key
9225
9226Here is a list of options, alphabetized by long option, to help you find
34d41938 9227the corresponding short option and directive.
bfa74976 9228
34d41938 9229@multitable {@option{--force-define=@var{name}[=@var{value}]}} {@option{-F @var{name}[=@var{value}]}} {@code{%nondeterministic-parser}}
72183df4 9230@headitem Long Option @tab Short Option @tab Bison Directive
f4101aa6 9231@include cross-options.texi
aa08666d 9232@end multitable
bfa74976 9233
93dd49ab
PE
9234@node Yacc Library
9235@section Yacc Library
9236
9237The Yacc library contains default implementations of the
9238@code{yyerror} and @code{main} functions. These default
35430378 9239implementations are normally not useful, but POSIX requires
93dd49ab
PE
9240them. To use the Yacc library, link your program with the
9241@option{-ly} option. Note that Bison's implementation of the Yacc
35430378 9242library is distributed under the terms of the GNU General
93dd49ab
PE
9243Public License (@pxref{Copying}).
9244
9245If you use the Yacc library's @code{yyerror} function, you should
9246declare @code{yyerror} as follows:
9247
9248@example
9249int yyerror (char const *);
9250@end example
9251
9252Bison ignores the @code{int} value returned by this @code{yyerror}.
9253If you use the Yacc library's @code{main} function, your
9254@code{yyparse} function should have the following type signature:
9255
9256@example
9257int yyparse (void);
9258@end example
9259
12545799
AD
9260@c ================================================= C++ Bison
9261
8405b70c
PB
9262@node Other Languages
9263@chapter Parsers Written In Other Languages
12545799
AD
9264
9265@menu
9266* C++ Parsers:: The interface to generate C++ parser classes
8405b70c 9267* Java Parsers:: The interface to generate Java parser classes
12545799
AD
9268@end menu
9269
9270@node C++ Parsers
9271@section C++ Parsers
9272
9273@menu
9274* C++ Bison Interface:: Asking for C++ parser generation
9275* C++ Semantic Values:: %union vs. C++
9276* C++ Location Values:: The position and location classes
9277* C++ Parser Interface:: Instantiating and running the parser
9278* C++ Scanner Interface:: Exchanges between yylex and parse
8405b70c 9279* A Complete C++ Example:: Demonstrating their use
12545799
AD
9280@end menu
9281
9282@node C++ Bison Interface
9283@subsection C++ Bison Interface
ed4d67dc 9284@c - %skeleton "lalr1.cc"
12545799
AD
9285@c - Always pure
9286@c - initial action
9287
34a6c2d1 9288The C++ deterministic parser is selected using the skeleton directive,
baacae49
AD
9289@samp{%skeleton "lalr1.cc"}, or the synonymous command-line option
9290@option{--skeleton=lalr1.cc}.
e6e704dc 9291@xref{Decl Summary}.
0e021770 9292
793fbca5
JD
9293When run, @command{bison} will create several entities in the @samp{yy}
9294namespace.
9295@findex %define namespace
2f4518a1
JD
9296Use the @samp{%define namespace} directive to change the namespace
9297name, see @ref{%define Summary,,namespace}. The various classes are
9298generated in the following files:
aa08666d 9299
12545799
AD
9300@table @file
9301@item position.hh
9302@itemx location.hh
9303The definition of the classes @code{position} and @code{location},
9304used for location tracking. @xref{C++ Location Values}.
9305
9306@item stack.hh
9307An auxiliary class @code{stack} used by the parser.
9308
fa4d969f
PE
9309@item @var{file}.hh
9310@itemx @var{file}.cc
9913d6e4 9311(Assuming the extension of the grammar file was @samp{.yy}.) The
cd8b5791
AD
9312declaration and implementation of the C++ parser class. The basename
9313and extension of these two files follow the same rules as with regular C
9314parsers (@pxref{Invocation}).
12545799 9315
cd8b5791
AD
9316The header is @emph{mandatory}; you must either pass
9317@option{-d}/@option{--defines} to @command{bison}, or use the
12545799
AD
9318@samp{%defines} directive.
9319@end table
9320
9321All these files are documented using Doxygen; run @command{doxygen}
9322for complete and accurate documentation.
9323
9324@node C++ Semantic Values
9325@subsection C++ Semantic Values
9326@c - No objects in unions
178e123e 9327@c - YYSTYPE
12545799
AD
9328@c - Printer and destructor
9329
9330The @code{%union} directive works as for C, see @ref{Union Decl, ,The
9331Collection of Value Types}. In particular it produces a genuine
9332@code{union}@footnote{In the future techniques to allow complex types
fb9712a9
AD
9333within pseudo-unions (similar to Boost variants) might be implemented to
9334alleviate these issues.}, which has a few specific features in C++.
12545799
AD
9335@itemize @minus
9336@item
fb9712a9
AD
9337The type @code{YYSTYPE} is defined but its use is discouraged: rather
9338you should refer to the parser's encapsulated type
9339@code{yy::parser::semantic_type}.
12545799
AD
9340@item
9341Non-POD (Plain Old Data) types cannot be used. C++ forbids any
9342instance of classes with constructors in unions: only @emph{pointers}
9343to such objects are allowed.
9344@end itemize
9345
9346Because objects have to be stored via pointers, memory is not
9347reclaimed automatically: using the @code{%destructor} directive is the
9348only means to avoid leaks. @xref{Destructor Decl, , Freeing Discarded
9349Symbols}.
9350
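As a minimal sketch of the constraint above (this is not code generated by
Bison), the following stand-alone C++ mimics a @code{%union} that holds an
@code{int} and a pointer to a @code{std::string}. The names
@code{demo_value}, @code{make_identifier} and @code{discard_identifier} are
hypothetical; the last one plays the role that a @code{%destructor} body
would play for a discarded symbol.

```cpp
#include <string>

// Hypothetical stand-in for the union that %union would produce.
union demo_value
{
  int ival;            // POD member: stored directly.
  std::string *sval;   // Non-POD type: only a pointer fits in the union.
};

// Allocate the string on the heap, as a scanner action might.
demo_value
make_identifier (const char *text)
{
  demo_value v;
  v.sval = new std::string (text);
  return v;
}

// What a %destructor body such as { delete $$; } amounts to.
void
discard_identifier (demo_value &v)
{
  delete v.sval;
  v.sval = 0;
}
```

Because the union only stores the pointer, forgetting the call to
@code{discard_identifier} (or, in a grammar, the @code{%destructor}) leaks
the string.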
9351
9352@node C++ Location Values
9353@subsection C++ Location Values
9354@c - %locations
9355@c - class Position
9356@c - class Location
16dc6a9e 9357@c - %define filename_type "const symbol::Symbol"
12545799
AD
9358
9359When the directive @code{%locations} is used, the C++ parser supports
7404cdf3
JD
9360location tracking, see @ref{Tracking Locations}. Two auxiliary classes
9361define a @code{position}, a single point in a file, and a @code{location}, a
9362range composed of a pair of @code{position}s (possibly spanning several
9363files).
12545799 9364
936c88d1
AD
9365@tindex uint
9366In this section @code{uint} is an abbreviation for @code{unsigned int}: in
9367genuine code only the latter is used.
9368
9369@menu
9370* C++ position:: One point in the source file
9371* C++ location:: Two points in the source file
9372@end menu
9373
9374@node C++ position
9375@subsubsection C++ @code{position}
9376
9377@deftypeop {Constructor} {position} {} position (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1)
9378Create a @code{position} denoting a given point. Note that @code{file} is
9379not reclaimed when the @code{position} is destroyed: memory management must be
9380handled elsewhere.
9381@end deftypeop
9382
9383@deftypemethod {position} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1)
9384Reset the position to the given values.
9385@end deftypemethod
9386
9387@deftypeivar {position} {std::string*} file
12545799
AD
9388The name of the file. It will always be handled as a pointer: the
9389parser will never duplicate nor deallocate it. As an experimental
9390feature you may change it to @samp{@var{type}*} using @samp{%define
16dc6a9e 9391filename_type "@var{type}"}.
936c88d1 9392@end deftypeivar
12545799 9393
936c88d1 9394@deftypeivar {position} {uint} line
12545799 9395The line, starting at 1.
936c88d1 9396@end deftypeivar
12545799 9397
936c88d1 9398@deftypemethod {position} {uint} lines (int @var{height} = 1)
12545799
AD
9399Advance by @var{height} lines, resetting the column number.
9400@end deftypemethod
9401
936c88d1
AD
9402@deftypeivar {position} {uint} column
9403The column, starting at 1.
9404@end deftypeivar
12545799 9405
936c88d1 9406@deftypemethod {position} {uint} columns (int @var{width} = 1)
12545799
AD
9407Advance by @var{width} columns, without changing the line number.
9408@end deftypemethod
9409
936c88d1
AD
9410@deftypemethod {position} {position&} operator+= (int @var{width})
9411@deftypemethodx {position} {position} operator+ (int @var{width})
9412@deftypemethodx {position} {position&} operator-= (int @var{width})
9413@deftypemethodx {position} {position} operator- (int @var{width})
12545799
AD
9414Various forms of syntactic sugar for @code{columns}.
9415@end deftypemethod
9416
936c88d1
AD
9417@deftypemethod {position} {bool} operator== (const position& @var{that})
9418@deftypemethodx {position} {bool} operator!= (const position& @var{that})
9419Whether @code{*this} and @code{that} denote equal/different positions.
9420@end deftypemethod
9421
9422@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const position& @var{p})
12545799 9423Report @var{p} on @var{o} like this:
fa4d969f
PE
9424@samp{@var{file}:@var{line}.@var{column}}, or
9425@samp{@var{line}.@var{column}} if @var{file} is null.
936c88d1
AD
9426@end deftypefun
9427
9428@node C++ location
9429@subsubsection C++ @code{location}
9430
9431@deftypeop {Constructor} {location} {} location (const position& @var{begin}, const position& @var{end})
9432Create a @code{location} from the endpoints of the range.
9433@end deftypeop
9434
9435@deftypeop {Constructor} {location} {} location (const position& @var{pos} = position())
9436@deftypeopx {Constructor} {location} {} location (std::string* @var{file}, uint @var{line}, uint @var{col})
9437Create a @code{location} denoting an empty range located at a given point.
9438@end deftypeop
9439
9440@deftypemethod {location} {void} initialize (std::string* @var{file} = 0, uint @var{line} = 1, uint @var{col} = 1)
9441Reset the location to an empty range at the given values.
12545799
AD
9442@end deftypemethod
9443
936c88d1
AD
9444@deftypeivar {location} {position} begin
9445@deftypeivarx {location} {position} end
12545799 9446The first, inclusive, position of the range, and the first beyond.
936c88d1 9447@end deftypeivar
12545799 9448
936c88d1
AD
9449@deftypemethod {location} {uint} columns (int @var{width} = 1)
9450@deftypemethodx {location} {uint} lines (int @var{height} = 1)
12545799
AD
9451Advance the @code{end} position.
9452@end deftypemethod
9453
936c88d1
AD
9454@deftypemethod {location} {location} operator+ (const location& @var{end})
9455@deftypemethodx {location} {location} operator+ (int @var{width})
9456@deftypemethodx {location} {location} operator+= (int @var{width})
12545799
AD
9457Various forms of syntactic sugar.
9458@end deftypemethod
9459
9460@deftypemethod {location} {void} step ()
9461Move @code{begin} onto @code{end}.
9462@end deftypemethod
9463
936c88d1
AD
9464@deftypemethod {location} {bool} operator== (const location& @var{that})
9465@deftypemethodx {location} {bool} operator!= (const location& @var{that})
9466Whether @code{*this} and @code{that} denote equal/different ranges of
9467positions.
9468@end deftypemethod
9469
9470@deftypefun {std::ostream&} operator<< (std::ostream& @var{o}, const location& @var{p})
9471Report @var{p} on @var{o}, taking care of special cases such as: no
9472@code{filename} defined, or equal filename/line or column.
9473@end deftypefun
12545799
AD
9474
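To make the interplay of @code{step}, @code{columns} and @code{lines}
concrete, here is a minimal sketch using simplified stand-ins for the two
classes. The names @code{demo_position}, @code{demo_location} and
@code{describe} are hypothetical, and the generated classes offer more
members and operators than shown here.

```cpp
#include <sstream>
#include <string>

// Simplified stand-in for the generated position class.
struct demo_position
{
  const std::string *file;  // Never owned, never deallocated here.
  unsigned int line;        // Starts at 1.
  unsigned int column;      // Starts at 1.

  demo_position () : file (0), line (1), column (1) {}

  // Advance by HEIGHT lines, resetting the column number.
  void lines (int height = 1) { line += height; column = 1; }
  // Advance by WIDTH columns, without changing the line number.
  void columns (int width = 1) { column += width; }
};

// Simplified stand-in for the generated location class.
struct demo_location
{
  demo_position begin;  // First position of the range (inclusive).
  demo_position end;    // First position beyond the range.

  void step () { begin = end; }  // Move begin onto end.
  void columns (int width = 1) { end.columns (width); }
  void lines (int height = 1) { end.lines (height); }
};

// Render a position as "file:line.column", or "line.column" if
// there is no file.
std::string
describe (const demo_position &p)
{
  std::ostringstream o;
  if (p.file)
    o << *p.file << ':';
  o << p.line << '.' << p.column;
  return o.str ();
}
```

A scanner would typically call @code{step} before each token and
@code{columns} with the length of the matched text, so that @code{begin}
and @code{end} bracket the token.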
9475@node C++ Parser Interface
9476@subsection C++ Parser Interface
9477@c - define parser_class_name
9478@c - Ctor
9479@c - parse, error, set_debug_level, debug_level, set_debug_stream,
9480@c debug_stream.
9481@c - Reporting errors
9482
9483The output files @file{@var{output}.hh} and @file{@var{output}.cc}
9484declare and define the parser class in the namespace @code{yy}. The
9485class name defaults to @code{parser}, but may be changed using
16dc6a9e 9486@samp{%define parser_class_name "@var{name}"}. The interface of
9d9b8b70 9487this class is detailed below. It can be extended using the
12545799
AD
9488@code{%parse-param} feature: its semantics is slightly changed since
9489it describes an additional member of the parser class, and an
9490additional argument for its constructor.
9491
baacae49
AD
9492@defcv {Type} {parser} {semantic_type}
9493@defcvx {Type} {parser} {location_type}
12545799 9494The types for semantic values and locations.
8a0adb01 9495@end defcv
12545799 9496
baacae49 9497@defcv {Type} {parser} {token}
2c0f9706
AD
9498A structure that contains (only) the @code{yytokentype} enumeration, which
9499defines the tokens. To refer to the token @code{FOO},
9500use @code{yy::parser::token::FOO}. The scanner can use
baacae49
AD
9501@samp{typedef yy::parser::token token;} to ``import'' the token enumeration
9502(@pxref{Calc++ Scanner}).
9503@end defcv
9504
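The shape of the @code{token} structure and the typedef ``import'' can be
sketched as follows. This is a hand-written miniature, not Bison output:
@code{demo_parser} and @code{classify} are hypothetical names, and the
token numbers are illustrative rather than the ones Bison would assign.

```cpp
// Hypothetical miniature of a generated parser class.
namespace yy
{
  class demo_parser
  {
  public:
    // A structure that contains (only) the token enumeration.
    struct token
    {
      enum yytokentype
      {
        END = 0,
        ASSIGN = 258,
        IDENTIFIER = 259,
        NUMBER = 260
      };
    };
    typedef token::yytokentype token_type;
  };
}

// In the scanner, import the enumeration under a short name.
typedef yy::demo_parser::token token;

// Scanner code can then write token::NUMBER instead of
// yy::demo_parser::token::NUMBER.
yy::demo_parser::token_type
classify (char c)
{
  if ('0' <= c && c <= '9')
    return token::NUMBER;
  if (c == '\0')
    return token::END;
  return token::IDENTIFIER;
}
```

Wrapping the enumeration in a structure keeps the token names out of the
enclosing class scope, which is why the extra @code{token::} qualifier is
needed.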
12545799
AD
9505@deftypemethod {parser} {} parser (@var{type1} @var{arg1}, ...)
9506Build a new parser object. There are no arguments by default, unless
9507@samp{%parse-param @{@var{type1} @var{arg1}@}} was used.
9508@end deftypemethod
9509
9510@deftypemethod {parser} {int} parse ()
9511Run the syntactic analysis, and return 0 on success, 1 otherwise.
9512@end deftypemethod
9513
9514@deftypemethod {parser} {std::ostream&} debug_stream ()
9515@deftypemethodx {parser} {void} set_debug_stream (std::ostream& @var{o})
9516Get or set the stream used for tracing the parsing. It defaults to
9517@code{std::cerr}.
9518@end deftypemethod
9519
9520@deftypemethod {parser} {debug_level_type} debug_level ()
9521@deftypemethodx {parser} {void} set_debug_level (debug_level_type @var{l})
9522Get or set the tracing level. Currently its value is either 0, no trace,
9d9b8b70 9523or nonzero, full tracing.
12545799
AD
9524@end deftypemethod
9525
9526@deftypemethod {parser} {void} error (const location_type& @var{l}, const std::string& @var{m})
9527The definition for this member function must be supplied by the user:
9528the parser uses it to report a parser error occurring at @var{l},
9529described by @var{m}.
9530@end deftypemethod
9531
9532
9533@node C++ Scanner Interface
9534@subsection C++ Scanner Interface
9535@c - prefix for yylex.
9536@c - Pure interface to yylex
9537@c - %lex-param
9538
9539The parser invokes the scanner by calling @code{yylex}. Contrary to C
9540parsers, C++ parsers are always pure: there is no point in using the
d9df47b6 9541@code{%define api.pure} directive. Therefore the interface is as follows.
12545799 9542
baacae49 9543@deftypemethod {parser} {int} yylex (semantic_type* @var{yylval}, location_type* @var{yylloc}, @var{type1} @var{arg1}, ...)
12545799
AD
9544Return the next token. Its type is the return value, its semantic
9545value and location being @var{yylval} and @var{yylloc}. Invocations of
9546@samp{%lex-param @{@var{type1} @var{arg1}@}} yield additional arguments.
9547@end deftypemethod
9548
9549
9550@node A Complete C++ Example
8405b70c 9551@subsection A Complete C++ Example
12545799
AD
9552
9553This section demonstrates the use of a C++ parser with a simple but
9554complete example. This example should be available on your system,
9555ready to compile, in the directory @file{../bison/examples/calc++}. It
9556focuses on the use of Bison, therefore the design of the various C++
9557classes is very naive: no accessors, no encapsulation of members etc.
9558We will use a Lex scanner, and more precisely, a Flex scanner, to
9559demonstrate the various interactions. A hand-written scanner is
9560actually easier to interface with.
9561
9562@menu
9563* Calc++ --- C++ Calculator:: The specifications
9564* Calc++ Parsing Driver:: An active parsing context
9565* Calc++ Parser:: A parser class
9566* Calc++ Scanner:: A pure C++ Flex scanner
9567* Calc++ Top Level:: Conducting the band
9568@end menu
9569
9570@node Calc++ --- C++ Calculator
8405b70c 9571@subsubsection Calc++ --- C++ Calculator
12545799
AD
9572
9573Of course the grammar is dedicated to arithmetic: a single
9d9b8b70 9574expression, possibly preceded by variable assignments. An
12545799
AD
9575environment containing possibly predefined variables such as
9576@code{one} and @code{two} is exchanged with the parser. An example
9577of valid input follows.
9578
9579@example
9580three := 3
9581seven := one + two * three
9582seven * seven
9583@end example
9584
9585@node Calc++ Parsing Driver
8405b70c 9586@subsubsection Calc++ Parsing Driver
12545799
AD
9587@c - An env
9588@c - A place to store error messages
9589@c - A place for the result
9590
9591To support a pure interface with the parser (and the scanner) the
9592technique of the ``parsing context'' is convenient: a structure
9593containing all the data to exchange. Since, in addition to simply
9594launching the parsing, there are several auxiliary tasks to execute (open
9595the file for parsing, instantiate the parser etc.), we recommend
9596transforming the simple parsing context structure into a full-blown
9597@dfn{parsing driver} class.
9598
9599The declaration of this driver class, @file{calc++-driver.hh}, is as
9600follows. The first part includes the CPP guard and imports the
fb9712a9
AD
9601required standard library components, and the declaration of the parser
9602class.
12545799 9603
1c59e0a1 9604@comment file: calc++-driver.hh
12545799
AD
9605@example
9606#ifndef CALCXX_DRIVER_HH
9607# define CALCXX_DRIVER_HH
9608# include <string>
9609# include <map>
fb9712a9 9610# include "calc++-parser.hh"
12545799
AD
9611@end example
9612
12545799
AD
9613
9614@noindent
9615Then comes the declaration of the scanning function. Flex expects
9616the signature of @code{yylex} to be defined in the macro
9617@code{YY_DECL}, and the C++ parser expects it to be declared. We can
9618factor both as follows.
1c59e0a1
AD
9619
9620@comment file: calc++-driver.hh
12545799 9621@example
3dc5e96b
PE
9622// Tell Flex the lexer's prototype ...
9623# define YY_DECL \
c095d689 9624 yy::calcxx_parser::token_type \
8b49e6bf
AD
9625 yylex (yy::calcxx_parser::semantic_type *yylval, \
9626 yy::calcxx_parser::location_type *yylloc, \
c095d689 9627 calcxx_driver& driver)
12545799
AD
9628// ... and declare it for the parser's sake.
9629YY_DECL;
9630@end example
9631
9632@noindent
9633The @code{calcxx_driver} class is then declared with its most obvious
9634members.
9635
1c59e0a1 9636@comment file: calc++-driver.hh
12545799
AD
9637@example
9638// Conducting the whole scanning and parsing of Calc++.
9639class calcxx_driver
9640@{
9641public:
9642 calcxx_driver ();
9643 virtual ~calcxx_driver ();
9644
9645 std::map<std::string, int> variables;
9646
9647 int result;
9648@end example
9649
9650@noindent
9651To encapsulate the coordination with the Flex scanner, it is useful to
9652have two member functions to open and close the scanning phase.
12545799 9653
1c59e0a1 9654@comment file: calc++-driver.hh
12545799
AD
9655@example
9656 // Handling the scanner.
9657 void scan_begin ();
9658 void scan_end ();
9659 bool trace_scanning;
9660@end example
9661
9662@noindent
9663Similarly for the parser itself.
9664
1c59e0a1 9665@comment file: calc++-driver.hh
12545799 9666@example
bb32f4f2
AD
9667 // Run the parser. Return 0 on success.
9668 int parse (const std::string& f);
12545799
AD
9669 std::string file;
9670 bool trace_parsing;
9671@end example
9672
9673@noindent
9674To demonstrate pure handling of parse errors, instead of simply
9675dumping them on the standard error output, we will pass them to the
9676compiler driver using the following two member functions. Finally, we
9677close the class declaration and CPP guard.
9678
1c59e0a1 9679@comment file: calc++-driver.hh
12545799
AD
9680@example
9681 // Error handling.
9682 void error (const yy::location& l, const std::string& m);
9683 void error (const std::string& m);
9684@};
9685#endif // ! CALCXX_DRIVER_HH
9686@end example
9687
9688The implementation of the driver is straightforward. The @code{parse}
9689member function deserves some attention. The @code{error} functions
9690are simple stubs; they should actually register the located error
9691messages and set error state.
9692
1c59e0a1 9693@comment file: calc++-driver.cc
12545799
AD
9694@example
9695#include "calc++-driver.hh"
9696#include "calc++-parser.hh"
9697
9698calcxx_driver::calcxx_driver ()
9699 : trace_scanning (false), trace_parsing (false)
9700@{
9701 variables["one"] = 1;
9702 variables["two"] = 2;
9703@}
9704
9705calcxx_driver::~calcxx_driver ()
9706@{
9707@}
9708
bb32f4f2 9709int
12545799
AD
9710calcxx_driver::parse (const std::string &f)
9711@{
9712 file = f;
9713 scan_begin ();
9714 yy::calcxx_parser parser (*this);
9715 parser.set_debug_level (trace_parsing);
bb32f4f2 9716 int res = parser.parse ();
12545799 9717 scan_end ();
bb32f4f2 9718 return res;
12545799
AD
9719@}
9720
9721void
9722calcxx_driver::error (const yy::location& l, const std::string& m)
9723@{
9724 std::cerr << l << ": " << m << std::endl;
9725@}
9726
9727void
9728calcxx_driver::error (const std::string& m)
9729@{
9730 std::cerr << m << std::endl;
9731@}
9732@end example
9733
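The stubs above print each error immediately. A minimal sketch of a driver
that registers the messages and keeps an error state instead might look as
follows. The names @code{demo_driver}, @code{failed}, @code{error_count}
and @code{message} are hypothetical, and a plain @code{std::string} stands
in for the location so that the example is self-contained; a real driver
would take a @code{const yy::location&} as in the declaration above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical driver that records errors rather than printing them.
class demo_driver
{
public:
  // Register an error located at WHERE (a preformatted location here).
  void error (const std::string &where, const std::string &m)
  {
    messages_.push_back (where + ": " + m);
  }

  // Register an error that has no location.
  void error (const std::string &m)
  {
    messages_.push_back (m);
  }

  // The error state: true once at least one error was registered.
  bool failed () const { return !messages_.empty (); }

  std::size_t error_count () const { return messages_.size (); }
  const std::string &message (std::size_t i) const { return messages_[i]; }

private:
  std::vector<std::string> messages_;
};
```

The caller can then decide, once parsing is over, whether to print the
messages, count them, or ignore the computed result.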
9734@node Calc++ Parser
8405b70c 9735@subsubsection Calc++ Parser
12545799 9736
9913d6e4
JD
9737The grammar file @file{calc++-parser.yy} starts by asking for the C++
9738deterministic parser skeleton, the creation of the parser header file,
9739and specifies the name of the parser class. Because the C++ skeleton
9740changed several times, it is safer to require the version you designed
9741the grammar for.
1c59e0a1
AD
9742
9743@comment file: calc++-parser.yy
12545799 9744@example
ea118b72 9745%skeleton "lalr1.cc" /* -*- C++ -*- */
e6e704dc 9746%require "@value{VERSION}"
12545799 9747%defines
16dc6a9e 9748%define parser_class_name "calcxx_parser"
fb9712a9
AD
9749@end example
9750
9751@noindent
16dc6a9e 9752@findex %code requires
fb9712a9
AD
9753Then come the declarations/inclusions needed to define the
9754@code{%union}. Because the parser uses the parsing driver and
9755vice versa, each cannot include the header of the other. Because the
9756driver's header needs detailed knowledge about the parser class (in
9757particular its inner types), it is the parser's header which will simply
9758use a forward declaration of the driver.
8e6f2266 9759@xref{%code Summary}.
fb9712a9
AD
9760
9761@comment file: calc++-parser.yy
9762@example
16dc6a9e 9763%code requires @{
12545799 9764# include <string>
fb9712a9 9765class calcxx_driver;
9bc0dd67 9766@}
12545799
AD
9767@end example
9768
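How a forward declaration breaks the mutual dependency can be sketched in
a single translation unit. The classes @code{demo_parser} and
@code{demo_driver} below are hypothetical stand-ins, and the three parts
correspond only loosely to the parser's header, the driver's header and
the parser's implementation file.

```cpp
// Part 1: what the parser's header needs.  A forward declaration is
// enough, because the parser only stores a reference to the driver.
class demo_driver;

class demo_parser
{
public:
  explicit demo_parser (demo_driver &d) : driver (d) {}
  int parse ();
private:
  demo_driver &driver;
};

// Part 2: the driver's header, which may rely on the complete
// definition of the parser class.
class demo_driver
{
public:
  demo_driver () : result (0) {}
  int run ()
  {
    demo_parser parser (*this);
    return parser.parse ();
  }
  int result;
};

// Part 3: the parser's implementation file, where the driver is
// finally a complete type and its members can be used.
int
demo_parser::parse ()
{
  driver.result = 49;  // Pretend the input evaluated to 49.
  return 0;            // 0 signals success.
}
```

Only the code that actually touches the driver's members (part 3) needs
the driver's full definition, which is why the corresponding
@code{#include} can be deferred to the implementation file.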
9769@noindent
9770The driver is passed by reference to the parser and to the scanner.
9771This provides a simple but effective pure interface, not relying on
9772global variables.
9773
1c59e0a1 9774@comment file: calc++-parser.yy
12545799
AD
9775@example
9776// The parsing context.
9777%parse-param @{ calcxx_driver& driver @}
9778%lex-param @{ calcxx_driver& driver @}
9779@end example
9780
9781@noindent
9782Then we request the location tracking feature, and initialize the
c781580d 9783first location's file name. Afterward new locations are computed
12545799
AD
9784relative to the previous locations: the file name will be
9785automatically propagated.
9786
1c59e0a1 9787@comment file: calc++-parser.yy
12545799
AD
9788@example
9789%locations
9790%initial-action
9791@{
9792 // Initialize the initial location.
b47dbebe 9793 @@$.begin.filename = @@$.end.filename = &driver.file;
12545799
AD
9794@};
9795@end example
9796
9797@noindent
6f04ee6c
JD
9798Use the two following directives to enable parser tracing and verbose error
9799messages. However, verbose error messages can contain incorrect information
9800(@pxref{LAC}).
12545799 9801
1c59e0a1 9802@comment file: calc++-parser.yy
12545799
AD
9803@example
9804%debug
9805%error-verbose
9806@end example
9807
9808@noindent
9809Semantic values cannot use ``real'' objects, but only pointers to
9810them.
9811
1c59e0a1 9812@comment file: calc++-parser.yy
12545799
AD
9813@example
9814// Symbols.
9815%union
9816@{
9817 int ival;
9818 std::string *sval;
9819@};
9820@end example
9821
fb9712a9 9822@noindent
136a0f76
PB
9823@findex %code
9824The code between @samp{%code @{} and @samp{@}} is output in the
34f98f46 9825@file{*.cc} file; it needs detailed knowledge about the driver.
fb9712a9
AD
9826
9827@comment file: calc++-parser.yy
9828@example
136a0f76 9829%code @{
fb9712a9 9830# include "calc++-driver.hh"
34f98f46 9831@}
fb9712a9
AD
9832@end example
9833
9834
12545799
AD
9835@noindent
9836The token numbered as 0 corresponds to end of file; the following line
9837allows for nicer error messages referring to ``end of file'' instead
9838of ``$end''. Similarly, user-friendly names are provided for each
9839symbol. Note that the token names are prefixed by @code{TOKEN_} to
9840avoid name clashes.
9841
1c59e0a1 9842@comment file: calc++-parser.yy
12545799 9843@example
fb9712a9
AD
9844%token END 0 "end of file"
9845%token ASSIGN ":="
9846%token <sval> IDENTIFIER "identifier"
9847%token <ival> NUMBER "number"
a8c2e813 9848%type <ival> exp
12545799
AD
9849@end example
9850
9851@noindent
9852To enable memory deallocation during error recovery, use
9853@code{%destructor}.
9854
287c78f6 9855@c FIXME: Document %printer, and mention that it takes a braced-code operand.
1c59e0a1 9856@comment file: calc++-parser.yy
12545799 9857@example
68fff38a 9858%printer @{ yyoutput << *$$; @} "identifier"
12545799
AD
9859%destructor @{ delete $$; @} "identifier"
9860
68fff38a 9861%printer @{ yyoutput << $$; @} <ival>
12545799
AD
9862@end example
9863
9864@noindent
9865The grammar itself is straightforward.
9866
1c59e0a1 9867@comment file: calc++-parser.yy
12545799
AD
9868@example
9869%%
9870%start unit;
9871unit: assignments exp @{ driver.result = $2; @};
9872
de6be119
AD
9873assignments:
9874 /* Nothing. */ @{@}
9875| assignments assignment @{@};
12545799 9876
3dc5e96b
PE
9877assignment:
9878 "identifier" ":=" exp
9879 @{ driver.variables[*$1] = $3; delete $1; @};
12545799
AD
9880
9881%left '+' '-';
9882%left '*' '/';
9883exp: exp '+' exp @{ $$ = $1 + $3; @}
9884 | exp '-' exp @{ $$ = $1 - $3; @}
9885 | exp '*' exp @{ $$ = $1 * $3; @}
9886 | exp '/' exp @{ $$ = $1 / $3; @}
3dc5e96b 9887 | "identifier" @{ $$ = driver.variables[*$1]; delete $1; @}
fb9712a9 9888 | "number" @{ $$ = $1; @};
12545799
AD
9889%%
9890@end example
9891
9892@noindent
9893Finally the @code{error} member function registers the errors to the
9894driver.
9895
1c59e0a1 9896@comment file: calc++-parser.yy
12545799
AD
9897@example
9898void
1c59e0a1
AD
9899yy::calcxx_parser::error (const yy::calcxx_parser::location_type& l,
9900 const std::string& m)
12545799
AD
9901@{
9902 driver.error (l, m);
9903@}
9904@end example
9905
9906@node Calc++ Scanner
8405b70c 9907@subsubsection Calc++ Scanner
12545799
AD
9908
9909The Flex scanner first includes the driver declaration, then the
9910parser's to get the set of defined tokens.
9911
1c59e0a1 9912@comment file: calc++-scanner.ll
12545799 9913@example
ea118b72 9914%@{ /* -*- C++ -*- */
04098407 9915# include <cstdlib>
# include <cstring> // strerror
b10dd689
AD
9916# include <cerrno>
9917# include <climits>
12545799
AD
9918# include <string>
9919# include "calc++-driver.hh"
9920# include "calc++-parser.hh"
eaea13f5
PE
9921
9922/* Work around an incompatibility in flex (at least versions
9923 2.5.31 through 2.5.33): it generates code that does
9924 not conform to C89. See Debian bug 333231
9925 <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>. */
7870f699
PE
9926# undef yywrap
9927# define yywrap() 1
eaea13f5 9928
c095d689
AD
9929/* By default yylex returns int, we use token_type.
9930 Unfortunately yyterminate by default returns 0, which is
9931 not of token_type. */
8c5b881d 9932#define yyterminate() return token::END
12545799
AD
9933%@}
9934@end example
9935
9936@noindent
9937Because there is no @code{#include}-like feature, we don't need
9938@code{yywrap}. We don't need @code{unput} either, and since we parse an
9939actual file, this is not an interactive session with the user.
9940Finally we enable the scanner tracing features.
9941
1c59e0a1 9942@comment file: calc++-scanner.ll
12545799
AD
9943@example
9944%option noyywrap nounput batch debug
9945@end example
9946
9947@noindent
9948Abbreviations allow for more readable rules.
9949
1c59e0a1 9950@comment file: calc++-scanner.ll
12545799
AD
9951@example
9952id [a-zA-Z][a-zA-Z_0-9]*
9953int [0-9]+
9954blank [ \t]
9955@end example
9956
9957@noindent
9d9b8b70 9958The following paragraph suffices to track locations accurately. Each
12545799
AD
9959time @code{yylex} is invoked, the begin position is moved onto the end
9960position. Then when a pattern is matched, the end position is
9961advanced by its width. If it matched ends of lines, the end
9962cursor is adjusted, and each time blanks are matched, the begin cursor
9963is moved onto the end cursor to effectively ignore the blanks
9964preceding tokens. Comments would be treated equally.
9965
1c59e0a1 9966@comment file: calc++-scanner.ll
12545799 9967@example
98842516 9968@group
828c373b
AD
9969%@{
9970# define YY_USER_ACTION yylloc->columns (yyleng);
9971%@}
98842516 9972@end group
12545799
AD
9973%%
9974%@{
9975 yylloc->step ();
12545799
AD
9976%@}
9977@{blank@}+ yylloc->step ();
9978[\n]+ yylloc->lines (yyleng); yylloc->step ();
9979@end example
9980
9981@noindent
fb9712a9
AD
9982The rules are simple; just note the use of the driver to report errors.
9983It is convenient to use a typedef to shorten
9984@code{yy::calcxx_parser::token::IDENTIFIER} into
9d9b8b70 9985@code{token::IDENTIFIER} for instance.
12545799 9986
1c59e0a1 9987@comment file: calc++-scanner.ll
12545799 9988@example
fb9712a9
AD
9989%@{
9990 typedef yy::calcxx_parser::token token;
9991%@}
8c5b881d 9992 /* Convert ints to the actual type of tokens. */
c095d689 9993[-+*/] return yy::calcxx_parser::token_type (yytext[0]);
fb9712a9 9994":=" return token::ASSIGN;
04098407
PE
9995@{int@} @{
9996 errno = 0;
9997 long n = strtol (yytext, NULL, 10);
9998 if (! (INT_MIN <= n && n <= INT_MAX && errno != ERANGE))
9999 driver.error (*yylloc, "integer is out of range");
10000 yylval->ival = n;
fb9712a9 10001 return token::NUMBER;
04098407 10002@}
fb9712a9 10003@{id@} yylval->sval = new std::string (yytext); return token::IDENTIFIER;
12545799
AD
10004. driver.error (*yylloc, "invalid character");
10005%%
10006@end example
10007
10008@noindent
10009Finally, because the scanner-related member functions of the driver depend
10010on the scanner's data, it is simpler to implement them in this file.
10011
1c59e0a1 10012@comment file: calc++-scanner.ll
12545799 10013@example
98842516 10014@group
12545799
AD
10015void
10016calcxx_driver::scan_begin ()
10017@{
10018 yy_flex_debug = trace_scanning;
56d60c19 10019 if (file.empty () || file == "-")
bb32f4f2
AD
10020 yyin = stdin;
10021 else if (!(yyin = fopen (file.c_str (), "r")))
10022 @{
2c0f9706 10023 error ("cannot open " + file + ": " + strerror(errno));
dd561157 10024 exit (EXIT_FAILURE);
bb32f4f2 10025 @}
12545799 10026@}
98842516 10027@end group
12545799 10028
98842516 10029@group
12545799
AD
10030void
10031calcxx_driver::scan_end ()
10032@{
10033 fclose (yyin);
10034@}
98842516 10035@end group
12545799
AD
10036@end example
10037
10038@node Calc++ Top Level
8405b70c 10039@subsubsection Calc++ Top Level
12545799
AD
10040
10041The top level file, @file{calc++.cc}, poses no problem.
10042
1c59e0a1 10043@comment file: calc++.cc
12545799
AD
10044@example
10045#include <iostream>
10046#include "calc++-driver.hh"
10047
98842516 10048@group
12545799 10049int
fa4d969f 10050main (int argc, char *argv[])
12545799
AD
10051@{
10052 calcxx_driver driver;
56d60c19
AD
10053 for (int i = 1; i < argc; ++i)
10054 if (argv[i] == std::string ("-p"))
12545799 10055 driver.trace_parsing = true;
56d60c19 10056 else if (argv[i] == std::string ("-s"))
12545799 10057 driver.trace_scanning = true;
56d60c19 10058 else if (!driver.parse (argv[i]))
bb32f4f2 10059 std::cout << driver.result << std::endl;
12545799 10060@}
98842516 10061@end group
12545799
AD
10062@end example
10063
8405b70c
PB
10064@node Java Parsers
10065@section Java Parsers
10066
10067@menu
f56274a8
DJ
10068* Java Bison Interface:: Asking for Java parser generation
10069* Java Semantic Values:: %type and %token vs. Java
10070* Java Location Values:: The position and location classes
10071* Java Parser Interface:: Instantiating and running the parser
10072* Java Scanner Interface:: Specifying the scanner for the parser
10073* Java Action Features:: Special features for use in actions
10074* Java Differences:: Differences between C/C++ and Java Grammars
10075* Java Declarations Summary:: List of Bison declarations used with Java
8405b70c
PB
10076@end menu
10077
10078@node Java Bison Interface
10079@subsection Java Bison Interface
10080@c - %language "Java"
8405b70c 10081
59da312b
JD
10082(The current Java interface is experimental and may evolve.
10083More user feedback will help to stabilize it.)
10084
e254a580
DJ
10085The Java parser skeletons are selected using the @code{%language "Java"}
10086directive or the @option{-L java}/@option{--language=java} option.
8405b70c 10087
e254a580 10088@c FIXME: Documented bug.
9913d6e4
JD
10089When generating a Java parser, @code{bison @var{basename}.y} will
10090create a single Java source file named @file{@var{basename}.java}
10091containing the parser implementation. Using a grammar file without a
10092@file{.y} suffix is currently broken. The basename of the parser
10093implementation file can be changed by the @code{%file-prefix}
10094directive or the @option{-b}/@option{--file-prefix} option. The
10095entire parser implementation file name can be changed by the
10096@code{%output} directive or the @option{-o}/@option{--output} option.
10097The parser implementation file contains a single class for the parser.
8405b70c 10098
e254a580 10099You can create documentation for generated parsers using Javadoc.
8405b70c 10100
Contrary to C parsers, Java parsers do not use global variables; the
state of the parser is always local to an instance of the parser class.
Therefore, all Java parsers are ``pure'', and the @code{%pure-parser}
and @code{%define api.pure} directives do not do anything when used in
Java.
8405b70c 10106
Push parsers are currently unsupported in Java and @code{%define
api.push-pull} has no effect.
01b477c6 10109
GLR parsers are currently unsupported in Java. Do not use the
@code{%glr-parser} directive.
10112
10113No header file can be generated for Java parsers. Do not use the
10114@code{%defines} directive or the @option{-d}/@option{--defines} options.
10115
10116@c FIXME: Possible code change.
Currently, support for debugging and verbose errors is always compiled
in. Thus the @code{%debug} and @code{%token-table} directives and the
@option{-t}/@option{--debug} and @option{-k}/@option{--token-table}
options have no effect. This may change in the future to eliminate
unused code in the generated parser, so use @code{%debug} and
@code{%error-verbose} explicitly if needed. Also, in the future the
@code{%token-table} directive might enable a public interface to
access the token names and codes.
8405b70c
PB
10125
10126@node Java Semantic Values
10127@subsection Java Semantic Values
10128@c - No %union, specify type in %type/%token.
10129@c - YYSTYPE
10130@c - Printer and destructor
10131
10132There is no @code{%union} directive in Java parsers. Instead, the
10133semantic values' types (class names) should be specified in the
10134@code{%type} or @code{%token} directive:
10135
10136@example
10137%type <Expression> expr assignment_expr term factor
10138%type <Integer> number
10139@end example
10140
10141By default, the semantic stack is declared to have @code{Object} members,
10142which means that the class types you specify can be of any class.
10143To improve the type safety of the parser, you can declare the common
e254a580
DJ
10144superclass of all the semantic values using the @code{%define stype}
10145directive. For example, after the following declaration:
8405b70c
PB
10146
10147@example
e254a580 10148%define stype "ASTNode"
8405b70c
PB
10149@end example
10150
@noindent
any @code{%type} or @code{%token} specifying a semantic type that
is not a subclass of @code{ASTNode} will cause a compile-time error.
10154
e254a580 10155@c FIXME: Documented bug.
8405b70c
PB
10156Types used in the directives may be qualified with a package name.
10157Primitive data types are accepted for Java version 1.5 or later. Note
10158that in this case the autoboxing feature of Java 1.5 will be used.
e254a580
DJ
10159Generic types may not be used; this is due to a limitation in the
10160implementation of Bison, and may change in future releases.
8405b70c
PB
10161
10162Java parsers do not support @code{%destructor}, since the language
10163adopts garbage collection. The parser will try to hold references
10164to semantic values for as little time as needed.
10165
10166Java parsers do not support @code{%printer}, as @code{toString()}
10167can be used to print the semantic values. This however may change
10168(in a backwards-compatible way) in future versions of Bison.
10169
10170
10171@node Java Location Values
10172@subsection Java Location Values
10173@c - %locations
10174@c - class Position
10175@c - class Location
10176
7404cdf3
JD
10177When the directive @code{%locations} is used, the Java parser supports
10178location tracking, see @ref{Tracking Locations}. An auxiliary user-defined
10179class defines a @dfn{position}, a single point in a file; Bison itself
10180defines a class representing a @dfn{location}, a range composed of a pair of
10181positions (possibly spanning several files). The location class is an inner
10182class of the parser; the name is @code{Location} by default, and may also be
10183renamed using @code{%define location_type "@var{class-name}"}.
8405b70c
PB
10184
10185The location class treats the position as a completely opaque value.
10186By default, the class name is @code{Position}, but this can be changed
e254a580
DJ
10187with @code{%define position_type "@var{class-name}"}. This class must
10188be supplied by the user.
8405b70c
PB
10189
10190
e254a580
DJ
10191@deftypeivar {Location} {Position} begin
10192@deftypeivarx {Location} {Position} end
8405b70c 10193The first, inclusive, position of the range, and the first beyond.
e254a580
DJ
10194@end deftypeivar
10195
10196@deftypeop {Constructor} {Location} {} Location (Position @var{loc})
c046698e 10197Create a @code{Location} denoting an empty range located at a given point.
e254a580 10198@end deftypeop
8405b70c 10199
e254a580
DJ
10200@deftypeop {Constructor} {Location} {} Location (Position @var{begin}, Position @var{end})
10201Create a @code{Location} from the endpoints of the range.
10202@end deftypeop
10203
@deftypemethod {Location} {String} toString ()
Return a string describing the range represented by the location. For
this to work properly, the position class should override the
@code{equals} and @code{toString} methods appropriately.
@end deftypemethod
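
As an illustration only, a user-supplied position class could look like
the sketch below, placed here in the epilogue of the grammar file. The
class layout (a line and a column) and the placeholder rule
@code{input} are assumptions; what matters is that the class exists and
that it overrides @code{equals} and @code{toString}.

@example
%language "Java"
%locations
%define position_type "Position"

%%
input: /* empty */;
%%
/* Hypothetical position class: a line and a column.  */
class Position @{
  final int line;
  final int column;

  Position (int line, int column) @{
    this.line = line;
    this.column = column;
  @}

  @@Override
  public boolean equals (Object other) @{
    if (!(other instanceof Position))
      return false;
    Position that = (Position) other;
    return line == that.line && column == that.column;
  @}

  @@Override
  public int hashCode () @{
    return 31 * line + column;
  @}

  @@Override
  public String toString () @{
    return line + "." + column;
  @}
@}
@end example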
10209
10210
10211@node Java Parser Interface
10212@subsection Java Parser Interface
10213@c - define parser_class_name
10214@c - Ctor
10215@c - parse, error, set_debug_level, debug_level, set_debug_stream,
10216@c debug_stream.
10217@c - Reporting errors
10218
e254a580
DJ
10219The name of the generated parser class defaults to @code{YYParser}. The
10220@code{YY} prefix may be changed using the @code{%name-prefix} directive
10221or the @option{-p}/@option{--name-prefix} option. Alternatively, use
10222@code{%define parser_class_name "@var{name}"} to give a custom name to
10223the class. The interface of this class is detailed below.
8405b70c 10224
e254a580
DJ
10225By default, the parser class has package visibility. A declaration
10226@code{%define public} will change to public visibility. Remember that,
10227according to the Java language specification, the name of the @file{.java}
10228file should match the name of the class in this case. Similarly, you can
10229use @code{abstract}, @code{final} and @code{strictfp} with the
10230@code{%define} declaration to add other modifiers to the parser class.
10231
10232The Java package name of the parser class can be specified using the
10233@code{%define package} directive. The superclass and the implemented
10234interfaces of the parser class can be specified with the @code{%define
10235extends} and @code{%define implements} directives.
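
For instance, the declarations below sketch how these directives might
be combined. The package, class, superclass and interface names are all
hypothetical.

@example
%define package "org.example.calc"
%define parser_class_name "Calc"
%define public
%define final
%define extends "BaseParser"
%define implements "Diagnosable"
@end example

@noindent
With these declarations the generated class would be declared roughly
as a public, final class @code{Calc} in package
@code{org.example.calc}, extending @code{BaseParser} and implementing
@code{Diagnosable}. Since the class is public, the output file should
be named @file{Calc.java}.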
10236
The parser class defines an inner class, @code{Location}, that is used
for location tracking (see @ref{Java Location Values}), and an inner
interface, @code{Lexer} (see @ref{Java Scanner Interface}). Other than
this inner class and interface, and the members described in the
interface below, all the other members and fields are preceded with a
@code{yy} or @code{YY} prefix to avoid clashes with user code.
10243
10244@c FIXME: The following constants and variables are still undocumented:
10245@c @code{bisonVersion}, @code{bisonSkeleton} and @code{errorVerbose}.
10246
The parser class can be extended using the @code{%parse-param}
directive. Each occurrence of the directive will add a @code{protected
final} field to the parser class, and an argument to its constructor,
which initializes the field automatically.
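
As a short sketch, assuming a user-defined class @code{SymbolTable}
(hypothetical), the following declarations would add two fields,
@code{symbols} and @code{out}, to the parser class:

@example
%parse-param @{ SymbolTable symbols @}
%parse-param @{ java.io.PrintStream out @}
@end example

@noindent
The parser would then be created with a call along the lines of
@code{new YYParser (lexer, symbols, System.out)}, and actions can refer
to @code{symbols} and @code{out} directly.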
10251
10252Token names defined by @code{%token} and the predefined @code{EOF} token
10253name are added as constant fields to the parser class.
10254
10255@deftypeop {Constructor} {YYParser} {} YYParser (@var{lex_param}, @dots{}, @var{parse_param}, @dots{})
10256Build a new parser object with embedded @code{%code lexer}. There are
10257no parameters, unless @code{%parse-param}s and/or @code{%lex-param}s are
10258used.
10259@end deftypeop
10260
10261@deftypeop {Constructor} {YYParser} {} YYParser (Lexer @var{lexer}, @var{parse_param}, @dots{})
10262Build a new parser object using the specified scanner. There are no
10263additional parameters unless @code{%parse-param}s are used.
10264
10265If the scanner is defined by @code{%code lexer}, this constructor is
10266declared @code{protected} and is called automatically with a scanner
10267created with the correct @code{%lex-param}s.
10268@end deftypeop
8405b70c
PB
10269
10270@deftypemethod {YYParser} {boolean} parse ()
10271Run the syntactic analysis, and return @code{true} on success,
10272@code{false} otherwise.
10273@end deftypemethod
10274
01b477c6 10275@deftypemethod {YYParser} {boolean} recovering ()
8405b70c 10276During the syntactic analysis, return @code{true} if recovering
e254a580
DJ
10277from a syntax error.
10278@xref{Error Recovery}.
8405b70c
PB
10279@end deftypemethod
10280
@deftypemethod {YYParser} {java.io.PrintStream} getDebugStream ()
@deftypemethodx {YYParser} {void} setDebugStream (java.io.PrintStream @var{o})
Get or set the stream used for tracing the parsing. It defaults to
@code{System.err}.
@end deftypemethod
10286
10287@deftypemethod {YYParser} {int} getDebugLevel ()
10288@deftypemethodx {YYParser} {void} setDebugLevel (int @var{l})
10289Get or set the tracing level. Currently its value is either 0, no trace,
10290or nonzero, full tracing.
10291@end deftypemethod
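
The methods above might be used as in the following sketch, which adds
an entry point to the parser class through an unqualified @code{%code}
block. @code{MyLexer} is an assumed user class implementing
@code{YYParser.Lexer}, and the default parser class name and default
@code{lex_throws} setting are assumed.

@example
%code @{
  /* Hypothetical entry point; MyLexer is an assumed class that
     implements YYParser.Lexer.  */
  public static void main (String[] args) throws java.io.IOException @{
    YYParser parser = new YYParser (new MyLexer (System.in));
    parser.setDebugLevel (1);            /* Nonzero enables tracing.  */
    parser.setDebugStream (System.out);  /* Default is System.err.  */
    if (!parser.parse ())
      System.exit (1);
  @}
@}
@end example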
10292
8405b70c
PB
10293
10294@node Java Scanner Interface
10295@subsection Java Scanner Interface
01b477c6 10296@c - %code lexer
8405b70c 10297@c - %lex-param
01b477c6 10298@c - Lexer interface
8405b70c 10299
e254a580
DJ
10300There are two possible ways to interface a Bison-generated Java parser
10301with a scanner: the scanner may be defined by @code{%code lexer}, or
10302defined elsewhere. In either case, the scanner has to implement the
10303@code{Lexer} inner interface of the parser class.
10304
10305In the first case, the body of the scanner class is placed in
10306@code{%code lexer} blocks. If you want to pass parameters from the
10307parser constructor to the scanner constructor, specify them with
10308@code{%lex-param}; they are passed before @code{%parse-param}s to the
10309constructor.
01b477c6 10310
59c5ac72 10311In the second case, the scanner has to implement the @code{Lexer} interface,
01b477c6
PB
10312which is defined within the parser class (e.g., @code{YYParser.Lexer}).
10313The constructor of the parser object will then accept an object
10314implementing the interface; @code{%lex-param} is not used in this
10315case.
10316
10317In both cases, the scanner has to implement the following methods.
10318
@deftypemethod {Lexer} {void} yyerror (Location @var{loc}, String @var{msg})
This method is defined by the user to emit an error message. The first
parameter is omitted if location tracking is not active. Its type can be
changed using @code{%define location_type "@var{class-name}"}.
@end deftypemethod
10324
@deftypemethod {Lexer} {int} yylex ()
Return the next token. Its type is the return value; its semantic
value and location are saved and returned by their own methods in the
interface.

Use @code{%define lex_throws} to specify any uncaught exceptions.
Default is @code{java.io.IOException}.
@end deftypemethod
10333
@deftypemethod {Lexer} {Position} getStartPos ()
@deftypemethodx {Lexer} {Position} getEndPos ()
Return respectively the first position of the last token that
@code{yylex} returned, and the first position beyond it. These
methods are not needed unless location tracking is active.

The return type can be changed using @code{%define position_type
"@var{class-name}"}.
@end deftypemethod
10343
@deftypemethod {Lexer} {Object} getLVal ()
Return the semantic value of the last token that @code{yylex} returned.

The return type can be changed using @code{%define stype
"@var{class-name}"}.
@end deftypemethod
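
The following sketch shows one way these methods might be implemented
in a @code{%code lexer} block when location tracking is not active. It
assumes a token declared as @code{%token <Integer> NUM} (hypothetical)
and reads from standard input through
@code{java.io.StreamTokenizer}; any other token source would do.

@example
%code lexer @{
  /* Assumed input source: numbers and single characters on stdin.  */
  private final java.io.StreamTokenizer st =
    new java.io.StreamTokenizer
      (new java.io.InputStreamReader (System.in));
  private Object yylval;

  public Object getLVal () @{
    return yylval;
  @}

  public int yylex () throws java.io.IOException @{
    switch (st.nextToken ())
      @{
      case java.io.StreamTokenizer.TT_EOF:
        return EOF;
      case java.io.StreamTokenizer.TT_NUMBER:
        yylval = Integer.valueOf ((int) st.nval);
        return NUM;
      default:
        /* Single characters such as '+' stand for themselves.  */
        return st.ttype;
      @}
  @}

  public void yyerror (String msg) @{
    System.err.println (msg);
  @}
@}
@end example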
10350
10351
e254a580
DJ
10352@node Java Action Features
10353@subsection Special Features for Use in Java Actions
10354
The following special constructs can be used in Java actions.
Other analogous C action features are currently unavailable for Java.
10357
10358Use @code{%define throws} to specify any uncaught exceptions from parser
10359actions, and initial actions specified by @code{%initial-action}.
10360
10361@defvar $@var{n}
10362The semantic value for the @var{n}th component of the current rule.
10363This may not be assigned to.
10364@xref{Java Semantic Values}.
10365@end defvar
10366
@defvar $<@var{typealt}>@var{n}
Like @code{$@var{n}} but specifies an alternative type @var{typealt}.
@xref{Java Semantic Values}.
@end defvar
10371
@defvar $$
The semantic value for the grouping made by the current rule. As a
value, this is in the base type (@code{Object} or as specified by
@code{%define stype}) and is not cast to the declared subtype, because
casts are not allowed on the left-hand side of Java assignments.
Use an explicit Java cast if the correct subtype is needed.
@xref{Java Semantic Values}.
@end defvar
10380
@defvar $<@var{typealt}>$
Same as @code{$$} since Java always allows assigning to the base type.
Perhaps we should use this and @code{$<>$} for the value and @code{$$}
for setting the value but there is currently no easy way to distinguish
these constructs.
@xref{Java Semantic Values}.
@end defvar
10388
10389@defvar @@@var{n}
10390The location information of the @var{n}th component of the current rule.
10391This may not be assigned to.
10392@xref{Java Location Values}.
10393@end defvar
10394
10395@defvar @@$
10396The location information of the grouping made by the current rule.
10397@xref{Java Location Values}.
10398@end defvar
10399
34a41a93 10400@deftypefn {Statement} return YYABORT @code{;}
e254a580
DJ
10401Return immediately from the parser, indicating failure.
10402@xref{Java Parser Interface}.
34a41a93 10403@end deftypefn
8405b70c 10404
34a41a93 10405@deftypefn {Statement} return YYACCEPT @code{;}
e254a580
DJ
10406Return immediately from the parser, indicating success.
10407@xref{Java Parser Interface}.
34a41a93 10408@end deftypefn
8405b70c 10409
34a41a93 10410@deftypefn {Statement} {return} YYERROR @code{;}
4a11b852 10411Start error recovery (without printing an error message).
e254a580 10412@xref{Error Recovery}.
34a41a93 10413@end deftypefn
8405b70c 10414
@deftypefn {Function} {boolean} recovering ()
Return whether error recovery is being done. In this state, the parser
reads tokens until it reaches a known state, and then restarts normal
operation.
@xref{Error Recovery}.
@end deftypefn
8405b70c 10421
e254a580
DJ
10422@deftypefn {Function} {protected void} yyerror (String msg)
10423@deftypefnx {Function} {protected void} yyerror (Position pos, String msg)
10424@deftypefnx {Function} {protected void} yyerror (Location loc, String msg)
10425Print an error message using the @code{yyerror} method of the scanner
10426instance in use.
10427@end deftypefn
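
As a sketch of how these constructs might combine in an action, the
hypothetical rule below reports a division by zero and starts error
recovery. It assumes that @code{%locations} is in effect and that
@code{exp} and @code{NUM} are both declared with type @code{Integer}.

@example
exp:
  exp '/' NUM
    @{
      if ($3.intValue () == 0)
        @{
          yyerror (@@3, "division by zero");
          return YYERROR;
        @}
      $$ = $1 / $3;
    @}
;
@end example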
8405b70c 10428
8405b70c 10429
8405b70c
PB
10430@node Java Differences
10431@subsection Differences between C/C++ and Java Grammars
10432
The different structure of the Java language forces several differences
between C/C++ grammars and grammars designed for Java parsers. This
section summarizes these differences.
8405b70c
PB
10436
10437@itemize
10438@item
Java lacks a preprocessor, so the @code{YYERROR}, @code{YYACCEPT},
@code{YYABORT} symbols (@pxref{Table of Symbols}) obviously cannot be
macros. Instead, they should be preceded by @code{return} when they
appear in an action. The actual definition of these symbols is
opaque to the Bison grammar, and it might change in the future. The
only meaningful operation that you can do is to return them.
@xref{Java Action Features}.
PB
10446
Note that of these three symbols, only @code{YYACCEPT} and
@code{YYABORT} will cause a return from the @code{yyparse}
method@footnote{Java parsers include the actions in a method separate
from @code{yyparse} in order to have an intuitive syntax that
corresponds to these C macros.}.
10452
@item
Java lacks unions, so @code{%union} has no effect. Instead, semantic
values have a common base type: @code{Object} or as specified by
@samp{%define stype}. Angle brackets on @code{%token}, @code{%type},
@code{$@var{n}} and @code{$$} specify subtypes rather than fields of
a union. The type of @code{$$}, even with angle brackets, is the base
type since Java casts are not allowed on the left-hand side of assignments.
Also, @code{$@var{n}} and @code{@@@var{n}} are not allowed on the
left-hand side of assignments. @xref{Java Semantic Values}, and
@ref{Java Action Features}.
e254a580 10463
8405b70c 10464@item
c781580d 10465The prologue declarations have a different meaning than in C/C++ code.
01b477c6
PB
10466@table @asis
@item @code{%code imports}
blocks are placed at the beginning of the Java source code. They may
include copyright notices. For @code{package} declarations, it is
suggested to use @code{%define package} instead.
8405b70c 10471
01b477c6
PB
10472@item unqualified @code{%code}
10473blocks are placed inside the parser class.
10474
10475@item @code{%code lexer}
10476blocks, if specified, should include the implementation of the
10477scanner. If there is no such block, the scanner can be any class
2ba03112 10478that implements the appropriate interface (@pxref{Java Scanner
01b477c6 10479Interface}).
29553547 10480@end table
8405b70c
PB
10481
10482Other @code{%code} blocks are not supported in Java parsers.
e254a580
DJ
10483In particular, @code{%@{ @dots{} %@}} blocks should not be used
10484and may give an error in future versions of Bison.
10485
01b477c6 10486The epilogue has the same meaning as in C/C++ code and it can
e254a580
DJ
10487be used to define other classes used by the parser @emph{outside}
10488the parser class.
8405b70c
PB
10489@end itemize
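
The overall layout described above can be sketched as follows. The
field @code{errorCount}, the placeholder rule @code{input} and the
class @code{Helper} are hypothetical; only the placement of each block
is of interest.

@example
%language "Java"

%code imports @{
  /* Placed at the beginning of the generated file.  */
  import java.io.IOException;
@}

%code @{
  /* Placed inside the parser class.  */
  private int errorCount = 0;
@}

%%
input: /* empty */;
%%

/* Epilogue: placed after, and outside, the parser class.  */
class Helper @{
@}
@end example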
10490
e254a580
DJ
10491
10492@node Java Declarations Summary
10493@subsection Java Declarations Summary
10494
This summary includes only declarations that are specific to Java or
that have a special meaning when used in a Java parser.
10497
10498@deffn {Directive} {%language "Java"}
10499Generate a Java class for the parser.
10500@end deffn
10501
10502@deffn {Directive} %lex-param @{@var{type} @var{name}@}
10503A parameter for the lexer class defined by @code{%code lexer}
10504@emph{only}, added as parameters to the lexer constructor and the parser
10505constructor that @emph{creates} a lexer. Default is none.
10506@xref{Java Scanner Interface}.
10507@end deffn
10508
10509@deffn {Directive} %name-prefix "@var{prefix}"
10510The prefix of the parser class name @code{@var{prefix}Parser} if
10511@code{%define parser_class_name} is not used. Default is @code{YY}.
10512@xref{Java Bison Interface}.
10513@end deffn
10514
10515@deffn {Directive} %parse-param @{@var{type} @var{name}@}
10516A parameter for the parser class added as parameters to constructor(s)
10517and as fields initialized by the constructor(s). Default is none.
10518@xref{Java Parser Interface}.
10519@end deffn
10520
10521@deffn {Directive} %token <@var{type}> @var{token} @dots{}
10522Declare tokens. Note that the angle brackets enclose a Java @emph{type}.
10523@xref{Java Semantic Values}.
10524@end deffn
10525
10526@deffn {Directive} %type <@var{type}> @var{nonterminal} @dots{}
10527Declare the type of nonterminals. Note that the angle brackets enclose
10528a Java @emph{type}.
10529@xref{Java Semantic Values}.
10530@end deffn
10531
10532@deffn {Directive} %code @{ @var{code} @dots{} @}
10533Code appended to the inside of the parser class.
10534@xref{Java Differences}.
10535@end deffn
10536
10537@deffn {Directive} {%code imports} @{ @var{code} @dots{} @}
10538Code inserted just after the @code{package} declaration.
10539@xref{Java Differences}.
10540@end deffn
10541
@deffn {Directive} {%code lexer} @{ @var{code} @dots{} @}
Code added to the body of an inner lexer class within the parser class.
@xref{Java Scanner Interface}.
@end deffn
10546
10547@deffn {Directive} %% @var{code} @dots{}
10548Code (after the second @code{%%}) appended to the end of the file,
10549@emph{outside} the parser class.
10550@xref{Java Differences}.
10551@end deffn
10552
@deffn {Directive} %@{ @var{code} @dots{} %@}
Not supported. Use @code{%code imports} instead.
@xref{Java Differences}.
@end deffn
10557
10558@deffn {Directive} {%define abstract}
10559Whether the parser class is declared @code{abstract}. Default is false.
10560@xref{Java Bison Interface}.
10561@end deffn
10562
10563@deffn {Directive} {%define extends} "@var{superclass}"
10564The superclass of the parser class. Default is none.
10565@xref{Java Bison Interface}.
10566@end deffn
10567
10568@deffn {Directive} {%define final}
10569Whether the parser class is declared @code{final}. Default is false.
10570@xref{Java Bison Interface}.
10571@end deffn
10572
10573@deffn {Directive} {%define implements} "@var{interfaces}"
10574The implemented interfaces of the parser class, a comma-separated list.
10575Default is none.
10576@xref{Java Bison Interface}.
10577@end deffn
10578
10579@deffn {Directive} {%define lex_throws} "@var{exceptions}"
10580The exceptions thrown by the @code{yylex} method of the lexer, a
10581comma-separated list. Default is @code{java.io.IOException}.
10582@xref{Java Scanner Interface}.
10583@end deffn
10584
10585@deffn {Directive} {%define location_type} "@var{class}"
10586The name of the class used for locations (a range between two
10587positions). This class is generated as an inner class of the parser
10588class by @command{bison}. Default is @code{Location}.
10589@xref{Java Location Values}.
10590@end deffn
10591
10592@deffn {Directive} {%define package} "@var{package}"
10593The package to put the parser class in. Default is none.
10594@xref{Java Bison Interface}.
10595@end deffn
10596
10597@deffn {Directive} {%define parser_class_name} "@var{name}"
10598The name of the parser class. Default is @code{YYParser} or
10599@code{@var{name-prefix}Parser}.
10600@xref{Java Bison Interface}.
10601@end deffn
10602
10603@deffn {Directive} {%define position_type} "@var{class}"
10604The name of the class used for positions. This class must be supplied by
10605the user. Default is @code{Position}.
10606@xref{Java Location Values}.
10607@end deffn
10608
10609@deffn {Directive} {%define public}
10610Whether the parser class is declared @code{public}. Default is false.
10611@xref{Java Bison Interface}.
10612@end deffn
10613
10614@deffn {Directive} {%define stype} "@var{class}"
10615The base type of semantic values. Default is @code{Object}.
10616@xref{Java Semantic Values}.
10617@end deffn
10618
10619@deffn {Directive} {%define strictfp}
10620Whether the parser class is declared @code{strictfp}. Default is false.
10621@xref{Java Bison Interface}.
10622@end deffn
10623
10624@deffn {Directive} {%define throws} "@var{exceptions}"
10625The exceptions thrown by user-supplied parser actions and
10626@code{%initial-action}, a comma-separated list. Default is none.
10627@xref{Java Parser Interface}.
10628@end deffn
10629
10630
12545799 10631@c ================================================= FAQ
d1a1114f
AD
10632
10633@node FAQ
10634@chapter Frequently Asked Questions
10635@cindex frequently asked questions
10636@cindex questions
10637
10638Several questions about Bison come up occasionally. Here some of them
10639are addressed.
10640
@menu
* Memory Exhausted::            Breaking the Stack Limits
* How Can I Reset the Parser::  @code{yyparse} Keeps some State
* Strings are Destroyed::       @code{yylval} Loses Track of Strings
* Implementing Gotos/Loops::    Control Flow in the Calculator
* Multiple start-symbols::      Factoring closely related grammars
* Secure?  Conform?::           Is Bison POSIX safe?
* I can't build Bison::         Troubleshooting
* Where can I find help?::      Troubleshouting
* Bug Reports::                 Troublereporting
* More Languages::              Parsers in C++, Java, and so on
* Beta Testing::                Experimenting with development versions
* Mailing Lists::               Meeting other Bison users
@end menu
10655
1a059451
PE
10656@node Memory Exhausted
10657@section Memory Exhausted
d1a1114f 10658
ab8932bf 10659@quotation
1a059451 10660My parser returns with error with a @samp{memory exhausted}
d1a1114f 10661message. What can I do?
ab8932bf 10662@end quotation
d1a1114f 10663
188867ac
AD
10664This question is already addressed elsewhere, see @ref{Recursion, ,Recursive
10665Rules}.
d1a1114f 10666
e64fec0a
PE
10667@node How Can I Reset the Parser
10668@section How Can I Reset the Parser
5b066063 10669
0e14ad77
PE
10670The following phenomenon has several symptoms, resulting in the
10671following typical questions:
5b066063 10672
ab8932bf 10673@quotation
5b066063
AD
10674I invoke @code{yyparse} several times, and on correct input it works
10675properly; but when a parse error is found, all the other calls fail
0e14ad77 10676too. How can I reset the error flag of @code{yyparse}?
ab8932bf 10677@end quotation
5b066063
AD
10678
10679@noindent
10680or
10681
ab8932bf 10682@quotation
0e14ad77 10683My parser includes support for an @samp{#include}-like feature, in
5b066063 10684which case I run @code{yyparse} from @code{yyparse}. This fails
ab8932bf
AD
10685although I did specify @samp{%define api.pure}.
10686@end quotation
5b066063 10687
0e14ad77
PE
10688These problems typically come not from Bison itself, but from
10689Lex-generated scanners. Because these scanners use large buffers for
5b066063
AD
10690speed, they might not notice a change of input file. As a
10691demonstration, consider the following source file,
10692@file{first-line.l}:
10693
@example
@group
%@{
#include <stdio.h>
#include <stdlib.h>
%@}
@end group
%%
.*\n    ECHO; return 1;
%%
@group
int
yyparse (char const *file)
@{
  yyin = fopen (file, "r");
  if (!yyin)
    @{
      perror ("fopen");
      exit (EXIT_FAILURE);
    @}
@end group
@group
  /* One token only. */
  yylex ();
  if (fclose (yyin) != 0)
    @{
      perror ("fclose");
      exit (EXIT_FAILURE);
    @}
  return 0;
@}
@end group

@group
int
main (void)
@{
  yyparse ("input");
  yyparse ("input");
  return 0;
@}
@end group
@end example
5b066063
AD
10737
10738@noindent
10739If the file @file{input} contains
10740
ab8932bf 10741@example
5b066063
AD
10742input:1: Hello,
10743input:2: World!
ab8932bf 10744@end example
5b066063
AD
10745
10746@noindent
0e14ad77 10747then instead of getting the first line twice, you get:
5b066063
AD
10748
10749@example
10750$ @kbd{flex -ofirst-line.c first-line.l}
10751$ @kbd{gcc -ofirst-line first-line.c -ll}
10752$ @kbd{./first-line}
10753input:1: Hello,
10754input:2: World!
10755@end example
10756
0e14ad77
PE
10757Therefore, whenever you change @code{yyin}, you must tell the
10758Lex-generated scanner to discard its current buffer and switch to the
10759new one. This depends upon your implementation of Lex; see its
10760documentation for more. For Flex, it suffices to call
10761@samp{YY_FLUSH_BUFFER} after each change to @code{yyin}. If your
10762Flex-generated scanner needs to read from several input streams to
10763handle features like include files, you might consider using Flex
10764functions like @samp{yy_switch_to_buffer} that manipulate multiple
10765input buffers.
5b066063 10766
b165c324
AD
10767If your Flex-generated scanner uses start conditions (@pxref{Start
10768conditions, , Start conditions, flex, The Flex Manual}), you might
10769also want to reset the scanner's state, i.e., go back to the initial
10770start condition, through a call to @samp{BEGIN (0)}.
10771
fef4cb51
AD
10772@node Strings are Destroyed
10773@section Strings are Destroyed
10774
ab8932bf 10775@quotation
c7e441b4 10776My parser seems to destroy old strings, or maybe it loses track of
fef4cb51
AD
10777them. Instead of reporting @samp{"foo", "bar"}, it reports
10778@samp{"bar", "bar"}, or even @samp{"foo\nbar", "bar"}.
ab8932bf 10779@end quotation
fef4cb51
AD
10780
This error is probably the single most frequent ``bug report'' sent to
Bison lists, but it stems only from a misunderstanding of the role
of the scanner. Consider the following Lex code:
fef4cb51 10784
@example
@group
%@{
#include <stdio.h>
char *yylval = NULL;
%@}
@end group
@group
%%
.*    yylval = yytext; return 1;
\n    /* IGNORE */
%%
@end group
@group
int
main ()
@{
  /* Similar to using $1, $2 in a Bison action. */
  char *fst = (yylex (), yylval);
  char *snd = (yylex (), yylval);
  printf ("\"%s\", \"%s\"\n", fst, snd);
  return 0;
@}
@end group
@end example
fef4cb51
AD
10810
10811If you compile and run this code, you get:
10812
10813@example
10814$ @kbd{flex -osplit-lines.c split-lines.l}
10815$ @kbd{gcc -osplit-lines split-lines.c -ll}
10816$ @kbd{printf 'one\ntwo\n' | ./split-lines}
10817"one
10818two", "two"
10819@end example
10820
10821@noindent
10822this is because @code{yytext} is a buffer provided for @emph{reading}
10823in the action, but if you want to keep it, you have to duplicate it
10824(e.g., using @code{strdup}). Note that the output may depend on how
10825your implementation of Lex handles @code{yytext}. For instance, when
10826given the Lex compatibility option @option{-l} (which triggers the
10827option @samp{%array}) Flex generates a different behavior:
10828
10829@example
10830$ @kbd{flex -l -osplit-lines.c split-lines.l}
10831$ @kbd{gcc -osplit-lines split-lines.c -ll}
10832$ @kbd{printf 'one\ntwo\n' | ./split-lines}
10833"two", "two"
10834@end example
10835
10836
2fa09258
AD
10837@node Implementing Gotos/Loops
10838@section Implementing Gotos/Loops
a06ea4aa 10839
ab8932bf 10840@quotation
a06ea4aa 10841My simple calculator supports variables, assignments, and functions,
2fa09258 10842but how can I implement gotos, or loops?
ab8932bf 10843@end quotation
a06ea4aa
AD
10844
10845Although very pedagogical, the examples included in the document blur
a1c84f45 10846the distinction to make between the parser---whose job is to recover
a06ea4aa 10847the structure of a text and to transmit it to subsequent modules of
a1c84f45 10848the program---and the processing (such as the execution) of this
a06ea4aa
AD
10849structure. This works well with so called straight line programs,
10850i.e., precisely those that have a straightforward execution model:
10851execute simple instructions one after the others.
10852
10853@cindex abstract syntax tree
35430378 10854@cindex AST
a06ea4aa
AD
10855If you want a richer model, you will probably need to use the parser
10856to construct a tree that does represent the structure it has
10857recovered; this tree is usually called the @dfn{abstract syntax tree},
35430378 10858or @dfn{AST} for short. Then, walking through this tree,
a06ea4aa
AD
10859traversing it in various ways, will enable treatments such as its
10860execution or its translation, which will result in an interpreter or a
10861compiler.
10862
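As a rough illustration only, here is a minimal sketch in C of such a
tree and of a recursive evaluator for it. The node kinds, the
@code{node} structure, the @code{vars} array and the @code{eval}
function are hypothetical names chosen for this example and are not
part of Bison; the parser actions would build nodes instead of
computing values directly.

@example
@group
/* Hypothetical node kinds; none of these names come from Bison.  */
enum node_kind
@{
  NODE_NUM, NODE_VAR, NODE_SUB, NODE_ASSIGN, NODE_SEQ, NODE_WHILE
@};

typedef struct node node;
struct node
@{
  enum node_kind kind;
  double num;       /* NODE_NUM: the literal value.  */
  int var;          /* NODE_VAR, NODE_ASSIGN: index into vars.  */
  node *lhs, *rhs;  /* Operands, or loop condition and body.  */
@};
@end group

/* Storage for 26 single-letter variables (an assumption).  */
static double vars[26];

@group
/* Walk the tree built by the parser and execute it.  */
double
eval (const node *n)
@{
  double res = 0;
  switch (n->kind)
    @{
    case NODE_NUM:    return n->num;
    case NODE_VAR:    return vars[n->var];
    case NODE_SUB:    return eval (n->lhs) - eval (n->rhs);
    case NODE_ASSIGN: return vars[n->var] = eval (n->rhs);
    case NODE_SEQ:    eval (n->lhs); return eval (n->rhs);
    case NODE_WHILE:
      /* Re-evaluate condition and body until the condition is 0.  */
      while (eval (n->lhs) != 0)
        res = eval (n->rhs);
      return res;
    @}
  return res;
@}
@end group
@end example

@noindent
Because the @code{NODE_WHILE} case can evaluate the same subtrees
repeatedly, a loop does not require the parser to read its body more
than once.
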
10863This topic is way beyond the scope of this manual, and the reader is
10864invited to consult the dedicated literature.
10865
10866
ed2e6384
AD
10867@node Multiple start-symbols
10868@section Multiple start-symbols
10869
ab8932bf 10870@quotation
ed2e6384
AD
10871I have several closely related grammars, and I would like to share their
10872implementations. In fact, I could use a single grammar but with
10873multiple entry points.
ab8932bf 10874@end quotation
ed2e6384
AD
10875
10876Bison does not support multiple start-symbols, but there is a very
10877simple means to simulate them. If @code{foo} and @code{bar} are the two
10878pseudo start-symbols, then introduce two new tokens, say
10879@code{START_FOO} and @code{START_BAR}, and use them as switches from the
10880real start-symbol:
10881
10882@example
10883%token START_FOO START_BAR;
10884%start start;
de6be119
AD
10885start:
10886 START_FOO foo
10887| START_BAR bar;
ed2e6384
AD
10888@end example
10889
10890These tokens prevent the introduction of new conflicts. As far as the
10891parser goes, that is all that is needed.
10892
10893Now the difficult part is ensuring that the scanner will send these
10894tokens first. If your scanner is hand-written, that should be
10895straightforward. If your scanner is generated by Lex, then there is a
10896simple means to do it: recall that anything between @samp{%@{ ... %@}}
10897after the first @code{%%} is copied verbatim at the top of the generated
10898@code{yylex} function. Make sure a variable @code{start_token} is
10899available in the scanner (e.g., a global variable or using
10900@code{%lex-param} etc.), and use the following:
10901
10902@example
10903 /* @r{Prologue.} */
10904%%
10905%@{
10906 if (start_token)
10907 @{
10908 int t = start_token;
10909 start_token = 0;
10910 return t;
10911 @}
10912%@}
10913 /* @r{The rules.} */
10914@end example
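
On the calling side, the entry point is then selected by setting
@code{start_token} before invoking the parser. The following sketch
assumes that @code{start_token} is a global @code{int} defined in the
scanner, that the parser is not pure, and that the token codes are
visible (for instance because the code lives in the epilogue of the
grammar file, or includes the header produced by @code{%defines});
@code{parse_foo} and @code{parse_bar} are hypothetical names:

@example
@group
/* Assumed to be defined in the scanner.  */
extern int start_token;
int yyparse (void);

/* Parse the input as a `foo'.  */
int
parse_foo (void)
@{
  start_token = START_FOO;
  return yyparse ();
@}
@end group

@group
/* Parse the input as a `bar'.  */
int
parse_bar (void)
@{
  start_token = START_BAR;
  return yyparse ();
@}
@end group
@end example

@noindent
Since the scanner resets @code{start_token} to 0 after returning it,
the switch token is delivered exactly once per parse.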
10915
10916
55ba27be
AD
10917@node Secure? Conform?
10918@section Secure? Conform?
10919
ab8932bf 10920@quotation
55ba27be 10921Is Bison secure? Does it conform to POSIX?
ab8932bf 10922@end quotation
55ba27be
AD
10923
10924If you're looking for a guarantee or certification, we don't provide it.
10925However, Bison is intended to be a reliable program that conforms to the
35430378 10926POSIX specification for Yacc. If you run into problems,
55ba27be
AD
10927please send us a bug report.
10928
10929@node I can't build Bison
10930@section I can't build Bison
10931
ab8932bf 10932@quotation
8c5b881d
PE
10933I can't build Bison because @command{make} complains that
10934@code{msgfmt} is not found.
55ba27be 10935What should I do?
ab8932bf 10936@end quotation
55ba27be
AD
10937
10938Like most GNU packages with internationalization support, Bison turns that
10939feature on by default. If you have problems building in the @file{po}
10940subdirectory, it indicates that your system's internationalization
10941support is lacking. You can re-configure Bison with
10942@option{--disable-nls} to turn off this support, or you can install GNU
10943gettext from @url{ftp://ftp.gnu.org/gnu/gettext/} and re-configure
10944Bison. See the file @file{ABOUT-NLS} for more information.
10945
10946
10947@node Where can I find help?
10948@section Where can I find help?
10949
ab8932bf 10950@quotation
55ba27be 10951I'm having trouble using Bison. Where can I find help?
ab8932bf 10952@end quotation
55ba27be
AD
10953
10954First, read this fine manual. Beyond that, you can send mail to
10955@email{help-bison@@gnu.org}. This mailing list is intended to be
10956populated with people who are willing to answer questions about using
10957and installing Bison. Please keep in mind that (most of) the people on
10958the list have aspects of their lives which are not related to Bison (!),
10959so you may not receive an answer to your question right away. This can
10960be frustrating, but please try not to honk them off; remember that any
10961help they provide is purely voluntary and out of the kindness of their
10962hearts.
10963
10964@node Bug Reports
10965@section Bug Reports
10966
ab8932bf 10967@quotation
55ba27be 10968I found a bug. What should I include in the bug report?
ab8932bf 10969@end quotation
55ba27be
AD
10970
10971Before you send a bug report, make sure you are using the latest
10972version. Check @url{ftp://ftp.gnu.org/pub/gnu/bison/} or one of its
10973mirrors. Be sure to include the version number in your bug report. If
10974the bug is present in the latest version but not in a previous version,
10975try to determine the most recent version which did not contain the bug.
10976
10977If the bug is parser-related, you should include the smallest grammar
10978you can which demonstrates the bug. The grammar file should also be
10979complete (i.e., we should be able to run it through Bison without having
10980to edit or add anything). The smaller and simpler the grammar, the
10981easier it will be to fix the bug.
10982
10983Include information about your compilation environment, including your
10984operating system's name and version and your compiler's name and
10985version. If you have trouble compiling, you should also include a
10986transcript of the build session, starting with the invocation of
10987@command{configure}. Depending on the nature of the bug, you may be asked to
10988send additional files as well (such as @file{config.h} or @file{config.cache}).
10989
10990Patches are most welcome, but not required. That is, do not hesitate to
d6864e19 10991send a bug report just because you cannot provide a fix.
55ba27be
AD
10992
10993Send bug reports to @email{bug-bison@@gnu.org}.
10994
8405b70c
PB
10995@node More Languages
10996@section More Languages
55ba27be 10997
ab8932bf 10998@quotation
8405b70c 10999Will Bison ever have C++ and Java support? How about @var{insert your
55ba27be 11000favorite language here}?
ab8932bf 11001@end quotation
55ba27be 11002
8405b70c 11003C++ and Java support is there now, and is documented. We'd love to add other
55ba27be
AD
11004languages; contributions are welcome.
11005
11006@node Beta Testing
11007@section Beta Testing
11008
ab8932bf 11009@quotation
55ba27be 11010What is involved in being a beta tester?
ab8932bf 11011@end quotation
55ba27be
AD
11012
11013It's not terribly involved. Basically, you would download a test
11014release, compile it, and use it to build and run a parser or two. After
11015that, you would submit either a bug report or a message saying that
11016everything is okay. It is important to report successes as well as
11017failures because test releases eventually become mainstream releases,
11018but only if they are adequately tested. If no one tests, development is
11019essentially halted.
11020
11021Beta testers are particularly needed for operating systems to which the
11022developers do not have easy access. They currently have easy access to
11023recent GNU/Linux and Solaris versions. Reports about other operating
11024systems are especially welcome.
11025
11026@node Mailing Lists
11027@section Mailing Lists
11028
ab8932bf 11029@quotation
55ba27be 11030How do I join the help-bison and bug-bison mailing lists?
ab8932bf 11031@end quotation
55ba27be
AD
11032
11033See @url{http://lists.gnu.org/}.
a06ea4aa 11034
d1a1114f
AD
11035@c ================================================= Table of Symbols
11036
342b8b6e 11037@node Table of Symbols
bfa74976
RS
11038@appendix Bison Symbols
11039@cindex Bison symbols, table of
11040@cindex symbols in Bison, table of
11041
18b519c0 11042@deffn {Variable} @@$
3ded9a63 11043In an action, the location of the left-hand side of the rule.
7404cdf3 11044@xref{Tracking Locations}.
18b519c0 11045@end deffn
3ded9a63 11046
18b519c0 11047@deffn {Variable} @@@var{n}
7404cdf3
JD
11048In an action, the location of the @var{n}-th symbol of the right-hand side
11049of the rule. @xref{Tracking Locations}.
18b519c0 11050@end deffn
3ded9a63 11051
1f68dca5 11052@deffn {Variable} @@@var{name}
7404cdf3
JD
11053In an action, the location of a symbol addressed by name. @xref{Tracking
11054Locations}.
1f68dca5
AR
11055@end deffn
11056
11057@deffn {Variable} @@[@var{name}]
7404cdf3
JD
11058In an action, the location of a symbol addressed by name. @xref{Tracking
11059Locations}.
1f68dca5
AR
11060@end deffn
11061
18b519c0 11062@deffn {Variable} $$
3ded9a63
AD
11063In an action, the semantic value of the left-hand side of the rule.
11064@xref{Actions}.
18b519c0 11065@end deffn
3ded9a63 11066
18b519c0 11067@deffn {Variable} $@var{n}
3ded9a63
AD
11068In an action, the semantic value of the @var{n}-th symbol of the
11069right-hand side of the rule. @xref{Actions}.
18b519c0 11070@end deffn
3ded9a63 11071
1f68dca5
AR
11072@deffn {Variable} $@var{name}
11073In an action, the semantic value of a symbol addressed by name.
11074@xref{Actions}.
11075@end deffn
11076
11077@deffn {Variable} $[@var{name}]
11078In an action, the semantic value of a symbol addressed by name.
11079@xref{Actions}.
11080@end deffn
11081
dd8d9022
AD
11082@deffn {Delimiter} %%
11083Delimiter used to separate the grammar rule section from the
11084Bison declarations section or the epilogue.
11085@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
18b519c0 11086@end deffn
bfa74976 11087
dd8d9022
AD
11088@c Don't insert spaces, or check the DVI output.
11089@deffn {Delimiter} %@{@var{code}%@}
9913d6e4
JD
11090All code listed between @samp{%@{} and @samp{%@}} is copied verbatim
11091to the parser implementation file. Such code forms the prologue of
11092the grammar file. @xref{Grammar Outline, ,Outline of a Bison
dd8d9022 11093Grammar}.
18b519c0 11094@end deffn
bfa74976 11095
dd8d9022
AD
11096@deffn {Construct} /*@dots{}*/
11097Comment delimiters, as in C.
18b519c0 11098@end deffn
bfa74976 11099
dd8d9022
AD
11100@deffn {Delimiter} :
11101Separates a rule's result from its components. @xref{Rules, ,Syntax of
11102Grammar Rules}.
18b519c0 11103@end deffn
bfa74976 11104
dd8d9022
AD
11105@deffn {Delimiter} ;
11106Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
18b519c0 11107@end deffn
bfa74976 11108
dd8d9022
AD
11109@deffn {Delimiter} |
11110Separates alternate rules for the same result nonterminal.
11111@xref{Rules, ,Syntax of Grammar Rules}.
18b519c0 11112@end deffn
bfa74976 11113
12e35840
JD
11114@deffn {Directive} <*>
11115Used to define a default tagged @code{%destructor} or default tagged
11116@code{%printer}.
85894313
JD
11117
11118This feature is experimental.
11119More user feedback will help to determine whether it should become a permanent
11120feature.
11121
12e35840
JD
11122@xref{Destructor Decl, , Freeing Discarded Symbols}.
11123@end deffn
11124
3ebecc24 11125@deffn {Directive} <>
12e35840
JD
11126Used to define a default tagless @code{%destructor} or default tagless
11127@code{%printer}.
85894313
JD
11128
11129This feature is experimental.
11130More user feedback will help to determine whether it should become a permanent
11131feature.
11132
12e35840
JD
11133@xref{Destructor Decl, , Freeing Discarded Symbols}.
11134@end deffn
11135
dd8d9022
AD
11136@deffn {Symbol} $accept
11137The predefined nonterminal whose only rule is @samp{$accept: @var{start}
11138$end}, where @var{start} is the start symbol. @xref{Start Decl, , The
11139Start-Symbol}. It cannot be used in the grammar.
18b519c0 11140@end deffn
bfa74976 11141
136a0f76 11142@deffn {Directive} %code @{@var{code}@}
148d66d8 11143@deffnx {Directive} %code @var{qualifier} @{@var{code}@}
406dec82
JD
11144Insert @var{code} verbatim into the output parser source at the
11145default location or at the location specified by @var{qualifier}.
8e6f2266 11146@xref{%code Summary}.
9bc0dd67 11147@end deffn
9bc0dd67 11148
18b519c0 11149@deffn {Directive} %debug
6deb4447 11150Equip the parser for debugging. @xref{Decl Summary}.
18b519c0 11151@end deffn
6deb4447 11152
91d2c560 11153@ifset defaultprec
22fccf95
PE
11154@deffn {Directive} %default-prec
11155Assign a precedence to rules that lack an explicit @samp{%prec}
11156modifier. @xref{Contextual Precedence, ,Context-Dependent
11157Precedence}.
39a06c25 11158@end deffn
91d2c560 11159@end ifset
39a06c25 11160
6f04ee6c
JD
11161@deffn {Directive} %define @var{variable}
11162@deffnx {Directive} %define @var{variable} @var{value}
11163@deffnx {Directive} %define @var{variable} "@var{value}"
2f4518a1 11164Define a variable to adjust Bison's behavior. @xref{%define Summary}.
148d66d8
JD
11165@end deffn
11166
18b519c0 11167@deffn {Directive} %defines
9913d6e4
JD
11168Bison declaration to create a parser header file, which is usually
11169meant for the scanner. @xref{Decl Summary}.
18b519c0 11170@end deffn
6deb4447 11171
02975b9a
JD
11172@deffn {Directive} %defines @var{defines-file}
11173Same as above, but save in the file @var{defines-file}.
11174@xref{Decl Summary}.
11175@end deffn
11176
18b519c0 11177@deffn {Directive} %destructor
258b75ca 11178Specify how the parser should reclaim the memory associated with
fa7e68c3 11179discarded symbols. @xref{Destructor Decl, , Freeing Discarded Symbols}.
18b519c0 11180@end deffn
72f889cc 11181
18b519c0 11182@deffn {Directive} %dprec
676385e2 11183Bison declaration to assign a precedence to a rule that is used at parse
c827f760 11184time to resolve reduce/reduce conflicts. @xref{GLR Parsers, ,Writing
35430378 11185GLR Parsers}.
18b519c0 11186@end deffn
676385e2 11187
dd8d9022
AD
11188@deffn {Symbol} $end
11189The predefined token marking the end of the token stream. It cannot be
11190used in the grammar.
11191@end deffn
11192
11193@deffn {Symbol} error
11194A token name reserved for error recovery. This token may be used in
11195grammar rules so as to allow the Bison parser to recognize an error in
11196the grammar without halting the process. In effect, a sentence
11197containing an error may be recognized as valid. On a syntax error, the
742e4900
JD
11198token @code{error} becomes the current lookahead token. Actions
11199corresponding to @code{error} are then executed, and the lookahead
dd8d9022
AD
11200token is reset to the token that originally caused the violation.
11201@xref{Error Recovery}.
18d192f0
AD
11202@end deffn
11203
18b519c0 11204@deffn {Directive} %error-verbose
2a8d363a 11205Bison declaration to request verbose, specific error message strings
6f04ee6c 11206when @code{yyerror} is called. @xref{Error Reporting}.
18b519c0 11207@end deffn
2a8d363a 11208
02975b9a 11209@deffn {Directive} %file-prefix "@var{prefix}"
72d2299c 11210Bison declaration to set the prefix of the output files. @xref{Decl
d8988b2f 11211Summary}.
18b519c0 11212@end deffn
d8988b2f 11213
18b519c0 11214@deffn {Directive} %glr-parser
35430378
JD
11215Bison declaration to produce a GLR parser. @xref{GLR
11216Parsers, ,Writing GLR Parsers}.
18b519c0 11217@end deffn
676385e2 11218
dd8d9022
AD
11219@deffn {Directive} %initial-action
11220Run user code before parsing. @xref{Initial Action Decl, , Performing Actions before Parsing}.
11221@end deffn
11222
e6e704dc
JD
11223@deffn {Directive} %language
11224Specify the programming language for the generated parser.
11225@xref{Decl Summary}.
11226@end deffn
11227
18b519c0 11228@deffn {Directive} %left
bfa74976
RS
11229Bison declaration to assign left associativity to token(s).
11230@xref{Precedence Decl, ,Operator Precedence}.
18b519c0 11231@end deffn
bfa74976 11232
feeb0eda 11233@deffn {Directive} %lex-param @{@var{argument-declaration}@}
2a8d363a
AD
11234Bison declaration to specify an additional parameter that
11235@code{yylex} should accept. @xref{Pure Calling,, Calling Conventions
11236for Pure Parsers}.
18b519c0 11237@end deffn
2a8d363a 11238
18b519c0 11239@deffn {Directive} %merge
676385e2 11240Bison declaration to assign a merging function to a rule. If there is a
fae437e8 11241reduce/reduce conflict with a rule having the same merging function, the
676385e2 11242function is applied to the two semantic values to get a single result.
35430378 11243@xref{GLR Parsers, ,Writing GLR Parsers}.
18b519c0 11244@end deffn
676385e2 11245
02975b9a 11246@deffn {Directive} %name-prefix "@var{prefix}"
4b3847c3
AD
11247Obsoleted by the @code{%define} variable @code{api.prefix} (@pxref{Multiple
11248Parsers, ,Multiple Parsers in the Same Program}).
11249
11250Rename the external symbols (variables and functions) used in the parser so
11251that they start with @var{prefix} instead of @samp{yy}. Contrary to
11252@code{api.prefix}, do not rename types and macros.
11253
11254The precise list of symbols renamed in C parsers is @code{yyparse},
11255@code{yylex}, @code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar},
11256@code{yydebug}, and (if locations are used) @code{yylloc}. If you use a
11257push parser, @code{yypush_parse}, @code{yypull_parse}, @code{yypstate},
11258@code{yypstate_new} and @code{yypstate_delete} will also be renamed. For
11259example, if you use @samp{%name-prefix "c_"}, the names become
11260@code{c_parse}, @code{c_lex}, and so on. For C++ parsers, see the
11261@code{%define namespace} documentation in this section.
18b519c0 11262@end deffn
d8988b2f 11263
4b3847c3 11264
91d2c560 11265@ifset defaultprec
22fccf95
PE
11266@deffn {Directive} %no-default-prec
11267Do not assign a precedence to rules that lack an explicit @samp{%prec}
11268modifier. @xref{Contextual Precedence, ,Context-Dependent
11269Precedence}.
11270@end deffn
91d2c560 11271@end ifset
22fccf95 11272
18b519c0 11273@deffn {Directive} %no-lines
931c7513 11274Bison declaration to avoid generating @code{#line} directives in the
9913d6e4 11275parser implementation file. @xref{Decl Summary}.
18b519c0 11276@end deffn
931c7513 11277
18b519c0 11278@deffn {Directive} %nonassoc
9d9b8b70 11279Bison declaration to assign nonassociativity to token(s).
bfa74976 11280@xref{Precedence Decl, ,Operator Precedence}.
18b519c0 11281@end deffn
bfa74976 11282
02975b9a 11283@deffn {Directive} %output "@var{file}"
9913d6e4
JD
11284Bison declaration to set the name of the parser implementation file.
11285@xref{Decl Summary}.
18b519c0 11286@end deffn
d8988b2f 11287
feeb0eda 11288@deffn {Directive} %parse-param @{@var{argument-declaration}@}
2a8d363a
AD
11289Bison declaration to specify an additional parameter that
11290@code{yyparse} should accept. @xref{Parser Function,, The Parser
11291Function @code{yyparse}}.
18b519c0 11292@end deffn
2a8d363a 11293
18b519c0 11294@deffn {Directive} %prec
bfa74976
RS
11295Bison declaration to assign a precedence to a specific rule.
11296@xref{Contextual Precedence, ,Context-Dependent Precedence}.
18b519c0 11297@end deffn
bfa74976 11298
18b519c0 11299@deffn {Directive} %pure-parser
2f4518a1
JD
11300Deprecated version of @code{%define api.pure} (@pxref{%define
11301Summary,,api.pure}), for which Bison is more careful to warn about
11302unreasonable usage.
18b519c0 11303@end deffn
bfa74976 11304
b50d2359 11305@deffn {Directive} %require "@var{version}"
9b8a5ce0
AD
11306Require version @var{version} or higher of Bison. @xref{Require Decl, ,
11307Require a Version of Bison}.
b50d2359
AD
11308@end deffn
11309
18b519c0 11310@deffn {Directive} %right
bfa74976
RS
11311Bison declaration to assign right associativity to token(s).
11312@xref{Precedence Decl, ,Operator Precedence}.
18b519c0 11313@end deffn
bfa74976 11314
e6e704dc
JD
11315@deffn {Directive} %skeleton
11316Specify the skeleton to use; usually for development.
11317@xref{Decl Summary}.
11318@end deffn
11319
18b519c0 11320@deffn {Directive} %start
704a47c4
AD
11321Bison declaration to specify the start symbol. @xref{Start Decl, ,The
11322Start-Symbol}.
18b519c0 11323@end deffn
bfa74976 11324
18b519c0 11325@deffn {Directive} %token
bfa74976
RS
11326Bison declaration to declare token(s) without specifying precedence.
11327@xref{Token Decl, ,Token Type Names}.
18b519c0 11328@end deffn
bfa74976 11329
18b519c0 11330@deffn {Directive} %token-table
9913d6e4
JD
11331Bison declaration to include a token name table in the parser
11332implementation file. @xref{Decl Summary}.
18b519c0 11333@end deffn
931c7513 11334
18b519c0 11335@deffn {Directive} %type
704a47c4
AD
11336Bison declaration to declare nonterminals. @xref{Type Decl,
11337,Nonterminal Symbols}.
18b519c0 11338@end deffn
bfa74976 11339
dd8d9022
AD
11340@deffn {Symbol} $undefined
11341The predefined token onto which all undefined values returned by
11342@code{yylex} are mapped. It cannot be used in the grammar, rather, use
11343@code{error}.
11344@end deffn
11345
18b519c0 11346@deffn {Directive} %union
bfa74976
RS
11347Bison declaration to specify several possible data types for semantic
11348values. @xref{Union Decl, ,The Collection of Value Types}.
18b519c0 11349@end deffn
bfa74976 11350
dd8d9022
AD
11351@deffn {Macro} YYABORT
11352Macro to pretend that an unrecoverable syntax error has occurred, by
11353making @code{yyparse} return 1 immediately. The error reporting
11354function @code{yyerror} is not called. @xref{Parser Function, ,The
11355Parser Function @code{yyparse}}.
8405b70c
PB
11356
11357For Java parsers, this functionality is invoked using @code{return YYABORT;}
11358instead.
dd8d9022 11359@end deffn
3ded9a63 11360
dd8d9022
AD
11361@deffn {Macro} YYACCEPT
11362Macro to pretend that a complete utterance of the language has been
11363read, by making @code{yyparse} return 0 immediately.
11364@xref{Parser Function, ,The Parser Function @code{yyparse}}.
8405b70c
PB
11365
11366For Java parsers, this functionality is invoked using @code{return YYACCEPT;}
11367instead.
dd8d9022 11368@end deffn
bfa74976 11369
dd8d9022 11370@deffn {Macro} YYBACKUP
742e4900 11371Macro to discard a value from the parser stack and fake a lookahead
dd8d9022 11372token. @xref{Action Features, ,Special Features for Use in Actions}.
18b519c0 11373@end deffn
bfa74976 11374
dd8d9022 11375@deffn {Variable} yychar
32c29292 11376External integer variable that contains the integer value of the
742e4900 11377lookahead token. (In a pure parser, it is a local variable within
dd8d9022
AD
11378@code{yyparse}.) Error-recovery rule actions may examine this variable.
11379@xref{Action Features, ,Special Features for Use in Actions}.
18b519c0 11380@end deffn
bfa74976 11381
dd8d9022
AD
11382@deffn {Macro} yyclearin
11383Macro used in error-recovery rule actions. It clears the previous
742e4900 11384lookahead token. @xref{Error Recovery}.
18b519c0 11385@end deffn
bfa74976 11386
dd8d9022
AD
11387@deffn {Macro} YYDEBUG
11388Macro to define to equip the parser with tracing code. @xref{Tracing,
11389,Tracing Your Parser}.
18b519c0 11390@end deffn
bfa74976 11391
dd8d9022
AD
11392@deffn {Variable} yydebug
11393External integer variable set to zero by default. If @code{yydebug}
11394is given a nonzero value, the parser will output information on input
11395symbols and parser action. @xref{Tracing, ,Tracing Your Parser}.
18b519c0 11396@end deffn
bfa74976 11397
dd8d9022
AD
11398@deffn {Macro} yyerrok
11399Macro to cause parser to recover immediately to its normal mode
11400after a syntax error. @xref{Error Recovery}.
11401@end deffn
11402
11403@deffn {Macro} YYERROR
4a11b852
AD
11404Cause an immediate syntax error. This statement initiates error
11405recovery just as if the parser itself had detected an error; however, it
11406does not call @code{yyerror}, and does not print any message. If you
11407want to print an error message, call @code{yyerror} explicitly before
11408the @samp{YYERROR;} statement. @xref{Error Recovery}.
8405b70c
PB
11409
11410For Java parsers, this functionality is invoked using @code{return YYERROR;}
11411instead.
dd8d9022
AD
11412@end deffn
11413
11414@deffn {Function} yyerror
11415User-supplied function to be called by @code{yyparse} on error.
11416@xref{Error Reporting, ,The Error
11417Reporting Function @code{yyerror}}.
11418@end deffn
11419
11420@deffn {Macro} YYERROR_VERBOSE
11421An obsolete macro that you define with @code{#define} in the prologue
11422to request verbose, specific error message strings
11423when @code{yyerror} is called. It doesn't matter what definition you
258cddbc
AD
11424use for @code{YYERROR_VERBOSE}, just whether you define it.
11425Supported by the C skeletons only; using
6f04ee6c 11426@code{%error-verbose} is preferred. @xref{Error Reporting}.
dd8d9022
AD
11427@end deffn
11428
56d60c19
AD
11429@deffn {Macro} YYFPRINTF
11430Macro used to output run-time traces.
11431@xref{Enabling Traces}.
11432@end deffn
11433
dd8d9022
AD
11434@deffn {Macro} YYINITDEPTH
11435Macro for specifying the initial size of the parser stack.
1a059451 11436@xref{Memory Management}.
dd8d9022
AD
11437@end deffn
11438
11439@deffn {Function} yylex
11440User-supplied lexical analyzer function, called with no arguments to get
11441the next token. @xref{Lexical, ,The Lexical Analyzer Function
11442@code{yylex}}.
11443@end deffn
11444
11445@deffn {Macro} YYLEX_PARAM
11446An obsolete macro for specifying an extra argument (or list of extra
32c29292 11447arguments) for @code{yyparse} to pass to @code{yylex}. The use of this
dd8d9022
AD
11448macro is deprecated, and is supported only for Yacc-like parsers.
11449@xref{Pure Calling,, Calling Conventions for Pure Parsers}.
11450@end deffn
11451
11452@deffn {Variable} yylloc
11453External variable in which @code{yylex} should place the line and column
11454numbers associated with a token. (In a pure parser, it is a local
11455variable within @code{yyparse}, and its address is passed to
32c29292
JD
11456@code{yylex}.)
11457You can ignore this variable if you don't use the @samp{@@} feature in the
11458grammar actions.
11459@xref{Token Locations, ,Textual Locations of Tokens}.
742e4900 11460In semantic actions, it stores the location of the lookahead token.
32c29292 11461@xref{Actions and Locations, ,Actions and Locations}.
dd8d9022
AD
11462@end deffn
11463
11464@deffn {Type} YYLTYPE
11465Data type of @code{yylloc}; by default, a structure with four
11466members. @xref{Location Type, , Data Types of Locations}.
11467@end deffn
11468
11469@deffn {Variable} yylval
11470External variable in which @code{yylex} should place the semantic
11471value associated with a token. (In a pure parser, it is a local
11472variable within @code{yyparse}, and its address is passed to
32c29292
JD
11473@code{yylex}.)
11474@xref{Token Values, ,Semantic Values of Tokens}.
742e4900 11475In semantic actions, it stores the semantic value of the lookahead token.
32c29292 11476@xref{Actions, ,Actions}.
dd8d9022
AD
11477@end deffn
11478
11479@deffn {Macro} YYMAXDEPTH
1a059451
PE
11480Macro for specifying the maximum size of the parser stack. @xref{Memory
11481Management}.
dd8d9022
AD
11482@end deffn
11483
11484@deffn {Variable} yynerrs
8a2800e7 11485Global variable which Bison increments each time it reports a syntax error.
f4101aa6 11486(In a pure parser, it is a local variable within @code{yyparse}. In a
9987d1b3 11487pure push parser, it is a member of @code{yypstate}.)
dd8d9022
AD
11488@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
11489@end deffn
11490
11491@deffn {Function} yyparse
11492The parser function produced by Bison; call this function to start
11493parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
11494@end deffn
11495
56d60c19
AD
11496@deffn {Macro} YYPRINT
11497Macro used to output token semantic values. For @file{yacc.c} only.
11498Obsoleted by @code{%printer}.
11499@xref{The YYPRINT Macro, , The @code{YYPRINT} Macro}.
11500@end deffn
11501
9987d1b3 11502@deffn {Function} yypstate_delete
f4101aa6 11503The function to delete a parser instance, produced by Bison in push mode;
9987d1b3 11504call this function to delete the memory associated with a parser.
f4101aa6 11505@xref{Parser Delete Function, ,The Parser Delete Function
9987d1b3 11506@code{yypstate_delete}}.
59da312b
JD
11507(The current push parsing interface is experimental and may evolve.
11508More user feedback will help to stabilize it.)
9987d1b3
JD
11509@end deffn
11510
11511@deffn {Function} yypstate_new
f4101aa6 11512The function to create a parser instance, produced by Bison in push mode;
9987d1b3 11513call this function to create a new parser.
f4101aa6 11514@xref{Parser Create Function, ,The Parser Create Function
9987d1b3 11515@code{yypstate_new}}.
59da312b
JD
11516(The current push parsing interface is experimental and may evolve.
11517More user feedback will help to stabilize it.)
9987d1b3
JD
11518@end deffn
11519
11520@deffn {Function} yypull_parse
f4101aa6
AD
11521The parser function produced by Bison in push mode; call this function to
11522parse the rest of the input stream.
11523@xref{Pull Parser Function, ,The Pull Parser Function
9987d1b3 11524@code{yypull_parse}}.
59da312b
JD
11525(The current push parsing interface is experimental and may evolve.
11526More user feedback will help to stabilize it.)
9987d1b3
JD
11527@end deffn
11528
11529@deffn {Function} yypush_parse
f4101aa6
AD
11530The parser function produced by Bison in push mode; call this function to
11531parse a single token. @xref{Push Parser Function, ,The Push Parser Function
9987d1b3 11532@code{yypush_parse}}.
59da312b
JD
11533(The current push parsing interface is experimental and may evolve.
11534More user feedback will help to stabilize it.)
9987d1b3
JD
11535@end deffn
11536
dd8d9022
AD
11537@deffn {Macro} YYPARSE_PARAM
11538An obsolete macro for specifying the name of a parameter that
11539@code{yyparse} should accept. The use of this macro is deprecated, and
11540is supported only for Yacc-like parsers. @xref{Pure Calling,, Calling
11541Conventions for Pure Parsers}.
11542@end deffn
11543
11544@deffn {Macro} YYRECOVERING
02103984
PE
11545The expression @code{YYRECOVERING ()} yields 1 when the parser
11546is recovering from a syntax error, and 0 otherwise.
11547@xref{Action Features, ,Special Features for Use in Actions}.
dd8d9022
AD
11548@end deffn
11549
11550@deffn {Macro} YYSTACK_USE_ALLOCA
34a6c2d1
JD
11551Macro used to control the use of @code{alloca} when the
11552deterministic parser in C needs to extend its stacks. If defined to 0,
d7e14fc0
PE
11553the parser will use @code{malloc} to extend its stacks. If defined to
115541, the parser will use @code{alloca}. Values other than 0 and 1 are
11555reserved for future Bison extensions. If not defined,
11556@code{YYSTACK_USE_ALLOCA} defaults to 0.
11557
55289366 11558In the all-too-common case where your code may run on a host with a
d7e14fc0
PE
11559limited stack and with unreliable stack-overflow checking, you should
11560set @code{YYMAXDEPTH} to a value that cannot possibly result in
11561unchecked stack overflow on any of your target hosts when
11562@code{alloca} is called. You can inspect the code that Bison
11563generates in order to determine the proper numeric values. This will
11564require some expertise in low-level implementation details.
dd8d9022
AD
11565@end deffn
11566
11567@deffn {Type} YYSTYPE
11568Data type of semantic values; @code{int} by default.
11569@xref{Value Type, ,Data Types of Semantic Values}.
18b519c0 11570@end deffn
bfa74976 11571
342b8b6e 11572@node Glossary
bfa74976
RS
11573@appendix Glossary
11574@cindex glossary
11575
11576@table @asis
6f04ee6c 11577@item Accepting state
34a6c2d1
JD
11578A state whose only action is the accept action.
11579The accepting state is thus a consistent state.
11580@xref{Understanding,,}.
11581
35430378 11582@item Backus-Naur Form (BNF; also called ``Backus Normal Form'')
c827f760
PE
11583Formal method of specifying context-free grammars originally proposed
11584by John Backus, and slightly improved by Peter Naur in his 1960-01-02
11585committee document contributing to what became the Algol 60 report.
11586@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
bfa74976 11587
6f04ee6c
JD
11588@item Consistent state
11589A state containing only one possible action. @xref{Default Reductions}.
34a6c2d1 11590
bfa74976
RS
11591@item Context-free grammars
11592Grammars specified as rules that can be applied regardless of context.
11593Thus, if there is a rule which says that an integer can be used as an
11594expression, integers are allowed @emph{anywhere} an expression is
89cab50d
AD
11595permitted. @xref{Language and Grammar, ,Languages and Context-Free
11596Grammars}.
bfa74976 11597
6f04ee6c 11598@item Default reduction
620b5727 11599The reduction that a parser should perform if the current parser state
2f4518a1 11600contains no other action for the lookahead token. In permitted parser
6f04ee6c
JD
11601states, Bison declares the reduction with the largest lookahead set to be
11602the default reduction and removes that lookahead set. @xref{Default
11603Reductions}.
11604
11605@item Defaulted state
11606A consistent state with a default reduction. @xref{Default Reductions}.
34a6c2d1 11607
bfa74976
RS
11608@item Dynamic allocation
11609Allocation of memory that occurs during execution, rather than at
11610compile time or on entry to a function.
11611
11612@item Empty string
11613Analogous to the empty set in set theory, the empty string is a
11614character string of length zero.
11615
11616@item Finite-state stack machine
11617A ``machine'' that has discrete states in which it is said to exist at
11618each instant in time. As input to the machine is processed, the
11619machine moves from state to state as specified by the logic of the
11620machine. In the case of the parser, the input is the language being
11621parsed, and the states correspond to various stages in the grammar
c827f760 11622rules. @xref{Algorithm, ,The Bison Parser Algorithm}.
bfa74976 11623
35430378 11624@item Generalized LR (GLR)
676385e2 11625A parsing algorithm that can handle all context-free grammars, including those
35430378 11626that are not LR(1). It resolves situations that Bison's
34a6c2d1 11627deterministic parsing
676385e2
PH
11628algorithm cannot by effectively splitting off multiple parsers, trying all
11629possible parsers, and discarding those that fail in the light of additional
c827f760 11630right context. @xref{Generalized LR Parsing, ,Generalized
35430378 11631LR Parsing}.
676385e2 11632
bfa74976
RS
11633@item Grouping
11634A language construct that is (in general) grammatically divisible;
c827f760 11635for example, `expression' or `declaration' in C@.
bfa74976
RS
11636@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
11637
6f04ee6c
JD
11638@item IELR(1) (Inadequacy Elimination LR(1))
11639A minimal LR(1) parser table construction algorithm. That is, given any
2f4518a1 11640context-free grammar, IELR(1) generates parser tables with the full
6f04ee6c
JD
11641language-recognition power of canonical LR(1) but with nearly the same
11642number of parser states as LALR(1). This reduction in parser states is
11643often an order of magnitude. More importantly, because canonical LR(1)'s
11644extra parser states may contain duplicate conflicts in the case of non-LR(1)
11645grammars, the number of conflicts for IELR(1) is often an order of magnitude
11646less as well. This can significantly reduce the complexity of developing a
11647grammar. @xref{LR Table Construction}.
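
For instance, assuming a Bison version that documents the
@code{lr.type} variable (@pxref{LR Table Construction}), IELR(1)
tables might be requested with a declaration along these lines:

@example
%define lr.type ielr
@end example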
34a6c2d1 11648
bfa74976
RS
11649@item Infix operator
11650An arithmetic operator that is placed between the operands on which it
11651performs some operation.
11652
11653@item Input stream
11654A continuous flow of data between devices or programs.
11655
35430378 11656@item LAC (Lookahead Correction)
4c38b19e 11657A parsing mechanism that fixes the problem of delayed syntax error
6f04ee6c
JD
11658detection, which is caused by LR state merging, default reductions, and the
11659use of @code{%nonassoc}. Delayed syntax error detection results in
11660unexpected semantic actions, initiation of error recovery in the wrong
11661syntactic context, and an incorrect list of expected tokens in a verbose
11662syntax error message. @xref{LAC}.
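
As a sketch, assuming the @code{parse.lac} variable described in the
section referenced above is available in your Bison version, LAC
might be enabled like this:

@example
%define parse.lac full
@end example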
4c38b19e 11663
bfa74976
RS
11664@item Language construct
11665One of the typical usage schemas of the language. For example, one of
11666the constructs of the C language is the @code{if} statement.
11667@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
11668
11669@item Left associativity
11670Operators having left associativity are analyzed from left to right:
11671@samp{a+b+c} first computes @samp{a+b} and then combines with
11672@samp{c}. @xref{Precedence, ,Operator Precedence}.
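
As a minimal illustration, a grammar might declare its binary
operators left-associative as follows; the choice of operators here
is only an example:

@example
%left '+' '-'
%left '*' '/'
@end example

@noindent
With these declarations, @samp{a-b-c} is grouped as @samp{(a-b)-c}.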
11673
11674@item Left recursion
89cab50d
AD
11675A rule whose result symbol is also its first component symbol; for
11676example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive
11677Rules}.
bfa74976
RS
11678
11679@item Left-to-right parsing
11680Parsing a sentence of a language by analyzing it token by token from
c827f760 11681left to right. @xref{Algorithm, ,The Bison Parser Algorithm}.
bfa74976
RS
11682
11683@item Lexical analyzer (scanner)
11684A function that reads an input stream and returns tokens one by one.
11685@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
11686
11687@item Lexical tie-in
11688A flag, set by actions in the grammar rules, which alters the way
11689tokens are parsed. @xref{Lexical Tie-ins}.
11690
931c7513 11691@item Literal string token
14ded682 11692A token which consists of two or more fixed characters. @xref{Symbols}.
931c7513 11693
742e4900
JD
11694@item Lookahead token
11695A token already read but not yet shifted. @xref{Lookahead, ,Lookahead
89cab50d 11696Tokens}.
bfa74976 11697
35430378 11698@item LALR(1)
bfa74976 11699The class of context-free grammars that Bison (like most other parser
35430378 11700generators) can handle by default; a subset of LR(1).
5da0355a 11701@xref{Mysterious Conflicts}.
bfa74976 11702
35430378 11703@item LR(1)
bfa74976 11704The class of context-free grammars in which at most one token of
742e4900 11705lookahead is needed to disambiguate the parsing of any piece of input.
bfa74976
RS
11706
11707@item Nonterminal symbol
11708A grammar symbol standing for a grammatical construct that can
11709be expressed through rules in terms of smaller constructs; in other
11710words, a construct that is not a token. @xref{Symbols}.
11711
bfa74976
RS
11712@item Parser
11713A function that recognizes valid sentences of a language by analyzing
11714the syntactic structure of a sequence of tokens passed to it from a lexical
11715analyzer.
11716
11717@item Postfix operator
11718An arithmetic operator that is placed after the operands upon which it
11719performs some operation.
11720
11721@item Reduction
11722Replacing a string of nonterminals and/or terminals with a single
89cab50d 11723nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison
c827f760 11724Parser Algorithm}.
bfa74976
RS
11725
11726@item Reentrant
11727A reentrant subprogram is a subprogram which can be invoked any
11728number of times in parallel, without interference between the various
11729invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
11730
11731@item Reverse Polish notation
11732A language in which all operators are postfix operators.
11733
11734@item Right recursion
89cab50d
AD
11735A rule whose result symbol is also its last component symbol; for
11736example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive
11737Rules}.
bfa74976
RS
11738
11739@item Semantics
11740In computer languages, the semantics are specified by the actions
11741taken for each instance of the language, i.e., the meaning of
11742each statement. @xref{Semantics, ,Defining Language Semantics}.
11743
11744@item Shift
11745A parser is said to shift when it makes the choice of analyzing
11746further input from the stream rather than immediately reducing some
c827f760 11747already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm}.
bfa74976
RS
11748
11749@item Single-character literal
11750A single character that is recognized and interpreted as is.
11751@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
11752
11753@item Start symbol
11754The nonterminal symbol that stands for a complete valid utterance in
11755the language being parsed. The start symbol is usually listed as the
13863333 11756first nonterminal symbol in a language specification.
bfa74976
RS
11757@xref{Start Decl, ,The Start-Symbol}.
11758
11759@item Symbol table
11760A data structure where symbol names and associated data are stored
11761during parsing to allow for recognition and use of existing
11762information in repeated uses of a symbol. @xref{Multi-function Calc}.
11763
6e649e65
PE
11764@item Syntax error
11765An error encountered during parsing of an input stream due to invalid
11766syntax. @xref{Error Recovery}.
11767
bfa74976
RS
11768@item Token
11769A basic, grammatically indivisible unit of a language. The symbol
11770that describes a token in the grammar is a terminal symbol.
11771The input of the Bison parser is a stream of tokens which comes from
11772the lexical analyzer. @xref{Symbols}.
11773
11774@item Terminal symbol
89cab50d
AD
11775A grammar symbol that has no rules in the grammar and therefore is
11776grammatically indivisible. The piece of text it represents is a token.
11777@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
6f04ee6c
JD
11778
11779@item Unreachable state
11780A parser state to which there does not exist a sequence of transitions from
11781the parser's start state. A state can become unreachable during conflict
11782resolution. @xref{Unreachable States}.
bfa74976
RS
11783@end table
11784
342b8b6e 11785@node Copying This Manual
f2b5126e 11786@appendix Copying This Manual
f2b5126e
PB
11787@include fdl.texi
11788
71caec06
JD
11789@node Bibliography
11790@unnumbered Bibliography
11791
11792@table @asis
11793@item [Denny 2008]
11794Joel E. Denny and Brian A. Malloy, IELR(1): Practical LR(1) Parser Tables
11795for Non-LR(1) Grammars with Conflict Resolution, in @cite{Proceedings of the
117962008 ACM Symposium on Applied Computing} (SAC'08), ACM, New York, NY, USA,
11797pp.@: 240--245. @uref{http://dx.doi.org/10.1145/1363686.1363747}
11798
11799@item [Denny 2010 May]
11800Joel E. Denny, PSLR(1): Pseudo-Scannerless Minimal LR(1) for the
11801Deterministic Parsing of Composite Languages, Ph.D. Dissertation, Clemson
11802University, Clemson, SC, USA (May 2010).
11803@uref{http://proquest.umi.com/pqdlink?did=2041473591&Fmt=7&clientId=79356&RQT=309&VName=PQD}
11804
11805@item [Denny 2010 November]
11806Joel E. Denny and Brian A. Malloy, The IELR(1) Algorithm for Generating
11807Minimal LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution,
11808in @cite{Science of Computer Programming}, Vol.@: 75, Issue 11 (November
118092010), pp.@: 943--979. @uref{http://dx.doi.org/10.1016/j.scico.2009.08.001}
11810
11811@item [DeRemer 1982]
11812Frank DeRemer and Thomas Pennello, Efficient Computation of LALR(1)
11813Look-Ahead Sets, in @cite{ACM Transactions on Programming Languages and
11814Systems}, Vol.@: 4, No.@: 4 (October 1982), pp.@:
11815615--649. @uref{http://dx.doi.org/10.1145/69622.357187}
11816
11817@item [Knuth 1965]
11818Donald E. Knuth, On the Translation of Languages from Left to Right, in
11819@cite{Information and Control}, Vol.@: 8, Issue 6 (December 1965), pp.@:
11820607--639. @uref{http://dx.doi.org/10.1016/S0019-9958(65)90426-2}
11821
11822@item [Scott 2000]
11823Elizabeth Scott, Adrian Johnstone, and Shamsa Sadaf Hussain,
11824@cite{Tomita-Style Generalised LR Parsers}, Royal Holloway, University of
11825London, Department of Computer Science, TR-00-12 (December 2000).
11826@uref{http://www.cs.rhul.ac.uk/research/languages/publications/tomita_style_1.ps}
11827@end table
11828
f9b86351
AD
11829@node Index of Terms
11830@unnumbered Index of Terms
bfa74976
RS
11831
11832@printindex cp
11833
bfa74976 11834@bye
a06ea4aa 11835
232be91a
AD
11836@c LocalWords: texinfo setfilename settitle setchapternewpage finalout texi FSF
11837@c LocalWords: ifinfo smallbook shorttitlepage titlepage GPL FIXME iftex FSF's
11838@c LocalWords: akim fn cp syncodeindex vr tp synindex dircategory direntry Naur
11839@c LocalWords: ifset vskip pt filll insertcopying sp ISBN Etienne Suvasa Multi
11840@c LocalWords: ifnottex yyparse detailmenu GLR RPN Calc var Decls Rpcalc multi
11841@c LocalWords: rpcalc Lexer Expr ltcalc mfcalc yylex defaultprec Donnelly Gotos
11842@c LocalWords: yyerror pxref LR yylval cindex dfn LALR samp gpl BNF xref yypush
11843@c LocalWords: const int paren ifnotinfo AC noindent emph expr stmt findex lr
11844@c LocalWords: glr YYSTYPE TYPENAME prog dprec printf decl init stmtMerge POSIX
11845@c LocalWords: pre STDC GNUC endif yy YY alloca lf stddef stdlib YYDEBUG yypull
11846@c LocalWords: NUM exp subsubsection kbd Ctrl ctype EOF getchar isdigit nonfree
11847@c LocalWords: ungetc stdin scanf sc calc ulator ls lm cc NEG prec yyerrok rr
11848@c LocalWords: longjmp fprintf stderr yylloc YYLTYPE cos ln Stallman Destructor
56da1e52 11849@c LocalWords: symrec val tptr FNCT fnctptr func struct sym enum IEC syntaxes
232be91a
AD
11850@c LocalWords: fnct putsym getsym fname arith fncts atan ptr malloc sizeof Lex
11851@c LocalWords: strlen strcpy fctn strcmp isalpha symbuf realloc isalnum DOTDOT
11852@c LocalWords: ptypes itype YYPRINT trigraphs yytname expseq vindex dtype Unary
11853@c LocalWords: Rhs YYRHSLOC LE nonassoc op deffn typeless yynerrs nonterminal
11854@c LocalWords: yychar yydebug msg YYNTOKENS YYNNTS YYNRULES YYNSTATES reentrant
11855@c LocalWords: cparse clex deftypefun NE defmac YYACCEPT YYABORT param yypstate
11856@c LocalWords: strncmp intval tindex lvalp locp llocp typealt YYBACKUP subrange
11857@c LocalWords: YYEMPTY YYEOF YYRECOVERING yyclearin GE def UMINUS maybeword loc
11858@c LocalWords: Johnstone Shamsa Sadaf Hussain Tomita TR uref YYMAXDEPTH inline
56da1e52 11859@c LocalWords: YYINITDEPTH stmts ref initdcl maybeasm notype Lookahead yyoutput
232be91a
AD
11860@c LocalWords: hexflag STR exdent itemset asis DYYDEBUG YYFPRINTF args Autoconf
11861@c LocalWords: infile ypp yxx outfile itemx tex leaderfill Troubleshouting sqrt
11862@c LocalWords: hbox hss hfill tt ly yyin fopen fclose ofirst gcc ll lookahead
11863@c LocalWords: nbar yytext fst snd osplit ntwo strdup AST Troublereporting th
11864@c LocalWords: YYSTACK DVI fdl printindex IELR nondeterministic nonterminals ps
4c38b19e 11865@c LocalWords: subexpressions declarator nondeferred config libintl postfix LAC
56da1e52
AD
11866@c LocalWords: preprocessor nonpositive unary nonnumeric typedef extern rhs sr
11867@c LocalWords: yytokentype destructor multicharacter nonnull EBCDIC nterm LR's
232be91a 11868@c LocalWords: lvalue nonnegative XNUM CHR chr TAGLESS tagless stdout api TOK
56da1e52 11869@c LocalWords: destructors Reentrancy nonreentrant subgrammar nonassociative Ph
232be91a
AD
11870@c LocalWords: deffnx namespace xml goto lalr ielr runtime lex yacc yyps env
11871@c LocalWords: yystate variadic Unshift NLS gettext po UTF Automake LOCALEDIR
11872@c LocalWords: YYENABLE bindtextdomain Makefile DEFS CPPFLAGS DBISON DeRemer
56da1e52 11873@c LocalWords: autoreconf Pennello multisets nondeterminism Generalised baz ACM
232be91a 11874@c LocalWords: redeclare automata Dparse localedir datadir XSLT midrule Wno
56da1e52 11875@c LocalWords: Graphviz multitable headitem hh basename Doxygen fno filename
232be91a
AD
11876@c LocalWords: doxygen ival sval deftypemethod deallocate pos deftypemethodx
11877@c LocalWords: Ctor defcv defcvx arg accessors arithmetics CPP ifndef CALCXX
11878@c LocalWords: lexer's calcxx bool LPAREN RPAREN deallocation cerrno climits
11879@c LocalWords: cstdlib Debian undef yywrap unput noyywrap nounput zA yyleng
56da1e52 11880@c LocalWords: errno strtol ERANGE str strerror iostream argc argv Javadoc PSLR
232be91a
AD
11881@c LocalWords: bytecode initializers superclass stype ASTNode autoboxing nls
11882@c LocalWords: toString deftypeivar deftypeivarx deftypeop YYParser strictfp
11883@c LocalWords: superclasses boolean getErrorVerbose setErrorVerbose deftypecv
11884@c LocalWords: getDebugStream setDebugStream getDebugLevel setDebugLevel url
3746fc33 11885@c LocalWords: bisonVersion deftypecvx bisonSkeleton getStartPos getEndPos uint
56da1e52 11886@c LocalWords: getLVal defvar deftypefn deftypefnx gotos msgfmt Corbett LALR's
3746fc33
AD
11887@c LocalWords: subdirectory Solaris nonassociativity perror schemas Malloy ints
11888@c LocalWords: Scannerless ispell american ChangeLog smallexample CSTYPE CLTYPE
11889@c LocalWords: clval CDEBUG cdebug deftypeopx yyterminate
53e2cd1e
AD
11890@c LocalWords: parsers parser's
11891@c LocalWords: associativity subclasses precedences unresolvable runnable
11892@c LocalWords: allocators subunit initializations unreferenced untyped
11893@c LocalWords: errorVerbose subtype subtypes
f3103c5b
AD
11894
11895@c Local Variables:
11896@c ispell-dictionary: "american"
11897@c fill-column: 76
11898@c End: