]> git.saurik.com Git - bison.git/blame - doc/bison.texinfo
* src/atgeneral.m4: Update from Autoconf.
[bison.git] / doc / bison.texinfo
CommitLineData
bfa74976
RS
1\input texinfo @c -*-texinfo-*-
2@comment %**start of header
3@setfilename bison.info
df1af54c
JT
4@include version.texi
5@settitle Bison @value{VERSION}
bfa74976
RS
6@setchapternewpage odd
7
5378c3e7
DM
8@iftex
9@finalout
10@end iftex
11
13863333 12@c SMALL BOOK version
bfa74976 13@c This edition has been formatted so that you can format and print it in
13863333 14@c the smallbook format.
bfa74976
RS
15@c @smallbook
16
bfa74976
RS
17@c Set following if you have the new `shorttitlepage' command
18@c @clear shorttitlepage-enabled
19@c @set shorttitlepage-enabled
20
21@c ISPELL CHECK: done, 14 Jan 1993 --bob
22
23@c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo
24@c titlepage; should NOT be changed in the GPL. --mew
25
26@iftex
27@syncodeindex fn cp
28@syncodeindex vr cp
29@syncodeindex tp cp
30@end iftex
31@ifinfo
32@synindex fn cp
33@synindex vr cp
34@synindex tp cp
35@end ifinfo
36@comment %**end of header
37
bd773d73
JT
38@ifinfo
39@format
40START-INFO-DIR-ENTRY
41* bison: (bison). GNU Project parser generator (yacc replacement).
42END-INFO-DIR-ENTRY
43@end format
44@end ifinfo
45
bfa74976
RS
46@ifinfo
47This file documents the Bison parser generator.
48
13863333
AD
49Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, 2000
50Free Software Foundation, Inc.
bfa74976
RS
51
52Permission is granted to make and distribute verbatim copies of
53this manual provided the copyright notice and this permission notice
54are preserved on all copies.
55
56@ignore
57Permission is granted to process this file through Tex and print the
58results, provided the printed document carries copying permission
59notice identical to this one except for the removal of this paragraph
60(this paragraph not being relevant to the printed manual).
61
62@end ignore
63Permission is granted to copy and distribute modified versions of this
64manual under the conditions for verbatim copying, provided also that the
65sections entitled ``GNU General Public License'' and ``Conditions for
66Using Bison'' are included exactly as in the original, and provided that
67the entire resulting derived work is distributed under the terms of a
68permission notice identical to this one.
69
70Permission is granted to copy and distribute translations of this manual
71into another language, under the above conditions for modified versions,
72except that the sections entitled ``GNU General Public License'',
73``Conditions for Using Bison'' and this permission notice may be
74included in translations approved by the Free Software Foundation
75instead of in the original English.
76@end ifinfo
77
78@ifset shorttitlepage-enabled
79@shorttitlepage Bison
80@end ifset
81@titlepage
82@title Bison
83@subtitle The YACC-compatible Parser Generator
df1af54c 84@subtitle @value{UPDATED}, Bison Version @value{VERSION}
bfa74976
RS
85
86@author by Charles Donnelly and Richard Stallman
87
88@page
89@vskip 0pt plus 1filll
13863333
AD
90Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
911999, 2000
92Free Software Foundation, Inc.
bfa74976
RS
93
94@sp 2
95Published by the Free Software Foundation @*
931c7513
RS
9659 Temple Place, Suite 330 @*
97Boston, MA 02111-1307 USA @*
9ecbd125
JT
98Printed copies are available from the Free Software Foundation.@*
99ISBN 1-882114-44-2
bfa74976
RS
100
101Permission is granted to make and distribute verbatim copies of
102this manual provided the copyright notice and this permission notice
103are preserved on all copies.
104
105@ignore
106Permission is granted to process this file through TeX and print the
107results, provided the printed document carries copying permission
108notice identical to this one except for the removal of this paragraph
109(this paragraph not being relevant to the printed manual).
110
111@end ignore
112Permission is granted to copy and distribute modified versions of this
113manual under the conditions for verbatim copying, provided also that the
114sections entitled ``GNU General Public License'' and ``Conditions for
115Using Bison'' are included exactly as in the original, and provided that
116the entire resulting derived work is distributed under the terms of a
117permission notice identical to this one.
118
119Permission is granted to copy and distribute translations of this manual
120into another language, under the above conditions for modified versions,
121except that the sections entitled ``GNU General Public License'',
122``Conditions for Using Bison'' and this permission notice may be
123included in translations approved by the Free Software Foundation
124instead of in the original English.
125@sp 2
126Cover art by Etienne Suvasa.
127@end titlepage
128@page
129
130@node Top, Introduction, (dir), (dir)
131
132@ifinfo
df1af54c 133This manual documents version @value{VERSION} of Bison.
bfa74976
RS
134@end ifinfo
135
136@menu
13863333
AD
137* Introduction::
138* Conditions::
bfa74976
RS
139* Copying:: The GNU General Public License says
140 how you can copy and share Bison
141
142Tutorial sections:
143* Concepts:: Basic concepts for understanding Bison.
144* Examples:: Three simple explained examples of using Bison.
145
146Reference sections:
147* Grammar File:: Writing Bison declarations and rules.
148* Interface:: C-language interface to the parser function @code{yyparse}.
149* Algorithm:: How the Bison parser works at run-time.
150* Error Recovery:: Writing rules for error recovery.
151* Context Dependency:: What to do if your language syntax is too
152 messy for Bison to handle straightforwardly.
153* Debugging:: Debugging Bison parsers that parse wrong.
154* Invocation:: How to run Bison (to produce the parser source file).
155* Table of Symbols:: All the keywords of the Bison language are explained.
156* Glossary:: Basic concepts are explained.
157* Index:: Cross-references to the text.
158
159 --- The Detailed Node Listing ---
160
161The Concepts of Bison
162
163* Language and Grammar:: Languages and context-free grammars,
164 as mathematical ideas.
165* Grammar in Bison:: How we represent grammars for Bison's sake.
166* Semantic Values:: Each token or syntactic grouping can have
167 a semantic value (the value of an integer,
168 the name of an identifier, etc.).
169* Semantic Actions:: Each rule can have an action containing C code.
170* Bison Parser:: What are Bison's input and output,
171 how is the output used?
172* Stages:: Stages in writing and running Bison grammars.
173* Grammar Layout:: Overall structure of a Bison grammar file.
174
175Examples
176
177* RPN Calc:: Reverse polish notation calculator;
178 a first example with no operator precedence.
179* Infix Calc:: Infix (algebraic) notation calculator.
180 Operator precedence is introduced.
181* Simple Error Recovery:: Continuing after syntax errors.
182* Multi-function Calc:: Calculator with memory and trig functions.
183 It uses multiple data-types for semantic values.
184* Exercises:: Ideas for improving the multi-function calculator.
185
186Reverse Polish Notation Calculator
187
188* Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
189* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
190* Lexer: Rpcalc Lexer. The lexical analyzer.
191* Main: Rpcalc Main. The controlling function.
192* Error: Rpcalc Error. The error reporting function.
193* Gen: Rpcalc Gen. Running Bison on the grammar file.
194* Comp: Rpcalc Compile. Run the C compiler on the output code.
195
196Grammar Rules for @code{rpcalc}
197
13863333
AD
198* Rpcalc Input::
199* Rpcalc Line::
200* Rpcalc Expr::
bfa74976
RS
201
202Multi-Function Calculator: @code{mfcalc}
203
204* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
205* Rules: Mfcalc Rules. Grammar rules for the calculator.
206* Symtab: Mfcalc Symtab. Symbol table management subroutines.
207
208Bison Grammar Files
209
210* Grammar Outline:: Overall layout of the grammar file.
211* Symbols:: Terminal and nonterminal symbols.
212* Rules:: How to write grammar rules.
213* Recursion:: Writing recursive rules.
214* Semantics:: Semantic values and actions.
215* Declarations:: All kinds of Bison declarations are described here.
216* Multiple Parsers:: Putting more than one Bison parser in one program.
217
218Outline of a Bison Grammar
219
220* C Declarations:: Syntax and usage of the C declarations section.
221* Bison Declarations:: Syntax and usage of the Bison declarations section.
222* Grammar Rules:: Syntax and usage of the grammar rules section.
223* C Code:: Syntax and usage of the additional C code section.
224
225Defining Language Semantics
226
227* Value Type:: Specifying one data type for all semantic values.
228* Multiple Types:: Specifying several alternative data types.
229* Actions:: An action is the semantic definition of a grammar rule.
230* Action Types:: Specifying data types for actions to operate on.
231* Mid-Rule Actions:: Most actions go at the end of a rule.
232 This says when, why and how to use the exceptional
233 action in the middle of a rule.
234
235Bison Declarations
236
237* Token Decl:: Declaring terminal symbols.
238* Precedence Decl:: Declaring terminals with precedence and associativity.
239* Union Decl:: Declaring the set of all semantic value types.
240* Type Decl:: Declaring the choice of type for a nonterminal symbol.
241* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
242* Start Decl:: Specifying the start symbol.
243* Pure Decl:: Requesting a reentrant parser.
244* Decl Summary:: Table of all Bison declarations.
245
246Parser C-Language Interface
247
248* Parser Function:: How to call @code{yyparse} and what it returns.
13863333 249* Lexical:: You must supply a function @code{yylex}
bfa74976
RS
250 which reads tokens.
251* Error Reporting:: You must supply a function @code{yyerror}.
252* Action Features:: Special features for use in actions.
253
254The Lexical Analyzer Function @code{yylex}
255
256* Calling Convention:: How @code{yyparse} calls @code{yylex}.
257* Token Values:: How @code{yylex} must return the semantic value
258 of the token it has read.
259* Token Positions:: How @code{yylex} must return the text position
260 (line number, etc.) of the token, if the
261 actions want that.
262* Pure Calling:: How the calling convention differs
263 in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
264
13863333 265The Bison Parser Algorithm
bfa74976
RS
266
267* Look-Ahead:: Parser looks one token ahead when deciding what to do.
268* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
269* Precedence:: Operator precedence works by resolving conflicts.
270* Contextual Precedence:: When an operator's precedence depends on context.
271* Parser States:: The parser is a finite-state-machine with stack.
272* Reduce/Reduce:: When two rules are applicable in the same situation.
273* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
274* Stack Overflow:: What happens when stack gets full. How to avoid it.
275
276Operator Precedence
277
278* Why Precedence:: An example showing why precedence is needed.
279* Using Precedence:: How to specify precedence in Bison grammars.
280* Precedence Examples:: How these features are used in the previous example.
281* How Precedence:: How they work.
282
283Handling Context Dependencies
284
285* Semantic Tokens:: Token parsing can depend on the semantic context.
286* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
287* Tie-in Recovery:: Lexical tie-ins have implications for how
288 error recovery rules must be written.
289
290Invoking Bison
291
13863333 292* Bison Options:: All the options described in detail,
bfa74976
RS
293 in alphabetical order by short options.
294* Option Cross Key:: Alphabetical list of long options.
295* VMS Invocation:: Bison command syntax on VMS.
296@end menu
297
298@node Introduction, Conditions, Top, Top
299@unnumbered Introduction
300@cindex introduction
301
302@dfn{Bison} is a general-purpose parser generator that converts a
303grammar description for an LALR(1) context-free grammar into a C
304program to parse that grammar. Once you are proficient with Bison,
305you may use it to develop a wide range of language parsers, from those
306used in simple desk calculators to complex programming languages.
307
308Bison is upward compatible with Yacc: all properly-written Yacc grammars
309ought to work with Bison with no change. Anyone familiar with Yacc
310should be able to use Bison with little trouble. You need to be fluent in
311C programming in order to use Bison or to understand this manual.
312
313We begin with tutorial chapters that explain the basic concepts of using
314Bison and show three explained examples, each building on the last. If you
315don't know Bison or Yacc, start by reading these chapters. Reference
316chapters follow which describe specific aspects of Bison in detail.
317
931c7513
RS
318Bison was written primarily by Robert Corbett; Richard Stallman made it
319Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
320multicharacter string literals and other features.
321
df1af54c 322This edition corresponds to version @value{VERSION} of Bison.
bfa74976
RS
323
324@node Conditions, Copying, Introduction, Top
325@unnumbered Conditions for Using Bison
326
a31239f1 327As of Bison version 1.24, we have changed the distribution terms for
9ecbd125 328@code{yyparse} to permit using Bison's output in nonfree programs.
a31239f1
RS
329Formerly, Bison parsers could be used only in programs that were free
330software.
331
332The other GNU programming tools, such as the GNU C compiler, have never
9ecbd125 333had such a requirement. They could always be used for nonfree
a31239f1
RS
334software. The reason Bison was different was not due to a special
335policy decision; it resulted from applying the usual General Public
336License to all of the Bison source code.
337
338The output of the Bison utility---the Bison parser file---contains a
339verbatim copy of a sizable piece of Bison, which is the code for the
340@code{yyparse} function. (The actions from your grammar are inserted
341into this function at one point, but the rest of the function is not
342changed.) When we applied the GPL terms to the code for @code{yyparse},
343the effect was to restrict the use of Bison output to free software.
344
345We didn't change the terms because of sympathy for people who want to
346make software proprietary. @strong{Software should be free.} But we
347concluded that limiting Bison's use to free software was doing little to
348encourage people to make other software free. So we decided to make the
349practical conditions for using Bison match the practical conditions for
350using the other GNU tools.
bfa74976
RS
351
352@node Copying, Concepts, Conditions, Top
353@unnumbered GNU GENERAL PUBLIC LICENSE
354@center Version 2, June 1991
355
356@display
357Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
c49a8e71 35859 Temple Place - Suite 330, Boston, MA 02111-1307, USA
bfa74976
RS
359
360Everyone is permitted to copy and distribute verbatim copies
361of this license document, but changing it is not allowed.
362@end display
363
364@unnumberedsec Preamble
365
366 The licenses for most software are designed to take away your
367freedom to share and change it. By contrast, the GNU General Public
368License is intended to guarantee your freedom to share and change free
369software---to make sure the software is free for all its users. This
370General Public License applies to most of the Free Software
371Foundation's software and to any other program whose authors commit to
372using it. (Some other Free Software Foundation software is covered by
373the GNU Library General Public License instead.) You can apply it to
374your programs, too.
375
376 When we speak of free software, we are referring to freedom, not
377price. Our General Public Licenses are designed to make sure that you
378have the freedom to distribute copies of free software (and charge for
379this service if you wish), that you receive source code or can get it
380if you want it, that you can change the software or use pieces of it
381in new free programs; and that you know you can do these things.
382
383 To protect your rights, we need to make restrictions that forbid
384anyone to deny you these rights or to ask you to surrender the rights.
385These restrictions translate to certain responsibilities for you if you
386distribute copies of the software, or if you modify it.
387
388 For example, if you distribute copies of such a program, whether
389gratis or for a fee, you must give the recipients all the rights that
390you have. You must make sure that they, too, receive or can get the
391source code. And you must show them these terms so they know their
392rights.
393
394 We protect your rights with two steps: (1) copyright the software, and
395(2) offer you this license which gives you legal permission to copy,
396distribute and/or modify the software.
397
398 Also, for each author's protection and ours, we want to make certain
399that everyone understands that there is no warranty for this free
400software. If the software is modified by someone else and passed on, we
401want its recipients to know that what they have is not the original, so
402that any problems introduced by others will not reflect on the original
403authors' reputations.
404
405 Finally, any free program is threatened constantly by software
406patents. We wish to avoid the danger that redistributors of a free
407program will individually obtain patent licenses, in effect making the
408program proprietary. To prevent this, we have made it clear that any
409patent must be licensed for everyone's free use or not licensed at all.
410
411 The precise terms and conditions for copying, distribution and
412modification follow.
413
414@iftex
415@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
416@end iftex
417@ifinfo
418@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
419@end ifinfo
420
0189fd92 421@enumerate 0
bfa74976
RS
422@item
423This License applies to any program or other work which contains
424a notice placed by the copyright holder saying it may be distributed
425under the terms of this General Public License. The ``Program'', below,
426refers to any such program or work, and a ``work based on the Program''
427means either the Program or any derivative work under copyright law:
428that is to say, a work containing the Program or a portion of it,
429either verbatim or with modifications and/or translated into another
430language. (Hereinafter, translation is included without limitation in
431the term ``modification''.) Each licensee is addressed as ``you''.
432
433Activities other than copying, distribution and modification are not
434covered by this License; they are outside its scope. The act of
435running the Program is not restricted, and the output from the Program
436is covered only if its contents constitute a work based on the
437Program (independent of having been made by running the Program).
438Whether that is true depends on what the Program does.
439
440@item
441You may copy and distribute verbatim copies of the Program's
442source code as you receive it, in any medium, provided that you
443conspicuously and appropriately publish on each copy an appropriate
444copyright notice and disclaimer of warranty; keep intact all the
445notices that refer to this License and to the absence of any warranty;
446and give any other recipients of the Program a copy of this License
447along with the Program.
448
449You may charge a fee for the physical act of transferring a copy, and
450you may at your option offer warranty protection in exchange for a fee.
451
452@item
453You may modify your copy or copies of the Program or any portion
454of it, thus forming a work based on the Program, and copy and
455distribute such modifications or work under the terms of Section 1
456above, provided that you also meet all of these conditions:
457
458@enumerate a
459@item
460You must cause the modified files to carry prominent notices
461stating that you changed the files and the date of any change.
462
463@item
464You must cause any work that you distribute or publish, that in
465whole or in part contains or is derived from the Program or any
466part thereof, to be licensed as a whole at no charge to all third
467parties under the terms of this License.
468
469@item
470If the modified program normally reads commands interactively
471when run, you must cause it, when started running for such
472interactive use in the most ordinary way, to print or display an
473announcement including an appropriate copyright notice and a
474notice that there is no warranty (or else, saying that you provide
475a warranty) and that users may redistribute the program under
476these conditions, and telling the user how to view a copy of this
477License. (Exception: if the Program itself is interactive but
478does not normally print such an announcement, your work based on
479the Program is not required to print an announcement.)
480@end enumerate
481
482These requirements apply to the modified work as a whole. If
483identifiable sections of that work are not derived from the Program,
484and can be reasonably considered independent and separate works in
485themselves, then this License, and its terms, do not apply to those
486sections when you distribute them as separate works. But when you
487distribute the same sections as part of a whole which is a work based
488on the Program, the distribution of the whole must be on the terms of
489this License, whose permissions for other licensees extend to the
490entire whole, and thus to each and every part regardless of who wrote it.
491
492Thus, it is not the intent of this section to claim rights or contest
493your rights to work written entirely by you; rather, the intent is to
494exercise the right to control the distribution of derivative or
495collective works based on the Program.
496
497In addition, mere aggregation of another work not based on the Program
498with the Program (or with a work based on the Program) on a volume of
499a storage or distribution medium does not bring the other work under
500the scope of this License.
501
502@item
503You may copy and distribute the Program (or a work based on it,
504under Section 2) in object code or executable form under the terms of
505Sections 1 and 2 above provided that you also do one of the following:
506
507@enumerate a
508@item
509Accompany it with the complete corresponding machine-readable
510source code, which must be distributed under the terms of Sections
5111 and 2 above on a medium customarily used for software interchange; or,
512
513@item
514Accompany it with a written offer, valid for at least three
515years, to give any third party, for a charge no more than your
516cost of physically performing source distribution, a complete
517machine-readable copy of the corresponding source code, to be
518distributed under the terms of Sections 1 and 2 above on a medium
519customarily used for software interchange; or,
520
521@item
522Accompany it with the information you received as to the offer
523to distribute corresponding source code. (This alternative is
524allowed only for noncommercial distribution and only if you
525received the program in object code or executable form with such
526an offer, in accord with Subsection b above.)
527@end enumerate
528
529The source code for a work means the preferred form of the work for
530making modifications to it. For an executable work, complete source
531code means all the source code for all modules it contains, plus any
532associated interface definition files, plus the scripts used to
533control compilation and installation of the executable. However, as a
534special exception, the source code distributed need not include
535anything that is normally distributed (in either source or binary
536form) with the major components (compiler, kernel, and so on) of the
537operating system on which the executable runs, unless that component
538itself accompanies the executable.
539
540If distribution of executable or object code is made by offering
541access to copy from a designated place, then offering equivalent
542access to copy the source code from the same place counts as
543distribution of the source code, even though third parties are not
544compelled to copy the source along with the object code.
545
546@item
547You may not copy, modify, sublicense, or distribute the Program
548except as expressly provided under this License. Any attempt
549otherwise to copy, modify, sublicense or distribute the Program is
550void, and will automatically terminate your rights under this License.
551However, parties who have received copies, or rights, from you under
552this License will not have their licenses terminated so long as such
553parties remain in full compliance.
554
555@item
556You are not required to accept this License, since you have not
557signed it. However, nothing else grants you permission to modify or
558distribute the Program or its derivative works. These actions are
559prohibited by law if you do not accept this License. Therefore, by
560modifying or distributing the Program (or any work based on the
561Program), you indicate your acceptance of this License to do so, and
562all its terms and conditions for copying, distributing or modifying
563the Program or works based on it.
564
565@item
566Each time you redistribute the Program (or any work based on the
567Program), the recipient automatically receives a license from the
568original licensor to copy, distribute or modify the Program subject to
569these terms and conditions. You may not impose any further
570restrictions on the recipients' exercise of the rights granted herein.
571You are not responsible for enforcing compliance by third parties to
572this License.
573
574@item
575If, as a consequence of a court judgment or allegation of patent
576infringement or for any other reason (not limited to patent issues),
577conditions are imposed on you (whether by court order, agreement or
578otherwise) that contradict the conditions of this License, they do not
579excuse you from the conditions of this License. If you cannot
580distribute so as to satisfy simultaneously your obligations under this
581License and any other pertinent obligations, then as a consequence you
582may not distribute the Program at all. For example, if a patent
583license would not permit royalty-free redistribution of the Program by
584all those who receive copies directly or indirectly through you, then
585the only way you could satisfy both it and this License would be to
586refrain entirely from distribution of the Program.
587
588If any portion of this section is held invalid or unenforceable under
589any particular circumstance, the balance of the section is intended to
590apply and the section as a whole is intended to apply in other
591circumstances.
592
593It is not the purpose of this section to induce you to infringe any
594patents or other property right claims or to contest validity of any
595such claims; this section has the sole purpose of protecting the
596integrity of the free software distribution system, which is
597implemented by public license practices. Many people have made
598generous contributions to the wide range of software distributed
599through that system in reliance on consistent application of that
600system; it is up to the author/donor to decide if he or she is willing
601to distribute software through any other system and a licensee cannot
602impose that choice.
603
604This section is intended to make thoroughly clear what is believed to
605be a consequence of the rest of this License.
606
607@item
608If the distribution and/or use of the Program is restricted in
609certain countries either by patents or by copyrighted interfaces, the
610original copyright holder who places the Program under this License
611may add an explicit geographical distribution limitation excluding
612those countries, so that distribution is permitted only in or among
613countries not thus excluded. In such case, this License incorporates
614the limitation as if written in the body of this License.
615
616@item
617The Free Software Foundation may publish revised and/or new versions
618of the General Public License from time to time. Such new versions will
619be similar in spirit to the present version, but may differ in detail to
620address new problems or concerns.
621
622Each version is given a distinguishing version number. If the Program
623specifies a version number of this License which applies to it and ``any
624later version'', you have the option of following the terms and conditions
625either of that version or of any later version published by the Free
626Software Foundation. If the Program does not specify a version number of
627this License, you may choose any version ever published by the Free Software
628Foundation.
629
630@item
631If you wish to incorporate parts of the Program into other free
632programs whose distribution conditions are different, write to the author
633to ask for permission. For software which is copyrighted by the Free
634Software Foundation, write to the Free Software Foundation; we sometimes
635make exceptions for this. Our decision will be guided by the two goals
636of preserving the free status of all derivatives of our free software and
637of promoting the sharing and reuse of software generally.
638
639@iftex
640@heading NO WARRANTY
641@end iftex
642@ifinfo
643@center NO WARRANTY
644@end ifinfo
645
646@item
647BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
648FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
649OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
650PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
651OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
652MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
653TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
654PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
655REPAIR OR CORRECTION.
656
657@item
658IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
659WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
660REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
661INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
662OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
663TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
664YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
665PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
666POSSIBILITY OF SUCH DAMAGES.
667@end enumerate
668
669@iftex
670@heading END OF TERMS AND CONDITIONS
671@end iftex
672@ifinfo
673@center END OF TERMS AND CONDITIONS
674@end ifinfo
675
676@page
677@unnumberedsec How to Apply These Terms to Your New Programs
678
679 If you develop a new program, and you want it to be of the greatest
680possible use to the public, the best way to achieve this is to make it
681free software which everyone can redistribute and change under these terms.
682
683 To do so, attach the following notices to the program. It is safest
684to attach them to the start of each source file to most effectively
685convey the exclusion of warranty; and each file should have at least
686the ``copyright'' line and a pointer to where the full notice is found.
687
688@smallexample
689@var{one line to give the program's name and a brief idea of what it does.}
690Copyright (C) 19@var{yy} @var{name of author}
691
692This program is free software; you can redistribute it and/or modify
693it under the terms of the GNU General Public License as published by
694the Free Software Foundation; either version 2 of the License, or
695(at your option) any later version.
696
697This program is distributed in the hope that it will be useful,
698but WITHOUT ANY WARRANTY; without even the implied warranty of
699MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
700GNU General Public License for more details.
701
702You should have received a copy of the GNU General Public License
703along with this program; if not, write to the Free Software
2d97f5cc
JT
704Foundation, Inc., 59 Temple Place - Suite 330,
705Boston, MA 02111-1307, USA.
bfa74976
RS
706@end smallexample
707
708Also add information on how to contact you by electronic and paper mail.
709
710If the program is interactive, make it output a short notice like this
711when it starts in an interactive mode:
712
713@smallexample
714Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
13863333 715Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
bfa74976
RS
716type `show w'.
717This is free software, and you are welcome to redistribute it
718under certain conditions; type `show c' for details.
719@end smallexample
720
721The hypothetical commands @samp{show w} and @samp{show c} should show
722the appropriate parts of the General Public License. Of course, the
723commands you use may be called something other than @samp{show w} and
724@samp{show c}; they could even be mouse-clicks or menu items---whatever
725suits your program.
726
727You should also get your employer (if you work as a programmer) or your
728school, if any, to sign a ``copyright disclaimer'' for the program, if
729necessary. Here is a sample; alter the names:
730
731@smallexample
732Yoyodyne, Inc., hereby disclaims all copyright interest in the program
733`Gnomovision' (which makes passes at compilers) written by James Hacker.
734
735@var{signature of Ty Coon}, 1 April 1989
736Ty Coon, President of Vice
737@end smallexample
738
739This General Public License does not permit incorporating your program into
740proprietary programs. If your program is a subroutine library, you may
741consider it more useful to permit linking proprietary applications with the
742library. If this is what you want to do, use the GNU Library General
743Public License instead of this License.
744
745@node Concepts, Examples, Copying, Top
746@chapter The Concepts of Bison
747
748This chapter introduces many of the basic concepts without which the
749details of Bison will not make sense. If you do not already know how to
750use Bison or Yacc, we suggest you start by reading this chapter carefully.
751
752@menu
753* Language and Grammar:: Languages and context-free grammars,
754 as mathematical ideas.
755* Grammar in Bison:: How we represent grammars for Bison's sake.
756* Semantic Values:: Each token or syntactic grouping can have
757 a semantic value (the value of an integer,
758 the name of an identifier, etc.).
759* Semantic Actions:: Each rule can have an action containing C code.
760* Bison Parser:: What are Bison's input and output,
761 how is the output used?
762* Stages:: Stages in writing and running Bison grammars.
763* Grammar Layout:: Overall structure of a Bison grammar file.
764@end menu
765
766@node Language and Grammar, Grammar in Bison, , Concepts
767@section Languages and Context-Free Grammars
768
bfa74976
RS
769@cindex context-free grammar
770@cindex grammar, context-free
771In order for Bison to parse a language, it must be described by a
772@dfn{context-free grammar}. This means that you specify one or more
773@dfn{syntactic groupings} and give rules for constructing them from their
774parts. For example, in the C language, one kind of grouping is called an
775`expression'. One rule for making an expression might be, ``An expression
776can be made of a minus sign and another expression''. Another would be,
777``An expression can be an integer''. As you can see, rules are often
778recursive, but there must be at least one rule which leads out of the
779recursion.
780
781@cindex BNF
782@cindex Backus-Naur form
783The most common formal system for presenting such rules for humans to read
784is @dfn{Backus-Naur Form} or ``BNF'', which was developed in order to
785specify the language Algol 60. Any grammar expressed in BNF is a
786context-free grammar. The input to Bison is essentially machine-readable
787BNF.
788
789Not all context-free languages can be handled by Bison, only those
790that are LALR(1). In brief, this means that it must be possible to
791tell how to parse any portion of an input string with just a single
792token of look-ahead. Strictly speaking, that is a description of an
793LR(1) grammar, and LALR(1) involves additional restrictions that are
794hard to explain simply; but it is rare in actual practice to find an
795LR(1) grammar that fails to be LALR(1). @xref{Mystery Conflicts, ,
796Mysterious Reduce/Reduce Conflicts}, for more information on this.
797
798@cindex symbols (abstract)
799@cindex token
800@cindex syntactic grouping
801@cindex grouping, syntactic
802In the formal grammatical rules for a language, each kind of syntactic unit
803or grouping is named by a @dfn{symbol}. Those which are built by grouping
804smaller constructs according to grammatical rules are called
805@dfn{nonterminal symbols}; those which can't be subdivided are called
806@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
807corresponding to a single terminal symbol a @dfn{token}, and a piece
808corresponding to a single nonterminal symbol a @dfn{grouping}.@refill
809
810We can use the C language as an example of what symbols, terminal and
811nonterminal, mean. The tokens of C are identifiers, constants (numeric and
812string), and the various keywords, arithmetic operators and punctuation
813marks. So the terminal symbols of a grammar for C include `identifier',
814`number', `string', plus one symbol for each keyword, operator or
815punctuation mark: `if', `return', `const', `static', `int', `char',
816`plus-sign', `open-brace', `close-brace', `comma' and many more. (These
817tokens can be subdivided into characters, but that is a matter of
818lexicography, not grammar.)
819
820Here is a simple C function subdivided into tokens:
821
822@example
823int /* @r{keyword `int'} */
824square (x) /* @r{identifier, open-paren,} */
825 /* @r{identifier, close-paren} */
826 int x; /* @r{keyword `int', identifier, semicolon} */
827@{ /* @r{open-brace} */
828 return x * x; /* @r{keyword `return', identifier,} */
829 /* @r{asterisk, identifier, semicolon} */
830@} /* @r{close-brace} */
831@end example
832
833The syntactic groupings of C include the expression, the statement, the
834declaration, and the function definition. These are represented in the
835grammar of C by nonterminal symbols `expression', `statement',
836`declaration' and `function definition'. The full grammar uses dozens of
837additional language constructs, each with its own nonterminal symbol, in
838order to express the meanings of these four. The example above is a
839function definition; it contains one declaration, and one statement. In
840the statement, each @samp{x} is an expression and so is @samp{x * x}.
841
842Each nonterminal symbol must have grammatical rules showing how it is made
843out of simpler constructs. For example, one kind of C statement is the
844@code{return} statement; this would be described with a grammar rule which
845reads informally as follows:
846
847@quotation
848A `statement' can be made of a `return' keyword, an `expression' and a
849`semicolon'.
850@end quotation
851
852@noindent
853There would be many other rules for `statement', one for each kind of
854statement in C.
855
856@cindex start symbol
857One nonterminal symbol must be distinguished as the special one which
858defines a complete utterance in the language. It is called the @dfn{start
859symbol}. In a compiler, this means a complete input program. In the C
860language, the nonterminal symbol `sequence of definitions and declarations'
861plays this role.
862
863For example, @samp{1 + 2} is a valid C expression---a valid part of a C
864program---but it is not valid as an @emph{entire} C program. In the
865context-free grammar of C, this follows from the fact that `expression' is
866not the start symbol.
867
868The Bison parser reads a sequence of tokens as its input, and groups the
869tokens using the grammar rules. If the input is valid, the end result is
870that the entire token sequence reduces to a single grouping whose symbol is
871the grammar's start symbol. If we use a grammar for C, the entire input
872must be a `sequence of definitions and declarations'. If not, the parser
873reports a syntax error.
874
875@node Grammar in Bison, Semantic Values, Language and Grammar, Concepts
876@section From Formal Rules to Bison Input
877@cindex Bison grammar
878@cindex grammar, Bison
879@cindex formal grammar
880
881A formal grammar is a mathematical construct. To define the language
882for Bison, you must write a file expressing the grammar in Bison syntax:
883a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.
884
885A nonterminal symbol in the formal grammar is represented in Bison input
886as an identifier, like an identifier in C. By convention, it should be
887in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
888
889The Bison representation for a terminal symbol is also called a @dfn{token
890type}. Token types as well can be represented as C-like identifiers. By
891convention, these identifiers should be upper case to distinguish them from
892nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
893@code{RETURN}. A terminal symbol that stands for a particular keyword in
894the language should be named after that keyword converted to upper case.
895The terminal symbol @code{error} is reserved for error recovery.
931c7513 896@xref{Symbols}.
bfa74976
RS
897
898A terminal symbol can also be represented as a character literal, just like
899a C character constant. You should do this whenever a token is just a
900single character (parenthesis, plus-sign, etc.): use that same character in
901a literal as the terminal symbol for that token.
902
931c7513
RS
903A third way to represent a terminal symbol is with a C string constant
904containing several characters. @xref{Symbols}, for more information.
905
bfa74976
RS
906The grammar rules also have an expression in Bison syntax. For example,
907here is the Bison rule for a C @code{return} statement. The semicolon in
908quotes is a literal character token, representing part of the C syntax for
909the statement; the naked semicolon, and the colon, are Bison punctuation
910used in every rule.
911
912@example
913stmt: RETURN expr ';'
914 ;
915@end example
916
917@noindent
918@xref{Rules, ,Syntax of Grammar Rules}.
919
920@node Semantic Values, Semantic Actions, Grammar in Bison, Concepts
921@section Semantic Values
922@cindex semantic value
923@cindex value, semantic
924
925A formal grammar selects tokens only by their classifications: for example,
926if a rule mentions the terminal symbol `integer constant', it means that
927@emph{any} integer constant is grammatically valid in that position. The
928precise value of the constant is irrelevant to how to parse the input: if
929@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
930grammatical.@refill
931
932But the precise value is very important for what the input means once it is
933parsed. A compiler is useless if it fails to distinguish between 4, 1 and
9343989 as constants in the program! Therefore, each token in a Bison grammar
935has both a token type and a @dfn{semantic value}. @xref{Semantics, ,Defining Language Semantics},
936for details.
937
938The token type is a terminal symbol defined in the grammar, such as
939@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
940you need to know to decide where the token may validly appear and how to
941group it with other tokens. The grammar rules know nothing about tokens
942except their types.@refill
943
944The semantic value has all the rest of the information about the
945meaning of the token, such as the value of an integer, or the name of an
946identifier. (A token such as @code{','} which is just punctuation doesn't
947need to have any semantic value.)
948
949For example, an input token might be classified as token type
950@code{INTEGER} and have the semantic value 4. Another input token might
951have the same token type @code{INTEGER} but value 3989. When a grammar
952rule says that @code{INTEGER} is allowed, either of these tokens is
953acceptable because each is an @code{INTEGER}. When the parser accepts the
954token, it keeps track of the token's semantic value.
955
956Each grouping can also have a semantic value as well as its nonterminal
957symbol. For example, in a calculator, an expression typically has a
958semantic value that is a number. In a compiler for a programming
959language, an expression typically has a semantic value that is a tree
960structure describing the meaning of the expression.
961
962@node Semantic Actions, Bison Parser, Semantic Values, Concepts
963@section Semantic Actions
964@cindex semantic actions
965@cindex actions, semantic
966
967In order to be useful, a program must do more than parse input; it must
968also produce some output based on the input. In a Bison grammar, a grammar
969rule can have an @dfn{action} made up of C statements. Each time the
970parser recognizes a match for that rule, the action is executed.
971@xref{Actions}.
13863333 972
bfa74976
RS
973Most of the time, the purpose of an action is to compute the semantic value
974of the whole construct from the semantic values of its parts. For example,
975suppose we have a rule which says an expression can be the sum of two
976expressions. When the parser recognizes such a sum, each of the
977subexpressions has a semantic value which describes how it was built up.
978The action for this rule should create a similar sort of value for the
979newly recognized larger expression.
980
981For example, here is a rule that says an expression can be the sum of
982two subexpressions:
983
984@example
985expr: expr '+' expr @{ $$ = $1 + $3; @}
986 ;
987@end example
988
989@noindent
990The action says how to produce the semantic value of the sum expression
991from the values of the two subexpressions.
992
993@node Bison Parser, Stages, Semantic Actions, Concepts
994@section Bison Output: the Parser File
995@cindex Bison parser
996@cindex Bison utility
997@cindex lexical analyzer, purpose
998@cindex parser
999
1000When you run Bison, you give it a Bison grammar file as input. The output
1001is a C source file that parses the language described by the grammar.
1002This file is called a @dfn{Bison parser}. Keep in mind that the Bison
1003utility and the Bison parser are two distinct programs: the Bison utility
1004is a program whose output is the Bison parser that becomes part of your
1005program.
1006
1007The job of the Bison parser is to group tokens into groupings according to
1008the grammar rules---for example, to build identifiers and operators into
1009expressions. As it does this, it runs the actions for the grammar rules it
1010uses.
1011
1012The tokens come from a function called the @dfn{lexical analyzer} that you
1013must supply in some fashion (such as by writing it in C). The Bison parser
1014calls the lexical analyzer each time it wants a new token. It doesn't know
1015what is ``inside'' the tokens (though their semantic values may reflect
1016this). Typically the lexical analyzer makes the tokens by parsing
1017characters of text, but Bison does not depend on this. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
1018
1019The Bison parser file is C code which defines a function named
1020@code{yyparse} which implements that grammar. This function does not make
1021a complete C program: you must supply some additional functions. One is
1022the lexical analyzer. Another is an error-reporting function which the
1023parser calls to report an error. In addition, a complete C program must
1024start with a function called @code{main}; you have to provide this, and
1025arrange for it to call @code{yyparse} or the parser will never run.
1026@xref{Interface, ,Parser C-Language Interface}.
1027
1028Aside from the token type names and the symbols in the actions you
1029write, all variable and function names used in the Bison parser file
1030begin with @samp{yy} or @samp{YY}. This includes interface functions
1031such as the lexical analyzer function @code{yylex}, the error reporting
1032function @code{yyerror} and the parser function @code{yyparse} itself.
1033This also includes numerous identifiers used for internal purposes.
1034Therefore, you should avoid using C identifiers starting with @samp{yy}
1035or @samp{YY} in the Bison grammar file except for the ones defined in
1036this manual.
1037
1038@node Stages, Grammar Layout, Bison Parser, Concepts
1039@section Stages in Using Bison
1040@cindex stages in using Bison
1041@cindex using Bison
1042
1043The actual language-design process using Bison, from grammar specification
1044to a working compiler or interpreter, has these parts:
1045
1046@enumerate
1047@item
1048Formally specify the grammar in a form recognized by Bison
1049(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule in the language,
1050describe the action that is to be taken when an instance of that rule
1051is recognized. The action is described by a sequence of C statements.
1052
1053@item
1054Write a lexical analyzer to process input and pass tokens to the
1055parser. The lexical analyzer may be written by hand in C
1056(@pxref{Lexical, ,The Lexical Analyzer Function @code{yylex}}). It could also be produced using Lex, but the use
1057of Lex is not discussed in this manual.
1058
1059@item
1060Write a controlling function that calls the Bison-produced parser.
1061
1062@item
1063Write error-reporting routines.
1064@end enumerate
1065
1066To turn this source code as written into a runnable program, you
1067must follow these steps:
1068
1069@enumerate
1070@item
1071Run Bison on the grammar to produce the parser.
1072
1073@item
1074Compile the code output by Bison, as well as any other source files.
1075
1076@item
1077Link the object files to produce the finished product.
1078@end enumerate
1079
1080@node Grammar Layout, , Stages, Concepts
1081@section The Overall Layout of a Bison Grammar
1082@cindex grammar file
1083@cindex file format
1084@cindex format of grammar file
1085@cindex layout of Bison grammar
1086
1087The input file for the Bison utility is a @dfn{Bison grammar file}. The
1088general form of a Bison grammar file is as follows:
1089
1090@example
1091%@{
1092@var{C declarations}
1093%@}
1094
1095@var{Bison declarations}
1096
1097%%
1098@var{Grammar rules}
1099%%
1100@var{Additional C code}
1101@end example
1102
1103@noindent
1104The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
1105in every Bison grammar file to separate the sections.
1106
1107The C declarations may define types and variables used in the actions.
1108You can also use preprocessor commands to define macros used there, and use
1109@code{#include} to include header files that do any of these things.
1110
1111The Bison declarations declare the names of the terminal and nonterminal
1112symbols, and may also describe operator precedence and the data types of
1113semantic values of various symbols.
1114
1115The grammar rules define how to construct each nonterminal symbol from its
1116parts.
1117
1118The additional C code can contain any C code you want to use. Often the
1119definition of the lexical analyzer @code{yylex} goes here, plus subroutines
1120called by the actions in the grammar rules. In a simple program, all the
1121rest of the program can go here.
1122
1123@node Examples, Grammar File, Concepts, Top
1124@chapter Examples
1125@cindex simple examples
1126@cindex examples, simple
1127
1128Now we show and explain three sample programs written using Bison: a
1129reverse polish notation calculator, an algebraic (infix) notation
1130calculator, and a multi-function calculator. All three have been tested
1131under BSD Unix 4.3; each produces a usable, though limited, interactive
1132desk-top calculator.
1133
1134These examples are simple, but Bison grammars for real programming
1135languages are written the same way.
1136@ifinfo
1137You can copy these examples out of the Info file and into a source file
1138to try them.
1139@end ifinfo
1140
1141@menu
1142* RPN Calc:: Reverse polish notation calculator;
1143 a first example with no operator precedence.
1144* Infix Calc:: Infix (algebraic) notation calculator.
1145 Operator precedence is introduced.
1146* Simple Error Recovery:: Continuing after syntax errors.
1147* Multi-function Calc:: Calculator with memory and trig functions.
1148 It uses multiple data-types for semantic values.
1149* Exercises:: Ideas for improving the multi-function calculator.
1150@end menu
1151
1152@node RPN Calc, Infix Calc, , Examples
1153@section Reverse Polish Notation Calculator
1154@cindex reverse polish notation
1155@cindex polish notation calculator
1156@cindex @code{rpcalc}
1157@cindex calculator, simple
1158
1159The first example is that of a simple double-precision @dfn{reverse polish
1160notation} calculator (a calculator using postfix operators). This example
1161provides a good starting point, since operator precedence is not an issue.
1162The second example will illustrate how operator precedence is handled.
1163
1164The source code for this calculator is named @file{rpcalc.y}. The
1165@samp{.y} extension is a convention used for Bison input files.
1166
1167@menu
1168* Decls: Rpcalc Decls. Bison and C declarations for rpcalc.
1169* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
1170* Lexer: Rpcalc Lexer. The lexical analyzer.
1171* Main: Rpcalc Main. The controlling function.
1172* Error: Rpcalc Error. The error reporting function.
1173* Gen: Rpcalc Gen. Running Bison on the grammar file.
1174* Comp: Rpcalc Compile. Run the C compiler on the output code.
1175@end menu
1176
1177@node Rpcalc Decls, Rpcalc Rules, , RPN Calc
1178@subsection Declarations for @code{rpcalc}
1179
1180Here are the C and Bison declarations for the reverse polish notation
1181calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
1182
1183@example
1184/* Reverse polish notation calculator. */
1185
1186%@{
1187#define YYSTYPE double
1188#include <math.h>
1189%@}
1190
1191%token NUM
1192
1193%% /* Grammar rules and actions follow */
1194@end example
1195
1196The C declarations section (@pxref{C Declarations, ,The C Declarations Section}) contains two
1197preprocessor directives.
1198
1199The @code{#define} directive defines the macro @code{YYSTYPE}, thus
1200specifying the C data type for semantic values of both tokens and groupings
1201(@pxref{Value Type, ,Data Types of Semantic Values}). The Bison parser will use whatever type
1202@code{YYSTYPE} is defined as; if you don't define it, @code{int} is the
1203default. Because we specify @code{double}, each token and each expression
1204has an associated value, which is a floating point number.
1205
1206The @code{#include} directive is used to declare the exponentiation
1207function @code{pow}.
1208
1209The second section, Bison declarations, provides information to Bison about
1210the token types (@pxref{Bison Declarations, ,The Bison Declarations Section}). Each terminal symbol that is
1211not a single-character literal must be declared here. (Single-character
1212literals normally don't need to be declared.) In this example, all the
1213arithmetic operators are designated by single-character literals, so the
1214only terminal symbol that needs to be declared is @code{NUM}, the token
1215type for numeric constants.
1216
1217@node Rpcalc Rules, Rpcalc Lexer, Rpcalc Decls, RPN Calc
1218@subsection Grammar Rules for @code{rpcalc}
1219
1220Here are the grammar rules for the reverse polish notation calculator.
1221
1222@example
1223input: /* empty */
1224 | input line
1225;
1226
1227line: '\n'
1228 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1229;
1230
1231exp: NUM @{ $$ = $1; @}
1232 | exp exp '+' @{ $$ = $1 + $2; @}
1233 | exp exp '-' @{ $$ = $1 - $2; @}
1234 | exp exp '*' @{ $$ = $1 * $2; @}
1235 | exp exp '/' @{ $$ = $1 / $2; @}
1236 /* Exponentiation */
1237 | exp exp '^' @{ $$ = pow ($1, $2); @}
1238 /* Unary minus */
1239 | exp 'n' @{ $$ = -$1; @}
1240;
1241%%
1242@end example
1243
1244The groupings of the rpcalc ``language'' defined here are the expression
1245(given the name @code{exp}), the line of input (@code{line}), and the
1246complete input transcript (@code{input}). Each of these nonterminal
1247symbols has several alternate rules, joined by the @samp{|} punctuator
1248which is read as ``or''. The following sections explain what these rules
1249mean.
1250
1251The semantics of the language is determined by the actions taken when a
1252grouping is recognized. The actions are the C code that appears inside
1253braces. @xref{Actions}.
1254
1255You must specify these actions in C, but Bison provides the means for
1256passing semantic values between the rules. In each action, the
1257pseudo-variable @code{$$} stands for the semantic value for the grouping
1258that the rule is going to construct. Assigning a value to @code{$$} is the
1259main job of most actions. The semantic values of the components of the
1260rule are referred to as @code{$1}, @code{$2}, and so on.
1261
1262@menu
13863333
AD
1263* Rpcalc Input::
1264* Rpcalc Line::
1265* Rpcalc Expr::
bfa74976
RS
1266@end menu
1267
1268@node Rpcalc Input, Rpcalc Line, , Rpcalc Rules
1269@subsubsection Explanation of @code{input}
1270
1271Consider the definition of @code{input}:
1272
1273@example
1274input: /* empty */
1275 | input line
1276;
1277@end example
1278
1279This definition reads as follows: ``A complete input is either an empty
1280string, or a complete input followed by an input line''. Notice that
1281``complete input'' is defined in terms of itself. This definition is said
1282to be @dfn{left recursive} since @code{input} appears always as the
1283leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}.
1284
1285The first alternative is empty because there are no symbols between the
1286colon and the first @samp{|}; this means that @code{input} can match an
1287empty string of input (no tokens). We write the rules this way because it
1288is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
1289It's conventional to put an empty alternative first and write the comment
1290@samp{/* empty */} in it.
1291
1292The second alternate rule (@code{input line}) handles all nontrivial input.
1293It means, ``After reading any number of lines, read one more line if
1294possible.'' The left recursion makes this rule into a loop. Since the
1295first alternative matches empty input, the loop can be executed zero or
1296more times.
1297
1298The parser function @code{yyparse} continues to process input until a
1299grammatical error is seen or the lexical analyzer says there are no more
1300input tokens; we will arrange for the latter to happen at end of file.
1301
1302@node Rpcalc Line, Rpcalc Expr, Rpcalc Input, Rpcalc Rules
1303@subsubsection Explanation of @code{line}
1304
1305Now consider the definition of @code{line}:
1306
1307@example
1308line: '\n'
1309 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1310;
1311@end example
1312
1313The first alternative is a token which is a newline character; this means
1314that rpcalc accepts a blank line (and ignores it, since there is no
1315action). The second alternative is an expression followed by a newline.
1316This is the alternative that makes rpcalc useful. The semantic value of
1317the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
1318question is the first symbol in the alternative. The action prints this
1319value, which is the result of the computation the user asked for.
1320
1321This action is unusual because it does not assign a value to @code{$$}. As
1322a consequence, the semantic value associated with the @code{line} is
1323uninitialized (its value will be unpredictable). This would be a bug if
1324that value were ever used, but we don't use it: once rpcalc has printed the
1325value of the user's input line, that value is no longer needed.
1326
1327@node Rpcalc Expr, , Rpcalc Line, Rpcalc Rules
1328@subsubsection Explanation of @code{expr}
1329
1330The @code{exp} grouping has several rules, one for each kind of expression.
1331The first rule handles the simplest expressions: those that are just numbers.
1332The second handles an addition-expression, which looks like two expressions
1333followed by a plus-sign. The third handles subtraction, and so on.
1334
1335@example
1336exp: NUM
1337 | exp exp '+' @{ $$ = $1 + $2; @}
1338 | exp exp '-' @{ $$ = $1 - $2; @}
1339 @dots{}
1340 ;
1341@end example
1342
1343We have used @samp{|} to join all the rules for @code{exp}, but we could
1344equally well have written them separately:
1345
1346@example
1347exp: NUM ;
1348exp: exp exp '+' @{ $$ = $1 + $2; @} ;
1349exp: exp exp '-' @{ $$ = $1 - $2; @} ;
1350 @dots{}
1351@end example
1352
1353Most of the rules have actions that compute the value of the expression in
1354terms of the value of its parts. For example, in the rule for addition,
1355@code{$1} refers to the first component @code{exp} and @code{$2} refers to
1356the second one. The third component, @code{'+'}, has no meaningful
1357associated semantic value, but if it had one you could refer to it as
1358@code{$3}. When @code{yyparse} recognizes a sum expression using this
1359rule, the sum of the two subexpressions' values is produced as the value of
1360the entire expression. @xref{Actions}.
1361
1362You don't have to give an action for every rule. When a rule has no
1363action, Bison by default copies the value of @code{$1} into @code{$$}.
1364This is what happens in the first rule (the one that uses @code{NUM}).
1365
1366The formatting shown here is the recommended convention, but Bison does
1367not require it. You can add or change whitespace as much as you wish.
1368For example, this:
1369
1370@example
1371exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{}
1372@end example
1373
1374@noindent
1375means the same thing as this:
1376
1377@example
1378exp: NUM
1379 | exp exp '+' @{ $$ = $1 + $2; @}
1380 | @dots{}
1381@end example
1382
1383@noindent
1384The latter, however, is much more readable.
1385
1386@node Rpcalc Lexer, Rpcalc Main, Rpcalc Rules, RPN Calc
1387@subsection The @code{rpcalc} Lexical Analyzer
1388@cindex writing a lexical analyzer
1389@cindex lexical analyzer, writing
1390
1391The lexical analyzer's job is low-level parsing: converting characters or
1392sequences of characters into tokens. The Bison parser gets its tokens by
1393calling the lexical analyzer. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
1394
1395Only a simple lexical analyzer is needed for the RPN calculator. This
1396lexical analyzer skips blanks and tabs, then reads in numbers as
1397@code{double} and returns them as @code{NUM} tokens. Any other character
1398that isn't part of a number is a separate token. Note that the token-code
1399for such a single-character token is the character itself.
1400
1401The return value of the lexical analyzer function is a numeric code which
1402represents a token type. The same text used in Bison rules to stand for
1403this token type is also a C expression for the numeric code for the type.
1404This works in two ways. If the token type is a character literal, then its
1405numeric code is the ASCII code for that character; you can use the same
1406character literal in the lexical analyzer to express the number. If the
1407token type is an identifier, that identifier is defined by Bison as a C
1408macro whose definition is the appropriate number. In this example,
1409therefore, @code{NUM} becomes a macro for @code{yylex} to use.
1410
1411The semantic value of the token (if it has one) is stored into the global
1412variable @code{yylval}, which is where the Bison parser will look for it.
1413(The C data type of @code{yylval} is @code{YYSTYPE}, which was defined
1414at the beginning of the grammar; @pxref{Rpcalc Decls, ,Declarations for @code{rpcalc}}.)
1415
1416A token type code of zero is returned if the end-of-file is encountered.
1417(Bison recognizes any nonpositive value as indicating the end of the
1418input.)
1419
1420Here is the code for the lexical analyzer:
1421
1422@example
1423@group
13863333 1424/* Lexical analyzer returns a double floating point
bfa74976
RS
1425 number on the stack and the token NUM, or the ASCII
1426 character read if not a number. Skips all blanks
1427 and tabs, returns 0 for EOF. */
1428
1429#include <ctype.h>
1430@end group
1431
1432@group
13863333
AD
1433int
1434yylex (void)
bfa74976
RS
1435@{
1436 int c;
1437
1438 /* skip white space */
13863333 1439 while ((c = getchar ()) == ' ' || c == '\t')
bfa74976
RS
1440 ;
1441@end group
1442@group
1443 /* process numbers */
13863333 1444 if (c == '.' || isdigit (c))
bfa74976
RS
1445 @{
1446 ungetc (c, stdin);
1447 scanf ("%lf", &yylval);
1448 return NUM;
1449 @}
1450@end group
1451@group
1452 /* return end-of-file */
13863333 1453 if (c == EOF)
bfa74976
RS
1454 return 0;
1455 /* return single chars */
13863333 1456 return c;
bfa74976
RS
1457@}
1458@end group
1459@end example
1460
1461@node Rpcalc Main, Rpcalc Error, Rpcalc Lexer, RPN Calc
1462@subsection The Controlling Function
1463@cindex controlling function
1464@cindex main function in simple example
1465
1466In keeping with the spirit of this example, the controlling function is
1467kept to the bare minimum. The only requirement is that it call
1468@code{yyparse} to start the process of parsing.
1469
1470@example
1471@group
13863333
AD
1472int
1473main (void)
bfa74976 1474@{
13863333 1475 return yyparse ();
bfa74976
RS
1476@}
1477@end group
1478@end example
1479
1480@node Rpcalc Error, Rpcalc Gen, Rpcalc Main, RPN Calc
1481@subsection The Error Reporting Routine
1482@cindex error reporting routine
1483
1484When @code{yyparse} detects a syntax error, it calls the error reporting
13863333
AD
1485function @code{yyerror} to print an error message (usually but not
1486always @code{"parse error"}). It is up to the programmer to supply
1487@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
1488here is the definition we will use:
bfa74976
RS
1489
1490@example
1491@group
1492#include <stdio.h>
1493
13863333
AD
1494void
1495yyerror (const char *s) /* Called by yyparse on error */
bfa74976
RS
1496@{
1497 printf ("%s\n", s);
1498@}
1499@end group
1500@end example
1501
1502After @code{yyerror} returns, the Bison parser may recover from the error
1503and continue parsing if the grammar contains a suitable error rule
1504(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
1505have not written any error rules in this example, so any invalid input will
1506cause the calculator program to exit. This is not clean behavior for a
9ecbd125 1507real calculator, but it is adequate for the first example.
bfa74976
RS
1508
1509@node Rpcalc Gen, Rpcalc Compile, Rpcalc Error, RPN Calc
1510@subsection Running Bison to Make the Parser
1511@cindex running Bison (introduction)
1512
ceed8467
AD
1513Before running Bison to produce a parser, we need to decide how to
1514arrange all the source code in one or more source files. For such a
1515simple example, the easiest thing is to put everything in one file. The
1516definitions of @code{yylex}, @code{yyerror} and @code{main} go at the
1517end, in the ``additional C code'' section of the file (@pxref{Grammar
1518Layout, ,The Overall Layout of a Bison Grammar}).
bfa74976
RS
1519
1520For a large project, you would probably have several source files, and use
1521@code{make} to arrange to recompile them.
1522
1523With all the source in a single file, you use the following command to
1524convert it into a parser file:
1525
1526@example
1527bison @var{file_name}.y
1528@end example
1529
1530@noindent
1531In this example the file was called @file{rpcalc.y} (for ``Reverse Polish
1532CALCulator''). Bison produces a file named @file{@var{file_name}.tab.c},
1533removing the @samp{.y} from the original file name. The file output by
1534Bison contains the source code for @code{yyparse}. The additional
1535functions in the input file (@code{yylex}, @code{yyerror} and @code{main})
1536are copied verbatim to the output.
1537
1538@node Rpcalc Compile, , Rpcalc Gen, RPN Calc
1539@subsection Compiling the Parser File
1540@cindex compiling the parser
1541
1542Here is how to compile and run the parser file:
1543
1544@example
1545@group
1546# @r{List files in current directory.}
1547% ls
1548rpcalc.tab.c rpcalc.y
1549@end group
1550
1551@group
1552# @r{Compile the Bison parser.}
1553# @r{@samp{-lm} tells compiler to search math library for @code{pow}.}
1554% cc rpcalc.tab.c -lm -o rpcalc
1555@end group
1556
1557@group
1558# @r{List files again.}
1559% ls
1560rpcalc rpcalc.tab.c rpcalc.y
1561@end group
1562@end example
1563
1564The file @file{rpcalc} now contains the executable code. Here is an
1565example session using @code{rpcalc}.
1566
1567@example
1568% rpcalc
15694 9 +
157013
15713 7 + 3 4 5 *+-
1572-13
15733 7 + 3 4 5 * + - n @r{Note the unary minus, @samp{n}}
157413
15755 6 / 4 n +
1576-3.166666667
15773 4 ^ @r{Exponentiation}
157881
1579^D @r{End-of-file indicator}
1580%
1581@end example
1582
1583@node Infix Calc, Simple Error Recovery, RPN Calc, Examples
1584@section Infix Notation Calculator: @code{calc}
1585@cindex infix notation calculator
1586@cindex @code{calc}
1587@cindex calculator, infix notation
1588
1589We now modify rpcalc to handle infix operators instead of postfix. Infix
1590notation involves the concept of operator precedence and the need for
1591parentheses nested to arbitrary depth. Here is the Bison code for
1592@file{calc.y}, an infix desk-top calculator.
1593
1594@example
1595/* Infix notation calculator--calc */
1596
1597%@{
1598#define YYSTYPE double
1599#include <math.h>
1600%@}
1601
1602/* BISON Declarations */
1603%token NUM
1604%left '-' '+'
1605%left '*' '/'
1606%left NEG /* negation--unary minus */
1607%right '^' /* exponentiation */
1608
1609/* Grammar follows */
1610%%
1611input: /* empty string */
1612 | input line
1613;
1614
1615line: '\n'
1616 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1617;
1618
1619exp: NUM @{ $$ = $1; @}
1620 | exp '+' exp @{ $$ = $1 + $3; @}
1621 | exp '-' exp @{ $$ = $1 - $3; @}
1622 | exp '*' exp @{ $$ = $1 * $3; @}
1623 | exp '/' exp @{ $$ = $1 / $3; @}
1624 | '-' exp %prec NEG @{ $$ = -$2; @}
1625 | exp '^' exp @{ $$ = pow ($1, $3); @}
1626 | '(' exp ')' @{ $$ = $2; @}
1627;
1628%%
1629@end example
1630
1631@noindent
ceed8467
AD
1632The functions @code{yylex}, @code{yyerror} and @code{main} can be the
1633same as before.
bfa74976
RS
1634
1635There are two important new features shown in this code.
1636
1637In the second section (Bison declarations), @code{%left} declares token
1638types and says they are left-associative operators. The declarations
1639@code{%left} and @code{%right} (right associativity) take the place of
1640@code{%token} which is used to declare a token type name without
1641associativity. (These tokens are single-character literals, which
1642ordinarily don't need to be declared. We declare them here to specify
1643the associativity.)
1644
1645Operator precedence is determined by the line ordering of the
1646declarations; the higher the line number of the declaration (lower on
1647the page or screen), the higher the precedence. Hence, exponentiation
1648has the highest precedence, unary minus (@code{NEG}) is next, followed
1649by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator Precedence}.
1650
1651The other important new feature is the @code{%prec} in the grammar section
1652for the unary minus operator. The @code{%prec} simply instructs Bison that
1653the rule @samp{| '-' exp} has the same precedence as @code{NEG}---in this
1654case the next-to-highest. @xref{Contextual Precedence, ,Context-Dependent Precedence}.
1655
1656Here is a sample run of @file{calc.y}:
1657
1658@need 500
1659@example
1660% calc
16614 + 4.5 - (34/(8*3+-3))
16626.880952381
1663-56 + 2
1664-54
16653 ^ 2
16669
1667@end example
1668
1669@node Simple Error Recovery, Multi-function Calc, Infix Calc, Examples
1670@section Simple Error Recovery
1671@cindex error recovery, simple
1672
1673Up to this point, this manual has not addressed the issue of @dfn{error
1674recovery}---how to continue parsing after the parser detects a syntax
ceed8467
AD
1675error. All we have handled is error reporting with @code{yyerror}.
1676Recall that by default @code{yyparse} returns after calling
1677@code{yyerror}. This means that an erroneous input line causes the
1678calculator program to exit. Now we show how to rectify this deficiency.
bfa74976
RS
1679
1680The Bison language itself includes the reserved word @code{error}, which
1681may be included in the grammar rules. In the example below it has
1682been added to one of the alternatives for @code{line}:
1683
1684@example
1685@group
1686line: '\n'
1687 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1688 | error '\n' @{ yyerrok; @}
1689;
1690@end group
1691@end example
1692
ceed8467
AD
1693This addition to the grammar allows for simple error recovery in the
1694event of a parse error. If an expression that cannot be evaluated is
1695read, the error will be recognized by the third rule for @code{line},
1696and parsing will continue. (The @code{yyerror} function is still called
1697upon to print its message as well.) The action executes the statement
1698@code{yyerrok}, a macro defined automatically by Bison; its meaning is
1699that error recovery is complete (@pxref{Error Recovery}). Note the
1700difference between @code{yyerrok} and @code{yyerror}; neither one is a
1701misprint.@refill
bfa74976
RS
1702
1703This form of error recovery deals with syntax errors. There are other
1704kinds of errors; for example, division by zero, which raises an exception
1705signal that is normally fatal. A real calculator program must handle this
1706signal and use @code{longjmp} to return to @code{main} and resume parsing
1707input lines; it would also have to discard the rest of the current line of
1708input. We won't discuss this issue further because it is not specific to
1709Bison programs.
1710
1711@node Multi-function Calc, Exercises, Simple Error Recovery, Examples
1712@section Multi-Function Calculator: @code{mfcalc}
1713@cindex multi-function calculator
1714@cindex @code{mfcalc}
1715@cindex calculator, multi-function
1716
1717Now that the basics of Bison have been discussed, it is time to move on to
1718a more advanced problem. The above calculators provided only five
1719functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would
1720be nice to have a calculator that provides other mathematical functions such
1721as @code{sin}, @code{cos}, etc.
1722
1723It is easy to add new operators to the infix calculator as long as they are
1724only single-character literals. The lexical analyzer @code{yylex} passes
9ecbd125 1725back all nonnumber characters as tokens, so new grammar rules suffice for
bfa74976
RS
1726adding a new operator. But we want something more flexible: built-in
1727functions whose syntax has this form:
1728
1729@example
1730@var{function_name} (@var{argument})
1731@end example
1732
1733@noindent
1734At the same time, we will add memory to the calculator, by allowing you
1735to create named variables, store values in them, and use them later.
1736Here is a sample session with the multi-function calculator:
1737
1738@example
2a2e87db 1739% mfcalc
bfa74976
RS
1740pi = 3.141592653589
17413.1415926536
1742sin(pi)
17430.0000000000
1744alpha = beta1 = 2.3
17452.3000000000
1746alpha
17472.3000000000
1748ln(alpha)
17490.8329091229
1750exp(ln(beta1))
17512.3000000000
1752%
1753@end example
1754
1755Note that multiple assignment and nested function calls are permitted.
1756
1757@menu
1758* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
1759* Rules: Mfcalc Rules. Grammar rules for the calculator.
1760* Symtab: Mfcalc Symtab. Symbol table management subroutines.
1761@end menu
1762
1763@node Mfcalc Decl, Mfcalc Rules, , Multi-function Calc
1764@subsection Declarations for @code{mfcalc}
1765
1766Here are the C and Bison declarations for the multi-function calculator.
1767
1768@smallexample
1769%@{
1770#include <math.h> /* For math functions, cos(), sin(), etc. */
1771#include "calc.h" /* Contains definition of `symrec' */
1772%@}
1773%union @{
1774double val; /* For returning numbers. */
1775symrec *tptr; /* For returning symbol-table pointers */
1776@}
1777
1778%token <val> NUM /* Simple double precision number */
1779%token <tptr> VAR FNCT /* Variable and Function */
1780%type <val> exp
1781
1782%right '='
1783%left '-' '+'
1784%left '*' '/'
1785%left NEG /* Negation--unary minus */
1786%right '^' /* Exponentiation */
1787
1788/* Grammar follows */
1789
1790%%
1791@end smallexample
1792
1793The above grammar introduces only two new features of the Bison language.
1794These features allow semantic values to have various data types
1795(@pxref{Multiple Types, ,More Than One Value Type}).
1796
1797The @code{%union} declaration specifies the entire list of possible types;
1798this is instead of defining @code{YYSTYPE}. The allowable types are now
1799double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
1800the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
1801
1802Since values can now have various types, it is necessary to associate a
1803type with each grammar symbol whose semantic value is used. These symbols
1804are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
1805declarations are augmented with information about their data type (placed
1806between angle brackets).
1807
1808The Bison construct @code{%type} is used for declaring nonterminal symbols,
1809just as @code{%token} is used for declaring token types. We have not used
1810@code{%type} before because nonterminal symbols are normally declared
1811implicitly by the rules that define them. But @code{exp} must be declared
1812explicitly so we can specify its value type. @xref{Type Decl, ,Nonterminal Symbols}.
1813
1814@node Mfcalc Rules, Mfcalc Symtab, Mfcalc Decl, Multi-function Calc
1815@subsection Grammar Rules for @code{mfcalc}
1816
1817Here are the grammar rules for the multi-function calculator.
1818Most of them are copied directly from @code{calc}; three rules,
1819those which mention @code{VAR} or @code{FNCT}, are new.
1820
1821@smallexample
1822input: /* empty */
1823 | input line
1824;
1825
1826line:
1827 '\n'
1828 | exp '\n' @{ printf ("\t%.10g\n", $1); @}
1829 | error '\n' @{ yyerrok; @}
1830;
1831
1832exp: NUM @{ $$ = $1; @}
1833 | VAR @{ $$ = $1->value.var; @}
1834 | VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
1835 | FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
1836 | exp '+' exp @{ $$ = $1 + $3; @}
1837 | exp '-' exp @{ $$ = $1 - $3; @}
1838 | exp '*' exp @{ $$ = $1 * $3; @}
1839 | exp '/' exp @{ $$ = $1 / $3; @}
1840 | '-' exp %prec NEG @{ $$ = -$2; @}
1841 | exp '^' exp @{ $$ = pow ($1, $3); @}
1842 | '(' exp ')' @{ $$ = $2; @}
1843;
1844/* End of grammar */
1845%%
1846@end smallexample
1847
1848@node Mfcalc Symtab, , Mfcalc Rules, Multi-function Calc
1849@subsection The @code{mfcalc} Symbol Table
1850@cindex symbol table example
1851
1852The multi-function calculator requires a symbol table to keep track of the
1853names and meanings of variables and functions. This doesn't affect the
1854grammar rules (except for the actions) or the Bison declarations, but it
1855requires some additional C functions for support.
1856
1857The symbol table itself consists of a linked list of records. Its
1858definition, which is kept in the header @file{calc.h}, is as follows. It
1859provides for either functions or variables to be placed in the table.
1860
13863333 1861@c FIXME: ANSIfy the prototypes for FNCTPTR etc.
bfa74976
RS
1862@smallexample
1863@group
1864/* Data type for links in the chain of symbols. */
1865struct symrec
1866@{
1867 char *name; /* name of symbol */
1868 int type; /* type of symbol: either VAR or FNCT */
1869 union @{
1870 double var; /* value of a VAR */
1871 double (*fnctptr)(); /* value of a FNCT */
1872 @} value;
1873 struct symrec *next; /* link field */
1874@};
1875@end group
1876
1877@group
1878typedef struct symrec symrec;
1879
1880/* The symbol table: a chain of `struct symrec'. */
1881extern symrec *sym_table;
1882
1883symrec *putsym ();
1884symrec *getsym ();
1885@end group
1886@end smallexample
1887
1888The new version of @code{main} includes a call to @code{init_table}, a
1889function that initializes the symbol table. Here it is, and
1890@code{init_table} as well:
1891
1892@smallexample
1893@group
1894#include <stdio.h>
1895
13863333
AD
1896int
1897main (void)
bfa74976
RS
1898@{
1899 init_table ();
13863333 1900 return yyparse ();
bfa74976
RS
1901@}
1902@end group
1903
1904@group
13863333
AD
1905void
1906yyerror (const char *s) /* Called by yyparse on error */
bfa74976
RS
1907@{
1908 printf ("%s\n", s);
1909@}
1910
1911struct init
1912@{
1913 char *fname;
1914 double (*fnct)();
1915@};
1916@end group
1917
1918@group
13863333
AD
1919struct init arith_fncts[] =
1920@{
1921 "sin", sin,
1922 "cos", cos,
1923 "atan", atan,
1924 "ln", log,
1925 "exp", exp,
1926 "sqrt", sqrt,
1927 0, 0
1928@};
bfa74976
RS
1929
1930/* The symbol table: a chain of `struct symrec'. */
1931symrec *sym_table = (symrec *)0;
1932@end group
1933
1934@group
13863333
AD
1935/* Put arithmetic functions in table. */
1936void
1937init_table (void)
bfa74976
RS
1938@{
1939 int i;
1940 symrec *ptr;
1941 for (i = 0; arith_fncts[i].fname != 0; i++)
1942 @{
1943 ptr = putsym (arith_fncts[i].fname, FNCT);
1944 ptr->value.fnctptr = arith_fncts[i].fnct;
1945 @}
1946@}
1947@end group
1948@end smallexample
1949
1950By simply editing the initialization list and adding the necessary include
1951files, you can add additional functions to the calculator.
1952
1953Two important functions allow look-up and installation of symbols in the
1954symbol table. The function @code{putsym} is passed a name and the type
1955(@code{VAR} or @code{FNCT}) of the object to be installed. The object is
1956linked to the front of the list, and a pointer to the object is returned.
1957The function @code{getsym} is passed the name of the symbol to look up. If
1958found, a pointer to that symbol is returned; otherwise zero is returned.
1959
1960@smallexample
1961symrec *
13863333 1962putsym (char *sym_name, int sym_type)
bfa74976
RS
1963@{
1964 symrec *ptr;
1965 ptr = (symrec *) malloc (sizeof (symrec));
1966 ptr->name = (char *) malloc (strlen (sym_name) + 1);
1967 strcpy (ptr->name,sym_name);
1968 ptr->type = sym_type;
1969 ptr->value.var = 0; /* set value to 0 even if fctn. */
1970 ptr->next = (struct symrec *)sym_table;
1971 sym_table = ptr;
1972 return ptr;
1973@}
1974
1975symrec *
13863333 1976getsym (const char *sym_name)
bfa74976
RS
1977@{
1978 symrec *ptr;
1979 for (ptr = sym_table; ptr != (symrec *) 0;
1980 ptr = (symrec *)ptr->next)
1981 if (strcmp (ptr->name,sym_name) == 0)
1982 return ptr;
1983 return 0;
1984@}
1985@end smallexample
1986
1987The function @code{yylex} must now recognize variables, numeric values, and
1988the single-character arithmetic operators. Strings of alphanumeric
1989characters with a leading nondigit are recognized as either variables or
1990functions depending on what the symbol table says about them.
1991
1992The string is passed to @code{getsym} for look up in the symbol table. If
1993the name appears in the table, a pointer to its location and its type
1994(@code{VAR} or @code{FNCT}) is returned to @code{yyparse}. If it is not
1995already in the table, then it is installed as a @code{VAR} using
1996@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) is
1997returned to @code{yyparse}.@refill
1998
1999No change is needed in the handling of numeric values and arithmetic
2000operators in @code{yylex}.
2001
2002@smallexample
2003@group
2004#include <ctype.h>
13863333
AD
2005
2006int
2007yylex (void)
bfa74976
RS
2008@{
2009 int c;
2010
2011 /* Ignore whitespace, get first nonwhite character. */
2012 while ((c = getchar ()) == ' ' || c == '\t');
2013
2014 if (c == EOF)
2015 return 0;
2016@end group
2017
2018@group
2019 /* Char starts a number => parse the number. */
2020 if (c == '.' || isdigit (c))
2021 @{
2022 ungetc (c, stdin);
2023 scanf ("%lf", &yylval.val);
2024 return NUM;
2025 @}
2026@end group
2027
2028@group
2029 /* Char starts an identifier => read the name. */
2030 if (isalpha (c))
2031 @{
2032 symrec *s;
2033 static char *symbuf = 0;
2034 static int length = 0;
2035 int i;
2036@end group
2037
2038@group
2039 /* Initially make the buffer long enough
2040 for a 40-character symbol name. */
2041 if (length == 0)
2042 length = 40, symbuf = (char *)malloc (length + 1);
2043
2044 i = 0;
2045 do
2046@end group
2047@group
2048 @{
2049 /* If buffer is full, make it bigger. */
2050 if (i == length)
2051 @{
2052 length *= 2;
2053 symbuf = (char *)realloc (symbuf, length + 1);
2054 @}
2055 /* Add this character to the buffer. */
2056 symbuf[i++] = c;
2057 /* Get another character. */
2058 c = getchar ();
2059 @}
2060@end group
2061@group
2062 while (c != EOF && isalnum (c));
2063
2064 ungetc (c, stdin);
2065 symbuf[i] = '\0';
2066@end group
2067
2068@group
2069 s = getsym (symbuf);
2070 if (s == 0)
2071 s = putsym (symbuf, VAR);
2072 yylval.tptr = s;
2073 return s->type;
2074 @}
2075
2076 /* Any other character is a token by itself. */
2077 return c;
2078@}
2079@end group
2080@end smallexample
2081
2082This program is both powerful and flexible. You may easily add new
2083functions, and it is a simple job to modify this code to install predefined
2084variables such as @code{pi} or @code{e} as well.
2085
2086@node Exercises, , Multi-function Calc, Examples
2087@section Exercises
2088@cindex exercises
2089
2090@enumerate
2091@item
2092Add some new functions from @file{math.h} to the initialization list.
2093
2094@item
2095Add another array that contains constants and their values. Then
2096modify @code{init_table} to add these constants to the symbol table.
2097It will be easiest to give the constants type @code{VAR}.
2098
2099@item
2100Make the program report an error if the user refers to an
2101uninitialized variable in any way except to store a value in it.
2102@end enumerate
2103
2104@node Grammar File, Interface, Examples, Top
2105@chapter Bison Grammar Files
2106
2107Bison takes as input a context-free grammar specification and produces a
2108C-language function that recognizes correct instances of the grammar.
2109
2110The Bison grammar input file conventionally has a name ending in @samp{.y}.
2111
2112@menu
2113* Grammar Outline:: Overall layout of the grammar file.
2114* Symbols:: Terminal and nonterminal symbols.
2115* Rules:: How to write grammar rules.
2116* Recursion:: Writing recursive rules.
2117* Semantics:: Semantic values and actions.
2118* Declarations:: All kinds of Bison declarations are described here.
2119* Multiple Parsers:: Putting more than one Bison parser in one program.
2120@end menu
2121
2122@node Grammar Outline, Symbols, , Grammar File
2123@section Outline of a Bison Grammar
2124
2125A Bison grammar file has four main sections, shown here with the
2126appropriate delimiters:
2127
2128@example
2129%@{
2130@var{C declarations}
2131%@}
2132
2133@var{Bison declarations}
2134
2135%%
2136@var{Grammar rules}
2137%%
2138
2139@var{Additional C code}
2140@end example
2141
2142Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
2143
2144@menu
2145* C Declarations:: Syntax and usage of the C declarations section.
2146* Bison Declarations:: Syntax and usage of the Bison declarations section.
2147* Grammar Rules:: Syntax and usage of the grammar rules section.
2148* C Code:: Syntax and usage of the additional C code section.
2149@end menu
2150
2151@node C Declarations, Bison Declarations, , Grammar Outline
2152@subsection The C Declarations Section
2153@cindex C declarations section
2154@cindex declarations, C
2155
2156The @var{C declarations} section contains macro definitions and
2157declarations of functions and variables that are used in the actions in the
2158grammar rules. These are copied to the beginning of the parser file so
2159that they precede the definition of @code{yyparse}. You can use
2160@samp{#include} to get the declarations from a header file. If you don't
2161need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
2162delimiters that bracket this section.
2163
2164@node Bison Declarations, Grammar Rules, C Declarations, Grammar Outline
2165@subsection The Bison Declarations Section
2166@cindex Bison declarations (introduction)
2167@cindex declarations, Bison (introduction)
2168
2169The @var{Bison declarations} section contains declarations that define
2170terminal and nonterminal symbols, specify precedence, and so on.
2171In some simple grammars you may not need any declarations.
2172@xref{Declarations, ,Bison Declarations}.
2173
2174@node Grammar Rules, C Code, Bison Declarations, Grammar Outline
2175@subsection The Grammar Rules Section
2176@cindex grammar rules section
2177@cindex rules section for grammar
2178
2179The @dfn{grammar rules} section contains one or more Bison grammar
2180rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.
2181
2182There must always be at least one grammar rule, and the first
2183@samp{%%} (which precedes the grammar rules) may never be omitted even
2184if it is the first thing in the file.
2185
2186@node C Code, , Grammar Rules, Grammar Outline
2187@subsection The Additional C Code Section
2188@cindex additional C code section
2189@cindex C code, section for additional
2190
ceed8467
AD
2191The @var{additional C code} section is copied verbatim to the end of the
2192parser file, just as the @var{C declarations} section is copied to the
2193beginning. This is the most convenient place to put anything that you
2194want to have in the parser file but which need not come before the
2195definition of @code{yyparse}. For example, the definitions of
2196@code{yylex} and @code{yyerror} often go here. @xref{Interface, ,Parser
2197C-Language Interface}.
bfa74976
RS
2198
2199If the last section is empty, you may omit the @samp{%%} that separates it
2200from the grammar rules.
2201
2202The Bison parser itself contains many static variables whose names start
2203with @samp{yy} and many macros whose names start with @samp{YY}. It is a
2204good idea to avoid using any such names (except those documented in this
2205manual) in the additional C code section of the grammar file.
2206
2207@node Symbols, Rules, Grammar Outline, Grammar File
2208@section Symbols, Terminal and Nonterminal
2209@cindex nonterminal symbol
2210@cindex terminal symbol
2211@cindex token type
2212@cindex symbol
2213
2214@dfn{Symbols} in Bison grammars represent the grammatical classifications
2215of the language.
2216
2217A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
2218class of syntactically equivalent tokens. You use the symbol in grammar
2219rules to mean that a token in that class is allowed. The symbol is
2220represented in the Bison parser by a numeric code, and the @code{yylex}
2221function returns a token type code to indicate what kind of token has been
2222read. You don't need to know what the code value is; you can use the
2223symbol to stand for it.
2224
2225A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
2226groupings. The symbol name is used in writing grammar rules. By convention,
2227it should be all lower case.
2228
2229Symbol names can contain letters, digits (not at the beginning),
2230underscores and periods. Periods make sense only in nonterminals.
2231
931c7513 2232There are three ways of writing terminal symbols in the grammar:
bfa74976
RS
2233
2234@itemize @bullet
2235@item
2236A @dfn{named token type} is written with an identifier, like an
2237identifier in C. By convention, it should be all upper case. Each
2238such name must be defined with a Bison declaration such as
2239@code{%token}. @xref{Token Decl, ,Token Type Names}.
2240
2241@item
2242@cindex character token
2243@cindex literal token
2244@cindex single-character literal
931c7513
RS
2245A @dfn{character token type} (or @dfn{literal character token}) is
2246written in the grammar using the same syntax used in C for character
2247constants; for example, @code{'+'} is a character token type. A
2248character token type doesn't need to be declared unless you need to
2249specify its semantic value data type (@pxref{Value Type, ,Data Types of
2250Semantic Values}), associativity, or precedence (@pxref{Precedence,
2251,Operator Precedence}).
bfa74976
RS
2252
2253By convention, a character token type is used only to represent a
2254token that consists of that particular character. Thus, the token
2255type @code{'+'} is used to represent the character @samp{+} as a
2256token. Nothing enforces this convention, but if you depart from it,
2257your program will confuse other readers.
2258
2259All the usual escape sequences used in character literals in C can be
2260used in Bison as well, but you must not use the null character as a
931c7513
RS
2261character literal because its ASCII code, zero, is the code @code{yylex}
2262returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
2263for @code{yylex}}).
2264
2265@item
2266@cindex string token
2267@cindex literal string token
9ecbd125 2268@cindex multicharacter literal
931c7513
RS
2269A @dfn{literal string token} is written like a C string constant; for
2270example, @code{"<="} is a literal string token. A literal string token
2271doesn't need to be declared unless you need to specify its semantic
2272value data type (@pxref{Value Type}), associativity, precedence
2273(@pxref{Precedence}).
2274
2275You can associate the literal string token with a symbolic name as an
2276alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
2277Declarations}). If you don't do that, the lexical analyzer has to
2278retrieve the token number for the literal string token from the
2279@code{yytname} table (@pxref{Calling Convention}).
2280
2281@strong{WARNING}: literal string tokens do not work in Yacc.
2282
2283By convention, a literal string token is used only to represent a token
2284that consists of that particular string. Thus, you should use the token
2285type @code{"<="} to represent the string @samp{<=} as a token. Bison
9ecbd125 2286does not enforce this convention, but if you depart from it, people who
931c7513
RS
2287read your program will be confused.
2288
2289All the escape sequences used in string literals in C can be used in
2290Bison as well. A literal string token must contain two or more
2291characters; for a token containing just one character, use a character
2292token (see above).
bfa74976
RS
2293@end itemize
2294
2295How you choose to write a terminal symbol has no effect on its
2296grammatical meaning. That depends only on where it appears in rules and
2297on when the parser function returns that symbol.
2298
2299The value returned by @code{yylex} is always one of the terminal symbols
2300(or 0 for end-of-input). Whichever way you write the token type in the
2301grammar rules, you write it the same way in the definition of @code{yylex}.
2302The numeric code for a character token type is simply the ASCII code for
2303the character, so @code{yylex} can use the identical character constant to
2304generate the requisite code. Each named token type becomes a C macro in
2305the parser file, so @code{yylex} can use the name to stand for the code.
13863333 2306(This is why periods don't make sense in terminal symbols.)
bfa74976
RS
2307@xref{Calling Convention, ,Calling Convention for @code{yylex}}.
2308
2309If @code{yylex} is defined in a separate file, you need to arrange for the
2310token-type macro definitions to be available there. Use the @samp{-d}
2311option when you run Bison, so that it will write these macro definitions
2312into a separate header file @file{@var{name}.tab.h} which you can include
2313in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
2314
2315The symbol @code{error} is a terminal symbol reserved for error recovery
2316(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
2317In particular, @code{yylex} should never return this value.
2318
2319@node Rules, Recursion, Symbols, Grammar File
2320@section Syntax of Grammar Rules
2321@cindex rule syntax
2322@cindex grammar rule syntax
2323@cindex syntax of grammar rules
2324
2325A Bison grammar rule has the following general form:
2326
2327@example
e425e872 2328@group
bfa74976
RS
2329@var{result}: @var{components}@dots{}
2330 ;
e425e872 2331@end group
bfa74976
RS
2332@end example
2333
2334@noindent
9ecbd125 2335where @var{result} is the nonterminal symbol that this rule describes,
bfa74976 2336and @var{components} are various terminal and nonterminal symbols that
13863333 2337are put together by this rule (@pxref{Symbols}).
bfa74976
RS
2338
2339For example,
2340
2341@example
2342@group
2343exp: exp '+' exp
2344 ;
2345@end group
2346@end example
2347
2348@noindent
2349says that two groupings of type @code{exp}, with a @samp{+} token in between,
2350can be combined into a larger grouping of type @code{exp}.
2351
2352Whitespace in rules is significant only to separate symbols. You can add
2353extra whitespace as you wish.
2354
2355Scattered among the components can be @var{actions} that determine
2356the semantics of the rule. An action looks like this:
2357
2358@example
2359@{@var{C statements}@}
2360@end example
2361
2362@noindent
2363Usually there is only one action and it follows the components.
2364@xref{Actions}.
2365
2366@findex |
2367Multiple rules for the same @var{result} can be written separately or can
2368be joined with the vertical-bar character @samp{|} as follows:
2369
2370@ifinfo
2371@example
2372@var{result}: @var{rule1-components}@dots{}
2373 | @var{rule2-components}@dots{}
2374 @dots{}
2375 ;
2376@end example
2377@end ifinfo
2378@iftex
2379@example
2380@group
2381@var{result}: @var{rule1-components}@dots{}
2382 | @var{rule2-components}@dots{}
2383 @dots{}
2384 ;
2385@end group
2386@end example
2387@end iftex
2388
2389@noindent
2390They are still considered distinct rules even when joined in this way.
2391
2392If @var{components} in a rule is empty, it means that @var{result} can
2393match the empty string. For example, here is how to define a
2394comma-separated sequence of zero or more @code{exp} groupings:
2395
2396@example
2397@group
2398expseq: /* empty */
2399 | expseq1
2400 ;
2401@end group
2402
2403@group
2404expseq1: exp
2405 | expseq1 ',' exp
2406 ;
2407@end group
2408@end example
2409
2410@noindent
2411It is customary to write a comment @samp{/* empty */} in each rule
2412with no components.
2413
2414@node Recursion, Semantics, Rules, Grammar File
2415@section Recursive Rules
2416@cindex recursive rule
2417
2418A rule is called @dfn{recursive} when its @var{result} nonterminal appears
2419also on its right hand side. Nearly all Bison grammars need to use
2420recursion, because that is the only way to define a sequence of any number
9ecbd125
JT
2421of a particular thing. Consider this recursive definition of a
2422comma-separated sequence of one or more expressions:
bfa74976
RS
2423
2424@example
2425@group
2426expseq1: exp
2427 | expseq1 ',' exp
2428 ;
2429@end group
2430@end example
2431
2432@cindex left recursion
2433@cindex right recursion
2434@noindent
2435Since the recursive use of @code{expseq1} is the leftmost symbol in the
2436right hand side, we call this @dfn{left recursion}. By contrast, here
2437the same construct is defined using @dfn{right recursion}:
2438
2439@example
2440@group
2441expseq1: exp
2442 | exp ',' expseq1
2443 ;
2444@end group
2445@end example
2446
2447@noindent
2448Any kind of sequence can be defined using either left recursion or
2449right recursion, but you should always use left recursion, because it
2450can parse a sequence of any number of elements with bounded stack
2451space. Right recursion uses up space on the Bison stack in proportion
2452to the number of elements in the sequence, because all the elements
2453must be shifted onto the stack before the rule can be applied even
2454once. @xref{Algorithm, ,The Bison Parser Algorithm }, for
2455further explanation of this.
2456
2457@cindex mutual recursion
2458@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
2459rule does not appear directly on its right hand side, but does appear
2460in rules for other nonterminals which do appear on its right hand
13863333 2461side.
bfa74976
RS
2462
2463For example:
2464
2465@example
2466@group
2467expr: primary
2468 | primary '+' primary
2469 ;
2470@end group
2471
2472@group
2473primary: constant
2474 | '(' expr ')'
2475 ;
2476@end group
2477@end example
2478
2479@noindent
2480defines two mutually-recursive nonterminals, since each refers to the
2481other.
2482
2483@node Semantics, Declarations, Recursion, Grammar File
2484@section Defining Language Semantics
2485@cindex defining language semantics
13863333 2486@cindex language semantics, defining
bfa74976
RS
2487
2488The grammar rules for a language determine only the syntax. The semantics
2489are determined by the semantic values associated with various tokens and
2490groupings, and by the actions taken when various groupings are recognized.
2491
2492For example, the calculator calculates properly because the value
2493associated with each expression is the proper number; it adds properly
2494because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
2495the numbers associated with @var{x} and @var{y}.
2496
2497@menu
2498* Value Type:: Specifying one data type for all semantic values.
2499* Multiple Types:: Specifying several alternative data types.
2500* Actions:: An action is the semantic definition of a grammar rule.
2501* Action Types:: Specifying data types for actions to operate on.
2502* Mid-Rule Actions:: Most actions go at the end of a rule.
2503 This says when, why and how to use the exceptional
2504 action in the middle of a rule.
2505@end menu
2506
2507@node Value Type, Multiple Types, , Semantics
2508@subsection Data Types of Semantic Values
2509@cindex semantic value type
2510@cindex value type, semantic
2511@cindex data types of semantic values
2512@cindex default data type
2513
2514In a simple program it may be sufficient to use the same data type for
2515the semantic values of all language constructs. This was true in the
2516RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish Notation Calculator}).
2517
2518Bison's default is to use type @code{int} for all semantic values. To
2519specify some other type, define @code{YYSTYPE} as a macro, like this:
2520
2521@example
2522#define YYSTYPE double
2523@end example
2524
2525@noindent
2526This macro definition must go in the C declarations section of the grammar
2527file (@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
2528
2529@node Multiple Types, Actions, Value Type, Semantics
2530@subsection More Than One Value Type
2531
2532In most programs, you will need different data types for different kinds
2533of tokens and groupings. For example, a numeric constant may need type
2534@code{int} or @code{long}, while a string constant needs type @code{char *},
2535and an identifier might need a pointer to an entry in the symbol table.
2536
2537To use more than one data type for semantic values in one parser, Bison
2538requires you to do two things:
2539
2540@itemize @bullet
2541@item
2542Specify the entire collection of possible data types, with the
2543@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of Value Types}).
2544
2545@item
2546Choose one of those types for each symbol (terminal or nonterminal)
2547for which semantic values are used. This is done for tokens with the
2548@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names}) and for groupings
2549with the @code{%type} Bison declaration (@pxref{Type Decl, ,Nonterminal Symbols}).
2550@end itemize
2551
2552@node Actions, Action Types, Multiple Types, Semantics
2553@subsection Actions
2554@cindex action
2555@vindex $$
2556@vindex $@var{n}
2557
2558An action accompanies a syntactic rule and contains C code to be executed
2559each time an instance of that rule is recognized. The task of most actions
2560is to compute a semantic value for the grouping built by the rule from the
2561semantic values associated with tokens or smaller groupings.
2562
2563An action consists of C statements surrounded by braces, much like a
2564compound statement in C. It can be placed at any position in the rule; it
2565is executed at that position. Most rules have just one action at the end
2566of the rule, following all the components. Actions in the middle of a rule
2567are tricky and used only for special purposes (@pxref{Mid-Rule Actions, ,Actions in Mid-Rule}).
2568
2569The C code in an action can refer to the semantic values of the components
2570matched by the rule with the construct @code{$@var{n}}, which stands for
2571the value of the @var{n}th component. The semantic value for the grouping
2572being constructed is @code{$$}. (Bison translates both of these constructs
2573into array element references when it copies the actions into the parser
2574file.)
2575
2576Here is a typical example:
2577
2578@example
2579@group
2580exp: @dots{}
2581 | exp '+' exp
2582 @{ $$ = $1 + $3; @}
2583@end group
2584@end example
2585
2586@noindent
2587This rule constructs an @code{exp} from two smaller @code{exp} groupings
2588connected by a plus-sign token. In the action, @code{$1} and @code{$3}
2589refer to the semantic values of the two component @code{exp} groupings,
2590which are the first and third symbols on the right hand side of the rule.
2591The sum is stored into @code{$$} so that it becomes the semantic value of
2592the addition-expression just recognized by the rule. If there were a
2593useful semantic value associated with the @samp{+} token, it could be
2594referred to as @code{$2}.@refill
2595
2596@cindex default action
2597If you don't specify an action for a rule, Bison supplies a default:
2598@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule becomes
2599the value of the whole rule. Of course, the default rule is valid only
2600if the two data types match. There is no meaningful default action for
2601an empty rule; every empty rule must have an explicit action unless the
2602rule's value does not matter.
2603
2604@code{$@var{n}} with @var{n} zero or negative is allowed for reference
2605to tokens and groupings on the stack @emph{before} those that match the
2606current rule. This is a very risky practice, and to use it reliably
2607you must be certain of the context in which the rule is applied. Here
2608is a case in which you can use this reliably:
2609
2610@example
2611@group
2612foo: expr bar '+' expr @{ @dots{} @}
2613 | expr bar '-' expr @{ @dots{} @}
2614 ;
2615@end group
2616
2617@group
2618bar: /* empty */
2619 @{ previous_expr = $0; @}
2620 ;
2621@end group
2622@end example
2623
2624As long as @code{bar} is used only in the fashion shown here, @code{$0}
2625always refers to the @code{expr} which precedes @code{bar} in the
2626definition of @code{foo}.
2627
2628@node Action Types, Mid-Rule Actions, Actions, Semantics
2629@subsection Data Types of Values in Actions
2630@cindex action data types
2631@cindex data types in actions
2632
2633If you have chosen a single data type for semantic values, the @code{$$}
2634and @code{$@var{n}} constructs always have that data type.
2635
2636If you have used @code{%union} to specify a variety of data types, then you
2637must declare a choice among these types for each terminal or nonterminal
2638symbol that can have a semantic value. Then each time you use @code{$$} or
2639@code{$@var{n}}, its data type is determined by which symbol it refers to
2640in the rule. In this example,@refill
2641
2642@example
2643@group
2644exp: @dots{}
2645 | exp '+' exp
2646 @{ $$ = $1 + $3; @}
2647@end group
2648@end example
2649
2650@noindent
2651@code{$1} and @code{$3} refer to instances of @code{exp}, so they all
2652have the data type declared for the nonterminal symbol @code{exp}. If
2653@code{$2} were used, it would have the data type declared for the
2654terminal symbol @code{'+'}, whatever that might be.@refill
2655
2656Alternatively, you can specify the data type when you refer to the value,
2657by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
2658reference. For example, if you have defined types as shown here:
2659
2660@example
2661@group
2662%union @{
2663 int itype;
2664 double dtype;
2665@}
2666@end group
2667@end example
2668
2669@noindent
2670then you can write @code{$<itype>1} to refer to the first subunit of the
2671rule as an integer, or @code{$<dtype>1} to refer to it as a double.
2672
2673@node Mid-Rule Actions, , Action Types, Semantics
2674@subsection Actions in Mid-Rule
2675@cindex actions in mid-rule
2676@cindex mid-rule actions
2677
2678Occasionally it is useful to put an action in the middle of a rule.
2679These actions are written just like usual end-of-rule actions, but they
2680are executed before the parser even recognizes the following components.
2681
2682A mid-rule action may refer to the components preceding it using
2683@code{$@var{n}}, but it may not refer to subsequent components because
2684it is run before they are parsed.
2685
2686The mid-rule action itself counts as one of the components of the rule.
2687This makes a difference when there is another action later in the same rule
2688(and usually there is another at the end): you have to count the actions
2689along with the symbols when working out which number @var{n} to use in
2690@code{$@var{n}}.
2691
2692The mid-rule action can also have a semantic value. The action can set
2693its value with an assignment to @code{$$}, and actions later in the rule
2694can refer to the value using @code{$@var{n}}. Since there is no symbol
2695to name the action, there is no way to declare a data type for the value
2696in advance, so you must use the @samp{$<@dots{}>} construct to specify a
2697data type each time you refer to this value.
2698
2699There is no way to set the value of the entire rule with a mid-rule
2700action, because assignments to @code{$$} do not have that effect. The
2701only way to set the value for the entire rule is with an ordinary action
2702at the end of the rule.
2703
2704Here is an example from a hypothetical compiler, handling a @code{let}
2705statement that looks like @samp{let (@var{variable}) @var{statement}} and
2706serves to create a variable named @var{variable} temporarily for the
2707duration of @var{statement}. To parse this construct, we must put
2708@var{variable} into the symbol table while @var{statement} is parsed, then
2709remove it afterward. Here is how it is done:
2710
2711@example
2712@group
2713stmt: LET '(' var ')'
2714 @{ $<context>$ = push_context ();
2715 declare_variable ($3); @}
2716 stmt @{ $$ = $6;
2717 pop_context ($<context>5); @}
2718@end group
2719@end example
2720
2721@noindent
2722As soon as @samp{let (@var{variable})} has been recognized, the first
2723action is run. It saves a copy of the current semantic context (the
2724list of accessible variables) as its semantic value, using alternative
2725@code{context} in the data-type union. Then it calls
2726@code{declare_variable} to add the new variable to that list. Once the
2727first action is finished, the embedded statement @code{stmt} can be
2728parsed. Note that the mid-rule action is component number 5, so the
2729@samp{stmt} is component number 6.
2730
2731After the embedded statement is parsed, its semantic value becomes the
2732value of the entire @code{let}-statement. Then the semantic value from the
2733earlier action is used to restore the prior list of variables. This
2734removes the temporary @code{let}-variable from the list so that it won't
2735appear to exist while the rest of the program is parsed.
2736
2737Taking action before a rule is completely recognized often leads to
2738conflicts since the parser must commit to a parse in order to execute the
2739action. For example, the following two rules, without mid-rule actions,
2740can coexist in a working parser because the parser can shift the open-brace
2741token and look at what follows before deciding whether there is a
2742declaration or not:
2743
2744@example
2745@group
2746compound: '@{' declarations statements '@}'
2747 | '@{' statements '@}'
2748 ;
2749@end group
2750@end example
2751
2752@noindent
2753But when we add a mid-rule action as follows, the rules become nonfunctional:
2754
2755@example
2756@group
2757compound: @{ prepare_for_local_variables (); @}
2758 '@{' declarations statements '@}'
2759@end group
2760@group
2761 | '@{' statements '@}'
2762 ;
2763@end group
2764@end example
2765
2766@noindent
2767Now the parser is forced to decide whether to run the mid-rule action
2768when it has read no farther than the open-brace. In other words, it
2769must commit to using one rule or the other, without sufficient
2770information to do it correctly. (The open-brace token is what is called
2771the @dfn{look-ahead} token at this time, since the parser is still
2772deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.)
2773
2774You might think that you could correct the problem by putting identical
2775actions into the two rules, like this:
2776
2777@example
2778@group
2779compound: @{ prepare_for_local_variables (); @}
2780 '@{' declarations statements '@}'
2781 | @{ prepare_for_local_variables (); @}
2782 '@{' statements '@}'
2783 ;
2784@end group
2785@end example
2786
2787@noindent
2788But this does not help, because Bison does not realize that the two actions
2789are identical. (Bison never tries to understand the C code in an action.)
2790
2791If the grammar is such that a declaration can be distinguished from a
2792statement by the first token (which is true in C), then one solution which
2793does work is to put the action after the open-brace, like this:
2794
2795@example
2796@group
2797compound: '@{' @{ prepare_for_local_variables (); @}
2798 declarations statements '@}'
2799 | '@{' statements '@}'
2800 ;
2801@end group
2802@end example
2803
2804@noindent
2805Now the first token of the following declaration or statement,
2806which would in any case tell Bison which rule to use, can still do so.
2807
2808Another solution is to bury the action inside a nonterminal symbol which
2809serves as a subroutine:
2810
2811@example
2812@group
2813subroutine: /* empty */
2814 @{ prepare_for_local_variables (); @}
2815 ;
2816
2817@end group
2818
2819@group
2820compound: subroutine
2821 '@{' declarations statements '@}'
2822 | subroutine
2823 '@{' statements '@}'
2824 ;
2825@end group
2826@end example
2827
2828@noindent
2829Now Bison can execute the action in the rule for @code{subroutine} without
2830deciding which rule for @code{compound} it will eventually use. Note that
2831the action is now at the end of its rule. Any mid-rule action can be
2832converted to an end-of-rule action in this way, and this is what Bison
2833actually does to implement mid-rule actions.
2834
2835@node Declarations, Multiple Parsers, Semantics, Grammar File
2836@section Bison Declarations
2837@cindex declarations, Bison
2838@cindex Bison declarations
2839
2840The @dfn{Bison declarations} section of a Bison grammar defines the symbols
2841used in formulating the grammar and the data types of semantic values.
2842@xref{Symbols}.
2843
2844All token type names (but not single-character literal tokens such as
2845@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
2846declared if you need to specify which data type to use for the semantic
2847value (@pxref{Multiple Types, ,More Than One Value Type}).
2848
2849The first rule in the file also specifies the start symbol, by default.
2850If you want some other symbol to be the start symbol, you must declare
2851it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
2852
2853@menu
2854* Token Decl:: Declaring terminal symbols.
2855* Precedence Decl:: Declaring terminals with precedence and associativity.
2856* Union Decl:: Declaring the set of all semantic value types.
2857* Type Decl:: Declaring the choice of type for a nonterminal symbol.
2858* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
2859* Start Decl:: Specifying the start symbol.
2860* Pure Decl:: Requesting a reentrant parser.
2861* Decl Summary:: Table of all Bison declarations.
2862@end menu
2863
2864@node Token Decl, Precedence Decl, , Declarations
2865@subsection Token Type Names
2866@cindex declaring token type names
2867@cindex token type names, declaring
931c7513 2868@cindex declaring literal string tokens
bfa74976
RS
2869@findex %token
2870
2871The basic way to declare a token type name (terminal symbol) is as follows:
2872
2873@example
2874%token @var{name}
2875@end example
2876
2877Bison will convert this into a @code{#define} directive in
2878the parser, so that the function @code{yylex} (if it is in this file)
2879can use the name @var{name} to stand for this token type's code.
2880
2881Alternatively, you can use @code{%left}, @code{%right}, or @code{%nonassoc}
9ecbd125 2882instead of @code{%token}, if you wish to specify associativity and precedence.
bfa74976
RS
2883@xref{Precedence Decl, ,Operator Precedence}.
2884
2885You can explicitly specify the numeric code for a token type by appending
2886an integer value in the field immediately following the token name:
2887
2888@example
2889%token NUM 300
2890@end example
2891
2892@noindent
2893It is generally best, however, to let Bison choose the numeric codes for
2894all token types. Bison will automatically select codes that don't conflict
2895with each other or with ASCII characters.
2896
2897In the event that the stack type is a union, you must augment the
2898@code{%token} or other token declaration to include the data type
13863333 2899alternative delimited by angle-brackets (@pxref{Multiple Types, ,More Than One Value Type}).
bfa74976
RS
2900
2901For example:
2902
2903@example
2904@group
2905%union @{ /* define stack type */
2906 double val;
2907 symrec *tptr;
2908@}
2909%token <val> NUM /* define token NUM and its type */
2910@end group
2911@end example
2912
931c7513
RS
2913You can associate a literal string token with a token type name by
2914writing the literal string at the end of a @code{%token}
2915declaration which declares the name. For example:
2916
2917@example
2918%token arrow "=>"
2919@end example
2920
2921@noindent
2922For example, a grammar for the C language might specify these names with
2923equivalent literal string tokens:
2924
2925@example
2926%token <operator> OR "||"
2927%token <operator> LE 134 "<="
2928%left OR "<="
2929@end example
2930
2931@noindent
2932Once you equate the literal string and the token name, you can use them
2933interchangeably in further declarations or the grammar rules. The
2934@code{yylex} function can use the token name or the literal string to
2935obtain the token type code number (@pxref{Calling Convention}).
2936
bfa74976
RS
2937@node Precedence Decl, Union Decl, Token Decl, Declarations
2938@subsection Operator Precedence
2939@cindex precedence declarations
2940@cindex declaring operator precedence
2941@cindex operator precedence, declaring
2942
2943Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
2944declare a token and specify its precedence and associativity, all at
2945once. These are called @dfn{precedence declarations}.
2946@xref{Precedence, ,Operator Precedence}, for general information on operator precedence.
2947
2948The syntax of a precedence declaration is the same as that of
2949@code{%token}: either
2950
2951@example
2952%left @var{symbols}@dots{}
2953@end example
2954
2955@noindent
2956or
2957
2958@example
2959%left <@var{type}> @var{symbols}@dots{}
2960@end example
2961
2962And indeed any of these declarations serves the purposes of @code{%token}.
2963But in addition, they specify the associativity and relative precedence for
2964all the @var{symbols}:
2965
2966@itemize @bullet
2967@item
2968The associativity of an operator @var{op} determines how repeated uses
2969of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
2970@var{z}} is parsed by grouping @var{x} with @var{y} first or by
2971grouping @var{y} with @var{z} first. @code{%left} specifies
2972left-associativity (grouping @var{x} with @var{y} first) and
2973@code{%right} specifies right-associativity (grouping @var{y} with
2974@var{z} first). @code{%nonassoc} specifies no associativity, which
2975means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
2976considered a syntax error.
2977
2978@item
2979The precedence of an operator determines how it nests with other operators.
2980All the tokens declared in a single precedence declaration have equal
2981precedence and nest together according to their associativity.
2982When two tokens declared in different precedence declarations associate,
2983the one declared later has the higher precedence and is grouped first.
2984@end itemize
2985
2986@node Union Decl, Type Decl, Precedence Decl, Declarations
2987@subsection The Collection of Value Types
2988@cindex declaring value types
2989@cindex value types, declaring
2990@findex %union
2991
2992The @code{%union} declaration specifies the entire collection of possible
2993data types for semantic values. The keyword @code{%union} is followed by a
2994pair of braces containing the same thing that goes inside a @code{union} in
13863333 2995C.
bfa74976
RS
2996
2997For example:
2998
2999@example
3000@group
3001%union @{
3002 double val;
3003 symrec *tptr;
3004@}
3005@end group
3006@end example
3007
3008@noindent
3009This says that the two alternative types are @code{double} and @code{symrec
3010*}. They are given names @code{val} and @code{tptr}; these names are used
3011in the @code{%token} and @code{%type} declarations to pick one of the types
3012for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
3013
3014Note that, unlike making a @code{union} declaration in C, you do not write
3015a semicolon after the closing brace.
3016
3017@node Type Decl, Expect Decl, Union Decl, Declarations
3018@subsection Nonterminal Symbols
3019@cindex declaring value types, nonterminals
3020@cindex value types, nonterminals, declaring
3021@findex %type
3022
3023@noindent
3024When you use @code{%union} to specify multiple value types, you must
3025declare the value type of each nonterminal symbol for which values are
3026used. This is done with a @code{%type} declaration, like this:
3027
3028@example
3029%type <@var{type}> @var{nonterminal}@dots{}
3030@end example
3031
3032@noindent
3033Here @var{nonterminal} is the name of a nonterminal symbol, and @var{type}
3034is the name given in the @code{%union} to the alternative that you want
3035(@pxref{Union Decl, ,The Collection of Value Types}). You can give any number of nonterminal symbols in
3036the same @code{%type} declaration, if they have the same value type. Use
3037spaces to separate the symbol names.
3038
931c7513
RS
3039You can also declare the value type of a terminal symbol. To do this,
3040use the same @code{<@var{type}>} construction in a declaration for the
3041terminal symbol. All kinds of token declarations allow
3042@code{<@var{type}>}.
3043
bfa74976
RS
3044@node Expect Decl, Start Decl, Type Decl, Declarations
3045@subsection Suppressing Conflict Warnings
3046@cindex suppressing conflict warnings
3047@cindex preventing warnings about conflicts
3048@cindex warnings, preventing
3049@cindex conflicts, suppressing warnings of
3050@findex %expect
3051
3052Bison normally warns if there are any conflicts in the grammar
3053(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars have harmless shift/reduce
3054conflicts which are resolved in a predictable way and would be difficult to
3055eliminate. It is desirable to suppress the warning about these conflicts
3056unless the number of conflicts changes. You can do this with the
3057@code{%expect} declaration.
3058
3059The declaration looks like this:
3060
3061@example
3062%expect @var{n}
3063@end example
3064
3065Here @var{n} is a decimal integer. The declaration says there should be no
3066warning if there are @var{n} shift/reduce conflicts and no reduce/reduce
3067conflicts. The usual warning is given if there are either more or fewer
3068conflicts, or if there are any reduce/reduce conflicts.
3069
3070In general, using @code{%expect} involves these steps:
3071
3072@itemize @bullet
3073@item
3074Compile your grammar without @code{%expect}. Use the @samp{-v} option
3075to get a verbose list of where the conflicts occur. Bison will also
3076print the number of conflicts.
3077
3078@item
3079Check each of the conflicts to make sure that Bison's default
3080resolution is what you really want. If not, rewrite the grammar and
3081go back to the beginning.
3082
3083@item
3084Add an @code{%expect} declaration, copying the number @var{n} from the
3085number which Bison printed.
3086@end itemize
3087
3088Now Bison will stop annoying you about the conflicts you have checked, but
3089it will warn you again if changes in the grammar result in additional
3090conflicts.
3091
3092@node Start Decl, Pure Decl, Expect Decl, Declarations
3093@subsection The Start-Symbol
3094@cindex declaring the start symbol
3095@cindex start symbol, declaring
3096@cindex default start symbol
3097@findex %start
3098
3099Bison assumes by default that the start symbol for the grammar is the first
3100nonterminal specified in the grammar specification section. The programmer
3101may override this restriction with the @code{%start} declaration as follows:
3102
3103@example
3104%start @var{symbol}
3105@end example
3106
3107@node Pure Decl, Decl Summary, Start Decl, Declarations
3108@subsection A Pure (Reentrant) Parser
3109@cindex reentrant parser
3110@cindex pure parser
3111@findex %pure_parser
3112
3113A @dfn{reentrant} program is one which does not alter in the course of
3114execution; in other words, it consists entirely of @dfn{pure} (read-only)
3115code. Reentrancy is important whenever asynchronous execution is possible;
3116for example, a nonreentrant program may not be safe to call from a signal
3117handler. In systems with multiple threads of control, a nonreentrant
3118program must be called only within interlocks.
3119
70811b85
RS
3120Normally, Bison generates a parser which is not reentrant. This is
3121suitable for most uses, and it permits compatibility with YACC. (The
3122standard YACC interfaces are inherently nonreentrant, because they use
3123statically allocated variables for communication with @code{yylex},
3124including @code{yylval} and @code{yylloc}.)
bfa74976 3125
70811b85
RS
3126Alternatively, you can generate a pure, reentrant parser. The Bison
3127declaration @code{%pure_parser} says that you want the parser to be
3128reentrant. It looks like this:
bfa74976
RS
3129
3130@example
3131%pure_parser
3132@end example
3133
70811b85
RS
3134The result is that the communication variables @code{yylval} and
3135@code{yylloc} become local variables in @code{yyparse}, and a different
3136calling convention is used for the lexical analyzer function
3137@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
3138Parsers}, for the details of this. The variable @code{yynerrs} also
3139becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error
3140Reporting Function @code{yyerror}}). The convention for calling
3141@code{yyparse} itself is unchanged.
3142
3143Whether the parser is pure has nothing to do with the grammar rules.
3144You can generate either a pure parser or a nonreentrant parser from any
3145valid grammar.
bfa74976
RS
3146
3147@node Decl Summary, , Pure Decl, Declarations
3148@subsection Bison Declaration Summary
3149@cindex Bison declaration summary
3150@cindex declaration summary
3151@cindex summary, Bison declaration
3152
3153Here is a summary of all Bison declarations:
3154
3155@table @code
3156@item %union
3157Declare the collection of data types that semantic values may have
3158(@pxref{Union Decl, ,The Collection of Value Types}).
3159
3160@item %token
3161Declare a terminal symbol (token type name) with no precedence
3162or associativity specified (@pxref{Token Decl, ,Token Type Names}).
3163
3164@item %right
3165Declare a terminal symbol (token type name) that is right-associative
3166(@pxref{Precedence Decl, ,Operator Precedence}).
3167
3168@item %left
3169Declare a terminal symbol (token type name) that is left-associative
3170(@pxref{Precedence Decl, ,Operator Precedence}).
3171
3172@item %nonassoc
3173Declare a terminal symbol (token type name) that is nonassociative
3174(using it in a way that would be associative is a syntax error)
3175(@pxref{Precedence Decl, ,Operator Precedence}).
3176
3177@item %type
3178Declare the type of semantic values for a nonterminal symbol
3179(@pxref{Type Decl, ,Nonterminal Symbols}).
3180
3181@item %start
3182Specify the grammar's start symbol (@pxref{Start Decl, ,The Start-Symbol}).
3183
3184@item %expect
3185Declare the expected number of shift-reduce conflicts
3186(@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
3187
3188@item %pure_parser
3189Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
931c7513
RS
3190
3191@item %no_lines
3192Don't generate any @code{#line} preprocessor commands in the parser
3193file. Ordinarily Bison writes these commands in the parser file so that
3194the C compiler and debuggers will associate errors and object code with
3195your source file (the grammar file). This directive causes them to
3196associate errors with the parser file, treating it an independent source
3197file in its own right.
3198
3199@item %raw
3200The output file @file{@var{name}.h} normally defines the tokens with
3201Yacc-compatible token numbers. If this option is specified, the
3202internal Bison numbers are used instead. (Yacc-compatible numbers start
9ecbd125 3203at 257 except for single-character tokens; Bison assigns token numbers
931c7513
RS
3204sequentially for all tokens starting at 3.)
3205
3206@item %token_table
3207Generate an array of token names in the parser file. The name of the
3208array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
3209token whose internal Bison token code number is @var{i}. The first three
3210elements of @code{yytname} are always @code{"$"}, @code{"error"}, and
3211@code{"$illegal"}; after these come the symbols defined in the grammar
3212file.
3213
3214For single-character literal tokens and literal string tokens, the name
3215in the table includes the single-quote or double-quote characters: for
3216example, @code{"'+'"} is a single-character literal and @code{"\"<=\""}
3217is a literal string token. All the characters of the literal string
3218token appear verbatim in the string found in the table; even
3219double-quote characters are not escaped. For example, if the token
3220consists of three characters @samp{*"*}, its string in @code{yytname}
3221contains @samp{"*"*"}. (In C, that would be written as
3222@code{"\"*\"*\""}).
3223
3224When you specify @code{%token_table}, Bison also generates macro
3225definitions for macros @code{YYNTOKENS}, @code{YYNNTS}, and
3226@code{YYNRULES}, and @code{YYNSTATES}:
3227
3228@table @code
3229@item YYNTOKENS
3230The highest token number, plus one.
3231@item YYNNTS
9ecbd125 3232The number of nonterminal symbols.
931c7513
RS
3233@item YYNRULES
3234The number of grammar rules,
3235@item YYNSTATES
3236The number of parser states (@pxref{Parser States}).
3237@end table
bfa74976
RS
3238@end table
3239
931c7513 3240@node Multiple Parsers,, Declarations, Grammar File
bfa74976
RS
3241@section Multiple Parsers in the Same Program
3242
3243Most programs that use Bison parse only one language and therefore contain
3244only one Bison parser. But what if you want to parse more than one
3245language with the same program? Then you need to avoid a name conflict
3246between different definitions of @code{yyparse}, @code{yylval}, and so on.
3247
3248The easy way to do this is to use the option @samp{-p @var{prefix}}
3249(@pxref{Invocation, ,Invoking Bison}). This renames the interface functions and
3250variables of the Bison parser to start with @var{prefix} instead of
3251@samp{yy}. You can use this to give each parser distinct names that do
3252not conflict.
3253
3254The precise list of symbols renamed is @code{yyparse}, @code{yylex},
c656404a
RS
3255@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar} and
3256@code{yydebug}. For example, if you use @samp{-p c}, the names become
3257@code{cparse}, @code{clex}, and so on.
bfa74976
RS
3258
3259@strong{All the other variables and macros associated with Bison are not
3260renamed.} These others are not global; there is no conflict if the same
3261name is used in different parsers. For example, @code{YYSTYPE} is not
3262renamed, but defining this in different ways in different parsers causes
3263no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
3264
3265The @samp{-p} option works by adding macro definitions to the beginning
3266of the parser source file, defining @code{yyparse} as
3267@code{@var{prefix}parse}, and so on. This effectively substitutes one
3268name for the other in the entire parser file.
3269
3270@node Interface, Algorithm, Grammar File, Top
3271@chapter Parser C-Language Interface
3272@cindex C-language interface
3273@cindex interface
3274
3275The Bison parser is actually a C function named @code{yyparse}. Here we
3276describe the interface conventions of @code{yyparse} and the other
3277functions that it needs to use.
3278
3279Keep in mind that the parser uses many C identifiers starting with
3280@samp{yy} and @samp{YY} for internal purposes. If you use such an
3281identifier (aside from those in this manual) in an action or in additional
3282C code in the grammar file, you are likely to run into trouble.
3283
3284@menu
3285* Parser Function:: How to call @code{yyparse} and what it returns.
13863333 3286* Lexical:: You must supply a function @code{yylex}
bfa74976
RS
3287 which reads tokens.
3288* Error Reporting:: You must supply a function @code{yyerror}.
3289* Action Features:: Special features for use in actions.
3290@end menu
3291
3292@node Parser Function, Lexical, , Interface
3293@section The Parser Function @code{yyparse}
3294@findex yyparse
3295
3296You call the function @code{yyparse} to cause parsing to occur. This
3297function reads tokens, executes actions, and ultimately returns when it
3298encounters end-of-input or an unrecoverable syntax error. You can also
3299write an action which directs @code{yyparse} to return immediately without
3300reading further.
3301
3302The value returned by @code{yyparse} is 0 if parsing was successful (return
3303is due to end-of-input).
3304
3305The value is 1 if parsing failed (return is due to a syntax error).
3306
3307In an action, you can cause immediate return from @code{yyparse} by using
3308these macros:
3309
3310@table @code
3311@item YYACCEPT
3312@findex YYACCEPT
3313Return immediately with value 0 (to report success).
3314
3315@item YYABORT
3316@findex YYABORT
3317Return immediately with value 1 (to report failure).
3318@end table
3319
3320@node Lexical, Error Reporting, Parser Function, Interface
3321@section The Lexical Analyzer Function @code{yylex}
3322@findex yylex
3323@cindex lexical analyzer
3324
3325The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
3326the input stream and returns them to the parser. Bison does not create
3327this function automatically; you must write it so that @code{yyparse} can
3328call it. The function is sometimes referred to as a lexical scanner.
3329
3330In simple programs, @code{yylex} is often defined at the end of the Bison
3331grammar file. If @code{yylex} is defined in a separate source file, you
3332need to arrange for the token-type macro definitions to be available there.
3333To do this, use the @samp{-d} option when you run Bison, so that it will
3334write these macro definitions into a separate header file
3335@file{@var{name}.tab.h} which you can include in the other source files
3336that need it. @xref{Invocation, ,Invoking Bison}.@refill
3337
3338@menu
3339* Calling Convention:: How @code{yyparse} calls @code{yylex}.
3340* Token Values:: How @code{yylex} must return the semantic value
3341 of the token it has read.
3342* Token Positions:: How @code{yylex} must return the text position
3343 (line number, etc.) of the token, if the
3344 actions want that.
3345* Pure Calling:: How the calling convention differs
3346 in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
3347@end menu
3348
3349@node Calling Convention, Token Values, , Lexical
3350@subsection Calling Convention for @code{yylex}
3351
3352The value that @code{yylex} returns must be the numeric code for the type
3353of token it has just found, or 0 for end-of-input.
3354
3355When a token is referred to in the grammar rules by a name, that name
3356in the parser file becomes a C macro whose definition is the proper
3357numeric code for that token type. So @code{yylex} can use the name
3358to indicate that type. @xref{Symbols}.
3359
3360When a token is referred to in the grammar rules by a character literal,
3361the numeric code for that character is also the code for the token type.
3362So @code{yylex} can simply return that character code. The null character
3363must not be used this way, because its code is zero and that is what
3364signifies end-of-input.
3365
3366Here is an example showing these things:
3367
3368@example
13863333
AD
3369int
3370yylex (void)
bfa74976
RS
3371@{
3372 @dots{}
3373 if (c == EOF) /* Detect end of file. */
3374 return 0;
3375 @dots{}
3376 if (c == '+' || c == '-')
3377 return c; /* Assume token type for `+' is '+'. */
3378 @dots{}
3379 return INT; /* Return the type of the token. */
3380 @dots{}
3381@}
3382@end example
3383
3384@noindent
3385This interface has been designed so that the output from the @code{lex}
3386utility can be used without change as the definition of @code{yylex}.
3387
931c7513
RS
3388If the grammar uses literal string tokens, there are two ways that
3389@code{yylex} can determine the token type codes for them:
3390
3391@itemize @bullet
3392@item
3393If the grammar defines symbolic token names as aliases for the
3394literal string tokens, @code{yylex} can use these symbolic names like
3395all others. In this case, the use of the literal string tokens in
3396the grammar file has no effect on @code{yylex}.
3397
3398@item
9ecbd125 3399@code{yylex} can find the multicharacter token in the @code{yytname}
931c7513 3400table. The index of the token in the table is the token type's code.
9ecbd125 3401The name of a multicharacter token is recorded in @code{yytname} with a
931c7513
RS
3402double-quote, the token's characters, and another double-quote. The
3403token's characters are not escaped in any way; they appear verbatim in
3404the contents of the string in the table.
3405
3406Here's code for looking up a token in @code{yytname}, assuming that the
3407characters of the token are stored in @code{token_buffer}.
3408
3409@smallexample
3410for (i = 0; i < YYNTOKENS; i++)
3411 @{
3412 if (yytname[i] != 0
3413 && yytname[i][0] == '"'
6f515a27
JT
3414 && strncmp (yytname[i] + 1, token_buffer,
3415 strlen (token_buffer))
931c7513
RS
3416 && yytname[i][strlen (token_buffer) + 1] == '"'
3417 && yytname[i][strlen (token_buffer) + 2] == 0)
3418 break;
3419 @}
3420@end smallexample
3421
3422The @code{yytname} table is generated only if you use the
3423@code{%token_table} declaration. @xref{Decl Summary}.
3424@end itemize
3425
bfa74976
RS
3426@node Token Values, Token Positions, Calling Convention, Lexical
3427@subsection Semantic Values of Tokens
3428
3429@vindex yylval
3430In an ordinary (nonreentrant) parser, the semantic value of the token must
3431be stored into the global variable @code{yylval}. When you are using
3432just one data type for semantic values, @code{yylval} has that type.
3433Thus, if the type is @code{int} (the default), you might write this in
3434@code{yylex}:
3435
3436@example
3437@group
3438 @dots{}
3439 yylval = value; /* Put value onto Bison stack. */
3440 return INT; /* Return the type of the token. */
3441 @dots{}
3442@end group
3443@end example
3444
3445When you are using multiple data types, @code{yylval}'s type is a union
3446made from the @code{%union} declaration (@pxref{Union Decl, ,The Collection of Value Types}). So when
3447you store a token's value, you must use the proper member of the union.
3448If the @code{%union} declaration looks like this:
3449
3450@example
3451@group
3452%union @{
3453 int intval;
3454 double val;
3455 symrec *tptr;
3456@}
3457@end group
3458@end example
3459
3460@noindent
3461then the code in @code{yylex} might look like this:
3462
3463@example
3464@group
3465 @dots{}
3466 yylval.intval = value; /* Put value onto Bison stack. */
3467 return INT; /* Return the type of the token. */
3468 @dots{}
3469@end group
3470@end example
3471
3472@node Token Positions, Pure Calling, Token Values, Lexical
3473@subsection Textual Positions of Tokens
3474
3475@vindex yylloc
3476If you are using the @samp{@@@var{n}}-feature (@pxref{Action Features, ,Special Features for Use in Actions}) in
3477actions to keep track of the textual locations of tokens and groupings,
3478then you must provide this information in @code{yylex}. The function
3479@code{yyparse} expects to find the textual location of a token just parsed
3480in the global variable @code{yylloc}. So @code{yylex} must store the
3481proper data in that variable. The value of @code{yylloc} is a structure
3482and you need only initialize the members that are going to be used by the
3483actions. The four members are called @code{first_line},
3484@code{first_column}, @code{last_line} and @code{last_column}. Note that
3485the use of this feature makes the parser noticeably slower.
3486
3487@tindex YYLTYPE
3488The data type of @code{yylloc} has the name @code{YYLTYPE}.
3489
3490@node Pure Calling, , Token Positions, Lexical
c656404a 3491@subsection Calling Conventions for Pure Parsers
bfa74976 3492
e425e872
RS
3493When you use the Bison declaration @code{%pure_parser} to request a
3494pure, reentrant parser, the global communication variables @code{yylval}
3495and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
3496Parser}.) In such parsers the two global variables are replaced by
3497pointers passed as arguments to @code{yylex}. You must declare them as
3498shown here, and pass the information back by storing it through those
3499pointers.
bfa74976
RS
3500
3501@example
13863333
AD
3502int
3503yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
bfa74976
RS
3504@{
3505 @dots{}
3506 *lvalp = value; /* Put value onto Bison stack. */
3507 return INT; /* Return the type of the token. */
3508 @dots{}
3509@}
3510@end example
3511
3512If the grammar file does not use the @samp{@@} constructs to refer to
3513textual positions, then the type @code{YYLTYPE} will not be defined. In
3514this case, omit the second argument; @code{yylex} will be called with
3515only one argument.
3516
c656404a 3517@vindex YYPARSE_PARAM
931c7513
RS
3518If you use a reentrant parser, you can optionally pass additional
3519parameter information to it in a reentrant way. To do so, define the
3520macro @code{YYPARSE_PARAM} as a variable name. This modifies the
3521@code{yyparse} function to accept one argument, of type @code{void *},
3522with that name.
e425e872
RS
3523
3524When you call @code{yyparse}, pass the address of an object, casting the
3525address to @code{void *}. The grammar actions can refer to the contents
3526of the object by casting the pointer value back to its proper type and
3527then dereferencing it. Here's an example. Write this in the parser:
3528
3529@example
3530%@{
3531struct parser_control
3532@{
3533 int nastiness;
3534 int randomness;
3535@};
3536
3537#define YYPARSE_PARAM parm
3538%@}
3539@end example
3540
3541@noindent
3542Then call the parser like this:
3543
3544@example
3545struct parser_control
3546@{
3547 int nastiness;
3548 int randomness;
3549@};
3550
3551@dots{}
3552
3553@{
3554 struct parser_control foo;
3555 @dots{} /* @r{Store proper data in @code{foo}.} */
3556 value = yyparse ((void *) &foo);
3557 @dots{}
3558@}
3559@end example
3560
3561@noindent
3562In the grammar actions, use expressions like this to refer to the data:
3563
3564@example
3565((struct parser_control *) parm)->randomness
3566@end example
3567
c656404a
RS
3568@vindex YYLEX_PARAM
3569If you wish to pass the additional parameter data to @code{yylex},
3570define the macro @code{YYLEX_PARAM} just like @code{YYPARSE_PARAM}, as
3571shown here:
3572
3573@example
3574%@{
3575struct parser_control
3576@{
3577 int nastiness;
3578 int randomness;
3579@};
3580
3581#define YYPARSE_PARAM parm
3582#define YYLEX_PARAM parm
3583%@}
3584@end example
3585
3586You should then define @code{yylex} to accept one additional
3587argument---the value of @code{parm}. (This makes either two or three
3588arguments in total, depending on whether an argument of type
3589@code{YYLTYPE} is passed.) You can declare the argument as a pointer to
3590the proper object type, or you can declare it as @code{void *} and
3591access the contents as shown above.
3592
931c7513
RS
3593You can use @samp{%pure_parser} to request a reentrant parser without
3594also using @code{YYPARSE_PARAM}. Then you should call @code{yyparse}
3595with no arguments, as usual.
3596
bfa74976
RS
3597@node Error Reporting, Action Features, Lexical, Interface
3598@section The Error Reporting Function @code{yyerror}
3599@cindex error reporting function
3600@findex yyerror
3601@cindex parse error
3602@cindex syntax error
3603
3604The Bison parser detects a @dfn{parse error} or @dfn{syntax error}
9ecbd125 3605whenever it reads a token which cannot satisfy any syntax rule. An
bfa74976 3606action in the grammar can also explicitly proclaim an error, using the
ceed8467
AD
3607macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use
3608in Actions}).
bfa74976
RS
3609
3610The Bison parser expects to report the error by calling an error
3611reporting function named @code{yyerror}, which you must supply. It is
3612called by @code{yyparse} whenever a syntax error is found, and it
3613receives one argument. For a parse error, the string is normally
3614@w{@code{"parse error"}}.
3615
3616@findex YYERROR_VERBOSE
3617If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations
ceed8467
AD
3618section (@pxref{Bison Declarations, ,The Bison Declarations Section}),
3619then Bison provides a more verbose and specific error message string
3620instead of just plain @w{@code{"parse error"}}. It doesn't matter what
3621definition you use for @code{YYERROR_VERBOSE}, just whether you define
3622it.
bfa74976
RS
3623
3624The parser can detect one other kind of error: stack overflow. This
3625happens when the input contains constructions that are very deeply
3626nested. It isn't likely you will encounter this, since the Bison
3627parser extends its stack automatically up to a very large limit. But
3628if overflow happens, @code{yyparse} calls @code{yyerror} in the usual
3629fashion, except that the argument string is @w{@code{"parser stack
3630overflow"}}.
3631
3632The following definition suffices in simple programs:
3633
3634@example
3635@group
13863333
AD
3636void
3637yyerror (char *s)
bfa74976
RS
3638@{
3639@end group
3640@group
3641 fprintf (stderr, "%s\n", s);
3642@}
3643@end group
3644@end example
3645
3646After @code{yyerror} returns to @code{yyparse}, the latter will attempt
3647error recovery if you have written suitable error recovery grammar rules
3648(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
3649immediately return 1.
3650
3651@vindex yynerrs
3652The variable @code{yynerrs} contains the number of syntax errors
3653encountered so far. Normally this variable is global; but if you
3654request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}) then it is a local variable
3655which only the actions can access.
3656
3657@node Action Features, , Error Reporting, Interface
3658@section Special Features for Use in Actions
3659@cindex summary, action features
3660@cindex action features summary
3661
3662Here is a table of Bison constructs, variables and macros that
3663are useful in actions.
3664
3665@table @samp
3666@item $$
3667Acts like a variable that contains the semantic value for the
3668grouping made by the current rule. @xref{Actions}.
3669
3670@item $@var{n}
3671Acts like a variable that contains the semantic value for the
3672@var{n}th component of the current rule. @xref{Actions}.
3673
3674@item $<@var{typealt}>$
3675Like @code{$$} but specifies alternative @var{typealt} in the union
3676specified by the @code{%union} declaration. @xref{Action Types, ,Data Types of Values in Actions}.
3677
3678@item $<@var{typealt}>@var{n}
3679Like @code{$@var{n}} but specifies alternative @var{typealt} in the
13863333 3680union specified by the @code{%union} declaration.
bfa74976
RS
3681@xref{Action Types, ,Data Types of Values in Actions}.@refill
3682
3683@item YYABORT;
3684Return immediately from @code{yyparse}, indicating failure.
3685@xref{Parser Function, ,The Parser Function @code{yyparse}}.
3686
3687@item YYACCEPT;
3688Return immediately from @code{yyparse}, indicating success.
3689@xref{Parser Function, ,The Parser Function @code{yyparse}}.
3690
3691@item YYBACKUP (@var{token}, @var{value});
3692@findex YYBACKUP
3693Unshift a token. This macro is allowed only for rules that reduce
3694a single value, and only when there is no look-ahead token.
3695It installs a look-ahead token with token type @var{token} and
3696semantic value @var{value}; then it discards the value that was
3697going to be reduced by this rule.
3698
3699If the macro is used when it is not valid, such as when there is
3700a look-ahead token already, then it reports a syntax error with
3701a message @samp{cannot back up} and performs ordinary error
3702recovery.
3703
3704In either case, the rest of the action is not executed.
3705
3706@item YYEMPTY
3707@vindex YYEMPTY
3708Value stored in @code{yychar} when there is no look-ahead token.
3709
3710@item YYERROR;
3711@findex YYERROR
3712Cause an immediate syntax error. This statement initiates error
3713recovery just as if the parser itself had detected an error; however, it
3714does not call @code{yyerror}, and does not print any message. If you
3715want to print an error message, call @code{yyerror} explicitly before
3716the @samp{YYERROR;} statement. @xref{Error Recovery}.
3717
3718@item YYRECOVERING
3719This macro stands for an expression that has the value 1 when the parser
3720is recovering from a syntax error, and 0 the rest of the time.
3721@xref{Error Recovery}.
3722
3723@item yychar
3724Variable containing the current look-ahead token. (In a pure parser,
3725this is actually a local variable within @code{yyparse}.) When there is
3726no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
3727@xref{Look-Ahead, ,Look-Ahead Tokens}.
3728
3729@item yyclearin;
3730Discard the current look-ahead token. This is useful primarily in
3731error rules. @xref{Error Recovery}.
3732
3733@item yyerrok;
3734Resume generating error messages immediately for subsequent syntax
13863333 3735errors. This is useful primarily in error rules.
bfa74976
RS
3736@xref{Error Recovery}.
3737
3738@item @@@var{n}
3739@findex @@@var{n}
3740Acts like a structure variable containing information on the line
3741numbers and column numbers of the @var{n}th component of the current
3742rule. The structure has four members, like this:
3743
3744@example
3745struct @{
3746 int first_line, last_line;
3747 int first_column, last_column;
3748@};
3749@end example
3750
6f515a27
JT
3751Thus, to get the starting line number of the third component, you would
3752use @samp{@@3.first_line}.
bfa74976
RS
3753
3754In order for the members of this structure to contain valid information,
3755you must make @code{yylex} supply this information about each token.
3756If you need only certain members, then @code{yylex} need only fill in
3757those members.
3758
3759The use of this feature makes the parser noticeably slower.
3760@end table
3761
3762@node Algorithm, Error Recovery, Interface, Top
13863333
AD
3763@chapter The Bison Parser Algorithm
3764@cindex Bison parser algorithm
bfa74976
RS
3765@cindex algorithm of parser
3766@cindex shifting
3767@cindex reduction
3768@cindex parser stack
3769@cindex stack, parser
3770
3771As Bison reads tokens, it pushes them onto a stack along with their
3772semantic values. The stack is called the @dfn{parser stack}. Pushing a
3773token is traditionally called @dfn{shifting}.
3774
3775For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
3776@samp{3} to come. The stack will have four elements, one for each token
3777that was shifted.
3778
3779But the stack does not always have an element for each token read. When
3780the last @var{n} tokens and groupings shifted match the components of a
3781grammar rule, they can be combined according to that rule. This is called
3782@dfn{reduction}. Those tokens and groupings are replaced on the stack by a
3783single grouping whose symbol is the result (left hand side) of that rule.
3784Running the rule's action is part of the process of reduction, because this
3785is what computes the semantic value of the resulting grouping.
3786
3787For example, if the infix calculator's parser stack contains this:
3788
3789@example
37901 + 5 * 3
3791@end example
3792
3793@noindent
3794and the next input token is a newline character, then the last three
3795elements can be reduced to 15 via the rule:
3796
3797@example
3798expr: expr '*' expr;
3799@end example
3800
3801@noindent
3802Then the stack contains just these three elements:
3803
3804@example
38051 + 15
3806@end example
3807
3808@noindent
3809At this point, another reduction can be made, resulting in the single value
381016. Then the newline token can be shifted.
3811
3812The parser tries, by shifts and reductions, to reduce the entire input down
3813to a single grouping whose symbol is the grammar's start-symbol
3814(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
3815
3816This kind of parser is known in the literature as a bottom-up parser.
3817
3818@menu
3819* Look-Ahead:: Parser looks one token ahead when deciding what to do.
3820* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
3821* Precedence:: Operator precedence works by resolving conflicts.
3822* Contextual Precedence:: When an operator's precedence depends on context.
3823* Parser States:: The parser is a finite-state-machine with stack.
3824* Reduce/Reduce:: When two rules are applicable in the same situation.
3825* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
3826* Stack Overflow:: What happens when stack gets full. How to avoid it.
3827@end menu
3828
3829@node Look-Ahead, Shift/Reduce, , Algorithm
3830@section Look-Ahead Tokens
3831@cindex look-ahead token
3832
3833The Bison parser does @emph{not} always reduce immediately as soon as the
3834last @var{n} tokens and groupings match a rule. This is because such a
3835simple strategy is inadequate to handle most languages. Instead, when a
3836reduction is possible, the parser sometimes ``looks ahead'' at the next
3837token in order to decide what to do.
3838
3839When a token is read, it is not immediately shifted; first it becomes the
3840@dfn{look-ahead token}, which is not on the stack. Now the parser can
3841perform one or more reductions of tokens and groupings on the stack, while
3842the look-ahead token remains off to the side. When no more reductions
3843should take place, the look-ahead token is shifted onto the stack. This
3844does not mean that all possible reductions have been done; depending on the
3845token type of the look-ahead token, some rules may choose to delay their
3846application.
3847
3848Here is a simple case where look-ahead is needed. These three rules define
3849expressions which contain binary addition operators and postfix unary
3850factorial operators (@samp{!}), and allow parentheses for grouping.
3851
3852@example
3853@group
3854expr: term '+' expr
3855 | term
3856 ;
3857@end group
3858
3859@group
3860term: '(' expr ')'
3861 | term '!'
3862 | NUMBER
3863 ;
3864@end group
3865@end example
3866
3867Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
3868should be done? If the following token is @samp{)}, then the first three
3869tokens must be reduced to form an @code{expr}. This is the only valid
3870course, because shifting the @samp{)} would produce a sequence of symbols
3871@w{@code{term ')'}}, and no rule allows this.
3872
3873If the following token is @samp{!}, then it must be shifted immediately so
3874that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
3875parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
3876@code{expr}. It would then be impossible to shift the @samp{!} because
3877doing so would produce on the stack the sequence of symbols @code{expr
3878'!'}. No rule allows that sequence.
3879
3880@vindex yychar
3881The current look-ahead token is stored in the variable @code{yychar}.
3882@xref{Action Features, ,Special Features for Use in Actions}.
3883
3884@node Shift/Reduce, Precedence, Look-Ahead, Algorithm
3885@section Shift/Reduce Conflicts
3886@cindex conflicts
3887@cindex shift/reduce conflicts
3888@cindex dangling @code{else}
3889@cindex @code{else}, dangling
3890
3891Suppose we are parsing a language which has if-then and if-then-else
3892statements, with a pair of rules like this:
3893
3894@example
3895@group
3896if_stmt:
3897 IF expr THEN stmt
3898 | IF expr THEN stmt ELSE stmt
3899 ;
3900@end group
3901@end example
3902
3903@noindent
3904Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
3905terminal symbols for specific keyword tokens.
3906
3907When the @code{ELSE} token is read and becomes the look-ahead token, the
3908contents of the stack (assuming the input is valid) are just right for
3909reduction by the first rule. But it is also legitimate to shift the
3910@code{ELSE}, because that would lead to eventual reduction by the second
3911rule.
3912
3913This situation, where either a shift or a reduction would be valid, is
3914called a @dfn{shift/reduce conflict}. Bison is designed to resolve
3915these conflicts by choosing to shift, unless otherwise directed by
3916operator precedence declarations. To see the reason for this, let's
3917contrast it with the other alternative.
3918
3919Since the parser prefers to shift the @code{ELSE}, the result is to attach
3920the else-clause to the innermost if-statement, making these two inputs
3921equivalent:
3922
3923@example
3924if x then if y then win (); else lose;
3925
3926if x then do; if y then win (); else lose; end;
3927@end example
3928
3929But if the parser chose to reduce when possible rather than shift, the
3930result would be to attach the else-clause to the outermost if-statement,
3931making these two inputs equivalent:
3932
3933@example
3934if x then if y then win (); else lose;
3935
3936if x then do; if y then win (); end; else lose;
3937@end example
3938
3939The conflict exists because the grammar as written is ambiguous: either
3940parsing of the simple nested if-statement is legitimate. The established
3941convention is that these ambiguities are resolved by attaching the
3942else-clause to the innermost if-statement; this is what Bison accomplishes
3943by choosing to shift rather than reduce. (It would ideally be cleaner to
3944write an unambiguous grammar, but that is very hard to do in this case.)
3945This particular ambiguity was first encountered in the specifications of
3946Algol 60 and is called the ``dangling @code{else}'' ambiguity.
3947
3948To avoid warnings from Bison about predictable, legitimate shift/reduce
3949conflicts, use the @code{%expect @var{n}} declaration. There will be no
3950warning as long as the number of shift/reduce conflicts is exactly @var{n}.
3951@xref{Expect Decl, ,Suppressing Conflict Warnings}.
3952
3953The definition of @code{if_stmt} above is solely to blame for the
3954conflict, but the conflict does not actually appear without additional
3955rules. Here is a complete Bison input file that actually manifests the
3956conflict:
3957
3958@example
3959@group
3960%token IF THEN ELSE variable
3961%%
3962@end group
3963@group
3964stmt: expr
3965 | if_stmt
3966 ;
3967@end group
3968
3969@group
3970if_stmt:
3971 IF expr THEN stmt
3972 | IF expr THEN stmt ELSE stmt
3973 ;
3974@end group
3975
3976expr: variable
3977 ;
3978@end example
3979
3980@node Precedence, Contextual Precedence, Shift/Reduce, Algorithm
3981@section Operator Precedence
3982@cindex operator precedence
3983@cindex precedence of operators
3984
3985Another situation where shift/reduce conflicts appear is in arithmetic
3986expressions. Here shifting is not always the preferred resolution; the
3987Bison declarations for operator precedence allow you to specify when to
3988shift and when to reduce.
3989
3990@menu
3991* Why Precedence:: An example showing why precedence is needed.
3992* Using Precedence:: How to specify precedence in Bison grammars.
3993* Precedence Examples:: How these features are used in the previous example.
3994* How Precedence:: How they work.
3995@end menu
3996
3997@node Why Precedence, Using Precedence, , Precedence
3998@subsection When Precedence is Needed
3999
4000Consider the following ambiguous grammar fragment (ambiguous because the
4001input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
4002
4003@example
4004@group
4005expr: expr '-' expr
4006 | expr '*' expr
4007 | expr '<' expr
4008 | '(' expr ')'
4009 @dots{}
4010 ;
4011@end group
4012@end example
4013
4014@noindent
4015Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
9ecbd125 4016should it reduce them via the rule for the subtraction operator? It depends
bfa74976
RS
4017on the next token. Of course, if the next token is @samp{)}, we must
4018reduce; shifting is invalid because no single rule can reduce the token
4019sequence @w{@samp{- 2 )}} or anything starting with that. But if the next
4020token is @samp{*} or @samp{<}, we have a choice: either shifting or
4021reduction would allow the parse to complete, but with different
4022results.
4023
4024To decide which one Bison should do, we must consider the
4025results. If the next operator token @var{op} is shifted, then it
4026must be reduced first in order to permit another opportunity to
9ecbd125 4027reduce the difference. The result is (in effect) @w{@samp{1 - (2
bfa74976
RS
4028@var{op} 3)}}. On the other hand, if the subtraction is reduced
4029before shifting @var{op}, the result is @w{@samp{(1 - 2) @var{op}
40303}}. Clearly, then, the choice of shift or reduce should depend
4031on the relative precedence of the operators @samp{-} and
4032@var{op}: @samp{*} should be shifted first, but not @samp{<}.
4033
4034@cindex associativity
4035What about input such as @w{@samp{1 - 2 - 5}}; should this be
4036@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For
4037most operators we prefer the former, which is called @dfn{left
4038association}. The latter alternative, @dfn{right association}, is
4039desirable for assignment operators. The choice of left or right
4040association is a matter of whether the parser chooses to shift or
4041reduce when the stack contains @w{@samp{1 - 2}} and the look-ahead
4042token is @samp{-}: shifting makes right-associativity.
4043
4044@node Using Precedence, Precedence Examples, Why Precedence, Precedence
4045@subsection Specifying Operator Precedence
4046@findex %left
4047@findex %right
4048@findex %nonassoc
4049
4050Bison allows you to specify these choices with the operator precedence
4051declarations @code{%left} and @code{%right}. Each such declaration
4052contains a list of tokens, which are operators whose precedence and
4053associativity is being declared. The @code{%left} declaration makes all
4054those operators left-associative and the @code{%right} declaration makes
4055them right-associative. A third alternative is @code{%nonassoc}, which
4056declares that it is a syntax error to find the same operator twice ``in a
4057row''.
4058
4059The relative precedence of different operators is controlled by the
4060order in which they are declared. The first @code{%left} or
4061@code{%right} declaration in the file declares the operators whose
4062precedence is lowest, the next such declaration declares the operators
4063whose precedence is a little higher, and so on.
4064
4065@node Precedence Examples, How Precedence, Using Precedence, Precedence
4066@subsection Precedence Examples
4067
4068In our example, we would want the following declarations:
4069
4070@example
4071%left '<'
4072%left '-'
4073%left '*'
4074@end example
4075
4076In a more complete example, which supports other operators as well, we
4077would declare them in groups of equal precedence. For example, @code{'+'} is
4078declared with @code{'-'}:
4079
4080@example
4081%left '<' '>' '=' NE LE GE
4082%left '+' '-'
4083%left '*' '/'
4084@end example
4085
4086@noindent
4087(Here @code{NE} and so on stand for the operators for ``not equal''
4088and so on. We assume that these tokens are more than one character long
4089and therefore are represented by names, not character literals.)
4090
4091@node How Precedence, , Precedence Examples, Precedence
4092@subsection How Precedence Works
4093
4094The first effect of the precedence declarations is to assign precedence
4095levels to the terminal symbols declared. The second effect is to assign
4096precedence levels to certain rules: each rule gets its precedence from the
4097last terminal symbol mentioned in the components. (You can also specify
4098explicitly the precedence of a rule. @xref{Contextual Precedence, ,Context-Dependent Precedence}.)
4099
4100Finally, the resolution of conflicts works by comparing the
4101precedence of the rule being considered with that of the
4102look-ahead token. If the token's precedence is higher, the
4103choice is to shift. If the rule's precedence is higher, the
4104choice is to reduce. If they have equal precedence, the choice
4105is made based on the associativity of that precedence level. The
4106verbose output file made by @samp{-v} (@pxref{Invocation, ,Invoking Bison}) says
4107how each conflict was resolved.
4108
4109Not all rules and not all tokens have precedence. If either the rule or
4110the look-ahead token has no precedence, then the default is to shift.
4111
4112@node Contextual Precedence, Parser States, Precedence, Algorithm
4113@section Context-Dependent Precedence
4114@cindex context-dependent precedence
4115@cindex unary operator precedence
4116@cindex precedence, context-dependent
4117@cindex precedence, unary operator
4118@findex %prec
4119
4120Often the precedence of an operator depends on the context. This sounds
4121outlandish at first, but it is really very common. For example, a minus
4122sign typically has a very high precedence as a unary operator, and a
4123somewhat lower precedence (lower than multiplication) as a binary operator.
4124
4125The Bison precedence declarations, @code{%left}, @code{%right} and
4126@code{%nonassoc}, can only be used once for a given token; so a token has
4127only one precedence declared in this way. For context-dependent
4128precedence, you need to use an additional mechanism: the @code{%prec}
4129modifier for rules.@refill
4130
4131The @code{%prec} modifier declares the precedence of a particular rule by
4132specifying a terminal symbol whose precedence should be used for that rule.
4133It's not necessary for that symbol to appear otherwise in the rule. The
4134modifier's syntax is:
4135
4136@example
4137%prec @var{terminal-symbol}
4138@end example
4139
4140@noindent
4141and it is written after the components of the rule. Its effect is to
4142assign the rule the precedence of @var{terminal-symbol}, overriding
4143the precedence that would be deduced for it in the ordinary way. The
4144altered rule precedence then affects how conflicts involving that rule
4145are resolved (@pxref{Precedence, ,Operator Precedence}).
4146
4147Here is how @code{%prec} solves the problem of unary minus. First, declare
4148a precedence for a fictitious terminal symbol named @code{UMINUS}. There
4149are no tokens of this type, but the symbol serves to stand for its
4150precedence:
4151
4152@example
4153@dots{}
4154%left '+' '-'
4155%left '*'
4156%left UMINUS
4157@end example
4158
4159Now the precedence of @code{UMINUS} can be used in specific rules:
4160
4161@example
4162@group
4163exp: @dots{}
4164 | exp '-' exp
4165 @dots{}
4166 | '-' exp %prec UMINUS
4167@end group
4168@end example
4169
4170@node Parser States, Reduce/Reduce, Contextual Precedence, Algorithm
4171@section Parser States
4172@cindex finite-state machine
4173@cindex parser state
4174@cindex state (of parser)
4175
4176The function @code{yyparse} is implemented using a finite-state machine.
4177The values pushed on the parser stack are not simply token type codes; they
4178represent the entire sequence of terminal and nonterminal symbols at or
4179near the top of the stack. The current state collects all the information
4180about previous input which is relevant to deciding what to do next.
4181
4182Each time a look-ahead token is read, the current parser state together
4183with the type of look-ahead token are looked up in a table. This table
4184entry can say, ``Shift the look-ahead token.'' In this case, it also
4185specifies the new parser state, which is pushed onto the top of the
4186parser stack. Or it can say, ``Reduce using rule number @var{n}.''
4187This means that a certain number of tokens or groupings are taken off
4188the top of the stack, and replaced by one grouping. In other words,
4189that number of states are popped from the stack, and one new state is
4190pushed.
4191
4192There is one other alternative: the table can say that the look-ahead token
4193is erroneous in the current state. This causes error processing to begin
4194(@pxref{Error Recovery}).
4195
4196@node Reduce/Reduce, Mystery Conflicts, Parser States, Algorithm
4197@section Reduce/Reduce Conflicts
4198@cindex reduce/reduce conflict
4199@cindex conflicts, reduce/reduce
4200
4201A reduce/reduce conflict occurs if there are two or more rules that apply
4202to the same sequence of input. This usually indicates a serious error
4203in the grammar.
4204
4205For example, here is an erroneous attempt to define a sequence
4206of zero or more @code{word} groupings.
4207
4208@example
4209sequence: /* empty */
4210 @{ printf ("empty sequence\n"); @}
4211 | maybeword
4212 | sequence word
4213 @{ printf ("added word %s\n", $2); @}
4214 ;
4215
4216maybeword: /* empty */
4217 @{ printf ("empty maybeword\n"); @}
4218 | word
4219 @{ printf ("single word %s\n", $1); @}
4220 ;
4221@end example
4222
4223@noindent
4224The error is an ambiguity: there is more than one way to parse a single
4225@code{word} into a @code{sequence}. It could be reduced to a
4226@code{maybeword} and then into a @code{sequence} via the second rule.
4227Alternatively, nothing-at-all could be reduced into a @code{sequence}
4228via the first rule, and this could be combined with the @code{word}
4229using the third rule for @code{sequence}.
4230
4231There is also more than one way to reduce nothing-at-all into a
4232@code{sequence}. This can be done directly via the first rule,
4233or indirectly via @code{maybeword} and then the second rule.
4234
4235You might think that this is a distinction without a difference, because it
4236does not change whether any particular input is valid or not. But it does
4237affect which actions are run. One parsing order runs the second rule's
4238action; the other runs the first rule's action and the third rule's action.
4239In this example, the output of the program changes.
4240
4241Bison resolves a reduce/reduce conflict by choosing to use the rule that
4242appears first in the grammar, but it is very risky to rely on this. Every
4243reduce/reduce conflict must be studied and usually eliminated. Here is the
4244proper way to define @code{sequence}:
4245
4246@example
4247sequence: /* empty */
4248 @{ printf ("empty sequence\n"); @}
4249 | sequence word
4250 @{ printf ("added word %s\n", $2); @}
4251 ;
4252@end example
4253
4254Here is another common error that yields a reduce/reduce conflict:
4255
4256@example
4257sequence: /* empty */
4258 | sequence words
4259 | sequence redirects
4260 ;
4261
4262words: /* empty */
4263 | words word
4264 ;
4265
4266redirects:/* empty */
4267 | redirects redirect
4268 ;
4269@end example
4270
4271@noindent
4272The intention here is to define a sequence which can contain either
4273@code{word} or @code{redirect} groupings. The individual definitions of
4274@code{sequence}, @code{words} and @code{redirects} are error-free, but the
4275three together make a subtle ambiguity: even an empty input can be parsed
4276in infinitely many ways!
4277
4278Consider: nothing-at-all could be a @code{words}. Or it could be two
4279@code{words} in a row, or three, or any number. It could equally well be a
4280@code{redirects}, or two, or any number. Or it could be a @code{words}
4281followed by three @code{redirects} and another @code{words}. And so on.
4282
4283Here are two ways to correct these rules. First, to make it a single level
4284of sequence:
4285
4286@example
4287sequence: /* empty */
4288 | sequence word
4289 | sequence redirect
4290 ;
4291@end example
4292
4293Second, to prevent either a @code{words} or a @code{redirects}
4294from being empty:
4295
4296@example
4297sequence: /* empty */
4298 | sequence words
4299 | sequence redirects
4300 ;
4301
4302words: word
4303 | words word
4304 ;
4305
4306redirects:redirect
4307 | redirects redirect
4308 ;
4309@end example
4310
4311@node Mystery Conflicts, Stack Overflow, Reduce/Reduce, Algorithm
4312@section Mysterious Reduce/Reduce Conflicts
4313
4314Sometimes reduce/reduce conflicts can occur that don't look warranted.
4315Here is an example:
4316
4317@example
4318@group
4319%token ID
4320
4321%%
4322def: param_spec return_spec ','
4323 ;
4324param_spec:
4325 type
4326 | name_list ':' type
4327 ;
4328@end group
4329@group
4330return_spec:
4331 type
4332 | name ':' type
4333 ;
4334@end group
4335@group
4336type: ID
4337 ;
4338@end group
4339@group
4340name: ID
4341 ;
4342name_list:
4343 name
4344 | name ',' name_list
4345 ;
4346@end group
4347@end example
4348
4349It would seem that this grammar can be parsed with only a single token
13863333 4350of look-ahead: when a @code{param_spec} is being read, an @code{ID} is
bfa74976
RS
4351a @code{name} if a comma or colon follows, or a @code{type} if another
4352@code{ID} follows. In other words, this grammar is LR(1).
4353
4354@cindex LR(1)
4355@cindex LALR(1)
4356However, Bison, like most parser generators, cannot actually handle all
4357LR(1) grammars. In this grammar, two contexts, that after an @code{ID}
4358at the beginning of a @code{param_spec} and likewise at the beginning of
4359a @code{return_spec}, are similar enough that Bison assumes they are the
4360same. They appear similar because the same set of rules would be
4361active---the rule for reducing to a @code{name} and that for reducing to
4362a @code{type}. Bison is unable to determine at that stage of processing
4363that the rules would require different look-ahead tokens in the two
4364contexts, so it makes a single parser state for them both. Combining
4365the two contexts causes a conflict later. In parser terminology, this
4366occurrence means that the grammar is not LALR(1).
4367
4368In general, it is better to fix deficiencies than to document them. But
4369this particular deficiency is intrinsically hard to fix; parser
4370generators that can handle LR(1) grammars are hard to write and tend to
4371produce parsers that are very large. In practice, Bison is more useful
4372as it is now.
4373
4374When the problem arises, you can often fix it by identifying the two
4375parser states that are being confused, and adding something to make them
4376look distinct. In the above example, adding one rule to
4377@code{return_spec} as follows makes the problem go away:
4378
4379@example
4380@group
4381%token BOGUS
4382@dots{}
4383%%
4384@dots{}
4385return_spec:
4386 type
4387 | name ':' type
4388 /* This rule is never used. */
4389 | ID BOGUS
4390 ;
4391@end group
4392@end example
4393
4394This corrects the problem because it introduces the possibility of an
4395additional active rule in the context after the @code{ID} at the beginning of
4396@code{return_spec}. This rule is not active in the corresponding context
4397in a @code{param_spec}, so the two contexts receive distinct parser states.
4398As long as the token @code{BOGUS} is never generated by @code{yylex},
4399the added rule cannot alter the way actual input is parsed.
4400
4401In this particular example, there is another way to solve the problem:
4402rewrite the rule for @code{return_spec} to use @code{ID} directly
4403instead of via @code{name}. This also causes the two confusing
4404contexts to have different sets of active rules, because the one for
4405@code{return_spec} activates the altered rule for @code{return_spec}
4406rather than the one for @code{name}.
4407
4408@example
4409param_spec:
4410 type
4411 | name_list ':' type
4412 ;
4413return_spec:
4414 type
4415 | ID ':' type
4416 ;
4417@end example
4418
4419@node Stack Overflow, , Mystery Conflicts, Algorithm
4420@section Stack Overflow, and How to Avoid It
4421@cindex stack overflow
4422@cindex parser stack overflow
4423@cindex overflow of parser stack
4424
4425The Bison parser stack can overflow if too many tokens are shifted and
4426not reduced. When this happens, the parser function @code{yyparse}
4427returns a nonzero value, pausing only to call @code{yyerror} to report
4428the overflow.
4429
4430@vindex YYMAXDEPTH
4431By defining the macro @code{YYMAXDEPTH}, you can control how deep the
4432parser stack can become before a stack overflow occurs. Define the
4433macro with a value that is an integer. This value is the maximum number
4434of tokens that can be shifted (and not reduced) before overflow.
4435It must be a constant expression whose value is known at compile time.
4436
4437The stack space allowed is not necessarily allocated. If you specify a
4438large value for @code{YYMAXDEPTH}, the parser actually allocates a small
4439stack at first, and then makes it bigger by stages as needed. This
4440increasing allocation happens automatically and silently. Therefore,
4441you do not need to make @code{YYMAXDEPTH} painfully small merely to save
4442space for ordinary inputs that do not need much stack.
4443
4444@cindex default stack limit
4445The default value of @code{YYMAXDEPTH}, if you do not define it, is
444610000.
4447
4448@vindex YYINITDEPTH
4449You can control how much stack is allocated initially by defining the
4450macro @code{YYINITDEPTH}. This value too must be a compile-time
4451constant integer. The default is 200.
4452
4453@node Error Recovery, Context Dependency, Algorithm, Top
4454@chapter Error Recovery
4455@cindex error recovery
4456@cindex recovery from errors
4457
4458It is not usually acceptable to have a program terminate on a parse
4459error. For example, a compiler should recover sufficiently to parse the
4460rest of the input file and check it for errors; a calculator should accept
4461another expression.
4462
4463In a simple interactive command parser where each input is one line, it may
4464be sufficient to allow @code{yyparse} to return 1 on error and have the
4465caller ignore the rest of the input line when that happens (and then call
4466@code{yyparse} again). But this is inadequate for a compiler, because it
4467forgets all the syntactic context leading up to the error. A syntax error
4468deep within a function in the compiler input should not cause the compiler
4469to treat the following line like the beginning of a source file.
4470
4471@findex error
4472You can define how to recover from a syntax error by writing rules to
4473recognize the special token @code{error}. This is a terminal symbol that
4474is always defined (you need not declare it) and reserved for error
4475handling. The Bison parser generates an @code{error} token whenever a
4476syntax error happens; if you have provided a rule to recognize this token
13863333 4477in the current context, the parse can continue.
bfa74976
RS
4478
4479For example:
4480
4481@example
4482stmnts: /* empty string */
4483 | stmnts '\n'
4484 | stmnts exp '\n'
4485 | stmnts error '\n'
4486@end example
4487
4488The fourth rule in this example says that an error followed by a newline
4489makes a valid addition to any @code{stmnts}.
4490
4491What happens if a syntax error occurs in the middle of an @code{exp}? The
4492error recovery rule, interpreted strictly, applies to the precise sequence
4493of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
4494the middle of an @code{exp}, there will probably be some additional tokens
4495and subexpressions on the stack after the last @code{stmnts}, and there
4496will be tokens to read before the next newline. So the rule is not
4497applicable in the ordinary way.
4498
4499But Bison can force the situation to fit the rule, by discarding part of
4500the semantic context and part of the input. First it discards states and
4501objects from the stack until it gets back to a state in which the
4502@code{error} token is acceptable. (This means that the subexpressions
4503already parsed are discarded, back to the last complete @code{stmnts}.) At
4504this point the @code{error} token can be shifted. Then, if the old
4505look-ahead token is not acceptable to be shifted next, the parser reads
4506tokens and discards them until it finds a token which is acceptable. In
4507this example, Bison reads and discards input until the next newline
4508so that the fourth rule can apply.
4509
4510The choice of error rules in the grammar is a choice of strategies for
4511error recovery. A simple and useful strategy is simply to skip the rest of
4512the current input line or current statement if an error is detected:
4513
4514@example
4515stmnt: error ';' /* on error, skip until ';' is read */
4516@end example
4517
4518It is also useful to recover to the matching close-delimiter of an
4519opening-delimiter that has already been parsed. Otherwise the
4520close-delimiter will probably appear to be unmatched, and generate another,
4521spurious error message:
4522
4523@example
4524primary: '(' expr ')'
4525 | '(' error ')'
4526 @dots{}
4527 ;
4528@end example
4529
4530Error recovery strategies are necessarily guesses. When they guess wrong,
4531one syntax error often leads to another. In the above example, the error
4532recovery rule guesses that an error is due to bad input within one
4533@code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
4534middle of a valid @code{stmnt}. After the error recovery rule recovers
4535from the first error, another syntax error will be found straightaway,
4536since the text following the spurious semicolon is also an invalid
4537@code{stmnt}.
4538
4539To prevent an outpouring of error messages, the parser will output no error
4540message for another syntax error that happens shortly after the first; only
4541after three consecutive input tokens have been successfully shifted will
4542error messages resume.
4543
4544Note that rules which accept the @code{error} token may have actions, just
4545as any other rules can.
4546
4547@findex yyerrok
4548You can make error messages resume immediately by using the macro
4549@code{yyerrok} in an action. If you do this in the error rule's action, no
4550error messages will be suppressed. This macro requires no arguments;
4551@samp{yyerrok;} is a valid C statement.
4552
4553@findex yyclearin
4554The previous look-ahead token is reanalyzed immediately after an error. If
4555this is unacceptable, then the macro @code{yyclearin} may be used to clear
4556this token. Write the statement @samp{yyclearin;} in the error rule's
4557action.
4558
4559For example, suppose that on a parse error, an error handling routine is
4560called that advances the input stream to some point where parsing should
4561once again commence. The next symbol returned by the lexical scanner is
4562probably correct. The previous look-ahead token ought to be discarded
4563with @samp{yyclearin;}.
4564
4565@vindex YYRECOVERING
4566The macro @code{YYRECOVERING} stands for an expression that has the
4567value 1 when the parser is recovering from a syntax error, and 0 the
4568rest of the time. A value of 1 indicates that error messages are
4569currently suppressed for new syntax errors.
4570
4571@node Context Dependency, Debugging, Error Recovery, Top
4572@chapter Handling Context Dependencies
4573
4574The Bison paradigm is to parse tokens first, then group them into larger
4575syntactic units. In many languages, the meaning of a token is affected by
4576its context. Although this violates the Bison paradigm, certain techniques
4577(known as @dfn{kludges}) may enable you to write Bison parsers for such
4578languages.
4579
4580@menu
4581* Semantic Tokens:: Token parsing can depend on the semantic context.
4582* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
4583* Tie-in Recovery:: Lexical tie-ins have implications for how
4584 error recovery rules must be written.
4585@end menu
4586
4587(Actually, ``kludge'' means any technique that gets its job done but is
4588neither clean nor robust.)
4589
4590@node Semantic Tokens, Lexical Tie-ins, , Context Dependency
4591@section Semantic Info in Token Types
4592
4593The C language has a context dependency: the way an identifier is used
4594depends on what its current meaning is. For example, consider this:
4595
4596@example
4597foo (x);
4598@end example
4599
4600This looks like a function call statement, but if @code{foo} is a typedef
4601name, then this is actually a declaration of @code{x}. How can a Bison
4602parser for C decide how to parse this input?
4603
4604The method used in GNU C is to have two different token types,
4605@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
4606identifier, it looks up the current declaration of the identifier in order
4607to decide which token type to return: @code{TYPENAME} if the identifier is
4608declared as a typedef, @code{IDENTIFIER} otherwise.
4609
4610The grammar rules can then express the context dependency by the choice of
4611token type to recognize. @code{IDENTIFIER} is accepted as an expression,
4612but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
4613@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
4614is @emph{not} significant, such as in declarations that can shadow a
4615typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
4616accepted---there is one rule for each of the two token types.
4617
4618This technique is simple to use if the decision of which kinds of
4619identifiers to allow is made at a place close to where the identifier is
4620parsed. But in C this is not always so: C allows a declaration to
4621redeclare a typedef name provided an explicit type has been specified
4622earlier:
4623
4624@example
4625typedef int foo, bar, lose;
4626static foo (bar); /* @r{redeclare @code{bar} as static variable} */
4627static int foo (lose); /* @r{redeclare @code{foo} as function} */
4628@end example
4629
4630Unfortunately, the name being declared is separated from the declaration
4631construct itself by a complicated syntactic structure---the ``declarator''.
4632
9ecbd125 4633As a result, part of the Bison parser for C needs to be duplicated, with
bfa74976
RS
4634all the nonterminal names changed: once for parsing a declaration in which
4635a typedef name can be redefined, and once for parsing a declaration in
4636which that can't be done. Here is a part of the duplication, with actions
4637omitted for brevity:
4638
4639@example
4640initdcl:
4641 declarator maybeasm '='
4642 init
4643 | declarator maybeasm
4644 ;
4645
4646notype_initdcl:
4647 notype_declarator maybeasm '='
4648 init
4649 | notype_declarator maybeasm
4650 ;
4651@end example
4652
4653@noindent
4654Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
4655cannot. The distinction between @code{declarator} and
4656@code{notype_declarator} is the same sort of thing.
4657
4658There is some similarity between this technique and a lexical tie-in
4659(described next), in that information which alters the lexical analysis is
4660changed during parsing by other parts of the program. The difference is
4661here the information is global, and is used for other purposes in the
4662program. A true lexical tie-in has a special-purpose flag controlled by
4663the syntactic context.
4664
4665@node Lexical Tie-ins, Tie-in Recovery, Semantic Tokens, Context Dependency
4666@section Lexical Tie-ins
4667@cindex lexical tie-in
4668
4669One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
4670which is set by Bison actions, whose purpose is to alter the way tokens are
4671parsed.
4672
4673For example, suppose we have a language vaguely like C, but with a special
4674construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
4675an expression in parentheses in which all integers are hexadecimal. In
4676particular, the token @samp{a1b} must be treated as an integer rather than
4677as an identifier if it appears in that context. Here is how you can do it:
4678
4679@example
4680@group
4681%@{
4682int hexflag;
4683%@}
4684%%
4685@dots{}
4686@end group
4687@group
4688expr: IDENTIFIER
4689 | constant
4690 | HEX '('
4691 @{ hexflag = 1; @}
4692 expr ')'
4693 @{ hexflag = 0;
4694 $$ = $4; @}
4695 | expr '+' expr
4696 @{ $$ = make_sum ($1, $3); @}
4697 @dots{}
4698 ;
4699@end group
4700
4701@group
4702constant:
4703 INTEGER
4704 | STRING
4705 ;
4706@end group
4707@end example
4708
4709@noindent
4710Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
4711it is nonzero, all integers are parsed in hexadecimal, and tokens starting
4712with letters are parsed as integers if possible.
4713
4714The declaration of @code{hexflag} shown in the C declarations section of
13863333 4715the parser file is needed to make it accessible to the actions
bfa74976
RS
4716(@pxref{C Declarations, ,The C Declarations Section}). You must also write the code in @code{yylex}
4717to obey the flag.
4718
4719@node Tie-in Recovery, , Lexical Tie-ins, Context Dependency
4720@section Lexical Tie-ins and Error Recovery
4721
4722Lexical tie-ins make strict demands on any error recovery rules you have.
4723@xref{Error Recovery}.
4724
4725The reason for this is that the purpose of an error recovery rule is to
4726abort the parsing of one construct and resume in some larger construct.
4727For example, in C-like languages, a typical error recovery rule is to skip
4728tokens until the next semicolon, and then start a new statement, like this:
4729
4730@example
4731stmt: expr ';'
4732 | IF '(' expr ')' stmt @{ @dots{} @}
4733 @dots{}
4734 error ';'
4735 @{ hexflag = 0; @}
4736 ;
4737@end example
4738
4739If there is a syntax error in the middle of a @samp{hex (@var{expr})}
4740construct, this error rule will apply, and then the action for the
4741completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
4742remain set for the entire rest of the input, or until the next @code{hex}
4743keyword, causing identifiers to be misinterpreted as integers.
4744
4745To avoid this problem the error recovery rule itself clears @code{hexflag}.
4746
4747There may also be an error recovery rule that works within expressions.
4748For example, there could be a rule which applies within parentheses
4749and skips to the close-parenthesis:
4750
4751@example
4752@group
4753expr: @dots{}
4754 | '(' expr ')'
4755 @{ $$ = $2; @}
4756 | '(' error ')'
4757 @dots{}
4758@end group
4759@end example
4760
4761If this rule acts within the @code{hex} construct, it is not going to abort
4762that construct (since it applies to an inner level of parentheses within
4763the construct). Therefore, it should not clear the flag: the rest of
4764the @code{hex} construct should be parsed with the flag still in effect.
4765
4766What if there is an error recovery rule which might abort out of the
4767@code{hex} construct or might not, depending on circumstances? There is no
4768way you can write the action to determine whether a @code{hex} construct is
4769being aborted or not. So if you are using a lexical tie-in, you had better
4770make sure your error recovery rules are not of this kind. Each rule must
4771be such that you can be sure that it always will, or always won't, have to
4772clear the flag.
4773
4774@node Debugging, Invocation, Context Dependency, Top
4775@chapter Debugging Your Parser
4776@findex YYDEBUG
4777@findex yydebug
4778@cindex debugging
4779@cindex tracing the parser
4780
4781If a Bison grammar compiles properly but doesn't do what you want when it
4782runs, the @code{yydebug} parser-trace feature can help you figure out why.
4783
4784To enable compilation of trace facilities, you must define the macro
4785@code{YYDEBUG} when you compile the parser. You could use
4786@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
13863333 4787YYDEBUG 1} in the C declarations section of the grammar file
bfa74976
RS
4788(@pxref{C Declarations, ,The C Declarations Section}). Alternatively, use the @samp{-t} option when
4789you run Bison (@pxref{Invocation, ,Invoking Bison}). We always define @code{YYDEBUG} so that
4790debugging is always possible.
4791
4792The trace facility uses @code{stderr}, so you must add @w{@code{#include
4793<stdio.h>}} to the C declarations section unless it is already there.
4794
4795Once you have compiled the program with trace facilities, the way to
4796request a trace is to store a nonzero value in the variable @code{yydebug}.
4797You can do this by making the C code do it (in @code{main}, perhaps), or
4798you can alter the value with a C debugger.
4799
4800Each step taken by the parser when @code{yydebug} is nonzero produces a
4801line or two of trace information, written on @code{stderr}. The trace
4802messages tell you these things:
4803
4804@itemize @bullet
4805@item
4806Each time the parser calls @code{yylex}, what kind of token was read.
4807
4808@item
4809Each time a token is shifted, the depth and complete contents of the
4810state stack (@pxref{Parser States}).
4811
4812@item
4813Each time a rule is reduced, which rule it is, and the complete contents
4814of the state stack afterward.
4815@end itemize
4816
4817To make sense of this information, it helps to refer to the listing file
4818produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking Bison}). This file
4819shows the meaning of each state in terms of positions in various rules, and
4820also what each state will do with each possible input token. As you read
4821the successive trace messages, you can see that the parser is functioning
4822according to its specification in the listing file. Eventually you will
4823arrive at the place where something undesirable happens, and you will see
4824which parts of the grammar are to blame.
4825
4826The parser file is a C program and you can use C debuggers on it, but it's
4827not easy to interpret what it is doing. The parser function is a
4828finite-state machine interpreter, and aside from the actions it executes
4829the same code over and over. Only the values of variables show where in
4830the grammar it is working.
4831
4832@findex YYPRINT
4833The debugging information normally gives the token type of each token
4834read, but not its semantic value. You can optionally define a macro
4835named @code{YYPRINT} to provide a way to print the value. If you define
4836@code{YYPRINT}, it should take three arguments. The parser will pass a
4837standard I/O stream, the numeric code for the token type, and the token
4838value (from @code{yylval}).
4839
4840Here is an example of @code{YYPRINT} suitable for the multi-function
4841calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}):
4842
4843@smallexample
4844#define YYPRINT(file, type, value) yyprint (file, type, value)
4845
4846static void
13863333 4847yyprint (FILE *file, int type, YYSTYPE value)
bfa74976
RS
4848@{
4849 if (type == VAR)
4850 fprintf (file, " %s", value.tptr->name);
4851 else if (type == NUM)
4852 fprintf (file, " %d", value.val);
4853@}
4854@end smallexample
4855
4856@node Invocation, Table of Symbols, Debugging, Top
4857@chapter Invoking Bison
4858@cindex invoking Bison
4859@cindex Bison invocation
4860@cindex options for invoking Bison
4861
4862The usual way to invoke Bison is as follows:
4863
4864@example
4865bison @var{infile}
4866@end example
4867
4868Here @var{infile} is the grammar file name, which usually ends in
4869@samp{.y}. The parser file's name is made by replacing the @samp{.y}
4870with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields
4871@file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields
4872@file{hack/foo.tab.c}.@refill
4873
4874@menu
13863333 4875* Bison Options:: All the options described in detail,
bfa74976 4876 in alphabetical order by short options.
9ecbd125 4877* Environment Variables:: Variables which affect Bison execution.
bfa74976
RS
4878* Option Cross Key:: Alphabetical list of long options.
4879* VMS Invocation:: Bison command syntax on VMS.
4880@end menu
4881
9ecbd125 4882@node Bison Options, Environment Variables, , Invocation
bfa74976
RS
4883@section Bison Options
4884
4885Bison supports both traditional single-letter options and mnemonic long
4886option names. Long option names are indicated with @samp{--} instead of
4887@samp{-}. Abbreviations for option names are allowed as long as they
4888are unique. When a long option takes an argument, like
4889@samp{--file-prefix}, connect the option name and the argument with
4890@samp{=}.
4891
4892Here is a list of options that can be used with Bison, alphabetized by
4893short option. It is followed by a cross key alphabetized by long
4894option.
4895
4896@table @samp
4897@item -b @var{file-prefix}
4898@itemx --file-prefix=@var{prefix}
4899Specify a prefix to use for all Bison output file names. The names are
4900chosen as if the input file were named @file{@var{prefix}.c}.
4901
4902@item -d
4903@itemx --defines
4904Write an extra output file containing macro definitions for the token
4905type names defined in the grammar and the semantic value type
4906@code{YYSTYPE}, as well as a few @code{extern} variable declarations.
4907
4908If the parser output file is named @file{@var{name}.c} then this file
4909is named @file{@var{name}.h}.@refill
4910
4911This output file is essential if you wish to put the definition of
4912@code{yylex} in a separate source file, because @code{yylex} needs to
4913be able to refer to token type codes and the variable
4914@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.@refill
4915
4916@item -l
4917@itemx --no-lines
4918Don't put any @code{#line} preprocessor commands in the parser file.
4919Ordinarily Bison puts them in the parser file so that the C compiler
4920and debuggers will associate errors with your source file, the
4921grammar file. This option causes them to associate errors with the
95e742f7 4922parser file, treating it as an independent source file in its own right.
bfa74976 4923
931c7513
RS
4924@item -n
4925@itemx --no-parser
4926Do not include any C code in the parser file; generate tables only. The
4927parser file contains just @code{#define} directives and static variable
4928declarations.
4929
4930This option also tells Bison to write the C code for the grammar actions
4931into a file named @file{@var{filename}.act}, in the form of a
4932brace-surrounded body fit for a @code{switch} statement.
4933
bfa74976
RS
4934@item -o @var{outfile}
4935@itemx --output-file=@var{outfile}
4936Specify the name @var{outfile} for the parser file.
4937
4938The other output files' names are constructed from @var{outfile}
931c7513 4939as described under the @samp{-v} and @samp{-d} options.
bfa74976
RS
4940
4941@item -p @var{prefix}
4942@itemx --name-prefix=@var{prefix}
4943Rename the external symbols used in the parser so that they start with
4944@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
e425e872
RS
4945is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
4946@code{yylval}, @code{yychar} and @code{yydebug}.
bfa74976
RS
4947
4948For example, if you use @samp{-p c}, the names become @code{cparse},
4949@code{clex}, and so on.
4950
4951@xref{Multiple Parsers, ,Multiple Parsers in the Same Program}.
4952
931c7513
RS
4953@item -r
4954@itemx --raw
4955Pretend that @code{%raw} was specified. @xref{Decl Summary}.
4956
bfa74976
RS
4957@item -t
4958@itemx --debug
4959Output a definition of the macro @code{YYDEBUG} into the parser file,
4960so that the debugging facilities are compiled. @xref{Debugging, ,Debugging Your Parser}.
4961
4962@item -v
4963@itemx --verbose
4964Write an extra output file containing verbose descriptions of the
4965parser states and what is done for each type of look-ahead token in
4966that state.
4967
4968This file also describes all the conflicts, both those resolved by
4969operator precedence and the unresolved ones.
4970
4971The file's name is made by removing @samp{.tab.c} or @samp{.c} from
4972the parser output file name, and adding @samp{.output} instead.@refill
4973
4974Therefore, if the input file is @file{foo.y}, then the parser file is
4975called @file{foo.tab.c} by default. As a consequence, the verbose
4976output file is called @file{foo.output}.@refill
4977
4978@item -V
4979@itemx --version
ff51d159
DM
4980Print the version number of Bison and exit.
4981
4982@item -h
4983@itemx --help
4984Print a summary of the command-line options to Bison and exit.
bfa74976
RS
4985
4986@need 1750
4987@item -y
4988@itemx --yacc
4989@itemx --fixed-output-files
4990Equivalent to @samp{-o y.tab.c}; the parser output file is called
4991@file{y.tab.c}, and the other outputs are called @file{y.output} and
931c7513 4992@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
bfa74976
RS
4993file name conventions. Thus, the following shell script can substitute
4994for Yacc:@refill
4995
4996@example
4997bison -y $*
4998@end example
4999@end table
5000
9ecbd125
JT
5001@node Environment Variables, Option Cross Key, Bison Options, Invocation
5002@section Environment Variables
5003@cindex environment variables
5004@cindex BISON_HAIRY
5005@cindex BISON_SIMPLE
5006
5007Here is a list of environment variables which affect the way Bison
5008runs.
5009
5010@table @samp
5011@item BISON_SIMPLE
5012@itemx BISON_HAIRY
5013Much of the parser generated by Bison is copied verbatim from a file
5014called @file{bison.simple}. If Bison cannot find that file, or if you
5015would like to direct Bison to use a different copy, setting the
5016environment variable @code{BISON_SIMPLE} to the path of the file will
5017cause Bison to use that copy instead.
5018
13863333 5019When the @samp{%semantic_parser} declaration is used, Bison copies from
9ecbd125
JT
5020a file called @file{bison.hairy} instead. The location of this file can
5021also be specified or overridden in a similar fashion, with the
5022@code{BISON_HAIRY} environment variable.
5023
5024@end table
5025
5026@node Option Cross Key, VMS Invocation, Environment Variables, Invocation
bfa74976
RS
5027@section Option Cross Key
5028
5029Here is a list of options, alphabetized by long option, to help you find
5030the corresponding short option.
5031
5032@tex
5033\def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill}
5034
5035{\tt
5036\line{ --debug \leaderfill -t}
5037\line{ --defines \leaderfill -d}
5038\line{ --file-prefix \leaderfill -b}
5039\line{ --fixed-output-files \leaderfill -y}
ff51d159 5040\line{ --help \leaderfill -h}
bfa74976
RS
5041\line{ --name-prefix \leaderfill -p}
5042\line{ --no-lines \leaderfill -l}
931c7513 5043\line{ --no-parser \leaderfill -n}
bfa74976 5044\line{ --output-file \leaderfill -o}
931c7513
RS
5045\line{ --raw \leaderfill -r}
5046\line{ --token-table \leaderfill -k}
bfa74976
RS
5047\line{ --verbose \leaderfill -v}
5048\line{ --version \leaderfill -V}
5049\line{ --yacc \leaderfill -y}
5050}
5051@end tex
5052
5053@ifinfo
5054@example
5055--debug -t
5056--defines -d
5057--file-prefix=@var{prefix} -b @var{file-prefix}
5058--fixed-output-files --yacc -y
ff51d159 5059--help -h
931c7513 5060--name-prefix=@var{prefix} -p @var{name-prefix}
bfa74976 5061--no-lines -l
931c7513 5062--no-parser -n
bfa74976 5063--output-file=@var{outfile} -o @var{outfile}
13863333 5064--raw -r
931c7513 5065--token-table -k
bfa74976
RS
5066--verbose -v
5067--version -V
5068@end example
5069@end ifinfo
5070
5071@node VMS Invocation, , Option Cross Key, Invocation
5072@section Invoking Bison under VMS
5073@cindex invoking Bison under VMS
5074@cindex VMS
5075
5076The command line syntax for Bison on VMS is a variant of the usual
5077Bison command syntax---adapted to fit VMS conventions.
5078
5079To find the VMS equivalent for any Bison option, start with the long
5080option, and substitute a @samp{/} for the leading @samp{--}, and
5081substitute a @samp{_} for each @samp{-} in the name of the long option.
5082For example, the following invocation under VMS:
5083
5084@example
5085bison /debug/name_prefix=bar foo.y
5086@end example
5087
5088@noindent
5089is equivalent to the following command under POSIX.
5090
5091@example
5092bison --debug --name-prefix=bar foo.y
5093@end example
5094
5095The VMS file system does not permit filenames such as
5096@file{foo.tab.c}. In the above example, the output file
5097would instead be named @file{foo_tab.c}.
5098
5099@node Table of Symbols, Glossary, Invocation, Top
5100@appendix Bison Symbols
5101@cindex Bison symbols, table of
5102@cindex symbols in Bison, table of
5103
5104@table @code
5105@item error
5106A token name reserved for error recovery. This token may be used in
5107grammar rules so as to allow the Bison parser to recognize an error in
5108the grammar without halting the process. In effect, a sentence
5109containing an error may be recognized as valid. On a parse error, the
5110token @code{error} becomes the current look-ahead token. Actions
5111corresponding to @code{error} are then executed, and the look-ahead
5112token is reset to the token that originally caused the violation.
5113@xref{Error Recovery}.
5114
5115@item YYABORT
5116Macro to pretend that an unrecoverable syntax error has occurred, by
5117making @code{yyparse} return 1 immediately. The error reporting
ceed8467
AD
5118function @code{yyerror} is not called. @xref{Parser Function, ,The
5119Parser Function @code{yyparse}}.
bfa74976
RS
5120
5121@item YYACCEPT
5122Macro to pretend that a complete utterance of the language has been
13863333 5123read, by making @code{yyparse} return 0 immediately.
bfa74976
RS
5124@xref{Parser Function, ,The Parser Function @code{yyparse}}.
5125
5126@item YYBACKUP
5127Macro to discard a value from the parser stack and fake a look-ahead
5128token. @xref{Action Features, ,Special Features for Use in Actions}.
5129
5130@item YYERROR
5131Macro to pretend that a syntax error has just been detected: call
5132@code{yyerror} and then perform normal error recovery if possible
5133(@pxref{Error Recovery}), or (if recovery is impossible) make
5134@code{yyparse} return 1. @xref{Error Recovery}.
5135
5136@item YYERROR_VERBOSE
5137Macro that you define with @code{#define} in the Bison declarations
5138section to request verbose, specific error message strings when
5139@code{yyerror} is called.
5140
5141@item YYINITDEPTH
5142Macro for specifying the initial size of the parser stack.
5143@xref{Stack Overflow}.
5144
c656404a
RS
5145@item YYLEX_PARAM
5146Macro for specifying an extra argument (or list of extra arguments) for
5147@code{yyparse} to pass to @code{yylex}. @xref{Pure Calling,, Calling
5148Conventions for Pure Parsers}.
5149
bfa74976
RS
5150@item YYLTYPE
5151Macro for the data type of @code{yylloc}; a structure with four
5152members. @xref{Token Positions, ,Textual Positions of Tokens}.
5153
931c7513
RS
5154@item yyltype
5155Default value for YYLTYPE.
5156
bfa74976
RS
5157@item YYMAXDEPTH
5158Macro for specifying the maximum size of the parser stack.
5159@xref{Stack Overflow}.
5160
c656404a
RS
5161@item YYPARSE_PARAM
5162Macro for specifying the name of a parameter that @code{yyparse} should
5163accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}.
5164
bfa74976
RS
5165@item YYRECOVERING
5166Macro whose value indicates whether the parser is recovering from a
5167syntax error. @xref{Action Features, ,Special Features for Use in Actions}.
5168
5169@item YYSTYPE
5170Macro for the data type of semantic values; @code{int} by default.
5171@xref{Value Type, ,Data Types of Semantic Values}.
5172
5173@item yychar
13863333
AD
5174External integer variable that contains the integer value of the current
5175look-ahead token. (In a pure parser, it is a local variable within
5176@code{yyparse}.) Error-recovery rule actions may examine this variable.
5177@xref{Action Features, ,Special Features for Use in Actions}.
bfa74976
RS
5178
5179@item yyclearin
5180Macro used in error-recovery rule actions. It clears the previous
5181look-ahead token. @xref{Error Recovery}.
5182
5183@item yydebug
5184External integer variable set to zero by default. If @code{yydebug}
5185is given a nonzero value, the parser will output information on input
5186symbols and parser action. @xref{Debugging, ,Debugging Your Parser}.
5187
5188@item yyerrok
5189Macro to cause parser to recover immediately to its normal mode
5190after a parse error. @xref{Error Recovery}.
5191
5192@item yyerror
5193User-supplied function to be called by @code{yyparse} on error. The
5194function receives one argument, a pointer to a character string
13863333
AD
5195containing an error message. @xref{Error Reporting, ,The Error
5196Reporting Function @code{yyerror}}.
bfa74976
RS
5197
5198@item yylex
5199User-supplied lexical analyzer function, called with no arguments
5200to get the next token. @xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
5201
5202@item yylval
5203External variable in which @code{yylex} should place the semantic
5204value associated with a token. (In a pure parser, it is a local
5205variable within @code{yyparse}, and its address is passed to
5206@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
5207
5208@item yylloc
13863333
AD
5209External variable in which @code{yylex} should place the line and column
5210numbers associated with a token. (In a pure parser, it is a local
5211variable within @code{yyparse}, and its address is passed to
bfa74976 5212@code{yylex}.) You can ignore this variable if you don't use the
13863333
AD
5213@samp{@@} feature in the grammar actions. @xref{Token Positions,
5214,Textual Positions of Tokens}.
bfa74976
RS
5215
5216@item yynerrs
13863333
AD
5217Global variable which Bison increments each time there is a parse error.
5218(In a pure parser, it is a local variable within @code{yyparse}.)
5219@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
bfa74976
RS
5220
5221@item yyparse
5222The parser function produced by Bison; call this function to start
5223parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
5224
5225@item %left
5226Bison declaration to assign left associativity to token(s).
5227@xref{Precedence Decl, ,Operator Precedence}.
5228
931c7513
RS
5229@item %no_lines
5230Bison declaration to avoid generating @code{#line} directives in the
5231parser file. @xref{Decl Summary}.
5232
bfa74976
RS
5233@item %nonassoc
5234Bison declaration to assign nonassociativity to token(s).
5235@xref{Precedence Decl, ,Operator Precedence}.
5236
5237@item %prec
5238Bison declaration to assign a precedence to a specific rule.
5239@xref{Contextual Precedence, ,Context-Dependent Precedence}.
5240
5241@item %pure_parser
5242Bison declaration to request a pure (reentrant) parser.
5243@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
5244
931c7513
RS
5245@item %raw
5246Bison declaration to use Bison internal token code numbers in token
5247tables instead of the usual Yacc-compatible token code numbers.
5248@xref{Decl Summary}.
5249
bfa74976
RS
5250@item %right
5251Bison declaration to assign right associativity to token(s).
5252@xref{Precedence Decl, ,Operator Precedence}.
5253
5254@item %start
5255Bison declaration to specify the start symbol. @xref{Start Decl, ,The Start-Symbol}.
5256
5257@item %token
5258Bison declaration to declare token(s) without specifying precedence.
5259@xref{Token Decl, ,Token Type Names}.
5260
931c7513
RS
5261@item %token_table
5262Bison declaration to include a token name table in the parser file.
5263@xref{Decl Summary}.
5264
bfa74976
RS
5265@item %type
5266Bison declaration to declare nonterminals. @xref{Type Decl, ,Nonterminal Symbols}.
5267
5268@item %union
5269Bison declaration to specify several possible data types for semantic
5270values. @xref{Union Decl, ,The Collection of Value Types}.
5271@end table
5272
5273These are the punctuation and delimiters used in Bison input:
5274
5275@table @samp
5276@item %%
5277Delimiter used to separate the grammar rule section from the
5278Bison declarations section or the additional C code section.
5279@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
5280
5281@item %@{ %@}
5282All code listed between @samp{%@{} and @samp{%@}} is copied directly
5283to the output file uninterpreted. Such code forms the ``C
5284declarations'' section of the input file. @xref{Grammar Outline, ,Outline of a Bison Grammar}.
5285
5286@item /*@dots{}*/
5287Comment delimiters, as in C.
5288
5289@item :
5290Separates a rule's result from its components. @xref{Rules, ,Syntax of Grammar Rules}.
5291
5292@item ;
5293Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
5294
5295@item |
5296Separates alternate rules for the same result nonterminal.
5297@xref{Rules, ,Syntax of Grammar Rules}.
5298@end table
5299
5300@node Glossary, Index, Table of Symbols, Top
5301@appendix Glossary
5302@cindex glossary
5303
5304@table @asis
5305@item Backus-Naur Form (BNF)
5306Formal method of specifying context-free grammars. BNF was first used
5307in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5308
5309@item Context-free grammars
5310Grammars specified as rules that can be applied regardless of context.
5311Thus, if there is a rule which says that an integer can be used as an
5312expression, integers are allowed @emph{anywhere} an expression is
5313permitted. @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5314
5315@item Dynamic allocation
5316Allocation of memory that occurs during execution, rather than at
5317compile time or on entry to a function.
5318
5319@item Empty string
5320Analogous to the empty set in set theory, the empty string is a
5321character string of length zero.
5322
5323@item Finite-state stack machine
5324A ``machine'' that has discrete states in which it is said to exist at
5325each instant in time. As input to the machine is processed, the
5326machine moves from state to state as specified by the logic of the
5327machine. In the case of the parser, the input is the language being
5328parsed, and the states correspond to various stages in the grammar
5329rules. @xref{Algorithm, ,The Bison Parser Algorithm }.
5330
5331@item Grouping
5332A language construct that is (in general) grammatically divisible;
13863333 5333for example, `expression' or `declaration' in C.
bfa74976
RS
5334@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5335
5336@item Infix operator
5337An arithmetic operator that is placed between the operands on which it
5338performs some operation.
5339
5340@item Input stream
5341A continuous flow of data between devices or programs.
5342
5343@item Language construct
5344One of the typical usage schemas of the language. For example, one of
5345the constructs of the C language is the @code{if} statement.
5346@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5347
5348@item Left associativity
5349Operators having left associativity are analyzed from left to right:
5350@samp{a+b+c} first computes @samp{a+b} and then combines with
5351@samp{c}. @xref{Precedence, ,Operator Precedence}.
5352
5353@item Left recursion
5354A rule whose result symbol is also its first component symbol;
5355for example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive Rules}.
5356
5357@item Left-to-right parsing
5358Parsing a sentence of a language by analyzing it token by token from
5359left to right. @xref{Algorithm, ,The Bison Parser Algorithm }.
5360
5361@item Lexical analyzer (scanner)
5362A function that reads an input stream and returns tokens one by one.
5363@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
5364
5365@item Lexical tie-in
5366A flag, set by actions in the grammar rules, which alters the way
5367tokens are parsed. @xref{Lexical Tie-ins}.
5368
931c7513 5369@item Literal string token
9ecbd125 5370A token which consists of two or more fixed characters.
931c7513
RS
5371@xref{Symbols}.
5372
bfa74976
RS
5373@item Look-ahead token
5374A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead Tokens}.
5375
5376@item LALR(1)
5377The class of context-free grammars that Bison (like most other parser
5378generators) can handle; a subset of LR(1). @xref{Mystery Conflicts, ,
5379Mysterious Reduce/Reduce Conflicts}.
5380
5381@item LR(1)
5382The class of context-free grammars in which at most one token of
5383look-ahead is needed to disambiguate the parsing of any piece of input.
5384
5385@item Nonterminal symbol
5386A grammar symbol standing for a grammatical construct that can
5387be expressed through rules in terms of smaller constructs; in other
5388words, a construct that is not a token. @xref{Symbols}.
5389
5390@item Parse error
5391An error encountered during parsing of an input stream due to invalid
5392syntax. @xref{Error Recovery}.
5393
5394@item Parser
5395A function that recognizes valid sentences of a language by analyzing
5396the syntax structure of a set of tokens passed to it from a lexical
5397analyzer.
5398
5399@item Postfix operator
5400An arithmetic operator that is placed after the operands upon which it
5401performs some operation.
5402
5403@item Reduction
5404Replacing a string of nonterminals and/or terminals with a single
5405nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison Parser Algorithm }.
5406
5407@item Reentrant
5408A reentrant subprogram is a subprogram which can be in invoked any
5409number of times in parallel, without interference between the various
5410invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
5411
5412@item Reverse polish notation
5413A language in which all operators are postfix operators.
5414
5415@item Right recursion
5416A rule whose result symbol is also its last component symbol;
5417for example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive Rules}.
5418
5419@item Semantics
5420In computer languages, the semantics are specified by the actions
5421taken for each instance of the language, i.e., the meaning of
5422each statement. @xref{Semantics, ,Defining Language Semantics}.
5423
5424@item Shift
5425A parser is said to shift when it makes the choice of analyzing
5426further input from the stream rather than reducing immediately some
5427already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm }.
5428
5429@item Single-character literal
5430A single character that is recognized and interpreted as is.
5431@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
5432
5433@item Start symbol
5434The nonterminal symbol that stands for a complete valid utterance in
5435the language being parsed. The start symbol is usually listed as the
13863333 5436first nonterminal symbol in a language specification.
bfa74976
RS
5437@xref{Start Decl, ,The Start-Symbol}.
5438
5439@item Symbol table
5440A data structure where symbol names and associated data are stored
5441during parsing to allow for recognition and use of existing
5442information in repeated uses of a symbol. @xref{Multi-function Calc}.
5443
5444@item Token
5445A basic, grammatically indivisible unit of a language. The symbol
5446that describes a token in the grammar is a terminal symbol.
5447The input of the Bison parser is a stream of tokens which comes from
5448the lexical analyzer. @xref{Symbols}.
5449
5450@item Terminal symbol
5451A grammar symbol that has no rules in the grammar and therefore
5452is grammatically indivisible. The piece of text it represents
5453is a token. @xref{Language and Grammar, ,Languages and Context-Free Grammars}.
5454@end table
5455
5456@node Index, , Glossary, Top
5457@unnumbered Index
5458
5459@printindex cp
5460
5461@contents
5462
5463@bye